Aime works on smart software solutions on the cutting edge between innovation and technology. And when we zoom in on the word technology, we see that the smartness often emerges from combining Big Data and Artificial Intelligence.
But we hear so much about them these days – what are they anyway?
Big data – does my organization have it? AI – or artificial intelligence – how does it work? And then, how do they differ from machine learning and deep learning, two other widely used buzzwords in today’s business communities?
In this blog, we cover the differences between big data, artificial intelligence, machine learning and deep learning. Big data on the one hand and the latter three on the other are different concepts, but we must note that combining them within your organization can be like striking gold.
We hear a lot about it: big data. But what is it, anyway? And what can we do with it? Defined very briefly: large, often highly unstructured data sets within your organization that can be of enormous value – but with traditional analytic methods, your hands are tied.
All right, let’s add a little more detail. Whether a data set can be classified as ‘big data’ is often determined using the so-called V heuristics: words that all start with a V and each describe a characteristic of a big data set. Although many different Vs are in use today, the following five are pretty much universally accepted in the business world for identifying big data:
- Volume: big data is large. When the volume of your data is so large that you cannot handle it with a traditional database, the odds are that you’re dealing with big data.
- Variety: big data is highly varied. Measurements next to a highway? Analyses documented in reports? Designs located somewhere on a hard drive? They’re all forms of big data, and you can use them all to make your organization better.
- Velocity: data moves fast. The speed with which new data enters your organization increases day after day, while the lead time of information decreases rapidly.
- Value: there’s a substantial amount of value hidden in your data. It is wise to identify your current customer and your future one, as well as their needs. By then creating new value propositions based on your data, you can support them and keep – or gain – market share.
- Veracity: big data is often credible. Although correlation is not causation, a larger data set reduces the odds that outliers distort your results.
In short: big data is about the data sets themselves, less about what you do with them. Does your organization possess a large amount of unstructured data, such as many Microsoft Word files? Or a large number of pictures linked to analysis reports? We’re pretty certain you can create new value with all this data – even when it does not fully match the five Vs outlined above.
Okay, but how? To answer this question we turn to the second part of our blog: artificial intelligence – with its subfields machine learning and deep learning.
The field of AI has very fuzzy boundaries; there is no commonly agreed-upon definition. That means we have to devise our own. At Aime, we use this one:
Human intelligence captured in software.
But does that mean that regular software is AI as well?
Yes, if you follow our definition, because there is human intelligence captured in that software. Suppose we want to determine, based on a picture, who is shown in it. Besides the picture itself, we then need a computer program – written by a human – to analyze it. This program encodes rules that determine when I am pictured and when someone else is. Brown hair? Curls? A green sweater? Pretty certain it’s a picture of Christian. You can imagine that this form of AI is not scalable. Put simply: what if we want the computer to recognize a million pictures? We would have to encode the rules for a million pictures. That’s a massive challenge and – we hope it doesn’t surprise you – nobody assigns this task to their employees.
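Such a rule-based ‘AI’ might look like the sketch below – the feature names and rules are entirely made up for illustration, but they show how every decision has to be written down by a human beforehand:

```python
# A hypothetical hand-coded "AI": a human wrote down the rules that
# decide whether a photo shows Christian. All feature names are invented.
def is_christian(photo: dict) -> bool:
    return bool(
        photo.get("hair_color") == "brown"
        and photo.get("has_curls", False)
        and photo.get("sweater_color") == "green"
    )

print(is_christian({"hair_color": "brown", "has_curls": True, "sweater_color": "green"}))  # prints: True
print(is_christian({"hair_color": "blond"}))  # prints: False
```

Now imagine writing such a function for a million different people – that is exactly the scaling problem described above.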
So the AI we hear all about in the news must be something different. Then what is it?
The answer: a form of artificial intelligence in which we let computers learn from the data themselves. We call these machine learning techniques.
Machine learning: a subset of AI techniques
By using those techniques, we pretty much reverse the process. We give the machine the input data (i.e., the million pictures discussed above) – but we also provide the output data. This output data, which describes each picture, is called the labels. The label for my picture would be “Christian”.
Based on these data sets, the machine can infer on its own the implicit rules that together identify me in a picture. The end result is a program that can be applied to new pictures.
Suppose that one hundred of those million pictures show me. The model thus learns by itself what I look like. When I subsequently feed a new picture of myself into the program, the odds are that it once again says: PING! This is Christian. At least, that’s how it is supposed to work. The reliability of such an inferred program is never 100%, just as human beings make errors of judgement.
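The learning-from-labels idea can be sketched in a few lines of plain Python. This is a hypothetical 1-nearest-neighbour ‘model’ over invented feature numbers: instead of hand-written rules, it simply reuses the label of the most similar training example.

```python
import math

# Toy "pictures": each is a pair of invented feature numbers plus a label.
training_data = [
    ([0.8, 0.9], "Christian"),
    ([0.7, 0.8], "Christian"),
    ([0.1, 0.2], "someone else"),
    ([0.2, 0.1], "someone else"),
]

def predict(features):
    """Return the label of the closest training example (1-nearest-neighbour)."""
    _, label = min(
        training_data,
        key=lambda pair: math.dist(pair[0], features),
    )
    return label

# A new, unseen "picture" close to the Christian examples:
print(predict([0.75, 0.85]))  # prints: Christian
```

No rule was ever written by hand – the labels did the work, which is the essence of machine learning.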
When creating such programs, we can still configure many aspects ourselves. Over the years, many so-called architectures have been designed for various tasks. For classification (the task above), we can choose decision trees, random forests, support vector machines, linear classifiers, basic neural networks, and so on.
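This menu of architectures can be sketched with scikit-learn (assuming it is installed); the feature numbers and labels below are invented for illustration, but the shared fit/predict interface means swapping one architecture for another is a one-line change:

```python
# Several classic architectures applied to the same toy task.
# Assumes scikit-learn is available; data is invented for illustration.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

X = [[0.8, 0.9], [0.7, 0.8], [0.1, 0.2], [0.2, 0.1]]
y = ["Christian", "Christian", "someone else", "someone else"]

architectures = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "support vector machine": SVC(),
    "linear classifier": LogisticRegression(),
}
for name, model in architectures.items():
    model.fit(X, y)
    print(name, "->", model.predict([[0.75, 0.85]])[0])
```

On this trivially separable toy set all four agree; on real data, the choice of architecture matters a great deal.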
All those relatively traditional machine learning methods share one big drawback: they handle linear data pretty well, but cannot handle non-linear data. Linear data is data that can be separated by a linear function – in a two-dimensional space, a line.
But the real world is not linear – quite the opposite. Now what if you have a data set that looks as follows?
Can you draw a line through it and make sure that it nicely separates the blue data from the orange data?
That’s why, until a few years ago, traditional models were widely used and performed quite nicely. Nevertheless, the vivid dream of an exceptionally well-performing AI system was still far away.
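The circles picture can be made concrete in plain Python: two invented concentric rings that no straight line can separate – any line has points of the inner class on both sides – although a simple non-linear feature, the distance from the origin, separates them perfectly:

```python
import math

# Two concentric rings of points, like the blue and orange circles in the
# post: the inner ring is one class, the outer ring the other.
angles = [i * 0.1 for i in range(63)]
inner = [(0.5 * math.cos(a), 0.5 * math.sin(a)) for a in angles]
outer = [(2.0 * math.cos(a), 2.0 * math.sin(a)) for a in angles]

def radius(p):
    """A non-linear feature: distance from the origin."""
    return math.hypot(p[0], p[1])

# The non-linear feature splits the classes with a single threshold,
# something no straight line in the original (x, y) space can do.
threshold = 1.0
assert all(radius(p) < threshold for p in inner)
assert all(radius(p) > threshold for p in outer)
print("a non-linear feature (radius) separates what no straight line can")
```

Traditional linear models lack exactly this ability to invent such non-linear features by themselves.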
Deep learning: a subset of machine learning
Progress in computational power brought a turnaround. Traditional models were left behind in research and development, and both scholars and practitioners massively adopted techniques known as deep learning. Especially after deep neural networks showed a massive boost in classification performance at the 2012 ImageNet image recognition competition, many people switched to deep learning entirely.
And even today, deep learning is a very hot topic in the business world.
But what is it?
A collection of machine learning models that scale well with large data sets and can handle non-linear data, such as the blue and orange circles we saw above. These models are additionally deep, in the sense that data is propagated through a number of layers. Whereas traditional approaches often chain several individual models sequentially, deep learning uses a single model end to end.
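As an illustration of why those layers matter, here is a small sketch (assuming scikit-learn is available) in which a linear classifier barely beats guessing on circle-shaped data, while a network with two hidden layers – a very small ‘deep’ model – learns the circular boundary:

```python
# Linear model vs. a small multi-layer network on non-linear data.
# Assumes scikit-learn is available; the data set is synthetic.
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Two concentric circles of points – data no straight line can separate.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear classifier has only a straight decision boundary available...
linear = LogisticRegression().fit(X, y)

# ...while a network propagating the data through two hidden layers
# can bend its boundary around the inner circle.
deep = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000,
                     random_state=0).fit(X, y)

print("linear accuracy:", round(linear.score(X, y), 2))
print("deep accuracy:  ", round(deep.score(X, y), 2))
```

Real deep neural networks have far more layers and parameters, but the principle – stacked non-linear layers learning what linear models cannot – is the same.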
Among the most widely known deep learning models are the so-called deep neural networks. These networks, loosely inspired by the way the human brain processes information, are responsible for the boost in accuracy of AI applications in various domains. The medical specialists beaten by DeepMind AIs? An AI capable of navigating without using GPS? All possible thanks to these networks.
For this blog post, explaining the precise inner workings of a deep neural network would take us too far; we’ll return to it in an upcoming blog. We do hope that you can now navigate the hierarchy from artificial intelligence down to deep learning: deep learning is a form of AI, but a rule-based system, while still AI, is not deep learning.
Choose the way forward: smart software driven by AI
Aime can help make your existing software smarter or build new software that makes your workflow smart. We always choose a middle ground when doing so: a mix of deep learning techniques augmented with more traditional methods. Why? Because four eyes see more than two – combining deep learning and traditional machine learning often yields better results.
We start at your organization. Where are the current bottlenecks? Who is your future customer? How can your current data be exploited to serve them? After answering those questions, we develop a roadmap based on our Four Phases of AI and Big Data – and then get to work immediately.
Would you like to know more? You are more than welcome! We serve great ☕.
- Date - 10 October 2018