No “raw” data… only numbers
Before we start tackling these questions, let us prepare the ground by defining our terms and methodology. When we talk about a computer “thinking”, or about Machine Learning, we cannot but start with the most important part of the process: the data. Data can take many forms. It can be visual content, such as images and videos; textual matter, like this article; or any other kind of information, such as audio, numbers, or graphs.
One thing we should know is that computers don’t actually “deal” with any of this. A computer cannot see an image or read a text; it only understands numbers! Some would ask, “Then how come a computer can show me a video or play an audio file?” The answer is simple: for a machine to handle other types of data, they must first be mapped to numbers. For instance, if we want to feed the following sentence to a machine: “I am happy to be with you.”, we can create a bag of words that contains every word in the sentence and assign each one a number: I → 1, am → 2, happy → 3, to → 4, be → 5, with → 6, you → 7. We then represent the sentence as an array of numbers, where each number refers to a word in the bag: [1, 2, 3, 4, 5, 6, 7]. We can use this vocabulary to represent any other sentence, say “I want you to help me with my homework”. Some of its words are not found in the bag, so they are replaced with an OOV (out-of-vocabulary) token, represented here as the number 0: [1, 0, 7, 4, 0, 0, 6, 0, 0]. We could certainly enlarge the vocabulary for better coverage, but that’s not the point. The point is that a machine can never handle “raw” data; everything has to be mapped to numbers first. The same applies to images, whose pixels are encoded as triplets of Red, Green, and Blue intensities, and to audio, which is represented by frequencies or other numeric features.
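The encoding described above can be sketched in a few lines of Python. This is a minimal illustration of the article’s example; the word-to-number mapping and the choice of 0 as the OOV token follow the text, while the function name `encode` is just a label chosen here.

```python
# Bag of words from the example sentence, mapping each word to a number.
vocab = {"I": 1, "am": 2, "happy": 3, "to": 4, "be": 5, "with": 6, "you": 7}
OOV = 0  # out-of-vocabulary token for words not in the bag

def encode(sentence):
    # Strip a trailing period, split on spaces, and map each word
    # to its number, falling back to the OOV token.
    words = sentence.rstrip(".").split()
    return [vocab.get(word, OOV) for word in words]

print(encode("I am happy to be with you."))
# → [1, 2, 3, 4, 5, 6, 7]
print(encode("I want you to help me with my homework"))
# → [1, 0, 7, 4, 0, 0, 6, 0, 0]
```

Note how the second sentence comes out mostly as zeros: the machine has no idea what “want”, “help”, “me”, “my”, or “homework” are, because they were never mapped to numbers.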
Brains understand context… machines spot patterns
As we can see, machines do not interpret data the way human brains do. What we call Machine Learning is nothing but performing calculations on the input data (assumed to be in the form of numbers) and outputting some number(s). That output can then be interpreted, e.g., if the number falls in [0, 0.5) the image is of a cat, and if it falls in [0.5, 1] it is of a dog. The “learning” part is that the machine goes through tens, hundreds, or even thousands of rounds trying to reduce its error, i.e., the number of times it labeled a cat image as a dog (gave it a value in [0.5, 1]) and vice versa. The trick is to imagine an ultimate labeler that our computer is trying to mimic: a mathematical function that takes an array of numbers encoding an image and outputs a number between 0 and 1 telling us whether the image is a cat or a dog.
Human brains approach such a job differently. The key difference is that brains understand context, whereas computers merely spot patterns in the data they were trained on. Once we grasp this difference, we can conclude that machines behave according to their training data. For instance, if you have never shown the machine a single image of a dolphin, there is no way on Earth it would guess it right. In fact, it would probably label it as either a cat or a dog (unless there was an option for an “unknown” label). A person, on the other hand, shown an image of an animal they have never seen before, is more likely to guess that it doesn’t fall under any of the animal categories they know.
No multitasking!
Another key difference between machines and the human brain is the scope of the task. A single brain can perform millions of tasks: it can solve equations, play music, write essays, compare the taste of different foods, and so on. There is no such thing as a super machine that can do everything. The model we described earlier can only predict whether an image shows a cat or a dog; it cannot, say, answer questions. In other words, we need a model for every task we can think of, since a single model performs one and only one job. That job can sometimes be subdivided into several tasks, such as recognizing emotions from speech and facial expressions, but that just makes it one complex task. This is a major area where a single human brain can outperform a hundred machine models.
Summary
Machines are, of course, superior to human brains at numerous tasks, such as calculations or repetitive work. That brings us to another difference between the two: machines are faster and more efficient when programmed to do a repetitive task, while brains are smarter, more creative, and better at directing the work. A machine can label tens of cat and dog images in the time a human needs to label a few. On the other hand, a human can program a model to do the job for him!
In conclusion, I don’t see machines and human brains as competing with each other. Rather, I see a chance for collaboration, benefiting from each one’s skills. We don’t need people to do ordinary routine tasks when we can program a machine to do them faster and more efficiently. We can then put the genius of the human brain to work on more creative tasks that a machine cannot do… or at least, not yet.