I was invited to give a presentation today to a group of politicians, some of whom are members of the Finnish Parliament, at the Finnish Parliament Annex building (Pikkuparlamentti). Here is my visitor’s badge:
Visitor’s badge to the Finnish Parliament
My job was to inform them about what artificial intelligence is today to help them make decisions in industrial policy making. Here you can download my original slides in Finnish. Below I will give you a skewed version of my presentation: it is much longer in the beginning and much shorter in the end than the actual presentation was. My first slide was this awesome piece of art produced by me (there is some Python code in the background):
Foreshortened History of Artificial Intelligence and Cognitive Science
Cognitive science and artificial intelligence (AI) have been interlinked ever since their beginnings. Both can be divided into a “symbolic”, “rule-based” and “representationalist” camp on the one hand and a “subsymbolic”, “connectionist” and “embodied” camp on the other. I know very well (and have written a paper on this topic with Otto Lappi) that these distinctions are not as clear as the distinction between day and night, but if one wants to describe the history of these disciplines in less than 5 minutes, this is IMO the way to go.

In AI, the former (the symbolic) is best represented by the rule-based language systems: systems in which people programmed by hand all the grammar rules and vocabulary of a language to enable the computer to parse language in meaningful ways. These systems did quite well in the sense that they topped other methods and algorithms for decades. This was also the time when cognition was thought to be analogous to symbol manipulation. In this view, the mind learns various symbols, like “tree” and “forest”, and learns relationships between them, like “a forest consists of trees” but “trees do not consist of forests”. In this way both AI and cognition were thought of as rule-based systems.

A connectionist, subsymbolic AI system, on the other hand, does not operate on pre-given symbols or representations, but learns the necessary patterns by itself from the data in an adaptive and dynamic way. Similarly, in cognitive science people have been moving towards a non-representationalist and non-symbolic view in which most of the activity in a conscious mind is contentless.
More recently, around 10–15 years ago, both revolutions (if that is the word) happened at the same time: in AI the connectionist, subsymbolic systems became better than their rule-based counterparts, and in cognitive science researchers started moving towards the view that cognition is not so much symbolic and representational but rather dynamic, embodied and based on sensorimotor contingencies. These different schools still exist and, as I said, their boundaries are not clear, but the tendency is this, especially in AI. In cognitive science the shift has been supported by philosophical arguments like those of Alva Noë and by arguments from developmental psychology. In AI some of the progress came from adopting ideas from neuroscience, such as Hebbian learning.
The Scientific Cycle
My view of the scientific interdependencies. Simplified.
What does the transition from rule-based symbolic systems to subsymbolic connectionist ones mean in AI? It means that the machine now learns by itself, from the data, what used to be human-given “rules”; the learned rules are called “features”. Human programmers no longer have to supply any rules or pre-given semantics. Instead, the machine is only given training data, from which it learns the features by itself. For example, one feature might look like this:
This is just multiplication. Imagine that black pixels are -1, white pixels are 1 and grey pixels are in between. Then imagine sliding the filter (the small square image in the middle) over the image on the left; at each position you multiply the values in the filter with the corresponding pixels in the image, take the average of all the obtained values, and use the result to colour the corresponding pixel on the right. As a result you get the image on the right, in which essentially only the vertical lines of the left picture remain. This filter can be designed by hand, but modern deep neural networks learn such filters by themselves, optimising for the data they are fed.
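The slide-multiply-average operation described above is an ordinary 2-D convolution. Here is a minimal NumPy sketch of it, with a made-up toy image and a hand-designed vertical-line filter (this is my illustration, not the filter from the slides):

```python
import numpy as np

def convolve2d(image, filt):
    """Slide `filt` over `image`; at each position multiply
    element-wise and take the average, as described in the text."""
    fh, fw = filt.shape
    h, w = image.shape
    out = np.zeros((h - fh + 1, w - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.mean(image[i:i + fh, j:j + fw] * filt)
    return out

# Toy 7x7 image: black (-1) background with one white (1) vertical
# line and one white horizontal line.
image = -np.ones((7, 7))
image[:, 2] = 1.0   # vertical line at column 2
image[5, :] = 1.0   # horizontal line at row 5

# A filter that responds to vertical lines: a white column
# flanked by black columns.
vertical_filter = np.array([[-1.0, 1.0, -1.0]] * 3)

response = convolve2d(image, vertical_filter)
# The response is maximal where the filter sits exactly on the
# vertical line, and weaker everywhere else.
```

A deep network would not be handed `vertical_filter`; it would arrive at filters like it by adjusting random initial values during training.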
Hierarchical Layers of Deep Networks
Moreover, in current deep networks these features are learned hierarchically. Whereas the first layer learns to recognise vertical and horizontal lines (or something like that) from the pixels of the image, the second layer learns to recognise patterns of those lines in turn, and so on. The higher the layer, the higher-level the features it recognises.
Of course, this means that the model has thousands or tens of thousands of parameters and needs thousands or tens of thousands of samples for training. Training these models on normal computers is therefore very hard, and one often resorts to GPUs; Google is even developing the so-called Tensor Processing Unit (TPU), a processor optimised for deep learning with the TensorFlow library.
This is analogous to what is known about visual processing in the brain
You can also go the other way around. After the network is trained, you can generate images that correspond to the activations of various neurons in the network, including the output neurons. In this way some pretty amazing results in, e.g., style transfer can be achieved, and industrial applications are around the corner:
Word2Vec is a way to embed the words of a language into a vector space such that the distances and directions between word vectors respect the words’ semantic relationships. This is typically done using the weights of a trained neural network. One thing this model can do is fill in the missing word in analogies like “Mom is to dad what wife is to …”: the answer is obtained by taking the word whose vector is closest to “wife” + “dad” − “mom”.
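The analogy arithmetic can be sketched with a few hand-made toy vectors (these are NOT trained word2vec weights; the dimensions, roughly “female”, “male”, “parent”, are invented for illustration):

```python
import numpy as np

# Toy 3-dimensional "word vectors", chosen by hand so that the
# analogy works out; a real word2vec model learns hundreds of
# dimensions from text.
vecs = {
    "mom":     np.array([1.0, 0.0, 1.0]),   # female, parent
    "dad":     np.array([0.0, 1.0, 1.0]),   # male, parent
    "wife":    np.array([1.0, 0.0, 0.0]),   # female, spouse
    "husband": np.array([0.0, 1.0, 0.0]),   # male, spouse
}

# "Mom is to dad what wife is to ...?"
query = vecs["wife"] + vecs["dad"] - vecs["mom"]

def closest(v, exclude=()):
    """Return the known word whose vector is most similar to v
    (cosine similarity), ignoring the words in the question."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    return max((w for w in vecs if w not in exclude),
               key=lambda w: cos(vecs[w], v))

answer = closest(query, exclude={"wife", "dad", "mom"})
```

In a trained model one would do the same lookup over the whole vocabulary instead of four hand-picked words.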
Using natural language processing one can train a neural network to classify, say, tweets by their sentiment or emotion, positivity or negativity. In this way one can track how different events are received by the public. The naïve approach, where one simply searches for keywords such as “good” and “bad”, works moderately well but misclassifies many cases, such as “this was not very good”. More on sentiment analysis.
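The naïve keyword approach, and exactly how it fails on negation, fits in a few lines (the word lists are illustrative, not from any real sentiment lexicon):

```python
# Naive keyword-based sentiment: count positive vs negative words.
# Toy word lists for illustration only.
POSITIVE = {"good", "great", "awesome"}
NEGATIVE = {"bad", "awful", "terrible"}

def naive_sentiment(text):
    words = text.lower().split()
    score = (sum(w in POSITIVE for w in words)
             - sum(w in NEGATIVE for w in words))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

naive_sentiment("this was bad")            # -> "negative", fine
naive_sentiment("this was not very good")  # -> "positive": the "not"
                                           # is ignored, so the
                                           # sentence is misclassified
```

A trained classifier learns from labelled examples that “not … good” tends to occur in negative texts, which is precisely what the keyword count cannot see.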
Image recognition can be used not only to recognise cats (which is of course its main purpose!) but also to tell from a snapshot of a skin rash whether it is cancer or not. In a recent article the authors claim that their neural network performs as well as professional dermatologists.
In the same way as machine learning can be used to classify tweets into good and bad, it can be used to classify code into malware and non-malware. An example of this is an Israeli company that does exactly that.
Saving millions in an electricity bill
I like this example because it shows that our own imagination is the only limit to how these techniques can be used. Google has reduced its data-centre cooling bill by 40% by teaching a neural network to predict when and where the temperature of a server will go up or down. In this way they can switch the local cooling systems on and off according to need, much better than with previous systems. The network is fed all possible data measured in the data centre: server activity, pump activity, local temperature (which can depend on the weather) and so on. The output is a prediction of the temperature of each server in the next hour.
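The setup is ordinary supervised regression: measured features in, next-hour temperature out. A minimal sketch with entirely synthetic data and a toy least-squares model standing in for DeepMind’s actual neural network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "data centre" features: server load, pump activity,
# outside temperature. All numbers are made up for illustration.
X = rng.uniform(0, 1, size=(200, 3))

# Synthetic target: next-hour server temperature, generated from an
# invented linear relation plus measurement noise.
true_w = np.array([5.0, -3.0, 2.0])
y = 20.0 + X @ true_w + rng.normal(0, 0.1, size=200)

# Fit by least squares; DeepMind used a deep network, but the
# supervised learning setup is the same.
A = np.hstack([np.ones((200, 1)), X])          # prepend bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict next-hour temperature for a new set of measurements.
new = np.array([1.0, 0.8, 0.2, 0.5])           # bias, load, pump, weather
pred = new @ w
```

Given such a prediction per server, the control logic can then pre-emptively turn local cooling on where temperatures are about to rise.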
Click on the image for the original article by DeepMind.
We also briefly discussed the moral implications of the growth of AI. If a machine makes the skin-cancer diagnosis and makes a mistake, then who is to blame? According to research in psychology, it is easier for people to accept a human mistake than a mistake made by a computer. I believe that once machines get good enough, we will start trusting them in the same way as we trust the driverless trains of the Paris underground. The question remains, however, who bears the moral responsibility for such mistakes.
We know less and less about what exactly happens inside a deep neural network after it has been trained. Without complicated analysis or dedicated research, we do not know which cells are responsible for which parts of a decision, nor how the network will react to unforeseen situations. The more AI slips out of our hands in this way, the closer we get to a situation where AI is in some sense autonomous and may start optimising for parameters we do not want it to optimise for.
Thank you for your attention, ladies and gentlemen, your subscription button is below!