Author: snikolenko

  • Who Will Be Replaced by Robots II, or Anatomy of Fear of Robotization


    [translated from the Russian version by Andrey V. Polyakov]

    Recently, a new round of conversations about self-moving carriages and other potential achievements of artificial intelligence has again posed one of the classic questions of humanity: who will be marginalized by the monorail track of the next technological revolution? Some studies argue that in the near future artificial intelligence will lead to a surge of unemployment comparable to the Great Depression. In the first part of the series we started from the very beginning, the Luddites, and reviewed how automation affected several occupations that have, as a result, drastically changed or even disappeared. Today we will first discuss the more creative part of the spectrum, and then see where the current wave of automation fear comes from.

    Creative occupations

    All teaching machines would be plugged into this planetary library and each could then have at its disposal any book, periodical, document, recording, or video cassette encoded there. If the machine has it, the student would have it too, either placed directly on a viewing screen, or reproduced in print-on-paper for more leisurely study.

    Isaac Asimov. The New Teachers (1976)

    From the intro to the Westworld series

    In the first part we saw how global automation has happened in the past, starting with the Luddites fighting the first industrialization and finishing with extinct occupations such as elevator operator and computist. We saw that so far the automation of various occupations has been partial rather than total, resulting in an increase, rather than a reduction, in the demand for the respective professionals.

    Usually, partial automation shifts an occupation towards more “human”, creative content. But what about occupations that are creative per se? Does automation shrink the creative markets today?

    Let us consider, for example, the occupation of musician. In the 19th century, with no sound recording technologies, the demand for musicians was stable: whenever you wanted to listen to good music, a live musician, or a small orchestra for major pieces, was required. A decent musician always had a job, and not just performers but composers too: Johann Sebastian Bach, under his contract with the Thomaskirche (St. Thomas Church) in Leipzig, was supposed to write a new cantata every week, literally any given Sunday. A church in nearby Magdeburg employed another composer, not necessarily of Bach’s level.

    However, in the early 20th century people started to listen to gramophone music, and, with the advent of sound movies, even the pianists who accompanied silent films became outdated. Today, in Leipzig or Magdeburg, they still play Bach, and anybody can listen to the Goldberg Variations absolutely free, brilliantly performed and on instruments Bach could not even dream of (the modern grand piano is far beyond the claviers of that time, though finding recordings on “original” instruments is not difficult either).

    Does this mean that the demand for musicians has disappeared, and that today a few dozen composers and orchestras around the world provide for all the needs of new music, the rest of the work being done by mass replication (which, by the same logic, should also be concentrated in the hands of several major labels)? Not at all: there are even more musicians, composers, and performers! The modern development of technology (not only in sound recording but also in communications) has allowed the most diverse musicians to find their audience, all flowers have blossomed, and today more people can earn their living by music than ever before. The top of this pyramid earns far more than Prince Leopold von Anhalt-Köthen, who was once a patron of the same Johann Sebastian, but the “middle class” is also very far from poverty. Furthermore, even the employment projections are quite favorable. This applies to performers too: sound recording did not kill off the “excess” musicians but, on the contrary, allowed more people to learn about them, raising the demand for live concerts.

    The same applies to other creative occupations. It even surprises me a bit: we can download almost any book for free, and there are certainly enough good books for a lifetime of reading, whatever one’s taste, but hundreds of thousands of authors all over the world write and successfully sell their works, not only in new genres (one can believe that a fan of slash fanfics can hardly be satisfied by Dostoevsky, although…), but also in quite traditional ones.

    But enough history. Let us now talk about whom robots will actually replace in the near future, and why this should not be feared so much.

    Threat or benefit?

    How far has gone “progrès”? Laboring is in recess,
    And mechanical, for sure, will the mental one replace.
    Troubles are forgotten, no need for quest,
    Toiling are the robots, humans take a rest.


    Yuri Entin. From the movie Adventures of Electronic

    The face of a long-haul driver (New England Journal of Medicine)

    Elevator operators, computists, and weavers perfectly symbolize the activities that have been automated so far: technical work, where the output is expected to match certain parameters as closely as possible, and creativity is not only difficult but forbidden and, in fact, harmful. Of course, an elevator operator could smile at his passengers, or a computer girl could, scorning potential damage to her reputation, get acquainted with Richard Feynman himself. But their function was to accurately perform clear, algorithmically defined actions.

    I will allow myself a little emotion: these are exactly the kinds of activities that must be automated further! There is nothing human in following a fixed algorithm. Monotonous work with a predetermined input and output is always an extreme measure, a forced oppression of the human spirit in order to achieve some practical goal. And if the goal can be achieved without wasting real, live human time, that is the way to proceed. This is what happened to weavers and bank tellers, when the former stopped performing the functions of machines and the latter those of ATMs.

    Therefore, I believe that the ongoing narrative about how terrible it will be when a huge army of truckers is replaced by driverless vehicles is not just groundless but counterproductive. Moving a vehicle from point A to point B is the most typical example of a monotonous, strictly algorithmic task that should not, if possible, be performed by a human being. Nor will there be a huge social problem: people who like to fiddle with cars and similar equipment will surely find jobs in service and maintenance. Similarly, there was no social revolution when grooms and coachmen lost their relevance; they simply switched to other occupations. And as a share of the workforce, there were no fewer of them than there are truckers today.

    By the way, I cannot help recalling here an important maxim of show business: “The Simpsons did it first”. The best animated series ever depicted the influence of unmanned vehicles on truckers… back in 1999, in the Maximum Homerdrive episode (video clip, description in Russian).

    Curiously, by the way, the most popular Russian example of a “mass low-skilled occupation”, watchmen and security guards, is not directly exposed to any threat, because their function is not so much monotonously checking documents (that has long been automated everywhere) as communicating with people whose documents are not in order. And that is still a long way from being automated.

    At first glance, however, it seems that computers are now beginning to gradually replace people in areas that were previously considered a human prerogative. For example, since 2014 Facebook has been able to recognize faces as well as humans do, and computer vision technologies keep improving; they are partly behind the emergence of driverless vehicles. Many modern publications predict social collapse and unemployment of 50% or even higher.

    Is that correct? Where did this sudden surge of interest in automation come from, when, after all, artificial intelligence has been progressing for a very long time? Let us figure it out.

    On the nature of fears

    There is no indispensable man.
    Woodrow Wilson; and Stalin, apparently, never said it.

    The Simpsons, Episode 503, “Them, Robot”

    Many popular articles on the horrors of automation actually go back to the same publication: in 2013, the Oxford researchers Carl Benedikt Frey and Michael A. Osborne published a paper titled The Future of Employment: How susceptible are jobs to computerisation? In it they come to the ambitious conclusion that about 47% of total US employment is at risk. They later conducted a similar study based on data from Great Britain, and the resulting numbers were no less frightening. However, let us try to figure out in detail where these numbers come from.

    As a practicing scientist in the field of machine learning and data analysis, I cannot bypass the actual methodology of Frey and Osborne. It was the following:

    • Frey and Osborne gathered a group of machine learning researchers and labelled 70 assorted occupations, answering for each one a binary question: is this occupation automatable in the near future or not? The labelling was based on a standard classification describing occupations and their related tasks;
    • Then they identified nine variables that describe each occupation in terms of the required dexterity, creativity, and social interaction;
    • They built several classifiers and trained them to predict the “automatability” labels from the first stage based on those nine variables; the best one proved to be a Gaussian process classifier;
    • And finally, they applied the resulting classifier to all 702 occupations, thus obtaining their alarming results (a minimal sketch of this pipeline is shown below).
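
    For concreteness, here is a minimal sketch of this pipeline in Python. Everything in it is synthetic: the features, labels, and threshold are made up for illustration and have nothing to do with the actual Frey and Osborne data; it only shows how a Gaussian process classifier trained on 70 hand-labelled points can be extrapolated to 702.

    # A toy reproduction of the pipeline described above, on entirely synthetic data.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessClassifier
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(0)

    # 70 "hand-labelled" occupations: nine features each plus a subjective 0/1 label
    X_labelled = rng.random((70, 9))                           # dexterity, creativity, social interaction, ...
    y_labelled = (X_labelled.mean(axis=1) < 0.5).astype(int)   # 1 = "automatable" (toy labelling rule)

    # all 702 occupations, to be scored automatically
    X_all = rng.random((702, 9))

    clf = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
    clf.fit(X_labelled, y_labelled)

    # probability of "automatability" for every occupation; note that this merely
    # extrapolates the labellers' subjective judgments to the rest of the list
    p_automatable = clf.predict_proba(X_all)[:, 1]
    print("share at 'high risk' (p > 0.7):", (p_automatable > 0.7).mean())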

    I have nothing against Gaussian processes; it is a very sensible classification method for a case like this, when the sample is very small. However, it is important to understand that the data for this classifier was labelled by humans, and it represents their subjective assumptions about the automatability of particular occupations.

    I am far from applying to the research of Frey and Osborne the main principle of data analysis: “garbage in, garbage out”. However, the popular articles that simply state “Frey and Osborne… studied 702 occupations, using a Gaussian process” and then present the conclusions as a scientific result are clearly being disingenuous. Even if the classifier were ideal (which is hardly the case: the criteria are too few and too coarse), it would not answer the question of whether a given occupation is automatable, but rather a far less impressive one: “Would the people who labelled the initial 70 occupations in the Frey and Osborne sample consider this occupation automatable too?”

    Although Frey and Osborne write that “the fact that we label only 70 of the full 702 occupations… further reduces the risk of subjective bias affecting our analysis”, the reality is that no amount of data analysis can add any “scientific objectivity” here. It is still an arbitrary assumption of the researchers: instead of a judgmental evaluation of occupations they produced a judgmental evaluation of particular properties (attributes) and automatically extended their assumptions from 70 occupations to all 702; there is a certain irony in such automation, isn’t there? They could, by the way, have labelled them all manually; seven hundred is not seven hundred thousand, after all…

    There have been other studies as well, but essentially, for such a futurological question, there is no method other than expert surveys. And the experts, unfortunately, tend to give extremely, if not excessively, optimistic forecasts. This happens despite the fact that in artificial intelligence great promises have been made since the first years of its existence as a science, and they very rarely come true. When Frank Rosenblatt created the first perceptron, the simplest machine learning model, in the late 1950s, the New York Times (not some tabloid!) wrote the following: “Perceptrons will be able to recognize people… and instantly translate speech in one language to speech or writing in another language.” As you can see, the recognition part succeeded only more than half a century later, and “instant translation” has still not quite worked out.

    Such exaggerated expectations have already caused two “winters of artificial intelligence”: first, in the late sixties, it became clear that “instant translation” would not be achieved any time soon, and then the second wave of hype ended similarly in the late 1980s. Now we are living through the third wave of artificial intelligence hype. And I am afraid that if the frightening forecasts keep escalating and the wave of hype turns into a tsunami, we, the researchers in the field of machine learning, will again have to recall the House Stark family motto…

    The current wave, of course, did not come out of nowhere: the achievements of modern machine learning models are really stunning, and they keep coming. But is modern machine learning really ready to completely replace people in mass occupations? This will be discussed in detail in the next part of our series.

    Sergey Nikolenko
    Chief Research Officer, Neuromation

  • AI Risk: Should We Be Worried?


    Recently, discussions about the risk of “strong AI” have finally reached mainstream media. For a very long time, futurists and AI philosophers have been worried about superhuman artificial intelligence and how we could possibly make it safe for humanity to deal with a smarter “opponent”. But now, their positions have finally been heard by trendsetters among both researchers and industry giants: Bill Gates, Stephen Hawking, and Elon Musk have all recently warned against AI dangers. But should we be worried? Let us try to find out…

    What is “strong AI”, anyway?

    When people talk of “strong AI”, they usually define it rather vaguely, as “human-level AI” or “superhuman AI”. But this is not really a definition we can use; it merely raises the question of what “human level” is and how you define it. So what is “strong AI”? Can we at least see the goal before we try to achieve it?

    The history of AI has already seen quite a few examples of “moving the goalposts”. For example, for quite a while the go-to example of a task that certainly requires “true intelligence” was chess playing. René Descartes famously argued that no machine could be intelligent, an argument that actually led him to mind-body dualism. He posited that the “diversity” in a machine is limited by the “diversity” supplied by its designer, which early dualists took to imply that a chess-playing machine could never outplay its designer.

    Yet Deep Blue beat Kasparov in 1997, and humans are absolutely no match for modern chess engines. Perhaps even more significantly, recently AlphaZero, a reinforcement learning system based on deep neural networks, taught itself to play chess by self-play, starting from scratch, with no additional information except the rules of the game; in a few hours AlphaZero exceeded the level of the very best humans, and in a few days beat Stockfish, one of the best specialized chess engines in the world.

    How do we, humans, respond to this? We say that early dualists were wrong and brush chess engines off: of course chess is a problem well suited for computers, it’s so discrete and well-defined! A chess engine is not “true AI” because we clearly understand how chess engines work and know that they are not capable of “general intelligence”, whatever that means.

    What about computer vision, like recognizing other humans? That would require human level intelligence, wouldn’t it? Yet in 2014, Facebook claimed that it achieved human-level performance in face recognition, and this performance has only improved further since then. Our human response to this was to say that, of course, face recognition is not “true AI”, and we fall back on asking computers to pass the Turing test.

    Alan Turing, by the way, was one of the first thinkers to boldly hypothesize that a machine would be able to play chess well. His test of general intelligence is based on understanding human language, arguably a much better candidate for a true test of general intelligence than chess or even face recognition. We are still far from creating a machine that would understand language and generate passable conversation. Yet I have a strong feeling that when a computer program does pass the Turing test, it will not be a program with general human-level intelligence, and all of us will quickly agree that the Turing test falls short of the goal and should not be used as a test for general intelligence.

    To me this progression means that “human-level intelligence” is still a poorly defined concept. But for every specific task we seem to usually be able to achieve human level and often exceed it. The exception right now is natural language processing (including, yes, the Turing test): it seems to rely too intimately on a shared knowledge and understanding of the world around us, which computers cannot easily learn… yet.

    Can we make strong AI, theoretically speaking?

    Emphatically yes! Despite this difficulty with definitions, there are already billions of living proofs that human-level intelligence is possible regardless of how you define it. The proof is in all of us: if we can think with our physical brains, it means that our abilities can be at least replicated in a different physical system. You would have to be a mind-body dualist like Descartes to disagree with this. Moreover, our brains are very efficient, requiring about 20W to run, like a light bulb, so there is no physical constraint against achieving “true intelligence”.

    Even better (or worse, depending on your outlook), we know of no principled reason why we humans cannot be much smarter than we are now. We could try to grow ourselves a larger cerebral cortex if not for two reasons: first, larger brains need a lot of energy that early humans simply would not be able to provide, and second, giving birth to babies with even larger heads would likely be too dangerous to be sustainable. Neither of these reasons applies to AI. So yes, I do believe that it is possible to achieve human-level intelligence and surpass it for AI, even though right now we are not certain what it means exactly.

    On the other hand, I do not see how achieving human-level intelligence will make us “obsolete”. Machines with superhuman strength, agility, speed, or chess playing ability have not made us obsolete; they serve us and improve our lives, in a world that remains human-centric. A computer having superhuman intelligence does not immediately imply that it will have its own agenda, its own drives and desires that might contradict human intentions, in the same way as a bulldozer or a tank does not suddenly decide to go and kill humans even though it physically could. For example, modern reinforcement learning engines can learn to play computer games by looking at the screen… except for one thing: you have to explicitly tell the model what the score is, otherwise it won’t know what to optimize and what to strive for. And how do we avoid accidentally making a superhuman AI with an unconstrained goal to wipe out humanity… well, this is exactly what AI safety is all about.

    Can we make it safe? And when will it hit us?

    Elon Musk recently claimed that we only have a “five to 10 percent chance of success” in making AI safe. I do not know enough to argue with this estimate, but I would certainly argue that Elon Musk also cannot know enough to make estimates like this.

    First, there is an easy and guaranteed way to make AI safe: we should simply stop all AI research and be satisfied with what we have right now. I will bet any money that modern neural networks will not suddenly wake up and decide to overthrow their human overlords — not without some very significant advances that so far can only come from humans.

    This way, however, is all but closed. While we have seen in the past that humanity can agree to restrain itself from using its deadly inventions (we are neither dead nor living in a post-nuclear apocalyptic world, after all), we can hardly stop inventing them. And in the case of a superhuman AI, simply making it for the first time might be enough to release it on the world; the AI itself might take care of that. I strongly recommend the AI-Foom debate where Robin Hanson and Eliezer Yudkowsky argue about the likelihood of exactly this scenario.

    On the other hand, while there is no way to stop people from inventing new AI techniques, it might well turn out that it is no easier to build a strong AI in your garage than a nuclear warhead. If you needed CERN level of international cooperation and funding to build a strong AI, I would feel quite safe, knowing that thousands of researchers have already given plenty of thought to inventing checks and balances to make the resulting AI as safe as possible.

    We cannot know now which alternative is true, of course. But on balance, I remain more optimistic than Elon Musk on this one: I give significant probability to the scenario in which creating strong AI will be slow, gradual, and take a lot of time and resources.

    Besides, I feel that there is a significant margin between creating human-level or even “slightly superhuman” AI and an AI that can independently tweak its own code and achieve singularity by itself without human help. After all, I don’t think I could improve myself much even if I could magically rewire the neurons in my brain — that would take much, much more computing power and intelligence than I have. So I think — better to say, I hope — that there will be a significant gap between strong AI and true singularity.

    However, at present neither I nor Elon Musk has any clue what the future of AI will look like. In 10 years, the trends will look nothing like they do today. It would be like trying to predict in the year 1900 what the future of electricity would look like. Did you know, for example, that in 1900 more than a third of all cars were electric, and that an electric car actually held the speed record that year?…

    So should we be worried?

    Although I do believe that the dangers of singularity and AI safety are real and must be addressed, I do not think that they are truly relevant right now.

    I am not really sure that we can make meaningful progress towards singularity or towards the problem of making AI friendly right now. I feel that we are still lacking the necessary basic understanding and methodology to achieve serious results on strong AI, the AI alignment problem, and other related problems. My gut feeling is that while we can more or less ask the right questions about strong AI, we cannot really hope to produce useful answers right now.

    This is still the realm of philosophy — that is to say, not yet the realm of science. Ancient Greek philosophers could ask questions like “what is the basic structure of nature”, and it seems striking that they did arrive at the idea of elementary particles, but their musings on these elementary particles can hardly inform modern particle physics. I think that we are at the ancient Greek stage of reasoning about strong AI right now.

    On the other hand, while this is my honest opinion, I might be wrong. I sincerely endorse the Future of Humanity Institute, CSER (Centre for the Study of Existential Risk), MIRI (Machine Intelligence Research Institute), and other institutions that try to reason about the singularity and strong AI and try to start working on these problems right now. Just in case there is a chance to make real progress, we should definitely support the people who are passionate about making it.

    To me, the most important danger of the current advancement of AI technologies is that there might be too much hype right now. The history of AI has already seen at least two major hype waves. In the late 1950’s, after Frank Rosenblatt introduced the first perceptron, The New York Times (hardly a sensational tabloid) wrote that “Perceptrons will be able to recognize people… and instantly translate speech in one language to speech or writing in another”. The first AI winter resulted when a large-scale machine translation project sponsored by the U.S. government failed utterly (we understand now that there was absolutely no way machine translation could have worked in the 1960’s), and the government withdrew most of its support for AI projects. The second hype wave came in the 1980’s, with similar promises and very similar results. Ironically, it was also centered around deep neural networks.

    That is why I am not really worried about AI risk but more than a little worried about the current publicity around deep learning and artificial intelligence in general. I feel that the promises that this hype wave is making for us are going to be very hard to deliver on. And if we fail, it may result in another major disillusionment and the third AI winter, which might stifle further progress for decades to come. I hope my fears do not come true, and AI will continue to flourish even after some inevitable slowdowns and minor setbacks. It is my, pardon the pun, deep conviction that this way lies the best bet for a happy future for the whole of humanity, even if this bet is not a guarantee.

    Sergey Nikolenko,
    Chief Research Officer, 
    Neuromation

  • AI: Should We Fear The Singularity?


    Source: https://www.if24.ru/ai-opasna-li-nam-singulyarnost/

    Recently, discussions on artificial intelligence (AI) in popular publications have become increasingly alarmist. Some try to prove that AI will oust 90% of human workers from the market, condemning them to unemployment and misery. Others go even further, asking whether humankind might find in strong artificial intelligence an existential risk that no hydrogen bomb can match. Let us try to find out.

    Supporters of treating AI as an existential risk usually mean the “intelligence explosion” scenario, when a powerful AI acquires a capability to improve itself (for example, by rewriting parts of the code), thereby becoming even “smarter”, which allows for even more radical improvements, and so forth. More details about this can be found in the AI-Foom debate between Robin Hanson and Eliezer Yudkowsky, a very interesting read that discusses this exact scenario. The main danger here is that the goals of the resulting superhuman artificial intelligence may not really align to the goals and original intentions of its human creators. A common example in the field goes as follows: if the original task of a powerful AI was something as innocent as producing paperclips, in a week or two after the “intelligence explosion” the Earth might find itself completely covered by fully automated factories of two kinds: factories producing paperclips and factories for constructing spaceships to bring paperclip manufacturing factories to other planets…

    Such a scenario does sound upsetting. Moreover, it is very difficult to assess in advance how realistic this scenario will prove to be when we actually do develop a strong AI with superhuman abilities. Therefore, it is a good idea to consider it and try to prevent it, so I agree that the work of Nick Bostrom and Eliezer Yudkowsky is far from meaningless.

    However, it is obvious to me, as a practicing machine learning researcher, that this scenario deals with models that simply do not exist yet and will not appear for many, many years. The fact is that, despite the great advances artificial intelligence has made in recent years, “strong AI” still remains very far away. Modern deep neural networks are able to recognize faces as well as humans do, can redraw the landscape of your summerhouse à la Van Gogh, and can teach themselves to play the game of Go better than any human.

    However, this does not mean much yet; consider a couple of illustrative examples.

    1. Modern computer vision systems are still inferior to the visual abilities of a human two-year-old. In particular, computer vision systems usually work with two-dimensional inputs and cannot develop any insight that we live in a three-dimensional world unless explicitly provided supervision about it; so far, this greatly limits their abilities.
    2. This lack of intuitive understanding is even more pronounced in natural language processing. Unfortunately, we are still far away from reliably passing the Turing test. The fact is that human languages rely very much on our insight of the world around us. Let me give another standard example: “A laptop did not fit in the bag because it was too big”. What does the pronoun “it” refer to here? What was too big, the laptop or the bag? Before you say it’s obvious, consider a different example: “A laptop did not fit in the bag because it was too small”… There are plenty of such examples. Basically, to process natural language truly correctly the models have to have intuitive understanding and insight into how the world works — and that’s very far away as well.
    3. In reinforcement learning, the kind of machine learning used, in particular, to train AlphaGo and AlphaZero, we encounter a different kind of difficulty: problems with motivation. For example, in the classic work by Volodymyr Mnih et al., a model based on deep reinforcement learning learned to play various computer games from the 1980s just by “watching the screen”, that is, from the stream of screenshots of the game. It turned out to be quite possible… with one exception: the game score still had to be given to the network separately; humans had to explicitly tell the model that this is the number it is supposed to increase (see the sketch right after this list). Modern neural networks cannot figure out what to do by themselves, they neither strive to expand their capabilities nor crave additional knowledge, and attempts to emulate these human drives are still at a very early stage.
    4. Will neural networks ever overcome these obstacles and learn to generalize heterogeneous information, understand the world around them and strive to learn new things, just like the humans do? It’s quite possible; after all, we humans somehow manage to. However, these problems now appear extremely difficult to resolve, and there is absolutely no chance that modern networks will suddenly “wake up” and decide to overthrow their human overlords.
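
    To make the point about external rewards concrete, here is a minimal toy sketch in Python; it is not the actual Atari setup of Mnih et al., and the environment, actions, and reward rule are invented for illustration. The agent receives raw “pixels”, but the reward is a separate number that the designers have to wire in explicitly.

    import random

    class ToyScreenEnv:
        """A toy 'game': the agent sees raw pixels; the score is supplied separately."""
        def __init__(self, size=8):
            self.size = size
            self.pos = 0

        def reset(self):
            self.pos = 0
            return self._pixels()

        def _pixels(self):
            # the "screen": a flat list of 0/1 pixels; nothing in it says what the goal is
            return [1 if i == self.pos else 0 for i in range(self.size)]

        def step(self, action):                # action: 0 = left, 1 = right
            self.pos = max(0, min(self.size - 1, self.pos + (1 if action == 1 else -1)))
            done = self.pos == self.size - 1
            reward = 1.0 if done else 0.0      # the score: defined by us, not visible in the pixels
            return self._pixels(), reward, done

    env = ToyScreenEnv()
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done = env.step(random.choice([0, 1]))
        total += reward                        # without this explicit signal there is nothing to optimize
    print("episode return:", total)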

    However, I do see a great danger in the recent surge of hype over AI in general and deep neural networks in particular. But this danger, in my opinion, comes not from AI but for AI. History has already seen at least two “AI winters”, when excessive expectations, promises, and overzealous hype led to disappointment. Ironically, both “AI winters” were associated with neural networks. First, the late 1950s saw a (naturally, unsuccessful) attempt to turn Rosenblatt’s perceptron into full-scale machine translation and computer vision systems. Then, in the late 1980s, neural networks, which by that point already looked quite modern, could not be trained well enough due to the lack of data and computing power. In both cases, exaggerated expectations and inevitably crushed hopes resulted in long periods of stagnation in research. Let us hope that with the current, third wave of hype for neural networks, history will decide not to repeat itself, and even if today’s inflated promises do not come true (and it will be difficult to fulfill them), the research will continue anyway…

    Allow me a small postscript: I have recently written a short story which is extremely relevant to the topic of strong AI and related dangers. Try to read it — I really hope you like it.

    Sergey Nikolenko
    Chief Research Officer, Neuromation

  • New Advances in Generative Adversarial Networks, or a Comment on Karras et al. (2017)


    A very recent paper by NVIDIA researchers has stirred up the field of deep learning a little. Generative adversarial networks, which we will talk about below, have already been successfully used in a number of important problems, and image generation has always been at the forefront of these applications. However, the work by Karras et al. presents a fresh take on the old idea of generating an image step by step, gradually enhancing it (for example, increasing its resolution) along the way. To explain what is going on here, I will have to step back a little first.

    Generative adversarial networks (GANs) are a class of neural networks that aim to learn to generate objects from a certain class, e.g., images of human faces or bedroom interiors (a popular choice for GAN papers due to a commonly used part of the standard LSUN scene understanding dataset). To perform generation, GANs employ a very interesting and rather commonsense idea. They have two parts that are in competition with each other:

    • the generator aims to, well, generate new objects that are supposed to pass for “true” data points;
    • the discriminator aims to distinguish between real data points and the ones produced by the generator.

    In other words, the discriminator learns to spot the generator’s counterfeit images, while the generator learns to fool the discriminator. I refer to, e.g., this post for a simple and fun introduction to GANs.
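
    For readers who want to see this adversarial game in code, here is a minimal, self-contained PyTorch sketch; it is an illustrative toy that learns a one-dimensional distribution, not the NVIDIA model or any architecture from the papers discussed here.

    import torch
    import torch.nn as nn

    # Toy GAN: learn to generate samples from N(4, 1.5) starting from uniform noise.
    real_sampler = lambda n: 4.0 + 1.5 * torch.randn(n, 1)
    noise = lambda n: torch.rand(n, 8)

    G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))                  # generator
    D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())    # discriminator

    bce = nn.BCELoss()
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

    for step in range(2000):
        # 1) train the discriminator: real samples -> 1, generated samples -> 0
        real, fake = real_sampler(64), G(noise(64)).detach()
        loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # 2) train the generator: try to make the discriminator say 1 on fakes
        fake = G(noise(64))
        loss_g = bce(D(fake), torch.ones(64, 1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    samples = G(noise(1000))
    print("generated mean/std:", samples.mean().item(), samples.std().item())   # should approach 4 and 1.5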

    We at Neuromation are following GAN research with great interest due to many possible exciting applications. For example, conditional GANs have been used for image transformations with the explicit purpose of enhancing images; see, e.g., image de-raining recently implemented with GANs in this work. This ties in perfectly with our own ideas of using synthetic data for computer vision: with a proper conditional GAN for image enhancement, we might be able to improve synthetic (3D-rendered) images and make them more like real photos, especially in small details. We are already working on preliminary experiments in this direction.

    This work by NVIDIA presents a natural idea: grow a large-scale GAN progressively. The authors begin with a small network able to produce only, e.g., 4×4 images, train it until it works well (on viciously downsampled data, of course), then add another set of layers to both the generator and the discriminator, moving from 4×4 to 8×8, train the new layers, and so on. In this way, they were able to “grow” a GAN that generates very convincing 1024×1024 images, of much better quality than before.
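
    Conceptually, growing the generator might look like the sketch below. This is only a schematic illustration of the idea, not the actual NVIDIA architecture: it glosses over the smooth “fade-in” of new layers, the matching growth of the discriminator, and all the other details of the paper.

    import torch
    import torch.nn as nn

    latent_dim, channels = 64, 16

    def base_generator():
        # latent vector -> 4x4 feature map (the starting resolution)
        return nn.Sequential(
            nn.Unflatten(1, (latent_dim, 1, 1)),
            nn.ConvTranspose2d(latent_dim, channels, kernel_size=4),   # 1x1 -> 4x4
            nn.ReLU(),
        )

    def grow():
        # one growth step: 2x upsampling followed by a convolution
        return nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    to_rgb = nn.Conv2d(channels, 3, kernel_size=1)    # projection from feature maps to an image

    G = base_generator()
    z = torch.randn(2, latent_dim)
    print(4, to_rgb(G(z)).shape)                      # torch.Size([2, 3, 4, 4])
    for resolution in (8, 16, 32):
        # ... train G (together with a matching discriminator) at the previous resolution here ...
        G = nn.Sequential(G, grow())                  # add new layers on top of the already trained ones
        print(resolution, to_rgb(G(z)).shape)         # 8x8, then 16x16, then 32x32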

    The idea of progressively improving generation in GANs is not completely novel; for example,

    • Chen & Koltun present a cascaded refinement approach that aims to bring small generated images up to megapixel size step by step;
    • the well-known StackGAN model by Zhang et al. constructs an intermediate low-dimensional representation and then improves upon it in another GAN;
    • and the idea can be traced as far back as 2015, soon after the introduction of GANs themselves, when Denton et al. proposed a pyramid scheme for coarse-to-fine generation.

    However, all previous approaches made their progressive improvements separately: the next level of progressive improvement simply took the result of the previous layers (plus possibly some noise). In Karras et al., the same idea is executed in a way reminiscent of unsupervised pretraining: they train a few layers, then add a few more, and so on. It appears that this execution is among the most straightforward and fastest to train, but at the same time among the best in terms of results. See for yourself:

    Naturally, we are very excited about this advance: it brings image generation, at first restricted to small pictures (from 32×32 to 256×256 pixels), ever closer to sizes suitable for practical use. In my personal opinion, GANs (specifically conditional GANs) may be exactly the architecture we need to make synthetic data in computer vision indistinguishable from real data.

    Sergey Nikolenko
    Chief Research Officer, Neuromation

  • Deep Q-Network


    In 2013, Mnih et al. published a paper where one of the standard methods of reinforcement learning, combined with deep neural networks, is used to play Atari games. TD-learning (temporal difference learning) is commonly used in settings where the reward represents the outcome of a relatively long sequence of actions, and the problem is to redistribute this single reward among the moves and/or states leading to it. For instance, a game of Go can last a few hundred moves, but the model will only get cheese for winning or an electric shock for losing at the very end, when the players reach the final outcome. Which of the hundred moves were good and which were bad? That is still a big question, even when the outcome is known. It is quite possible that you were heading for defeat in the middle of the game, but then your opponent blundered, and you wound up winning. Also, trying to artificially introduce intermediate goals, like winning material, is a universally bad idea in reinforcement learning: we have ample evidence that a smart opponent can take advantage of the inevitable “near-sightedness” of such a system.

    The main idea of TD-learning is to re-use later states, which are close to the reward, as targets for training the evaluations of earlier states. We can start with random (probably completely ludicrous) evaluations of positions and then, after each game, perform the following process. We are absolutely sure about the final result: say, we won, and the result is +1 (hooray!). We push our evaluation of the penultimate position towards +1, the evaluation of the third-to-last position towards the (just updated) evaluation of the penultimate one, and so on. Eventually, if you train long enough, you get good evaluations for every position (state).
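
    Here is a minimal sketch of this backward-pushing update: a plain tabular TD(0) pass over one finished game. The state names, the learning rate, and the toy “game” below are made up for illustration.

    from collections import defaultdict

    V = defaultdict(float)      # position -> current value estimate, 0 by default
    alpha = 0.1                 # learning rate

    def td_update(positions, result):
        """positions: the sequence of states in one game; result: +1 for a win, -1 for a loss."""
        for t in reversed(range(len(positions))):
            # the last position is pushed towards the actual result, every earlier position
            # towards the (just updated) value of the position that followed it
            target = result if t == len(positions) - 1 else V[positions[t + 1]]
            V[positions[t]] += alpha * (target - V[positions[t]])

    # toy example: the same three-move "game" won a hundred times
    for _ in range(100):
        td_update(["start", "middle", "almost_won"], result=+1.0)
    print({s: round(v, 2) for s, v in V.items()})     # all values creep towards +1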

    This method was first successfully used in TD-Gammon, the computer backgammon program. Backgammon wound up being easy enough for the computer to master because it is a game played with dice. Since the dice can fall in every which way, it was not hard to get training games and an opponent that would let the program explore the space of possible games: you simply have the computer play against itself, and the inherent randomness of backgammon enables the program to explore the vast space of possible game states.

    TD-Gammon was developed roughly 30 years ago; however, even back then, a neural network served as its foundation. A position from a game was the input, and the network predicted an evaluation of the position, that is, the odds of winning. The computer-versus-computer games produced new training examples for the network, and the network kept learning and playing against itself (or slightly earlier versions of itself).

    TD-Gammon learned to defeat humans back in the late eighties, but this was attributed to the specific nature of the game — namely, the use of dice. But by now we understand that deep learning can help computers win in numerous other games, too, like Atari games mentioned earlier. The key difference of Mnih et al.’s paper from backgammon or chess was that they did not teach a model the rules of an Atari game. All the computer knew was what the image on the screen — the same one players would see — looked like. The only other input was the current score, which needed to be externally defined, otherwise it was unclear what the objective was. The computer could perform one of the possible actions on the joystick — turning the joystick and/or pushing a button.

    The machine spent roughly 200 tries on figuring out the objective of the game, another 400 to acquire skills, and then the computer started winning after about 600 games.

    Q-learning is used here, too. In the same way, we try to build a model that approximates the Q-function, but now this model is a deep convolutional network. This approach proved to work very well. In 29 games, including wildly popular ones like Space Invaders, Pong, Boxing, and Breakout, the system wound up being better than humans. Now the DeepMind team responsible for this design is focusing on games from the 1990s (probably Doom will be their first project). There is no doubt that they will beat these games in the near future and keep moving forward, to the latest releases.
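
    As an illustration, here is a simplified sketch of a convolutional Q-network and one Q-learning update in PyTorch. It is not the exact architecture or training procedure of Mnih et al.: the replay buffer, the target network, and epsilon-greedy exploration are all omitted, and the transition batch is fake.

    import torch
    import torch.nn as nn

    n_actions, gamma = 4, 0.99

    # Q-network: a stack of game frames in, one Q-value per joystick action out
    q_net = nn.Sequential(
        nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),    # 4 stacked 84x84 grayscale frames
        nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
        nn.Flatten(),
        nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
        nn.Linear(256, n_actions),
    )
    optimizer = torch.optim.RMSprop(q_net.parameters(), lr=2.5e-4)

    # one (fake) batch of transitions: state, action, reward, next state, "episode ended" flag
    s = torch.randn(32, 4, 84, 84)
    a = torch.randint(0, n_actions, (32,))
    r = torch.randn(32)
    s2 = torch.randn(32, 4, 84, 84)
    done = torch.zeros(32)

    # Q-learning target: r + gamma * max_a' Q(s', a'), with no bootstrapping after terminal states
    with torch.no_grad():
        target = r + gamma * (1 - done) * q_net(s2).max(dim=1).values
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)    # Q(s, a) for the actions actually taken
    loss = nn.functional.smooth_l1_loss(q_sa, target)

    optimizer.zero_grad(); loss.backward(); optimizer.step()
    print("TD loss:", loss.item())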

    Another interesting example of how the Deep Q-Network is used is paraphrasing. You are given a sentence, and you want to write it in a different way while expressing the same meaning. This task is a bit artificial, but it is very closely linked to text generation in general. In a recently proposed approach, the model contains an LSTM-RNN (Long Short-Term Memory Recurrent Neural Network) that serves as an encoder, condensing the text into a vector. Then this condensed version is “unfolded” into a sentence by a decoder. Since the sentence is decoded from its condensed form, it will most likely come out different. This design is called the encoder-decoder architecture (a bare-bones skeleton of it is sketched below). Machine translation works in a similar way: we condense the text in one language and then unfold it, using roughly the same kind of models but in a different language, assuming that the encoded version captures the meaning. A Deep Q-Network can iteratively generate candidate sentences from the hidden representation, trying various kinds of decoding to move the final sentence closer to the initial one over time. The model’s behavior is rather intelligent: in the experiments, the DQN first fixes the parts that have already been rephrased well, and then moves on to more complex parts where the quality has so far been worse. In other words, the DQN supplants the decoder in this architecture.
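
    For reference, here is a bare-bones encoder-decoder (seq2seq) skeleton in PyTorch. It is a generic sketch of the architecture described above, not the model from the paraphrasing paper; the vocabulary size, dimensions, and teacher-forcing setup are arbitrary choices for illustration.

    import torch
    import torch.nn as nn

    vocab_size, emb_dim, hidden_dim = 1000, 64, 128

    class Seq2Seq(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
            self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, src_tokens, tgt_tokens):
            # encoder: condense the source sentence into a fixed-size hidden state
            _, state = self.encoder(self.embed(src_tokens))
            # decoder: unfold that condensed state into another sentence
            # (teacher forcing: target tokens are fed in during training)
            dec_out, _ = self.decoder(self.embed(tgt_tokens), state)
            return self.out(dec_out)              # scores over the vocabulary at every step

    model = Seq2Seq()
    src = torch.randint(0, vocab_size, (2, 7))    # a batch of 2 source sentences, 7 tokens each
    tgt = torch.randint(0, vocab_size, (2, 9))
    print(model(src, tgt).shape)                  # torch.Size([2, 9, 1000])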

    What’s next?

    Contemporary neural networks are getting smarter with each day. The deep learning revolution occurred in 2005–2006, and since then, interest in this topic has only continued to grow. New research is published every month, if not every week, and new interesting applications of deep learning networks are cropping up. In this article, which we hope has been sufficiently accessible, we have tried to explain how this deep learning revolution fits into modern history and development of neural networks, and we went into more detail about reinforcement learning and how deep learning networks can learn to interact with their environment.

    Multiple examples have shown that now, when deep learning is undergoing explosive growth, it’s quite possible to create something new and exciting that will solve real tasks without huge investments. All you need is a modern video card, enthusiasm, and the desire to try new things. Who knows, maybe you’ll be the one to make history during this ongoing revolution — at any rate, it’s worth a try.

    Sergey Nikolenko
    Chief Research Officer, Neuromation

  • Who Will Be Replaced by Robots I, or “Man! That has a proud sound!”


    [translated from the Russian version by Andrey V. Polyakov]

    Recently, a new round of conversations about self-moving carriages and other potential achievements of artificial intelligence has again posed one of the classic questions of humanity: who will be marginalized by the monorail track of the next technological revolution? Some studies argue that in the near future artificial intelligence will lead to a surge of unemployment comparable to the Great Depression. Today we will also talk about who can be replaced by computers in the near future, and who can, without fear and even with some degree of self-satisfaction, expect the arrival of our silicon overlords.

    Luddites: senseless English riot or reasonable economic behavior?

    Some people believe labor-saving technological change is bad for the workers because it throws them out of work. This is the Luddite fallacy, one of the silliest ideas to ever come along in the long tradition of silly ideas in economics.
    William Easterly. The Elusive Quest for Growth: Economists’ Adventures and Misadventures in the Tropics

    Such a text could hardly do without recalling the most famous opponents of technological progress: the Luddites. Sentiment against technological progress was strong among English textile workers as far back as the 18th century, but Nottingham manufacturers did not receive letters on behalf of Ned Ludd until 1811. The trigger for moving to active resistance was the introduction of stocking machines, which made the skills of skilled weavers unnecessary: now stockings could be sewn from separate pieces without special skills. The resulting product was, by the way, much worse, with stockings quickly bursting at the seams, but they were so much cheaper that they still enjoyed immense popularity. Fearing the loss of their work, the weavers began attacking factories and smashing the newfangled machines.

    The Luddites were well organized. They understood that secrecy was vital for them; they swore terrible oaths of loyalty, acted under the cover of night, and were rarely apprehended. By the way, General Ludd most probably never existed: he was a “larger than life” folklore figure like Paul Bunyan. The fight against the Luddites was taken seriously: in 1812, “machine breaking” was made a capital crime, and there were more British soldiers suppressing the uprisings than fighting Napoleon in those same years! Still, the Luddites managed to significantly slow automation in textile production, the prices for products increased, and the goals of the movement were partially achieved. But honestly, when did you last wear hand-woven stockings, socks, or pantyhose?…

    We placed a quote from the book by the economist William Easterly as the epigraph to this section. Easterly explains that the Luddite movement spawned (no longer among English weavers, but in quite intellectual circles) an erroneous idea, which he calls the Luddite fallacy and which still occurs, from time to time, to quite real economists. The idea is that the development of automation must inevitably lead to a reduction in employment, because fewer people are needed to maintain the same level of production. However, both theory and practice show that quite another outcome is no less likely: the same, or even a larger, number of people will simply produce more goods! And progress is usually on the side of the workers, increasing their productivity and, consequently, their income. True, this last consequence is not always so obvious, but there is no other apparent way to increase the welfare of each individual worker.

    Nevertheless, these arguments do not refute the Luddites themselves. Regardless of the rhetoric, the Luddites were afraid not of a bright future in which every stockinger could make a hundred pairs of stockings a day, but of the immediate tomorrow, when they would be out on the street and no one would need the only skill they possessed. Moreover, even Easterly does not deny that progress can lead to unemployment and declining prosperity for particular workers, even if the average well-being of the people grows. Let us take a closer look at this argument.

    Whom did the robots replace recently

    …we set up this room with girls in it. Each one had a Marchant: one was the multiplier, another was the adder. This one cubed — all she did was cube a number on an index card and send it to the next girl… The speed at which we were able to do it was a hell of a lot faster… We got speed… that was the predicted speed for the IBM machine. The only difference is that the IBM machines didn’t get tired and could work three shifts. But the girls got tired after a while.
    Richard Feynman. Surely You’re Joking, Mr. Feynman!

    The economist James Bessen conducted an interesting study. He took the list of 270 major occupations used in the 1950 U.S. Census and checked how many of them have been completely automated since. It turned out that, out of 270 occupations:

    • 232 are still active;
    • 32 were eliminated due to declining demand and changes in market structure (Bessen mentions, e.g., boardinghouse keepers; then again, what would you call the people renting out their places on Airbnb…);
    • five became obsolete due to technological advances but were never automated (for example, telegraph operator);
    • and only one occupation was really fully automated.

    Those willing to guess have until the next paragraph.

    The only fully automated occupation in Bessen’s study is elevator operator. At this point, our 80+ readers who grew up in the U.S. can certainly sigh and complain that earlier, warm colored boys closed the double elevator doors behind you, doors that could not close themselves, and pressed the warm lamp buttons… but was it really so good for you, and for the colored boys themselves? Maybe they should have gone to school after all?…

    We can add another interesting career path to Bessen’s list. Computers have long since completely replaced the occupation of… computer. Oddly enough, not everyone knows that this occupation existed, although it seems obvious that before the advent of electronic computers calculations had to be done by hand. Many great mathematicians were also outstanding computists: for example, Leonhard Euler could carry out complex calculations in his head and often amused himself with some complicated exercise when his wife managed to drag him to the theater. But how were the calculations done, for example, in the Manhattan Project, where no physicist could have managed them alone, “on paper”?

    Well, they really did do them manually. The main idea is described in the epigraph: when people perform operations sequentially, as if on a conveyor belt, the work goes much faster. Of course, humans are prone to error, and the results of the calculations had to be rechecked, but special algorithms can be developed for that too. In the 19th century, human computers compiled mathematical tables (for example, of sines or logarithms), and in the twentieth century they worked for military needs, including the Manhattan Project. It is the computists, by the way, who are responsible for the fact that programming was at first considered a female occupation: the computists were mostly girls (for purely sexist reasons: it was believed that women are better suited for monotonous tasks), and the first programmers were often recruited from among them.

    In his book When Computers Were Human, David Alan Grier describes the atmosphere of the work of living calculators, citing Dickens (another unexpected affinity to real Luddites): “A stern room, with a deadly statistical clock in it, which measured every second with a beat like a rap upon a coffin lid.” Should we be nostalgic for this patriarchal, but by no means bucolic picture?

    Bessen distinguishes between complete and partial automation. Indeed, a hereditary elevator operator could have a hard time in a brave new world (there is only one problem: hereditary elevator operators apparently never had time to emerge). However, if only part of what you are doing is automated, the demand for your occupation can even increase. And again we can return to the Luddites: during the 19th century, 98% of the labor required to weave a yard of cloth was automated. Theoretically, one could dismiss forty-nine weavers out of fifty and produce the same amount of cloth. However, the actual effect was the opposite: the demand for cheap cloth grew dramatically, and with it the demand for weavers, so the number of jobs increased significantly. And in the 1990s, the widespread deployment of ATMs did not reduce but actually increased the demand for bank tellers: ATMs allowed banks to operate branch offices at lower cost, which prompted them to open many more branches.

    Despite numerous technological innovations, the complete automation of occupations over the last 50–100 years has not had a noticeable effect on society. Do you know anyone whose grandfather was an elevator operator and grandmother a computist, who lost their jobs and sank to the bottom of society because of the goddamned automation?

    But partial automation over these years has radically changed the content of work for the vast majority of us. Computers and especially the Internet have made many professions many times more productive (can you imagine how long it would take me to collect the data for this article in the 1950s?). And the globalization and automation of the economy have tangibly raised the standard of living, for everyone, not just the elite. It would be trivial to say that we now live better than medieval kings did, but try comparing our standard of living with the way our grandparents lived. I will not give any examples or statistics; every reader has their own experience: just recall life before such trifles as mobile phones, Google, microwave ovens, and dishwashers (and even washing machines were, until recently, not in every household)…

    Therefore, it seems that progress and automation have so far only helped people, and the Luddites’ fears were greatly exaggerated. But maybe “the fifth industrial revolution” will be absolutely different? In the second part of the article we will try to speculate on this topic.

    “The threat of automation”

    How far has gone “progrès”? Laboring is in recess,
    And mechanical, for sure, will the mental one replace.
    Troubles are forgotten, no need for quest,
    Toiling are the robots, humans take a rest.


    Yuri Entin. From the movie Adventures of Electronic

    Elevator operators, computists, and weavers perfectly symbolize the activities that have been automated so far: technical work, where the output is expected to match certain parameters as closely as possible, and creativity is not only difficult but forbidden and, in fact, harmful. Of course, an elevator operator could smile at his passengers, or a computer girl could, scorning potential damage to her reputation, get acquainted with Richard Feynman himself. But their function was to accurately perform clear, algorithmically defined actions.

    I will allow myself a little emotion: these are exactly the kinds of activities that must be automated further! There is nothing human in following a fixed algorithm. Monotonous work with a predetermined input and output is always an extreme measure, a forced oppression of the human spirit in order to achieve some practical goal. And if the goal can be achieved without wasting human time, that is the way to proceed.

    However, it is now coming to the point where computers are starting to gradually replace people in areas that were previously considered a human domain. For example, since 2014 Facebook has been able to recognize faces at a human level, and computer vision technologies keep improving.

    Sergey Nikolenko
    Chief Research Officer, Neuromation

  • Convolutional Networks


    And now let us turn to convolutional networks. In 1998, the French computer scientist Yann LeCun presented the architecture of a convolutional neural network (CNN).

    The network is named after the mathematical operation of convolution, which is often used for image processing and can be expressed by the following formula:

        \[(f\ast g)[m, n] = \sum_{k, l} f[m-k, n-l]\cdot g[k, l],\]

    where f is the original matrix of the image, and g is the convolution kernel (convolution matrix).
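
    For readers who prefer code to formulas, here is a direct (and deliberately unoptimized) implementation of this convolution in Python with NumPy; it treats everything outside the image as zero and keeps the output the same size as f.

    import numpy as np

    def convolve2d(f, g):
        """Direct implementation of (f*g)[m, n] = sum_{k, l} f[m-k, n-l] * g[k, l]."""
        M, N = f.shape
        K, L = g.shape
        out = np.zeros((M, N))
        for m in range(M):
            for n in range(N):
                s = 0.0
                for k in range(K):
                    for l in range(L):
                        if 0 <= m - k < M and 0 <= n - l < N:   # pixels outside the image count as zero
                            s += f[m - k, n - l] * g[k, l]
                out[m, n] = s
        return out

    image = np.arange(25, dtype=float).reshape(5, 5)
    kernel = np.array([[0.0, 1.0], [1.0, 0.0]])      # a tiny toy convolution kernel
    print(convolve2d(image, kernel))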

    The basic assumption is that the input is not a discrete set of independent dimensions but rather an actual image, where the relative placement of the pixels is crucial: certain pixels are positioned close to one another, while others are far apart. In a convolutional neural network, a second-layer neuron is linked to a group of first-layer neurons that are located close together, not to all of them. These neurons then gradually learn to recognize local features. Second-layer neurons designed in the same way respond to local combinations of local first-layer features, and so on. A convolutional network almost always consists of multiple layers, up to about a dozen in early CNNs and up to hundreds and even thousands now.

    Each layer of a convolutional network consists of three operations:

    • the convolution, which we have just described above,
    • a non-linearity, such as a sigmoid function or a hyperbolic tangent,
    • and pooling (subsampling).

    Pooling, also known as subsampling, applies a simple mathematical function (mean, max, min…) to a local group of neurons. In most cases, it is more important for higher-layer neurons to check whether a certain feature is present in an area than to remember its precise coordinates. The most popular form of pooling is max-pooling: the higher-level neuron activates if at least one neuron in its corresponding window has activated. Among other things, this approach helps make the convolutional network resistant to small changes in the input.
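
    Here is an equally small sketch of max-pooling over non-overlapping 2×2 windows; the feature map values are again invented for illustration.

        import numpy as np

        def max_pool(x, size=2, stride=2):
            """Max-pooling: each output cell is the maximum over a size x size window.

            A toy sketch; real frameworks also handle padding, batches and channels.
            """
            out_h = (x.shape[0] - size) // stride + 1
            out_w = (x.shape[1] - size) // stride + 1
            out = np.empty((out_h, out_w))
            for i in range(out_h):
                for j in range(out_w):
                    window = x[i * stride:i * stride + size,
                               j * stride:j * stride + size]
                    out[i, j] = window.max()
            return out

        feature_map = np.array([[0.1, 0.9, 0.2, 0.0],
                                [0.3, 0.4, 0.8, 0.1],
                                [0.0, 0.2, 0.5, 0.7],
                                [0.6, 0.1, 0.0, 0.3]])
        print(max_pool(feature_map))   # 2x2 map of window maxima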

    Numerous modern computer vision applications run on convolutional networks. For instance, Prisma, the app you’ve probably heard about, runs on convolutional neural networks. Practically all modern computer vision apps use convolutional networks for recognizing objects in images. For example, CNN-based scene labeling solutions, where an image from a camera is automatically divided into zones classified as known objects like “pavement”, “car”, or “tree”, underlie driver assistance systems. Actually, the drivers aren’t always necessary: the same kind of networks are now used for creating self-driving cars.
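
    To make the structure above concrete, here is a minimal sketch of stacking convolution, non-linearity, and pooling into a small image classifier; PyTorch is assumed, and all sizes and the number of classes are illustrative rather than taken from any system mentioned above.

        import torch
        import torch.nn as nn

        # The layer pattern described above (convolution, non-linearity, pooling),
        # stacked twice and followed by a simple classifier head.
        model = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),          # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),          # 16x16 -> 8x8
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 10),            # e.g. 10 object classes
        )

        dummy_batch = torch.randn(4, 3, 32, 32)   # 4 RGB images, 32x32 pixels each
        print(model(dummy_batch).shape)           # torch.Size([4, 10])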

    Reinforcement Learning

    Generally, machine learning tasks are divided into two types: supervised learning, when the correct answers are already given and the machine learns from them, and unsupervised learning, when the questions are given but the answers aren’t. Things look different in real life. How does a child learn? When she walks into a table and hits her head, a signal saying, “Table means pain,” goes to her brain. The child won’t smack her head on the table the next time (well, maybe after a couple more tries). In other words, the child actively explores her environment without having received any correct answers. The brain doesn’t have any prior knowledge about the table causing pain. Moreover, the child won’t associate the table itself with pain (to do that, you generally need careful engineering of someone’s neural networks, like in A Clockwork Orange) but rather the specific action undertaken in relation to the table. In time, she’ll generalize this knowledge to a broader class of objects, such as big hard objects with corners.

    Experimenting, receiving results, and learning from them: that is what reinforcement learning is about. Agents interact with their environment and perform certain actions, the environment rewards or punishes these actions, and the agents keep acting, guided by this feedback. In other words, the objective function takes the form of a reward. At every step, an agent in some state S selects some action A from the available set of actions, and then the environment informs the agent which reward it has received and which new state S’ it has reached.
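
    Schematically, this interaction loop can be sketched as follows; the states, actions, and rewards below are a made-up toy example, not a real environment.

        import random

        # A schematic agent-environment loop: at each step the agent, in state s,
        # picks an action a, and the environment returns a reward and a new state.
        states = ["start", "table", "candy"]
        actions = ["left", "right"]

        def environment_step(state, action):
            if state == "start" and action == "right":
                return "candy", +1.0        # reward for finding candy
            if state == "start" and action == "left":
                return "table", -1.0        # bumping into the table hurts
            return "start", 0.0             # otherwise, back to the start

        state = "start"
        for step in range(5):
            action = random.choice(actions)            # a purely random policy, for now
            next_state, reward = environment_step(state, action)
            print(step, state, action, reward, next_state)
            state = next_state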

    One of the challenges of reinforcement learning is making sure that you don’t accidentally learn spurious connections: sometimes we erroneously link our environment’s response to whatever action immediately preceded it. This is a well-known bug in our brain, which diligently looks for patterns where they may not exist. The renowned American psychologist Burrhus Skinner (one of the fathers of behaviorism; he was the one to invent the Skinner box for abusing lab animals) ran an experiment on pigeons. He put a pigeon in a cage and poured food into the cage at perfectly regular (!) intervals that did not depend on anything. Eventually, the pigeon decided that its receiving food depended on its actions. For instance, if the pigeon flapped its wings right before being fed then, subsequently, it would try to get food by flapping its wings again. This effect was later dubbed “pigeon superstition”. A similar mechanism probably fuels human superstition, too.

    The aforementioned problem reflects the so-called exploitation vs. exploration dilemma. On the one hand, you have to explore new opportunities and study your environment to find something interesting. On the other hand, at some point you may decide that “I have already explored the table and understood it causes pain, while candy tastes good; now I can keep walking along and getting candy, without trying to sniff out something lying on the table that may taste even better.”

    There’s a very simple — which doesn’t make it any less important — example of reinforcement learning called multi-armed bandits. The metaphor goes as follows: an agent sits in a room with a few slot machines. The agent can drop a coin into the machine, pull the lever, and then win some money. Each slot machine provides a random reward from a probability distribution specific to that machine, and the optimal strategy is very simple — you have to pull the lever of the machine with the highest return (reward expectation) all the time. The problem is that the agent doesn’t know which machine has which distribution, and her task is to choose the best machine, or, at least, a “good enough” machine, as quickly as possible. Clearly, if a few machines have roughly the same reward expectation then it’s hard and probably unnecessary to differentiate between them. In this problem, the environment always stays the same, although in certain real-life situations, the probability of receiving a reward from a particular machine may change over time; however, for our purposes, this won’t happen, and the point is to find the optimal strategy for choosing a lever.

    Obviously, it is a bad idea to always pull the lever that is currently best in terms of average returns: if we get lucky at the very beginning with a machine that paid well but is not optimal on average, we will never move on to another one. Meanwhile, the truly optimal machine may not yield the largest rewards in the first few tries, and then we would only return to it much, much later.

    Good strategies for multi-armed bandits are based on different ways of maintaining optimism under uncertainty. This means that if we have a great deal of uncertainty regarding the machine then we should interpret this positively and keep exploring, while maintaining the right to check our knowledge of the levers that seem least optimal.

    The cost of training, also called regret, often serves as the objective function in this problem. It shows how much smaller the expected reward of your algorithm is than the expected reward of the optimal strategy, the one that simply knows a priori, by some divine intervention, which lever is optimal. For some very simple strategies, you can prove that they optimize the regret among all available strategies (up to constant factors). One of these strategies is called UCB-1 (Upper Confidence Bound), and it looks like this:

    Pull lever j that has the maximum value of

        \[{\bar x}_j + \sqrt{\frac{2\ln n}{n_j}},\]

    where {\bar x}_j is the average reward from lever j, n is the total number of times we have pulled the levers (all of them together), and n_j is the number of times we have pulled lever j.

    Simply put, we always pull the lever with the highest priority, where the priority is the average reward from this lever plus an additive term that grows as the game goes on and shrinks every time we pull this particular lever; this lets us periodically return to every lever and check whether we have missed anything.
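
    Here is a minimal sketch of UCB-1 for slot machines with made-up Bernoulli reward probabilities; for simplicity it uses the current step number in place of the exact total pull count, which changes nothing essential.

        import math
        import random

        def ucb1(reward_means, n_steps=10000, seed=0):
            """A minimal UCB-1 sketch for Bernoulli bandits with the given means."""
            rng = random.Random(seed)
            n_arms = len(reward_means)
            counts = [0] * n_arms        # n_j: how many times lever j was pulled
            sums = [0.0] * n_arms        # total reward collected from lever j

            for t in range(1, n_steps + 1):
                if t <= n_arms:
                    j = t - 1            # pull every lever once to initialize
                else:
                    # priority = average reward + exploration bonus
                    j = max(range(n_arms),
                            key=lambda a: sums[a] / counts[a]
                            + math.sqrt(2 * math.log(t) / counts[a]))
                reward = 1.0 if rng.random() < reward_means[j] else 0.0
                counts[j] += 1
                sums[j] += reward
            return counts

        # Three hypothetical slot machines; the last one is the best.
        print(ucb1([0.2, 0.5, 0.6]))   # most pulls should go to the 0.6 arm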

    Despite the fact that the original multi-armed bandit problem does not involve any transitions between different states, Monte Carlo tree search algorithms, which were instrumental in AlphaGo’s historic victory, are based directly on UCB-1.

    Now let’s return to reinforcement learning with several different states. There’s an agent, an environment, and the environment rewards the agent every step of the way. The agent, like a mouse in a maze, wants to get as much cheese and as few electric shocks as possible. Unlike the multi-armed bandit problem, the expected reward now depends on the current state of the environment, not only on your currently selected course of action. In an environment with several states, the strategy that yields maximum profit “here and now” won’t always be optimal, since it may generate less optimal states in the future. Therefore, we seek to maximize total profit over time instead of looking for an optimal action in our current state (more precisely, of course, we still look for the optimal action but now optimality is measured in a different way).

    We can assess each state of the environment in terms of total profit, too: we can introduce a value function that predicts the total reward the agent will collect starting from a particular state. The value function for a state can look something like this:

        \[V(x_t) \leftarrow \mathbb{E}\left[\sum_{k=0}^\infty\gamma^k\cdot r_{t+k}\right],\]

    where r_t is the reward received upon making the transition from state x_t to state x_{t+1}, and \gamma is the discount factor, 0 \le \gamma \le 1.
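
    As a quick worked illustration with made-up numbers: if the agent receives a reward of 1 on every step and \gamma = 0.9, the value of the state is a geometric series,

        \[V = \sum_{k=0}^\infty 0.9^k\cdot 1 = \frac{1}{1-0.9} = 10,\]

    so the discount factor keeps the total value finite and makes near-term rewards weigh more than distant ones.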

    Another possible value function is the Q-function, which accounts for actions as well as states. It’s a “more detailed” version of the regular value function: the Q-function assesses the expected reward given that the agent undertakes a particular action in her current state. The point of reinforcement learning algorithms often comes down to having the agent learn a utility function Q from the rewards received from the environment; this function then lets her take her previous interactions with the environment into account instead of choosing a behavioral strategy at random.
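
    For concreteness, here is a minimal tabular Q-learning sketch; Q-learning is one standard way to learn such a Q-function (not necessarily the method implied above), and the five-state “corridor” environment with a reward at the far end is invented purely for illustration.

        import random

        n_states = 5
        actions = [-1, +1]                 # move left or right along the corridor
        alpha, gamma, epsilon = 0.1, 0.9, 0.3
        Q = {(s, a): 0.0 for s in range(n_states) for a in actions}

        def step(s, a):
            s_next = min(max(s + a, 0), n_states - 1)
            reward = 1.0 if s_next == n_states - 1 else 0.0   # candy at the far end
            return s_next, reward

        for episode in range(500):
            s = 0
            while s != n_states - 1:
                if random.random() < epsilon:
                    a = random.choice(actions)                       # explore
                else:
                    a = max(actions, key=lambda act: Q[(s, act)])    # exploit
                s_next, r = step(s, a)
                best_next = max(Q[(s_next, act)] for act in actions)
                # Q-learning update: move Q(s, a) towards r + gamma * max_a' Q(s', a')
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s_next

        # The learned greedy action in every non-terminal state should be +1 (go right).
        print({s: max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)})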

    Sergey Nikolenko
    Chief Research Officer, Neuromation

  • The AI Dialogues

    The AI Dialogues

    Preface

    This is an introduction to modern AI and specifically neural networks. I attempt to explain to non-professionals what neural networks are all about, where these ideas have grown from, why they formed in the succession in which they did, how we are shaping these ideas now, and how they, in turn, are shaping our present and our future. The dialogues are a venerable genre of sci-pop, falling in and out of fashion over the last couple of millennia; e.g., Galileo’s dialogue about the Copernican system was so wildly successful that it stayed on the Index of Forbidden Books for two centuries. In our dialogues, you will hear many different voices (all of them are in my head, and the famous people mentioned here are not actually being quoted). The main character is the narrator who will be doing most of the talking; following a computer science tradition, we call her Alice. She engages in conversation with her intelligent but not very educated listeners Bob and Charlie. Alice’s university has standard subscription deals with Springer, Elsevier, and the netherworld, so sometimes we will meet the ghosts of people long dead.

    Enjoy!

    Dialogue I: From Language to Logic

    Alice. Hey guys! We are here with quite a task: we want to create an artificial intelligence, no less. A walking, talking, thinking robot that could do everything a human could. I have to warn you: lots of people have tried, most of them have vastly overestimated themselves, and all of them have fallen short so far. We probably also won’t get there exactly, but we sure want to give it a shot. Where do you suppose we should begin?

    Bob. Well… gosh, that sounds hard. To be intelligent a person has to know a lot of things — why don’t we try to write them all down first and let the robot read?

    Charlie. We have encyclopaedias, you know. Why don’t we let the computer read Wikipedia? That way it can figure out all sorts of things.

    Alice. Riiight… and how would we teach the computer to read Wikipedia?

    Bob. Well, you know, reading. Language. It’s a sequence of discrete well-defined characters that combine into discrete well-defined words. We can already make computers understand programming languages or query languages like SQL, and they look exactly the same, only a bit more structured. How hard can it be to teach a computer to read in English?

    Alice. Very hard, unfortunately. Natural language is indeed easy to encode and process, but it is very hard to understand: you see, it was not designed for a computer. There is no program even now that could really understand English; even the best artificial intelligence models struggle with reading, and we’ll talk more about this later. But I can give you a quick example from one particularly problematic field called pragmatics. “The laptop did not fit in the bag because it was too big”. What was too big, the bag or the laptop?

    Bob. The laptop, obviously.

    Alice. Okay. Try another one. “The laptop did not fit in the bag because it was too small”. What was too small, the bag or the laptop?

    Bob. Obviously… oh, I see. We understand it because we know the world. But the computer does not know anything about what a laptop is or what a bag is! And the sentence looks very simple, not too contrived at all. But it does look a bit like a handmade counterexample — does this kind of stuff happen often?

    Alice. Very often. Our whole system of communication is made for us, wet biological beings who have eyes, ears, and skin, understand the three dimensions, have human urges and drives. There is a lot left unsaid in every human language.

    Bob. So the computer can’t just pick up English as it goes along, like children learn to speak, no?

    Alice. Afraid not. That is, if it could, it would be wonderful and it would be exactly the kind of artificial intelligence we want to build. But so far it can’t.

    Charlie. Well then, we’ll have to help it. You’re saying we can’t just go ahead and write a program that reads English. Okay. So what if we invent our own language that would be more… machine-readable?

    Bob. Yeah! It can’t be an existing programming language, you can’t describe the world in C++, but we simply have to make natural languages more formal, clear out the exceptions, all that stuff. Make it self-explanatory, in a way, so that it could start from simple stuff and build upon it. It’ll be a big project to rewrite Wikipedia in this language, but you only have to do it once, and then all kinds of robots will be able to learn to read it and understand the world!

    Alice. Cool! You guys just invented what might well be the first serious approach, purely theoretical, of course, to artificial intelligence as we understand it now. Back in the 1660s, Gottfried Leibnitz, the German co-inventor of calculus and bitter rival of Isaac Newton, started talking about what he called Characteristica universalis, the universal “alphabet of human thought” that would unite all languages and express concepts and ideas from science, art, and mathematics in a unified and coherent way. Some people say he was under the heavy influence of the Chinese language, which had reached Europe not long before. Europeans believed that all those beautiful Chinese symbols had a strict system behind them, and they did, but the system was perhaps also a bit messier than the Europeans thought.

    Anyway, Leibnitz thought that this universal language would be graphical in nature. He believed that a universal system could be worked out based on diagrams and pictures, and this system would be so clear, logical, and straightforward that machines would be made to perform reasoning in the universal language. Leibnitz actually constructed a prototype of a machine for mathematical calculations that could do all four arithmetic operations; he thought to extend it to a machine for his universal language. It is, of course, unclear how he planned to make a mechanical device understand pictures. But his proposal for the universal language undoubtedly did have a graphical component. Look at a sample diagram by Leibniz — it almost looks like you could use it to summon a demon or two. Speaking of which…

    Leibnitz [appearing in a puff of smoke]. Ja! You see, God could not wish to make the world too complicated for His beloved children. We see that in the calculus: it is really quite simple, no need for those ghastly fluxions Sir Isaac was always talking about. As if anybody could understand those! But when you find the right language, as I did, calculus becomes a beautiful and simple thing, almost mechanical. You only need to find the right language for everything: for the science, for the world. And I would build a machine for this language, first the calculus ratiocinator, and then, ultimately, machina ratiocinatrix, a reasoning machine! That would show that snobbish mystic! That would show all of them! Alas, I did not really think this through… [Leibnitz shakes his head sadly and disappears]

    Alice. Indeed. Gottfried Leibnitz was the first in a very long line of very smart people who vastly underestimated the complexity of artificial intelligence. In 1669, he envisioned that the universal language could be designed in five years if “selected men” could be put on the job (later we will see how eerily similar this sounds to the first steps of AI in our time). In 1706, he confessed that “mankind is still not mature enough to lay claim to the advantages which this method could provide”. And it really was not.

    Charlie. Okay, so Leibnitz could not do this, that doesn’t surprise me too much. But can’t we do it now? We have computers, and lots of new math, and we even have a few of those nice artificial languages like Esperanto already, don’t we?

    Alice. Yes and no. But mostly no. First of all, most attempts to create a universal language had nothing to do with artificial intelligence. They were designed to be simple for people, not for machines. Esperanto was designed to have a simple grammar, no exceptions, and to sound good, exactly the things that don’t matter all that much for artificial intelligence; it’s not hard for a computer to memorize irregular verbs. Second, even if you try, it is very hard to construct a machine-readable general-purpose language. My favourite example is Iţkuîl, designed in the 2000s by John Quijada specifically to remove as much ambiguity and vagueness from human languages as possible. Iţkuîl is one of the most concise languages in the world, able to express whole sentences’ worth of meaning in a couple of words. It is excruciatingly hard for humans… but it does not seem to be much easier for computers. Laptops still don’t fit into bags, in any language. There is not a single fluent Iţkuîl speaker in the world, and there has not been any success in artificial intelligence for it either.

    Charlie. All right, I suppose it’s hard to teach human languages to computers. That’s only natural: an artificial intelligence lives in the world of ones and zeros, and it’s hard to understand or even imagine the outside world from inside a computer. But what about cold, hard logic? Mathematics? Let’s first formalize the things that are designed to be formal, and if our artificial intelligence can do math it already feels pretty smart to me.

    Alice. Yes, that was exactly the next step people considered. But we have to step back a bit first.

    It is a little surprising how late logic came into mathematics. Aristotle used logic to formalize commonsense reasoning with syllogisms like “All men are mortal, Socrates is a man, hence, Socrates is mortal”. You could say he invented propositional logic, rules for handling quantifiers like “for all” and “there exists”, and so on, but that would really be a stretch. Mathematics used logic, of course, but for most of its history mathematicians did not feel that there were any problems with basing mathematics on common sense. Like, what is a number? Until, in the XIX century, strange counterexamples started to appear left and right. In the 1870s, Georg Cantor invented set theory, and researchers quickly realized there were some serious problems with formal definitions of fundamental objects like a set or a number. Only then did it become clear that logic was very important for the foundations of mathematics.

    The golden years of mathematical logic were the first half of the XX century. At first, there was optimism about the general program of constructing mathematics from logic, in a fully formal way, as self-contained as possible. This optimism is best summarized in Principia Mathematica, a huge work by Bertrand Russell and Alfred North Whitehead, who aimed to construct mathematics from first principles, from the axioms of set theory, in a completely formal way. It took several hundred pages to get to 1+1=2, but they did manage to get there.

    Kurt Gödel was the first to throw water on the fire of this optimism. His incompleteness theorems showed that this bottom-up construction could not be completely successful: to simplify a bit, there will always be true statements that you cannot prove. At first, mathematicians took it to heart, but it soon became evident that Gödel’s incompleteness theorems are not really a huge deal: it is very unlikely that we ever come across an unprovable statement that is actually relevant in practice. Maybe P=?NP is one, but that’s the only reasonable candidate so far, and even that is not really likely. And it still would be exceedingly useful to have a program able to prove the provable theorems. So by the 1940s and 1950s, people were very excited about logic, and many thought that the way to artificial intelligence was to implement some sort of a theorem proving machine.

    Bob. That makes perfect sense: logical thinking is what separates us from the animals! An AI must be able to do inference, to think clearly and rationally about things. Logic does sound like a natural way to AI.

    Alice. Well, ultimately it turned out that it was a bit too early to talk about what separates us from the animals — even now, let alone the 1950s, it appears to be very hard to reach the level of animals, and surpassing them in general reasoning and understanding the world is still far out of reach. On the other hand, it turned out that we are excellent in pattern matching but rather terrible in formal logic: if you have ever had a course in mathematical logic you remember how hard it can be to formally write down the proofs of even the simplest statements.

    Charlie. Oh yes, I remember! In my class, our first problem in first order logic was to prove A->A from Hilbert’s axioms… man, that was far from obvious.

    Alice. Yes. There are other proof systems and plenty of tricks that automatic theorem provers use. Still, so far it has not really worked as expected. There are some important theorems where computers were used for case-by-case enumeration (one of the first and most famous examples was the four color theorem), but to this day there is no automated prover that would prove important and relevant theorems by itself.

    Charlie. So far all you’re saying is that not only it is hard for computers to understand the world, but it is even hard to work with perfectly well-defined mathematical objects!

    Alice. Yes. Often, formalization itself is hard. But even when it is possible to formalize everything, like in mathematical logic, it is usually still a long way to go before we can automatically obtain useful new results.

    Bob. So what do we do? Maybe for some problems we don’t need to formalize at all?

    Charlie. What do you mean?

    Bob. I mean, like, suppose you want to learn to fly. Our human way to fly is to study aerodynamics and develop wing-like constructions that can convert horizontal speed to lift and take off in this way. But birds can fly too, maybe less efficiently, but they can. A hundred years ago, we couldn’t simulate the birds and developed other ways through our cunning in physics and mathematics — but what if for intelligence it’s easier the other way around? An eagle does not know aerodynamics, it just runs off a cliff and soars.

    Alice. And with this, pardon the pun, cliffhanger we take a break. When we reconvene, we will pick up from here and run with Bob’s idea. In artificial intelligence, it proved surprisingly fruitful.

    Sergey Nikolenko,
    Chief Research Officer, Neuromation

  • Deep Architectures

    Deep Architectures

    Why do we need deep neural networks with dozens of hidden layers? Why can’t we just train neural networks with one hidden layer? In 1991, Kurt Hornik proved a version of the universal approximation theorem, which states that for every continuous function there exists a neural network with one hidden layer and linear output that approximates this function to any given accuracy. In other words, a neural network with one hidden layer can approximate any given function as accurately as we want. However, as it often happens, this network may have to be exponentially large, and even if efficiency isn’t a concern, it still isn’t clear how to get from a network that exists somewhere in the space of all possible networks to one we can actually train in real life.

    Actually, with a deeper representation you can approximate the same function and solve the same task more compactly, or solve more tasks with the same resources. For instance, traditional computer science studies Boolean circuits that implement Boolean functions, and it turns out that many functions can be expressed much more efficiently if you allow circuits of depth 3, even more with depth 4, and so on, even though circuits of depth 2 can obviously express everything by reducing to a DNF or CNF. Something similar happens in machine learning. Picture a space of data points that we want to sort into two groups. If we have only “one layer”, we can divide them with a (hyper)plane. If there are two layers, we get separating surfaces composed of several hyperplanes (that’s roughly how boosting works: even very simple models become much more powerful if you compose them in the right way). On the third layer, yet more complicated structures made out of these separating surfaces take shape, which are not so easy to visualize any more, and the same goes for subsequent layers. A simple example comes from Goodfellow, Bengio, and Courville’s “Deep Learning”: if you combine even the simplest linear classifiers, each of which divides the plane into two half-planes, you can define regions of a much more complicated shape.
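
    The same point can be illustrated numerically with a toy sketch: four “first-layer” linear classifiers, each splitting the plane into two half-planes, are combined by a simple AND on the “second layer” into a square region that no single hyperplane could describe. The half-planes and points below are made up.

        import numpy as np

        # Each half-plane classifier fires when w . x + b > 0.
        def half_plane(points, w, b):
            return (points @ w + b > 0).astype(int)

        rng = np.random.RandomState(0)
        points = rng.uniform(-1, 1, size=(6, 2))

        # Four "first-layer" classifiers: x > -0.5, x < 0.5, y > -0.5, y < 0.5.
        h = sum(half_plane(points, w, 0.5) for w in
                [np.array([1.0, 0.0]), np.array([-1.0, 0.0]),
                 np.array([0.0, 1.0]), np.array([0.0, -1.0])])

        # "Second layer": a point is inside the square only if all four fire.
        inside_square = (h == 4).astype(int)
        print(np.column_stack([points.round(2), inside_square]))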

    Thus, in deep learning we’re trying to build deeper architectures. We have already described the main idea: pre-train the lower layers one by one, and then finish training the whole network by fine-tuning with backpropagation. So let’s cover pretraining in more detail.

    One of the simplest ideas is to train a neural network to copy its input to its output through a hidden layer. If the hidden layer is smaller than the input, the network has to learn to extract significant features from the data, features that let it restore the input from the hidden layer. This type of neural network architecture is called an autoencoder. The simplest autoencoders are feedforward neural networks, very similar to the perceptron, that contain an input layer, a hidden layer, and an output layer. Unlike the perceptron, an autoencoder’s output layer must contain as many neurons as its input layer.

    The training objective for this network is to bring the output vector x’ as close as possible to the input vector x.

    The main principle of training an autoencoder network is to achieve a response on the output layer that is as close as possible to the input. Generally speaking, autoencoders are understood to be shallow networks, although there are exceptions.
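
    A minimal sketch of such a network (PyTorch assumed; all sizes are illustrative): a 784-dimensional input, a 32-dimensional hidden layer, and the mean squared reconstruction error pulling x’ towards x.

        import torch
        import torch.nn as nn

        # A small undercomplete autoencoder: 784-dimensional inputs (e.g. flattened
        # 28x28 images), a 32-dimensional hidden layer, and an output of input size.
        autoencoder = nn.Sequential(
            nn.Linear(784, 32), nn.Sigmoid(),    # encoder: compress to 32 features
            nn.Linear(32, 784), nn.Sigmoid(),    # decoder: reconstruct the input
        )
        optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
        loss_fn = nn.MSELoss()                   # bring x' as close to x as possible

        x = torch.rand(64, 784)                  # a dummy batch instead of real images
        for _ in range(10):
            x_rec = autoencoder(x)
            loss = loss_fn(x_rec, x)             # reconstruction error ||x' - x||^2
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(float(loss))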

    Early autoencoders were basically doing dimensionality reduction. If you take many fewer hidden neurons than the input and output layers have, you force the network to compress the input into a compact representation while ensuring that it can be decompressed later on. That’s what undercomplete autoencoders, where the hidden layer has lower dimension than the input and output layers, do. Now, however, overcomplete autoencoders are used much more often; they have a hidden layer of higher, sometimes much higher, dimension than that of the input layer. On the one hand, this is good, because you can extract more features. On the other hand, the network may learn to simply copy input to output with perfect reconstruction and zero error. To keep this from happening, you have to introduce regularization rather than simply optimize the reconstruction error.

    Returning the points to the dataset manifold

    Classical regularization aims to avoid overfitting by using a prior distribution, like in linear regression. But that won’t work in this case. Instead, autoencoders generally use more sophisticated methods of regularization that change the inputs and outputs. One classical approach is the denoising autoencoder, which adds artificial noise to the input pattern and then asks the network to restore the original pattern. In this case, you can make the hidden layer larger than the input (and output) layers: the network, however large, still has an interesting and nontrivial task to learn.

    Denoising autoencoder

    Incidentally, “noise” can be a rather radical change of the input. For instance, if it’s a binary pixel-based image, you can simply remove some of the pixels and replace them with zeroes (people often remove up to half of them!). However, the target you are reconstructing is still the correct, original image. So you make the autoencoder reconstruct one part of the input based on another part. Basically, the autoencoder has to learn how all the inputs are put together, that is, understand the structure of the aforementioned manifold in a space with a ridiculous number of dimensions.
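
    Here is a sketch of this masking “noise” (PyTorch assumed; the overcomplete network and all sizes are hypothetical): we zero out roughly half of the input pixels but compute the loss against the clean input.

        import torch
        import torch.nn as nn

        # Masking "noise" for a denoising autoencoder: zero out a random half of
        # the input pixels, but compute the loss against the clean original x.
        def masking_noise(x, drop_prob=0.5):
            mask = (torch.rand_like(x) > drop_prob).float()
            return x * mask

        # A hypothetical overcomplete autoencoder: the hidden layer (1024) is larger
        # than the input (784); the denoising task keeps it from learning the identity.
        denoising_ae = nn.Sequential(
            nn.Linear(784, 1024), nn.Sigmoid(),
            nn.Linear(1024, 784), nn.Sigmoid(),
        )
        x = torch.rand(64, 784)                   # dummy data instead of real images
        loss = nn.MSELoss()(denoising_ae(masking_noise(x)), x)
        print(float(loss))                        # reconstruct the original, not the noisy input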

    How deep learning works

    Suppose you want to find faces in pictures. Then one input data point is an image of a predefined size. Essentially, it’s a point in a multi-dimensional space, and the function you’re trying to find should take the value 0 or 1 depending on whether or not there’s a face in the picture. For example, mathematically speaking, a 4Mpix photo is a point in a space of dimension about 12 million: 4 million pixels times three color values per pixel. It’s glaringly obvious that only a small percentage of all possible images (points in this huge space) contain faces, and these images appear as tiny islands of 1s in an ocean of 0s. If you “walk” from one face to another in this multi-dimensional space of pixel-based pictures, you’ll pass through nonsensical images along the way; however, if you pre-train a neural network, for instance with a deep autoencoder, then closer to the hidden layer the original space of images turns into a space of features where the peaks of the objective function are clustered much closer to one another, and a “walk” in the feature space will look much more meaningful.

    Generally, the aforementioned approach does work; however, people can’t always interpret the extracted features. Moreover, many features are extracted by a whole conglomeration of neurons, which complicates matters. One can’t categorically assert that this is bad, but our modern notions of how the brain works suggest that the brain operates differently: it has practically no dense layers, so only a small percentage of neurons, the ones responsible for extracting the relevant features, take part in solving each specific task.

    So how do you get each neuron to learn some useful feature? This brings us back to regularization, in this case, to dropout. As we already mentioned, a neural network usually trains by stochastic gradient descent, randomly choosing one object at a time from the training set. Dropout regularization adds a change in the network structure: on every training step, each node is removed from the network with a certain probability. By throwing away, say, half of the neurons, you get a new network architecture on every step of training.

    By training only the remaining half of the neurons on each step, we get a very interesting result: now every neuron has to learn to extract some useful feature by itself. It cannot count on teaming up with other neurons, because they may have been dropped out.

    Dropout basically averages a huge number of different architectures. You build a new model for each training case: you take one model from this gigantic ensemble and perform one step of training, then you take another model for the next case and perform one step of training, and, in the end, all of them are averaged on the output layer. It is a very simple idea on the surface, but when dropout was introduced, it brought substantial improvements to practically all deep learning models.
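
    A one-layer sketch of the idea follows; it uses the common “inverted dropout” variant, which rescales the surviving activations during training instead of averaging at test time.

        import numpy as np

        # Inverted dropout for one layer during training: drop each neuron with
        # probability p and rescale the survivors so the expected output is unchanged.
        def dropout(activations, p=0.5, rng=np.random):
            mask = (rng.uniform(size=activations.shape) > p)
            return activations * mask / (1.0 - p)

        hidden = np.array([0.2, 0.9, 0.5, 0.7])   # toy activations of a hidden layer
        print(dropout(hidden))                    # roughly half are zeroed on each call
        # At test time no neurons are dropped, which effectively averages over
        # all the "thinned" networks seen during training.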

    We will go off on one more tangent that will link what’s happening now to the beginning of the article. What does a neuron do under dropout? It has a value, usually a number from 0 to 1 or from -1 to 1. A neuron sends this value further, but only 50% of the time rather than always. But what if we do it the other way around? Suppose a neuron always sends a signal of the same value, namely ½, but sends it with probability equal to its output value. The average output wouldn’t change, but now we get stochastic neurons that randomly send out signals, and the intensity with which they do so depends on their output value: the larger the output, the more the neuron is activated and the more often it sends signals. Does it remind you of anything? We wrote about that at the beginning of the article: that’s how neurons in the brain work. Neurons in the brain don’t transmit a spike’s amplitude; they transmit one bit, the spike itself. It’s quite possible that stochastic neurons in the brain perform the function of a regularizer, and it’s possible that, thanks to this, we can differentiate between tables, chairs, cats, and hieroglyphics.

    After a few more tricks for training deep neural networks were added to dropout, it turned out that unsupervised pre-training wasn’t all that necessary, actually. In other words, the problem of vanishing gradients has mostly been solved, at least for regular neural networks (recurrent networks are a bit more complicated). Moreover, dropout itself has been all but replaced by new techniques such as batch normalization, but that would be a different story.

    Sergey Nikolenko
    Chief Research Officer, Neuromation

  • Chief Research Officer Sergey Nikolenko on knowledge mining

    Chief Research Officer Sergey Nikolenko on knowledge mining

    Image credit Neuromation

    Block Tribune asked Sergey Nikolenko to comment on the hype around the idea of knowledge mining.

    Earnings from mining cryptocurrencies are shrinking: the complexity of the computations grows, while energy is not getting any cheaper. Many people are looking for alternative uses for the expensive hardware purchased during the mining boom. It is quite possible that a substantial share of these video cards will be used by scientists or startups for complex computing.

    With the energy price at RUR 4.5 per kilowatt-hour, there are already few people in Russia willing to engage in cryptocurrency mining. As recently as the winter and spring of this year, investments in new video cards and ASIC chips for a medium-sized cryptocurrency farm paid off completely within a few months. Now, to earn money mining many popular cryptocurrencies, you must first be a millionaire: the “makeshift video card” format does not work at all, as large farms with well-tuned hardware are required.

    That is why, when power plants started to rent out their excess capacity on sites with ready infrastructure, at a price of just two rubles per kilowatt-hour, miners were immediately interested. Considering that they mostly mine “light” cryptocurrencies such as Ethereum, Zcash, and Monero, earning less than $6 per day, this was significant support.

    Due to energy prices, approximately half of all cryptocurrency is mined in several regions of China. However, the computational complexity will continue to grow, causing a gradual fall in profits. Hence the interest in alternative sources of income.

    Since many farms have huge computing capacity, they can be put to use primarily for scientific purposes.

    Usually, supercomputer capacity is rented for such tasks, e.g. “Lomonosov” at Moscow State University, one of the most powerful. These are the machines that mining capacity will compete with.

    The rental market for mining capacity is already emerging. For example, Neuromation has created a distributed synthetic data platform for neural network applications. Their first commercial product makes store shelves smart: large, well-labeled datasets are created for all the SKUs, and algorithms trained on them can analyze the accuracy of shelf layouts, share of shelf, and customer interaction. The system can actually predict customer behavior.

    The platform requires more than a billion labeled images of merchandise. Manual labeling of photographs is a painstaking and very costly task: on the Amazon Mechanical Turk crowdsourcing service, manually labeling a billion pictures would cost about $120 million.

    Neuromation entered the market with a new concept: using synthetic data for training neural networks. They generate all the necessary images in a 3D generator, something like a computer game for artificial intelligence. It is partly for this generator that they need large-scale computing capacity, which, if rented from Amazon or Microsoft, would cost tens of millions of dollars. Meanwhile, thousands of the most advanced video cards are available, currently engaged in ever less profitable Ethereum mining.

    Instead of renting capacity for millions of dollars, the founder of Neuromation, Maxim Prasolov, decided to lease these mining farms for useful computing, and the company is already using a pool of 1000 video cards. “This means serious savings for our research process and is beneficial for the miners: farm services cost 5–10 times less than renting cloud servers, and the miners can earn more by solving fundamental problems instead of mining cryptocurrency,” he calculates.

    It is worth remembering, of course, that Google has image search and Facebook has facial recognition for photos, which they managed to develop on their own cloud infrastructure without using mining farms. However, Neuromation’s task was substantially different. “First, searching by pictures is a completely different task, and there are specially developed methods for face recognition. Second, Google and Facebook do not need to rent computing power from Amazon Web Services: they have more than enough clusters of their own. But the course of action for a small startup in this situation is not so obvious,” explains Sergey Nikolenko, Chief Research Officer at Neuromation.

    Potentially, miners will earn on average 10–20% more on knowledge mining than on cryptocurrency mining.

    Moreover, with tangible benefits for society. “Basically, mining is grinding the wind. Generating a ‘nice’ hash takes dozens of hours of system operation. If instead we are talking about searching for a drug formula, then harnessing capacity from around the world, the combined work of computers for the common good, would be comparable to the results of research at the Large Hadron Collider,” says Petr Kutyrev, editor of the noosfera.su portal.

    The range of tasks solvable with mining hardware is limited by how specialized it is. ASIC hardware, for example, would be difficult to adapt to scientific tasks, as it is designed exclusively for hashing. Video cards, however, can cope with a variety of scientific tasks.

    True, such computing would require special hardware configuration and sometimes special software. “Mining video cards can be used for video recognition and rendering, or for biological experiments. However, for efficient computing, direct access to the hardware is essential. If the computation can be delayed, or the accuracy of neural network training is not critical, then standard tools can of course be used. Otherwise, you need to develop your own hardware and software infrastructure,” believes Evgeny Glariantov.

    Thus, using the farms for science will take some time: they have to be set up, and special allocation protocols have to be developed. Yet, seeing a more profitable segment, miners will switch to useful computing, and platforms for such tasks may appear in the near future together with the first operating system based on the EOS blockchain, analysts at the BitMoney Information and Consulting Center believe. Miners will be able to switch from time to time between mining cryptocoins and processing scientific or commercial data, thereby increasing their profitability. The profits of the cryptocurrency rush will be gone, but the business will be more meaningful and stable: unlike volatile cryptocurrencies, knowledge is always in demand.

    Source: http://blocktribune.com/making-money-redundant-mining-hardware-opinion/