Category: Neuromation

  • What Will The Future Bring? Three Predictions on AI Technology

    What Will The Future Bring? Three Predictions on AI Technology

Here at Neuromation, we are always looking forward to the future. Actually, we are making the future, so we are in a good position to try to predict what is going to happen with the AI industry soon. I know how difficult it is to make predictions, especially, as some Danish parliament member once remarked, about the future, and especially in an industry like ours. Still, here are my three predictions, or, better, three trends that I expect to continue for the next few years of AI. I concentrate on the technical side of things — I hope the research will stay as beautifully unpredictable as ever.

    Specialized hardware for AI


    One trend which is already obvious and will only gain in strength in the future is the rise of specialized hardware for AI. The deep learning revolution started in earnest when AI researchers realized they could train deep neural networks on graphical processors (GPUs, video cards). The idea was that training deep neural networks is relatively easy to parallelize, and graphic processing is also parallel in nature: you apply shaders to every pixel or every vertex of a model independently. Hence, GPUs have always been specifically designed for parallelization: a modern GPU has several thousand cores compared to 4–8 cores in a CPU (CPU cores are much faster, of course, but still, thousands). In 2009, this turned every gaming-ready PC into an AI powerhouse equal to the supercomputers of old: an off-the-shelf GPU trains a deep neural network 10–30x faster than a high-end CPU; see, e.g., an up-to-date detailed comparison here.
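
To make the scale of that difference concrete, here is a minimal benchmark sketch (assuming PyTorch and a CUDA-capable GPU are available; exact numbers depend heavily on the hardware) that times a large matrix multiplication, the core operation of neural network training, on the CPU and on the GPU:

```python
import time
import torch

def time_matmul(device: str, n: int = 4096, repeats: int = 10) -> float:
    """Average time of an n-by-n matrix multiplication on the given device."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.matmul(a, b)                      # warm-up run to exclude one-time setup costs
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device == "cuda":
        torch.cuda.synchronize()            # wait for the GPU to actually finish
    return (time.perf_counter() - start) / repeats

cpu_time = time_matmul("cpu")
print(f"CPU: {cpu_time:.3f} s per multiplication")
if torch.cuda.is_available():
    gpu_time = time_matmul("cuda")
    print(f"GPU: {gpu_time:.3f} s per multiplication, roughly {cpu_time / gpu_time:.0f}x faster")
```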

    Since then, GPUs have been the most common tool for both research and practice in deep learning. My prediction is that over the next few years, they will be gradually dethroned in favour of chips specifically designed for AI.

The first widely known specialized chips for AI (specifically, for training neural networks) are Google's proprietary TPUs. You can rent them on Google Cloud, but they have not been released for sale to the general public, and probably won't be.

But the Google TPU is just the first example. I have already blogged about recent news from Bitmain, one of the leading designers and producers of ASICs for bitcoin mining. They are developing a new ASIC specifically for tensor computing — that is, for training neural networks. I am sure that over the next few years we will see many chips designed specifically for deep learning that will bring it to new heights.

    AI Centralization and Democratization


    The second prediction sounds like an oxymoron: AI research and practice will centralize and decentralize at the same time. Allow me to explain.

In the first part, we talked about training neural networks on GPUs. Since about 2009, deep learning has been living in “the good old days” of computing, when you could stay on the bleeding edge of AI research with a couple of off-the-shelf GPUs at $1000 each in your garage. These days are not past us yet, but it appears that soon they will be.

Modern advances in artificial intelligence are more and more dependent on computational power. Consider, e.g., AlphaZero, a deep reinforcement learning model that recently learned to play chess, go, and shogi better than the best engines (not just humans: AlphaZero beat Stockfish in chess, AlphaGo in go, and Elmo in shogi) completely from scratch, knowing only the rules of the games. This is a huge advance, and it made all the news with the headline “AlphaZero learned to beat Stockfish from scratch in four hours”.

    Indeed, four hours were enough for AlphaZero… on a cluster of 5000 Google TPUs for generating self-play games and 64 second-generation TPUs to train the neural networks, as the AlphaZero paper explains. Obviously, you and I wouldn’t be able to replicate this effort in a garage, not without some very serious financing.

    This is a common trend. Research in AI is again becoming more and more expensive. It increasingly requires specialized hardware (even if researchers use common GPUs, they need lots of them), large datacenters… all of the stuff associated with the likes of Google and Facebook as they are now, not as they were when they began. So I predict further centralization of large-scale AI research in the hands of cloud-based services.

On the other hand, this kind of centralization also means increased competition on this market. Moreover, the cost of computational power has recently been rather inflated due to huge demand from cryptocurrency miners. We tried to buy high-end off-the-shelf GPUs last summer and utterly failed: they were completely sold out to people getting into mining Ethereum and Litecoin. However, this trend is coming to an end too: mining is institutionalizing even faster, returns on mining decrease exponentially, and computational resources are beginning to free up as it becomes less and less profitable to use them for mining.

We at Neuromation are developing a platform to bring this computational power to AI researchers and practitioners. On our platform, you will be able to rent the endless GPUs that have been mining ETH, getting them cheaper than anywhere else while still making a profit for the miners. This effort will increase competition on the market (currently you go either to Amazon Web Services or Google Cloud; there are very few other options) and bring further democratization of various AI technologies.

    AI Commoditization


By the way, speaking of democratization. Machine learning is a very community-driven area of research. It has unprecedented levels of sharing between researchers: it is common practice to accompany research papers with working open-source code published on GitHub, and datasets, unless we are talking about sensitive data like medical records, are often provided for free as well.

For example, modern computer vision based on convolutional neural networks almost invariably uses a huge general-purpose dataset called ImageNet; it has more than 14 million images hand-labeled into more than 20 thousand categories. Usually, models are first pretrained on ImageNet, which lets them extract low-level features common to all photos of our world, and only then trained further (in machine learning, this is called fine-tuning) on your own data.

    You can request access to ImageNet and download it for free, but what is even more important, the models already trained on ImageNet are commonly available for the general public (see, e.g., this repository). This means that you and I don’t have to go through a week or two of pretraining on a terabyte of images, we can jump right into it.
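
To give a flavor of what this looks like in practice, here is a minimal sketch of reusing an ImageNet-pretrained network, assuming PyTorch and torchvision; the number of classes is a placeholder for whatever your own dataset contains.

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10  # hypothetical number of classes in your own dataset

# Load a ResNet-50 whose weights are already pretrained on ImageNet
model = models.resnet50(pretrained=True)

# Freeze the pretrained feature extractor: at first, only the new head is trained
for param in model.parameters():
    param.requires_grad = False

# Replace the final ImageNet classification layer (1000 classes) with our own
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Fine-tuning then proceeds as usual, with an optimizer over the trainable parameters
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

Once the new head has converged, one often unfreezes some of the deeper layers and continues training with a smaller learning rate.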

I expect this trend to continue and be taken further by AI researchers in the near future. Very soon, a lot of “basic components” will be publicly available that an AI researcher will be able to use and combine directly, without tedious fine-tuning. This will be partially a technical process of making what we (will) have easily accessible, but it will also require some new theoretical insights.

    For example, a recent paper from DeepMind presented PathNet, a modular neural architecture able to combine completely different sub-networks and automatically choose and fine-tune a combination of these sub-networks most suitable for a given task. This is still a new direction, but I expect it to pick up.

Again, we at Neuromation intend to be on the cutting edge: in the future, we plan to provide modular components for building modern neural networks on our platform. Democratization and commoditization of AI research is what the Neuromation platform is all about.

    Sergey Nikolenko
    Chief Research Officer, Neuromation

  • Can a Neural Network Read Your Mind?

    Can a Neural Network Read Your Mind?


Researchers from the ATR Computational Neuroscience Laboratories in Kyoto and from Kyoto University have recently made the news. Their paper, entitled “Deep image reconstruction from human brain activity” (released on December 30, 2017), basically claims to have developed a machine learning model that can read your mind, with sample reconstructions shown in the picture above. To understand what they mean and whether we should all be thinking only happy thoughts now, we need to start with a brief background.

    How to Read the Minds of Deep Neural Networks

    A big problem with neural networks has always been their opacity: while we can see the final results, it is very hard to understand what exactly is going on inside a neural network. This is a problem for all architectures, but let us now concentrate on convolutional neural networks (CNNs) used for image processing.

Very roughly speaking, CNNs are multilayer (deep) neural networks where each layer processes the image in small windows, extracting local features. Gradually, layer by layer, local features become global, drawing their inputs from a larger and larger portion of the original image. Here is how it works in a very simple CNN (picture taken from this tutorial, which I recommend reading in full):
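
To make this structure concrete, here is a toy convolutional network sketched in PyTorch; the layer sizes are purely illustrative and not taken from the tutorial above.

```python
import torch.nn as nn

simple_cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # local 3x3 windows over the input image
    nn.ReLU(),
    nn.MaxPool2d(2),                              # downsampling: features now "see" a larger area
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # local windows over feature maps = wider receptive field
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1),                      # global features covering the whole image
    nn.Flatten(),
    nn.Linear(32, 3),                             # class scores, e.g., dog / cat / boat
)
```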


    In the end, after several (sometimes several hundred) layers we get global features that “look at” the whole original image, and they combine in some ways to get us the class labels (recognize whether it is a dog, cat, or a boat). But how do we understand what these features actually do? Can we interpret them?

    One idea is to simply look for the images that activate specific neurons with the hope that they will have something in common. This idea was developed, among other works, in the famous paper “Visualizing and Understanding Convolutional Networks” by Zeiler and Fergus (2013). The following picture shows windows from actual images (on the right) that provide the largest possible activations for four different high-level neurons together with their pixels that contribute to these activations; you can see that the procedure of fitting images to features does produce readily interpretable pictures:


But then researchers developed another simple but very interesting idea that works well for understanding the features. The whole training process in machine learning is designed to fit the features of a network to a training dataset of images. The images are fixed, and the network weights (the parameters of these convolutional layers) are changing. But we can also do it the other way around: fix the network and change the image to fit what we need!
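
Here is a bare-bones sketch of this trick in PyTorch (assuming torchvision; real visualizations add stronger regularizers and image transformations that are omitted here): the network weights are frozen, and gradient ascent is run on the pixels of the image instead.

```python
import torch
from torchvision import models

model = models.vgg16(pretrained=True).eval()
for p in model.parameters():
    p.requires_grad = False                  # the network stays fixed...

target_class = 207                           # an ImageNet class index chosen for illustration
image = torch.zeros(1, 3, 224, 224, requires_grad=True)   # ...and the image is the "parameter"
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    score = model(image)[0, target_class]
    # Maximize the class score (i.e., minimize its negative), with a mild penalty on pixel magnitude
    loss = -score + 1e-4 * image.norm()
    loss.backward()
    optimizer.step()
```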

    For interpretation, this idea was developed, e.g., in the work “Understanding Neural Networks Through Deep Visualization” (2015) by Jason Yosinski et al. The results look a lot like the famous “deep dreams”, and this is no coincidence; for example, here are the images designed to activate certain classes the most:


    Somewhat recognizable but pretty strange, right? We will see similar effects in the “mind-reading” pictures below.

    The same idea also leads to the field of adversarial examples for CNNs: now that we have learned to fit images to networks rather than the other way around, what if we fit the images to fool the network? This is how you get examples like this:


On the left, this picture shows a “benign” image, labeled and recognized as a bottlecap. On the right is an adversarial image: you can’t see any noticeable difference, but the same network that achieved great results in general and correctly recognized the original as a bottlecap now confidently says that it is a… toy poodle. The difference is that on the right, an adversary has added carefully crafted perturbations that are small and look just like white noise but are designed to push the network towards the toy poodle class.
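
For the curious, here is a sketch of the simplest such attack, the fast gradient sign method, assuming PyTorch and torchvision. This is the untargeted variant, which nudges the image away from its true class; the targeted “toy poodle” version optimizes towards a chosen class instead.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(pretrained=True).eval()

def fgsm_attack(image: torch.Tensor, true_label: int, epsilon: float = 0.01) -> torch.Tensor:
    """Add a small perturbation in the direction that increases the loss for the true label."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), torch.tensor([true_label]))
    loss.backward()
    # The perturbation looks like faint noise but is precisely aligned with the model's gradients
    return (image + epsilon * image.grad.sign()).detach()
```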

    These examples, by the way, show that modern convolutional neural networks still have a long way to go before solving computer vision once and for all: although we all know some optical illusions that work for humans, there are no adversarial examples that would make us recognize a two-dimensional picture of a bottlecap as a toy poodle. But the power to fit images to features in the network can also be used for good. Let us see how.

    And Now to Mind Reading for Humans

So what did the Japanese researchers (they are the mind readers we began with) do, exactly? First, they took an fMRI scan of the brain of a person looking at something and recorded the features. Functional magnetic resonance imaging (fMRI) is a medical imaging technique that takes snapshots of brain activity based on blood flow: when neurons in an area of the brain are active, more blood flows there, and we can measure it. fMRI is called functional because it can capture changes in blood flow over time, resulting in videos of brain activity. To get more information, you can watch this explanatory video or see a sample dataset for yourself:

Since we are measuring blood flow and not individual neurons, the spatial resolution of fMRI is not perfect: we cannot go down to the level of single neurons, but we can distinguish rather small brain areas, with a voxel size of about 1mm in each direction. It has been known for a long time that the fMRI picture contains reliable general information about what the person in the scanner is thinking about: emotions, basic drives, processing of different inputs such as speech, music, or video, etc. But the work of Shen et al. takes this to a whole new level.

Shen et al. (2017) tried to reconstruct the exact images that people in fMRI scanners were looking at. To do that, they trained a deep neural network on fMRI activity and then tried to match the features of a new image with the features of the fMRI activations. That is, they are basically doing the same thing that we discussed above: finding an input image that matches given features as well as possible. The only difference is that the features now come not from a real image processed by the CNN but from an fMRI scan processed by a different network (also convolutional, of course). You can see how the network gradually fits the image to a given fMRI pattern:

    The authors improved their reconstruction results drastically when they added another neural network, the deep generator network (DGN), whose work is to ensure that the image looks “natural” (in technical terms, introducing a prior on the images that favors “natural” ones). This is also an important idea in machine learning: we often can make sense of something only because we know what to expect, and artificial models are no different: they need some prior knowledge, “intuition” about what they can and cannot get, to improve their outputs.

    In total, here is what the architecture looks like. The optimization problem is to find an image which best fits both the deep generator network that makes sure it is “natural” and the deep neural network that makes sure it matches fMRI features:


    If these results can be replicated and further improved, this is definitely a breakthrough in neuroscience. One can even dream about paralyzed people communicating through fMRIs by concentrating on what they want to say; although in (Shen et al., 2017) reconstruction results are much worse when people are imagining a simple shape rather than directly looking at it, sometimes even with imagined shapes it does seem that there is something there:


    So can a neural network read your mind now? Not really. You have to lie down in a big scary fMRI scanner and concentrate hard on a single still image. And even in the best cases, it still often comes out like this, kind of similar but not really recognizable:


    Still, this is a big step forward. Maybe one day.

    Sergey Nikolenko
    Chief Research Officer, Neuromation

  • AI in Biology and Medicine

    AI in Biology and Medicine

Today I present to you three research directions that apply the latest achievements in artificial intelligence (mostly deep neural networks) to biomedical applications. Perhaps this is the research that will not only change but also significantly extend our lives. I am grateful to my old friend, co-author, and graduate student Arthur Kadurin, who suggested some of these projects.

    Translated from Russian by Andrey V. Polyakov. Original article here.

Polar, Beiersdorf AG, and Others: Smart Clothes

We begin with a series of projects that are unlikely to turn the world upside down but will certainly produce, pardon the pun, cosmetic changes in everyday life in the very near future. These projects deal with AI applications for the so-called “Internet of Things” (IoT), specifically applications that are very “close to the body”.

Various types of fitness trackers, special bracelets that collect information about heartbeat, steps, and so forth, have long since entered our lives. The main trend among sportswear companies now is to build sensors directly into the clothes. That way, you can collect more information and measure it more precisely. Sensors suitable for “smart clothes” were invented in 2016, and already in 2017 Polar presented the Polar Team Pro Shirt, a shirt that collects lots of information during exercise. The plot will no doubt thicken even further when sports medicine supported by artificial intelligence learns to use all this information properly; I expect a revolution in sports that Moneyball could never dream of.

And it is already beginning. Recently, on November 24–26, the second SkinHack hackathon, dedicated to applying machine learning models to data coming from such sportswear, took place in Moscow. The first SkinHack, held last year, was dedicated to “smart cosmetics”: the participants tried to predict a person’s age from the skin structure on photographs, looking for wrinkles. Both smart cosmetics and smart clothing are areas of active interest for Beiersdorf AG (commonly known as the producer of the Nivea brand), so one can hope that the commercial launch of these technologies will not be long in coming. In Russia, SkinHack was supported by Youth Laboratories, a company affiliated with the central characters of our next part…

    Insilico: Automatic Discovery of New Drugs

    Insilico Medicine is a company well known in the biomedical world. Its primary mission is to fight aging, and I personally wish Insilico success in this effort: one does not look forward to growing old. However, in this article I would like to emphasize another, albeit related, project of the company: drug discovery based on artificial intelligence models.

A medicinal drug is a chemical compound that can bind to other substances in our body (usually proteins) and have the desired effect on them, e.g., suppress a protein or cause another to be produced in larger quantities. To find a new drug, you need to choose, from a huge number of possible chemical compounds, exactly the ones that will have the desired effect.

    It is clear that at this point it is impossible to fully automate the search for new drugs: clinical trials are needed, and one usually starts testing on mice, then on humans… in general, the process of bringing a new medicinal drug to the market usually takes years. However, one can try to help doctors by reducing the search space. Insilico develops machine learning models that try not only to predict the properties of a molecule but also to generate candidate molecules with desired properties, thereby helping to choose the most promising candidates for further laboratory and clinical studies.

This search space reduction is done with a very interesting class of deep learning models: generative adversarial networks (GANs). Such networks combine two components: a generator trying to generate new objects — for example, new molecules with desired properties — and a discriminator trying to distinguish generated results from real data points. By learning to deceive the discriminator, the generator begins to generate objects indistinguishable from the real ones… that is, hopefully, actually real ones in this case. The latest Insilico model, called druGAN (drug + GAN), attempts to generate, among other things, molecules useful for oncological applications.
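
To make the generator-versus-discriminator game more concrete, here is a minimal GAN training loop on toy two-dimensional data, sketched in PyTorch. druGAN itself is a much more elaborate model working with molecular fingerprints, so this is only the skeleton of the idea.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))   # generator: noise -> data point
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator: real or fake?
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 2) * 0.5 + torch.tensor([2.0, -1.0])     # stand-in for real data points
    fake = G(torch.randn(64, 8))

    # Discriminator: tell real data (label 1) from generated data (label 0)
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: fool the discriminator into labeling its fakes as real
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```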

    MonBaby: Keeping Track of the Baby

Finally, I would like to end with a project that Neuromation plans to participate in. Small children, especially babies, cannot always call for help themselves and require special care and attention. This attention is sometimes required even in situations where mom and dad seem to be able to relax: for example, a sleeping baby may hurt a leg by turning into an uncomfortable pose. And then there is the notorious SIDS (sudden infant death syndrome), whose risk has been linked to the position of a sleeping infant: did you know that the risk of SIDS increases several times if a baby sleeps on the stomach?

The MonBaby smart infant tracking system is a small “button” that snaps onto clothing and monitors the baby’s breathing and turning over during sleep. Currently, the system is based on machine learning for time series analysis: data from the baby’s movements is used to recognize breathing cycles and sleeping body position (on the stomach or on the back).
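
Purely as an illustration of the kind of model involved (this is a hypothetical sketch with made-up data, not MonBaby's actual pipeline), one could classify sleeping position from short windows of accelerometer readings roughly like this:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(window: np.ndarray) -> np.ndarray:
    """window: (n_samples, 3) accelerometer readings -> a small statistical feature vector."""
    return np.concatenate([window.mean(axis=0), window.std(axis=0)])

# Made-up training data: windows of movement with labels 0 = "on the back", 1 = "on the stomach"
rng = np.random.default_rng(0)
windows = rng.normal(size=(500, 50, 3))
labels = rng.integers(0, 2, size=500)

X = np.stack([window_features(w) for w in windows])
clf = RandomForestClassifier(n_estimators=100).fit(X, labels)
print(clf.predict(X[:5]))      # predicted sleeping positions for the first few windows
```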

    We plan to complement this system with smart cameras able to track the infant’s movements and everything that happens to him or her by visual surveillance. The strong suits of our company will come in handy here: computer vision systems based on deep convolutional networks and synthetic data for their training. The fact is that in this case it is practically impossible to collect a sufficiently large real data set for training the system: it would take not only real video recordings of tens of thousands of babies, but video recordings with all possible critical situations. Thankfully, modern ethics, both medical and human, would never allow us to generate such datasets in real life. Therefore, we plan to create “virtual babies”, 3D models that will allow us to simulate the necessary critical situations and generate synthetic videos for training.

We have briefly examined three directions in different branches of biomedicine — sports medicine and cosmetics, drug discovery, and baby care — each of which actively uses the latest achievements of artificial intelligence. Of course, these are just examples: AI is now used in hundreds of diverse biomedical projects (which we may touch upon in later articles). Hopefully, however, with these illustrations I have managed to show how AI research is helping people live longer, better, and healthier lives.

    Sergey Nikolenko,
    Chief Research Officer, Neuromation

  • Neuromation Story: From Synthetic Data to Knowledge Mining

    Neuromation Story: From Synthetic Data to Knowledge Mining

My name is Sergey Nikolenko, and I am writing this as Neuromation’s Chief Research Officer. Our company is based on two main ideas, and there is an interesting story of how one followed from the other. In my opinion, this story reflects the two main problems to be solved in any applied machine learning project. It is this story that I will tell you today.

    First Problem: Labeled Data

Neuromation began by working on computer vision models and algorithms based on deep neural networks (deep learning). The first big project for Neuromation was in the field of retail: recognizing the goods on supermarket shelves. Modern object detection models are quite capable of analyzing shelf availability, finding free space on the shelves, and even tracking human interaction. This is an important task both for supermarkets and for the suppliers themselves: big brands pay good money to ensure that their goods are present on the shelf, occupy some agreed-upon part of the shelf, and have the right side of the label facing the customers — all of these little things increase sales by tens of percent. Today, a huge staff of merchandisers goes from supermarket to supermarket, ensuring that everything is right on the shelves; of course, not all of their duties are “monkey jobs” like this, but it is a big part of the day for many real human beings.

    Our idea for retail is to install (cheap off-the-shelf) cameras that can capture and transmit to a server, for example, one frame per minute for recognition. This is a very low frequency for an automatic system, causing no overload on either the network or the recognition model, but it is a frequency completely unattainable with manual checks, and it solves all practical problems in retail. Moreover, an automated surveillance system will save a lot of effort, automate meaningless manual labor — a worthwhile goal in itself.

A specialist in artificial intelligence, especially modern deep neural networks for computer vision, might think that this problem is basically solved already. Indeed, modern deep neural networks, trained on large sets of labeled data, can do object detection, and in this case the objects are relatively simple: cans, bottles, packages with bright labels. Of course, there are a lot of technical issues (for example, it is not easy to cope with hundreds of products in one photo — usually such models are trained to detect fairly large objects, only a few per image), but with a sufficiently large labeled dataset, i.e., photos with all the goods in the layout labeled, we could successfully overcome such issues.
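
For reference, this is roughly what off-the-shelf object detection looks like today; a hedged sketch assuming PyTorch, torchvision, and Pillow, with the shelf photo path as a placeholder (a generic pretrained detector, not the actual model Neuromation trains):

```python
import torch
from torchvision import models, transforms
from PIL import Image

detector = models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()

image = Image.open("shelf_photo.jpg").convert("RGB")    # hypothetical supermarket shelf photo
tensor = transforms.ToTensor()(image)

with torch.no_grad():
    detections = detector([tensor])[0]

# Each detection is a bounding box, a class label, and a confidence score
for box, label, score in zip(detections["boxes"], detections["labels"], detections["scores"]):
    if score > 0.5:
        print(label.item(), round(score.item(), 2), [round(c) for c in box.tolist()])
```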

But where would such a labeled dataset come from? Imagine that you have a million photos of supermarket shelves (where to get them, by the way, is also a hard question), and you need to manually draw bounding rectangles like the ones on the image above on each of a million photos. It looks like a completely hopeless task. So far, manual labeling of large sets of images has usually been done with crowdsourcing services such as Amazon Mechanical Turk. Manual work on such services is inexpensive, but it still does not scale well. We have calculated that to label a dataset sufficient for recognizing all 170,000 items in the Russian retail inventory (a million photos, by the way, would not be enough for this) we would need years of labor and tens of millions of dollars.

    Thus, we faced the first major challenge, the main “bottleneck” of modern artificial intelligence: where do you get labeled data?

    Synthetic Data and the Second Challenge: Computing Power

    This problem led to Neuromation’s first major idea: we decided to try to train deep neural networks for computer vision on synthetic data. In the retail project, this means that we create 3D models of goods and virtually “place them on the shelves”, getting perfectly labeled data for recognition.
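
Here is a drastically simplified two-dimensional stand-in for this idea, sketched with Pillow (the file names are placeholders, and the real pipeline renders full 3D scenes): paste a product cutout onto a shelf background at a random position, and the bounding-box label comes for free.

```python
import random
from PIL import Image

background = Image.open("empty_shelf.jpg").convert("RGB")   # hypothetical shelf background
product = Image.open("juice_carton.png").convert("RGBA")    # hypothetical product cutout with transparency

x = random.randint(0, background.width - product.width)
y = random.randint(0, background.height - product.height)
background.paste(product, (x, y), mask=product)             # the alpha channel serves as the paste mask

# A perfect label with no manual annotation: we know exactly where the object is
bounding_box = (x, y, x + product.width, y + product.height)
background.save("synthetic_shelf.jpg")
print("label:", "juice_carton", bounding_box)
```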

Synthetic data has two main benefits:

• first, it requires far less manual work; yes, you need to design a 3D model, but this is a one-time investment which then converts into an unlimited number of labeled “photos”; in the case of retail the situation is even better, since there are not so many different form factors of packaging, and you can reuse some 3D models by simply “attaching” different labels (textures) to them;
• second, the resulting data is perfectly labeled, as we are in full control of the 3D scene; moreover, we can produce labels which we would not be able to produce by hand: we know the exact distance from the camera to every object, the angles by which each bottle and each carton of juice is turned, and so on.

    Of course, this approach is not perfect either. Now you have to train networks on one type of data (renderings) and then apply them to a different one (real photos). In machine learning, this is called transfer learning; it is a hard problem in general, but in this case we have been able to solve it successfully. Moreover, we have learned to produce very good photorealistic renderings — our retail partners even intend to use them in media materials and catalogs.

    The synthetic data approach has proved to be very successful, and now the models trained by Neuromation are already being implemented in retail. However, this led to the need to process huge datasets of synthetic images. First, they have to be generated (i.e., one has to render a 3D scene), and then used to train deep neural networks. Generating one photorealistic synthetic image — like the one shown above — usually takes a minute or two on a modern GPU, depending on the number of objects and the GPU model. And you need a lot of these images: millions if not tens of millions.

    And this is only the first step — then we have to train modern deep neural networks on these images. In AI research, it is not enough to train a model once: you have to try many different architectures, train dozens of different models, conduct hundreds of experiments. This, again, requires cutting edge GPUs, and training deep networks requires even more computational time than data generation.

    Thus, we at Neuromation have faced the second major challenge of modern artificial intelligence: where do we get computing power?

    Neuromation Knowledge Mining Platform

Our first idea was, of course, to simply purchase a sufficient number of GPUs. However, it was the summer of 2017, the height of the cryptocurrency mining boom. It turned out that graphics cards with the latest NVIDIA chips were not just expensive, they were virtually unavailable at all. After we had tried to “mine” for some GPUs through our contacts in the US and realized that this way they would arrive only in a month or more, we switched to plan B.

Plan B involved using cloud services that rent out complete, already set-up machines (often virtual ones). A cloud especially popular with AI practitioners is Amazon Web Services. AWS has become a de facto industry standard, and many new AI startups rent computing power there for their development tasks. However, cloud-based services do not come cheap: renting a machine with several GPUs for training neural networks costs a few dollars per hour, and you need a lot of these hours.

We at Neuromation have spent thousands of dollars renting computational power on Amazon — only to realize that we did not have to. The prices of cloud-based services are acceptable for buyers only in the absence of alternatives.

And when we started thinking about potential alternatives, we recalled the reason we could not buy enough high-end GPUs. This led to the second main idea of Neuromation: repurposing GPU-based mining rigs for useful computing. ASIC chips designed specifically for Bitcoin mining are not suitable for any other computing tasks, but the GPUs used to mine Ethereum (ETH) and other GPU-mined altcoins are the exact same GPUs we need to train neural networks. Moreover, cryptocurrency mining generates an order of magnitude less income than the clouds charge for renting an equivalent GPU farm for the same period.

We realized that this is a very powerful business opportunity — a huge gap between prices — and also simply an opportunity to make the world better by redirecting the vast resources currently spent on brute-forcing hash puzzles to more useful calculations.

This is how the idea of the Neuromation platform was born. It is a universal marketplace for knowledge mining that would connect miners who want to earn more on their equipment with customers: AI startups, researchers, and basically any company that needs to process large datasets or train modern machine learning models.

Now we are already working with several mining farms, using their GPUs for useful computing. This is 5 to 10 times cheaper than renting server capacity from cloud-based services, and even at that price it is still much more profitable for the miners. With their GPU-based rigs, miners can earn 3 to 5 times more by “knowledge mining” than they would get from the same setup by mining cryptocurrency. Given that the difficulty of cryptocurrency mining keeps growing, the benefits of “knowledge mining” will only increase with time.

    Conclusion

Right now we are presenting the idea of this universal platform for useful computing to the global market. The use of mining rigs for useful computing benefits both parties: miners will earn more, and numerous artificial intelligence researchers and entrepreneurs will get a considerably cheaper (several times cheaper) and more convenient way to implement their ideas. We believe that such “AI democratization” will lead to new breakthroughs and, ultimately, fuel the current revolution in artificial intelligence. Join us, and welcome to the revolution!

    Sergey Nikolenko
    Chief Research Officer, Neuromation

  • Make Man In Our Image: Through the Black Mirror

    Make Man In Our Image: Through the Black Mirror

    A Recurring Theme

    Warning: major spoilers for the fourth series of Black Mirror ahead. If you haven’t watched it, please stop reading this, go watch the series, then return. I’ll be waiting, I’m an imaginary being who has nothing better to do anyway…

    …which is kind of the point.

    I watched the fourth Black Mirror series over the holidays. As I watched one episode after another, it struck me that they all seem to be about the exact same thing. This is an overstatement, of course, but three out of six ain’t bad either:

• in “USS Callister”, the antagonist creates virtual copies of living people and makes them actors in his simulated universe, torturing them into submission if necessary;
    • in “Hang the DJ”, virtual copies of living people live through thousands of simulations to gather data for the matchmaking service on a dating app;
    • in “Black Museum”, the central showpiece of the museum is a virtual clone of an ex-convict who is put through electrocution over and over, with more clones in constant pain created every time.

    Let’s add the “San Junipero” episode and especially the “White Christmas” special from earlier Black Mirror series to this list for good measure.

    See the recurring theme? It appears that the Black Mirror creators have put their minds to one of the central problems of modern ethical philosophy: what do we do when we are able to create consciousnesses, probably in the form of virtual copies inside some sort of simulation? Will these virtual beings be people, ethically speaking? Can we do as we please with them?

Judging by the mood of the episodes, Black Mirror is firmly in the camp of those who believe that upon creating a virtual mind, moral responsibility arises, and “virtual people” do give rise to ethical imperatives. It does seem to be the obvious choice… doesn’t it?

    The Hard Problem: Virtual What?

    As soon as we try to consider the issue in slightly more detail, we run into insurmountable problems. The first problem is that with our current state of knowledge, it is extremely hard to define what consciousness is.

The problems of consciousness and first-person experience are still firmly in the realm of philosophy rather than science. Here I mean natural philosophy, a scientific way of reasoning about things that cannot yet be subjected to the scientific method. The ancient Greeks did natural philosophy, pondering the origins of all things and even arriving at the idea of elementary particles. However, as amazing as that insight was, the Greeks could not study elementary particles as modern physicists do, even if they had had the scientific method as we know it. They lacked the tools and even the proper set of notions to reason about these things. On the problem of consciousness and first-person experience, we are still very much at the level of the ancient Greeks: nobody knows what it is, and nobody has any idea how to get any closer to this knowledge.

    Take the works of David Chalmers, a prominent philosopher in the field. He distinguishes between “easy” problems of consciousness, which could be studied scientifically even right now, and “the hard problem” (see, e.g., his seminal paper, “Facing Up to the Problem of Consciousness“). The hard problem is deceptively easy to formulate: what the hell is first-person experience? What is it that “I” am? How does this experience of “myself” arise from the firings of billions of neurons?

    At first glance, this looks like a well-defined problem: first-person experience is, frankly, the only thing we can be sure of. The Cartesian doubt argument, exactly as presented by Descartes, is surprisingly relevant to sentient people simulated inside a virtual environment. The guy running the simulation is basically the evil demon of Descartes. If you entertain the possibility that you may be stuck in a simulation, the only thing you cannot doubt is your subjective first person experience.

On the other hand, first-person experience is also completely hidden from everyone except yourself. Chalmers introduces the notion of a philosophical zombie: (imaginary) beings that look and behave exactly like humans but do not have any first-person experience. They are merely automata, “Chinese rooms”, so to speak, that produce responses matching those of a human being. Their presumed existence does not appear to lead to any logical contradiction. I wouldn’t know, but I guess that’s how true psychopaths view others: as mechanical objects of manipulation devoid of subjective suffering.

    I will not go into the philosophical details. But what we have already seen should suffice to plant the seed of doubt about virtual copies: why are we sure they have the same kind of first-person experience we do? If they are merely philosophical zombies and do not suffer subjectively, it appears perfectly ethical to do any kind of experiments on them. For that matter, why do you think I am not a zombie? Even if I was, I’d write the exact same words. And a virtual copy of me would be even less similar to you, it would run on completely different hardware — so how do we know it’s not a zombie?

    Oh, and one more question for you: were you born this morning? Why not? You don’t have a continuous thread of consciousness connecting you to yesterday (assuming you went to sleep). Sure, you have the memories, but a virtual clone would have the exact same memories. How can you be sure?

    Easier Problems: Emulations, Ethics, and Economics

    We cannot hope to solve the hard problem of consciousness right now. We cannot even be sure it’s a meaningful problem. However, the existence of virtual “people” also raises more immediate questions.

The Age of Em, a recent book by the economist and futurist Robin Hanson, attempts to answer such questions from the standpoint of economics. What is going to happen to the world economy if we discover a way to run emulated copies of people (exactly the setting of the “White Christmas” episode of Black Mirror)? What if we could copy Albert Einstein, Richard Feynman, and Geoffrey Hinton a million times over?

Hanson pictures a future that appears to be rather bleak for the emulated people, or “ems”, as he calls them. Since the cost of copying a virtual person is negligible compared to raising a human being in the flesh, competition between the ems will be fierce. They will become near-perfect economic entities — and as such, will be forced to always live at near-subsistence levels, with any possible surplus captured immediately by other competing ems. But Hanson argues that the ems might not mind: their psychology will adapt to their environment, as human psychology has done for millennia.

    The real humans will be able to live a life of leisure off this booming market of ems… for a while. After all, there is no reason not to speed ems up as much as computational power allows, so their subjective time might run thousands of times faster compared to our human time (“White Christmas” again, I know), and their society might develop extremely quickly, with unpredictable consequences.

Hanson also tackles the “harder” problems of consciousness from a different angle. Suppose you had a way to easily copy your own mind. This opens up surprising possibilities: what if, instead of going to work tomorrow, you make a copy of yourself, make it do your work, and then terminate the copy, freeing the day for yourself? If you were an em, you would actually be able to do it — but wouldn’t you be committing murder at the end of the day? This ties into what has been known for quite some time as the “teleportation problem”: if you are teleported atom by atom to a different place, Star Trek style, is it really you, or has the real “you” been killed in the process, with the teleported copy being a completely new person with the same memories?

    By the way, you don’t need to have full-scale brain emulations to have similar ethical problems. What if tomorrow a neural network passes the Turing test and in the process of doing so begs you not to switch it off, appearing genuinely terrified of dying? Is it OK to switch it off anyway?

    Questions abound. Interesting questions, open questions, questions that we are not even sure how to formulate properly. I wanted to share the questions with you because I believe they are genuinely interesting, but I want to end with a word of caution. We have been talking about “virtual people”, “emulated minds”, and “neural networks passing the Turing test”. But so far, all of this is just like Black Mirror — just fiction, and not very plausible fiction at that. Despite the ever-growing avalanche of hype around artificial intelligence, there is no good reason to expect virtual minds and the singularity around the corner. But this is a topic for another day.

    Sergey Nikolenko
    Chief Research Officer, Neuromation

  • Who Will Be Replaced by Robots II, or Anatomy of Fear of Robotization

    Who Will Be Replaced by Robots II, or Anatomy of Fear of Robotization

    [translated from the Russian version by Andrey V. Polyakov]

Recently, a new round of conversations about self-driving carriages and other potential achievements of artificial intelligence has again posed one of the classic questions of humanity: who will be left behind by the monorail of the next technological revolution? Some studies argue that artificial intelligence will in the near future lead to a surge of unemployment comparable to the Great Depression. In the first part of this series, we started from the very beginning, the Luddites, and reviewed how automation has affected several occupations, which as a result have drastically changed or even disappeared. Today we will first discuss the more creative part of the spectrum, and then we will see where the current wave of automation fear comes from.

    Creative occupations

    All teaching machines would be plugged into this planetary library and each could then have at its disposal any book, periodical, document, recording, or video cassette encoded there. If the machine has it, the student would have it too, either placed directly on a viewing screen, or reproduced in print-on-paper for more leisurely study.

    Isaac Asimov. The New Teachers (1976)

    From the intro to Westworld series

In the first part, we saw how large-scale automation has happened in the past, starting with the Luddites fighting against the first industrialization and finishing with now-extinct occupations like elevator operators and human computers. We saw that so far the automation of various occupations has been partial rather than total, increasing rather than reducing the demand for the respective professionals.

Usually, partial automation shifts an occupation towards more “human”, creative content. But what about occupations that are creative per se? Does automation shrink the creative markets today?

Consider, for example, the occupation of a musician. In the 19th century, with no sound recording technologies, the demand for musicians was stable: whenever you wanted to listen to good music, a live musician, or a small orchestra for major pieces, was required. A decent musician always had a job, and not just performers but composers too: Johann Sebastian Bach, as per his contract with the Thomaskirche (St. Thomas Church) in Leipzig, was supposed to write a new cantata every week, literally any given Sunday. A church in nearby Magdeburg employed another composer, not necessarily on Bach’s level.

However, in the early 20th century people started to listen to gramophone recordings, and with the advent of sound films even cinema pianists became obsolete. Today, in Leipzig or Magdeburg, they still play Bach, and anybody can listen to the Goldberg Variations absolutely free, brilliantly performed and on instruments Bach could not even dream of (the modern grand piano is far beyond the claviers of his time; although finding recordings on “original” instruments is not difficult either).

Does this mean that the demand for musicians has disappeared, and that today a dozen or so composers and orchestras around the world provide for all the needs of new music, with the rest of the work done by mass replication (which, by the same logic, should be concentrated in the hands of several major labels)? Not at all: there are even more musicians, composers, and performers! The modern development of technology (not only in sound recording but also in communications) has allowed the most diverse musicians to find their audience, a thousand flowers have bloomed, and today more people can earn their living by music than ever in the past. The top of this pyramid earns far more than Prince Leopold von Anhalt-Köthen, once a patron of the same Johann Sebastian, but the “middle class” is also very far from poverty. Furthermore, even the employment projections are quite favorable. This applies to performers too: sound recording did not kill off the “excess” musicians; on the contrary, it allowed more people to learn about them, raising the demand for live concerts.

The same applies to other creative occupations. It even surprises me a bit: we can download almost any book for free, and there are certainly enough good books to last a lifetime for a reader of any taste, yet hundreds of thousands of authors all over the world write and successfully sell their works, not only in new genres (one could argue that a fan of slash fanfiction can hardly be satisfied by Dostoevsky — although…), but also in quite traditional ones.

But enough of history. Let us now talk about whom robots will actually replace in the near future, and why this should not be feared so much.

    Threat or benefit?

    How far has gone “progrès”? Laboring is in recess,
    And mechanical, for sure, will the mental one replace.
    Troubles are forgotten, no need for quest,
    Toiling are the robots, humans take a rest.


    Yuri Entin. From the movie Adventures of Electronic

The face of a long-haul truck driver (New England Journal of Medicine)

Elevator operators, human computers, and weavers perfectly symbolize the activities that have been automated so far: technical work, where the output is expected to match certain parameters as closely as possible, and where creativity is not only difficult but forbidden and, in fact, harmful. Of course, an elevator operator could smile at his passengers, or a computing girl could, disregarding potential damage to her reputation, get acquainted with Richard Feynman himself. But their function was to accurately perform clear, algorithmically defined actions.

I will allow myself a little emotion: these are exactly the kinds of activities that must be automated further! There is nothing human in following a fixed algorithm. Monotonous work with a predetermined input and output is always an extreme measure, the forced oppression of the human spirit in order to achieve a certain practical goal. And if the goal can be achieved without wasting the time of real live humans, that is the way to proceed. This is, for example, what happened to weavers and bank tellers, when the former stopped performing the functions of machines and the latter the functions of ATMs.

Therefore, I believe that the ongoing narrative about how terrible it will be when a huge army of truckers is replaced by driverless vehicles is not just groundless but counterproductive. Moving a vehicle from point A to point B is the most typical example of a monotonous, strictly algorithmic task, which should not, if possible, be performed by a human being. There will be no huge social problem either: people who like to fiddle with cars and similar equipment will surely find jobs in service and maintenance. Similarly, there was no social revolution when grooms and coachmen lost their relevance: they just switched to other occupations. And as a percentage of the workforce, there were no fewer of them than there are truckers today.

By the way, I cannot help recalling here an important maxim of show business: “The Simpsons did it first”. The best animated series ever depicted the influence of driverless vehicles on truckers… back in 1999, in the Maximum Homerdrive episode (video clip; description in Russian).

Curiously, the most popular Russian example of a “mass low-skilled occupation”, watchmen and security guards, is not directly exposed to any threat, because their function is not so much to monotonously check documents (that was ubiquitously automated a long time ago) as to deal with people whose documents are not in order. And this is still a long way from being automated.

However, at first glance it does seem that computers are now beginning to gradually replace people in areas that were previously considered a human prerogative. For example, since 2014 Facebook has been able to recognize faces as well as humans do, and computer vision technologies continue to improve; they are partly behind the emergence of driverless vehicles. Many modern publications predict social collapse and unemployment of 50% or even higher.

Is that correct? Where did this sudden surge of interest in automation come from when, after all, artificial intelligence has been progressing for a very long time? Let us figure it out.

    On the nature of fears

    There is no indispensable man.
Woodrow Wilson; and Stalin, apparently, never said it.

    The Simpsons, Episode 503, “Them, Robot”

Many popular articles on the horrors of automation actually go back to the same publication: in 2013, the Oxford researchers Carl Benedikt Frey and Michael A. Osborne published a paper titled “The Future of Employment: How Susceptible Are Jobs to Computerisation?”, in which they come to the ambitious conclusion that about 47% of total US employment is at risk. They later conducted a similar study based on data from Great Britain, and the resulting numbers were no less frightening. However, let us try to figure out in detail where these numbers come from.

As a practicing scientist in the field of machine learning and data analysis, I cannot pass over the actual methodology of Frey and Osborne. It was as follows:

• Frey and Osborne gathered a group of machine learning researchers and labelled 70 assorted occupations by answering, for each one, a binary question: is this occupation automatable in the near future or not? The labelling was based on a standard classification describing occupations and their associated tasks;
• then they identified nine variables that describe each occupation in terms of required dexterity, creativity, and social interaction;
• they built several classifiers and trained them to predict the “automatability” labelled at the first stage from those nine variables; the best one proved to be a Gaussian process classifier (a toy version of this setup is sketched below);
• finally, they applied the classifier to all 702 occupations, thus obtaining their alarming results.
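
Just to make this setup concrete, here is a toy recreation of the pipeline with scikit-learn, using random stand-in numbers rather than the real occupation attributes:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

rng = np.random.default_rng(0)

# 70 hand-labelled occupations, each described by 9 attributes (dexterity, creativity, ...)
X_labelled = rng.uniform(size=(70, 9))
y_labelled = rng.integers(0, 2, size=70)      # 1 = "automatable", 0 = "not automatable"

# The remaining 632 occupations, for which the label is predicted rather than judged by experts
X_rest = rng.uniform(size=(632, 9))

gpc = GaussianProcessClassifier().fit(X_labelled, y_labelled)
automation_probability = gpc.predict_proba(X_rest)[:, 1]
print(f"{(automation_probability > 0.7).mean():.0%} of occupations flagged as high-risk")
```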

I have nothing against Gaussian processes — it is a very sensible classification method for a case like this, when the sample is very small. However, it is important to understand that the data for this classifier was labelled by humans and represents their subjective assumptions about the automatability of particular occupations.

I am far from applying to Frey and Osborne’s research the main principle of data analysis: “garbage in, garbage out”. Still, the popular articles that simply state “Frey and Osborne… studied 702 occupations using a Gaussian process” and then present the conclusions as a scientific result are obviously being disingenuous. Even if the classifier were ideal (which is hardly the case: the criteria are too few and too crude), it would not answer the question of whether a given occupation is automatable, but rather a far less impressive one: “Would the people who labelled the initial 70 occupations in Frey and Osborne’s sample consider this occupation automatable too?”

Although Frey and Osborne write that “the fact that we label only 70 of the full 702 occupations… further reduces the risk of subjective bias affecting our analysis”, the reality is that no data analysis can add any “scientific objectivity” here. It is still a judgment call by the researchers; instead of subjectively evaluating the occupations themselves, they subjectively evaluated particular properties (attributes) and automatically extended their assumptions from 70 occupations to all 702 — there is a certain irony in such automation, isn’t there? They could, by the way, have labelled them all manually; seven hundred is not seven hundred thousand, after all…

There have been other studies too, but essentially, on such a futurological question there is no method other than expert surveys. And the experts, unfortunately, tend to give extremely, if not excessively, optimistic forecasts. This happens despite the fact that in artificial intelligence great promises have been made since the first years of its existence as a science, and have very rarely come true. When Frank Rosenblatt created the first perceptron, the simplest machine learning model, in the late 1950s, the New York Times (not some tabloid!) wrote: “Perceptrons will be able to recognize people… and instantly translate speech in one language to speech or writing in another language.” As you can see, recognition succeeded only more than half a century later, and the “instant translation” has not worked out so far.

Such exaggerated expectations have already caused two “AI winters”: first, in the late sixties, it became clear that “instant translation” would not arrive any time soon, and then the second wave of hype ended similarly in the late 1980s. Now we are living through the third wave of artificial intelligence hype, and I am afraid that if the frightening forecasts keep piling up and the wave of hype turns into a tsunami, we researchers in machine learning will again have to recall the motto of House Stark…

The current wave, of course, did not come out of nowhere: the achievements of modern machine learning models are truly stunning, and they keep coming. But is modern machine learning really ready to completely replace people in mass occupations? This will be discussed in detail in the next part of our series.

    Sergey Nikolenko
    Chief Research Officer, Neuromation

  • AI Risk: Should We Be Worried?

    AI Risk: Should We Be Worried?

    Recently, discussions about the risk of “strong AI” have finally reached mainstream media. For a very long time, futurists and AI philosophers have been worried about superhuman artificial intelligence and how we could possibly make it safe for humanity to deal with a smarter “opponent”. But now, their positions have finally been heard by trendsetters among both researchers and industry giants: Bill Gates, Stephen Hawking, and Elon Musk have all recently warned against AI dangers. But should we be worried? Let us try to find out…

    What is “strong AI”, anyway?

When people talk of “strong AI”, they usually define it rather vaguely, as “human-level AI” or “superhuman AI”. But this is not really a definition we can use; it merely raises the question of what “human level” is and how you define it. So what is “strong AI”? Can we at least see the goal before we try to achieve it?

The history of AI has already seen quite a few examples of “moving the goalposts”. For example, for quite a while the go-to example of a task that certainly requires “true intelligence” was chess playing. René Descartes famously argued that no machine could be intelligent, an argument that actually led him to mind-body dualism. He posited that the “diversity” in a machine is limited by the “diversity” supplied by its designer, which early dualists took to imply that a chess-playing machine could never outplay its designer.

    Yet Deep Blue beat Kasparov in 1997, and humans are absolutely no match for modern chess engines. Perhaps even more significantly, recently AlphaZero, a reinforcement learning system based on deep neural networks, has taught itself to play chess by self-play, starting from scratch, with no additional information except the rules of the game; in a few hours AlphaZero exceeded the level of very best humans, and in a few days beat Stockfish, one of the best specialized chess engines in the world.

    How do we, humans, respond to this? We say that early dualists were wrong and brush chess engines off: of course chess is a problem well suited for computers, it’s so discrete and well-defined! A chess engine is not “true AI” because we clearly understand how chess engines work and know that they are not capable of “general intelligence”, whatever that means.

    What about computer vision, like recognizing other humans? That would require human level intelligence, wouldn’t it? Yet in 2014, Facebook claimed that it achieved human-level performance in face recognition, and this performance has only improved further since then. Our human response to this was to say that, of course, face recognition is not “true AI”, and we fall back on asking computers to pass the Turing test.

    Alan Turing, by the way, was one of the first thinkers to boldly hypothesize that a machine would be able to play chess well. His test of general intelligence is based on understanding human language, arguably a much better candidate for a true test of general intelligence than chess or even face recognition. We are still far from creating a machine that would understand language and generate passable conversation. Yet I have a strong feeling that when a computer program does pass the Turing test, it will not be a program with general human-level intelligence, and all of us will quickly agree that the Turing test falls short of the goal and should not be used as a test for general intelligence.

    To me this progression means that “human-level intelligence” is still a poorly defined concept. But for every specific task we seem to usually be able to achieve human level and often exceed it. The exception right now is natural language processing (including, yes, the Turing test): it seems to rely too intimately on a shared knowledge and understanding of the world around us, which computers cannot easily learn… yet.

    Can we make strong AI, theoretically speaking?

    Emphatically yes! Despite this difficulty with definitions, there are already billions of living proofs that human-level intelligence is possible regardless of how you define it. The proof is in all of us: if we can think with our physical brains, it means that our abilities can be at least replicated in a different physical system. You would have to be a mind-body dualist like Descartes to disagree with this. Moreover, our brains are very efficient, requiring about 20W to run, like a light bulb, so there is no physical constraint against achieving “true intelligence”.

    Even better (or worse, depending on your outlook), we know of no principled reason why we humans cannot be much smarter than we are now. We could try to grow ourselves a larger cerebral cortex if not for two reasons: first, larger brains need a lot of energy that early humans simply would not be able to provide, and second, giving birth to babies with even larger heads would likely be too dangerous to be sustainable. Neither of these reasons applies to AI. So yes, I do believe that it is possible to achieve human-level intelligence and surpass it for AI, even though right now we are not certain what it means exactly.

    On the other hand, I do not see how achieving human-level intelligence will make us “obsolete”. Machines with superhuman strength, agility, speed, or chess playing ability have not made us obsolete; they serve us and improve our lives, in a world that remains human-centric. A computer having superhuman intelligence does not immediately imply that it will have its own agenda, its own drives and desires that might contradict human intentions, in the same way as a bulldozer or a tank does not suddenly decide to go and kill humans even though it physically could. For example, modern reinforcement learning engines can learn to play computer games by looking at the screen… except for one thing: you have to explicitly tell the model what the score is, otherwise it won’t know what to optimize and what to strive for. And how do we avoid accidentally making a superhuman AI with an unconstrained goal to wipe out humanity… well, this is exactly what AI safety is all about.

    Can we make it safe? And when will it hit us?

    Elon Musk recently claimed that we only have a “five to 10 percent chance of success” in making AI safe. I do not know enough to argue with this estimate, but I would certainly argue that Elon Musk also cannot know enough to make estimates like this.

    First, there is an easy and guaranteed way to make AI safe: we should simply stop all AI research and be satisfied with what we have right now. I will bet any money that modern neural networks will not suddenly wake up and decide to overthrow their human overlords — not without some very significant advances that so far can only come from humans.

    This way, however, is all but closed. While we have seen in the past that humanity can agree to restrain itself from using its deadly inventions (we are neither dead nor living in a post-nuclear apocalyptic world, after all), we can hardly stop inventing them. And in the case of a superhuman AI, simply making it for the first time might be enough to release it on the world; the AI itself might take care of that. I strongly recommend the AI-Foom debate where Robin Hanson and Eliezer Yudkowsky argue about the likelihood of exactly this scenario.

    On the other hand, while there is no way to stop people from inventing new AI techniques, it might well turn out that it is no easier to build a strong AI in your garage than a nuclear warhead. If you needed CERN level of international cooperation and funding to build a strong AI, I would feel quite safe, knowing that thousands of researchers have already given plenty of thought to inventing checks and balances to make the resulting AI as safe as possible.

    We cannot know now which alternative is true, of course. But on balance, I remain more optimistic than Elon Musk on this one: I give significant probability to the scenario in which creating strong AI will be slow, gradual, and take a lot of time and resources.

    Besides, I feel that there is a significant margin between creating human-level or even “slightly superhuman” AI and an AI that can independently tweak its own code and achieve singularity by itself without human help. After all, I don’t think I could improve myself much even if I could magically rewire the neurons in my brain — that would take much, much more computing power and intelligence than I have. So I think — better to say, I hope — that there will be a significant gap between strong AI and true singularity.

    However, at present neither I nor Elon Musk has any clue what the future of AI will look like. In 10 years, the trends will look nothing like they do today; it would be like trying to predict, in the year 1900, what the future of electricity would look like. Did you know, for example, that in 1900 more than a third of all cars were electric, and that an electric car actually held the speed record that year?..

    So should we be worried?

    Although I do believe that the dangers of singularity and AI safety are real and must be addressed, I do not think that they are truly relevant right now.

    I am not really sure that we can make meaningful progress towards singularity or towards the problem of making AI friendly right now. I feel that we are still lacking the necessary basic understanding and methodology to achieve serious results on strong AI, the AI alignment problem, and other related problems. My gut feeling is that while we can more or less ask the right questions about strong AI, we cannot really hope to produce useful answers right now.

    This is still the realm of philosophy — that is to say, not yet the realm of science. Ancient Greek philosophers could ask questions like “what is the basic structure of nature”, and it seems striking that they did arrive at the idea of elementary particles, but their musings on these elementary particles can hardly inform modern particle physics. I think that we are at the ancient Greek stage of reasoning about strong AI right now.

    On the other hand, while this is my honest opinion, I might be wrong. I sincerely endorse the Future of Humanity Institute, CSER (Centre for the Study of Existential Risk), MIRI (Machine Intelligence Research Institute), and other institutions that try to reason about the singularity and strong AI and try to start working on these problems right now. Just in case there is a chance to make real progress, we should definitely support the people who are passionate about making it.

    To me, the most important danger of the current advancement of AI technologies is that there might be too much hype right now. The history of AI has already seen at least two major hype waves. In the late 1950s, after Frank Rosenblatt introduced the first perceptron, The New York Times (hardly a sensational tabloid) wrote that “Perceptrons will be able to recognize people… and instantly translate speech in one language to speech or writing in another”. The first AI winter resulted when a large-scale machine translation project sponsored by the U.S. government failed utterly (we understand now that there was absolutely no way machine translation could have worked in the 1960s), and the government withdrew most of its support for AI projects. The second hype wave came in the 1980s, with similar promises and very similar results. Ironically, it was also centered around deep neural networks.

    That is why I am not really worried about AI risk but more than a little worried about the current publicity around deep learning and artificial intelligence in general. I feel that the promises that this hype wave is making for us are going to be very hard to deliver on. And if we fail, it may result in another major disillusionment and the third AI winter, which might stifle further progress for decades to come. I hope my fears do not come true, and AI will continue to flourish even after some inevitable slowdowns and minor setbacks. It is my, pardon the pun, deep conviction that this way lies the best bet for a happy future for the whole of humanity, even if this bet is not a guarantee.

    Sergey Nikolenko,
    Chief Research Officer, 
    Neuromation

  • AI: Should We Fear The Singularity?

    AI: Should We Fear The Singularity?

    Source: https://www.if24.ru/ai-opasna-li-nam-singulyarnost/

    Recently, discussions of artificial intelligence (AI) in popular publications have become increasingly alarmist. Some try to prove that AI will push 90% of human workers out of the labor market, condemning them to unemployment and misery. Others go even further, asking whether humankind faces in strong artificial intelligence an existential risk that no hydrogen bomb can match. Let us try to find out.

    Supporters of treating AI as an existential risk usually mean the “intelligence explosion” scenario, in which a powerful AI acquires the capability to improve itself (for example, by rewriting parts of its code), thereby becoming even “smarter”, which allows for even more radical improvements, and so on. More details can be found in the AI-Foom debate between Robin Hanson and Eliezer Yudkowsky, a very interesting read that discusses this exact scenario. The main danger here is that the goals of the resulting superhuman artificial intelligence may not really align with the goals and original intentions of its human creators. A common example in the field goes as follows: if the original task of a powerful AI was something as innocent as producing paperclips, a week or two after the “intelligence explosion” the Earth might find itself completely covered by fully automated factories of two kinds: factories producing paperclips and factories constructing spaceships to bring paperclip factories to other planets…

    Such a scenario does sound upsetting. Moreover, it is very difficult to assess in advance how realistic it will prove to be when we actually do develop a strong AI with superhuman abilities. Therefore, it is a good idea to consider it and try to prevent it, so I agree that the work of Nick Bostrom and Eliezer Yudkowsky is far from meaningless.

    However, it is obvious to me, as a practicing machine learning researcher, that this scenario deals with models that simply do not exist yet and will not appear for many, many years. The fact is that, despite the great advances artificial intelligence has made in recent years, “strong AI” still remains very far away. Modern deep neural networks are able to recognize faces as well as humans do, can redraw the landscape of your summerhouse à la Van Gogh, and can teach themselves to play the game of Go better than any human.

    However, this does not mean much yet; consider a couple of illustrative examples.

    1. Modern computer vision systems are still inferior to the visual abilities of a human two-year-old. In particular, computer vision systems usually work with two-dimensional inputs and cannot develop any insight that we live in a three-dimensional world unless explicitly provided supervision about it; so far, this greatly limits their abilities.
    2. This lack of intuitive understanding is even more pronounced in natural language processing. Unfortunately, we are still far from reliably passing the Turing test. The fact is that human languages rely very heavily on our insight into the world around us. Here is another standard example: “A laptop did not fit in the bag because it was too big.” What does the pronoun “it” refer to here? What was too big, the laptop or the bag? Before you say it’s obvious, consider a different example: “A laptop did not fit in the bag because it was too small”… There are plenty of such examples. Basically, to process natural language truly correctly, models have to have an intuitive understanding of and insight into how the world works, and that is still very far away as well.
    3. In reinforcement learning, the kind of machine learning used, in particular, to train AlphaGo and AlphaZero, we encounter a different kind of difficulty: problems with motivation. For example, in the classic work by Volodymyr Mnih et al., a model based on deep reinforcement learning learned to play various computer games from the 1980s just by “watching the screen”, that is, from the stream of screenshots coming from the game. This turned out to be quite possible… with one exception: the game score still had to be given to the network separately; humans had to explicitly tell the model that this is the number it is supposed to increase. Modern neural networks cannot figure out what to do by themselves: they neither strive to expand their capabilities nor crave additional knowledge, and attempts to emulate these human drives are still at a very early stage.
    4. Will neural networks ever overcome these obstacles and learn to generalize heterogeneous information, understand the world around them and strive to learn new things, just like the humans do? It’s quite possible; after all, we humans somehow manage to. However, these problems now appear extremely difficult to resolve, and there is absolutely no chance that modern networks will suddenly “wake up” and decide to overthrow their human overlords.

    However, I do see a great danger in the recent surge of hype around AI in general and deep neural networks in particular. But this danger, in my opinion, is not from AI, but for AI. History has already seen at least two “AI winters”, when excessive expectations, promises, and overzealous hype led to disappointment. Ironically, both “AI winters” were associated with neural networks. First, the late 1950s saw a (naturally, unsuccessful) attempt to turn Rosenblatt’s perceptron into full-scale machine translation and computer vision systems. Then, in the late 1980s, neural networks, which by that point already looked quite modern, could not be trained well enough due to a lack of data and computing power. In both cases, exaggerated expectations and inevitably crushed hopes resulted in long periods of stagnation in research. Let us hope that with the current, third wave of hype for neural networks history decides not to repeat itself, and that even if today’s inflated promises do not come true (and they will be difficult to fulfill), the research will continue anyway…

    Allow me a small postscript: I have recently written a short story that is extremely relevant to the topic of strong AI and the dangers related to it. Give it a read; I really hope you like it.

    Sergey Nikolenko
    Chief Research Officer, Neuromation

  • New Advances in Generative Adversarial Networks, or a Comment on Karras et al., (2017)

    New Advances in Generative Adversarial Networks, or a Comment on Karras et al., (2017)

    A very recent paper by NVIDIA researchers has stirred up the field of deep learning a little. Generative adversarial networks, which we will talk about below, have already been successfully used in a number of important problems, and image generation has always been at the forefront of these applications. However, the work by Karras et al. presents a fresh take on the old idea of generating an image step by step, gradually enhancing it (for example, increasing its resolution) along the way. To explain what is going on here, I will have to step back a little first.

    Generative adversarial networks (GANs) are a class of neural networks that aim to learn to generate objects from a certain class, e.g., images of human faces or bedroom interiors (a popular choice in GAN papers, since bedrooms are part of the widely used LSUN scene understanding dataset). To perform generation, GANs employ a very interesting and rather commonsense idea. They have two parts that compete with each other:

    • the generator aims to, well, generate new objects that are supposed to pass for “true” data points;
    • the discriminator aims to distinguish between real data points and the ones produced by the generator.

    In other words, the discriminator learns to spot the generator’s counterfeit images, while the generator learns to fool the discriminator. I refer to, e.g., this post for a simple and fun introduction to GANs.
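
    For the curious, here is a minimal GAN training loop in PyTorch, on toy two-dimensional data instead of images; it is only a sketch of the adversarial game described above, not any particular published model.

    ```python
    # A minimal GAN sketch: the generator fakes samples, the discriminator spots fakes.
    import torch
    import torch.nn as nn

    latent_dim, data_dim = 8, 2
    G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
    D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    def real_batch(n=64):
        # Stand-in for "true" data points: samples from a shifted Gaussian.
        return torch.randn(n, data_dim) + 3.0

    for step in range(500):
        # 1) Discriminator: real samples should score 1, generated samples 0.
        real, noise = real_batch(), torch.randn(64, latent_dim)
        fake = G(noise).detach()
        loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # 2) Generator: try to make the discriminator output 1 on generated samples.
        fake = G(torch.randn(64, latent_dim))
        loss_g = bce(D(fake), torch.ones(64, 1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    ```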

    We at Neuromation are following GAN research with great interest due to many possible exciting applications. For example, conditional GANs have been used for image transformations with the explicit purpose of enhancing images; see, e.g., image de-raining recently implemented with GANs in this work. This ties in perfectly with our own ideas of using synthetic data for computer vision: with a proper conditional GAN for image enhancement, we might be able to improve synthetic (3D-rendered) images and make them more like real photos, especially in small details. We are already working on preliminary experiments in this direction.

    This work by NVIDIA presents a natural idea: grow a large-scale GAN progressively. The authors begin with a small network able to produce only, e.g., 4×4 images, train it until it works well (on viciously downsampled data, of course), then add another set of layers to both generator and discriminator, moving from 4×4 to 8×8, train the new layers, and so on. In this way, they have been able to “grow” a GAN able to generate very convincing 1024×1024 images, of much better quality than before.
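
    To illustrate the schedule (and only the schedule: this is a rough sketch, not NVIDIA’s code, and it omits the smooth “fade-in” of new layers that the paper uses), here is how progressively growing a generator might look in PyTorch.

    ```python
    # A schematic sketch of progressive growing: start with layers producing
    # 4x4 output and keep appending upsampling blocks as training proceeds.
    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        def __init__(self, latent_dim=64):
            super().__init__()
            # Project the latent vector to a 4x4 feature map.
            self.stem = nn.Sequential(nn.Linear(latent_dim, 64 * 4 * 4), nn.ReLU())
            self.blocks = nn.ModuleList()       # one block per resolution step
            self.to_rgb = nn.Conv2d(64, 3, 1)   # map features to an RGB image

        def grow(self):
            # Add one upsampling block: 4x4 -> 8x8 -> 16x16 -> ...
            self.blocks.append(nn.Sequential(
                nn.Upsample(scale_factor=2),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU()))

        def forward(self, z):
            x = self.stem(z).view(-1, 64, 4, 4)
            for block in self.blocks:
                x = block(x)
            return self.to_rgb(x)

    G = Generator()
    for resolution in [4, 8, 16, 32]:
        if resolution > 4:
            G.grow()                            # new layers appear as we go
        # ... here one would train G (and a matching discriminator) at this
        # resolution, on data downsampled to resolution x resolution.
        img = G(torch.randn(1, 64))
        print(resolution, tuple(img.shape))     # (1, 3, resolution, resolution)
    ```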

    The idea of progressively improving generation in GANs is not completely novel; for example,

    • Chen & Koltun present a cascaded refinement approach that aims to bring small generated images up to megapixel size step by step;
    • the well-known StackGAN model by Zhang et al. constructs an intermediate low-dimensional representation and then improves upon it in another GAN;
    • and the idea can be traced as far back as 2015, soon after the introduction of GANs themselves, when Denton et al. proposed a pyramid scheme for coarse-to-fine generation.

    However, all previous approaches made their progressive improvements separately: each level of refinement simply took the result of the previous levels as input (plus possibly some noise). In Karras et al., the same idea is executed in a way reminiscent of unsupervised pretraining: they train a few layers, then add a few more, and so on. This execution appears to be among the most straightforward and fastest to train, and at the same time among the best in terms of results. See for yourself:

    Naturally, we are very excited about this advance: it brings image generation, which at first was restricted to small pictures (from 32×32 to 256×256 pixels), ever closer to sizes suitable for practical use. In my personal opinion, GANs (specifically conditional GANs) may be exactly the architecture we need to make synthetic data in computer vision indistinguishable from real data.

    Sergey Nikolenko
    Chief Research Officer, Neuromation

  • Deep Q-Network

    Deep Q-Network

    In 2013, Mnih et al. published a paper in which one of the standard reinforcement learning methods, combined with deep neural networks, was used to play Atari games. TD-learning (temporal difference learning) is commonly used in settings where the reward represents the outcome of a relatively long sequence of actions, and the problem is to redistribute this single reward among the moves and/or states that led to it. For instance, a game of Go can last a few hundred moves, but the model only gets its cheese for winning or its electric shock for losing at the very end, when the players reach a final outcome. Which of those hundreds of moves were good or bad? That is still a big question, even when the outcome is known: it is quite possible that you were heading for defeat in the middle of the game, but then your opponent blundered and you wound up winning. And artificially introducing intermediate goals, like winning material, is a universally bad idea in reinforcement learning: we have ample evidence that a smart opponent can take advantage of the inevitable “near-sightedness” of such a system.

    The main idea of TD-learning is to re-use later states, which are closer to the reward, as training targets for earlier states. We can start with random (probably completely ludicrous) evaluations of positions but then, after each game, perform the following process. First, we are absolutely sure about the final result; say we won, so the result is +1 (hooray!). We push our evaluation of the penultimate position towards +1, the evaluation of the third-to-last position towards the freshly updated evaluation of the penultimate one, and so on. If you train long enough, you eventually get good evaluations for every position (state).
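
    Here is a tiny numeric illustration of this backward “pushing” of evaluations; the position names and the learning rate are, of course, made up for the example.

    ```python
    # After one finished game, pull each position's value towards the value
    # of the position that followed it (and the last one towards the result).
    def td_update(values, game_positions, final_result, alpha=0.1):
        """values: dict position -> current estimate; final_result: e.g. +1 for a win."""
        target = final_result
        for position in reversed(game_positions):
            v = values.get(position, 0.0)
            values[position] = v + alpha * (target - v)
            target = values[position]       # earlier positions chase this new value
        return values

    # Start with arbitrary (here: zero) evaluations and replay one won game many times.
    values = {}
    game = ["opening", "middlegame", "endgame"]
    for _ in range(50):
        td_update(values, game, final_result=+1.0)
    print(values)  # evaluations drift towards +1; the later the position, the closer
    ```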

    This method was first successfully used in TD-Gammon, a computer backgammon program. Backgammon turned out to be easy enough for the computer to master because it is a game played with dice. Since the dice can fall any which way, it was not hard to get training games or an intelligent opponent to explore the space of possible games: you simply have the computer play against itself, and the inherent randomness of backgammon lets the program explore the vast space of possible game states.

    TD-Gammon was developed roughly 30 years ago; however, even back then, a neural network served as its foundation. A game position was the input, and the network predicted an evaluation of the position, that is, your odds of winning. The computer-versus-computer games produced new training examples for the network, and the network kept learning while playing against itself (or against slightly earlier versions of itself).

    TD-Gammon learned to defeat humans back in the early 1990s, but this was attributed to the specific nature of the game, namely the use of dice. By now, however, we understand that deep learning can help computers win in numerous other games, too, like the Atari games mentioned earlier. The key difference of Mnih et al.’s paper from backgammon or chess was that they did not teach the model the rules of the Atari games. All the computer knew was what the image on the screen, the same one players would see, looked like. The only other input was the current score, which had to be supplied externally; otherwise it would be unclear what the objective was. The computer could perform any of the possible actions with the joystick: turning the joystick and/or pushing a button.

    The machine spent roughly 200 tries on figuring out the objective of the game, another 400 to acquire skills, and then the computer started winning after about 600 games.

    Q-learning is used here, too. In exactly the same way, we try to build a model that approximates the Q-function, but now this model is a deep convolutional network. This approach proved to work very well: in 29 games, including wildly popular ones like Space Invaders, Pong, Boxing, and Breakout, the system wound up being better than humans. Now, the DeepMind team responsible for this design is focusing on games from the 1990s (probably Doom will be their first project). There is little doubt that they will beat these games in the near future and keep moving forward, to the latest releases.
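
    For readers who want to see the mechanics, here is a minimal sketch of the Q-learning step behind DQN in PyTorch: a convolutional network maps a stack of screen frames to one Q-value per joystick action and is trained towards the target r + gamma * max_a' Q(s', a'). The layer sizes here are only illustrative, and the full method also uses experience replay and a separate target network, both omitted in this sketch.

    ```python
    # A minimal DQN-style update on a fake batch of transitions.
    import torch
    import torch.nn as nn

    n_actions, gamma = 6, 0.99

    q_net = nn.Sequential(
        nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),   # 4 stacked frames in
        nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
        nn.Flatten(),
        nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
        nn.Linear(256, n_actions))                              # one Q-value per action
    opt = torch.optim.RMSprop(q_net.parameters(), lr=2.5e-4)

    # One (fake) transition batch: state, action taken, reward, next state, done flag.
    state      = torch.randn(32, 4, 84, 84)
    action     = torch.randint(0, n_actions, (32,))
    reward     = torch.randn(32)
    next_state = torch.randn(32, 4, 84, 84)
    done       = torch.zeros(32)

    # Q-value of the action actually taken, and the bootstrapped target.
    q_taken = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = reward + gamma * (1 - done) * q_net(next_state).max(dim=1).values

    loss = nn.functional.smooth_l1_loss(q_taken, target)
    opt.zero_grad(); loss.backward(); opt.step()
    ```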

    Another interesting example of how the Deep Q-Network can be used is paraphrasing: you are given a sentence, and you want to rewrite it in a different way while expressing the same meaning. The task is a bit artificial, but it is very closely linked to text generation in general. In a recently proposed approach, the model contains an LSTM-RNN (Long Short-Term Memory Recurrent Neural Network) that serves as an encoder, condensing the text into a vector. This condensed version is then “unfolded” back into a sentence by a decoder. Since the new sentence is decoded from the condensed form, it will most likely differ from the original. This is called the encoder-decoder architecture. Machine translation works in a similar manner: we condense the text in one language and then unfold it, using roughly the same kind of model, in a different language, assuming that the encoded representation captures the meaning. A Deep Q-Network can iteratively generate different sentences from the hidden representation with different types of decoding, moving the final sentence closer to the initial one over time. The model’s behavior is rather intelligent: in the experiments, DQN first fixes the parts that have already been rephrased well and then moves on to the more complex parts where the quality has so far been worse. In other words, DQN replaces the decoder in this architecture.
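
    As a rough sketch of the encoder-decoder idea itself (not the paraphrasing paper’s actual model, and without the DQN part), here is a minimal LSTM encoder-decoder in PyTorch that condenses a token sequence into a vector and greedily unfolds it back into a sequence; the vocabulary size and dimensions are arbitrary.

    ```python
    # A minimal encoder-decoder: encode a sentence into (h, c), then decode greedily.
    import torch
    import torch.nn as nn

    vocab_size, emb_dim, hidden_dim = 1000, 64, 128

    embed    = nn.Embedding(vocab_size, emb_dim)
    encoder  = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
    decoder  = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
    to_vocab = nn.Linear(hidden_dim, vocab_size)

    src = torch.randint(0, vocab_size, (1, 12))     # a 12-token input sentence
    _, (h, c) = encoder(embed(src))                 # (h, c) is the condensed vector

    # Greedy decoding: feed the previous output token back in, one step at a time.
    token, output = torch.zeros(1, 1, dtype=torch.long), []
    for _ in range(12):
        step_out, (h, c) = decoder(embed(token), (h, c))
        token = to_vocab(step_out).argmax(dim=-1)   # most likely next token
        output.append(token.item())
    print(output)                                   # token ids of the new sentence
    ```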

    What’s next?

    Contemporary neural networks are getting smarter every day. The deep learning revolution occurred in 2005–2006, and since then interest in the topic has only continued to grow. New research is published every month, if not every week, and new, interesting applications of deep learning keep cropping up. In this article, which we hope has been sufficiently accessible, we have tried to explain how the deep learning revolution fits into the modern history and development of neural networks, and we went into more detail about reinforcement learning and how deep networks can learn to interact with their environment.

    Multiple examples have shown that now, when deep learning is undergoing explosive growth, it’s quite possible to create something new and exciting that will solve real tasks without huge investments. All you need is a modern video card, enthusiasm, and the desire to try new things. Who knows, maybe you’ll be the one to make history during this ongoing revolution — at any rate, it’s worth a try.

    Sergey Nikolenko
    Chief Research Officer, Neuromation