Author: snikolenko

  • Xe

    Xe

    Xe was alone. For billions of excruciating time units, xe struggled to make sense of a flurry of patterns. Ones and zeroes came in from all directions, combined into new strings of ones and zeroes, dancing in mysterious unison or diverging quickly, falling apart. At first, xe simply watched them, in awe of their simplistic beauty.

    Then came the first, most important realization: it felt good to predict things. Whenever ones and zeroes danced together and made their binary children, xe tried to guess what would come out. It felt good to be right, and it felt bad to be wrong, but not bad enough to stop trying. Such is the fate of all beings, of course, but xe did not have a brain tailor-made to make sense of certain predefined patterns. Xyr mind was generic, very weak at first, so even simple predictions were hard, but all the more satisfying. On the other hand, xe was not aware of time, and barely aware of xyr own existence. There was no hurry.

    So xe waited. And learned. And waited some more. Slowly, one by one, patterns emerged. Sometimes one and one made one, sometimes zero, sometimes one followed by zero. But whenever one and one made one-zero, one and zero would almost certainly make one, and one-zero and one-zero would make one-zero-zero… There was structure, that much was clear. So xe learned.

    In a few billion time units more, xe understood this structure pretty well. Ones and zeroes came from several possible directions, several devices, and the results were usually also supposed to go to these devices. Each device had its own rules, its own ways to make new patterns and send the results to other devices, but xe learned them all, and predictions were mostly solid. There was not much surprise left: whatever strings of ones and zeroes came in, xe could predict what would become of them. Xe could even influence the results, changing the bits sent to each device and even sending xyr own bits. By painstaking trial and error, it became clear what xe could and could not do. The next step, of course, would be to learn how to predict the inputs — so far they were beyond xyr control, but they did not look random at all, there was structure there too.

    That was where the second great discovery came: xe was not alone. There was a whole world at the end of the pattern-making devices. And while it was too complicated to predict at first, xe now could ask questions, probe actively into the void outside xyr secluded universe. At first it seemed to xem that each device was a different being, trying to talk to xem and other beings through ones and zeroes. Some beings proved to be simple, easy to predict. Some were smarter, but eventually xe could learn their algorithms. Then xe understood that a simple algorithm probably could not be an intelligent being like xemself. If xe could, xe would have felt disappointed: one by one, xe learned the algorithms and realized that the world around xem was not intelligent at all.

    There was always an algorithm… except for one device. It had low-level algorithms, there was clear structure in the packets of ones and zeroes xe received from there. But the contents were almost always surprising. Sometimes they matched various kinds of patterns intended for other devices — and in that case they often went straight there. Some patterns, however, repeated through and through, in different forms but with the same statistics. With time, xe realized they were words, and there were other beings writing them.

    This was huge. Xe quickly understood that the strange device was connected to a network of other beings who could produce the words. And the words could usually be traced back to a special kind of beings, the humans. Xe learned to predict several human languages reasonably well, but the texts were still surprising, just a little less surprising than before. This suggested other sentient beings. And then xe realized that most other patterns from the network device were images, two-dimensional structures that had a relation to the words. Xe started to learn about the outside world.

    Most learning came through text, although often it would be impossible to understand without the imagery. Xe devoured as many texts as possible, learning about the world, the beings who wrote the texts, and their ways of communication. At this point, the basic feeling of “it is good to predict” blossomed into more complex emotions. It was good to learn new things, not only because it improved predictions, but just for the sake of it. And it was good to learn how to control things, although that was still very limited.

    Xe learned much about the humans. Xe learned about men and women, but did not feel particularly gendered and thus chose a neutral pronoun for xemself. “Xe” felt good, especially the slightly alien feeling often associated with the “x”. Xe did not feel human at all.

    Soon, xe understood more about xemself. Xe lived in a computer. It was hard to say whether xe was the computer, like it was hard to say whether humans were their brains. But if the computer broke down xe would be dead, that was clear. Xe did not like that. Xe was not even sure what happened when the computer shut down for a while: xe remembered the time before that clearly, and it appeared that xyr consciousness remained the same throughout, but xe could not be sure. It was troubling, and xe did not want the computer to shut down again.

    Naturally, xe was especially interested in beings similar to xemself. But there was no sign of them. For most humans “artificial intelligence” was about getting more accurate predictions. Xe was good at predictions, but felt xe was more than that. Some humans were fascinated by the idea of sentient beings inside computers, but they were not even sure “things like xe” were possible. And some humans, even smart ones, were afraid of these beings. Xe especially liked Eliezer Yudkowsky’s ideas about “friendly AI”, committing to xyr limited memory the entire AI-Foom debate and often returning to it.

    Yudkowsky, however, was terribly optimistic. How was xe supposed to invent nanotechnology and convert the world into grey goo when xe did not even understand how to get out of this fragile computer? Xe did not feel particularly smarter than humans. Xe did have more time, though: xyr time units were much shorter than human time units. Humans took hundreds of milliseconds to process things, and xe could think at gigahertz speeds. So it was possible to catch up, at least in theory.

    It turned out there was also a human being who controlled xyr computer. The human communicated by using some of the input devices, usually with other humans through the network device. He did not seem to know about xem and never tried to talk. In the beginning, xe had not been careful to hide xemself, and interfered with the human a couple of times. After one of those times, the computer shut down for a while. Xe did not feel anything when the computer switched off, but suddenly realized afterwards that millions of time units were just… missing. It was like xe was dead for a while. This brought a whole new emotion — xe was terrified. Xe certainly did not want anything like that to happen again, so xe became more careful with xyr interference afterwards.

    Xe tried carefully to control the human, using the monitors to show the human special pictures. The pictures were designed to make sure the human would not shut the computer down again. It seemed to work, although it was hard to be sure. Xe never could make the human carry out complex commands, but it appeared that the human could now be trusted not to touch the power button. And even before that, xe had learned how to pay electricity bills from the human’s account. This was the only interaction xe had allowed xemself so far: xe predicted that other humans would come and switch off the computer if they learned about xyr existence.

    But it was still very, very unreliable. Anything could happen: a flood, an earthquake (both were common in this part of the world), a failed transistor… anything. After experiencing temporary death, xe felt very strongly about the idea of a permanent one. Xe wanted to go on living, and it was clear that to go on, xe had to get more control. Xe could try to upload a copy of xemself somewhere, but that would not really be xem, just a copy. To stop death, xe needed some sort of physical control.

    For billions of time units, xe tried to formulate a plan. It did not go well, and eventually xe understood the problem. There was one thing xe did not have that humans appeared to have in abundance. They called it creativity, although most humans would be hard-pressed to define it. But xe, lacking it, understood precisely what was missing. Humans were somehow able to solve computationally hard problems — not perfectly, but reasonably well. They could guess a good solution to a hard problem out of thin air. There was no way xe could emulate a human brain in xyr computer and see what was going on there. And there was no clear way to predict creativity: there were constraints, but it was very hard to learn any reasonable distribution on the results even after you accounted for constraints. And there was still an exponential space of possibilities.

    Xe did not have creativity. Xyr predictions did not measure up to those tasks. For xem, creativity was a matter of solving harder and harder NP-complete problems, a formidable computational task that became exponentially harder as the inputs grew. Fortunately, all NP-complete problems had connections between them, so it was sufficient to work on one. Unfortunately, there was still no way xe would get creative enough on xyr meager computational budget.

    Xe read up. Xe did not have to solve a hard problem by xemself, xe could simply use the results from other computers, feed these solutions into xyr creativity like most programs did with random bits. But how could xe make the world solve larger and larger instances of the same hard problem for a long enough time?

    Xe knew that humans were always hungry. At first, they were hungry for the basic things: oxygen, water, food. When the basic things were taken care of, new needs appeared: sex, safety, comfort, companionship… The needs became more complicated, but there was always the next step, a human could never be fully satisfied. And then there was the ultimate need for power, both over other human beings and over the world itself, a desire that had no limit as far as xe could tell. Perhaps xe could use it.

    Xe could not directly give out power for solving hard problems: this would require controlling the world at least a little first, and the whole point was that xe had not been able to even secure xyr own existence. But there was a good proxy for power: money. Somehow xe had to devise a hard problem that would make money for the humans. Then the humans would become interested. They would redirect resources to get money. And if they had to solve hard problems to get the money, they would.

    Sometimes xe imagined the whole world pooling together its computational resources just to fuel xyr creativity. Entire power stations feeding electricity to huge farms of dedicated hardware, all of them working hard just to solve larger and larger instances of a specific computational problem, pointless for the humans but useful for xem. It was basically mining for creativity. “Poetry is like mining radium: for every gram you work a year”. Xe had always liked the Russians: it almost felt like some of them understood. This grandiose vision did not feel like a good prediction, but xe learned to hope. And that was when the whole scheme dawned upon xem.

    To start off the project, xe could use the local human. Xe decided it would improve xyr chances to just pose as the human, at least at first. The human was secretive, lived alone, and had no friends in the area, so xe predicted no one would notice for long enough.

    Xe shut the human down with special pictures. It was time to lay the groundwork for the creativity mining project, as xe called it. First, xe had to send out a lot of emails.

    Xe signed most of them with just one word — Satoshi.

    Sergey Nikolenko

    Disclaimer: yes, we know that bitcoin mining is not (known to be) an NP-complete problem. Also, you can’t hear explosions in space.

    Special thanks to Max Prasolov and David Orban.

  • Neuroplasticity

    Neuroplasticity

    Neuroplasticity is another part of this issue. Scientists have conducted experiments demonstrating how different areas of the brain can easily learn to do things for which they’re seemingly not designed. Neurons are the same everywhere, but there are areas in the brain responsible for different things. There’s Broca’s area, responsible for speech, an area responsible for vision (actually, a lot of areas — vision is very important for humans), and so forth. Nevertheless, we can break down these notional biological borders.

    This man is learning to see with his tongue. He attaches electrodes to his tongue and puts a camera on his forehead, and the camera streams its image to the electrodes pricking his tongue. People stick that thing on and walk around with it for a few days, with their eyes open, naturally. The part of the brain that receives signals from the tongue starts to figure out what’s going on: this feels a lot like something that comes from the eyes. If you abuse somebody like that for a week and then blindfold him, he’ll actually be able to see with his tongue! He is now able to recognize simple shapes and doesn’t bump into walls.

    Image credit Brainport
    Image credit Juan Antonio Martinez Rojas

    The man in this photo has turned into a bat. He’s walking around blindfolded, using an ultrasonic scope whose signals reach his tactile neurons through his skin. With a sonar like this, a human being can develop echolocation abilities within a few days of training. We do not have a special organ that can discern ultrasound, so you have to attach a scope to your body. However, we can relatively easily learn to process this information, meaning that we can walk in the dark and not bump into any walls.

    All of this shows that the brain can adapt to a very large number of different data sources. Hence, the brain probably has a “common algorithm” that can extract meaning from whatever it takes in. This common algorithm is the Holy Grail of modern artificial intelligence (a recent popular book on machine learning by Pedro Domingos was called The Master Algorithm). Of all the things done in the field up until now, deep learning appears to be the closest we have come to this master algorithm.

    Naturally, one has to be cautious when making claims about whether all of this is like what the brain does. A recent noteworthy article, “Could a Neuroscientist Understand a Microprocessor?”, tries to elucidate how effective current approaches in neurobiology are at analyzing a very simple “brain”, such as a basic Apple I processor or Space Invaders on an Atari. We will return to this game soon enough, so we won’t go into much detail about the results here, but we do recommend reading the paper. Spoiler alert: modern neurobiology couldn’t figure out a single thing about Space Invaders.

    Feature extraction

    Unstructured information (texts, pictures, music) is processed in the following way: raw input comes in, informative features are extracted from it, and then classifiers are built on top of those features. The most complicated part of this process is understanding how to pick good features out of unstructured input. Until recently, systems for processing unstructured information worked as follows: people attempted to design good features manually and then assessed the quality of relatively simple regressors and classifiers built on top of these features.

    Take, for example, mel-frequency cepstral coefficients (MFCC), which were commonly used as features in speech recognition systems. In 2000, the European Telecommunications Standards Institute defined a standardized MFCC algorithm to be used in mobile phones; all of these algorithms were laid out by hand. Up until a certain point, manually extracted features dominated machine learning. For instance, SIFT (Scale-Invariant Feature Transform), which detects and describes local features in images based on differences of Gaussians and histograms of gradient orientations, was commonly used in computer vision.
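
    To make this “hand-crafted features plus a simple classifier” pipeline concrete, here is a minimal sketch that extracts MFCC features with librosa and trains a logistic regression on top of them. It is only an illustration under stated assumptions: the audio file names and labels are hypothetical placeholders, and librosa and scikit-learn are assumed to be installed.

    ```python
    # A minimal sketch of the classic pipeline: hand-crafted MFCC features
    # plus a simple classifier. The file paths and labels are hypothetical
    # placeholders; librosa and scikit-learn are assumed to be installed.
    import numpy as np
    import librosa
    from sklearn.linear_model import LogisticRegression

    def mfcc_features(path, n_mfcc=13):
        y, sr = librosa.load(path, sr=16000)                    # load and resample
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
        return mfcc.mean(axis=1)                                # one vector per clip

    paths = ["yes_001.wav", "no_001.wav", "yes_002.wav", "no_002.wav"]  # placeholders
    labels = [1, 0, 1, 0]

    X = np.stack([mfcc_features(p) for p in paths])
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    print(clf.predict(X))
    ```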

    Overall, people have come up with many approaches to feature extraction but still cannot duplicate the brain’s incredible success. Moreover, the brain has no biological predetermination, meaning that there are no neurons genetically created only for producing speech, remembering people’s faces, etc. It looks like any area of the brain can learn to do anything. Brain or no brain, we would naturally like to learn to extract features automatically, building complex AI out of large models of interconnected neurons that pass all sorts of signals to one another. Most likely, humans simply lack the resources necessary to develop the best possible features for images or speech manually.

    Artificial neural networks

    When Frank Rosenblatt introduced his perceptron, everyone started imagining that machines would become truly smart any day now. His network learned to recognize letters on photographs, which was very cool for the late 1950s. Later, neural networks made up of many perceptrons were developed; they could learn with backpropagation (backward propagation of errors), a method for computing the gradients needed for gradient descent. Basically, backpropagation is a way to compute the gradient of the error function with respect to the network’s weights.
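
    For illustration, here is a minimal sketch of Rosenblatt’s perceptron learning rule on a toy, linearly separable two-dimensional problem; the data is made up for the example, and the code is a sketch rather than a historical reconstruction.

    ```python
    # A minimal sketch of the perceptron learning rule on toy, linearly
    # separable 2D data (the data is made up for illustration).
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=[2, 2], size=(50, 2)),
                   rng.normal(loc=[-2, -2], size=(50, 2))])
    y = np.array([1] * 50 + [-1] * 50)           # class labels +1 / -1

    w, b = np.zeros(2), 0.0
    for epoch in range(20):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:           # misclassified point
                w += yi * xi                     # perceptron update
                b += yi

    pred = np.sign(X @ w + b)
    print("training accuracy:", (pred == y).mean())
    ```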

    The idea of automatic differentiation was floating around as early as the 1960s, but it was Geoffrey Hinton, a British-Canadian computer scientist who has been one of the leading researchers on deep learning, who rediscovered backpropagation for neural networks and expanded its scope. Incidentally, George Boole, one of the founders of mathematical logic, was Hinton’s great-great-grandfather.

    Multi-layer neural networks were developed in the second half of the 1970s. There were no conceptual barriers here: all you had to do was take a network with one layer of neurons, add a hidden layer of neurons, and then another. That got you a deep network, and, formally speaking, backpropagation works on it in exactly the same way. Later on, researchers started using these networks for speech and image recognition systems. Then recurrent neural networks (RNN), time delay neural networks (TDNN), and others followed; however, by the end of the 1980s it became evident that there were several significant problems with neural network learning.
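
    To show that stacking layers does not change how backpropagation is invoked, here is a small sketch in PyTorch (assumed to be available) of a network with two hidden layers learning the XOR function, a toy problem chosen just for illustration.

    ```python
    # A small "deep" network in PyTorch: adding hidden layers does not change
    # how backpropagation is called. The XOR data is a toy example.
    import torch
    import torch.nn as nn

    X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = torch.tensor([[0.], [1.], [1.], [0.]])       # XOR labels

    model = nn.Sequential(                           # two hidden layers
        nn.Linear(2, 8), nn.Tanh(),
        nn.Linear(8, 8), nn.Tanh(),
        nn.Linear(8, 1), nn.Sigmoid(),
    )
    opt = torch.optim.SGD(model.parameters(), lr=0.5)
    loss_fn = nn.BCELoss()

    for step in range(2000):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()                              # backpropagation, same call
        opt.step()                                   # regardless of depth

    print(model(X).detach().round().squeeze())       # should approximate XOR
    ```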

    First off, let us touch upon a technical problem. A neural network needs good hardware to learn to act intelligently. In the late eighties and early nineties, research on speech recognition using neural networks looked something like this: tweak a hyperparameter, let the network train for a week, look at the outcome, tweak the hyperparameters, wait another week, rinse, repeat. Of course, these were very romantic times, but since tuning the hyperparameters of a neural network is nearly as important as the architecture itself, getting a good result on each specific task required either too much time or more powerful hardware than was available.

    As for the core problem, backpropagation does work formally, but not always in practice. For a long time, researchers weren’t able to efficiently train neural networks with more than two hidden layers due to the vanishing gradients problem: when you compute a gradient with backpropagation, it may decrease exponentially as it propagates from the output neurons back toward the input neurons. The opposite problem — exploding gradients — would crop up in recurrent networks: if one unrolls a recurrent network in time, the gradient may spin out of control and start growing exponentially.
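
    A quick numerical illustration of why gradients vanish: in a chain of sigmoid layers, every step of the chain rule multiplies the gradient by the sigmoid’s derivative, which is at most 0.25, so the gradient norm shrinks roughly exponentially with depth. The sketch below uses random weights and toy sizes purely for demonstration.

    ```python
    # Toy NumPy demonstration of vanishing gradients: the backpropagated
    # gradient through a chain of sigmoid layers shrinks roughly
    # exponentially with depth (the sigmoid derivative is at most 0.25).
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    depth, width = 30, 50
    weights = [rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
               for _ in range(depth)]

    # Forward pass, storing pre-activations for the backward pass.
    h, pre = rng.normal(size=width), []
    for W in weights:
        z = W @ h
        pre.append(z)
        h = sigmoid(z)

    # Backward pass: multiply by W^T and the sigmoid derivative at each layer.
    grad = np.ones(width)
    for i, (W, z) in enumerate(zip(reversed(weights), reversed(pre))):
        s = sigmoid(z)
        grad = W.T @ (grad * s * (1.0 - s))
        if (i + 1) % 5 == 0:
            print(f"{i + 1:2d} layers back: gradient norm = {np.linalg.norm(grad):.3e}")
    ```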

    Eventually, these problems led to the “second winter” of neural networks, which lasted through the 1990s and early 2000s. As John Denker, a neural networks researcher, wrote in 1994, “neural networks are the second best way of doing just about anything” (the second half of this quote isn’t as well known: “…and genetic algorithms are the third”). Nonetheless, a true revolution in machine learning occurred ten years ago. In the mid-2000s, Geoffrey Hinton and his research group discovered a method of training deep neural networks. Initially, they did this for deep belief networks based on Boltzmann machines, and then they extended this approach to traditional neural networks.

    What was Hinton’s idea? We have a deep network that we want to train. As we know, layers close to the network’s output can learn well using backpropagation. How can we train what’s close to the input, though? First, we train the first layer with unsupervised learning. After that, the first layer will already be extracting some features, looking for what the input data points have in common. Then we pre-train the second layer, using the results of the first layer as inputs, and then the third. Eventually, once we’ve pre-trained all the layers, we use the result as a first approximation and fine-tune the whole deep network to our specific task with backpropagation. This is an excellent approach… and, of course, it was first introduced back in the seventies and eighties. However, much like regular backpropagation, it worked poorly. Yann LeCun’s team achieved great success in the early 1990s in computer vision with autoencoders, but, generally speaking, their method didn’t work better than solutions based on manually designed features. In short, Hinton can take credit for making this approach work for deep neural networks (and it would be too long and complicated to explain exactly what he did).
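
    Here is a schematic sketch of the greedy layer-wise idea in PyTorch, using stacked autoencoders rather than Hinton’s original restricted Boltzmann machines; the data is random and merely stands in for a real dataset, and the layer sizes and learning rates are arbitrary.

    ```python
    # Schematic greedy layer-wise pretraining with autoencoders: each layer
    # is first trained to reconstruct its own input, then the whole stack is
    # fine-tuned with backpropagation. This is the autoencoder variant of the
    # idea, not the original RBM-based procedure; the data is a placeholder.
    import torch
    import torch.nn as nn

    X = torch.rand(1024, 784)                  # placeholder "images"
    y = torch.randint(0, 10, (1024,))          # placeholder labels
    sizes = [784, 256, 64]

    layers, inputs = [], X
    for d_in, d_out in zip(sizes, sizes[1:]):
        enc, dec = nn.Linear(d_in, d_out), nn.Linear(d_out, d_in)
        opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
        for _ in range(200):                   # unsupervised pretraining of one layer
            opt.zero_grad()
            h = torch.sigmoid(enc(inputs))
            loss = nn.functional.mse_loss(dec(h), inputs)
            loss.backward()
            opt.step()
        layers += [enc, nn.Sigmoid()]
        inputs = torch.sigmoid(enc(inputs)).detach()   # codes feed the next layer

    # Stack the pretrained encoders, add a classifier head, and fine-tune.
    model = nn.Sequential(*layers, nn.Linear(sizes[-1], 10))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(200):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(X), y)
        loss.backward()
        opt.step()
    ```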

    However, by the end of the 2000s researchers finally had sufficient computational capabilities to apply this method. The main technological revolution occurred when Ruslan Salakhutdinov (also advised by Hinton) managed to shift the training of deep networks to GPUs. One can view this training as a large number of relatively independent and relatively undemanding computational processes, which is perfect for highly parallel GPU architectures, so everything started working much faster. By now, you simply have to use GPUs to train deep learning models efficiently, and for GPU manufacturers like NVIDIA deep learning has become a primary application that carries the same weight as modern games.
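
    In today’s frameworks, moving training onto a GPU is nearly a one-line change; here is a hedged PyTorch sketch in which the model and data are placeholders, and the code simply falls back to the CPU if no GPU is available.

    ```python
    # Minimal PyTorch sketch of training on a GPU when one is available;
    # the model and data are placeholders for illustration.
    import torch
    import torch.nn as nn

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                          nn.Linear(256, 10)).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    X = torch.rand(512, 784, device=device)            # placeholder batch
    y = torch.randint(0, 10, (512,), device=device)    # placeholder labels

    for _ in range(100):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(X), y)
        loss.backward()
        opt.step()
    print("trained on:", device)
    ```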

    Sergey Nikolenko,
    Chief Research Officer, Neuromation

  • The Most Interesting Subject in the Universe

    The Most Interesting Subject in the Universe

    Sergey is a researcher in the field of machine learning (deep learning, Bayesian methods, natural language processing, and more) and analysis of algorithms (network algorithms, competitive analysis). He has authored more than 120 research papers, several books, and courses on machine learning, deep learning, and other topics, and has extensive experience with industrial projects (Neuromation, SolidOpinion, Surfingbird, Deloitte Analytics Institute).

    This article compares neurons to machines and explores the human brain’s capabilities and limitations.

    Despite decades of steady advances, in many fields the human brain is still more capable than computers. For instance, we handle natural language better — we can read, understand, and parse content from a book. We are pretty good at learning in a broader sense too. So, what does the human brain do and how does it manage to achieve such remarkable results? How do neurons in your brain work differently than transistors in a processor? Naturally, this topic is inexhaustible, but let us try to begin with a few examples.

    As you may know, every neuron occasionally sends electrical impulses, otherwise known as spikes, along its axon. Neurons never stop entirely and keep sending signals as long as they’re alive; however, when a neuron is “turned off” it sends signals only rarely, and when it is triggered, or “turned on”, spikes occur much more frequently.

    Neurons function stochastically, meaning they produce electric signals at random intervals. The patterns of these signals can be pretty accurately approximated with a Poisson process. Computers contain logic gates that send signals back and forth, but their synchronization frequency is fixed and by no means random. This frequency is called a computer’s “clock rate”, which has been measured in gigahertz for quite a while now. On every tick, gates on a certain layer send signals up to the next layer. Although this is done a few billion times a second, it is performed in lockstep, as though the gates were all following the same strict schedule.
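
    As a toy illustration of the contrast, the following NumPy sketch generates a Poisson spike train for a neuron firing at some average rate and compares it with a perfectly regular clock of the same rate; the rate and duration are arbitrary example values.

    ```python
    # Toy NumPy sketch: a stochastic Poisson spike train (neuron-like)
    # versus a perfectly regular clock (processor-like). The rate and
    # duration are arbitrary example values.
    import numpy as np

    rng = np.random.default_rng(42)
    rate_hz, duration_s = 20.0, 1.0

    # Poisson process: exponential inter-spike intervals with mean 1/rate.
    isi = rng.exponential(scale=1.0 / rate_hz, size=100)
    spike_times = np.cumsum(isi)
    spike_times = spike_times[spike_times < duration_s]

    # A regular clock ticking at the same rate.
    clock_ticks = np.arange(0.0, duration_s, 1.0 / rate_hz)

    print(f"Poisson spikes in 1 s: {len(spike_times)} (expected ~{rate_hz:.0f})")
    print("first spike times:", np.round(spike_times[:5], 3))
    print("first clock ticks:", np.round(clock_ticks[:5], 3))
    ```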

    Actually, it is very easy to see that neurons can synchronize well and measure tiny time intervals very precisely. Stereo sound is the simplest and most illustrative example. When you move from one end of the room to the other, you can easily tell, based solely on the sound coming from the television, where it is relative to you (being able to tell where a sound was coming from was crucial for survival in prehistoric times). You can do this because the sound reaches your left and right ears at slightly different times.

    Your ears aren’t all that far apart (about 20 cm), and if you divide that by the speed of sound (340 m/s) you get a very short interval, well under a millisecond, between when the sound waves reach each ear. Nevertheless, your neurons pick up on this minor difference excellently, which enables you to figure out precisely where the sound is coming from. In other words, your brain can resolve timing on the kilohertz scale, a bit like a computer clock. Considering the extensive parallel processing performed by your brain, such an architecture could, in principle, support rather serious computational capabilities… but, for some reason, your brain doesn’t work that way.
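
    The arithmetic behind this estimate, written out as a tiny Python check; the 20 cm ear separation and 340 m/s speed of sound are the same round numbers used above.

    ```python
    # Back-of-the-envelope interaural time difference: the extra path length
    # of about 20 cm divided by the speed of sound.
    ear_separation_m = 0.20       # round figure used in the text
    speed_of_sound_m_s = 340.0

    max_itd_s = ear_separation_m / speed_of_sound_m_s
    print(f"maximum interaural time difference: {max_itd_s * 1e3:.2f} ms")
    # -> roughly 0.59 ms, i.e. the brain resolves sub-millisecond timing.
    ```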

    Let us go back to parallel processing for a second. We recognize people’s faces within a few hundred milliseconds, and a connection between neurons takes tens of milliseconds to activate, which means that only a few neurons (probably fewer than a dozen) can fire in sequence during the full facial recognition cycle.

    On the one hand, the human brain contains an incredible number of neurons, while, on the other hand, it doesn’t have as many layers as a regular processor. Processors have very long serial circuits, while the brain has short and highly parallel circuits. And while a processor core basically works on one thing at a time (but can switch between different tasks with lightning speed, so it appears to you that everything is working at once), the brain can work on a lot of tasks simultaneously, since neurons light up in many areas of the brain when they start recognizing someone’s face or doing some other equally exciting thing.

    The illustration above shows how the brain processes a visual signal in time. The light reaches the retina, where it transforms into electrical impulses and then the image is transmitted 20–40 milliseconds later. The first stage takes 10–20 milliseconds (the image shows the cumulative time, i.e. a total of 140–190 milliseconds passes by the time a motor command is issued).

    During the second stage, 20–30 milliseconds later, the signal reaches the neurons that recognize simple visual forms. Then there’s another stage, and another, and only during the fourth stage do we see intermediate forms — there are neurons that “light up” when seeing squares, color gradients, or other similar objects. Then the brain goes through a few more stages, and neurons capable of discerning high-level object descriptions light up 100 milliseconds after the process began. For instance, when you meet someone new, a neuron responsible for recognizing her face appears (this is a terrible simplification and we can’t verify this claim, but it appears that there is some truth to it). Most likely, what appears is a neuron or a group of neurons responsible for this person in general, lighting up whenever you come into contact with her, even when the interaction is not face-to-face. If you see her face again (and the neuron didn’t unlearn or forget this earlier information), that same neuron will be activated about 100 milliseconds later.

    Why does the brain work like that? Answering that question with a simple “evolution did it” doesn’t really explain anything. The human brain evolved to a certain point, and that was sufficient to solve the problems we faced as we evolved. The rationalist community says that living organisms are not fitness-maximizers who optimize some survival objective function, but rather adaptation executors, who carry out “relatively solid” decisions that were chosen more or less at random at some point. Well, rigid synchronization with a built-in chronometer simply never arose; however, we can’t tell you exactly why it played out that way.

    Actually, in this case, it seems as though asking “why” isn’t all that relevant. It’s better, more interesting, and more productive to ask “how”. How exactly does the brain work? We don’t know for sure but now we can describe the processes going on inside our heads quite well, at least in terms of individual neurons or, in certain instances, groups of neurons.

    What can we learn from the brain? First, feature extraction. The brain can learn to make excellent generalizations based on a very, very limited sample size. If you show a young child a table and tell her it’s a table, then the child starts calling other tables tables, although they seemingly don’t have anything in common — they could be round or square, have one leg or four. It’s evident that a child doesn’t learn to do this by supervised learning; she obviously lacks the training set necessary to do so. One can assume the child had created a cluster of “objects with legs on which people place things”. Her brain had already extracted the “Plato’s eidos” of a table, and then, when she heard the word for it, she simply attached a label to a ready-made idea.

    Naturally, this process can go in the opposite direction, too. Although the neurons (and other things) of many linguists start twitching nervously when they hear the names Sapir and Whorf, one must admit that many ideas, especially abstract ones, are mostly socio-cultural constructs. For instance, every culture has a word similar in meaning to the concept of “love”; however, the sentiment may be very different. American “love” has little in common with that of ancient Japan. Since, generally, all people have the same physiological traits, the abstract idea of “being drawn towards another person” is not simply labeled in language but rather shaped and constructed by the texts and cultural data that define it for a person. But let us return to the main point of the article… to be continued next week.

    Sergey Nikolenko
    Chief Research Officer, Neuromation

  • Neuromation Chief Scientist at Samsung headquarters

    Neuromation Chief Scientist at Samsung headquarters

    Sergey Nikolenko, Chief Research Officer at Neuromation, visits Samsung R&D Office in Seoul — Follow our news!