Computing power and memory storage have doubled approximately every two years, allowing today's computers to store and recall essentially everything they encounter. In tandem, new machine learning techniques are being developed that harness this wealth of data to extract knowledge, make predictions, and generalise to unseen data, many of them with artificial neural networks at their core. This combination has led to impressive new solutions to numerous real-world problems, including image classification and speech processing.
Despite this progress, computers still lag behind human performance on more general-purpose tasks. In particular, current methods are not well suited to learning in non-stationary settings, where the data distribution changes over time: a desirable system would learn new things quickly without forgetting what it knew before. To clarify these ideas, consider an artificial neural network trained to classify clothes from images. This is a non-stationary task, because fashions change and innovate, so the network must continually learn from new examples. However, it must do so without forgetting previous examples (e.g. summer clothes, not seen for all of winter), otherwise it would have to relearn summer clothes from scratch each spring. In practice, to handle new examples the network needs a high learning rate, but a high learning rate has the side effect of overwriting old memories; that is, the system forgets quickly. Conversely, if the learning rate is low, the network remembers for much longer, but learning becomes impractically slow and is no longer agile enough to deal with changing environments. The toy simulation below illustrates this dilemma.
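To make the dilemma concrete, the following sketch (an illustration of my own, not part of the proposed system) tracks a single parameter with plain stochastic gradient descent while its target switches from a "summer" statistic to a "winter" one; the learning rates and targets are arbitrary choices.

```python
import numpy as np

def online_estimate(targets, lr):
    """Track a drifting target with plain stochastic gradient descent."""
    w, history = 0.0, []
    for t in targets:
        w += lr * (t - w)          # gradient step on the squared error (t - w)^2 / 2
        history.append(w)
    return history

season = np.concatenate([np.ones(50), -np.ones(50)])   # 50 "summer" then 50 "winter" examples
fast = online_estimate(season, lr=0.5)    # adapts within a few steps, but erases the past
slow = online_estimate(season, lr=0.01)   # keeps a trace of summer, but never learns either season well

print(f"fast learner after winter: {fast[-1]:+.2f}")   # ~ -1.00: summer is completely overwritten
print(f"slow learner after winter: {slow[-1]:+.2f}")   # ~ -0.16: far from -1, learning is too slow
```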
This research challenge of fast learning on non-stationary tasks without forgetting is therefore a fundamental one, and is recognised as a stumbling block in current approaches to transfer learning, continual learning and lifelong learning. But of course, there exists one system that has solved the apparent dilemma: the human brain. We humans live our lives in a non-stationary world, and we can both learn quickly and remember for a long time. A classical example from experimental psychology shows that the rate at which a person forgets a series of previously memorised random letters follows a power law: the fraction of the memory lost between 1h and 2h is the same as between 2h and 4h, or between 1 week and 2 weeks, i.e. the same whenever the elapsed time doubles. In contrast, forgetting in artificial systems happens exponentially: the fraction lost between 1h and 2h is the same as between 100h and 101h, i.e. it depends only on the length of the interval, so that at long delays forgetting is much faster than observed in humans.
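The difference between the two forgetting laws can be checked numerically; in the sketch below the power-law exponent and the exponential time constant are arbitrary values chosen for illustration only.

```python
import numpy as np

def power_law(t, alpha=0.5):    # human-like forgetting curve
    return t ** -alpha

def exponential(t, tau=10.0):   # typical forgetting curve of an artificial system
    return np.exp(-t / tau)

for t0, t1 in [(1, 2), (2, 4), (168, 336)]:             # 1h->2h, 2h->4h, 1wk->2wk (in hours)
    print(f"power law retains {power_law(t1) / power_law(t0):.2f} of the memory from t={t0} to t={t1}")
# the retained fraction (~0.71) is the same whenever the elapsed time doubles

for t0, t1 in [(1, 2), (100, 101)]:
    print(f"exponential retains {exponential(t1) / exponential(t0):.2f} from t={t0} to t={t1}")
# the retained fraction (~0.90) depends only on the 1h interval, not on the age of the memory,
# so after long delays almost nothing is left compared with the power law
```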
In the brain, learning is based on the modification of the connection strength between neurons when a new pattern arrives, a process called synaptic plasticity. This change can last for different amounts of time, giving rise to three timescales: short-term plasticity, long-term plasticity and synaptic consolidation.
The research hypothesis of this proposal is that we can reach human-level performance by building a learning system that takes inspiration from these learning mechanisms of the brain, in particular the different timescales of synaptic plasticity and their interplay. The intuition is the following: an incoming memory is learnt quickly by the component with the fastest learning rate, and is then slowly transferred to another component that operates at a slower learning rate, so that it is not overwritten by new incoming memories.
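A minimal sketch of this intuition is given below; the split into a single "fast" and "slow" weight and the simple transfer rule are illustrative assumptions of mine, not the unifying learning rule this proposal sets out to develop.

```python
def step(fast, slow, target, lr_fast=0.5, transfer=0.02):
    """One learning step with a fast and a slow weight component."""
    error = target - (fast + slow)   # the prediction is the sum of both components
    fast += lr_fast * error          # the fast component adapts within a few steps
    slow += transfer * fast          # consolidation: the fast trace is slowly copied into the slow component,
    fast -= transfer * fast          # ... and drained from the fast one, freeing it for new memories
    return fast, slow

fast, slow = 0.0, 0.0
for _ in range(300):                 # long exposure to pattern A (target +1)
    fast, slow = step(fast, slow, +1.0)
print(f"after A: fast={fast:+.2f}, slow={slow:+.2f}")   # ~ fast=+0.00, slow=+1.00: A is consolidated

for _ in range(3):                   # brief exposure to pattern B (target -1)
    fast, slow = step(fast, slow, -1.0)
print(f"after a few B steps: prediction={fast + slow:+.2f}, slow={slow:+.2f}")
# ~ prediction=-0.75, slow=+0.91: the fast component has already adapted to B,
# while the consolidated memory of A in the slow component is largely preserved
```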
This proposal therefore addresses two research challenges. First, I intend to build a unifying learning rule across all three learning timescales, just as I unified the long-term and very long-term timescales in past work. Second, I will investigate the learning and forgetting speed of plastic networks equipped with this unifying learning rule. The network will learn to categorise non-stationary data, but will be tested on all previously seen data, which is currently a very difficult task in machine learning.