How do babies learn to do anything?
Newborn babies are pretty much idiots. They can’t walk, they can’t talk, and they chew on most things before finding out that those things are inedible. That’s why they need the constant care of their parents to keep them from hurting themselves. Yet no matter how often they are held back, babies keep scanning their surroundings for the next interesting object to chew on. This is because babies learn about the world by exploring whatever sparks their interest. Curiosity drives a baby to interact with various objects, try different actions, and learn from the outcomes. By being curious, a baby trains its own senses and judgment.
Leveraging curiosity as a means of learning about the world is a popular approach for training artificial intelligence (AI) agents. An agent that is made to be curious sets out exploring its world, and in the process it visits new places and picks up the skills that exploration demands. Critically, these skills are learned without anyone actually teaching the agent how to perform them; the agent simply finds that it needs them in order to reach the places it is curious about. So how is such curiosity implemented in an agent?
In the work ‘Curiosity-driven Exploration by Self-supervised Prediction’,[1] Pathak et al. implement curiosity as a prediction problem. Given a state s of the world and the agent’s action a, curiosity is measured by how poorly the agent predicts the next state s′: the larger the prediction error, the more curious the agent is. For example, imagine holding a glass bottle (s). If you loosen your grip (a), you know that the bottle will fall and shatter (s′). Babies and untrained AI agents don’t know this outcome, so they will try dropping the bottle to satiate their curiosity. Once they have observed the combination (s, a, s′), they won’t try it again, because the next time they encounter a bottle they can already predict what dropping it will do, and so they are no longer curious about it.
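To make this concrete, here is a minimal Python sketch of the idea: a toy forward model predicts the next state from (s, a), and curiosity is defined as its prediction error. Everything here (the linear model, the dimensions, the "bottle" vectors) is an illustrative stand-in, not Pathak et al.'s actual architecture, which predicts in a learned feature space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy world: a "state" is a feature vector, and the forward model is a
# linear map fitted from experience. All sizes are arbitrary stand-ins.
STATE_DIM, ACTION_DIM = 4, 2

class ForwardModel:
    """Predicts the next state from (state, action); its error is the curiosity."""
    def __init__(self, lr=0.1):
        self.W = np.zeros((STATE_DIM, STATE_DIM + ACTION_DIM))
        self.lr = lr

    def predict(self, s, a):
        return self.W @ np.concatenate([s, a])

    def curiosity(self, s, a, s_next):
        # Intrinsic reward: squared error between predicted and actual next state.
        return float(np.sum((self.predict(s, a) - s_next) ** 2))

    def update(self, s, a, s_next):
        x = np.concatenate([s, a])
        err = self.predict(s, a) - s_next       # prediction error vector
        self.W -= self.lr * np.outer(err, x)    # gradient step on the squared error

# "Dropping the bottle" over and over: the same (s, a, s') transition.
model = ForwardModel()
s = rng.normal(size=STATE_DIM)     # holding the bottle
a = np.array([1.0, 0.0])           # loosen the grip
s_next = 0.5 * s                   # stand-in for "the bottle shatters"

for attempt in range(5):
    print(f"curiosity at attempt {attempt}: {model.curiosity(s, a, s_next):.4f}")
    model.update(s, a, s_next)
```

Run it and the printed curiosity shrinks with every repetition of the same transition, which is exactly why the agent loses interest in dropping the bottle again.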
Learning about the world through curiosity has multiple benefits. In the glass-bottle example above, the baby (or agent) learns the following. (1) How to drop objects: when holding an object, loosening one’s grip drops it straight down. (2) An intuition for physics: objects fall to the ground when released. (3) Properties of glass and the ground: glass shatters when it hits the ground. Compounding different states and actions allows even more skills to be learned and more knowledge of the world to be gained.
An even more interesting phenomenon arises when AI agents learn in a simulation, such as a game. A game offers far more freedom of movement, since there are no physical repercussions: no matter how many times an agent dies in the game, it can be resurrected and returned to its original state. AI agents can capitalize on this freedom to try a myriad of different actions. In addition, because dying always returns the agent to the same starting state, death quickly becomes 'predictable', and therefore 'boring', which discourages the agent from dying again. By learning in a simulation, the agent learns to survive and explore entirely on its own, without any outside supervision.
Curious exploration lets the agent learn many things about the world, but the method is not foolproof. For example, agents are susceptible to unpredictable signals. The classic example is a TV showing a random pattern. Just as an interesting television program can fixate a human’s gaze, the TV stops the agent from learning anything useful; but unlike humans, AI agents are drawn to any random signal, so the agent gets stuck trying to understand pure randomness, which is impossible.
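A toy calculation makes the problem visible. Below, a deliberately simple, hypothetical predictor (the running mean of the signal) is as good as any predictor can get on pure noise: its error vanishes on a constant signal but never shrinks on TV static, so the curiosity reward never runs dry.

```python
import numpy as np

rng = np.random.default_rng(1)

def late_curiosity(stream):
    """Average prediction error over the last 100 steps, using a running-mean predictor."""
    running_mean, errors = 0.0, []
    for t, x in enumerate(stream, start=1):
        errors.append((x - running_mean) ** 2)   # the curiosity signal
        running_mean += (x - running_mean) / t   # update the predictor
    return float(np.mean(errors[-100:]))

constant_signal = np.ones(1000)      # a fully predictable screen
tv_static = rng.normal(size=1000)    # random static on the TV

print("late curiosity, predictable signal:", late_curiosity(constant_signal))  # ~0
print("late curiosity, TV static:         ", late_curiosity(tv_static))        # ~1, forever
```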
Pathak et al. mitigate this problem by having the agent learn only about the parts of the world that its own actions can affect (sketched below). While this works in many settings, it still fails when important parts of the world that affect the agent are not under the agent’s control.
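Concretely, their Intrinsic Curiosity Module (ICM) learns a feature encoder through an inverse dynamics model: the features of two consecutive states must be enough to predict which action was taken between them, so aspects of the world the agent cannot influence (like TV static) tend to get filtered out, and curiosity becomes the forward-prediction error in that feature space. The PyTorch sketch below uses illustrative layer sizes and names, not the paper’s exact architecture.

```python
import torch
import torch.nn as nn

# Sketch of the ICM idea; all dimensions and module shapes are illustrative.
STATE_DIM, FEAT_DIM, N_ACTIONS = 8, 16, 4

encoder = nn.Sequential(nn.Linear(STATE_DIM, FEAT_DIM), nn.ReLU(),
                        nn.Linear(FEAT_DIM, FEAT_DIM))
# Inverse model: (phi(s), phi(s')) -> which action was taken?
inverse_model = nn.Linear(2 * FEAT_DIM, N_ACTIONS)
# Forward model: (phi(s), action) -> predicted phi(s')
forward_model = nn.Linear(FEAT_DIM + N_ACTIONS, FEAT_DIM)

opt = torch.optim.Adam([*encoder.parameters(),
                        *inverse_model.parameters(),
                        *forward_model.parameters()], lr=1e-3)

def icm_step(s, s_next, action):
    """One joint update; returns the intrinsic (curiosity) reward."""
    phi_s, phi_next = encoder(s), encoder(s_next)
    a_onehot = nn.functional.one_hot(action, N_ACTIONS).float()

    # The inverse loss shapes the features: only action-relevant parts survive.
    action_logits = inverse_model(torch.cat([phi_s, phi_next], dim=-1))
    inverse_loss = nn.functional.cross_entropy(action_logits, action)

    # Forward-prediction error in feature space is the curiosity reward.
    # (Detaching the target is one common design choice, so the features are
    # shaped by the inverse loss rather than by being easy to predict.)
    pred_next = forward_model(torch.cat([phi_s, a_onehot], dim=-1))
    forward_loss = nn.functional.mse_loss(pred_next, phi_next.detach())

    opt.zero_grad()
    (inverse_loss + forward_loss).backward()
    opt.step()
    return forward_loss.item()

# Example call with random stand-in tensors.
s, s_next = torch.randn(1, STATE_DIM), torch.randn(1, STATE_DIM)
action = torch.tensor([2])
print("curiosity reward:", icm_step(s, s_next, action))
```

Since the TV’s static neither depends on nor helps predict the agent’s actions, the encoder has no incentive to represent it; the flip side is that the features also discard uncontrollable factors that genuinely matter, which is the failure mode noted above.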
Overall, curious exploration is worth developing because it can train agents to understand the world and acquire skills without any direct supervision. The concept has been developed further in the field since Pathak et al.: it has been combined with video prediction models[2] and with task-specific fine-tuning.[3] It will be interesting to watch where this research goes, and to develop it further for deployment in practical settings.
Works Cited
[1] Deepak Pathak et al. “Curiosity-driven Exploration by Self-supervised Prediction”. In: International Conference on Machine Learning. PMLR, 2017, pp. 2778–2787.
[2] Younggyo Seo et al. Reinforcement Learning with Action-Free Pre-Training from Videos. 2022. doi: 10.48550/ARXIV.2203.13880. url: https://arxiv.org/abs/2203.13880.
[3] Michael Laskin et al. URLB: Unsupervised Reinforcement Learning Benchmark. 2021. doi: 10.48550/ARXIV.2110.15191. url: https://arxiv.org/abs/2110.15191.