An artificial neural network to acquire grounded representations of robot actions and language
To best assist human users while they complete everyday tasks, robots should be able to understand their queries, answer them and perform actions accordingly. In other words, they should be able to flexibly generate and perform actions that are aligned with a user’s verbal instructions.
To understand a user’s instructions and act accordingly, robotic systems should be able to make associations between linguistic expressions, actions and environments. Deep neural networks have proved to be particularly good at acquiring representations of linguistic expressions, yet they typically need to be trained on large datasets including robot actions, linguistic descriptions and information about different environments.
Researchers at Waseda University in Tokyo recently developed a deep neural network that can acquire grounded representations of robot actions and linguistic descriptions of these actions. The technique they created, presented in a paper published in IEEE Robotics and Automation Letters, could be used to enhance the ability of robots to perform actions aligned with a user’s verbal instructions.
"We are tackling the problem of how to integrate symbols and the real world, the 'symbol grounding problem,'" Tetsuya Ogata, one of the researchers who carried out the study, told TechXplore. "We already published multiple papers related to this problem with robots and neural networks."
The new deep neural network-based model can acquire vector representations of words, including descriptions of the meaning of actions. Using these representations, it can then generate adequate robot actions for individual words, even if these words are unknown (i.e., if they are not included in the initial training dataset).
"Specifically, we convert the word vectors of the deep learning model pre-trained with a text corpus into different word vectors that can be used to describe a robot's behaviors," Ogata explained. "In normal language-corpus learning, similar vectors are given to words that appear in similar contexts, so the meaning of the appropriate action cannot be obtained. For example, 'fast' and 'slowly' have similar vector representations in the language, but they have opposite meanings in the actual action. Our method solves this problem."
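The "fast" versus "slowly" problem Ogata describes can be illustrated with a toy example. The sketch below is not the researchers' model: the three-dimensional vectors are made up, and the `retrofit` function with its hand-picked weight matrix `W` is a hypothetical stand-in for a layer that, in the actual work, would be learned from paired actions and descriptions. It only shows how a transformation can pull apart two words that a text corpus places close together.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrofit(vec, weights):
    """Apply a linear map; each row of `weights` yields one output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

# Made-up "corpus" embeddings: both adverbs occur in similar contexts,
# so they differ only slightly (in the third dimension).
fast = [0.9, 0.8, 0.1]
slowly = [0.9, 0.8, -0.1]

# Hypothetical retrofit weights, set by hand for illustration.
# They shrink the shared dimensions and amplify the one that
# separates the two words.
W = [
    [0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0],
    [0.0, 0.0, 10.0],
]

before = cosine(fast, slowly)
after = cosine(retrofit(fast, W), retrofit(slowly, W))

print(f"similarity before retrofit: {before:.3f}")
print(f"similarity after retrofit:  {after:.3f}")
```

With these toy numbers, the two words start out nearly identical and end up pointing in almost opposite directions, which is the kind of separation an action-grounded representation would need.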
Ogata and his colleagues trained their model’s retrofit layer and its bidirectional translation model alternately. This training process allows their model to transform pre-trained word embeddings and adapt them to existing pairs of actions and associated descriptions.
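The idea of training two components alternately can be sketched in a few lines. This is a deliberately simplified illustration, not the procedure from the paper: the "retrofit layer" and "translation model" are each reduced to a single hypothetical scalar (`retrofit_scale` and `translation_scale`), the data are invented, and plain gradient descent on a squared error is assumed. On each step, one component is updated while the other is held fixed.

```python
# Invented toy data: each "embedding" x should map to an "action" value y = 6 * x.
pairs = [(1.0, 6.0), (2.0, 12.0), (3.0, 18.0)]

retrofit_scale = 1.0      # stand-in for the retrofit layer's parameters
translation_scale = 1.0   # stand-in for the translation model's parameters
learning_rate = 0.01

def mean_squared_error(r, t):
    """Loss of the composed model: prediction = t * (r * x)."""
    return sum((t * r * x - y) ** 2 for x, y in pairs) / len(pairs)

for step in range(200):
    if step % 2 == 0:
        # Update the retrofit stand-in; the translation stand-in is frozen.
        grad = sum(
            2 * (translation_scale * retrofit_scale * x - y) * translation_scale * x
            for x, y in pairs
        ) / len(pairs)
        retrofit_scale -= learning_rate * grad
    else:
        # Update the translation stand-in; the retrofit stand-in is frozen.
        grad = sum(
            2 * (translation_scale * retrofit_scale * x - y) * retrofit_scale * x
            for x, y in pairs
        ) / len(pairs)
        translation_scale -= learning_rate * grad

final_loss = mean_squared_error(retrofit_scale, translation_scale)
print(f"combined scale: {retrofit_scale * translation_scale:.4f}")
print(f"final loss:     {final_loss:.6f}")
```

Alternating the updates lets each component adapt to the current state of the other, rather than changing both at once.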
An article by Ingrid Fadelli for TechXplore