Recently, Robotics at Google, the Technical University of Berlin, and Google Research jointly introduced PaLM-E, reported to be the largest vision-language model to date, with 562 billion parameters. The model is said to be able to understand images, understand and generate language, and handle complex instructions for robots.
Google said that the model also responds adaptively to its environment and can cope with unexpected situations. Because it is integrated into a control loop, the model is reported to be robust to disturbances.
The model reportedly combines the PaLM-540B language model with the ViT-22B Vision Transformer (540 billion plus 22 billion parameters, for a total of 562 billion), and its core strength is language processing. The highlight is that, once it has acquired and processed visual data, the model can use that data to enhance its language processing. For example, it can work out the applicable traffic rule from a picture of a traffic sign, infer how a dish is prepared from a picture of its ingredients, or guide a robot through relatively complicated actions from typed instructions.
PaLM-E is said to have another outstanding advantage: strong positive transfer. According to test results released by Google, the researchers believe that PaLM-E is capable of self-learning, so it can carry out planning and long-horizon tasks on different robot embodiments. For example, after the model guides a robot to sort blocks by color, it can go on to guide the robot to push the green block to an object it has never seen before.
Some observers think that although the guidance PaLM-E gives to robots does not seem very complicated at present, further training on new data will give robots more reasoning ability. They expect that robots will eventually plan and execute human commands more sensibly, leading to major breakthroughs in industrial application and design.
Elsewhere in the artificial intelligence field, Microsoft published a case similar to this research in February this year, in which a program written by ChatGPT was used to guide a drone in finding a drink.