OpenAI has just unveiled Operator, an experimental digital agent powered by a new model called Computer-Using Agent (CUA). The concept is groundbreaking: it lets an AI perform tasks on the web by interacting with graphical interfaces the way humans do. Even so, it is clear that the technology still has a long way to go before it can be relied on for complex real-world tasks.
What is Operator and how does it work?
At its core, Operator is an AI that can interpret visual cues on a screen, such as buttons, menus and text fields, and use them to perform tasks. The underlying model, CUA, combines GPT-4o's vision capabilities with reasoning learned through reinforcement learning. This allows it to navigate digital environments without relying on APIs specific to the operating system or the web. In theory, this means Operator could handle tasks across a variety of platforms with minimal human intervention.
Impressive as that may sound, the model's real-world performance leaves much to be desired. CUA is designed to break tasks into steps and to adapt when it runs into obstacles. However, this process is still in its early days, with frequent errors and failures along the way.
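The observe, decide, act cycle described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not OpenAI's implementation: `FakeScreen`, `Action`, `choose_action` and `run_agent` are hypothetical names, the "screenshot" is a plain dictionary rather than pixels, and a hand-written rule stands in for the model's reasoning.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # name of the on-screen element (hypothetical)
    text: str = ""     # text to enter, for "type" actions


@dataclass
class FakeScreen:
    """Toy stand-in for a GUI with one search box and one search button."""
    query: str = ""
    submitted: bool = False
    log: list = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real agent would receive raw pixels; here it is a simple dict.
        return {"query": self.query, "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        self.log.append(action.kind)
        if action.kind == "type" and action.target == "search_box":
            self.query = action.text
        elif action.kind == "click" and action.target == "search_button":
            # Clicking only submits if something has been typed.
            self.submitted = bool(self.query)


def choose_action(observation: dict, goal: str) -> Action:
    """Hypothetical policy: a fixed rule in place of the model's reasoning."""
    if observation["submitted"]:
        return Action("done")
    if observation["query"] != goal:
        return Action("type", "search_box", goal)
    return Action("click", "search_button")


def run_agent(screen: FakeScreen, goal: str, max_steps: int = 10) -> bool:
    """Observe the screen, pick one action, apply it, and repeat."""
    for _ in range(max_steps):
        action = choose_action(screen.screenshot(), goal)
        if action.kind == "done":
            return True
        screen.apply(action)
    return False  # step budget exhausted; a human would need to step in


if __name__ == "__main__":
    screen = FakeScreen()
    print(run_agent(screen, "flights to Paris"), screen.log)
```

The step budget in `run_agent` is a design choice made for this sketch: it gives the loop a defined point at which to stop and hand control back rather than retrying indefinitely.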
Mixed results and low success rates
In testing, CUA achieved a success rate of 38.1% on OSWorld, a benchmark that simulates full computer use. For web-based tasks, the figures were better but still not impressive: 58.1% on WebArena and 87% on WebVoyager. Although these numbers may seem encouraging, they fall well short of the reliability an AI system needs to be genuinely useful for everyday tasks.
In short, while CUA can complete tasks, it often runs into difficulties, which highlights the limits of current AI models when it comes to carrying out real multi-step actions without human intervention.
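To put the reported success rates in perspective, the short Python snippet below converts each one into the number of failed attempts one would expect per 100 tasks. It is simple arithmetic on the three figures quoted above; the helper name `expected_failures` is made up for this illustration, and the calculation assumes each benchmark figure applies uniformly to every task.

```python
# Success rates (in percent) as quoted in the article.
success_rates = {"OSWorld": 38.1, "WebArena": 58.1, "WebVoyager": 87.0}


def expected_failures(success_pct: float, tasks: int = 100) -> float:
    """Expected number of failed tasks out of `tasks` attempts."""
    return round(tasks * (100.0 - success_pct) / 100.0, 1)


for name, pct in success_rates.items():
    print(f"{name}: about {expected_failures(pct)} failures per 100 tasks")
```

On these figures, roughly 62 out of 100 OSWorld tasks, 42 out of 100 WebArena tasks and 13 out of 100 WebVoyager tasks would end in failure.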
Safety problems and limited availability
One of Operator's most worrying aspects is its access to the web. Allowing an AI to browse, click and interact with various online platforms carries significant security and ethical risks. OpenAI has made it clear that safety is a top priority, but with this kind of technology it is hard not to worry about the unintended consequences of letting an AI agent move freely through digital spaces. Errors or misuse could lead to serious problems, ranging from data privacy breaches to unintended actions.
To address these concerns, OpenAI is rolling Operator out slowly, initially offering it to Pro-tier users in the United States. This cautious approach lets the company gather user feedback and refine its safety features. But even with this limited rollout, the risks of granting an AI agent broad access to the web cannot be ignored.
The road ahead
Although Operator is an interesting advance in the AI landscape, it is clear that the technology is still far from perfect. For all its potential, it struggles with reliability, accuracy and consistency. Given the wide variation in its performance across benchmarks, it is hard to imagine this technology being used in critical applications in the near future.
Moreover, while CUA's ability to understand and interact with graphical interfaces is a major step forward, an AI system that requires constant adjustment and supervision is, at this stage, less a digital assistant than a research project.