Earlier this year, we mentioned that we are bringing computer use capabilities to developers via the Gemini API. Today, we are introducing the Gemini 2.5 Computer Use model, a new specialized model built on Gemini 2.5 Pro's visual understanding and reasoning capabilities to power agents that can interact with user interfaces (UIs). It outperforms leading alternatives on multiple web and mobile control benchmarks, all with lower latency. Developers can access these capabilities via the Gemini API in Google AI Studio and Vertex AI.
Although AI models can interact with software through structured APIs, many digital tasks still require direct interaction with graphical user interfaces, such as filling out and submitting forms. To complete these tasks, agents must navigate web pages and applications as humans do: by clicking, typing, and scrolling. The ability to natively fill out forms, work with interactive elements like dropdowns and filters, and operate behind a login is a crucial next step in building powerful general-purpose agents.
How it works
The model's core capabilities are exposed through the new `computer_use` tool in the Gemini API and must be operated within a loop. Inputs to the tool are the user request, a screenshot of the environment, and a history of recent actions. The input can also specify whether to exclude functions from the full list of supported UI actions, or specify additional custom functions to include.
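The loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Gemini API itself: `Action`, `ToolConfig`, `FakeEnvironment`, `make_scripted_model`, and `run_agent_loop` are hypothetical names, the action names (`click`, `type`, `drag`) are placeholders, and the model is replaced by a scripted stand-in so the example runs offline. The stopping rule (the stand-in returns `None`, or a step limit is hit) is also an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    """A single UI action proposed by the model (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class ToolConfig:
    """Hypothetical tool settings: actions to exclude, custom functions to add."""
    excluded_actions: set = field(default_factory=set)
    custom_functions: dict = field(default_factory=dict)


class FakeEnvironment:
    """Stand-in for a browser or device; records actions instead of running them."""

    def __init__(self) -> None:
        self.log: list = []

    def screenshot(self) -> bytes:
        # A real environment would return image bytes of the current screen.
        return f"screen after {len(self.log)} actions".encode()

    def execute(self, action: Action) -> None:
        self.log.append(action.name)


def make_scripted_model(script: list) -> Callable:
    """Build a stand-in 'model' that replays a fixed list of actions, then None."""
    remaining = iter(script)

    def propose(request, screenshot, recent_actions, config) -> Optional[Action]:
        return next(remaining, None)

    return propose


def run_agent_loop(request, env, propose_action, config,
                   max_steps: int = 10, history_size: int = 5) -> list:
    """Send request + screenshot + recent actions, run the reply, and repeat."""
    history: list = []
    for _ in range(max_steps):
        action = propose_action(
            request, env.screenshot(), history[-history_size:], config
        )
        if action is None:  # assumed stop signal: nothing left to do
            break
        if action.name in config.excluded_actions:
            raise ValueError(f"model proposed excluded action: {action.name}")
        if action.name in config.custom_functions:
            # Custom functions are handled by the caller, not the environment.
            config.custom_functions[action.name](**action.args)
        else:
            env.execute(action)
        history.append(action)
    return history
```

In a real integration, `propose_action` would wrap a call to the model with the `computer_use` tool enabled, and `FakeEnvironment` would be replaced by a browser or device driver that captures real screenshots and performs the returned actions.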

