Computer Use Agent
This directory contains a computer use agent that can operate a browser to complete user tasks. The agent uses Playwright to control a Chromium browser and can interact with web pages by taking screenshots, clicking, typing, and navigating.
This agent demonstrates how to use ComputerUseToolset.
Overview
The computer use agent consists of:
- agent.py: Main agent configuration using Google's gemini-2.5-computer-use-preview-10-2025 model
- playwright.py: Playwright-based computer implementation for browser automation
- requirements.txt: Python dependencies
Setup
1. Install Python Dependencies
Install the required Python packages from the requirements file:
uv pip install -r contributing/samples/computer_use/requirements.txt
2. Install Playwright Dependencies
Install Playwright's system dependencies for Chromium:
playwright install-deps chromium
3. Install Chromium Browser
Install the Chromium browser for Playwright:
playwright install chromium
Usage
Running the Agent
To start the computer use agent, run the following command from the project root:
adk web contributing/samples
This will start the ADK web interface where you can interact with the computer_use agent.
Example Queries
Once the agent is running, you can send queries like:
find me a flight from SF to Hawaii on next Monday, coming back on next Friday. start by navigating directly to flights.google.com
The agent will:
- Open a browser window
- Navigate to the specified website
- Interact with the page elements to complete your task
- Provide updates on its progress
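The steps above amount to an observe-then-act loop. The following is a minimal conceptual sketch of that loop, not the ADK implementation: FakeBrowser, scripted_policy, and run_task are hypothetical names invented for illustration, the browser only records actions instead of performing them, and a fixed script stands in for the model.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real browser; it records actions only."""
    url: str = "about:blank"
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real implementation would return image bytes; this describes the state.
        return f"screenshot of {self.url}"

    def navigate(self, url: str) -> None:
        self.url = url
        self.log.append(f"navigate {url}")

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click ({x}, {y})")


def scripted_policy(observation: str, step: int):
    """Hypothetical stand-in for the model: next action, or None when done."""
    plan = [
        ("navigate", ("https://flights.google.com",)),
        ("click", (300, 200)),
    ]
    return plan[step] if step < len(plan) else None


def run_task(browser, policy, max_steps: int = 10):
    """Observe, choose an action, act, and report progress until the policy stops."""
    updates = []
    for step in range(max_steps):
        observation = browser.screenshot()
        action = policy(observation, step)
        if action is None:
            updates.append("done")
            break
        name, args = action
        getattr(browser, name)(*args)  # dispatch to navigate/click by name
        updates.append(f"step {step + 1}: {name}")
    return updates


browser = FakeBrowser()
updates = run_task(browser, scripted_policy)
print(updates)
```

In the real agent the model chooses each action from the latest screenshot, so the number of steps varies with the task. The max_steps cap here is an illustrative safeguard, not a documented setting.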
Other Example Tasks
- Book hotel reservations
- Search for products online
- Fill out forms
- Navigate complex websites
- Research information across multiple pages
Technical Details
- Model: Uses Google's gemini-2.5-computer-use-preview-10-2025 model for computer use capabilities
- Browser: Automated Chromium browser via Playwright
- Screen Size: Configured for 600x800 resolution
- Tools: Uses ComputerUseToolset for screen capture, clicking, typing, and scrolling
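For a rough idea of what the listed actions look like at the Playwright level, here is a minimal sketch using Playwright's synchronous Python API. It is not the code in playwright.py. The function name demo_actions, the example.com URL, the click coordinates, and the reading of "600x800" as width by height are all assumptions made for illustration.

```python
# Assumption: "600x800" is read as width x height.
VIEWPORT = {"width": 600, "height": 800}


def demo_actions(url: str = "https://example.com") -> bytes:
    """Open a page, click, type, and scroll, then return a PNG screenshot."""
    # Imported lazily so this file loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(viewport=VIEWPORT)
        page.goto(url)
        page.mouse.click(300, 400)   # click at pixel coordinates (illustrative)
        page.keyboard.type("hello")  # type into whatever element has focus
        page.mouse.wheel(0, 200)     # scroll down by 200 pixels
        png = page.screenshot()      # PNG bytes of the current viewport
        browser.close()
    return png
```

Calling demo_actions() requires the setup steps above to have completed, since it launches the Chromium build that Playwright installs.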
Troubleshooting
If you encounter issues:
- Playwright not found: Make sure you've run both playwright install-deps chromium and playwright install chromium
- Dependencies missing: Verify all packages from requirements.txt are installed
- Browser crashes: Check that your system supports Chromium and has sufficient resources
- Permission errors: Ensure your user has permission to run browser automation tools
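A quick preflight check can narrow down the first two items before you dig further. The sketch below uses only the Python standard library. The function name check_setup is hypothetical, and the check is deliberately shallow: it confirms that the playwright package is importable and that the playwright and adk commands are on PATH, but it does not verify that the Chromium binary itself has been downloaded.

```python
import importlib.util
import shutil


def check_setup() -> dict:
    """Return a small report on whether the required tools appear installed."""
    return {
        # True if the `playwright` Python package can be found for import.
        "playwright_package": importlib.util.find_spec("playwright") is not None,
        # True if the `playwright` command is on PATH.
        "playwright_cli": shutil.which("playwright") is not None,
        # True if the `adk` command is on PATH.
        "adk_cli": shutil.which("adk") is not None,
    }


report = check_setup()
for name, ok in report.items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If any entry prints MISSING, rerun the matching step in Setup inside the same Python environment you use to launch adk web.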
Notes
- The agent operates in a controlled browser environment
- The agent takes screenshots to determine the current state of the page
- The agent will provide updates on its actions as it works
- Be patient as complex tasks may take some time to complete