Computer Use Agent

This directory contains a computer use agent that can operate a browser to complete user tasks. The agent uses Playwright to control a Chromium browser and can interact with web pages by taking screenshots, clicking, typing, and navigating.

This agent demonstrates how to use ComputerUseToolset.

Overview

The computer use agent consists of:

  • agent.py: Main agent configuration using Google's gemini-2.5-computer-use-preview-10-2025 model
  • playwright.py: Playwright-based computer implementation for browser automation
  • requirements.txt: Python dependencies

Setup

1. Install Python Dependencies

Install the required Python packages from the requirements file:

uv pip install -r internal/samples/computer_use/requirements.txt

2. Install Playwright Dependencies

Install Playwright's system dependencies for Chromium:

playwright install-deps chromium

3. Install Chromium Browser

Install the Chromium browser for Playwright:

playwright install chromium
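
The three setup steps can also be scripted. The following is a minimal sketch, assuming `uv` and `playwright` are already on your PATH and that you pass in the requirements path from step 1. The helper names `setup_commands` and `run_setup` are hypothetical and not part of this sample.

```python
import subprocess


def setup_commands(requirements):
    """Return the commands from setup steps 1-3, in order."""
    return [
        ["uv", "pip", "install", "-r", requirements],
        ["playwright", "install-deps", "chromium"],
        ["playwright", "install", "chromium"],
    ]


def run_setup(requirements, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in setup_commands(requirements):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(cmd, check=True)


# Dry run with a placeholder path: prints the commands, executes nothing.
run_setup("path/to/computer_use/requirements.txt")
```

Calling `run_setup(path, dry_run=False)` would execute the steps in order and stop at the first failure, which is useful because step 3 depends on step 1 having installed Playwright.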

Usage

Running the Agent

To start the computer use agent, run the following command from the project root:

adk web internal/samples

This will start the ADK web interface where you can interact with the computer_use agent.

Example Queries

Once the agent is running, you can send queries like:

Find me a flight from SF to Hawaii next Monday, coming back next Friday. Start by navigating directly to flights.google.com

The agent will:

  1. Open a browser window
  2. Navigate to the specified website
  3. Interact with the page elements to complete your task
  4. Provide updates on its progress
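
The steps above amount to an observe-and-act loop. The sketch below illustrates that loop with stand-ins only: `FakeBrowser`, `scripted_model`, and `run_task` are hypothetical names, no real browser or model is involved, and this is not how ComputerUseToolset is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Stand-in for a browser: records state instead of rendering pages."""
    url: str = "about:blank"
    typed: list = field(default_factory=list)

    def screenshot(self):
        # A real implementation would return image bytes; this returns a summary.
        return f"url={self.url} typed={self.typed}"

    def navigate(self, url):
        self.url = url

    def type_text(self, text):
        self.typed.append(text)


def scripted_model(step, screenshot):
    """Stand-in for the model: returns a fixed action for each step."""
    script = [
        ("navigate", "https://flights.google.com"),
        ("type_text", "SF to Hawaii"),
        ("done", None),
    ]
    return script[step]


def run_task(browser, model, max_steps=10):
    """Observe the page, ask the model for an action, apply it, record progress."""
    updates = []
    for step in range(max_steps):
        action, arg = model(step, browser.screenshot())
        if action == "done":
            updates.append("done")
            break
        # Dispatch the action name to the matching browser method.
        getattr(browser, action)(arg)
        updates.append(f"{action}: {arg}")
    return updates


browser = FakeBrowser()
for line in run_task(browser, scripted_model):
    print(line)
```

The `max_steps` cap is a design choice for the sketch: it keeps a loop driven by model output from running indefinitely if the model never signals completion.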

Other Example Tasks

  • Book hotel reservations
  • Search for products online
  • Fill out forms
  • Navigate complex websites
  • Research information across multiple pages

Technical Details

  • Model: Uses Google's gemini-2.5-computer-use-preview-10-2025 model for computer use capabilities
  • Browser: Automated Chromium browser via Playwright
  • Screen Size: Configured for 600x800 resolution
  • Tools: Uses ComputerUseToolset for screen capture, clicking, typing, and scrolling

Troubleshooting

If you encounter issues:

  1. Playwright not found: Make sure you've run both playwright install-deps chromium and playwright install chromium
  2. Dependencies missing: Verify all packages from requirements.txt are installed
  3. Browser crashes: Check that your system supports Chromium and has sufficient resources
  4. Permission errors: Ensure your user has permission to run browser automation tools
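
The first two checks can be partly automated. This is a minimal sketch using only the standard library; the function name `preflight` is hypothetical. It checks only that the `playwright` package is importable and that the `playwright` command is on your PATH. It does not verify that Chromium itself was downloaded.

```python
import importlib.util
import shutil


def preflight(modules=("playwright",), executables=("playwright",)):
    """Report which Python modules are importable and which CLIs are on PATH."""
    report = {}
    for name in modules:
        # find_spec returns None for a top-level module that is not installed.
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    for name in executables:
        # shutil.which returns None when the command is not found on PATH.
        report[f"cli:{name}"] = shutil.which(name) is not None
    return report


for check, ok in preflight().items():
    print(f"{check}: {'ok' if ok else 'MISSING'}")
```

If either entry prints MISSING, rerunning the setup steps is a reasonable first thing to try.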

Notes

  • The agent operates in a controlled browser environment
  • Screenshots are taken to help the agent understand the current state
  • The agent will provide updates on its actions as it works
  • Be patient as complex tasks may take some time to complete