Merge https://github.com/google/adk-python/pull/2857
Adds support for invoking Gemma models via the Gemini API endpoint. To support agentic function calling, this change adds callbacks that extract function calls and responses and transform them into user and model messages in the history.
This change is intended to allow developers to explore the use of Gemma models for agentic purposes without requiring local deployment of the models. This should ease the burden of experimentation and testing for developers.
A basic "hello world" style agent example is provided to demonstrate proper functioning of Gemma 3 models inside an Agent container, using the dice roll + prime check framework of similar examples for other models.
## Testing
### Testing Plan
- add and run integration and unit tests
- manual run of example `multi_tool_agent` from quickstart using new `Gemma` model
- manual run of `hello_world_gemma` agent
### Automated Test Results
| Test Command | Results |
|----------------|---------|
| pytest ./tests/unittests | 4386 passed, 2849 warnings in 58.43s |
| pytest ./tests/unittests/models/test_google_llm.py | 100 passed, 4 warnings in 5.83s |
| pytest ./tests/integration/models/test_google_llm.py | 5 passed, 2 warnings in 3.73s |
### Manual Testing
Here is a log of a `multi_tool_agent` run using a locally built wheel and the Gemma model.
```
❯ adk run multi_tool_agent
Log setup complete: /var/folders/bg/_133c0ds2kb7cn699cpmmh_h0061bp/T/agents_log/agent.20250904_152617.log
To access latest log: tail -F /var/folders/bg/_133c0ds2kb7cn699cpmmh_h0061bp/T/agents_log/agent.latest.log
/Users/<redacted>/venvs/adk-quickstart/lib/python3.11/site-packages/google/adk/cli/cli.py:143: UserWarning: [EXPERIMENTAL] InMemoryCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
credential_service = InMemoryCredentialService()
/Users/<redacted>/venvs/adk-quickstart/lib/python3.11/site-packages/google/adk/auth/credential_service/in_memory_credential_service.py:33: UserWarning: [EXPERIMENTAL] BaseCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
super().__init__()
Running agent weather_time_agent, type exit to exit.
[user]: what's the weather like today?
[weather_time_agent]: Which city are you asking about?
[user]: new york
[weather_time_agent]: OK. The weather in New York is sunny with a temperature of 25 degrees Celsius (77 degrees Fahrenheit).
```
Here is a snippet of a DEBUG-level log from the `hello_world_gemma` sample. It shows how function calls are extracted and inserted based on Gemma model interactions:
```
...
2025-09-04 15:32:41,708 - DEBUG - google_llm.py:138 -
LLM Request:
-----------------------------------------------------------
System Instruction:
None
-----------------------------------------------------------
Contents:
{"parts":[{"text":"\n You roll dice and answer questions about the outcome of the dice rolls.\n You can roll dice of different sizes...\n"}],"role":"user"}
{"parts":[{"text":"Hi, introduce yourself."}],"role":"user"}
{"parts":[{"text":"Hello! I am data_processing_agent, a hello world agent that can roll many-sided dice and check if numbers are prime. I'm ready to assist you with those tasks. Let's begin!\n\n\n\n"}],"role":"model"}
{"parts":[{"text":"Roll a die with 100 sides and check if it is prime"}],"role":"user"}
{"parts":[{"text":"{\"args\":{\"sides\":100},\"name\":\"roll_die\"}"}],"role":"model"}
{"parts":[{"text":"Invoking tool `roll_die` produced: `{\"result\": 82}`."}],"role":"user"}
{"parts":[{"text":"{\"args\":{\"nums\":[82]},\"name\":\"check_prime\"}"}],"role":"model"}
{"parts":[{"text":"Invoking tool `check_prime` produced: `{\"result\": \"No prime numbers found.\"}`."}],"role":"user"}
{"parts":[{"text":"The die roll was 82, and it is not a prime number.\n\n\n\n"}],"role":"model"}
{"parts":[{"text":"Roll it again."}],"role":"user"}
-----------------------------------------------------------
Functions:
-----------------------------------------------------------
2025-09-04 15:32:41,708 - INFO - models.py:8165 - AFC is enabled with max remote calls: 10.
2025-09-04 15:32:42,693 - INFO - google_llm.py:180 - Response received from the model.
2025-09-04 15:32:42,693 - DEBUG - google_llm.py:181 -
LLM Response:
-----------------------------------------------------------
Text:
{"args":{"sides":100},"name":"roll_die"}
-----------------------------------------------------------
...
```
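The transformation visible in the log above can be sketched roughly as follows. This is a minimal illustration and not the code added by this change: the helper names `extract_function_call` and `function_response_to_text` are hypothetical, and the JSON shape and the "Invoking tool ..." wording are assumptions read off the log output.

```python
import json
from typing import Any, Optional


def extract_function_call(text: str) -> Optional[dict[str, Any]]:
    """Hypothetical helper: parse model text as a JSON function call.

    Returns {"name": ..., "args": ...} when the text is a JSON object with a
    string "name" and a dict "args"; returns None for ordinary prose.
    """
    try:
        payload = json.loads(text.strip())
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    name = payload.get("name")
    args = payload.get("args", {})
    if not isinstance(name, str) or not isinstance(args, dict):
        return None
    return {"name": name, "args": args}


def function_response_to_text(name: str, response: dict[str, Any]) -> str:
    """Hypothetical helper: render a tool result as user-message text.

    The wording mirrors the log above and is an assumption, not a stable API.
    """
    return f"Invoking tool `{name}` produced: `{json.dumps(response)}`."
```

With helpers like these, a model turn containing only `{"args":{"sides":100},"name":"roll_die"}` would be treated as a call to `roll_die`, and the tool result would be fed back to the model as a plain-text user turn.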
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/adk-python/pull/2857 from douglas-reid:add-gemma-via-api e6d015f6a9ccbcf20ef7a7af8e4bbe1e9a5936b6
PiperOrigin-RevId: 816451001
We updated one of the public methods on AgentEvaluator to take eval metric configurations as a more formal EvalConfig data model.
We also mark the "criteria" field on the method as deprecated.
We also updated some integration test cases.
PiperOrigin-RevId: 814314134
We add a new metric to ADK Eval for evaluating the safety of an agent's response. The actual implementation is delegated to the Vertex Gen AI Eval SDK, so using this metric requires a GCP project.
As part of this change, we created (refactored) a simple facade for the Vertex Gen AI Eval SDK.
PiperOrigin-RevId: 778580406
Additionally, a few other small changes:
* Updated a test fixture to support the latest eval data schema. Somehow I missed doing that previously.
* Updated the `evaluation_generator.py` to use `run_async`, instead of `run`.
* Also, raise an informative error when dependencies required for eval are not installed.
* Also, changed the behavior of the `AgentEvaluator.evaluate` method to run all the evals, instead of failing at the first eval metric failure.
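
The "run all the evals" behavior in the last bullet can be pictured with a small sketch. This is an illustration of the general pattern only, not the actual `AgentEvaluator` code: `MetricResult` and `evaluate_all` are hypothetical names, and the "score at or above threshold passes" rule is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MetricResult:
    """Hypothetical record of one metric's outcome."""

    metric_name: str
    score: float
    threshold: float

    @property
    def passed(self) -> bool:
        # Assumption: a metric passes when its score meets the threshold.
        return self.score >= self.threshold


def evaluate_all(
    metrics: dict[str, tuple[Callable[[], float], float]],
) -> list[MetricResult]:
    """Run every metric first, then report all failures together."""
    results = [
        MetricResult(name, compute(), threshold)
        for name, (compute, threshold) in metrics.items()
    ]
    failures = [r for r in results if not r.passed]
    if failures:
        summary = "; ".join(
            f"{r.metric_name}: {r.score} < {r.threshold}" for r in failures
        )
        raise AssertionError(f"{len(failures)} metric(s) failed: {summary}")
    return results
```

Collecting failures before raising means a single run reports every failing metric, rather than stopping at the first one.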
PiperOrigin-RevId: 775919127