Details:
- We plan on introducing Rubric based metrics in subsequent changes. This change introduces the data model needed that allows agent developer to provide rubrics.
- We also introduce a data model for the config that the eval system has been using for quite some time. It was loosely and informally described as a dictionary of metric names and expected thresholds. In this change, we actually formalize it using a pydantic data model, and extend it allow developers to specify rubrics as a part of their eval config.
What is a rubric based metric?
A rubric based metric is the assessment of a Agent's response (final or intermediate) along some rubric. This evaluation of agent's response significantly differs from the strategy where one has to provide a golden response.
PiperOrigin-RevId: 805488436
These tests verify that `ValueError` is raised when `Runner` is initialized without providing either an `app` instance or both `app_name` and `agent`.
PiperOrigin-RevId: 805427256
Use the A2A Python SDK for client support for A2A Remote clients. This enables A2A based agents that use gRPC or RESTful interfaces, as well as the jsonrpc support. This also simplifies creation of clients and provides simpler mechanisms to inject credentials and observability into the remote agent interactions.
PiperOrigin-RevId: 804711466
Changed default values for `session_service`, `artifact_service`, and `run_config` from instances of mutable classes to `None`. Instances are now created within the function body if the argument is not provided, preventing unexpected shared state across function calls.
PiperOrigin-RevId: 804624564
The system instructions for agent transfer now include a NOTE section that lists all agents available for the `transfer_to_agent` function. This also has the target agents and, if there is one that applies, the parent agent. New unit tests are added to verify the correct generation of this NOTE.
PiperOrigin-RevId: 804569691
Changed default values for `session_service`, `artifact_service`, and `run_config` from instances of mutable classes to `None`. Instances are now created within the function body if the argument is not provided, preventing unexpected shared state across function calls.
PiperOrigin-RevId: 804560641
Merge https://github.com/google/adk-python/pull/1629
close https://github.com/google/adk-python/issues/2170
### Summary
This PR introduces `GkeCodeExecutor`, a new code executor that provides a secure and scalable method for running LLM-generated code by leveraging GKE Sandbox. It serves as a robust alternative to local or standard containerized executors by leveraging the **GKE Sandbox** environment, which uses gVisor for workload isolation.
For each code execution request, it dynamically creates an ephemeral Kubernetes Job with a hardened Pod configuration, offering significant security benefits and ensuring that each code execution runs in a clean, isolated environment.
### Key Features of GkeCodeExecutor
* **Dynamic Job Creation**: Uses the Kubernetes `batch/v1` API to create a new Job for each code snippet.
* **Secure Code Mounting**: Injects code into the Pod via a temporary `ConfigMap`, which is mounted to a read-only file.
* **gVisor Sandboxing**: Enforces execution within a `gvisor` runtime for kernel-level isolation.
* **Hardened Security Context**: Pods run as non-root with all Linux capabilities dropped and a read-only root filesystem.
* **Resource Management**: Applies configurable CPU and memory limits to prevent abuse.
* **Automatic Cleanup**: Uses the `ttl_seconds_after_finished` feature on Jobs for robust, automatic garbage collection of completed Pods and Jobs.
* **Node Scheduling**: The executor uses Kubernetes `tolerations` in its Pod specification. This allows the k8s scheduler to place the execution Pod onto a **_pre-configured_** gVisor-enabled node.
* **Module Integration**: The `GkeCodeExecutor` is registered in the `code_executors/__init__.py`, making it available for use by agents. The `ImportError` handling is configured to check for the required `kubernetes` SDK.
### Execution Flow:
1. Agent invokes `GkeCodeExecutor` with the LLM-generated code.
2. The `GkeCodeExecutor` will `execute_code` – creates a temporary `ConfigMap`, and then create a k8s `Job` to run it.
3. This Job runs a standard `python:3.11-slim` container. The image is pulled once to the node and cached. The Job will mount the ConfigMap as `/app/code.py`
4. The GkeCodeExecutor will monitor the Job to completion, fetch `stdout/stderr` logs from the container, return `CodeExecutionResult` to the LlmAgent, and ensure all temp resources are deleted.
5. The calling agent formats the result and provides a final response to the user. If the result contains error, it will retry up to `error_retry_attempts` times.
PiperOrigin-RevId: 804511467
This includes:
- Test verifying multiple spans are written during E2E runner execution.
- Regression tests for the "ContextVar was created in a different Context" exceptions caused by the interplay of context based instrumentation and async generators getting indeterminately suspended.
PiperOrigin-RevId: 804333483
- Added `tests/unittests/apps/test_apps.py` with basic tests for `App` initialization.
- Modified `tests/unittests/test_runners.py` to include a test that verifies `Runner` raises a `ValueError` when both `app` and `app_name` are provided during initialization.
PiperOrigin-RevId: 803556826
This change introduces type descriptions for the functions which convert between A2A and GenAI `Part`s. It then allows passing instances of those functions to the various A2A-related functions/classes, effectively allowing users to inject their own logic for how part conversion should occur.
The benefit of this pattern is that users can create decorators around the core `Part` conversion logic, which allows them to intercept the cases they care about while delegating the ones they do not to the core converter. This is a pattern we use a lot in the A2A Python SDK.
One example where this type of logic is useful is for extensions: this allows extension logic to, for example, interpret an A2A DataPart into a FunctionResponse using extension-specific logic.
PiperOrigin-RevId: 803186799
The convention:
- If some fields(like plugin) are defined both at root_agent and app, then a error will be raised.
- app code should be located within agent.py.
- an instance named app should be created
PiperOrigin-RevId: 803155804
Before this change, other agent's reply with thought will still be inserted in the outgoing LlmRequest due to the wrong `else` statement for calling all other type of part.
This commit also refactors test_contents.py to be behavior-oriented tests, instead of implementation-oriented, and add more test cases to cover expected scenarios.
The tests are divided into the following files with different focus:
- test_contents.py: covers the basic logic of event filter;
- test_contents_branch.py: covers the behavior related to branch, which takes effect when ParallelAgent is used.
- test_contents_other_agent.py: covers the retelling behavior to include other agents' reply as context for the current agent.
- test_contents_function.py: covers the function_call/function_response rearrangement logic mainly for `LongRunningFunctionTool`.
PiperOrigin-RevId: 802759821
Before this change: `thought` flags was incorrectly removed if the current agent enables BuiltInPlanner.
After this change:
- When it's BuiltInPlanner, keep the thought flag in content history, so that model has full context of its previous thinking.
- When it's PlanReactPlanner, removes the `thought` flag in content history, so that model sees as-is when the content was generated.
PiperOrigin-RevId: 802737130
Merge https://github.com/google/adk-python/pull/2791Fixes#2789
## Summary
Forward `state_delta` from the FastAPI `/run` request to `Runner.run_async(...)`, aligning behavior with the documented
API and the `/run_sse` endpoint.
## Why
The documentation for `/run` explicitly includes:
> `state_delta` (object, optional): A delta of the state to apply before the run.
However, the non‑SSE `/run` handler did not pass this value through, so `Runner.run_async` always received `None`. The
`/run_sse` path already forwarded it correctly.
## Changes
- `src/google/adk/cli/adk_web_server.py`
- Add `state_delta=req.state_delta` to the "/run" handler’s `runner.run_async(...)` call.
- `tests/unittests/cli/test_fast_api.py`
- Add `test_agent_run_passes_state_delta` to test the fix.
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/adk-python/pull/2791 from pguerra-ce:fix-state-delta-missing-in-run 83eec8d28b80757e24ae900285eb59530863adbd
PiperOrigin-RevId: 802703072