fix: Optimize Stale Agent with GraphQL and Search API to resolve 429 Quota errors

Merge https://github.com/google/adk-python/pull/3700

### Description
This PR refactors the `adk_stale_agent` to address `429 RESOURCE_EXHAUSTED` errors encountered during workflow execution. The previous implementation was inefficient in fetching issue history (using pagination over the REST API) and lacked server-side filtering, causing excessive API calls and huge token consumption that breached Gemini API quotas.

The new implementation switches to a **GraphQL-first approach**, implements server-side filtering via the Search API, adds robust concurrency controls, and significantly improves code maintainability through modular refactoring.

### Root Cause of Failure
The previous workflow failed with the following error due to passing too much context to the LLM and processing too many irrelevant issues:
```text
google.genai.errors.ClientError: 429 RESOURCE_EXHAUSTED.
Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_paid_tier_input_token_count
```
### Key Changes

#### 1. Optimization: REST → GraphQL (`agent.py`)
*   **Old:** Fetched issue comments and timeline events using multiple paginated REST API calls (`/timeline`).
*   **New:** Implemented `get_issue_state` using a single **GraphQL** query. This fetches comments, `userContentEdits`, and specific timeline events (Labels, Renames) in one network request.
*   **Refactoring:** The complex analysis logic has been decomposed into focused helper functions (_fetch_graphql_data, _build_history_timeline, _replay_history_to_find_state) for better readability and testing.
*   **Configurable:** Added GRAPHQL_COMMENT_LIMIT and GRAPHQL_TIMELINE_LIMIT settings to tune context depth
*   **Impact:** Drastically reduces the data payload size and eliminates multiple API round-trips, significantly lowering the token count sent to the LLM.

#### 2. Optimization: Server-Side Filtering (`utils.py`)
*   **Old:** Fetched *all* open issues via REST and filtered them in Python memory.
*   **New:** Uses the GitHub Search API (`get_old_open_issue_numbers`) with `created:<DATE` syntax.
*   **Impact:** Only fetches issue numbers that actually meet the age threshold, preventing the agent from wasting cycles and tokens on brand-new issues.

#### 3. Concurrency & Rate Limiting (`main.py` & `settings.py`)
*   **Old:** Sequential execution loop.
*   **New:** Implemented `asyncio.gather` with a configurable `CONCURRENCY_LIMIT` (set to 3).
*   **New:** Added `urllib3` retry strategies (exponential backoff) in `utils.py` to handle GitHub API rate limits (HTTP 429) gracefully.

#### 4. Logic Improvements ("Ghost Edits")
*   **New Feature:** The agent now detects "Ghost Edits" (where an author updates the issue description without posting a new comment).
*   **Action:** If a silent edit is detected on a stale candidate, the agent now alerts maintainers instead of marking it stale, preventing false positives.

### File Comparison Summary

| File | Change |
| :--- | :--- |
| `main.py` | Switched from `InMemoryRunner` loop to `asyncio` chunked processing. Added execution timing and API usage logging. |
| `agent.py` | Replaced REST logic with GraphQL query. Added logic to handle silent body edits. Decomposed giant get_issue_state into helper functions with docstrings. Added _format_days helper. |
| `utils.py` | Added `HTTPAdapter` with Retries. Added `get_old_open_issue_numbers` using Search API. |
| `settings.py` | Removed `ISSUES_PER_RUN`; added configuration for CONCURRENCY_LIMIT, SLEEP_BETWEEN_CHUNKS, and GraphQL limits. |
| `PROMPT_INSTRUCTIONS.txt` | Simplified decision tree; removed date calculation responsibility from LLM. |

### Verification
The new logic minimizes token usage by offloading date calculations to Python and strictly limiting the context passed to the LLM to semantic intent analysis (e.g., "Is this a question?").

*   **Metric Check:** The workflow now tracks API calls per issue to ensure we stay within limits.
*   **Safety:** Silent edits by users now correctly reset the "Stale" timer.
*   **Maintainability:** All complex logic is now isolated in typed helper functions with comprehensive docstrings.

Co-authored-by: Xuan Yang <xygoogle@google.com>
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/adk-python/pull/3700 from ryanaiagent:feat/improve-stale-agent 888064eff125ae74f7c3a9ad6c74f98de80243a2
PiperOrigin-RevId: 838885530
This commit is contained in:
Rohit Yanamadala
2025-12-01 12:25:22 -08:00
committed by Copybara-Service
parent 2a1a41d3ec
commit cb19d0714c
7 changed files with 944 additions and 407 deletions
+6 -20
View File
@@ -1,57 +1,43 @@
# .github/workflows/stale-issue-auditor.yml
# Best Practice: Always have a 'name' field at the top.
name: ADK Stale Issue Auditor
# The 'on' block defines the triggers.
on:
# The 'workflow_dispatch' trigger allows manual runs.
workflow_dispatch:
# The 'schedule' trigger runs the bot on a timer.
schedule:
# This runs at 6:00 AM UTC (e.g., 10 PM PST).
# This runs at 6:00 AM UTC (10 PM PST)
- cron: '0 6 * * *'
# The 'jobs' block contains the work to be done.
jobs:
# A unique ID for the job.
audit-stale-issues:
# The runner environment.
runs-on: ubuntu-latest
timeout-minutes: 60
# Permissions for the job's temporary GITHUB_TOKEN.
# These are standard and syntactically correct.
permissions:
issues: write
contents: read
# The sequence of steps for the job.
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v5
- name: Set up Python
uses: actions/setup-python@v5
uses: actions/setup-python@v6
with:
python-version: '3.11'
- name: Install dependencies
# The '|' character allows for multi-line shell commands.
run: |
python -m pip install --upgrade pip
pip install requests google-adk
- name: Run Auditor Agent Script
# The 'env' block for setting environment variables.
env:
GITHUB_TOKEN: ${{ secrets.ADK_TRIAGE_AGENT }}
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
OWNER: google
OWNER: ${{ github.repository_owner }}
REPO: adk-python
ISSUES_PER_RUN: 100
CONCURRENCY_LIMIT: 3
LLM_MODEL_NAME: "gemini-2.5-flash"
PYTHONPATH: contributing/samples
# The final 'run' command.
run: python -m adk_stale_agent.main