fix: Optimize Stale Agent with GraphQL and Search API to resolve 429 Quota errors

Merge https://github.com/google/adk-python/pull/3700

### Description

This PR refactors the `adk_stale_agent` to address `429 RESOURCE_EXHAUSTED` errors encountered during workflow execution.

The previous implementation was inefficient in fetching issue history (using pagination over the REST API) and lacked server-side filtering, causing excessive API calls and huge token consumption that breached Gemini API quotas.

The new implementation switches to a **GraphQL-first approach**, implements server-side filtering via the Search API, adds robust concurrency controls, and significantly improves code maintainability through modular refactoring.

### Root Cause of Failure

The previous workflow failed with the following error due to passing too much context to the LLM and processing too many irrelevant issues:

```text
google.genai.errors.ClientError: 429 RESOURCE_EXHAUSTED.
Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_paid_tier_input_token_count
```

### Key Changes

#### 1. Optimization: REST → GraphQL (`agent.py`)

* **Old:** Fetched issue comments and timeline events using multiple paginated REST API calls (`/timeline`).
* **New:** Implemented `get_issue_state` using a single **GraphQL** query. This fetches comments, `userContentEdits`, and specific timeline events (Labels, Renames) in one network request.
* **Refactoring:** The complex analysis logic has been decomposed into focused helper functions (`_fetch_graphql_data`, `_build_history_timeline`, `_replay_history_to_find_state`) for better readability and testing.
* **Configurable:** Added `GRAPHQL_COMMENT_LIMIT` and `GRAPHQL_TIMELINE_LIMIT` settings to tune context depth.
* **Impact:** Drastically reduces the data payload size and eliminates multiple API round-trips, significantly lowering the token count sent to the LLM.

#### 2. Optimization: Server-Side Filtering (`utils.py`)

* **Old:** Fetched *all* open issues via REST and filtered them in Python memory.
* **New:** Uses the GitHub Search API (`get_old_open_issue_numbers`) with `created:<DATE` syntax.
* **Impact:** Only fetches issue numbers that actually meet the age threshold, preventing the agent from wasting cycles and tokens on brand-new issues.

#### 3. Concurrency & Rate Limiting (`main.py` & `settings.py`)

* **Old:** Sequential execution loop.
* **New:** Implemented `asyncio.gather` with a configurable `CONCURRENCY_LIMIT` (set to 3).
* **New:** Added `urllib3` retry strategies (exponential backoff) in `utils.py` to handle GitHub API rate limits (HTTP 429) gracefully.

#### 4. Logic Improvements ("Ghost Edits")

* **New Feature:** The agent now detects "Ghost Edits" (where an author updates the issue description without posting a new comment).
* **Action:** If a silent edit is detected on a stale candidate, the agent now alerts maintainers instead of marking it stale, preventing false positives.

### File Comparison Summary

| File | Change |
| :--- | :--- |
| `main.py` | Switched from `InMemoryRunner` loop to `asyncio` chunked processing. Added execution timing and API usage logging. |
| `agent.py` | Replaced REST logic with GraphQL query. Added logic to handle silent body edits. Decomposed giant `get_issue_state` into helper functions with docstrings. Added `_format_days` helper. |
| `utils.py` | Added `HTTPAdapter` with Retries. Added `get_old_open_issue_numbers` using Search API. |
| `settings.py` | Removed `ISSUES_PER_RUN`; added configuration for `CONCURRENCY_LIMIT`, `SLEEP_BETWEEN_CHUNKS`, and GraphQL limits. |
| `PROMPT_INSTRUCTIONS.txt` | Simplified decision tree; removed date calculation responsibility from LLM. |

### Verification

The new logic minimizes token usage by offloading date calculations to Python and strictly limiting the context passed to the LLM to semantic intent analysis (e.g., "Is this a question?").

* **Metric Check:** The workflow now tracks API calls per issue to ensure we stay within limits.
* **Safety:** Silent edits by users now correctly reset the "Stale" timer.
* **Maintainability:** All complex logic is now isolated in typed helper functions with comprehensive docstrings.

Co-authored-by: Xuan Yang <xygoogle@google.com>
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/adk-python/pull/3700 from ryanaiagent:feat/improve-stale-agent 888064eff125ae74f7c3a9ad6c74f98de80243a2
PiperOrigin-RevId: 838885530
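
### Illustrative sketch: chunked concurrency

The chunked `asyncio.gather` processing described under "Concurrency & Rate Limiting" could look roughly like the sketch below. This is not the code from `main.py`: the names `process_in_chunks` and `fake_audit` are hypothetical, the worker is a stand-in for the real per-issue agent call, and the `SLEEP_BETWEEN_CHUNKS` value is an assumption (only `CONCURRENCY_LIMIT: 3` appears in the workflow).

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

CONCURRENCY_LIMIT = 3       # value set in the workflow's env block
SLEEP_BETWEEN_CHUNKS = 0.0  # assumed value; the real default is not shown here


async def process_in_chunks(
    issue_numbers: Sequence[int],
    worker: Callable[[int], Awaitable[str]],
    chunk_size: int = CONCURRENCY_LIMIT,
    pause_seconds: float = SLEEP_BETWEEN_CHUNKS,
) -> List[str]:
    """Run `worker` over all issues, at most `chunk_size` at a time."""
    results: List[str] = []
    for start in range(0, len(issue_numbers), chunk_size):
        chunk = issue_numbers[start:start + chunk_size]
        # gather() runs the whole chunk concurrently and preserves input order.
        results.extend(await asyncio.gather(*(worker(n) for n in chunk)))
        # Optionally pause between chunks (but not after the last one).
        if pause_seconds and start + chunk_size < len(issue_numbers):
            await asyncio.sleep(pause_seconds)
    return results


async def fake_audit(issue_number: int) -> str:
    """Hypothetical stand-in for the real per-issue audit."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"issue #{issue_number}: audited"


if __name__ == "__main__":
    print(asyncio.run(process_in_chunks([101, 102, 103, 104], fake_audit)))
```

Processing in fixed-size chunks keeps the number of in-flight requests bounded, which is the property the PR relies on to stay under API quotas; a semaphore would be an alternative way to get the same bound.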
2026-03-30 10:57:20 -07:00 · 2025-12-01 12:25:22 -08:00
parent 2a1a41d3ec
commit cb19d0714c
7 changed files with 944 additions and 407 deletions
@@ -1,57 +1,43 @@
-# .github/workflows/stale-issue-auditor.yml
-
-# Best Practice: Always have a 'name' field at the top.
 name: ADK Stale Issue Auditor

-# The 'on' block defines the triggers.
 on:
-  # The 'workflow_dispatch' trigger allows manual runs.
  workflow_dispatch:

-  # The 'schedule' trigger runs the bot on a timer.
  schedule:
-    # This runs at 6:00 AM UTC (e.g., 10 PM PST).
+    # This runs at 6:00 AM UTC (10 PM PST)
    - cron: '0 6 * * *'

-# The 'jobs' block contains the work to be done.
 jobs:
-  # A unique ID for the job.
  audit-stale-issues:
-    # The runner environment.
    runs-on: ubuntu-latest
+    timeout-minutes: 60

-    # Permissions for the job's temporary GITHUB_TOKEN.
-    # These are standard and syntactically correct.
    permissions:
      issues: write
      contents: read

-    # The sequence of steps for the job.
    steps:
      - name: Checkout repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5

      - name: Set up Python
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@v6
        with:
          python-version: '3.11'

      - name: Install dependencies
-        # The '|' character allows for multi-line shell commands.
        run: |
          python -m pip install --upgrade pip
          pip install requests google-adk

      - name: Run Auditor Agent Script
-        # The 'env' block for setting environment variables.
        env:
          GITHUB_TOKEN: ${{ secrets.ADK_TRIAGE_AGENT }}
          GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
-          OWNER: google
+          OWNER: ${{ github.repository_owner }}
          REPO: adk-python
-          ISSUES_PER_RUN: 100
+          CONCURRENCY_LIMIT: 3
          LLM_MODEL_NAME: "gemini-2.5-flash"
          PYTHONPATH: contributing/samples

-        # The final 'run' command.
        run: python -m adk_stale_agent.main