LT Token Keeper Runbook

CRITICAL: lt-token-keeper runs in parallel with Python aegis-token-keeper-live during the G4 migration observation phase. The two keepers write to DIFFERENT token cache files to avoid race conditions. Do NOT remove the AEGIS_SAXO_TOKEN_CACHE_PATH override in docker-compose.lt-token-keeper.yml until the G4 Day 4 cutover.

What It Is

lt-token-keeper is the Rust-native Saxo OAuth2 refresh daemon. It replaces the Python scripts/token_keeper.py that runs in the aegis-token-keeper-live container, and it is the G4 phase of the Python → Rust trading infrastructure migration.

The daemon loop:

  1. Load the token cache file via auth::load_token_cache
  2. Classify the token freshness (AccessValid / NeedsRefresh / Expired24hToken)
  3. If NeedsRefresh, call auth::refresh_token with the Saxo OAuth endpoint and save the new token via auth::save_token_cache (atomic tmp + rename)
  4. Sleep interval_secs (default 900s from YAML)
  5. Repeat until SIGTERM or SIGINT
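The "atomic tmp + rename" save in step 3 is what keeps readers from ever seeing a half-written cache. The following is a minimal Python sketch of that general technique, not the Rust auth::save_token_cache implementation. The function name save_token_cache and the temp-file prefix are illustrative assumptions.

```python
import json
import os
import tempfile


def save_token_cache(path: str, token: dict) -> None:
    """Write `token` as JSON to `path` so readers never see a partial file.

    Hypothetical helper for illustration; it only mirrors the idea of
    "write to a temp file, then rename over the target".
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem) as the
    # target, otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".token-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(token, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        # os.replace is atomic on POSIX: readers see the old or the new file.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the rename is the only step that touches the target path, a crash mid-write leaves the previous cache file intact.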

On errors:

  • invalid_grant: log ERROR "Manual re-authorization required" and continue
  • Any other refresh error: log ERROR and continue
  • Never exits except on signal
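The error policy above (log, keep looping, stop only on a signal) can be sketched in Python as follows. This is an illustration of the control flow only. InvalidGrant, run_loop, and install_signal_handlers are hypothetical names, and the real daemon is written in Rust.

```python
import logging
import signal
import threading

log = logging.getLogger("keeper-sketch")


class InvalidGrant(Exception):
    """Hypothetical error type standing in for an OAuth invalid_grant response."""


def run_loop(refresh_once, interval_secs: float, stop: threading.Event) -> int:
    """Call refresh_once() every interval_secs until `stop` is set.

    Errors are logged and the loop keeps going; only `stop` ends it.
    Returns the number of completed iterations.
    """
    iterations = 0
    while not stop.is_set():
        try:
            refresh_once()
        except InvalidGrant:
            log.error("Manual re-authorization required")
        except Exception as exc:  # any other refresh error: log and continue
            log.error("refresh failed: %s", exc)
        iterations += 1
        # wait() doubles as an interruptible sleep: it returns early when a
        # signal handler sets the event.
        stop.wait(interval_secs)
    return iterations


def install_signal_handlers(stop: threading.Event) -> None:
    """Set `stop` on SIGTERM or SIGINT (must be called from the main thread)."""
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, lambda signum, frame: stop.set())
```

Using an event wait instead of a plain sleep means a SIGTERM during the 900s pause ends the loop promptly rather than after the full interval.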

Prerequisites

Central Synology .env at /volume1/aegis/.env MUST provide:

  • SAXOBANK_APP_KEY_LIVE — OAuth client ID (live environment)
  • SAXOBANK_APP_SECRET_LIVE — OAuth client secret (live environment)

The daemon does NOT need:

  • SAXOBANK_ACCESS_TOKEN_LIVE: the refresh_token from the cache file is used instead
  • SAXOBANK_ACCOUNT_*_LIVE: refresh does not require account context

Paths that MUST exist on Synology:

  • /volume1/aegis/.env (must contain the two OAuth vars above)
  • /volume1/aegis/tokens/saxobank_tokens_live.json (canonical cache, managed by Python aegis-token-keeper-live during observation)
  • /volume1/aegis/tokens/saxobank_tokens_live_rust.json (parallel observation cache, bootstrapped by the deploy workflow on first run)
  • /volume1/aegis/repo/aegis_v3/configs/lt_token_keeper.yaml (daemon config)
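A quick preflight check of these prerequisites could look like the sketch below. It assumes the .env file uses plain KEY=value lines (optionally prefixed with "export"). The function names are hypothetical and not part of the deploy workflow.

```python
import os

# The two OAuth variables this runbook lists as mandatory.
REQUIRED_ENV_KEYS = ("SAXOBANK_APP_KEY_LIVE", "SAXOBANK_APP_SECRET_LIVE")


def parse_env_keys(text: str) -> set:
    """Return the variable names that have a non-empty value in .env-style text."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        name, value = line.split("=", 1)
        name = name.strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        # Treat KEY= and KEY="" as missing.
        if value.strip().strip("'\""):
            keys.add(name)
    return keys


def missing_env_keys(text: str, required=REQUIRED_ENV_KEYS) -> list:
    """Return the required keys that are absent or empty, in declared order."""
    present = parse_env_keys(text)
    return [key for key in required if key not in present]


def missing_paths(paths) -> list:
    """Return the paths that do not exist on this host."""
    return [path for path in paths if not os.path.exists(path)]
```

On the Synology host, one would read /volume1/aegis/.env, pass its text to missing_env_keys, and pass the four paths listed above to missing_paths. Both should return an empty list before starting the daemon.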

How To Start

Preferred: trigger the dedicated workflow.

gh workflow run deploy-lt-token-keeper.yml

Manual (operator on Synology, emergency only):

cd /volume1/aegis/repo/aegis_v3/lt-rust-docker
sudo /usr/local/bin/docker compose \
    --env-file /volume1/aegis/.env \
    -f docker-compose.lt-token-keeper.yml \
    up -d --build aegis-lt-token-keeper

How To Stop

cd /volume1/aegis/repo/aegis_v3/lt-rust-docker
sudo /usr/local/bin/docker compose \
    -f docker-compose.lt-token-keeper.yml stop aegis-lt-token-keeper

Delete the container entirely:

cd /volume1/aegis/repo/aegis_v3/lt-rust-docker
sudo /usr/local/bin/docker compose \
    -f docker-compose.lt-token-keeper.yml rm -f aegis-lt-token-keeper

How To Verify It Is Working

Container status (read-only SSH OK):

ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker ps --format 'table {{.Names}}\t{{.Status}}' | grep aegis-lt-token-keeper"

Recent logs:

ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker logs aegis-lt-token-keeper --tail 100 2>&1"

Token cache mtime progression (should update every interval_secs = 900s by default):

ssh fukutani.ryo@192.168.42.252 "stat -c '%Y %n' /volume1/aegis/tokens/saxobank_tokens_live_rust.json"
# Compare with previous value — should increase by ~interval_secs each observation
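Instead of comparing two mtime samples by hand, the age of the cache file can be compared against the refresh interval. This is a minimal sketch, assuming it runs on the host that holds the file. The 60s slack is an arbitrary illustrative allowance, not a value taken from the daemon config.

```python
import os
import time


def cache_age_secs(path: str, now: float = None) -> float:
    """Seconds elapsed since the cache file was last modified."""
    now = time.time() if now is None else now
    return now - os.stat(path).st_mtime


def is_stale(age_secs: float, interval_secs: int = 900, slack_secs: int = 60) -> bool:
    """True when the file is older than one refresh interval plus some slack."""
    return age_secs > interval_secs + slack_secs
```

For example, is_stale(cache_age_secs("/volume1/aegis/tokens/saxobank_tokens_live_rust.json")) returning True suggests the keeper has missed at least one cycle and the container logs are worth checking.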

Token expiry inspection:

ssh fukutani.ryo@192.168.42.252 "cat /volume1/aegis/tokens/saxobank_tokens_live_rust.json | python3 -c 'import json,sys; d=json.load(sys.stdin); print(\"issued_at=%s expires_in=%s\" % (d.get(\"issued_at\"), d.get(\"expires_in\")))'"
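The two fields printed above can be turned into a remaining lifetime. The sketch below assumes issued_at is either Unix epoch seconds or an ISO 8601 timestamp, and that expires_in is a lifetime in seconds. The runbook does not state the exact format, so check it against a real cache file before relying on this.

```python
from datetime import datetime, timezone


def issued_at_epoch(value) -> float:
    """Convert issued_at to epoch seconds (assumed formats: number or ISO 8601)."""
    if isinstance(value, (int, float)):
        return float(value)
    # fromisoformat() before Python 3.11 does not accept a trailing "Z".
    text = str(value).replace("Z", "+00:00")
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assume UTC when naive
    return parsed.timestamp()


def seconds_remaining(cache: dict, now: float) -> float:
    """Seconds until the access token expires; negative means already expired."""
    return issued_at_epoch(cache["issued_at"]) + float(cache["expires_in"]) - now
```

Pass time.time() as now for a live check. A value that stays negative across several refresh intervals indicates the keeper is not refreshing.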

Parallel Observation (G4 Day 2-3)

During parallel observation, BOTH keepers run simultaneously:

  • Python aegis-token-keeper-live writes /volume1/aegis/tokens/saxobank_tokens_live.json (canonical)
  • Rust aegis-lt-token-keeper writes /volume1/aegis/tokens/saxobank_tokens_live_rust.json (parallel)

Other Rust containers (aegis-lt-rust, aegis-lt-scan-cycle, future aegis-lt-wft) continue to read the canonical Python-managed cache. The Rust keeper is NOT yet the source of truth.

Observation acceptance criteria

After 24h of parallel running, the two keepers should satisfy all of the following:

  • saxobank_tokens_live.json mtime updates every ~300s (Python, 5min interval)
  • saxobank_tokens_live_rust.json mtime updates every ~900s (Rust, 15min interval)
  • Both files contain valid JSON with a non-empty access_token
  • No invalid_grant errors in either container log
  • issued_at timestamps monotonically increase on both files
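Some of these criteria can be checked mechanically from samples collected during the observation window. The sketch below assumes the operator has already gathered the raw file contents, a list of mtime samples, and a list of numeric issued_at values. The helper names and the 60s slack are illustrative assumptions.

```python
import json


def has_access_token(raw: str) -> bool:
    """True when `raw` is a JSON object with a non-empty access_token."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and bool(data.get("access_token"))


def never_decreases(samples) -> bool:
    """True when no sample is smaller than the one before it.

    Repeated values are allowed because two samples can fall inside the
    same refresh interval.
    """
    return all(later >= earlier for earlier, later in zip(samples, samples[1:]))


def gaps_within(mtimes, interval_secs: int, slack_secs: int = 60) -> bool:
    """True when consecutive distinct mtimes are at most interval + slack apart."""
    distinct = sorted(set(mtimes))
    return all(b - a <= interval_secs + slack_secs
               for a, b in zip(distinct, distinct[1:]))
```

For the Python keeper, gaps_within would be called with interval_secs=300, and for the Rust keeper with interval_secs=900. The invalid_grant criterion still needs a manual look at both container logs.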

Cutover Path (G4 Day 4, NOT active yet)

Switching from parallel observation to Rust-as-source-of-truth requires:

  1. Pre-cutover verification
    • 24h clean parallel observation (see criteria above)
    • Zero 401 errors in lt-scan-cycle logs attributable to a stale token
    • WORK_LOG entry with explicit cutover approval

  2. Cutover sequence (hard cutover, not gradual)

    # a. Stop Python keeper FIRST
    ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker stop aegis-token-keeper-live"

    # b. Wait for Python's last write to flush (atomic rename is immediate but
    #    any in-flight refresh call may take 5-10s)
    sleep 15

    # c. Update docker-compose.lt-token-keeper.yml to REMOVE the override
    #    (commit the change to main first, then trigger deploy workflow)
    #    Edit: delete the line
    #      - AEGIS_SAXO_TOKEN_CACHE_PATH=/data/tokens/saxobank_tokens_live_rust.json
    #    Commit + push + wait for GHA deploy to succeed

    # d. After cutover deploy, the Rust keeper writes the canonical
    #    /data/tokens/saxobank_tokens_live.json
    #    (container_token_cache_path fallback triggers when no override is set)

    # e. Remove the Python token keeper container to prevent accidental restart.
    #    Note: docker rm deletes the container, not the image. After this step
    #    the "docker start" rollback below fails until the container is recreated.
    ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker rm aegis-token-keeper-live"

  3. Post-cutover monitoring (24h on-call)
    • Watch lt-scan-cycle for 401 errors
    • Watch saxobank_tokens_live.json mtime updates (should now come from Rust every 900s)
    • If any issue: roll back per below

Rollback

During parallel observation

Safe rollback — Rust keeper can be stopped independently without affecting Python or any downstream container:

ssh fukutani.ryo@192.168.42.252 \
  "cd /volume1/aegis/repo/aegis_v3/lt-rust-docker && \
   sudo /usr/local/bin/docker compose \
     -f docker-compose.lt-token-keeper.yml stop aegis-lt-token-keeper"

After cutover

Post-cutover rollback requires restarting Python keeper:

# a. Stop Rust keeper
ssh fukutani.ryo@192.168.42.252 \
  "cd /volume1/aegis/repo/aegis_v3/lt-rust-docker && \
   sudo /usr/local/bin/docker compose \
     -f docker-compose.lt-token-keeper.yml stop aegis-lt-token-keeper"

# b. Start Python keeper back up (the stopped container must still exist;
#    docker start cannot revive a container that was removed with docker rm)
ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker start aegis-token-keeper-live"

# c. Verify Python keeper resumed refreshing the canonical cache
ssh fukutani.ryo@192.168.42.252 \
  "sudo /usr/local/bin/docker logs aegis-token-keeper-live --tail 20 2>&1 | grep -i 'token valid\\|refreshed'"

Total rollback time: < 1 minute (if the Python keeper container still exists).

Troubleshooting

"Token cache missing" log at startup

  • Expected on first deploy BEFORE the bootstrap step runs
  • GHA workflow handles this by copying the canonical file once
  • If seen on a subsequent deploy, check the bootstrap step logs

invalid_grant error

  • Means the refresh_token has been revoked at Saxo
  • Requires manual re-authorization via the Saxo OAuth flow
  • Daemon continues running (does not exit) so operator has time to act

Rust keeper refresh errors but Python is fine

  • Check SAXOBANK_APP_KEY_LIVE / SAXOBANK_APP_SECRET_LIVE in /volume1/aegis/.env
  • Rust uses the same env vars as Python so they should match
  • If Rust fails and Python succeeds, the issue is likely container-local (env not loaded, typo)

mtime not updating

  • Check container is running: docker ps | grep aegis-lt-token-keeper
  • Check logs for refresh failures
  • Check YAML interval_secs hasn't been set to an unreasonable value