LT Token Keeper Runbook¶

CRITICAL: lt-token-keeper runs in parallel with Python aegis-token-keeper-live during the G4 migration observation phase. The two keepers write to DIFFERENT token cache files to avoid race conditions. Do NOT remove the AEGIS_SAXO_TOKEN_CACHE_PATH override in docker-compose.lt-token-keeper.yml until the G4 Day 4 cutover.

What It Is¶

lt-token-keeper is the Rust-native Saxo OAuth2 refresh daemon. It replaces the Python scripts/token_keeper.py that runs in the aegis-token-keeper-live container, and it is the G4 phase of the Python → Rust trading infrastructure migration.

The daemon loop:

1. Load the token cache file via auth::load_token_cache
2. Classify the token freshness (AccessValid / NeedsRefresh / Expired24hToken)
3. If NeedsRefresh, call auth::refresh_token with the Saxo OAuth endpoint and save the new token via auth::save_token_cache (atomic tmp + rename)
4. Sleep interval_secs (default 900s from YAML)
5. Repeat until SIGTERM or SIGINT

On errors:

- invalid_grant: log ERROR "Manual re-authorization required" and continue
- Any other refresh error: log ERROR and continue
- The daemon never exits except on signal
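The "atomic tmp + rename" save in step 3 of the daemon loop is what keeps readers from ever seeing a half-written cache file. The sketch below illustrates the pattern in shell; it is not the daemon's Rust code, and the function name atomic_write is hypothetical.

```shell
#!/bin/sh
# Sketch of the tmp + rename pattern; NOT the daemon's Rust implementation.
# atomic_write TARGET: read new content from stdin and replace TARGET in one step.
atomic_write() {
    target=$1
    # The temp file must live in the same directory (same filesystem) as the
    # target; otherwise mv falls back to copy + delete and is no longer atomic.
    tmp=$(mktemp "$(dirname "$target")/.$(basename "$target").XXXXXX") || return 1
    if cat > "$tmp"; then
        # rename(2) swaps the directory entry, so readers see old or new, never partial.
        mv -f "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

A reader that opens the cache while a refresh is in flight gets the previous complete file, which is why the parallel keepers can be inspected safely at any time.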

Prerequisites¶

Central Synology .env at /volume1/aegis/.env MUST provide:

SAXOBANK_APP_KEY_LIVE — OAuth client ID (live environment)
SAXOBANK_APP_SECRET_LIVE — OAuth client secret (live environment)

The daemon does NOT need:

- SAXOBANK_ACCESS_TOKEN_LIVE (the refresh_token from the cache file is used instead)
- SAXOBANK_ACCOUNT_*_LIVE (refresh does not require account context)

Paths that MUST exist on Synology:

/volume1/aegis/.env (must contain the two OAuth vars above)
/volume1/aegis/tokens/saxobank_tokens_live.json (canonical cache, managed by Python aegis-token-keeper-live during observation)
/volume1/aegis/tokens/saxobank_tokens_live_rust.json (parallel observation cache, bootstrapped by the deploy workflow on first run)
/volume1/aegis/repo/aegis_v3/configs/lt_token_keeper.yaml (daemon config)
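The prerequisites above can be checked before a deploy with a small preflight script. This is a sketch, not part of the repository: the function name preflight and its optional ROOT prefix (used only to test against a scratch directory) are hypothetical, and it assumes the .env file uses plain KEY=value lines. Before the first deploy, the Rust cache file is expected to be reported missing, since the deploy workflow bootstraps it.

```shell
#!/bin/sh
# preflight [ROOT]: check the runbook prerequisites.
# ROOT is a hypothetical path prefix for testing; leave it empty on Synology.
preflight() {
    root=${1:-}
    rc=0
    for p in \
        /volume1/aegis/.env \
        /volume1/aegis/tokens/saxobank_tokens_live.json \
        /volume1/aegis/tokens/saxobank_tokens_live_rust.json \
        /volume1/aegis/repo/aegis_v3/configs/lt_token_keeper.yaml
    do
        [ -f "$root$p" ] || { echo "MISSING file: $p"; rc=1; }
    done
    for v in SAXOBANK_APP_KEY_LIVE SAXOBANK_APP_SECRET_LIVE; do
        # Assumes KEY=value syntax; only tests for presence, never prints the secret.
        grep -q "^$v=." "$root/volume1/aegis/.env" 2>/dev/null \
            || { echo "MISSING or empty var: $v"; rc=1; }
    done
    return $rc
}
```

Running preflight with no argument on the Synology host prints one line per problem and returns non-zero if anything is missing.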

How To Start¶

Preferred: trigger the dedicated workflow.

gh workflow run deploy-lt-token-keeper.yml

Manual (operator on Synology, emergency only):

cd /volume1/aegis/repo/aegis_v3/lt-rust-docker
sudo /usr/local/bin/docker compose \
    --env-file /volume1/aegis/.env \
    -f docker-compose.lt-token-keeper.yml \
    up -d --build aegis-lt-token-keeper

How To Stop¶

cd /volume1/aegis/repo/aegis_v3/lt-rust-docker
sudo /usr/local/bin/docker compose \
    -f docker-compose.lt-token-keeper.yml stop aegis-lt-token-keeper

Delete the container entirely:

cd /volume1/aegis/repo/aegis_v3/lt-rust-docker
sudo /usr/local/bin/docker compose \
    -f docker-compose.lt-token-keeper.yml rm -f aegis-lt-token-keeper

How To Verify It Is Working¶

Container status (read-only SSH OK):

ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker ps --format 'table {{.Names}}\t{{.Status}}' | grep aegis-lt-token-keeper"

Recent logs:

ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker logs aegis-lt-token-keeper --tail 100 2>&1"

Token cache mtime progression (should update every interval_secs = 900s by default):

ssh fukutani.ryo@192.168.42.252 "stat -c '%Y %n' /volume1/aegis/tokens/saxobank_tokens_live_rust.json"
# Compare with previous value — should increase by ~interval_secs each observation
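Comparing mtime values by hand is error-prone, so the check can be reduced to "how old is the cache right now". The sketch below uses the same GNU stat form as the command above; the function names and the 1800s threshold in the usage note (twice the default interval) are assumptions, not project policy.

```shell
#!/bin/sh
# cache_age_secs FILE: print seconds since FILE was last modified.
# Relies on GNU `stat -c %Y`, as used elsewhere in this runbook.
cache_age_secs() {
    mtime=$(stat -c %Y "$1") || return 1
    now=$(date +%s)
    echo $((now - mtime))
}

# cache_is_stale FILE MAX_AGE_SECS: succeeds (exit 0) when FILE is older
# than MAX_AGE_SECS, or when FILE cannot be read at all.
cache_is_stale() {
    age=$(cache_age_secs "$1") || return 0
    [ "$age" -gt "$2" ]
}
```

For example, cache_is_stale /volume1/aegis/tokens/saxobank_tokens_live_rust.json 1800 && echo STALE would flag a cache that has missed roughly two refresh cycles.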

Token expiry inspection:

ssh fukutani.ryo@192.168.42.252 "cat /volume1/aegis/tokens/saxobank_tokens_live_rust.json | python3 -c 'import json,sys;d=json.load(sys.stdin);print(\"issued_at=%s expires_in=%s\" % (d.get(\"issued_at\"), d.get(\"expires_in\")))'"
# %-formatting is used because backslash-escaped quotes inside f-string braces are a Python syntax error

Parallel Observation (G4 Day 2-3)¶

During parallel observation, BOTH keepers run simultaneously:

- Python aegis-token-keeper-live writes /volume1/aegis/tokens/saxobank_tokens_live.json (canonical)
- Rust aegis-lt-token-keeper writes /volume1/aegis/tokens/saxobank_tokens_live_rust.json (parallel)

Other Rust containers (aegis-lt-rust, aegis-lt-scan-cycle, future aegis-lt-wft) continue to read the canonical Python-managed cache. The Rust keeper is NOT yet the source of truth.

Observation acceptance criteria¶

After 24h of parallel running, all of the following should hold:

- saxobank_tokens_live.json mtime updates every ~300s (Python, 5min interval)
- saxobank_tokens_live_rust.json mtime updates every ~900s (Rust, 15min interval)
- Both files contain valid JSON with a non-empty access_token
- No invalid_grant errors in either container log
- issued_at timestamps increase monotonically in both files
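The "no invalid_grant errors" criterion can be checked mechanically by counting matching log lines. The helper below is a sketch with a hypothetical name; it reads log text on stdin so that it works with either container.

```shell
#!/bin/sh
# count_invalid_grant: read log text on stdin and print the number of lines
# that mention invalid_grant. Exit status is 0 only when that number is zero.
count_invalid_grant() {
    # grep -c prints 0 but exits 1 on no match; `|| true` keeps the count.
    n=$(grep -c 'invalid_grant' || true)
    echo "$n"
    [ "$n" -eq 0 ]
}

# Usage on Synology (both containers must print 0):
#   sudo /usr/local/bin/docker logs aegis-lt-token-keeper --since 24h 2>&1 | count_invalid_grant
#   sudo /usr/local/bin/docker logs aegis-token-keeper-live --since 24h 2>&1 | count_invalid_grant
```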

Cutover Path (G4 Day 4, NOT active yet)¶

Switching from parallel observation to Rust-as-source-of-truth requires:

Pre-cutover verification
24h clean parallel observation (see criteria above)
Zero 401 errors in lt-scan-cycle logs attributable to a stale token
WORK_LOG entry with explicit cutover approval

Cutover sequence (hard cutover, not gradual)

# a. Stop Python keeper FIRST
ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker stop aegis-token-keeper-live"

# b. Wait for Python's last write to flush (atomic rename is immediate but
#    any in-flight refresh call may take 5-10s)
sleep 15

# c. Update docker-compose.lt-token-keeper.yml to REMOVE the override
#    (commit the change to main first, then trigger deploy workflow)
#    Edit: delete the line
#      - AEGIS_SAXO_TOKEN_CACHE_PATH=/data/tokens/saxobank_tokens_live_rust.json
#    Commit + push + wait for GHA deploy to succeed

# d. After the cutover deploy, the Rust keeper writes the canonical
#    /data/tokens/saxobank_tokens_live.json
#    (the container_token_cache_path fallback triggers when no override is set)

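# e. Remove the Python token keeper container to prevent accidental restart
#    (docker rm removes the container, not the image; after this, rollback
#    step b needs the container recreated before docker start can work)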
ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker rm aegis-token-keeper-live"
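Before triggering the deploy in step c, it is worth confirming that the override line is really gone from the compose file. A minimal sketch, assuming the override appears as a single line containing AEGIS_SAXO_TOKEN_CACHE_PATH; the function name override_present is hypothetical.

```shell
#!/bin/sh
# override_present FILE: exit 0 when FILE still sets AEGIS_SAXO_TOKEN_CACHE_PATH
# on a non-comment line, 1 when it does not, 2 when FILE is unreadable.
override_present() {
    [ -r "$1" ] || return 2
    grep -v '^[[:space:]]*#' "$1" | grep -q 'AEGIS_SAXO_TOKEN_CACHE_PATH'
}

# Usage, from /volume1/aegis/repo/aegis_v3/lt-rust-docker:
#   override_present docker-compose.lt-token-keeper.yml \
#       && echo "override still set: Rust keeper will keep writing the _rust cache"
```

Commented-out lines are ignored on purpose, so leaving the old line behind as a comment does not trigger a false alarm.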

Post-cutover monitoring (24h on-call)
Watch lt-scan-cycle for 401 errors
Watch saxobank_tokens_live.json mtime updates (should now come from Rust every 900s)
If any issue: rollback per below

Rollback¶

During parallel observation¶

Safe rollback: the Rust keeper can be stopped independently without affecting the Python keeper or any downstream container:

ssh fukutani.ryo@192.168.42.252 \
  "cd /volume1/aegis/repo/aegis_v3/lt-rust-docker && \
   sudo /usr/local/bin/docker compose \
     -f docker-compose.lt-token-keeper.yml stop aegis-lt-token-keeper"

After cutover¶

Post-cutover rollback requires restarting Python keeper:

# a. Stop Rust keeper
ssh fukutani.ryo@192.168.42.252 \
  "cd /volume1/aegis/repo/aegis_v3/lt-rust-docker && \
   sudo /usr/local/bin/docker compose \
     -f docker-compose.lt-token-keeper.yml stop aegis-lt-token-keeper"

# b. Start the Python keeper again (the stopped container must still exist;
#    docker start cannot revive a container that was removed with docker rm)
ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker start aegis-token-keeper-live"

# c. Verify Python keeper resumed refreshing the canonical cache
ssh fukutani.ryo@192.168.42.252 \
  "sudo /usr/local/bin/docker logs aegis-token-keeper-live --tail 20 2>&1 | grep -i 'token valid\\|refreshed'"

Total rollback time: < 1 minute (if the Python keeper container still exists and only needs docker start).

Troubleshooting¶

"Token cache missing" log at startup¶

Expected on first deploy BEFORE the bootstrap step runs
GHA workflow handles this by copying the canonical file once
If seen on a subsequent deploy, check the bootstrap step logs

`invalid_grant` error¶

Means Saxo rejected the refresh_token (it has expired or been revoked)
Requires manual re-authorization via the Saxo OAuth flow
Daemon continues running (does not exit) so operator has time to act

Rust keeper refresh errors but Python is fine¶

Check SAXOBANK_APP_KEY_LIVE / SAXOBANK_APP_SECRET_LIVE in /volume1/aegis/.env
Rust uses the same env vars as Python so they should match
If Rust fails and Python succeeds, the issue is likely container-local (env not loaded, typo)

mtime not updating¶

Check container is running: docker ps | grep aegis-lt-token-keeper
Check logs for refresh failures
Check YAML interval_secs hasn't been set to an unreasonable value
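The interval_secs check can be scripted rather than eyeballed. The sketch below assumes the YAML contains a simple "interval_secs: <integer>" line (nesting is ignored) and uses illustrative bounds of 60 to 3600 seconds; both function names and the bounds are assumptions, not project policy.

```shell
#!/bin/sh
# read_interval_secs FILE: print the first interval_secs value found in FILE.
# Assumes a plain "interval_secs: <integer>" line; YAML nesting is ignored.
read_interval_secs() {
    sed -n 's/^[[:space:]]*interval_secs:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | head -n 1
}

# interval_is_sane FILE [MIN] [MAX]: succeeds when the value lies in [MIN, MAX].
# The default bounds (60..3600) are illustrative only.
interval_is_sane() {
    v=$(read_interval_secs "$1")
    [ -n "$v" ] && [ "$v" -ge "${2:-60}" ] && [ "$v" -le "${3:-3600}" ]
}
```

On Synology this would be run against /volume1/aegis/repo/aegis_v3/configs/lt_token_keeper.yaml; a missing key or a non-numeric value also fails the check.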