LT Token Keeper Runbook
CRITICAL:
lt-token-keeper runs in parallel with Python aegis-token-keeper-live during the G4 migration observation phase. The two keepers write to DIFFERENT token cache files to avoid race conditions. Do NOT remove the AEGIS_SAXO_TOKEN_CACHE_PATH override in docker-compose.lt-token-keeper.yml until the G4 Day 4 cutover.
What It Is
lt-token-keeper is the Rust-native Saxo OAuth2 refresh daemon. It replaces the Python scripts/token_keeper.py that runs in the aegis-token-keeper-live container, and it is the G4 phase of the Python → Rust trading infrastructure migration.
The daemon loop:
1. Load the token cache file via auth::load_token_cache
2. Classify the token freshness (AccessValid / NeedsRefresh / Expired24hToken)
3. If NeedsRefresh, call auth::refresh_token with the Saxo OAuth endpoint and save the new token via auth::save_token_cache (atomic tmp + rename)
4. Sleep interval_secs (default 900s from YAML)
5. Repeat until SIGTERM or SIGINT
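The "atomic tmp + rename" save in step 3 can be illustrated with a short Python sketch. This shows the general technique only, not the Rust auth::save_token_cache implementation; the function name save_token_cache_atomic and the .tmp suffix are assumptions made for the example.

```python
import json
import os
import tempfile


def save_token_cache_atomic(path, token):
    """Write `token` as JSON to `path` so readers never see a partial file.

    Hypothetical helper: illustrates tmp + rename, not the daemon's real code.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem),
    # otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(token, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic swap on POSIX filesystems
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the rename either happens completely or not at all, a container reading the cache sees the old token or the new one, never a half-written file.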
On errors:
- invalid_grant: log ERROR "Manual re-authorization required" and continue
- Any other refresh error: log ERROR and continue
- Never exits except on signal
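The loop and its error policy can be sketched as follows. This is a simplified Python illustration, not the Rust daemon's code: RefreshError, run_cycle, and run_daemon are hypothetical names, the load/classify/refresh/save callables stand in for the auth:: functions, and the log strings other than "Manual re-authorization required" are invented for the example.

```python
import time


class RefreshError(Exception):
    """Hypothetical error carrying the OAuth error code, e.g. 'invalid_grant'."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def run_cycle(load, classify, refresh, save, log):
    """One pass of the loop; refresh errors are logged, not raised."""
    token = load()
    if classify(token) != "NeedsRefresh":
        return  # nothing to do this cycle
    try:
        save(refresh(token))
    except RefreshError as err:
        if err.code == "invalid_grant":
            log("ERROR Manual re-authorization required")
        else:
            log(f"ERROR refresh failed: {err.code}")
        # Fall through: the daemon keeps running after any refresh error.


def run_daemon(cycle, should_stop, interval_secs=900, sleep=time.sleep):
    """Repeat `cycle` until `should_stop()` is true (set by SIGTERM/SIGINT)."""
    while not should_stop():
        cycle()
        sleep(interval_secs)
```

Keeping the error handling inside the cycle means a revoked refresh_token produces one ERROR line per interval rather than a crash loop, which matches the "never exits except on signal" behavior described above.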
Prerequisites
Central Synology .env at /volume1/aegis/.env MUST provide:
- SAXOBANK_APP_KEY_LIVE: OAuth client ID (live environment)
- SAXOBANK_APP_SECRET_LIVE: OAuth client secret (live environment)
The daemon does NOT need:
- SAXOBANK_ACCESS_TOKEN_LIVE — the refresh_token from the cache file is used instead
- SAXOBANK_ACCOUNT_*_LIVE — refresh does not require account context
Paths that MUST exist on Synology:
- /volume1/aegis/.env (must contain the two OAuth vars above)
- /volume1/aegis/tokens/saxobank_tokens_live.json (canonical cache, managed by Python aegis-token-keeper-live during observation)
- /volume1/aegis/tokens/saxobank_tokens_live_rust.json (parallel observation cache, bootstrapped by the deploy workflow on first run)
- /volume1/aegis/repo/aegis_v3/configs/lt_token_keeper.yaml (daemon config)
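A preflight check for these prerequisites could look like the sketch below. It is an illustration only: preflight and parse_env_keys are hypothetical helpers, and the parser assumes plain KEY=value lines (no export prefix, no multi-line values).

```python
import os

REQUIRED_ENV_KEYS = ("SAXOBANK_APP_KEY_LIVE", "SAXOBANK_APP_SECRET_LIVE")


def parse_env_keys(text):
    """Return the names of variables with a non-empty value in .env-style text."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value.strip().strip("'\""):
            keys.add(name.strip())
    return keys


def preflight(env_path, other_paths):
    """Return a list of problems; an empty list means all prerequisites are met."""
    problems = [
        f"missing path: {p}"
        for p in (env_path, *other_paths)
        if not os.path.exists(p)
    ]
    if os.path.exists(env_path):
        with open(env_path) as fh:
            present = parse_env_keys(fh.read())
        problems += [
            f"missing env var: {k}" for k in REQUIRED_ENV_KEYS if k not in present
        ]
    return problems
```

On the Synology host this would be called with /volume1/aegis/.env and the three remaining paths listed above.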
How To Start
Preferred: trigger the dedicated workflow.
Manual (operator on Synology, emergency only):
cd /volume1/aegis/repo/aegis_v3/lt-rust-docker
sudo /usr/local/bin/docker compose \
--env-file /volume1/aegis/.env \
-f docker-compose.lt-token-keeper.yml \
up -d --build aegis-lt-token-keeper
How To Stop
cd /volume1/aegis/repo/aegis_v3/lt-rust-docker
sudo /usr/local/bin/docker compose \
-f docker-compose.lt-token-keeper.yml stop aegis-lt-token-keeper
Delete the container entirely:
cd /volume1/aegis/repo/aegis_v3/lt-rust-docker
sudo /usr/local/bin/docker compose \
-f docker-compose.lt-token-keeper.yml rm -f aegis-lt-token-keeper
How To Verify It Is Working
Container status (read-only SSH OK):
ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker ps --format 'table {{.Names}}\t{{.Status}}' | grep aegis-lt-token-keeper"
Recent logs:
ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker logs aegis-lt-token-keeper --tail 100 2>&1"
Token cache mtime progression (should update every interval_secs = 900s by default):
ssh fukutani.ryo@192.168.42.252 "stat -c '%Y %n' /volume1/aegis/tokens/saxobank_tokens_live_rust.json"
# Compare with previous value — should increase by ~interval_secs each observation
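The manual comparison can also be scripted. The sketch below flags a cache file whose mtime is older than one interval plus some slack; is_stale and cache_age_secs are hypothetical helpers, and the 120s slack is an assumed allowance for refresh latency, not a documented value.

```python
import os
import time


def cache_age_secs(path, now=None):
    """Seconds since `path` was last modified."""
    now = time.time() if now is None else now
    return now - os.stat(path).st_mtime


def is_stale(path, interval_secs=900, slack_secs=120, now=None):
    """True if the cache was not rewritten within one interval plus slack."""
    return cache_age_secs(path, now) > interval_secs + slack_secs
```

Run on the Synology host, is_stale("/volume1/aegis/tokens/saxobank_tokens_live_rust.json") returning True would be a prompt to check the container logs.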
Token expiry inspection:
ssh fukutani.ryo@192.168.42.252 "cat /volume1/aegis/tokens/saxobank_tokens_live_rust.json | python3 -c 'import json,sys;d=json.load(sys.stdin);print(\"issued_at=%s expires_in=%s\" % (d.get(\"issued_at\"), d.get(\"expires_in\")))'"
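To turn the two printed fields into a remaining lifetime, a small helper along these lines may be useful. It assumes issued_at is a Unix timestamp in seconds, which this runbook does not confirm; seconds_until_expiry is a hypothetical name.

```python
import time


def seconds_until_expiry(token, now=None):
    """Remaining access-token lifetime in seconds (negative if already expired).

    Assumes `issued_at` is a Unix timestamp in seconds; adjust the parsing
    if the cache stores a different format (for example an ISO-8601 string).
    """
    now = time.time() if now is None else now
    return float(token["issued_at"]) + float(token["expires_in"]) - now
```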
Parallel Observation (G4 Day 2-3)
During parallel observation, BOTH keepers run simultaneously:
- Python aegis-token-keeper-live writes /volume1/aegis/tokens/saxobank_tokens_live.json (canonical)
- Rust aegis-lt-token-keeper writes /volume1/aegis/tokens/saxobank_tokens_live_rust.json (parallel)
Other Rust containers (aegis-lt-rust, aegis-lt-scan-cycle, future aegis-lt-wft) continue to read the canonical Python-managed cache. The Rust keeper is NOT yet the source of truth.
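The separation between the two cache files comes down to one environment override. A minimal Python sketch of that resolution logic is shown below; resolve_cache_path is a hypothetical name, and treating an empty override as unset is an assumption about the Rust keeper's behavior.

```python
import os

# Container-side canonical path, used when no override is present.
CANONICAL_CACHE = "/data/tokens/saxobank_tokens_live.json"


def resolve_cache_path(environ=os.environ):
    """Return the override path if set and non-empty, else the canonical path."""
    return environ.get("AEGIS_SAXO_TOKEN_CACHE_PATH") or CANONICAL_CACHE
```

During observation the compose file sets the override to /data/tokens/saxobank_tokens_live_rust.json, so the Rust keeper never touches the canonical file that the other containers read.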
Observation acceptance criteria
After 24h of parallel running, the following should hold:
- saxobank_tokens_live.json mtime updates every ~300s (Python, 5min interval)
- saxobank_tokens_live_rust.json mtime updates every ~900s (Rust, 15min interval)
- Both files have valid JSON with non-empty access_token
- No invalid_grant errors in either container log
- issued_at timestamps monotonically increase on both files
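The file-level criteria (valid JSON, non-empty access_token, issued_at not going backwards) can be checked over a series of snapshots of one cache file. The sketch below is one possible approach; check_snapshots is a hypothetical helper and assumes issued_at values are mutually comparable (all numbers, or all ISO-8601 strings).

```python
import json


def check_snapshots(snapshots):
    """Check successive raw snapshots (strings) of one token cache file.

    Returns a list of problems; an empty list means the samples pass.
    """
    problems = []
    previous = None
    for i, raw in enumerate(snapshots):
        try:
            token = json.loads(raw)
        except json.JSONDecodeError:
            problems.append(f"sample {i}: invalid JSON")
            continue
        if not isinstance(token, dict) or not token.get("access_token"):
            problems.append(f"sample {i}: empty or missing access_token")
            continue
        issued = token.get("issued_at")
        if issued is None:
            problems.append(f"sample {i}: missing issued_at")
            continue
        # Equal values are allowed: the same token may be sampled twice.
        if previous is not None and issued < previous:
            problems.append(f"sample {i}: issued_at went backwards")
        previous = issued
    return problems
```

Snapshots would be collected separately for each file, for example by copying the cache once per interval during the 24h window.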
Cutover Path (G4 Day 4, NOT active yet)
Switching from parallel observation to Rust-as-source-of-truth requires:
- Pre-cutover verification
  - 24h clean parallel observation (see criteria above)
  - 0 401 errors in lt-scan-cycle logs attributable to a stale token
  - WORK_LOG entry with explicit cutover approval
- Cutover sequence (hard cutover, not gradual)
  # a. Stop Python keeper FIRST
  ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker stop aegis-token-keeper-live"
  # b. Wait for Python's last write to flush (atomic rename is immediate but
  #    any in-flight refresh call may take 5-10s)
  sleep 15
  # c. Update docker-compose.lt-token-keeper.yml to REMOVE the override
  #    (commit the change to main first, then trigger deploy workflow)
  #    Edit: delete the line
  #      - AEGIS_SAXO_TOKEN_CACHE_PATH=/data/tokens/saxobank_tokens_live_rust.json
  #    Commit + push + wait for GHA deploy to succeed
  # d. After cutover deploy, the Rust keeper writes the canonical
  #    /data/tokens/saxobank_tokens_live.json
  #    (container_token_cache_path fallback triggers when no override is set)
  # e. Remove the Python token keeper container to prevent accidental restart
  #    (docker rm removes the container, not the image; once it is removed,
  #    docker start cannot bring it back and the container must be recreated)
  ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker rm aegis-token-keeper-live"
- Post-cutover monitoring (24h on-call)
  - Watch lt-scan-cycle for 401 errors
  - Watch saxobank_tokens_live.json mtime updates (should now come from Rust every 900s)
  - If any issue: rollback per below
Rollback
During parallel observation
Rollback is safe during this phase because the Rust keeper can be stopped independently, without affecting the Python keeper or any downstream container:
ssh fukutani.ryo@192.168.42.252 \
"cd /volume1/aegis/repo/aegis_v3/lt-rust-docker && \
sudo /usr/local/bin/docker compose \
-f docker-compose.lt-token-keeper.yml stop aegis-lt-token-keeper"
After cutover
Post-cutover rollback requires restarting the Python keeper:
# a. Stop Rust keeper
ssh fukutani.ryo@192.168.42.252 \
"cd /volume1/aegis/repo/aegis_v3/lt-rust-docker && \
sudo /usr/local/bin/docker compose \
-f docker-compose.lt-token-keeper.yml stop aegis-lt-token-keeper"
# b. Start Python keeper back up (the container must still exist; docker start fails if it was removed with docker rm)
ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker start aegis-token-keeper-live"
# c. Verify Python keeper resumed refreshing the canonical cache
ssh fukutani.ryo@192.168.42.252 \
"sudo /usr/local/bin/docker logs aegis-token-keeper-live --tail 20 2>&1 | grep -i 'token valid\\|refreshed'"
Total rollback time: under 1 minute (if the Python keeper container is still present).
Troubleshooting
"Token cache missing" log at startup
- Expected on first deploy BEFORE the bootstrap step runs
- GHA workflow handles this by copying the canonical file once
- If seen on a subsequent deploy, check the bootstrap step logs
invalid_grant error
- Means the refresh_token has been revoked at Saxo
- Requires manual re-authorization via the Saxo OAuth flow
- Daemon continues running (does not exit) so operator has time to act
Rust keeper refresh errors but Python is fine
- Check SAXOBANK_APP_KEY_LIVE / SAXOBANK_APP_SECRET_LIVE in /volume1/aegis/.env
- Rust uses the same env vars as Python so they should match
- If Rust fails and Python succeeds, the issue is likely container-local (env not loaded, typo)
mtime not updating
- Check container is running: docker ps | grep aegis-lt-token-keeper
- Check logs for refresh failures
- Check YAML interval_secs hasn't been set to an unreasonable value