
2026-04-13 Monday Market Open — Rust Parallel Observation Smoke Test

Purpose: on Monday 2026-04-13 09:30 ET (13:30 UTC), verify that the 4 Rust parallel observation containers start writing clean artifacts, without repeating the 2026-04-10 Friday failure where 186/187 parity records were lost to the Polygon float volume parse bug (fix commit 35011d07e).

Created: 2026-04-10 (Claude). Runs: 2026-04-13 09:30 ET onward and daily through Day 10 cutover judgment.

Context

  • 2026-04-10 Fri afternoon market session: 187 parity_log.jsonl records written, 100% hit the Polygon float volume parse error (get_spy_history failed: Polygon payload parse error: invalid type: floating point 68441672.317886, expected i64). Regime was null for every scan, entries_proposed=0, exits_proposed=0. The clean parity observation is effectively lost for Day 1.
  • Polygon volume-as-float fix was merged at 35011d07e on 2026-04-10 19:38 UTC (after the market had already been running with the broken code for 6+ hours).
  • lt-scan-cycle container restarted at 2026-04-10 21:38:18 UTC with args --scenario-multi-yaml=/config/scenarios/lt_wft.yaml, --comparison-log-path=/data/state/lt_rust/lt_wft_comparison.jsonl, --comparison-interval-scans=20, --wft-batch-size=3, --dry-run, --scan-interval=45. That is the version of the binary that will see first market open on Monday.
  • Market was closed all weekend (2026-04-11 Sat, 2026-04-12 Sun).
  • Monday 2026-04-13 09:30 ET = 13:30 UTC is the first clean multi-scenario observation window.

Pre-open checks (run at 09:00-09:29 ET Monday)

1. Container health (all 4 must be "Up")

ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker ps --format 'table {{.Names}}\t{{.Status}}' | grep -E '^aegis-lt-(scan-cycle|ldas-intraday|token-keeper|quote-collector)[[:space:]]'"

Expected — all 4 "Up" with reasonable uptime (days):

aegis-lt-scan-cycle       Up 2 days (or similar)
aegis-lt-ldas-intraday    Up 2 days
aegis-lt-token-keeper     Up 2 days
aegis-lt-quote-collector  Up 2 days

If any container is Restarting, Exited, or missing → ABORT the smoke test and debug before market open. Re-deploy the affected container with gh workflow run deploy-lt-<name>.yml (<name> is one of scan-cycle, ldas, token-keeper, quote-collector) and wait for completion.
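The health check in step 1 can also be scripted. A minimal sketch in Python, assuming the docker ps output has been captured as text in the two-column table format shown above. The function name missing_or_unhealthy is illustrative and not part of the aegis tooling:

```python
# Containers that step 1 requires to be "Up" (names taken from the runbook).
REQUIRED = (
    "aegis-lt-scan-cycle",
    "aegis-lt-ldas-intraday",
    "aegis-lt-token-keeper",
    "aegis-lt-quote-collector",
)


def missing_or_unhealthy(ps_output: str) -> list[str]:
    """Return required containers that are absent or whose status is not 'Up ...'."""
    status = {}
    for line in ps_output.splitlines():
        parts = line.split(None, 1)  # container name, then the status text
        if len(parts) == 2:
            status[parts[0]] = parts[1].strip()
    # Restarting / Exited / missing containers all fail the startswith("Up") test.
    return [name for name in REQUIRED
            if not status.get(name, "").startswith("Up")]
```

An empty return value means all four containers are up. Anything else is the list to debug before market open.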

2. lt-scan-cycle is in multi-scenario mode

ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker inspect aegis-lt-scan-cycle --format '{{json .Args}}'"

Expected — must contain --scenario-multi-yaml (not --config):

["--scenario-multi-yaml=/config/scenarios/lt_wft.yaml",
 "--state-dir=/data/state/lt_rust/lt_scan_cycle",
 "--parity-log=/data/state/lt_rust/parity_log.jsonl",
 "--comparison-log-path=/data/state/lt_rust/lt_wft_comparison.jsonl",
 "--comparison-interval-scans=20",
 "--wft-batch-size=3",
 "--dry-run",
 "--scan-interval=45"]

If the args start with --config=/config/scenarios/lt_rc.yaml instead → single-scenario mode is still live and the multi-scenario promote commit never deployed. Dispatch deploy-lt-scan-cycle.yml with force_recreate=true and wait.

3. Saxo token cache is fresh

ssh fukutani.ryo@192.168.42.252 "stat -c '%Y %n' /volume1/aegis/tokens/saxobank_tokens_live.json /volume1/aegis/tokens/saxobank_tokens_live_rust.json"

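stat prints each file's mtime as epoch seconds (%Y), so compare the values against date +%s. Both files must have an mtime within the last 15 min (both the Python canonical cache and the Rust parallel cache should have refreshed recently). If the Rust cache is more than 30 min old → the Rust token-keeper is broken; debug before market open.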
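The two thresholds in step 3 can be expressed as a small classifier. A sketch in Python, assuming the epoch mtime from stat has already been read. The "stale" label for the 15-30 min band is an assumption, since the runbook only names the 15 min and 30 min limits:

```python
import time
from typing import Optional

FRESH_SECONDS = 15 * 60   # runbook: mtime within the last 15 min
BROKEN_SECONDS = 30 * 60  # runbook: older than 30 min means token-keeper is broken


def classify_token_age(mtime_epoch: float, now_epoch: Optional[float] = None) -> str:
    """Classify a token cache file by the age of its mtime (epoch seconds)."""
    now = time.time() if now_epoch is None else now_epoch
    age = now - mtime_epoch
    if age <= FRESH_SECONDS:
        return "fresh"
    if age > BROKEN_SECONDS:
        return "broken"
    # Assumption: between the two limits is a warning, not yet a failure.
    return "stale"
```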

4. Polygon API key reachable

ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker exec aegis-lt-scan-cycle sh -c 'test -n \"\$POLYGON_API_KEY\" && echo OK || echo MISSING'"

Must echo OK. If MISSING, the pt-docker/.env env-file load broke.

5. Python aegis-wft is still running (non-interference)

ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker ps --filter name=aegis-wft --filter status=running --format '{{.Names}} {{.Status}}'"

Must show aegis-wft Up N days (healthy). If missing → the Rust parallel observation accidentally took out Python PT, which is a CRITICAL failure.

Market-open checks (09:30-10:00 ET Monday)

6. First scan record — Polygon fix is live

Wait until 09:32 ET (~2 min after open) and inspect the tail of parity_log.jsonl:

ssh fukutani.ryo@192.168.42.252 "jq -c 'select(.record_kind==\"scan\" and .scan_count > 186) | {scan_count, ts, regime, vix: .vix_value, errors, entries: (.entries_proposed|length), exits: (.exits_proposed|length)}' /volume1/aegis/wft_state/lt_rust/parity_log.jsonl | tail -5"

Pass criteria (the Polygon fix is working):

  • regime is non-null ("NORMAL", "CAUTION", or "CRISIS")
  • vix is a reasonable number (10-80)
  • errors is empty or at most short-lived (not the Polygon float volume error)
  • scan_count is incrementing every ~45 seconds

Fail signal (Polygon bug resurfaced or fix didn't land):

  • regime: null for every scan
  • Same get_spy_history failed: Polygon payload parse error in errors
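The pass and fail criteria for a single scan record can be checked mechanically. A sketch in Python using the field names from the jq filter in step 6 (regime, vix_value, errors). It assumes errors is a list of strings, which the runbook does not state explicitly:

```python
VALID_REGIMES = {"NORMAL", "CAUTION", "CRISIS"}
POLYGON_ERROR = "Polygon payload parse error"


def scan_record_problems(record: dict) -> list[str]:
    """Return the step-6 criteria that a parity_log scan record violates."""
    problems = []
    if record.get("regime") not in VALID_REGIMES:
        problems.append("regime is null or unknown")
    vix = record.get("vix_value")
    if not isinstance(vix, (int, float)) or not 10 <= vix <= 80:
        problems.append("vix outside the 10-80 range")
    # Assumption: errors is a list of message strings (or null).
    errors = record.get("errors") or []
    if any(POLYGON_ERROR in str(e) for e in errors):
        problems.append("Polygon parse error present")
    return problems
```

An empty list means the record meets the criteria. Checking that scan_count increments every ~45 seconds needs at least two consecutive records and is left out of this sketch.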

If the fail signal appears, immediately check the container image git sha:

ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker inspect aegis-lt-scan-cycle --format '{{.Image}}'"
ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker inspect aegis-lt-scan-cycle --format '{{.Config.Labels}}'"

and verify the build post-dates commit 35011d07e. If the container was built before the fix, re-dispatch deploy-lt-scan-cycle.yml with force_recreate=true.

7. Multi-scenario lanes both running

Look for the startup log from the most recent restart:

ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker logs aegis-lt-scan-cycle --since 2h 2>&1 | grep -E 'Priority scenarios|WFT scenarios|Completed startup reconcile|scenarios_loaded' | head -20"

Expected (the Sprint 1a-1d runner is alive):

  • Completed startup reconcile for priority scenarios applied_events=N scenarios=1
  • Priority scenarios paused because market is closed (pre-open) then transitions to running after open
  • WFT scenarios paused because market is closed (pre-open) then transitions to running after open

If you see only Priority scenarios logs but no WFT scenarios logs → multi-scenario wiring is broken, WFT lane not starting. Debug with docker logs aegis-lt-scan-cycle --tail 200 2>&1 | grep -E 'error|panic|scenario'.

8. First comparison log record (after 20 WFT batches)

lt_wft_comparison.jsonl gets written every 20 WFT batches (--comparison-interval-scans=20). With --scan-interval=45 and --wft-batch-size=3, that is 45 * 20 / 3 = 300 seconds = 5 min between records. So expect the first record between 09:35 and 09:40 ET.

ssh fukutani.ryo@192.168.42.252 "ls -la /volume1/aegis/wft_state/lt_rust/lt_wft_comparison.jsonl 2>&1"

If the file does not yet exist at 09:35 ET → wait. If it still does not exist at 09:45 ET → multi-scenario comparison writer is broken.
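The timing estimate in step 8 is simple arithmetic and can be reproduced directly. The sketch below mirrors the runbook's formula as written. It is not derived from the scan-cycle source, so treat the result as an expectation rather than a guarantee. Times are naive datetimes standing in for ET wall-clock time:

```python
from datetime import datetime, timedelta


def comparison_interval_seconds(scan_interval_s: int,
                                interval_scans: int,
                                wft_batch_size: int) -> float:
    """Runbook estimate: scan_interval * interval_scans / wft_batch_size."""
    return scan_interval_s * interval_scans / wft_batch_size


def first_record_eta(market_open: datetime, interval_s: float) -> datetime:
    """Earliest expected time of the first comparison record after the open."""
    return market_open + timedelta(seconds=interval_s)


interval = comparison_interval_seconds(45, 20, 3)            # 300.0 seconds
eta = first_record_eta(datetime(2026, 4, 13, 9, 30), interval)
print(interval, eta.strftime("%H:%M"))
```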

Inspect the first record:

ssh fukutani.ryo@192.168.42.252 "jq '.' /volume1/aegis/wft_state/lt_rust/lt_wft_comparison.jsonl | head -60"

Expected:

  • timestamp: ISO-8601 ET
  • date: "2026-04-13"
  • scan_count: 20 or similar
  • scenarios: array of 11 entries (LT_RC + WFT_A..I + PT_XW)
  • each scenario has label, equity, closed_trades, win_rate_pct, etc. matching the Python pt_wft_comparison.jsonl schema

If scenarios has fewer than 11 entries → lt_wft.yaml wrapper is incomplete. Check /volume1/aegis/repo/aegis_v3/configs/scenarios/lt_wft.yaml.
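The 11-scenario check can be made explicit by listing which labels are missing. A sketch in Python, assuming the labels are spelled LT_RC, WFT_A through WFT_I, and PT_XW (expanded from the "WFT_A..I" shorthand above) and that each scenario entry carries a label field as described:

```python
import string


def expected_labels() -> list[str]:
    """LT_RC + WFT_A..WFT_I + PT_XW, 11 labels in total."""
    wft = [f"WFT_{c}" for c in string.ascii_uppercase[:9]]  # A through I
    return ["LT_RC", *wft, "PT_XW"]


def missing_scenarios(record: dict) -> list[str]:
    """Return expected scenario labels absent from one comparison record."""
    present = {s.get("label") for s in record.get("scenarios", [])}
    return [label for label in expected_labels() if label not in present]
```

A non-empty result points at the entries to look for in lt_wft.yaml.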

9. First per-scenario parity diff with lt-wft-parity

As soon as both Python pt_wft_comparison.jsonl AND Rust lt_wft_comparison.jsonl have at least one record for today, run lt-wft-parity via a one-shot docker run --rm (NOT via docker exec, because the running aegis-lt-scan-cycle container only mounts wft_state/lt_rust/ — it can't see the Python comparison log at wft_state/pt_wft_comparison.jsonl):

ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker run --rm --entrypoint lt-wft-parity \
  -v /volume1/aegis/wft_state/pt_wft_comparison.jsonl:/data/pt_wft.jsonl:ro \
  -v /volume1/aegis/wft_state/lt_rust/lt_wft_comparison.jsonl:/data/lt_wft.jsonl:ro \
  lt-rust-docker-aegis-lt-scan-cycle \
    --python-log /data/pt_wft.jsonl \
    --rust-log /data/lt_wft.jsonl \
    --date 2026-04-13 \
    --tolerance-equity 50 \
    --tolerance-trade-count 0"

The --rm flag ensures the one-shot container is cleaned up immediately. The --entrypoint lt-wft-parity override is REQUIRED — the lt-rust-docker image has a wrapper entrypoint (lt-rust-entrypoint.sh) that defaults to launching lt-shadow, so without the override the one-shot container tries to start the shadow daemon and crashes before reaching lt-wft-parity. The two file-level bind mounts (:ro) expose just the specific JSONL files without touching the directory structure the parallel-observation scan-cycle container relies on.

Verified working: 2026-04-10 23:17 UTC smoke-tested this exact command against --date 2026-04-13. Result: both sides correctly reported "NO RECORD FOR 2026-04-13" (expected — Friday's data exists for 2026-04-10, not 2026-04-13) and the binary exited with Overall: PASS (zero gated failures on an empty-record-set input). This confirms the docker run pattern, the mounts, and the binary path all work end-to-end.

Expected on Monday morning: both sides start from equity ~$32,000 with 0 closed_trades, so the diff should be trivially within tolerance. Overall: PASS expected.

Fail signal: Python "NO RECORD FOR 2026-04-13" or Rust "NO RECORD FOR 2026-04-13" — means the comparison log writer for that side isn't running. If Python is missing → aegis-wft container is broken. If Rust is missing → see check 8.

Note: lt-wft-parity is built into the lt-rust-docker-aegis-lt-scan-cycle container image as of commit d07c99bd5 (2026-04-10) and verified working on 2026-04-10 23:06 UTC after the --remove-orphans recovery. If the image does not have the binary, re-dispatch deploy-lt-scan-cycle.yml with force_recreate=true.

Note on docker run vs docker exec: the alternative would be to bind-mount the Python path into the running container, but that requires a compose file edit + force_recreate which touches the live observation loop. The one-shot docker run --rm pattern is fully out-of-band and does not risk the live container state.

10. lt-ldas-intraday writes first 15-min cycle

The intraday collector runs every 15 minutes between 09:35 and 16:00 ET. The first cycle of the day is 09:35 ET.

ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker logs aegis-lt-ldas-intraday --since 1h 2>&1 | grep -E 'cycle|snapshot|Inside intraday window' | tail -10"

Expected at ~09:35 ET:

  • Inside intraday window, running cycle
  • cycle start + per-symbol snapshot logs
  • cycle complete with N parquet files written

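Verify the Rust archive received new parquet files. The -newer test needs an anchor file on the host that predates the open; create it during the pre-open checks with touch /tmp/monday_anchor (the same anchor the 10:00 ET checkpoint uses). Stderr is discarded so that a missing anchor yields 0 instead of a counted error line:

ssh fukutani.ryo@192.168.42.252 "find /volume1/aegis/live_data_archive_rust/options -name '13*_pt_*.parquet' -newer /tmp/monday_anchor 2>/dev/null | wc -l"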

Should be > 0 within 5 min after 09:35 ET.
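For the mid-day re-runs it helps to know when each intraday cycle should fire. A sketch in Python that enumerates the schedule described in step 10, assuming cycles are aligned to the 09:35 ET start and spaced exactly 15 minutes apart (the collector's actual alignment is not confirmed here). Times are naive datetimes standing in for ET:

```python
from datetime import datetime, timedelta


def intraday_cycle_times(day: datetime) -> list[datetime]:
    """List expected cycle start times from 09:35 through 16:00, every 15 min."""
    start = day.replace(hour=9, minute=35, second=0, microsecond=0)
    end = day.replace(hour=16, minute=0, second=0, microsecond=0)
    times = []
    t = start
    while t <= end:
        times.append(t)
        t += timedelta(minutes=15)
    return times


cycles = intraday_cycle_times(datetime(2026, 4, 13))
print(len(cycles), cycles[0].strftime("%H:%M"), cycles[-1].strftime("%H:%M"))
```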

11. lt-quote-collector resumes market-hours throttle

Pre-open the collector should be in off-market mode (rpm=60). After 09:30 ET it switches to market-hours mode (rpm=15 per the 2026-03-21 incident prevention — slower during market hours).

ssh fukutani.ryo@192.168.42.252 "cat /volume1/aegis/quote_samples_rust/collector_heartbeat.json 2>/dev/null | python3 -m json.tool"

Expected at 09:31 ET:

  • status: "collecting"
  • market_hours: true
  • rpm: 15.0 (not 60)
  • api_calls_this_cycle: incrementing

If market_hours: false at 09:35 ET or later → the clock detection is broken.
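The heartbeat expectations can be checked against the parsed JSON. A sketch in Python using only the fields named above (status, market_hours, rpm). It skips api_calls_this_cycle, which needs two successive reads to confirm it is incrementing:

```python
MARKET_HOURS_RPM = 15  # runbook: throttled rate during market hours


def heartbeat_problems(hb: dict) -> list[str]:
    """Return the step-11 expectations that a heartbeat snapshot violates."""
    problems = []
    if hb.get("status") != "collecting":
        problems.append("status is not 'collecting'")
    if hb.get("market_hours") is not True:
        problems.append("market_hours is not true")
    if hb.get("rpm") != MARKET_HOURS_RPM:  # 15.0 == 15 in Python
        problems.append("rpm is not 15")
    return problems
```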

Checkpoint (10:00 ET / 14:00 UTC — 30 min after open)

Run the daily summary for the observation window:

ssh fukutani.ryo@192.168.42.252 "
  echo '=== parity_log scan count ==='
  jq -c 'select(.record_kind==\"scan\" and (.ts | startswith(\"2026-04-13\")))' /volume1/aegis/wft_state/lt_rust/parity_log.jsonl | wc -l

  echo '=== parity_log error rate ==='
  jq -c 'select(.record_kind==\"scan\" and (.ts | startswith(\"2026-04-13\"))) | .errors | length' /volume1/aegis/wft_state/lt_rust/parity_log.jsonl | awk '{if (\$1 > 0) e++; n++} END {if (n) printf \"%.1f%% (%d of %d scans had errors)\n\", e/n*100, e, n; else print \"no scans yet\"}'

  echo '=== comparison log record count ==='
  wc -l /volume1/aegis/wft_state/lt_rust/lt_wft_comparison.jsonl

  echo '=== ldas intraday files written today ==='
  find /volume1/aegis/live_data_archive_rust/options -name '13*_pt_*.parquet' -newer /tmp/monday_anchor 2>/dev/null | wc -l

  echo '=== quote collector cycles ==='
  jq '.cycle' /volume1/aegis/quote_samples_rust/collector_heartbeat.json
"

Pass criteria (Day 1 clean baseline):

  • scan_count ≥ 30 (30 min at 45 s per scan gives 40 expected)
  • error rate < 5%
  • comparison log records ≥ 5 (first record at 09:35-09:40, then one every ~5 min)
  • ldas intraday: at least 1 file written at 09:35 ET
  • quote collector cycles ≥ 5
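The checkpoint numbers can be computed and gated in one place. A sketch in Python that reads parity_log lines and applies the Day 1 thresholds listed above. It interprets "error rate" as the share of scans with at least one error, which is an assumption about the intended metric. The function names are illustrative:

```python
import json


def summarize_scans(lines, date_prefix: str):
    """Return (scan_count, scans_with_errors, error_rate_pct) for one date."""
    scans = 0
    with_errors = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("record_kind") != "scan":
            continue
        if not str(rec.get("ts", "")).startswith(date_prefix):
            continue
        scans += 1
        if rec.get("errors"):  # non-empty list counts as an errored scan
            with_errors += 1
    rate = with_errors / scans * 100 if scans else 0.0
    return scans, with_errors, rate


def day1_pass(scans: int, error_rate_pct: float, comparison_records: int,
              ldas_files: int, collector_cycles: int) -> bool:
    """Apply the Day 1 baseline thresholds from the checkpoint."""
    return (scans >= 30 and error_rate_pct < 5 and comparison_records >= 5
            and ldas_files >= 1 and collector_cycles >= 5)
```

Pass an open file handle on parity_log.jsonl as lines. The remaining counts come from the wc -l, find, and heartbeat commands in the checkpoint.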

If all green: update WORK_LOG with "2026-04-13 Day 1 clean baseline observation started" and begin the 5-business-day clock.

Mid-day and end-of-day checks

Every 2-3 hours during market hours (11:30 ET, 14:00 ET, 15:45 ET), re-run step 9 (lt-wft-parity) and step 10 for intraday progress. Log the Overall: PASS/FAIL verdict and the per-scenario deltas to WORK_LOG. Any FAIL triggers immediate investigation (parity drift is the G1 cutover blocker signal).

After market close (16:00 ET / 20:00 UTC), run the full-day diff via the same docker run --rm pattern:

ssh fukutani.ryo@192.168.42.252 "sudo /usr/local/bin/docker run --rm --entrypoint lt-wft-parity \
  -v /volume1/aegis/wft_state/pt_wft_comparison.jsonl:/data/pt_wft.jsonl:ro \
  -v /volume1/aegis/wft_state/lt_rust/lt_wft_comparison.jsonl:/data/lt_wft.jsonl:ro \
  lt-rust-docker-aegis-lt-scan-cycle \
    --python-log /data/pt_wft.jsonl \
    --rust-log /data/lt_wft.jsonl \
    --date 2026-04-13 \
    --tolerance-equity 50 \
    --tolerance-trade-count 0 \
    --json" > /tmp/monday_parity.json

jq '.overall, (.scenarios[] | {label, failures})' /tmp/monday_parity.json

Attach this JSON to the WORK_LOG entry. This is the first formal Day 1 per-scenario parity snapshot — save it as the baseline for the 5-business-day observation window.

Failure playbook

Symptom | Action
--- | ---
Container not running | gh workflow run deploy-lt-<name>.yml -f force_recreate=true, wait for completion
Polygon parse error recurrence | Verify container image post-dates commit 35011d07e; force_recreate if older
Multi-scenario WFT lane silent | Check lt_wft.yaml, verify 11 scenarios loaded, re-read Sprint 1b wiring
Python aegis-wft stopped | ⛔ CRITICAL: restore Python container via deploy-pt.yml before anything else (Rust is dry-run, Python is real money)
lt-wft-parity command not found | Container image predates commit d07c99bd5, force_recreate
Both logs present but parity FAIL | Dig into the specific scenario's equity/trades delta, check whether Python had a bugfix merged that Rust didn't

Next steps after Day 1

If Day 1 is clean, repeat steps 6-10 daily Tue-Fri (Day 2-5 of the 5-business-day observation window). If all 5 days are Overall: PASS, the cutover judgment meeting happens on the Monday of Week 3 (2026-04-20).