CryptoPrism-DB gave us a powerful daily view of 1,000 coins — but daily candles are too coarse for short-term trading signals. A coin can dump 30% and recover in hours without the daily chart showing anything alarming. We needed the same DMV signal quality, but on an hourly heartbeat.
The constraint: we couldn't just run the existing daily pipeline faster. The R data-ingestion script was designed for 110-day lookbacks on 1,000 coins — at hourly frequency it would time out and burn through API quota. We needed a ground-up redesign scoped to 250 coins and a 5-day rolling window.
Two separate GitHub Actions workflows, offset by 4 minutes so they run sequentially without any explicit dependency between them:
- r_cron.yml runs at :01 — R script fetches OHLCV for 250 coins from CoinMarketCap via the crypto2 package and writes into cp_ai. Finishes in under 4 minutes, before py_cron.yml fires.
- py_cron.yml runs at :05 — Python picks up fresh OHLCV and runs four parallel analysis scripts (TVV, OSC, MOM, RAT), then the DMV core aggregator joins them. Takes ~4 minutes.
The aggregator runs last — it's the only script that touches FE_DMV_ALL and FE_DMV_SCORES, and it JOINs the outputs of all four analysis scripts. This dependency ordering is enforced purely by script order in py_cron.yml, not by orchestration tooling.
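To make the aggregation step concrete, here is a minimal sketch of the JOIN across the four feature tables. The real pipeline targets Postgres; sqlite3 stands in here, and every column name beyond the table names (slug, ts, the per-table score columns) is an illustrative assumption, not the actual schema.

```python
import sqlite3

# In-memory stand-in for cp_ai; one toy row per feature table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
for table, col in [("FE_TVV", "tvv_score"), ("FE_OSC", "osc_score"),
                   ("FE_MOM", "mom_score"), ("FE_RAT", "rat_score")]:
    cur.execute(f"CREATE TABLE {table} (slug TEXT, ts TEXT, {col} REAL)")
    cur.execute(f"INSERT INTO {table} VALUES ('bitcoin', '2024-01-01T00:00', 0.5)")

# The aggregator joins all four tables on (coin, timestamp). With an inner
# join, a coin missing from any one table silently drops out of the result —
# which is exactly why the pre-join row-count check matters.
cur.execute("""
    CREATE TABLE FE_DMV_ALL AS
    SELECT t.slug, t.ts, t.tvv_score, o.osc_score, m.mom_score, r.rat_score
    FROM FE_TVV t
    JOIN FE_OSC o ON o.slug = t.slug AND o.ts = t.ts
    JOIN FE_MOM m ON m.slug = t.slug AND m.ts = t.ts
    JOIN FE_RAT r ON r.slug = t.slug AND r.ts = t.ts
""")
rows = cur.execute("SELECT * FROM FE_DMV_ALL").fetchall()
print(rows)  # one fully-joined row per coin present in all four tables
```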
Historical data appends to cp_backtest_h automatically — every hourly run is captured, so backtesting against hourly signals requires no extra work.
Python needs fresh OHLCV data from R before it can run. We can't use depends-on between workflows in GitHub Actions. Solution: fixed 4-minute offset between cron schedules. R consistently finishes in under 4 minutes — tested across 200+ runs. Simple and reliable.
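Because the offset is the only ordering mechanism, a cheap freshness check at the top of the Python step can backstop it. This is a hypothetical safeguard, not part of the pipeline as described — sqlite3 stands in for Postgres and the table/column names are assumptions.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Stand-in for the OHLCV table R writes into cp_ai.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ohlcv (slug TEXT, ts TEXT, close REAL)")

def ohlcv_is_fresh(now, max_age=timedelta(minutes=10)):
    """True if the newest OHLCV row is recent enough to analyze."""
    newest = conn.execute("SELECT MAX(ts) FROM ohlcv").fetchone()[0]
    if newest is None:
        return False  # R's write never landed this hour
    return now - datetime.fromisoformat(newest) <= max_age

now = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)
empty = ohlcv_is_fresh(now)   # False: nothing written yet
conn.execute("INSERT INTO ohlcv VALUES ('bitcoin', ?, 42000.0)",
             ((now - timedelta(minutes=4)).isoformat(),))
fresh = ohlcv_is_fresh(now)   # True: data from the :01 run is present
```

If the check fails, the Python workflow can exit early rather than compute signals on stale candles — the same fail-fast philosophy as the aggregator's row-count guard.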
The DMV core script JOINs FE_TVV, FE_OSC, FE_MOM, and FE_RAT. If any upstream script fails mid-run, the JOIN silently produces an incomplete result — coins missing from the failed table simply drop out. Solution: the core aggregator checks row counts before joining — if any table has fewer rows than expected, it raises an alert and aborts instead of writing partial data.
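The guard itself is small. A sketch of the idea, with illustrative names (the real aggregator would read live counts from cp_ai, and alerting would go to a pager rather than an exception):

```python
EXPECTED_COINS = 250  # one row per coin per hourly run

def validate_feature_tables(counts, expected=EXPECTED_COINS):
    """Abort before the JOIN if any upstream feature table is short on rows."""
    short = {table: n for table, n in counts.items() if n < expected}
    if short:
        # In production this would fire an alert; here we just raise.
        raise RuntimeError(f"aborting DMV join, short tables: {short}")

# A healthy run passes silently...
validate_feature_tables({"FE_TVV": 250, "FE_OSC": 250,
                         "FE_MOM": 250, "FE_RAT": 250})

# ...while a partial upstream failure aborts instead of writing bad data.
try:
    validate_feature_tables({"FE_TVV": 250, "FE_OSC": 187,
                             "FE_MOM": 250, "FE_RAT": 250})
except RuntimeError as exc:
    print(exc)
```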
cp_ai stores the live rolling 5-day view. If we just appended every hour, it would grow forever. Solution: each run deletes rows older than 5 days before inserting new data. cp_backtest_h is the append-only historical store — never pruned.
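The two-table write pattern — prune the live window, append everywhere — can be sketched as below. sqlite3 again stands in for Postgres, and the schema is an assumption for illustration.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cp_ai (slug TEXT, ts TEXT, close REAL)")         # rolling 5-day view
conn.execute("CREATE TABLE cp_backtest_h (slug TEXT, ts TEXT, close REAL)") # append-only history

def hourly_write(rows, now):
    cutoff = (now - timedelta(days=5)).isoformat()
    # 1. prune the live table before inserting, so it never grows unbounded
    conn.execute("DELETE FROM cp_ai WHERE ts < ?", (cutoff,))
    # 2. insert the fresh hour into both tables; cp_backtest_h is never pruned
    conn.executemany("INSERT INTO cp_ai VALUES (?, ?, ?)", rows)
    conn.executemany("INSERT INTO cp_backtest_h VALUES (?, ?, ?)", rows)
    conn.commit()

now = datetime(2024, 1, 10, tzinfo=timezone.utc)
stale_ts = (now - timedelta(days=6)).isoformat()  # predates the 5-day window
conn.execute("INSERT INTO cp_ai VALUES ('bitcoin', ?, 1.0)", (stale_ts,))
conn.execute("INSERT INTO cp_backtest_h VALUES ('bitcoin', ?, 1.0)", (stale_ts,))

hourly_write([("bitcoin", now.isoformat(), 2.0)], now)
live = conn.execute("SELECT COUNT(*) FROM cp_ai").fetchone()[0]
hist = conn.execute("SELECT COUNT(*) FROM cp_backtest_h").fetchone()[0]
print(live, hist)  # stale row pruned from cp_ai; both rows survive in history
```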
Time offsets are a surprisingly effective poor-man's orchestrator for two-step pipelines. Formal orchestration tools (Airflow, Prefect) add significant complexity — for a two-step pipeline with predictable runtimes, a 4-minute cron offset is more reliable than it sounds.
The aggregator validation check (row count before JOIN) has caught three partial failures in production that would have silently written bad data. Worth the 10 lines of code.
Building cp_backtest_h as an append-only store from day one was the right call — retroactively trying to reconstruct hourly history would have been impossible.