the_problem

CryptoPrism-DB gave us a powerful daily view of 1,000 coins — but daily candles are too coarse for short-term trading signals. A coin can dump 30% and recover in hours without the daily chart showing anything alarming. We needed the same DMV signal quality, but on an hourly heartbeat.

The constraint: we couldn't just run the existing daily pipeline faster. The R data ingestion script was designed for 110-day lookbacks on 1,000 coins — at hourly frequency that would time out and burn API quota. We needed a ground-up redesign scoped to 250 coins and a 5-day rolling window.

Daily pipeline = strategic positioning. Hourly pipeline = tactical entries and exits. You need both — one without the other is incomplete.
my_approach

Two separate GitHub Actions workflows, offset by 4 minutes so they execute sequentially without explicit dependencies:

  • r_cron.yml runs at :01 — R script fetches OHLCV for 250 coins from CoinMarketCap via the crypto2 package, writes into cp_ai. Takes ~6 minutes.
  • py_cron.yml runs at :05 — Python picks up the fresh OHLCV and runs the four analysis families (TVV, OSC, MOM, RAT) as sequential scripts, then the DMV core aggregator joins their outputs. Takes ~4 minutes.
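The offset itself is just two cron expressions. A minimal sketch of the schedule blocks (workflow names from above; all other keys omitted):

```yaml
# r_cron.yml — OHLCV ingestion, fires first
on:
  schedule:
    - cron: "1 * * * *"   # every hour at :01 UTC

# py_cron.yml — analysis + aggregation, fires 4 minutes later
on:
  schedule:
    - cron: "5 * * * *"   # every hour at :05 UTC
```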

The aggregator runs last — it's the only script that touches FE_DMV_ALL and FE_DMV_SCORES, and it JOINs the outputs of all four analysis scripts. This dependency ordering is enforced purely by script order in py_cron.yml, not by orchestration tooling.
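Within py_cron.yml, that ordering falls out of how GitHub Actions runs steps: steps in a single job execute top-to-bottom, and a failing step halts the job. A sketch of the job, assuming one step per script as in the architecture tree below:

```yaml
# py_cron.yml (sketch) — steps run sequentially within one job,
# so the aggregator cannot start before the signal scripts finish
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - run: python gcp_dmv_tvv_pct_1h.py        # FE_TVV
      - run: python gcp_dmv_osc_mom_rat_1h.py    # FE_OSC, FE_MOM, FE_RAT
      - run: python gcp_dmv_core_1h.py           # aggregator — always last
```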

Historical data appends to cp_backtest_h automatically — every hourly run is captured, so backtesting against hourly signals requires no extra work.

architecture
GitHub Actions
├── :01 UTC → r_cron.yml
└── :05 UTC → py_cron.yml

r_cron.yml
└── gcp_ohlcv_1h_250coins.R
    ├── CoinMarketCap API (crypto2 pkg)
    ├── 250 coins × 5-day OHLCV
    └── WRITE → cp_ai

py_cron.yml (sequential)
├── gcp_dmv_tvv_pct_1h.py → FE_TVV (Volume/Value/Risk)
├── gcp_dmv_osc_mom_rat_1h.py → FE_OSC, FE_MOM, FE_RAT
└── gcp_dmv_core_1h.py → FE_DMV_ALL, FE_DMV_SCORES
    (aggregator — JOINs all signal tables)

PostgreSQL Databases (GCP)
├── dbcp → listings, top 1000 universe
├── cp_ai → live hourly signals (rolling 5-day)
└── cp_backtest_h → append-only historical (every hour preserved)
hard_parts
Challenge 01
Sequential execution without an orchestrator

Python needs fresh OHLCV data from R before it can run, but GitHub Actions has no depends-on mechanism between separate workflows. Solution: a fixed 4-minute offset between the cron schedules. R consistently finishes in under 4 minutes — verified across 200+ runs. Simple and reliable.

Challenge 02
DMV aggregator ordering

The DMV core script JOINs FE_TVV, FE_OSC, FE_MOM, and FE_RAT. If any upstream script fails mid-run, the JOIN produces incomplete rows silently. Solution: the core aggregator checks row counts before joining — if any table has fewer rows than expected, it raises an alert and aborts instead of writing partial data.
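The guard can be sketched in a few lines. This is a hypothetical reconstruction, not the production code: the table names come from the pipeline above, but the function name, schema, and the use of SQLite for the demo (production is PostgreSQL) are illustrative assumptions.

```python
import sqlite3

# Signal tables the aggregator JOINs; each should hold one row per coin.
SIGNAL_TABLES = ["FE_TVV", "FE_OSC", "FE_MOM", "FE_RAT"]

def check_row_counts(conn, expected, tables=SIGNAL_TABLES):
    """Abort before the JOIN if any upstream table is short on rows —
    a short table means an analysis script died mid-run, and joining
    it would silently write partial data."""
    short = {}
    for table in tables:
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        if count < expected:
            short[table] = count
    if short:
        raise RuntimeError(f"partial upstream data, aborting JOIN: {short}")
    return True

# Demo against an in-memory SQLite DB with a full 250-coin universe:
conn = sqlite3.connect(":memory:")
for t in SIGNAL_TABLES:
    conn.execute(f'CREATE TABLE "{t}" (slug TEXT, value REAL)')
    conn.executemany(f'INSERT INTO "{t}" VALUES (?, ?)',
                     [(f"coin{i}", 0.0) for i in range(250)])
assert check_row_counts(conn, expected=250)
```

The point of raising instead of logging is that the GitHub Actions step fails loudly, so the aggregator never reaches its INSERT.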

Challenge 03
Rolling window without unbounded growth

cp_ai stores the live rolling 5-day view. If we just appended every hour, it would grow forever. Solution: each run deletes rows older than 5 days before inserting new data. cp_backtest_h is the append-only historical store — never pruned.
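The prune-then-insert pattern is small enough to show end to end. A hypothetical sketch — table and column names are illustrative, and the demo uses in-memory SQLite where production writes to PostgreSQL on GCP:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=5)  # rolling window kept in cp_ai

def write_hourly(conn, rows, now=None):
    """Prune cp_ai to the 5-day window, insert the new hour, and append
    the same rows to the never-pruned cp_backtest_h historical store."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - WINDOW).isoformat()
    conn.execute("DELETE FROM cp_ai WHERE ts < ?", (cutoff,))
    conn.executemany("INSERT INTO cp_ai VALUES (?, ?, ?)", rows)
    conn.executemany("INSERT INTO cp_backtest_h VALUES (?, ?, ?)", rows)
    conn.commit()

# Demo: simulate 10 days (240 hours) of hourly runs.
conn = sqlite3.connect(":memory:")
for t in ("cp_ai", "cp_backtest_h"):
    conn.execute(f"CREATE TABLE {t} (ts TEXT, slug TEXT, close REAL)")
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
for h in range(240):
    now = start + timedelta(hours=h)
    write_hourly(conn, [(now.isoformat(), "bitcoin", 1.0)], now=now)

(live,) = conn.execute("SELECT COUNT(*) FROM cp_ai").fetchone()
(hist,) = conn.execute("SELECT COUNT(*) FROM cp_backtest_h").fetchone()
# live stays capped at the 5-day window; hist keeps all 240 hourly runs
```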

results
24×/day
Pipeline executions
250
Coins per run
6 tables
Signal outputs per run
0
Manual interventions needed
vs_daily_pipeline
Feature          │ CryptoPrism-DB (Daily) │ CryptoPrism-DB-H (Hourly)
─────────────────┼────────────────────────┼──────────────────────────
Frequency        │ Once daily             │ Every hour (24×/day)
Universe         │ 1,000 coins            │ 250 coins (top)
Lookback         │ 110 days               │ 5-day rolling
Use case         │ Strategic positioning  │ Tactical entries/exits
Historical store │ cp_backtest            │ cp_backtest_h
Exec time        │ ~15 min/day            │ ~10 min/hour
lessons_learned

Time offsets are a surprisingly effective poor-man's orchestrator for two-step pipelines. Formal orchestration tools (Airflow, Prefect) add significant complexity — for a two-step pipeline with predictable runtimes, a 4-minute cron offset is more reliable than it sounds.

The aggregator validation check (row count before JOIN) has caught three partial failures in production that would have silently written bad data. Worth the 10 lines of code.

Building cp_backtest_h as an append-only store from day one was the right call — retroactively trying to reconstruct hourly history would have been impossible.