> the_problem
Forex data is genuinely scattered. Economic calendars live on ForexFactory — behind a web interface designed for human browsing, not programmatic access. Session timings exist in blog posts and spreadsheets. Volatility windows get passed around as rules of thumb.
The problem wasn't a lack of data; it was fragmentation. ForexFactory has excellent event data. Session overlap calculations are pure math. Neither was hard in isolation. The gap was a pipeline that pulled it all together into a unified, queryable, always-current view.
> my_approach
Three-layer architecture, each layer with a single responsibility:
- Layer 1 // ForexFactory Scraper: scrapes economic events with impact level (high/medium/low), affected currency, actual vs. forecast vs. previous values, and event time. Handles pagination across weekly views, applies respectful rate limiting to avoid bans, and uses session handling to maintain cookies across requests.
- Layer 2 // Normalization Pipeline: raw scraped data lands in a staging table; the pipeline then normalizes it with timezone conversion to UTC, impact-level classification, currency pair tagging, and null handling for unreleased actual values. The output is a clean events table ready for querying.
- Layer 3 // Session Dashboard: computes live session status (open/closed) and overlap windows (London/NY is the key one, carrying roughly 70% of daily volume), and surfaces upcoming high-impact events in the next N hours. Built to be queried, not just displayed.
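Layer 2's transform can be sketched as a per-row function. Everything specific here is illustrative, not the actual schema: the field names, the impact labels, and the assumption that the scrape was done with the site's display timezone set to US/Eastern.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

IMPACT = {"high": "H", "medium": "M", "low": "L"}

def normalize_event(raw: dict) -> dict:
    """Normalize one scraped calendar row (field names are hypothetical)."""
    # Assumption: times were scraped with the site's display timezone set
    # to US/Eastern. Attach that zone, then convert to UTC for storage.
    local = datetime.strptime(raw["time"], "%Y-%m-%d %H:%M")
    utc = local.replace(tzinfo=ZoneInfo("America/New_York")).astimezone(ZoneInfo("UTC"))
    return {
        "event": raw["event"],
        "currency": raw["currency"].upper(),
        "impact": IMPACT.get(raw["impact"].lower(), "L"),
        # Unreleased actuals arrive as empty strings; store None, not "".
        "actual": raw["actual"].strip() or None,
        "forecast": raw["forecast"].strip() or None,
        "time_utc": utc.isoformat(),
    }

row = normalize_event({
    "event": "Non-Farm Payrolls", "currency": "usd", "impact": "High",
    "time": "2024-03-08 08:30", "actual": "", "forecast": "200K",
})
# row["time_utc"] -> "2024-03-08T13:30:00+00:00" (08:30 EST + 5h)
```

Keeping the raw staging table separate from this clean output means a parsing change never corrupts historical events; re-running the transform over staging is cheap.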
> architecture
┌────────────────────────────────────────────────┐
│                FOREXFACTORY.COM                │
│   Weekly calendar pages (impact, currency,     │
│   actual, forecast, previous, event time)      │
└──────────────────────┬─────────────────────────┘
                       │ HTTP + session cookies
                       ▼
┌────────────────────────────────────────────────┐
│                 SCRAPER LAYER                  │
│   BeautifulSoup HTML parser                    │
│   Rate-limited requests (respectful crawling)  │
│   Session handling + weekly pagination         │
└──────────────────────┬─────────────────────────┘
                       │ raw event rows
                       ▼
┌────────────────────────────────────────────────┐
│             NORMALIZATION PIPELINE             │
│   UTC timezone conversion                      │
│   Impact level classification (H/M/L)          │
│   Null handling for unreleased actuals         │
│   Currency pair tagging                        │
└──────────┬───────────────────────┬─────────────┘
           │                       │
           ▼                       ▼
┌──────────────────┐   ┌──────────────────────┐
│  PostgreSQL /    │   │   SESSION ENGINE     │
│  Storage Layer   │   │   Sydney / Tokyo /   │
│  (events table)  │   │   London / New York  │
└──────────┬───────┘   │   overlap detection  │
           │           │   DST-aware UTC math │
           │           └──────────┬───────────┘
           └──────────┬───────────┘
                      ▼
┌────────────────────────────────────────────────┐
│               SESSION DASHBOARD                │
│   Live session status (open / closed)          │
│   Overlap windows highlighted                  │
│   Upcoming high-impact events (next N hrs)     │
│   Volatility window alerts                     │
└────────────────────────────────────────────────┘
> hard_challenges
> results
> lessons_learned
Timezone handling in finance is genuinely hard. The correct approach is non-negotiable: store everything in UTC, convert to local time only at the display layer. Any shortcut — fixed UTC offsets, local time storage, "it works most of the year" logic — will fail during DST transitions at exactly the moment you're watching a high-impact news release.
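The failure mode is concrete. A minimal illustration with a typical 8:30am ET release time on two dates straddling the 2024 US DST transition (March 10):

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")

# The same 8:30am ET release time, one week apart, across the transition.
before = datetime(2024, 3, 8, 8, 30, tzinfo=NY).astimezone(timezone.utc)
after = datetime(2024, 3, 15, 8, 30, tzinfo=NY).astimezone(timezone.utc)
# Correct: the tz database applies EST (UTC-5), then EDT (UTC-4):
# before is 13:30 UTC, after is 12:30 UTC.

# Wrong: a hard-coded "ET = UTC-5" offset misses the transition and is an
# hour late for every release between March and November.
fixed_et = timezone(timedelta(hours=-5))
naive = datetime(2024, 3, 15, 8, 30, tzinfo=fixed_et).astimezone(timezone.utc)
# naive is 13:30 UTC, one hour off from the true 12:30 UTC.
```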
Scraping doesn't have to be adversarial. Respectful rate limiting, real session handling, and targeting stable HTML structures (not JavaScript-rendered state) produces scrapers that run for months without breaking. The investment in resilience upfront beats debugging silent failures later.
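That pattern can be sketched with requests and BeautifulSoup. The URL pattern and CSS selectors below are placeholders for illustration, not ForexFactory's actual markup; inspect the live page before relying on them.

```python
import time

import requests
from bs4 import BeautifulSoup

# Placeholder URL -- the real calendar's query parameters differ.
CALENDAR_URL = "https://www.forexfactory.com/calendar"

def parse_rows(html: str) -> list[dict]:
    """Extract event rows from one weekly calendar page."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {
            "currency": tr.select_one(".currency").get_text(strip=True),
            "event": tr.select_one(".event").get_text(strip=True),
        }
        for tr in soup.select("tr.calendar-row")  # hypothetical selector
    ]

def scrape_week(session: requests.Session, week: str, delay: float = 2.0) -> list[dict]:
    """Fetch and parse one weekly view, pausing between requests."""
    time.sleep(delay)                             # respectful rate limiting
    resp = session.get(CALENDAR_URL, params={"week": week}, timeout=30)
    resp.raise_for_status()                       # fail loudly, not silently
    return parse_rows(resp.text)

# One Session object for the whole run keeps cookies across weekly pages.
session = requests.Session()
session.headers["User-Agent"] = "forex-dashboard/0.1"
```

Keeping parsing (`parse_rows`) separate from fetching also makes the fragile half testable offline against saved HTML fixtures, which is how selector breakage gets caught before it becomes a silent failure.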
The dashboard's value isn't in any individual data point — it's in having all three layers (events, sessions, overlaps) queryable in one place. The integration is the product.