How We Validate Market Data Before It Becomes a Signal

By TickDistill — order-flow microstructure signals. Educational content about our data engineering.

A signal is only as trustworthy as the data underneath it

We are, fundamentally, a data company. A clever signal computed on a silently corrupted day of trades is worse than no signal — it looks authoritative and is wrong. So before any market data is allowed to become a signal you can query or backtest, it passes through a set of quality gates. This page explains how, because for a data vendor, correctness is the product.

The failure modes we refuse to ship

Raw market data fails in quiet, dangerous ways:

Any one of these, shipped silently, poisons both the live signal and the backtest a quant would trust their capital to.

The gate: deterministic, fail-closed, never silent

Every day of data runs through a deterministic QA gate before it is allowed downstream. The rule is simple and strict: at the first violation, stop and report — never continue silently. A fail-closed circuit breaker beats a pipeline that limps on with bad data and discovers it weeks later.
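The fail-closed rule can be sketched in a few lines. This is a minimal illustration, not TickDistill's pipeline: the names `QAGateError`, `Check`, `run_gate` and the three sample checks are hypothetical, and each day's data is assumed to be a list of dicts with `price` and `size` fields. The gate runs the checks in order and raises at the first violation, so later checks never run and nothing reaches downstream.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


class QAGateError(Exception):
    """Raised at the first failed check so the day never reaches downstream."""


@dataclass(frozen=True)
class Check:
    name: str
    # Pure function of the day's records: same input, same verdict, every run.
    predicate: Callable[[list[dict]], bool]


def run_gate(day: str, records: list[dict], checks: Iterable[Check]) -> list[dict]:
    """Pass the records through only if every check holds; halt otherwise."""
    for check in checks:
        if not check.predicate(records):
            # Fail closed: stop at the first violation, never log and continue.
            raise QAGateError(f"{day}: check '{check.name}' failed")
    return records


# Hypothetical, illustrative checks; not TickDistill's actual list.
# Note that a NaN price fails "positive_prices", because NaN > 0 is False.
CHECKS = [
    Check("non_empty", lambda rs: len(rs) > 0),
    Check("positive_prices", lambda rs: all(r["price"] > 0 for r in rs)),
    Check("non_negative_sizes", lambda rs: all(r["size"] >= 0 for r in rs)),
]
```

Raising an exception rather than returning a flag is the design choice that makes the gate fail closed: a caller that forgets to inspect a return value still cannot pass bad data along.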

We keep these checks deterministic (explicit assertions, not a model guessing) because data validation must be exact and reproducible. Representative checks:

When a check fails, the gate halts and emits a precise report — which day, which package, which metric — so the problem is fixed at the source rather than papered over.
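A report that names the day, package, and metric can be produced by an explicit bounds assertion. The sketch below assumes a simple numeric bound per metric; `QAFailure`, `check_bounds`, the `trades` package name, and the report layout are hypothetical, not TickDistill's actual format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QAFailure:
    day: str       # which day of data failed
    package: str   # which data package the metric belongs to
    metric: str    # which metric violated its bound
    observed: float
    lower: float
    upper: float

    def report(self) -> str:
        return (f"QA FAIL day={self.day} package={self.package} "
                f"metric={self.metric} observed={self.observed} "
                f"expected=[{self.lower}, {self.upper}]")


def check_bounds(day: str, package: str, metric: str, observed: float,
                 lower: float, upper: float) -> Optional[QAFailure]:
    """Explicit, deterministic assertion: fixed bounds, nothing learned from data.

    Any comparison with NaN is False, so a NaN metric also produces a failure.
    """
    if lower <= observed <= upper:
        return None
    return QAFailure(day, package, metric, observed, lower, upper)


# Hypothetical usage: an empty trades file for one day.
failure = check_bounds("2024-01-02", "trades", "row_count", 0, 1, 10)
if failure is not None:
    print(failure.report())
```

Carrying the observed value and the expected range in the report means the fix can start from the report itself, without re-running the pipeline to find out what went wrong.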

Reproducible by construction: idempotent, resumable, point-in-time

Quality is not only “is this byte correct” — it is “can I trust the whole history.”
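One common way to get idempotent, resumable daily processing is to publish each day's output with an atomic rename and skip days whose output already exists. This is a minimal sketch under that assumption, covering the idempotent and resumable parts only. The function names, the one-JSON-file-per-day layout, and the `derive` callback are hypothetical, and `derive` is assumed to be deterministic.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write to a temp file in the same directory, then rename it over the target.

    On POSIX, os.replace within one filesystem is atomic, so readers see either
    the old file, no file, or the complete new file, never a half-written one.
    """
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(payload, fh, sort_keys=True)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # leave no partial output behind
        raise


def process_days(days: list[str], out_dir: Path,
                 derive: Callable[[str], dict]) -> list[str]:
    """Process each day at most once; re-running after a crash resumes the backlog."""
    out_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for day in days:
        target = out_dir / f"{day}.json"
        if target.exists():
            # Outputs only appear via atomic rename, so existence means complete.
            continue
        atomic_write_json(target, derive(day))
        processed.append(day)
    return processed
```

Running `process_days` twice over the same days leaves the same files on disk, and a run interrupted midway picks up at the first day without an output.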

We process the raw tick data and then discard it — on purpose

We stream each day of raw tick data, compute the derived values, and discard the raw data. Because our sources are free, re-fetching and re-deriving a day later costs time, not money, and discarding keeps us from hoarding terabytes we do not need. The quality gates run before the discard, so nothing is thrown away until it has been validated and the derived outputs have been written and checked.
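The ordering that matters here is derive, validate, write, verify, and only then delete. The sketch below shows that ordering for a single file. It is an illustration, not TickDistill's code: `process_then_discard`, the `derive` and `gate` callbacks, and the file layout are hypothetical, and the raw file is read whole for brevity rather than streamed.

```python
import os
from pathlib import Path
from typing import Callable


def process_then_discard(raw_path: Path, out_path: Path,
                         derive: Callable[[bytes], bytes],
                         gate: Callable[[bytes], None]) -> None:
    """Delete the raw file only after the derived output is validated and on disk."""
    raw = raw_path.read_bytes()
    derived = derive(raw)
    # The gate raises on the first violation; if it does, nothing is written
    # and the raw file is left untouched.
    gate(derived)
    out_path.write_bytes(derived)
    # Re-read what actually landed on disk before trusting it.
    if out_path.read_bytes() != derived:
        raise OSError(f"write verification failed for {out_path}")
    os.remove(raw_path)  # reached only if every step above succeeded
```

Any exception raised before the last line leaves the raw file in place, so a failed day can be re-run from the local copy instead of being re-fetched.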

Why we tell you this

You cannot independently audit every byte we ingest, so we earn that trust by being explicit about how we handle data: fail-closed gates, deterministic checks, reproducible and point-in-time pipelines. We sell shovels, not gold — and a shovel that bends the first time you use it is worthless. The reliability of the data underneath is the part we will never cut.


TickDistill sells clean, computed order-flow inputs — not trading advice or guaranteed alpha. Backtests are illustrative and not a promise of future results.