By TickDistill — order-flow microstructure signals. Educational content, not financial advice.
Every TickDistill signal is a function signal = f(primitive, params), not a frozen number. Raw market events are reduced to normalized primitive measurements; a parameterized signal function then maps those primitives to an output. Changing the parameters — the “knobs” — changes the output without re-running the full historical data pipeline.
A parametric signal is a signal defined by a function signature, not by a hardcoded output. The function takes two inputs: (1) a set of primitives — pre-computed, normalized microstructure measurements — and (2) a set of parameters — thresholds and weights the caller supplies. The same function, different parameters, produces a different signal.
The alternative is a fixed pipeline that bakes one configuration into the computation and stores the result. Fixed pipelines are simple. They are also a dead end: every new configuration requires re-processing the full history.
The parametric design makes tuning a configuration decision, not a data-engineering project.
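As a minimal sketch of the idea, the Python below defines a toy signal as a function of primitives and parameters. The `Primitive` and `Params` types, their fields, and the counting rule are hypothetical illustrations, not TickDistill's actual schema or formula.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Primitive:
    """Hypothetical stored event: timestamp, side, and sigma-normalized size."""
    ts: float   # event time in seconds
    side: int   # +1 for an aggressive buy, -1 for an aggressive sell
    z: float    # magnitude as a z-score against the causal baseline

@dataclass(frozen=True)
class Params:
    """Hypothetical knob set, expressed in sigma units."""
    min_z: float  # only events at or above this z-score count

def signal(primitives, params):
    """Toy rule: qualifying buy events minus qualifying sell events."""
    return sum(p.side for p in primitives if p.z >= params.min_z)

events = [Primitive(0.0, +1, 2.5), Primitive(1.0, +1, 3.2), Primitive(2.0, -1, 4.1)]

loose = signal(events, Params(min_z=2.0))  # all three events qualify
tight = signal(events, Params(min_z=4.0))  # only the last event qualifies
```

The same function and the same stored events give a net-buy reading under the loose knob and a net-sell reading under the tight one, with no reprocessing in between.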
A primitive is a pre-computed, normalized microstructure event that is stored only if it meets a conservative global floor threshold. For example, a big-order primitive records that a single aggressive trade print (one aggTrade row) met or exceeded the floor magnitude, expressed as a sigma z-score rather than a raw contract count, along with its timestamp, side, and price context. (A single large print is a proxy for conviction, not proof of one decisive actor; exchanges aggregate by price and side within a short window, so true single-actor attribution would require lower-level data.)
The primitive feature-store (PrimitiveStore) holds these events for the full historical range. It is the repository that makes parameterization cheap: instead of reprocessing raw ticks whenever a parameter changes, the signal function re-queries the already-normalized primitives and applies new thresholds locally.
Storing the primitives rather than the final signal is what makes the knobs free to move after the pipeline has run.
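To illustrate why re-querying is cheap, here is a small in-memory stand-in for a primitive store. The class name `InMemoryPrimitiveStore`, its `query` method, and the tuple layout are assumptions made for this sketch; the article does not describe the real PrimitiveStore interface.

```python
from bisect import bisect_left

class InMemoryPrimitiveStore:
    """Hypothetical stand-in: already-normalized events kept sorted by timestamp."""

    def __init__(self, events):
        # events: iterable of (ts, side, z) tuples
        self._events = sorted(events)
        self._ts = [e[0] for e in self._events]

    def query(self, start, end):
        """Return events with start <= ts < end using binary search."""
        lo = bisect_left(self._ts, start)
        hi = bisect_left(self._ts, end)
        return self._events[lo:hi]

def count_above(store, start, end, min_z):
    """Apply a threshold locally to the stored z-scores."""
    return sum(1 for _, _, z in store.query(start, end) if z >= min_z)

store = InMemoryPrimitiveStore(
    [(10.0, 1, 2.1), (20.0, -1, 3.4), (30.0, 1, 5.0), (40.0, 1, 2.8)]
)

# Three knob positions, one store, no raw ticks involved.
sweep = {k: count_above(store, 0.0, 40.0, k) for k in (2.0, 3.0, 4.0)}
```

Each knob position costs one range lookup plus a filter over stored rows, which is the property the parametric design relies on.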
The floor is a global minimum threshold that determines which primitive events are recorded at all. Events below the floor are silently discarded during ingestion; events at or above it enter the primitive store.
The floor is a one-way door because raw tick data is discarded after ingestion (see the single-pass streaming ETL architecture). Once the backfill is complete, the floor cannot be lowered retroactively — there is no raw history to re-query. Events that fell below the floor are gone.
This means the floor must be set conservatively before backfill begins. It is calibrated from a sample of historical data so that the recorded primitives cover the full useful range of the downstream signal knobs. A knob threshold can never be set below the floor; the floor is the hard lower bound on parameterization.
The decision of what to store is therefore made once, before the raw ticks are discarded, and it governs all future signal configurations.
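The floor rule can be sketched in a few lines of Python. The floor value `FLOOR_Z`, the event layout, and the helper names are hypothetical; the point is only that filtering happens once at ingestion and that later knob values are checked against the same bound.

```python
FLOOR_Z = 2.0  # hypothetical floor, in sigma units

def ingest(raw_events, floor=FLOOR_Z):
    """Keep events at or above the floor; the rest are not stored anywhere."""
    return [e for e in raw_events if e["z"] >= floor]

def validate_knob(threshold, floor=FLOOR_Z):
    """Reject knob values that the stored primitives cannot support."""
    if threshold < floor:
        raise ValueError(f"threshold {threshold} is below the floor {floor}")
    return threshold

raw = [{"z": 1.2}, {"z": 2.0}, {"z": 3.7}]
stored = ingest(raw)  # the 1.2-sigma event is dropped for good
```

A request for a 1.5-sigma threshold fails validation here because the events it would need were never written to the store.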
The causal baseline is the rolling historical distribution — mean and standard deviation — used to z-score each primitive measurement at the moment it is observed. “Causal” means the baseline uses only observations strictly before time t; no future data enters the computation.
The z-score formula for a quantity x at time t is:
z(t) = (x(t) - μ(t)) / σ(t)
where μ(t) and σ(t) are estimated from observations {x(t') : t' < t} only.
This is a standard requirement in time-series research (Arlot and Celisse, 2010, survey the general principle; it is foundational to any honest walk-forward evaluation). The violation — using a global mean or a centered window that includes future observations — is look-ahead bias. It makes backtest results artificially strong and live performance disappointing.
TickDistill enforces causal baselines across every signal and every primitive. It also excludes recurring mechanical windows, such as perpetual funding settlements, from the baseline estimation, so that structurally heavy-volume periods do not distort the rarity reading. The baseline reflects genuine informational activity, not calendar mechanics. What Is Point-in-Time Correctness? gives the full treatment.
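A causal z-score with an exclusion mask might look like the sketch below. The fixed-length rolling window, the use of the population standard deviation, and the choice to still score excluded observations while keeping them out of the baseline are all assumptions for illustration, not a description of TickDistill's estimator.

```python
from collections import deque
from statistics import mean, pstdev

def causal_zscores(values, window, excluded=None):
    """z-score each value against up to `window` earlier, non-excluded values.

    Returns None where the baseline has fewer than two observations
    or has zero spread.
    """
    excluded = excluded or [False] * len(values)
    history = deque(maxlen=window)
    out = []
    for x, skip in zip(values, excluded):
        sigma = pstdev(history) if len(history) >= 2 else 0.0
        out.append((x - mean(history)) / sigma if sigma > 0 else None)
        if not skip:
            # Appended only after scoring, so x(t) never enters its own baseline.
            history.append(x)
    return out

z = causal_zscores(
    [1.0, 3.0, 2.0, 100.0, 2.0],
    window=3,
    excluded=[False, False, False, True, False],  # 4th point: e.g. a funding window
)
```

The fourth observation is scored as extreme but never joins the baseline, so the fifth observation is still judged against the calm history that preceded it.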
Given a window of primitives, the signal function computes several quantities from the microstructure literature and applies threshold tests to them.
Trade imbalance (directional ratio) is the standard, trade-only signed ratio computed over aggressor-classified trades. Its numerator B - S is the windowed volume delta, that is, the change in cumulative volume delta (CVD) over the window; dividing by total weight bounds the result to [-1, 1]. For a set of events with buy weight B, sell weight S, and total weight V = B + S:
r = (B - S) / V (defined when V > 0)
This is an L1, trade-only measurement: it uses only executed trades and their aggressor side, not limit-order placements or cancellations. (It is distinct from Order-Flow Imbalance (OFI), the L2 book-event quantity of Cont, Kukanov, and Stoikov (2014, The Price Impact of Order Book Events, Journal of Financial Econometrics 12(1):47-88) — OFI requires order-book data TickDistill v1 does not use. See What Is Trade Imbalance in Order Flow? for the distinction.) The directional persistence that makes a large r meaningful — institutional metaorders split into same-side child orders — is documented by Lillo, Mike, and Farmer (2005, Theory for long-memory in supply and demand, Physical Review E 71, 066122) and Bouchaud, Farmer, and Lillo (2009, How Markets Slowly Digest Changes in Supply and Demand).
The raw ratio r is then z-scored against its own causal baseline to produce a rarity reading r_z. A threshold on r_z asks: “how anomalously directional is this window compared to what this market normally shows?” The same threshold value means the same rarity level across different assets and volatility regimes.
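The raw ratio can be computed as in the following sketch. It assumes the total weight is the sum of buy and sell weights and that each trade arrives as a hypothetical `(side, size)` pair labelled with the aggressor side.

```python
def trade_imbalance(trades):
    """r = (B - S) / V over aggressor-classified trades; None when V is zero.

    trades: iterable of (side, size), where side is "buy" or "sell".
    """
    trades = list(trades)  # allow generators; we iterate twice
    buy = sum(size for side, size in trades if side == "buy")
    sell = sum(size for side, size in trades if side == "sell")
    total = buy + sell
    if total <= 0:
        return None  # the ratio is undefined without any traded weight
    return (buy - sell) / total

window = [("buy", 6.0), ("sell", 2.0), ("buy", 2.0)]
r = trade_imbalance(window)  # (8 - 2) / 10
```

The result is always between -1 (all selling) and +1 (all buying), which is what makes it comparable across windows before any z-scoring is applied.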
Density measures the clustering rate of primitive events in the window. Xu and Zhou (2020, Modeling aggressive market order placements with Hawkes factor models, PLoS ONE 15(1):e0226667) and Bacry, Mastromatteo, and Muzy (2015, Hawkes Processes in Finance, arXiv:1502.04592) document that large aggressive orders arrive in self-exciting clusters — not as a Poisson process — because institutional metaorders are split into sequential child orders. Density z-scored against its causal baseline captures the signature of that clustering.
Price containment is the price range spanned by the events in the window, z-scored against its causal baseline. Kyle (1985, Continuous Auctions and Insider Trading, Econometrica) models price impact as linear in net order flow; the empirical square-root law (Strict universality of the square-root law in price impact across stocks, 2024, arXiv:2411.13965) instead states I(Q) ∝ √Q. Under either model, more directional volume should move price further. When a large directional volume load leaves price in an anomalously narrow range, the implied impact is low relative to what the square-root law predicts, a signature of resting liquidity absorbing the flow (Donier and Bonart, 2015, arXiv:1412.4503).
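Before z-scoring, density and price containment reduce to two simple window measurements. The sketch below uses events per second for density and high minus low for the range; both definitions and the function names are assumptions chosen for illustration.

```python
def window_density(timestamps, start, end):
    """Events per second inside the half-open window [start, end)."""
    if end <= start:
        raise ValueError("window must have positive length")
    n = sum(1 for t in timestamps if start <= t < end)
    return n / (end - start)

def price_range(prices):
    """Span between the highest and lowest event price; None if empty."""
    prices = list(prices)
    if not prices:
        return None
    return max(prices) - min(prices)

ts = [0.5, 1.0, 1.2, 9.0]
px = [100.0, 100.5, 100.25, 100.75]

density = window_density(ts, 0.0, 10.0)  # 4 events over 10 seconds
span = price_range(px)                   # 100.75 - 100.0
```

Each raw measurement would then be z-scored against its own causal baseline, so that a narrow range or a dense cluster is judged relative to what the market normally shows.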
The signal fires when these z-scored quantities jointly exceed their respective calibrated thresholds. The exact threshold values and their combination formula are proprietary. The structure — z-score each primitive independently against its causal baseline, then apply threshold logic — is the publicly described mechanism.
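Because the real thresholds and combination formula are not public, the sketch below only shows the general shape of threshold logic over independently z-scored inputs. The plain conjunction, the `Knobs` fields, and the sign convention (a negative range z-score meaning an unusually narrow range) are assumptions of this example and should not be read as TickDistill's rule.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Knobs:
    """Hypothetical thresholds, all in sigma units."""
    imbalance_z: float    # minimum |r_z|
    density_z: float      # minimum density z-score
    containment_z: float  # range z-score must be at or below the negative of this

def fires(r_z, density_z, range_z, knobs):
    """Illustrative AND of three threshold tests."""
    return (
        abs(r_z) >= knobs.imbalance_z
        and density_z >= knobs.density_z
        and range_z <= -knobs.containment_z
    )

knobs = Knobs(imbalance_z=2.0, density_z=2.0, containment_z=1.0)

contained = fires(r_z=2.5, density_z=2.2, range_z=-1.5, knobs=knobs)
not_contained = fires(r_z=2.5, density_z=2.2, range_z=-0.5, knobs=knobs)
```

The two calls differ only in the range reading: the same directional, clustered flow fires when price stays unusually tight and does not fire when the range is closer to normal.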
TickDistill offers two operating modes for every signal package, corresponding to two use cases.
| Mode | Parameters | Computation | Tier |
|---|---|---|---|
| Standard | Default knob settings | Precomputed, stored, served instantly | Free |
| Custom | User-specified knobs | Computed on-demand against the primitive store | Paid |
Standard signals are precomputed at the default parameter set and stored in columnar format (Parquet). Serving them is effectively free — a JSON response of a few hundred bytes — and no recomputation occurs per request.
Custom signals apply user-specified knob values to the same primitive store. The computation runs on-demand (query against the stored primitives, apply new thresholds, return the result). The parametric architecture means this is a lightweight re-aggregation, not a full data pipeline re-run. Popular custom configurations are cached.
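The difference between the two serving paths can be sketched with a precomputed value and a cached on-demand function. The stored z-scores, the default knob, and the use of `functools.lru_cache` as the cache are illustrative assumptions; the article does not say how caching is actually implemented.

```python
from functools import lru_cache

PRIMITIVE_Z = (2.1, 3.4, 5.0, 2.8)  # hypothetical stored z-scores
DEFAULT_MIN_Z = 3.0                 # hypothetical default knob

def compute(min_z):
    """Re-aggregate the stored primitives under one knob value."""
    return sum(1 for z in PRIMITIVE_Z if z >= min_z)

STANDARD = compute(DEFAULT_MIN_Z)   # computed once, then only read

@lru_cache(maxsize=128)
def custom(min_z):
    """On-demand path; repeated knob values are answered from the cache."""
    return compute(min_z)

def serve(min_z=None):
    """Standard mode when no knob is given, custom mode otherwise."""
    return STANDARD if min_z is None else custom(min_z)

first = serve(2.5)   # computed against the primitives
second = serve(2.5)  # same configuration, served from cache
```

After the two custom requests, `custom.cache_info()` reports one miss and one hit, which mirrors the claim that popular configurations stop costing compute.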
Pro / primitive access exports the primitive dataset itself alongside an SDK function compute_signal(primitives, signal, **knobs). The quant runs the signal function locally with arbitrary knob sweeps. This is the highest tier: the compute burden moves to the client, and the moat shifts from the formula (which is provided) to the primitives themselves — which are not reconstructible without the full ingestion pipeline, the calibrated floor, the causal baselines, and the exclusion mask.
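A local knob sweep could look like the sketch below. The SDK's real `compute_signal` is not reproduced in this article, so the function here is a stand-in that only borrows the quoted signature; the signal name `big_order_count`, the `min_z` and `side` knobs, and the primitive layout are all hypothetical.

```python
from itertools import product

def compute_signal(primitives, signal, **knobs):
    """Stand-in with the quoted signature; the body is illustrative only."""
    if signal != "big_order_count":
        raise ValueError(f"unknown signal: {signal}")
    min_z = knobs.get("min_z", 2.0)
    side = knobs.get("side", 0)  # 0 = both sides, +1 = buys, -1 = sells
    return sum(
        1 for p in primitives
        if p["z"] >= min_z and side in (0, p["side"])
    )

primitives = [
    {"z": 2.1, "side": 1},
    {"z": 3.4, "side": -1},
    {"z": 5.0, "side": 1},
]

# Sweep a small grid locally; no API call is made per configuration.
grid = {
    (mz, s): compute_signal(primitives, "big_order_count", min_z=mz, side=s)
    for mz, s in product((2.0, 3.0), (0, 1))
}
```

The sweep runs entirely on the exported primitives, so widening the grid costs local compute only.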
The function f(primitive, params) is the interface contract that makes the entire system composable.
The primitives are sigma-normalized, point-in-time correct, and exclusion-masked before they reach the signal function. A signal author writes logic that operates on clean, normalized inputs — not on raw ticks. Adding a new signal on existing primitives requires only a new function; adding a new market requires only a new MarketProfile (per-market calibration parameters) and a new data adapter. The signal function itself does not change.
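One way to picture the separation between signal logic and market calibration is the sketch below. The `MarketProfile` fields shown here (`symbol`, `floor_z`, `window_seconds`), the symbols, and the toy rate signal are hypothetical; the article does not list what a real profile carries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MarketProfile:
    """Hypothetical per-market calibration values."""
    symbol: str
    floor_z: float
    window_seconds: float

def big_order_rate(primitive_z, profile, min_z):
    """Qualifying events per second; the logic never branches on the market."""
    threshold = max(min_z, profile.floor_z)  # a knob cannot reach below the floor
    hits = sum(1 for z in primitive_z if z >= threshold)
    return hits / profile.window_seconds

btc = MarketProfile("BTC-PERP", floor_z=2.0, window_seconds=60.0)
eth = MarketProfile("ETH-PERP", floor_z=2.5, window_seconds=30.0)

rate_btc = big_order_rate([2.1, 3.0, 4.0], btc, min_z=2.0)  # 3 hits / 60 s
rate_eth = big_order_rate([2.6, 2.2, 3.1], eth, min_z=2.0)  # 2 hits / 30 s
```

Supporting the second market required a second profile object and nothing else; `big_order_rate` itself was not edited.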
The layering looks like this:
raw ticks → ingestion adapter → normalized primitives (above floor)
→ causal baseline → z-scored primitives (PrimitiveStore)
→ signal function f(primitives, params) → SignalPoint
→ serving layer (standard: read from store; custom: compute on-demand)
Every layer has a single responsibility. The signal function sees only z-scored primitives; the serving layer sees only the function and its parameters; the client sees only the output schema. No layer needs to know how the layer below it works.
Signal parameters — “knobs” — are thresholds and weights expressed in sigma units, not in raw values like contract counts or dollar volumes. For instance, a 2σ threshold asks for a condition two standard deviations above the local baseline — a generic illustration of what a sigma threshold means, not a recommended or default setting for any particular signal. That rarity interpretation stays stable as market volatility changes, because the baseline tracks the market’s own recent distribution.
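A short numeric sketch shows why a sigma threshold travels across regimes while a raw threshold does not. The two baselines below are made-up trade sizes, and the population standard deviation is an assumption of the example.

```python
from statistics import mean, pstdev

def raw_level(baseline, sigma_threshold):
    """Raw value sitting `sigma_threshold` standard deviations above the mean."""
    return mean(baseline) + sigma_threshold * pstdev(baseline)

calm = [9.0, 11.0, 9.0, 11.0]        # hypothetical sizes in a quiet regime
volatile = [40.0, 60.0, 40.0, 60.0]  # hypothetical sizes in a busy regime

calm_level = raw_level(calm, 2.0)          # mean 10, sigma 1
volatile_level = raw_level(volatile, 2.0)  # mean 50, sigma 10
```

The same 2σ setting corresponds to a size of 12 in the quiet regime and 70 in the busy one, so the knob keeps asking for the same rarity while the raw bar moves with the market.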
For an event-based signal like a conviction-zone detector, the main knobs govern:
Tighter thresholds produce fewer, higher-conviction signals. Looser thresholds produce more signals with more noise. The rarity meter in the product UI shows you the historical firing frequency at any knob position before you commit to a configuration. For the user-facing design of knobs and presets, see Tuning Order-Flow Signals: Knobs, Presets, and the Overfitting Trap.
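The quantity a rarity meter reports can be approximated as the share of historical windows that would have fired at a given knob position. The sketch below assumes a single z-scored reading per window and made-up history; how the product actually computes its meter is not described here.

```python
def firing_frequency(historical_z, threshold):
    """Share of historical windows whose z-score is at or above the threshold."""
    if not historical_z:
        raise ValueError("no history to evaluate")
    fired = sum(1 for z in historical_z if z >= threshold)
    return fired / len(historical_z)

history = [0.3, 1.1, 2.4, -0.7, 3.2, 1.9, 0.0, 2.0]  # hypothetical readings

meter = {k: firing_frequency(history, k) for k in (1.0, 2.0, 3.0)}
```

On this toy history the firing share falls from 62.5% at 1σ to 37.5% at 2σ and 12.5% at 3σ, which is the tighter-means-rarer trade-off described above.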
Why store primitives rather than the final signal?
Storing primitives lets you recompute the signal at any parameter configuration without reprocessing raw tick data. If only the final signal were stored, every new configuration would require a full historical re-run — which is expensive and, under the discard architecture, impossible once the raw ticks are gone.
Why is the floor a one-way door?
Because the raw ticks are discarded after ingestion. Once the backfill is complete, there is no underlying data to re-query. The floor must be set before the pipeline runs. Events below the floor are permanently excluded from the primitive store — they cannot be recovered after the fact.
Can the same signal function run on different markets?
Yes. The signal function is parameterized by a MarketProfile — a per-market object that carries calibration values specific to that instrument. The function logic itself is market-agnostic. Porting a signal to a new market means writing a new profile and a new data adapter; the signal function does not change.
What does “on-demand” mean for custom signals?
It means the signal is computed at request time by querying the primitive store and applying the user’s knob values, rather than reading a precomputed result. The primitive store is already sigma-normalized and indexed, so the re-aggregation is fast. The most frequently requested configurations are cached so that repeated queries are served from cache.
How does the SDK differ from the API?
The API returns computed signal values: a SignalPoint with a value, direction, and metadata. The SDK ships the signal function itself as Python code, alongside an exported primitive dataset. The quant runs compute_signal(primitives, signal, **knobs) locally and can sweep any parameter range without additional API calls.
For the architecture behind single-pass ingestion and discard, see Single-Pass Streaming ETL and Discard. For the knobs architecture from the engine perspective, see How the Knobs Work as an Architecture.
TickDistill sells clean, computed order-flow inputs — not trading advice or guaranteed alpha. Backtests are illustrative and not a promise of future results.