By TickDistill — order-flow microstructure signals. Educational content, not financial advice.
TickDistill persists and ships only non-reconstructible derived state — z-scores, flags, and bucketed levels — and discards the raw tick stream after computing them. A derived signal is an output engineered so the original size, price, and timestamp cannot be rebuilt from it. This is a data-engineering choice, not a legal opinion: it shrinks storage to a fraction of the raw feed and makes the derived dataset our own artefact rather than a copy of someone else’s tape.
A derived signal is a computed measurement whose output cannot be inverted back into the raw observations that produced it. A raw print is {exact_size, exact_price, exact_timestamp}. A derived output is a reading like “this is a 3σ-rare separated sweep, buy side, in a coarse price bucket” — informative about state, but missing the exact numbers needed to reconstruct the print.
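A minimal Python sketch of that mapping follows. `RawPrint`, `DerivedReading`, `derive`, the baseline `size_mean`/`size_std`, and the bucket width `grain` are all hypothetical. The grain in particular is an arbitrary value, not the calibrated one.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RawPrint:
    # The tape: exact size, exact price, exact timestamp.
    size: float
    price: float
    ts_ns: int
    side: str  # "buy" or "sell"


@dataclass(frozen=True)
class DerivedReading:
    # What gets shipped: an integer sigma band, a side, a coarse price bucket, event time.
    sigma_band: int
    side: str
    price_bucket: int
    ts_ns: int


def derive(p: RawPrint, size_mean: float, size_std: float, grain: float) -> DerivedReading:
    # size_mean / size_std: hypothetical baseline for "normal" size on this market,
    # kept internal and never emitted. grain: illustrative price-bucket width.
    z = (p.size - size_mean) / size_std
    return DerivedReading(
        sigma_band=math.floor(z),                  # 3.0 <= z < 4.0 reads as "3 sigma"
        side=p.side,
        price_bucket=math.floor(p.price / grain),  # a bucket index, not a price
        ts_ns=p.ts_ns,
    )


raw = RawPrint(size=25.0, price=64123.7, ts_ns=1_700_000_000_000_000_000, side="buy")
print(derive(raw, size_mean=5.0, size_std=6.0, grain=50.0))
```

A plain z-score would be invertible by anyone who knew the baseline. That is why this sketch floors it to an integer band and never emits `size_mean` or `size_std`.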
The boundary is precise in our architecture: "exact size + exact price + precise time = raw print" (DesignArchitecture §8). An output that keeps all three is the print itself; it becomes derived only when at least one of them is removed or coarsened. We deliberately mask the exact size (normalized to σ units) and omit or bucket the exact price, so two of the three never appear in what we ship.
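The boundary can be expressed as a pre-publish check. This is a sketch that assumes each output record declares which fields it carries. The field names and both functions are hypothetical.

```python
from __future__ import annotations

# Hypothetical field names for the three components of a raw print.
EXACT_SIZE = "exact_size"
EXACT_PRICE = "exact_price"
EXACT_TIMESTAMP = "exact_timestamp"


def is_raw_print(fields: set[str]) -> bool:
    # The boundary: exact size + exact price + precise time = raw print.
    return {EXACT_SIZE, EXACT_PRICE, EXACT_TIMESTAMP} <= fields


def check_shippable(fields: set[str]) -> None:
    # Stricter rule: exact size and exact price are both kept out of the output,
    # so a record fails if it carries either one, not only when it carries all three.
    leaked = fields & {EXACT_SIZE, EXACT_PRICE}
    if leaked:
        raise ValueError(f"non-derived fields in output: {sorted(leaked)}")


# A derived record keeps event time but neither exact size nor exact price.
check_shippable({"sigma_band", "side", "price_bucket", EXACT_TIMESTAMP})
```

The stricter rule catches partial leaks, such as an exact price with no size, that the three-part definition alone would let through.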
| Property | Raw market data | Derived signal (what we ship) |
|---|---|---|
| Exact size | Present | Masked → σ-normalized magnitude |
| Exact price | Present | Omitted or coarse-bucketed |
| Exact timestamp | Present | Event time, point-in-time |
| Reconstructible to the tape? | Yes (it is the tape) | No, by construction |
| Storage footprint | Terabytes of ticks | A fraction — sparse events |
| Whose artefact? | The exchange’s feed | Computed by our pipeline |
Persisting only derived output produces two compounding engineering advantages: a storage moat and a clean redistribution surface. Per DesignArchitecture §4, "we persist ONLY the derived output and discard the raw ticks"; the derived signal is bucketed and normalized so it is "not reverse-engineerable into the original tape."
We pair this with discard: after a single pass computes every signal for a day, the raw tick file is dropped (see single-pass ETL with discard). The source is free to re-fetch, so we keep the small derived output and throw away the bulky raw input.
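The single pass with discard can be sketched in Python. Everything here is hypothetical: the JSON-lines tick format, the `Signal` protocol, the toy `BuyHeavyFlag` signal, the `.done` checkpoint file, and `run_day` are illustrations, not TickDistill's pipeline. Order matters. The derived output is written atomically and the day is checkpointed before the raw file is deleted, so a crash mid-run leaves the raw input in place for a re-run.

```python
from __future__ import annotations

import json
import os
from pathlib import Path
from typing import Protocol


class Signal(Protocol):
    # Hypothetical interface: each signal keeps its own running state.
    name: str

    def update(self, tick: dict) -> None: ...

    def finish(self) -> list[dict]: ...


class BuyHeavyFlag:
    # Toy signal for illustration: emits only a boolean flag, never sizes or prices.
    name = "buy_heavy"

    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold
        self.buys = 0
        self.total = 0

    def update(self, tick: dict) -> None:
        self.total += 1
        if tick["side"] == "buy":
            self.buys += 1

    def finish(self) -> list[dict]:
        share = self.buys / self.total if self.total else 0.0
        return [{"flag": share >= self.threshold}]


def run_day(raw_path: Path, out_dir: Path, signals: list[Signal]) -> Path:
    # 1. Single pass: every tick is read once and fed to every signal.
    with raw_path.open() as f:
        for line in f:
            if line.strip():
                tick = json.loads(line)
                for s in signals:
                    s.update(tick)
    # 2. Write the small derived output via a temp file plus an atomic rename.
    out_path = out_dir / f"{raw_path.stem}.derived.jsonl"
    tmp_path = out_path.with_name(out_path.name + ".tmp")
    with tmp_path.open("w") as out:
        for s in signals:
            for record in s.finish():
                out.write(json.dumps({"signal": s.name, **record}) + "\n")
    os.replace(tmp_path, out_path)
    # 3. Checkpoint the day only after the derived output is in place.
    (out_dir / f"{raw_path.stem}.done").write_text("ok")
    # 4. Discard the raw ticks; the free source can be re-fetched for a new signal.
    raw_path.unlink()
    return out_path
```

Adding a signal later means adding one more object to `signals` and re-running the pass over re-fetched raw files.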
Non-reconstructibility is the property that keeps a derived field from being raw data in disguise. Some natural formulas leak the raw numbers back out, so the constraint forces a specific engineering discipline at the point of output.
Two concrete examples from our signal specs:
- `level = Σ priceᵢ·sizeᵢ / Σ sizeᵢ` is weighted on raw size. A VWAP weighted on exact sizes edges toward reconstructible data (density_separated_big_orders.md, output `level`). The defensive fix is to weight on the σ-normalized magnitude already in the primitive, or to bucket the price to a coarse grid first, so the emitted level is a quantized location, not an exact price.
- The exact VWAP and total volume are not exposed in meta, because the derived record must not let a consumer reconstruct the exact size or price of the execution (big_order_separated.md output constraint). They are emitted bucketed/quantized to a coarse grain instead.

The exact bucketing grain is a calibrated, proprietary choice and is not published here. It matters because the grain is the dial that trades off information ("how useful is the level?") against reconstructibility ("how close to the raw print?"). Set it too fine and the derived output starts to leak the tape; set it too coarse and the signal loses value. Choosing the smallest quantization that still satisfies non-reconstructibility is part of the recipe, not a public number.
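The leak and the fix can be shown side by side. `vwap_raw` and `level_bucketed` are hypothetical names. The grain of 50.0 is an arbitrary illustrative value, not the proprietary one. This sketch applies both defenses together (σ-weighting and bucketing).

```python
import math


def vwap_raw(prices: list[float], sizes: list[float]) -> float:
    # level = sum(price_i * size_i) / sum(size_i), weighted on exact sizes.
    return sum(p * s for p, s in zip(prices, sizes)) / sum(sizes)


def level_bucketed(prices: list[float], sigmas: list[float], grain: float) -> float:
    # Weight on sigma-normalized magnitudes instead of raw size, then snap the
    # location down to a coarse grid. For a lone print the weights cancel,
    # so it is the bucketing step that hides the exact price.
    loc = sum(p * m for p, m in zip(prices, sigmas)) / sum(sigmas)
    return math.floor(loc / grain) * grain


# A single print: the raw formula hands back its exact price, and sum(sizes)
# (a raw total volume) hands back its exact size.
prices, sizes, sigmas = [64123.7], [12.5], [3.3]
print(vwap_raw(prices, sizes), sum(sizes))
print(level_bucketed(prices, sigmas, grain=50.0))
```

On one print, the raw VWAP and total volume together return the execution's price and size. The bucketed level returns only the grid line below it.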
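We engineer non-reconstructibility at three layers: magnitude (σ-normalized, never the exact size), price (omitted or bucketed), and persistence (raw ticks discarded after the single pass). The property therefore holds by construction rather than by promise.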
This is the same discipline DesignArchitecture §25.2 states for every data layer, free or L2: ship "only non-reconstructible derived state (z-scores, flags, bucketed levels)," never the raw feed. The signature is engineered to be necessary to read state but insufficient to rebuild the tape. This is the same "necessary but not sufficient" framing we apply to the aggregated-vs-separated classification.
Derived state is durable because a measurement stays current while a strategy decays. Microstructure edges erode quickly as they get crowded, so an output sold as “alpha” goes stale and the customer leaves (DesignArchitecture §25.3).
A σ-normalized derived signal is always a current reading of state — a VPIN is always now — so the decay falls on the customer’s strategy, not on our measurement. The same z-score logic runs across BTC, ETH, SOL, and later regulated futures by swapping a per-market profile, not by rewriting the signal. Persisting non-reconstructible derived output is therefore both the storage-efficient choice and the durable-product choice.
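Swapping a per-market profile instead of rewriting the signal might look like this. `MarketProfile`, `size_z`, and every number in `PROFILES` are hypothetical. They were chosen only so that the arithmetic is easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarketProfile:
    # Hypothetical per-market baseline; the numbers below are illustrative only.
    symbol: str
    size_mean: float
    size_std: float


def size_z(size: float, profile: MarketProfile) -> float:
    # One formula for every market; only the profile changes.
    return (size - profile.size_mean) / profile.size_std


PROFILES = {
    "BTC": MarketProfile("BTC", size_mean=0.5, size_std=1.5),
    "ETH": MarketProfile("ETH", size_mean=6.0, size_std=20.0),
    "SOL": MarketProfile("SOL", size_mean=40.0, size_std=120.0),
}

# Very different raw sizes, the same 3-sigma reading.
for symbol, size in [("BTC", 5.0), ("ETH", 66.0), ("SOL", 400.0)]:
    print(symbol, size_z(size, PROFILES[symbol]))
```

Onboarding a new market, such as a regulated future, would add a profile entry and leave `size_z` untouched.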
What is a derived signal versus raw market data? A derived signal is a computed measurement — a z-score, a flag, a bucketed level — engineered so the exact size, price, and timestamp cannot be rebuilt from it. Raw market data is the tape itself: exact size, exact price, exact timestamp. We persist and ship only the former.
Why discard the raw ticks after computing signals? Because the source dumps are free to re-fetch, so keeping terabytes of raw ticks buys nothing. We run a single pass, emit the small non-reconstructible derived output, checkpoint the day, and discard the raw ticks (DesignArchitecture §17.2). Adding a new signal later just means re-running the pass.
Why not expose the exact VWAP or total volume in the output? Because a VWAP weighted on raw size, or a raw total volume, can let a consumer reconstruct the exact execution — which would turn a derived field back into raw data. We emit those fields bucketed or σ-weighted to a coarse grain instead.
What is the bucketing grain? Proprietary and not published. The grain trades information against reconstructibility; we calibrate it to the smallest quantization that keeps the output non-reconstructible. Publishing it would expose part of the recipe.
Is this a legal or compliance claim? No. This explainer is a data-engineering and product-architecture description of how we compute and store signals. Terms-of-service and licensing questions are handled separately by the founder and counsel and are out of scope here.
TickDistill sells clean, computed order-flow inputs — not trading advice or guaranteed alpha. This is an engineering description, not legal advice. Backtests are illustrative and not a promise of future results.