Why you can trust pomata
Most libraries ask you to trust their output. pomata proves it. The premise is simple and a little obsessive:
every function is written twice. The shipped version is tuned to vectorize in Polars; a second, deliberately
naive oracle re-derives the same published formula and shares no code with it. The two must agree — on a fixed
series, on frozen golden-master numbers, and on thousands of randomly fuzzed inputs — or the build is red.
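As a rough illustration of the "written twice" idea, the sketch below pairs a recursive exponential moving average with a naive oracle that re-derives each value from the closed-form weighted sum, then fuzzes both. It is plain Python, not pomata's code: `ema_fast`, `ema_oracle`, and `agree` are hypothetical names, and the seeding rule (first output equals first input) is an assumption made for the example.

```python
import math
import random


def ema_fast(values, alpha):
    """Recursive form: one pass, each output reuses the previous one."""
    out, prev = [], None
    for x in values:
        prev = x if prev is None else alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out


def ema_oracle(values, alpha):
    """Naive form: rebuild every output from the closed-form weighted sum.

    y_t = (1 - a)^t * x_0 + sum_{k=1..t} a * (1 - a)^(t - k) * x_k
    Shares no code with ema_fast.
    """
    out = []
    for t in range(len(values)):
        total = (1.0 - alpha) ** t * values[0]
        for k in range(1, t + 1):
            total += alpha * (1.0 - alpha) ** (t - k) * values[k]
        out.append(total)
    return out


def agree(fast, oracle, rel_tol=1e-10):
    """True when both outputs match element-wise to the relative tolerance."""
    return len(fast) == len(oracle) and all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=0.0)
        for a, b in zip(fast, oracle)
    )


# Fuzz: random lengths, values, and smoothing factors (seeded for repeatability).
rng = random.Random(0)
for _ in range(200):
    series = [rng.uniform(1.0, 1000.0) for _ in range(rng.randint(1, 60))]
    alpha = rng.uniform(0.05, 0.95)
    assert agree(ema_fast(series, alpha), ema_oracle(series, alpha))
```

The oracle is quadratic in the series length, which is acceptable for a test-only reference: its job is to be obviously right, not fast.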
What every function survives
The same four-tier ladder runs over all three families, under 100% branch coverage:
| Tier | What it proves |
|---|---|
| Contract | The output’s shape, dtype ( |
| Edge | The exact warm-up length, and that a |
| Correctness | Bit-level agreement with the independent oracle on a fixed, realistic series, plus frozen, hand-checked golden-master values. |
| Properties | The same oracle agreement and the mathematical invariants (bounds, scale behavior, monotonicity) over the full |
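To make the four tiers concrete, here is a minimal sketch of how they might look for a toy simple moving average. Everything in it is hypothetical: `sma`, `sma_oracle`, and the `check_*` helpers are illustration names, the warm-up is represented with `None`, and the actual pomata tests may be organized quite differently.

```python
import math
import random


def sma(values, window):
    """Toy stand-in for a shipped function: streaming running-sum mean."""
    out, running = [], 0.0
    for i, x in enumerate(values):
        running += x
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def sma_oracle(values, window):
    """Independent oracle: recompute every window from scratch."""
    out = []
    for i in range(len(values)):
        if i < window - 1:
            out.append(None)
        else:
            out.append(math.fsum(values[i - window + 1 : i + 1]) / window)
    return out


def check_contract(values, window):
    out = sma(values, window)
    assert len(out) == len(values)
    assert all(v is None or isinstance(v, float) for v in out)


def check_edge(values, window):
    out = sma(values, window)
    warm_up = min(window - 1, len(values))
    assert all(v is None for v in out[:warm_up])
    assert all(v is not None for v in out[warm_up:])


def check_correctness(values, window, rel_tol=1e-10):
    for got, want in zip(sma(values, window), sma_oracle(values, window)):
        if got is None or want is None:
            assert got is want
        else:
            assert math.isclose(got, want, rel_tol=rel_tol)


def check_properties(values, window, rel_tol=1e-10):
    # Invariant: a mean never leaves the [min, max] range of its window.
    for i, v in enumerate(sma(values, window)):
        if v is None:
            continue
        win = values[i - window + 1 : i + 1]
        slack = rel_tol * max(win)
        assert min(win) - slack <= v <= max(win) + slack


fixed = [1.0, 2.0, 3.0, 4.0, 5.0]
check_contract(fixed, 3)
check_edge(fixed, 3)
check_correctness(fixed, 3)
assert sma(fixed, 3) == [None, None, 2.0, 3.0, 4.0]  # hand-checked golden master

rng = random.Random(42)
for _ in range(300):
    series = [rng.uniform(1.0, 1000.0) for _ in range(rng.randint(0, 80))]
    window = rng.randint(1, 20)
    check_edge(series, window)
    check_correctness(series, window)
    check_properties(series, window)
```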
The precision pomata guarantees
The headline is ten significant figures: every indicator reproduces its oracle to a relative 1e-10 on any
finite input within a sane dynamic range. That number is chosen, not guessed — and enforced:
- It dwarfs the data. Market feeds carry four to eight significant figures. Ten figures means pomata never adds error you could observe: it is lossless against its own input, with orders of magnitude of headroom to spare.
- It sits far above the noise floor. A float64 holds ~16 figures; honest rounding between the streaming implementation and the two-pass oracle lands around 1e-15. 1e-10 is five orders of magnitude above that: tight enough to reject any real coding error, loose enough that a last-bit difference never flakes the suite.
- It is verified, not asserted. The property tier holds every indicator to 1e-10 across the full random domain. In practice the agreement is far tighter: about half the outputs match the oracle to the last bit (a relative difference of exactly zero), and the rest land at the noise floor, typically thirteen to fifteen figures, never fewer than the promised ten.
What pomata does not claim
Not the absence of all bugs, and not correctness outside the documented input domain. One known limit: a rolling-sum statistic loses precision only when an entire window collapses onto the float-precision floor of a much larger value that recently passed through it — it takes a deliberately adversarial series (a price dropping seven orders of magnitude bar-to-bar) to build that pattern. It is documented on the affected indicators, and the oscillators whose bound is unconditional — Chande Momentum Oscillator, Money Flow Index, Chaikin Money Flow — are clamped to that bound rather than papered over.
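The mechanism behind that limit can be sketched in a few lines of plain Python. The example is deliberately exaggerated (a sixteen-order drop rather than seven) so that the loss is total instead of partial, and the function names are hypothetical rather than anything pomata ships. A streaming sum adds each new value and subtracts the one leaving the window; while the huge value is inside the running total, the small values fall below its float spacing and are absorbed, and they stay lost after the huge value is subtracted back out.

```python
def rolling_sum_streaming(values, window):
    """Running total: add the incoming value, subtract the outgoing one."""
    out, running = [], 0.0
    for i, x in enumerate(values):
        running += x
        if i >= window:
            running -= values[i - window]
        if i >= window - 1:
            out.append(running)
    return out


def rolling_sum_two_pass(values, window):
    """Recompute each window from scratch; no state carried between windows."""
    return [
        sum(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


# One enormous bar followed by small ones. At 1e16 adjacent float64 values
# are 2.0 apart, so adding 1.0 to the running total changes nothing.
series = [1e16, 1.0, 1.0, 1.0, 1.0]

streaming = rolling_sum_streaming(series, 3)
two_pass = rolling_sum_two_pass(series, 3)

# Windows that no longer contain the huge value:
print(streaming[1:])  # the absorbed 1.0s never come back
print(two_pass[1:])   # recomputed from the window alone
```

Only the windows after the large value has left are compared here, since those are the ones where the streaming total carries stale rounding error.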
The receipt: indicators vs the public reference
For indicators there is also a public yardstick — TA-Lib, the de-facto industry C implementation, a genuinely
different computation path. Here is rsi(14), the last value of a deterministic 400-bar series, to every digit a
float64 holds:
pomata 85.20908701341023
reference 85.20908701341023 ← independent reimplementation: identical, to the last bit
TA-Lib 85.20908701341024 ← fifteen figures identical; differs only at the float64 floor
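The size of that last-digit gap can be checked directly from the two printed values. The snippet below is only a sanity check on the literals shown above, using the standard library (`math.ulp` needs Python 3.9 or later); the variable names are made up for the illustration.

```python
import math

pomata_rsi = 85.20908701341023  # value printed above
talib_rsi = 85.20908701341024   # value printed above

abs_diff = abs(pomata_rsi - talib_rsi)
rel_diff = abs_diff / abs(talib_rsi)

# Spacing between adjacent float64 values at this magnitude.
ulp = math.ulp(talib_rsi)

print("difference in ULPs:", abs_diff / ulp)
print("relative difference:", rel_diff)

# The gap is at most one unit in the last place, far inside the 1e-10 guarantee.
assert abs_diff <= ulp
assert rel_diff <= 1e-10
```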
The same five indicators on the same series — most reproduce the reference exactly, the rest sit at the noise floor:
| indicator | pomata | vs reimpl. | vs TA-Lib |
|---|---|---|---|
| | | exact | |
| | | | |
| | | exact | |
| | | | |
| | | | |
The differential tier (non-gating) compares most indicators against TA-Lib bar-for-bar, from the first defined value. A documented minority is compared only on the converged tail — always with a reason its warm-up differs from TA-Lib’s (Wilder’s first True Range, the independent MACD/Chaikin EMAs, the Ehlers Hilbert pipeline’s long warm-up, the Parabolic SAR cold start) — never a steady-state disagreement.
PnL and metrics: proven at the edges, not in the digits
The math behind PnL and metrics is simple; their correctness lives where the bad things happen, not in the fifteenth digit. The same machinery applies (independent oracle, four-tier ladder, golden masters), but the characteristic invariants change:
- PnL: cost monotonicity (more cost, less PnL); the additive-vs-compounded split (cumulative_pnl sums currency P&L, equity_curve compounds returns); no look-ahead (every bar uses only past data); and a defined, tested behavior for every degenerate input: null, NaN, 0, ±inf, warm-up.
- Metrics: the annualization identities; scale-equivariance (a Sharpe Ratio is invariant to leverage); closed-form checks (the Sharpe Ratio of constant returns, the drawdown of a monotone series); and a defined behavior for every degenerate input: a null is skipped, a non-null NaN poisons the result, a degenerate denominator is reported as ±inf/NaN, never silently clipped.
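A few of these invariants are easy to state as executable checks. The sketch below uses toy, standard-library-only helpers; the `toy_*` functions are hypothetical stand-ins and not pomata's `cumulative_pnl` or `equity_curve`, whose signatures are not shown here. The un-annualized Sharpe definition (mean over population standard deviation) and the flat per-unit cost model are assumptions chosen to keep the example short.

```python
import math
import statistics


def toy_cumulative_pnl(pnl_per_bar):
    """Additive: running sum of currency P&L."""
    total, out = 0.0, []
    for p in pnl_per_bar:
        total += p
        out.append(total)
    return out


def toy_equity_curve(returns, start=1.0):
    """Compounded: running product of (1 + return)."""
    equity, out = start, []
    for r in returns:
        equity *= 1.0 + r
        out.append(equity)
    return out


def toy_net_pnl(gross_per_bar, turnover_per_bar, cost_rate):
    """Gross P&L minus a cost proportional to absolute turnover."""
    return sum(g - cost_rate * abs(t) for g, t in zip(gross_per_bar, turnover_per_bar))


def toy_sharpe(returns):
    """Mean over population std; a zero denominator is reported, not clipped."""
    mean = statistics.mean(returns)
    std = statistics.pstdev(returns)
    if std == 0.0:
        return math.nan if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / std


returns = [0.01, -0.02, 0.015, 0.005]
gross = [10.0, -5.0, 2.5]
turnover = [1.0, -2.0, 0.5]

# Additive vs compounded: sums of P&L and products of returns differ.
assert toy_cumulative_pnl(gross) == [10.0, 5.0, 7.5]
assert toy_equity_curve([0.5, -0.5]) == [1.5, 0.75]

# Cost monotonicity: more cost never means more PnL.
assert toy_net_pnl(gross, turnover, 0.002) <= toy_net_pnl(gross, turnover, 0.001)

# No look-ahead: appending future bars leaves earlier outputs untouched.
assert toy_equity_curve(returns)[:2] == toy_equity_curve(returns[:2])

# Sharpe is invariant to leverage (scaling every return by a positive factor).
levered = [3.0 * r for r in returns]
assert math.isclose(toy_sharpe(returns), toy_sharpe(levered), rel_tol=1e-10)

# Closed form: constant positive returns have zero volatility, reported as +inf.
assert toy_sharpe([0.01] * 5) == math.inf
```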
The tolerance ladder
Every band is named and lives in one file (tests/support/tolerances.py), sized to the worst residual the statistic’s
conditioning predicts — not a round number:
| Band | Value | Where it applies |
|---|---|---|
| exact | | recursive / stateless kernels that match the oracle to a few ULP |
| reference / property | | oracle agreement on fixed and fuzzed inputs (the enforced guarantee) |
| scale | | degree-1 / degree-2 homogeneity under rescaling |
| streaming | | a streaming statistic evaluated at an extreme magnitude |
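For readers unfamiliar with the homogeneity checks named in the scale band: a statistic is homogeneous of degree d when multiplying every input by k multiplies the output by k to the power d. A mean is degree 1 and a variance is degree 2. The sketch below shows the shape of such a check using the standard library; `check_homogeneity` is a hypothetical helper, and `ILLUSTRATIVE_SCALE_TOL` is a placeholder chosen for the example, not the value from `tests/support/tolerances.py`.

```python
import math
import random
import statistics

# Placeholder tolerance for this illustration only (an assumption).
ILLUSTRATIVE_SCALE_TOL = 1e-9


def check_homogeneity(stat, values, k, degree, rel_tol):
    """True when stat(k * x) matches k**degree * stat(x) within rel_tol."""
    scaled = stat([k * v for v in values])
    expected = (k ** degree) * stat(values)
    return math.isclose(scaled, expected, rel_tol=rel_tol)


rng = random.Random(1)
values = [rng.uniform(1.0, 100.0) for _ in range(50)]

# Degree 1: rescaling prices by 1000 rescales the mean by 1000.
assert check_homogeneity(statistics.fmean, values, 1000.0, 1, ILLUSTRATIVE_SCALE_TOL)

# Degree 2: the same rescaling multiplies the variance by 1000**2.
assert check_homogeneity(statistics.pvariance, values, 1000.0, 2, ILLUSTRATIVE_SCALE_TOL)
```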
Re-run it yourself
None of this is something you have to take on faith. The full gate runs from a clean clone in one command; the
published figures regenerate from scripts/precision_table.py, and the realized headroom under the guarantee (it lands
around 1e-14) from scripts/calibrate_tolerances.py. The complete method (the derivations, the test sizing, exactly
what is and is not claimed) is in CORRECTNESS.md.