This page explains the concepts we use and cites the primary literature. Exact hyperparameters, shrinkage constants, bin thresholds, process-noise settings and value-function internals are proprietary and omitted. Live diagnostic panels show current aggregate outputs as evidence that each layer actually runs end-to-end.
Framler is organised as a pipeline where one online posterior over the current market regime feeds several downstream decisions simultaneously — factor weighting, covariance shaping, prediction-interval width, and confidence scaling. The same posterior powers all of them, so the outputs are coherent rather than independently averaged.
Thirteen academic factor families are composed into a per-ticker 0–100 score. A confluence engine then applies literature-backed multi-factor pattern overrides (e.g. Sloan + gross-margin crack; Asquith short-squeeze setup). Every override carries a published effect size and is conditionally fired only when the participating factors show genuine joint tail-dependence — not merely high average correlation.
The engine is Bayesian end-to-end: priors come from the literature, posteriors update from live residuals (once forward returns accumulate), and uncertainty is reported alongside every point estimate as a conformal interval.
Each stock card carries five independent badges. They sit on different axes — none of them is a re-skin of another — and beginners often read them as duplicates of the Framler score number. This section maps every label to what it actually encodes.
The single number computed by the 13-factor composite. Every other label below is supplementary; if you only look at one thing, look at this.
Insider-trading magnitude over the last 90 days, classified per Lakonishok-Lee 2001 (larger trades carry more information). Independent of Framler — a stock can have Framler 80 and still be tier 3 (no insiders bought), or Framler 50 and tier 1 (big insider buy on a mediocre composite).
The right column of each card combines Framler and insider size into one phrase. Format: PRIMARY · MODIFIER — primary carries the verdict colour, modifier drops to brand cyan as a secondary-evidence cue.
Bearish side mirrors these — STRONG BEARISH, INSIDERS EXITING · MIXED, BIG SELLING · WATCH, MODERATE BEARISH, FACTORS REJECT, FACTORS BEARISH, QUANT TILT BEAR — when the verdict is SELL.
A rare configuration. Fires when all three are true on the same ticker on the same day: insider net-buy ≥ $1M, at least three positive news catalysts (with positive > negative), and conviction ≥ 60%. The TRIPLE badge appears in the card header and pushes that card to the top of the feed.
A separate Bayesian posterior over five supplementary signals (insider flow, news sentiment, fund accumulation, analyst consensus, 52-week price position) combined as log-likelihood ratios. Not the Framler score — this measures how strongly the supplementary stack agrees with the engine, not the engine's own conclusion.
The small line under the signal label. Counts how many of six fixed supplementary streams have content for this ticker: 13F-fund accumulation, insider buy/sell ≥$100K, ≥2 positive news catalysts, ≥5 covering analysts, 52-week price in the top/bottom 20%, and Framler ≥75. Not the 13 academic factors — those are the engine internals; these are the pills shown on the card.
When a specific combination of the 13 factors matches a documented academic setup, the matching pattern fires and overrides part of the composite. 19 patterns total — VALUE TRAP, SHORT SQUEEZE, CONFIRMED BEAT, EARNINGS VALIDATED, PHARMA CATALYST NEAR, ADCOM NEAR, and others. Each is anchored to a published paper (Bernard-Thomas, Asquith-Pathak-Ritter, Sloan, etc.).
The number after the pattern name (e.g. +22) is the score shift the pattern applied. The tail 0.08 figure is the copula tail-alignment for the pattern's factor set — higher means the factors in the setup co-crash reliably in history, low means the joint behaviour is closer to independent. Patterns only fire when their factor set shows genuine joint tail behaviour, not merely average correlation; the specific cutoffs for amplifying vs damping a pattern are calibrated and proprietary — see §2 Tail-dependence layer.
14-day sparkline — recent Framler trajectory. Green up, red down, grey flat. Shows FRESH when there are fewer than two days of history.
Market regime badge (top of dashboard) — the BOCPD regime label: Expansion / Normal / Slowdown / Contraction (or risk-on / transition / risk-off in the older naming). Factor weights shift by regime — momentum gets more weight in expansions, quality more in contractions. See §4 Online regime detection.
Sector tag — Healthcare and Biotechnology unlock a pharma-tilted weight vector and special pharma signals (Phase 3 failure probability, AdCom-event proximity, trial enrollment). P/E is ignored in biotech mode; the engine reads cash runway, gross margins, and pipeline momentum instead.
Each factor returns a 0–100 score per ticker per day; higher is more bullish. Factor scorers are self-normalising against the live universe so a 70 means "top-quintile today", not against a stale literature baseline.
Thirteen factors is a headline number; the effective breadth after removing redundancy is lower. We report that explicitly — the composite optimiser penalises correlated signals rather than double-counting them.
The thirteen factors above are per-ticker: they describe what a single name looks like today. Two universe-wide context layers sit alongside them.
Linear correlation describes average co-movement; in crises what matters is what factors do together specifically in the tails. A Gaussian copula with ρ = 0.5 has zero tail-dependence; a Student-t copula at the same ρ has meaningful joint-crash probability. We estimate lower- and upper-tail dependence non-parametrically for every factor pair, per regime, and shrink toward an independence prior so thin-sample pairs do not dominate.
The Confluence Engine uses this directly: a pattern's confidence is amplified when its participating factors also show genuine tail-alignment in history, and damped when the alignment is weak. This prevents "textbook pattern, no real co-movement" overrides.
Method: Schmidt & Stadtmüller 2006; Frahm, Junker & Schmidt 2005; Embrechts, McNeil & Straumann 2002.
Standard mean-variance (Markowitz) optimisation solves for weights that minimise factor-score variance for a given information-coefficient vector. Using Spearman ρ as the covariance input under-estimates correlated crash risk — exactly the risk that matters for sizing. Framler blends the correlation matrix with the tail-dependence matrix, with the blend weight determined by the current regime posterior: crash-dominated in risk-off, squeeze-dominated in risk-on, unbiased in calm regimes.
A Tikhonov ridge auto-tunes to guarantee positive-definiteness even when the empirical tail-dependence matrix is near-singular on thin samples. The same regime posterior that decides the blend also supplies the factor weights themselves — one posterior, two coherent uses.
Method: Grinold & Kahn 1999; Embrechts, McNeil & Straumann 2002; Higham 2002 on nearest-correlation projection.
Hidden Markov Models give discrete regime labels that flip abruptly. Framler uses a Bayesian online changepoint detector that maintains a full posterior over how long the current regime has been running. From the run-length posterior we derive a probability vector across three market regimes — risk-on, transition, risk-off — so the engine is honest about uncertainty near turning points rather than forcing a binary decision.
Fresh regime (low expected run-length) ⇒ the engine admits ignorance and leans on transition priors. Mature regime ⇒ directional bias takes over. The same posterior powers factor-weight blending, covariance blending, interval width, and dynamic exposure updates (sections 3, 5, 6). One posterior, four coherent decisions.
Method: Adams & MacKay 2007; Hamilton 1989 for baseline hazard rate.
Marginal conformal intervals guarantee coverage on AVERAGE across the universe — over-covering easy tickers and under-covering hard ones. Framler uses a bin-conditional (Mondrian) variant that partitions tickers by tail-alignment strength and regime, then calibrates one width per partition. The result: the stated 90% interval is close to 90% inside each bin, not only on average.
Until enough forward-return observations accumulate per partition, Framler falls back to a marginal interval inflated by average tail-dependence. This is the single place where the engine publicly admits the difference between "calibrated" and "prior" — every ticker page shows it.
Method: Vovk, Lindsay, Mammen & Vovk 2003 (Mondrian Confidence Machine); Angelopoulos & Bates 2021 for introduction.
Static factor weights lag regime shifts. Framler tracks factor exposures as a Kalman dynamic linear model where the per-state random walk is accelerated on days the regime detector signals a change-point. Process noise is driven directly by the change-point posterior, so exposures pivot fastest on precisely the days they need to — again the same posterior powering yet another decision.
The DLM output is blended with the regime-aware Markowitz solution. Early in calibration the blend leans on Markowitz priors; as the DLM accumulates observations it takes over. Both are reported; the blend is transparent.
Method: Kalman 1960; West & Harrison 1986, 1997 on Bayesian forecasting and dynamic models.
When the Confluence Engine overrides the linear composite (e.g. a value-trap pattern pulls a superficially-cheap ticker down), which factor deserves the credit? The naïve linear answer is misleading — overrides fire only under joint conditions, so part of each factor's contribution comes from its role in enabling the pattern, not from its standalone score.
Framler uses a Shapley decomposition of the score attributable to each factor. Every contribution splits cleanly into a linear share (the coefficient-style piece) and an interaction share (the pattern-enabling non-linearity). The four uniqueness axioms guarantee a consistent story across tickers. Per-ticker attribution is available in the authenticated research API.
Method: Shapley 1953; Štrumbelj & Kononenko 2014 for ML interpretability.
A daily scan of the universe across the factor stack runs several thousand simultaneous hypothesis tests. Under a naïve per-test significance threshold a meaningful fraction of those tests would be flagged as "significant" under pure noise — hundreds of spurious high-conviction signals per day if uncorrected. Framler runs every daily emission through a Benjamini-Hochberg false-discovery-rate correction, controlling the expected proportion of false discoveries among rejections rather than naïvely rejecting everything that crosses a per-test threshold. The exact significance level and target FDR rate are calibrated and proprietary.
The null-distribution scale uses a robust estimator (median absolute deviation with a Gaussian-consistency multiplier) rather than the empirical standard deviation. The robust scale is unaffected by the strong signals we're trying to detect — a plain σ would inflate when real signals are present, hiding the very signals it's meant to find. The live FDR-adjusted rejection set is published at /api/diag/fdr alongside the robust scale and the BH cutoff p-value, so the correction is observable, not just claimed.
Method: Benjamini & Hochberg 1995, Controlling the False Discovery Rate, JRSS-B 57(1); Bonferroni 1936 (FWER fallback); Huber 1981 on robust scale; Storey 2003 review for finance.
Equal-weighted factor blends ignore that factors have radically different time-series volatility. A jumpy factor that swings on every earnings announcement contributes more variance to the composite than a sticky balance-sheet factor that drifts slowly, even at identical nominal weights. The user reads "20% PEAD, 20% Quality" as equal influence; the math says otherwise.
Framler inverse-volatility-scales each factor's effective contribution before blending. Per-factor σ is estimated as the average per-ticker temporal standard deviation over a rolling window, refreshed once per cron run. The window length and minimum sample threshold are calibrated and proprietary. The result: each factor contributes proportional varianceto the composite, not just proportional weight. The estimator is silent on insufficient history — until signal history accumulates a stable per-ticker σ, the blender falls back to the plain weighted average automatically.
Method: Maillard, Roncalli & Teïletche 2010, The Properties of Equally-Weighted Risk-Contribution Portfolios, JPM 36(4).
The Loughran-McDonald 2011/2018 financial sentiment dictionary is the academic standard but was built on 10-K corpora through ~2018. Modern earnings-call and MD&A vernacular — "AI capex", "design win", "valuation reset", "guidance cut", "hyperscaler buildout", drug-class names like GLP-1 — does not tokenise to any LM category. Tickers heavy in this language scored zero NLP signal even when the sentiment was unambiguous.
the engine runs a curated multi-word phrase scanner over the raw text before LM tokenisation. Phrase matches increment the same category counters (positive / negative / uncertainty / litigious) as single-word LM hits, so the downstream percentage formulas and z-score normalisation are unchanged. The phrase list is curated against 10-K MD&A and earnings-call transcripts circa 2024-2026; speculative meme vocabulary is intentionally excluded — it belongs in a separate retail-sentiment lexicon.
Anchor: Loughran & McDonald 2011, JoF 66(1); Tetlock 2007 for tone-vs-return causality. The phrase set is curated, not derived from a Word2Vec / BERT cluster — those approaches require a labelled 2024+ corpus we do not yet license.
Confidence numbers exposed to users have to be probabilities, not heuristic point sums. Linearly clipping a heuristic additive total at [10, 97] saturates without information — two tickers with raw totals of 100 and 200 both render "97" despite the second carrying twice the conviction. Framler passes the heuristic through a logistic (sigmoid) transform centred at the bullish-action threshold. The output saturates asymptotically at 0% / 100% rather than clipping, monotone in the heuristic sum, so cross-ticker rankings are preserved while the spacing becomes calibrated.
This is heuristic-Platt: the sigmoid scale and centre are set by hand to match the legacy verdict-tier semantics — exact values are calibrated and proprietary. True Platt scaling — fitting the sigmoid parameters by maximum likelihood on realised forward-return outcomes — kicks in horizon-by-horizon as samples mature. 1d and 7d horizons calibrated on 17 May 2026; the 30d horizon clears its ≥30-sample gate around 26 June 2026, and 90d around 15 September 2026.
Method: Platt 1999, Probabilistic Outputs for Support Vector Machines; Niculescu-Mizil & Caruana 2005 for calibration comparison.
Information coefficient (IC) tells you whether scores predict returns. Proper scoring rules tell you whether the model's probability statements are honest. A model can post IC = 0.10 but if it says "90% confidence" on signals that come true only 70% of the time, the calibration is dishonest and position-sizing built on it overshoots.
The engine computes three calibration metrics on persisted forward returns:
The output of this layer flows back into the engine via recommendHalfwidthScale(): when observed coverage in a Mondrian bin sits below the nominal target, the next calibration pass widens that bin's halfwidth in proportion to the gap. Calibration is not a one-shot operation; it's a feedback loop.
Method: Brier 1950; Murphy 1973 decomposition; Matheson & Winkler 1976; Gneiting & Raftery 2007.
Section 8 (Benjamini-Hochberg) controls the false-discovery rate under independence or PRDS. On the equity cross-section factors co-move — momentum and spillover share Spearman ρ ≈ 0.33 — so BH is theoretically incorrect even though it's the industry default.
Barber & Candès 2015 introduced the knockoff filter as a rigorous solution: for every predictor Xj the engine generates a synthetic "knockoff" X̃j with the same first two moments and the same correlations to X−j, but conditionally independent of the response. A factor is retained if its correlation with the realised forward return is meaningfully larger than its knockoff's. The filter holds under arbitrary dependence between predictors.
Concretely: if a factor cannot beat its knockoff for ≥ 30 consecutive calibration days, its weight in the Bayesian blend decays toward zero. We use the equi-correlated form (basic variant in section 3 of the paper) — appropriate for a 13-factor cross-section. Model-X knockoffs (Candès et al. 2018) become interesting if and when we extend to alternative-data factor pools.
Method: Barber & Candès 2015 (Ann. Stat. 43(5)); Candès, Fan, Janson & Lv 2018 for the Model-X extension.
Average IC across rolling windows misses the question that actually matters for institutional capital: does our 90% confidence interval cover reality during a crisis? A backtest that posts 0.05 IC on average but where the CI collapses to 30% coverage during a crash is a model that fails when capital is most exposed.
The engine reruns itself on historical crisis windows (currently 2022 rate-hike bear market; 2020 COVID and 2008 financial crisis once Yahoo history is extended) using only data available at the "as-of" date — no leakage. Reported per crisis: composite IC, observed CI coverage rate, regime asymmetry (bullish-flagged vs bearish-flagged tickers), and a permutation-test p-value contextualising the IC against a null distribution from 1,000 random label shuffles.
Permutation testing (Edgington 1969; Good 2000) is the only honest answer to the small-sample IC problem. With 26 tickers, the Spearman rank standard error is 1/√25 = 0.20, so an IC of 0.12 is only 0.6σ from zero. Without the permutation test we would mistake noise for signal.
Live results: /backtest/crisis-stress. Test source: src/lib/data/__tests__/crisis-stress-2022.test.ts and permutation-test.test.ts.
Adams & MacKay (2007). Bayesian Online Changepoint Detection, arXiv:0710.3742.
Benjamini & Hochberg (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing, JRSS-B 57(1).
Bonferroni (1936). Teoria statistica delle classi e calcolo delle probabilità, Pubbl. R. Ist. Sup. Sci. Econ. Comm. di Firenze, 8.
Huber (1981). Robust Statistics, Wiley.
Maillard, Roncalli & Teïletche (2010). The Properties of Equally-Weighted Risk-Contribution Portfolios, Journal of Portfolio Management 36(4).
Niculescu-Mizil & Caruana (2005). Predicting Good Probabilities with Supervised Learning, ICML.
Platt (1999). Probabilistic Outputs for Support Vector Machines, in Advances in Large Margin Classifiers, MIT Press.
Storey (2003). The Positive False Discovery Rate: A Bayesian Interpretation and the q-value, Annals of Statistics 31(6).
Tetlock (2007). Giving Content to Investor Sentiment, Journal of Finance 62(3).
Angelopoulos & Bates (2021). A Gentle Introduction to Conformal Prediction, arXiv:2107.07511.
Asness, Moskowitz & Pedersen (2013). Value and Momentum Everywhere, Journal of Finance 68 (3).
Asquith, Pathak & Ritter (2005). Short Interest, Institutional Ownership, and Stock Returns, JFE 78.
Bernard & Thomas (1989). Post-Earnings-Announcement Drift, Journal of Accounting Research 27.
Cohen & Frazzini (2008). Economic Links and Predictable Returns, Journal of Finance 63 (4).
Embrechts, McNeil & Straumann (2002). Correlation and dependence in risk management: properties and pitfalls, in Risk Management: Value at Risk and Beyond.
Fama & French (1992). The Cross-Section of Expected Stock Returns, Journal of Finance 47.
Frahm, Junker & Schmidt (2005). Estimating the tail-dependence coefficient, Insurance Math. Econ. 37.
Grinold & Kahn (1999). Active Portfolio Management, 2nd ed., McGraw-Hill.
Hamilton (1989). A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle, Econometrica 57.
Higham (2002). Computing the Nearest Correlation Matrix, IMA J. Numer. Anal. 22.
Jegadeesh & Titman (1993). Returns to Buying Winners and Selling Losers, Journal of Finance 48.
Kalman (1960). A New Approach to Linear Filtering and Prediction Problems, J. Basic Engineering 82.
Lakonishok, Shleifer & Vishny (1994). Contrarian Investment, Extrapolation, and Risk, Journal of Finance 49.
Li (2008). Annual Report Readability, Current Earnings, and Earnings Persistence, JAE 45.
Loughran & McDonald (2011). When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks, Journal of Finance 66.
Moskowitz & Grinblatt (1999). Do Industries Explain Momentum?, Journal of Finance 54.
Novy-Marx (2013). The Other Side of Value: The Gross Profitability Premium, JFE 108.
Pan & Poteshman (2006). The Information in Option Volume for Future Stock Prices, RFS 19.
Schmidt & Stadtmüller (2006). Non-parametric Estimation of Tail Dependence, Scand. J. Stat. 33 (2).
Seyhun (1998). Investment Intelligence from Insider Trading, MIT Press.
Shapley (1953). A Value for n-Person Games, Annals of Mathematical Studies 28.
Sloan (1996). Do Stock Prices Fully Reflect Information in Accruals and Cash Flows about Future Earnings?, Accounting Review 71.
Štrumbelj & Kononenko (2014). Explaining prediction models and individual predictions with feature contributions, KIS 41.
Vovk, Gammerman & Shafer (2005). Algorithmic Learning in a Random World, Springer.
Vovk, Lindsay, Mammen & Vovk (2003). Mondrian Confidence Machine, COLT.
West & Harrison (1986, 1997). Bayesian Forecasting and Dynamic Models, Springer.