Methodology
Last updated: 2026-06-11
This page documents how NewsProbe.io produces the analytics shown on the Service. We publish it for transparency, in line with the principles of Article 20 of the EU Market Abuse Regulation (MAR) on the objective presentation of investment analysis.
News ingestion
We pull news headlines from Finnhub on a recurring schedule. Each article is deduplicated by source URL.
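As an illustration of the deduplication step, here is a minimal sketch that keeps the first article seen for each source URL. The `url` field name and the exact-string comparison are assumptions made for the example; whether URLs are normalized before comparison (for instance by stripping tracking parameters) is not specified here.

```python
from typing import Iterable


def dedupe_by_url(articles: Iterable[dict]) -> list[dict]:
    """Keep the first article seen for each source URL, preserving order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for article in articles:
        url = article["url"]  # hypothetical field name
        if url in seen:
            continue  # same source URL already ingested
        seen.add(url)
        unique.append(article)
    return unique
```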
AI summaries
Each article is processed by an Anthropic Claude model (Sonnet 4.6 at the time of writing). The model is instructed to produce a neutral two-sentence summary and to extract the tickers the article references, each with a relevance score from 0 to 1. The exact prompt is versioned in our repository at src/mastra/prompts/news-analyst.md.
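To make the shape of that output concrete, the sketch below checks a parsed model response for a non-empty summary and for ticker entries whose relevance lies in the 0 to 1 range. The field names (`summary`, `tickers`, `symbol`, `relevance`) are hypothetical; the real schema is defined by the versioned prompt and is not reproduced here.

```python
def validate_analysis(payload: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the payload passed."""
    errors: list[str] = []

    summary = payload.get("summary")  # hypothetical field name
    if not isinstance(summary, str) or not summary.strip():
        errors.append("summary must be a non-empty string")

    tickers = payload.get("tickers")  # hypothetical field name
    if not isinstance(tickers, list):
        errors.append("tickers must be a list")
        return errors

    for i, item in enumerate(tickers):
        if not isinstance(item, dict):
            errors.append(f"tickers[{i}] must be an object")
            continue
        symbol = item.get("symbol")
        relevance = item.get("relevance")
        if not isinstance(symbol, str) or not symbol:
            errors.append(f"tickers[{i}].symbol must be a non-empty string")
        # bool is a subclass of int in Python, so reject it explicitly
        is_number = isinstance(relevance, (int, float)) and not isinstance(relevance, bool)
        if not is_number or not 0 <= relevance <= 1:
            errors.append(f"tickers[{i}].relevance must be a number in [0, 1]")
    return errors
```

A response that fails such a check could be retried or discarded rather than published; which of the two happens on the Service is not stated on this page.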
Sentiment score
The same model assigns a sentiment score on a continuous scale from −1 (strongly bearish) to +1 (strongly bullish). The score reflects the article's tone toward the referenced tickers. It is not a price prediction.
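Per-ticker daily sentiment shown on ticker pages is the relevance-weighted average of the sentiment scores of the articles published that day: each article's score counts in proportion to its relevance to the ticker.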
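A minimal sketch of that weighted average, assuming each article is represented as a (sentiment, relevance) pair for the ticker in question. Returning `None` when there is no weight to average over is a choice made for the example, not a documented behavior.

```python
from typing import Optional


def daily_sentiment(articles: list[tuple[float, float]]) -> Optional[float]:
    """Relevance-weighted mean of sentiment scores.

    articles: (sentiment, relevance) pairs for one ticker on one day.
    Returns None when the total relevance is zero (nothing to average).
    """
    total_weight = sum(relevance for _, relevance in articles)
    if total_weight == 0:
        return None
    weighted_sum = sum(sentiment * relevance for sentiment, relevance in articles)
    return weighted_sum / total_weight
```

For example, an article scored +0.8 with relevance 1.0 and another scored −0.4 with relevance 0.5 give (0.8 × 1.0 − 0.4 × 0.5) / 1.5 = 0.4.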
Catalyst type
The same model tags each article with one catalyst type describing the kind of event it reports (e.g. earnings, M&A, FDA, guidance, analyst rating, legal, regulatory, partnership, product, macro, other). This is a factual classification of what happened. It is independent of the sentiment score and of materiality, and it carries no directional or "tradeable" connotation — it exists only to let you filter and organize the feed.
Sentiment velocity & acceleration
On ticker pages and the /trending/velocity page we show the rate of change of the smoothed daily sentiment series (velocity) and the change in that rate (acceleration). Velocity is the least-squares slope of the 7-day-smoothed sentiment over the last 7 days, expressed in sentiment units per day; acceleration is that slope minus the same slope computed over the preceding 7 days.
These are descriptive statistics computed entirely from our own published sentiment series. They describe how the sentiment reading has moved, not how the price has moved or will move. They are not a leading indicator and not a trading signal. Low-volume tickers produce noisy values, which is why we require a minimum number of articles before showing them.
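The velocity and acceleration definitions can be sketched as follows. The smoothing is assumed here to be a trailing simple moving average, which is one plausible reading of "7-day-smoothed"; the function names are illustrative, and the minimum-article threshold is left out.

```python
def moving_average(series: list[float], window: int = 7) -> list[float]:
    """Trailing simple moving average; the first window-1 days yield no value."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def ols_slope(values: list[float]) -> float:
    """Least-squares slope of values against day index 0..n-1 (units per day)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def velocity_and_acceleration(daily: list[float], window: int = 7) -> tuple[float, float]:
    """Velocity: slope over the last `window` smoothed days.
    Acceleration: that slope minus the slope over the preceding `window` days."""
    smoothed = moving_average(daily, window)
    if len(smoothed) < 2 * window:
        raise ValueError("not enough history for two consecutive windows")
    velocity = ols_slope(smoothed[-window:])
    prior_velocity = ols_slope(smoothed[-2 * window : -window])
    return velocity, velocity - prior_velocity
```

Under these assumptions a ticker needs at least 20 days of daily sentiment (two 7-day windows of smoothed values, plus the 6 days consumed by the smoothing) before both figures can be computed.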
Theme clustering
Articles are embedded with OpenAI text-embedding-3-large at 1536 dimensions (the model's native output is 3072 dimensions). Embeddings are stored in Postgres and indexed with pgvector. A nightly clustering job groups articles around centroids and surfaces the resulting "themes" on /themes.
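The clustering algorithm and distance metric are not named above, so the following is only a sketch of the general idea of grouping around centroids: each embedding is assigned to the centroid it is most similar to, using cosine similarity as an assumed metric. How the centroids themselves are chosen or updated is outside this example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_to_centroids(
    embeddings: list[list[float]], centroids: list[list[float]]
) -> list[int]:
    """For each embedding, return the index of the most similar centroid."""
    return [
        max(range(len(centroids)), key=lambda k: cosine_similarity(vec, centroids[k]))
        for vec in embeddings
    ]
```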
Related tickers
The "Often mentioned together" strip on ticker pages lists tickers that have historically appeared in the same news articles over the last 90 days (with a same-sector bonus). A nightly job counts these co-mentions; we require at least two shared articles before relating two tickers.
This is a descriptive map of past co-mention, not a basket to trade and not a prediction that these names will move together. It is computed from news co-occurrence only — it does not measure price correlation.
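The co-mention count behind the strip can be sketched as below. The input is assumed to be already restricted to the last 90 days, and the same-sector bonus is omitted because its size is not specified; function names and the tie-breaking order are choices made for the example.

```python
from collections import Counter
from itertools import combinations


def co_mention_counts(article_tickers: list[list[str]]) -> Counter:
    """Count, for every unordered ticker pair, the articles mentioning both."""
    counts: Counter = Counter()
    for tickers in article_tickers:
        # set() so a ticker repeated within one article is counted once
        for a, b in combinations(sorted(set(tickers)), 2):
            counts[(a, b)] += 1
    return counts


def related_tickers(ticker: str, counts: Counter, min_shared: int = 2) -> list[tuple[str, int]]:
    """Tickers sharing at least `min_shared` articles with `ticker`, most shared first."""
    related = [
        (b if a == ticker else a, n)
        for (a, b), n in counts.items()
        if n >= min_shared and ticker in (a, b)
    ]
    return sorted(related, key=lambda item: (-item[1], item[0]))
```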
News-impact statistics
For each ticker, we compute how the price moved 24 hours, 72 hours, and 7 days after each historical news event with relevance ≥ 0.6, using daily bars from Polygon.io. Aggregate statistics (for example, "on $X, news with sentiment ≤ −0.5 was followed by an average move of −1.8% over 24h") are simple averages over the displayed period.
These statistics are descriptive of the past, not predictive of the future. They reflect a small sample size and ignore many confounding factors (overall market direction, sector rotation, earnings windows). Do not use them as a trading signal.
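A minimal sketch of the simple averaging described above, assuming each historical event already carries the price at the event and the price at the chosen horizon. The field names are hypothetical, and the mapping from an event timestamp to daily bars is not covered here.

```python
from typing import Optional


def average_forward_return(
    events: list[dict],
    max_sentiment: float = -0.5,
    min_relevance: float = 0.6,
) -> tuple[Optional[float], int]:
    """Mean forward return and sample count for events passing both filters.

    Each event dict is assumed to hold: sentiment, relevance,
    price_at_event, price_after (hypothetical field names).
    """
    returns = [
        e["price_after"] / e["price_at_event"] - 1
        for e in events
        if e["relevance"] >= min_relevance and e["sentiment"] <= max_sentiment
    ]
    if not returns:
        return None, 0
    return sum(returns) / len(returns), len(returns)
```

Returning the sample count alongside the mean mirrors the way the figure is displayed: an average over two events and an average over two hundred should not be read the same way.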
Internal validation of the sentiment tagging
We periodically run an internal event study to check whether the AI sentiment tags carry any information at all, by measuring average forward price moves (at +24h, +72h and +7d) after historically tagged articles, against daily market data. In the most recent run (over roughly three years of tagged articles), articles tagged bullish were followed on average by a positive 24-hour move of a fraction of a percent, with wide dispersion.
This is a retrospective quality check on our own tagging, published for transparency only. It is an average over thousands of past events with enormous variance; it is not a strategy, not an expected return for any future article or ticker, and past performance is not indicative of future results. We do not provide trading signals.
Limits and known biases
- AI models can hallucinate. We mitigate this with low sampling temperatures and structured-output validation, but errors do occur.
- Sentiment scoring depends on the model's training data and may be skewed for tickers that are less covered or covered mainly in non-English-language news.
- News-impact statistics are sensitive to small sample sizes; we display the sample count alongside the average.
- Polygon free-tier daily bars do not include intraday context, so price moves within a trading day are not captured.
Reporting an issue
If you spot an inaccurate summary, sentiment score, or analysis, please report it to feedback@newsprobe.io and include the article URL.