Earnings calls aren't only narrative — they're a structured, high-quality text dataset that gets re-emitted by 9,000+ public companies every quarter, on schedule. For a quant team willing to do basic NLP, the corpus is one of the cleanest alternative-data sources around: same speakers, same formats, same recurring topics, and direct quotes from the executives responsible for the business.
This piece walks through five text signals that have shown up in published quant research and that any team can prototype against our API in an afternoon.
1. Cross-sectional mention counts
The simplest signal: for a given universe (e.g. S&P 500) at a given quarter, how many companies mentioned theme X? A rising count over consecutive quarters is the canonical "narrative breaking out" indicator. Falling counts can warn of a fading thesis before the price catches up.
Why it works as a signal: management teams have similar incentives, and they decide what to mention in prepared remarks based on what they think will move the stock. Cross-sectional mention spikes are a coordinated belief signal.
curl 'https://earningscalls.dev/api/v1/search/by_ticker?q=agentic+AI&tickers=AAPL,MSFT,GOOGL,...&date_from=2026-01-01' \
-H "X-API-Key: $KEY"
The response groups mentions by ticker. Compare quarter-over-quarter for trend velocity.
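A minimal sketch of the quarter-over-quarter comparison, assuming each quarter's response has already been reduced to a mapping of ticker to mention count. The input shape and the function names are hypothetical, not the API's documented schema.

```python
from typing import Dict, List, Tuple

# Hypothetical input: quarter label -> {ticker: mention count for the theme}.
# The real response schema may differ; adapt the parsing step accordingly.
QuarterCounts = Dict[str, Dict[str, int]]


def mention_breadth(counts_by_quarter: QuarterCounts) -> List[Tuple[str, int]]:
    """Number of companies with at least one mention, per quarter, in label order."""
    return [
        (quarter, sum(1 for n in per_ticker.values() if n > 0))
        for quarter, per_ticker in sorted(counts_by_quarter.items())
    ]


def breadth_velocity(breadth: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Quarter-over-quarter change in breadth; the first quarter has no prior and is skipped."""
    return [
        (curr_q, curr_n - prev_n)
        for (_, prev_n), (curr_q, curr_n) in zip(breadth, breadth[1:])
    ]


# Illustrative numbers only, not real data.
sample = {
    "2025Q3": {"AAPL": 0, "MSFT": 2, "GOOGL": 0},
    "2025Q4": {"AAPL": 1, "MSFT": 3, "GOOGL": 0},
    "2026Q1": {"AAPL": 2, "MSFT": 5, "GOOGL": 4},
}
breadth = mention_breadth(sample)
velocity = breadth_velocity(breadth)
```

Breadth (how many companies mention the theme) is usually a steadier series than raw mention totals, which a single talkative company can dominate.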
2. Sentiment delta vs. baseline
Absolute sentiment scores are noisy and inconsistent across companies. The change in sentiment for a single company vs. its own trailing four-quarter average is much sharper. Pull each company's last five calls (the latest plus four for the baseline), score sentiment on the prepared remarks block (not Q&A, which is too analyst-influenced), and use the delta as a feature.
The clean comparison surface is the prepared remarks, which executives largely control. Q&A drift is a different (also useful, see #3) signal.
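The delta itself is simple arithmetic once the scores exist. The sketch below assumes you already have one sentiment score per call from whichever scorer you use; the function name and the sample scores are illustrative.

```python
from statistics import mean
from typing import Optional, Sequence


def sentiment_delta(scores: Sequence[float], baseline_window: int = 4) -> Optional[float]:
    """Latest score minus the mean of the preceding `baseline_window` scores.

    `scores` is ordered oldest to newest. Returns None when there is not
    enough history to form the baseline.
    """
    if len(scores) < baseline_window + 1:
        return None
    baseline = mean(scores[-(baseline_window + 1):-1])
    return scores[-1] - baseline


# Five calls, oldest first: four baseline quarters plus the latest one.
delta = sentiment_delta([0.25, 0.5, 0.25, 0.5, 0.125])
```

A negative delta means the latest prepared remarks scored below the company's own recent norm, regardless of how its absolute score compares with peers.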
3. Q&A pushback ratio
Count the share of analyst questions that contain a hedge phrase (but, however, concern, clarify, walk us through, bridge that). Then compute the share of management answers that contain a deflection phrase (we'll come back to that, not commenting on, as we discussed last quarter, let me hand it to). The ratio of the two shares is a proxy for question difficulty and management comfort.
The speakers endpoint surfaces these segments cleanly:
GET /api/v1/search?q=clarify&type=speakers&speaker_type=analyst&ticker=NVDA
Empirically this signal lags the price by about a quarter — useful in factor portfolios, not for event trading.
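One way to compute the ratio, sketched under assumptions: questions and answers arrive as plain strings, and the ratio is taken as deflection share divided by hedge share, which is one reading of the definition above rather than a fixed convention. Function names are hypothetical.

```python
import re
from typing import Iterable, Optional, Pattern, Sequence

# Phrase lists taken from the text above; extend them for your own universe.
HEDGE_PHRASES = ("but", "however", "concern", "clarify", "walk us through", "bridge that")
DEFLECTION_PHRASES = (
    "we'll come back to that",
    "not commenting on",
    "as we discussed last quarter",
    "let me hand it to",
)


def compile_phrases(phrases: Iterable[str]) -> Pattern[str]:
    """Whole-word, case-insensitive matcher so 'but' does not match 'attribute'."""
    body = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


def share_matching(segments: Sequence[str], pattern: Pattern[str]) -> float:
    """Fraction of segments containing at least one phrase."""
    if not segments:
        return 0.0
    # Normalize typographic apostrophes, which transcripts may contain.
    hits = sum(1 for s in segments if pattern.search(s.replace("\u2019", "'")))
    return hits / len(segments)


def pushback_ratio(questions: Sequence[str], answers: Sequence[str]) -> Optional[float]:
    """Deflection share over hedge share; None when no question hedges."""
    hedge_share = share_matching(questions, compile_phrases(HEDGE_PHRASES))
    deflect_share = share_matching(answers, compile_phrases(DEFLECTION_PHRASES))
    return deflect_share / hedge_share if hedge_share else None


# Made-up segments for illustration.
questions = [
    "Can you clarify the margin bridge?",
    "What drove the attribute growth in cloud?",
    "However you slice it, capex is up. Why?",
    "How is hiring?",
]
answers = [
    "We'll come back to that later in the call.",
    "Cloud grew on enterprise demand.",
    "Capex reflects data center buildout.",
    "Hiring is steady.",
]
ratio = pushback_ratio(questions, answers)
```

Keeping the two shares as separate features alongside the ratio may be worth testing, since a quarter with no hedged questions leaves the ratio undefined.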
4. Guidance-hedging language
A specific subset of #3: the rate at which management uses conditional language around forward guidance (expect, anticipate, we believe, our view is, should, could). Increasing usage from a baseline is a soft warning. It pairs well with traditional analyst-revision factors.
Run a focused phrase search restricted to executive speakers:
GET /api/v1/search?q=%22we+expect%22+OR+%22we+anticipate%22&type=speakers&speaker_type=executive&ticker=AMZN
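To turn matches into a rate, one option is conditional phrases per 100 words of executive speech, compared against the mean of earlier calls. This is a rough sketch: the per-100-words normalization, the function names, and the sample sentences are assumptions, and the pattern matches the listed forms only (not "expects" or "expected").

```python
import re
from statistics import mean
from typing import Sequence

# Conditional phrases from the text above, matched as whole words.
CONDITIONAL_RE = re.compile(
    r"\b(?:expect|anticipate|we believe|our view is|should|could)\b",
    re.IGNORECASE,
)


def hedging_rate(text: str) -> float:
    """Conditional-phrase matches per 100 whitespace-separated words."""
    words = text.split()
    if not words:
        return 0.0
    return 100.0 * len(CONDITIONAL_RE.findall(text)) / len(words)


def hedging_change(latest: str, baseline_texts: Sequence[str]) -> float:
    """Latest call's rate minus the mean rate over the baseline calls."""
    return hedging_rate(latest) - mean(hedging_rate(t) for t in baseline_texts)


# Made-up guidance snippets for illustration.
latest_call = "We expect growth and margins could expand further next year."
earlier_calls = [
    "Revenue grew ten percent and margins expanded two points again.",
    "We should see similar growth across all segments next quarter.",
]
change = hedging_change(latest_call, earlier_calls)
```

A positive change means the latest call used more conditional language than the company's own baseline.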
5. Topic co-occurrence (theme baskets)
Don't track themes in isolation; track which themes are mentioned together. A company that simultaneously discusses tariff and pricing is signaling pricing power. A company discussing agentic AI and headcount is signaling cost reduction via AI. These pair signals are richer than single-theme mentions.
Implement as pairs of theme matches: iterate over the universe, count each (theme1, theme2) co-occurrence, and rank the pairs.
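A small sketch of the pair counting, assuming you have already determined which themes each company mentioned in the quarter. The tickers, the input shape, and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

ThemePair = Tuple[str, str]


def rank_theme_pairs(themes_by_ticker: Dict[str, Set[str]]) -> List[Tuple[ThemePair, int]]:
    """Count companies mentioning each unordered theme pair, most common first."""
    counts: Counter = Counter()
    for themes in themes_by_ticker.values():
        # Sorting makes (a, b) and (b, a) collapse to a single key.
        for pair in combinations(sorted(themes), 2):
            counts[pair] += 1
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Placeholder tickers and themes for illustration.
universe = {
    "AAA": {"tariff", "pricing"},
    "BBB": {"tariff", "pricing", "headcount"},
    "CCC": {"agentic AI", "headcount"},
}
ranked = rank_theme_pairs(universe)
```

Here the (pricing, tariff) pair ranks first because two of the three companies mention both themes. Raw counts favor common themes, so normalizing by each theme's standalone frequency is a natural next step.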
Why our API for this
- 33,000+ transcripts spanning 9,000+ public companies across 70+ countries — large enough that cross-sectional ranks are meaningful even for narrow universes.
- Speaker segments returned with speaker_type (Executives / Analysts), needed for signals #2–#4 that depend on who said what.
- search/by_ticker returns aggregated counts in a single request, not one-per-ticker, so a full S&P 500 sweep is cheap to backtest.
- Pro tier starts at $24.99/month, no procurement cycle, no minimums.