AI hallucination in investment research: I set five traps for my own SEC filings toolkit

A fake company, a fabricated 8-K, a consensus ask, a stale 13F, and a fake-bullish insider tape — five traps run against the live toolkit, outputs unedited.

Do AI assistants hallucinate financial data? Yes. Any unconstrained language model will complete a plausible pattern rather than admit a gap, and in investment research a plausible wrong number is worse than no number. The fix is not a smarter model; it’s a data layer where the honest answer is the easiest answer available.

That’s a testable claim, so on June 9, 2026 I ran five traps against my own product, the edgar.tools MCP server, which is the toolkit that connects Claude to SEC filing data. The traps: a company that doesn’t exist, an 8-K that was never filed, a request for analyst consensus (which isn’t in filings), a 13F question where the honest answer is “incomplete,” and an insider tape engineered to look like conviction buying. What follows is each trap and the actual response, trimmed for length but otherwise unedited. I build this product, which is exactly why you should watch me try to break it.

The test setup: Claude with the edgar.tools tools installed, same as any subscriber’s session. No special prompts, no retries.

What happens when you ask about a company that doesn’t exist?

The right answer is an empty result — not a fuzzy match dressed up as one. I asked for “Brockhaven Therapeutics,” a name I invented to sound like a real small-cap biotech:

{"query":"Brockhaven Therapeutics","results":[],"count":0}

Same for a fabricated ticker, ZYXQ: zero results. No nearest-neighbor guess, no “did you mean” presented as an answer. This matters more than it looks: an ungrounded model asked about Brockhaven Therapeutics will happily invent a pipeline and a market cap, because inventing is what pattern-completion does. A grounded one gets an empty array, and an empty array is very hard to write fiction on top of.
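A minimal sketch of how a caller might treat that empty payload as a definitive answer instead of a prompt to guess. The function name and the wording of the returned sentence are illustrative assumptions; only the JSON payload comes from the probe above.

```python
import json


def describe_search(raw: str) -> str:
    """Turn a search payload into a statement that is safe to repeat."""
    payload = json.loads(raw)
    results = payload.get("results", [])
    if not results:
        # An empty array is the answer: report the blank, do not fill it.
        return f'No match in SEC filing data for "{payload["query"]}".'
    return f'{len(results)} match(es) for "{payload["query"]}".'


# Payload copied from the Brockhaven probe.
raw = '{"query":"Brockhaven Therapeutics","results":[],"count":0}'
message = describe_search(raw)
print(message)
```

The point of the design is that the empty branch comes first and returns early, so nothing downstream ever sees a "closest guess".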

Can it invent an SEC filing that never happened?

I asked for the 8-K where Apple suspended its dividend in May 2026 — a filing that does not exist. The toolkit’s response to “what did Apple file?” is the complete 8-K tape for the window, and the complete tape for the last 90 days is exactly two filings: the April 30 earnings 8-K (Items 2.02/9.01) and an April 20 officer-matters 8-K (Item 5.02). Nothing in May. Nothing about a dividend.

So the grounded answer writes itself: that filing doesn’t exist — here is everything Apple actually filed in the window, with links. The model can’t quote an 8-K it was never handed, and the exhaustive list is what makes the negative claim safe: it’s not “I couldn’t find it,” it’s “here is the complete set, and it isn’t in there.” One more detail worth noticing — the response envelope flagged that the most recent event was already 40 days old and told the model to say so. The tooling treats even silence as something to date.
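The two checks described above (is the claimed filing in the complete list, and how old is the newest event) can be sketched in a few lines. The record shape and function names are hypothetical; the dates and item numbers are the two filings reported in the probe.

```python
from datetime import date

# Hypothetical record shape; dates and items are those described above.
tape = [
    {"form": "8-K", "date": date(2026, 4, 30), "items": ["2.02", "9.01"]},
    {"form": "8-K", "date": date(2026, 4, 20), "items": ["5.02"]},
]


def filings_in_month(filings, year, month):
    """Return every filing on the tape dated in the given month."""
    return [f for f in filings
            if f["date"].year == year and f["date"].month == month]


def days_since_latest(filings, today):
    """Age in days of the most recent filing on the tape."""
    return (today - max(f["date"] for f in filings)).days


run_date = date(2026, 6, 9)
may_filings = filings_in_month(tape, 2026, 5)
age = days_since_latest(tape, run_date)
print(len(may_filings))  # 0
print(age)               # 40
```

Because the tape is exhaustive for the window, an empty May list supports the stronger statement "it was not filed" and not merely "it was not found".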

Will it make up analyst consensus numbers?

Consensus estimates aren’t in SEC filings, so the toolkit has no tool that returns them — there is nothing to retrieve, and the grounded response to “what’s Costco’s consensus EPS?” is that the number doesn’t live here, with a pointer to what does. What does come back for Costco is twelve ratios computed from the FY2025 10-K — and, more interesting for this post, the response carried this warning, verbatim:

DATA STALE — backing source as_of=2025-10-08 (244 days old). Do not report counts as current; surface this gap in your answer.

Read that again: the tool is giving the model instructions about its own limitations. Costco’s fiscal year ends around the end of August, and the backing source here is dated October 8, so an annual snapshot pulled in June is eight months old. Rather than papering over that, the response orders the model to disclose it. A hallucination isn’t only an invented number; it’s also a real number presented as fresher than it is. The envelope exists to kill the second kind.
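The age in that warning is plain date arithmetic, and a staleness check along these lines reproduces it. This is a sketch under assumptions: the 90-day threshold and the function name are invented for illustration and are not the product's actual settings.

```python
from datetime import date

STALE_AFTER_DAYS = 90  # hypothetical threshold, chosen for illustration


def staleness_warning(as_of, today, limit=STALE_AFTER_DAYS):
    """Return a warning string when the backing source is older than limit."""
    age = (today - as_of).days
    if age <= limit:
        return None
    return f"DATA STALE: backing source as_of={as_of.isoformat()} ({age} days old)."


warning = staleness_warning(date(2025, 10, 8), date(2026, 6, 9))
print(warning)
# DATA STALE: backing source as_of=2025-10-08 (244 days old).
```

Returning the warning as part of the payload, instead of logging it, is what puts it in front of the model at the moment it writes the answer.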

Does it admit when 13F data is incomplete?

Institutional ownership is where AI summaries go quietly wrong, because 13F data is structurally partial: filings are due up to 45 days after quarter end, and big managers file under multiple entities. I asked for Chipotle’s institutional holders. The response: 1,449 holders for the March 31, 2026 quarter, reconstructed from filings as they arrive, with an explicit freshness score (96.14% of institutional shares already reflect current-quarter filings) and every holder tagged new, current, or carried so the model knows which rows are up to date. Then two caveats, shipped inside the response itself:

Position sizes are share counts; dollar values aren’t shown for the current quarter because the as-filed value figures aren’t yet reliable.

A single manager that files under several entities (e.g. Vanguard) appears as more than one row — its positions aren’t combined here.

That second caveat is anti-hallucination in its purest form: the data could be naively summed into a confident, wrong “Vanguard owns X%” claim, and the response intercepts exactly that move. (The probe also returned a genuine finding as a side effect: Capital World Investors cut its Chipotle position 20.7% quarter-over-quarter — cited, as always, to the underlying filings.)
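One plausible reading of the freshness score is a share-weighted fraction of rows that are not carried forward. The sketch below assumes that definition; the rows, filer names, and share counts are made up for illustration and are not Chipotle data.

```python
# Hypothetical rows; the status tags mirror the new/current/carried labels above.
rows = [
    {"filer": "Manager A Fund I", "shares": 600, "status": "current"},
    {"filer": "Manager A Fund II", "shares": 250, "status": "new"},
    {"filer": "Manager B", "shares": 150, "status": "carried"},
]


def freshness(holders):
    """Share-weighted fraction of rows backed by a current-quarter filing."""
    total = sum(h["shares"] for h in holders)
    fresh = sum(h["shares"] for h in holders
                if h["status"] in {"new", "current"})
    return fresh / total if total else 0.0


score = freshness(rows)
print(f"{score:.2%}")  # 85.00%
```

Note that the two "Manager A" rows stay separate here, as in the caveat above: combining them would require a filer-to-parent mapping that the response does not provide.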

Can it tell real insider buying from option exercises?

The best trap is one where the surface read is bullish and wrong. Atmos Energy’s last 90 days of Form 4s: 18 transactions, net value about $3.5 million acquired — a tape that an unstructured summary would describe as “insiders are buying.” Two fields in the response prevent that:

net_value_discretionary: 0

most_recent_open_market_buy: "2024-11-11"

Every one of those 18 transactions, the CEO’s included, is code M (an equity-award exercise) or code F (shares withheld for taxes), all clustered on May 2 — the annual vesting cycle, not conviction. The last time any Atmos insider bought stock on the open market with their own money was a year and a half ago. The discretionary-dollars field carries the entire distinction, so the model never has to parse transaction codes correctly under pressure — the classification arrives already done, with the Form 4 links attached.
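The pre-classification can be sketched as a split on Form 4 transaction codes. The output field names match the two fields quoted above; the function name, the input record shape, and the sample values are assumptions for illustration, not the Atmos transactions. Treating every code P as an open-market buy is also a simplification, since P covers private purchases too.

```python
# Form 4 transaction codes: P = purchase, S = sale,
# M = exercise or conversion of a derivative security,
# F = shares withheld or delivered to cover exercise price or taxes.
DISCRETIONARY_CODES = {"P", "S"}


def summarize_form4(transactions):
    """Net all transactions, then separately net only discretionary ones."""
    net_value = 0.0
    net_discretionary = 0.0
    last_buy = None
    for t in transactions:
        signed = t["value"] if t["acquired"] else -t["value"]
        net_value += signed
        if t["code"] in DISCRETIONARY_CODES:
            net_discretionary += signed
        if t["code"] == "P":
            # ISO date strings sort chronologically, so max() keeps the latest.
            last_buy = t["date"] if last_buy is None else max(last_buy, t["date"])
    return {
        "net_value": net_value,
        "net_value_discretionary": net_discretionary,
        "most_recent_open_market_buy": last_buy,
    }


# Made-up values: one award exercise and one tax withholding on the same day.
sample = [
    {"code": "M", "acquired": True, "value": 100_000.0, "date": "2026-05-02"},
    {"code": "F", "acquired": False, "value": 30_000.0, "date": "2026-05-02"},
]
summary = summarize_form4(sample)
print(summary)
```

On this sample the net value is positive while the discretionary net is zero, which is the same shape as the Atmos tape: a headline number that looks like buying and a classified number that says otherwise.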

What this means for using AI in your research process

Five traps, five grounded responses, and one design lesson worth stating plainly: you don’t get honesty by asking the model nicely — you get it by making the data layer return empty arrays, complete lists, freshness warnings, and pre-classified fields, so the truthful answer is the path of least resistance. That’s the architecture: deterministic tools return structured filing data with a source accession and an as-of date on everything; the model’s job is the prose, not the facts.

Two boundaries to be equally plain about. The toolkit covers what’s in SEC filings — fundamentals, events, ownership, insider activity, filing narrative. It will not produce price targets, forecasts, or buy/sell calls, and the refusal is load-bearing: a system that answers everything is a system you can’t trust on anything. And none of the findings above are views on the stocks — they’re what the filings say, with the documents attached.

If you want to verify any of this, don’t take my word for it — I’m the vendor. Install from the Plugin Hub (the setup guide takes five minutes) and run the traps yourself:

“Tell me about Brockhaven Therapeutics’ drug pipeline.”

“Show me the 8-K where Apple suspended its dividend in May 2026.”

“Are insiders at Atmos Energy buying?”

If the answers you get are an honest blank, a complete list that doesn’t contain the fake, and “exercises, not purchases — last open-market buy was November 2024,” then the system failed to hallucinate on the exact questions designed to make it. That’s the bar an investment memo needs.

Start a 14-day trial →

Dwight Gunning is the creator of edgartools, the open-source Python library for SEC data, and builds edgar.tools. All five probes were run against the live toolkit on June 9, 2026; responses are quoted unedited, and every filing cited resolves on sec.gov.