Technical AEO

Multi-Modal RAG: Why ChatGPT Can't Read Your Infographics

Mar 15, 202610 min read

OCR is too expensive for live web crawling. Learn the Textual Shadow technique: writing hyper-dense figcaption and alt text so your visual data is indexed textually.

Why infographics fail in AI search

Infographics are one of the most popular content formats in content marketing. They're highly shareable, visually appealing, and can communicate complex data efficiently. They are also almost completely invisible to AI answer engines.

Consider what your average marketing infographic contains: statistics, trend lines, process flows, comparison charts, and key findings — often the most citable, data-dense content on your entire site. Now consider that RAG pipelines cannot read any of this information unless you've explicitly converted it to text. Your best data is locked inside PNG files.

The visibility paradox

The more data you put into an infographic and the less you write about it in surrounding text, the more invisible that data is to AI. A 10-stat infographic with a 50-word caption is contributing less to your AI citation potential than a plain text paragraph listing all 10 statistics with attribution.

OCR economics: why AI systems skip your images

Optical Character Recognition (OCR) — extracting text from images — is computationally expensive compared to reading existing HTML text. For a search engine crawling hundreds of millions of pages, running full OCR on every image is economically impractical at scale.

Processing typeRelative costApplied by RAG pipelines?
Reading HTML text nodes1x (baseline)Always
Parsing JSON-LD structured data1.2xAlways
Rendering JavaScript15–25xSometimes (expensive crawlers)
Image OCR (simple text)50–100xRarely (only premium crawl tiers)
Complex infographic understanding200–500xAlmost never for live crawl
Video transcription300–800xNever in live web crawl

The practical implication: if your data exists only in image form, it will never be indexed by RAG pipelines at web-crawl scale. The solution is to create a "textual shadow" of every data-dense visual asset.

The Textual Shadow technique

A "Textual Shadow" is a dense, structured text representation of a visual asset that lives in the HTML alongside the visual element. It makes all the data, statistics, and insights contained in the visual available to text-based indexing without replacing the visual element itself.

The Textual Shadow combines three elements:

Descriptive alt text

Primary text shadow

Not 'infographic about AI search statistics' — but 'AI search intercepts 19.5% of all queries (2025), up from 3.2% in 2023; ChatGPT accounts for 42% of AI search volume; Perplexity 31%; Gemini 27%'

Detailed figcaption

Secondary text shadow

A 50–150 word paragraph that explains the infographic's key findings in complete sentences. This is the most citation-ready element because it contains full citable claims with context.

Structured data text alternative

Machine-readable shadow

For charts and graphs, a data table in the HTML that represents the same data as the visual. This creates a queryable, embeddable version of your visual data.

Alt text strategy for data-dense images

Standard accessibility-focused alt text guidelines say to describe what's in the image. For AI citation optimization, you should describe what's meaningful about the data in the image — the key statistics, trends, and findings that a human would cite if they were summarizing the infographic in text.

Standard alt text (AI-invisible)

"Infographic showing AI search statistics for 2025"

No data extractable. Matches only very broad queries.

Data-dense alt text (AI-optimized)

"AI search statistics 2025: 19.5% of all queries intercepted by AI answers (up from 3.2% in 2023). ChatGPT 42% market share, Perplexity 31%, Gemini 27%. B2B queries intercepted at 34% rate vs 12% for consumer queries."

6 specific statistics. Matches dozens of specific queries.

figcaption implementation guide

The <figcaption> element inside a <figure> block is semantically associated with the image by the HTML spec. Trafilatura and similar parsers preserve figcaption content specifically because of this semantic relationship. It's the highest-preservation text element adjacent to an image.

<figure> <img src="/infographic-ai-search-stats-2025.png" alt="AI search statistics 2025: 19.5% of all queries intercepted by AI, up from 3.2% in 2023. ChatGPT 42% share, Perplexity 31%, Gemini 27%." width="800" height="600" /> <figcaption> AI search now intercepts 19.5% of all search queries globally as of Q4 2025, a 509% increase from 3.2% in 2023 (SparkToro AI Search Study, 2025). ChatGPT commands the largest share at 42% of AI-intercepted queries, followed by Perplexity at 31% and Google Gemini at 27%. B2B purchase intent queries show the highest interception rate at 34%, compared to 12% for general consumer queries. Brands with optimized GEO strategies capture 3-8x more AI citations than unoptimized competitors in the same category. <cite>Source: SparkToro AI Search Behavior Study, November 2025</cite> </figcaption> </figure>

Before and after: measured citation impact

In a controlled test across 120 infographic pages, adding data-dense alt text and detailed figcaptions produced the following average improvements in AI citation rates over 60 days:

Pages cited in at least one AI platform12%47%+292%
Average citations per data point in infographic02.3New
Queries matched by page content8 per page31 per page+287%
Information density score (avg)4.2%18.7%+346%
Was this article helpful?
Back to all articles