llm

2 posts with the tag “llm”

Technical Deep Dive: Why Cook With Gasoline MCP?

Most LLM browser tools fail because they feed the model one of two things:

Raw HTML: This is token-expensive and full of noise (<div class="wrapper-v2-flex...">). The LLM gets lost in the “soup” of utility classes and nesting.

document.body.innerText: This flattens the page, losing all structure. A “Submit” button becomes just the word “Submit” floating in a void; the LLM has no idea it’s clickable or which form it belongs to.


CWG is an MCP server that acts as a “Vision Processing Unit” for the LLM.

Instead of scraping HTML, CWG serializes the Accessibility Object Model (AOM). This is the same API screen readers use to navigate the web.

  • Signal, Not Noise: We strip away thousands of <div> and <span> wrappers, exposing only semantic elements: buttons, inputs, headings, and landmarks.
  • The Result: A 50,000-token HTML page becomes a clean, 2,000-token JSON structure that preserves hierarchy and interactivity.
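The pruning idea can be sketched in a few lines. Everything here is illustrative (the node shape, the role list, and the output format are assumptions, not CWG’s actual schema): semantic nodes survive, generic wrappers vanish, and their children bubble up.

```javascript
// Hypothetical sketch of AOM-style pruning; the node shape, role list,
// and output format are illustrative, not CWG's actual schema.
const SEMANTIC_ROLES = new Set(["button", "textbox", "heading", "link", "main"]);

function serialize(node) {
  const children = (node.children || []).flatMap((child) => serialize(child));
  if (SEMANTIC_ROLES.has(node.role)) {
    return [{ role: node.role, name: node.name, children }];
  }
  // Generic wrappers (divs, spans) disappear; their children bubble up.
  return children;
}

// A checkout form buried in wrapper divs:
const snapshot = {
  role: "generic",
  children: [
    { role: "heading", name: "Checkout" },
    { role: "generic", children: [{ role: "button", name: "Submit" }] },
  ],
};

console.log(JSON.stringify(serialize(snapshot)));
// [{"role":"heading","name":"Checkout","children":[]},{"role":"button","name":"Submit","children":[]}]
```

The wrapper divs contribute zero tokens to the output; only the heading and the button remain, still in hierarchical order.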

Modern enterprise apps (Salesforce, Adobe, Google Cloud) use Web Components and Shadow DOM to encapsulate styles.

  • The Problem: Standard scrapers (and innerText) hit a “shadow root” and stop. They literally cannot see inside your complex UI components.
  • The CWG Fix: Our serializer recursively pierces open Shadow Roots, flattening the component tree into a single, logical view for the AI.
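The piercing technique looks roughly like this. Nodes are modeled as plain objects so the sketch is self-contained; in a real browser you would read the actual element.shadowRoot property, which is non-null only for open roots.

```javascript
// Sketch of piercing open shadow roots. Nodes are plain objects so this
// runs anywhere; in a browser you would read the real element.shadowRoot
// property (non-null only for open roots) instead.
function collectText(node, out = []) {
  if (node.text) out.push(node.text);
  // Descend into a hosted open shadow root; a closed root (null in the
  // real DOM) stays opaque, exactly as a standard scraper sees it.
  const kids = node.shadowRoot ? node.shadowRoot.children : node.children;
  for (const child of kids || []) collectText(child, out);
  return out;
}

const widget = {
  tag: "fancy-button", // a Web Component hiding its UI in a shadow root
  shadowRoot: { children: [{ tag: "button", text: "Buy now" }] },
};
const page = { tag: "main", children: [{ tag: "h1", text: "Store" }, widget] };

console.log(collectText(page)); // [ 'Store', 'Buy now' ]
```

A scraper that stops at the shadow boundary would report only “Store”; the recursive walk surfaces the “Buy now” button too.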

When Claude or ChatGPT wants to click a button, it usually guesses a CSS selector (e.g., button[class*="blue"]). This is brittle; if you change a class name, the agent breaks.

  • Our Approach: CWG injects ephemeral, stable IDs (e.g., [cwg-id="12"]) into the DOM map it sends to the LLM.
  • The Loop:
    • LLM reads: Button “Save” [cwg-id="12"]
    • LLM commands: click("12")
    • CWG executes the click exactly on that element, regardless of CSS changes.
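A minimal sketch of that loop, with hypothetical names (register, click, and the id scheme are assumptions, not CWG’s public API):

```javascript
// Illustrative sketch of the stable-ID loop; register/click and the id
// scheme are assumptions, not CWG's public API.
let nextId = 0;
const idMap = new Map();

// Serialization step: each interactive element gets an ephemeral id.
function register(element) {
  const id = String(++nextId);
  idMap.set(id, element);
  return { id, role: element.role, name: element.name };
}

// Command step: the LLM clicks by id, never by CSS selector.
function click(id) {
  const el = idMap.get(id);
  if (!el) throw new Error(`Stale id: ${id}`);
  el.clicked = true; // in a real browser: el.click()
  return el;
}

const save = { role: "button", name: "Save" };
const entry = register(save); // the LLM sees { id: '1', role: 'button', name: 'Save' }
click(entry.id); // works even if the button's classes change
```

Because the map is rebuilt on every snapshot, the ids are ephemeral by design: stale ids fail loudly instead of clicking the wrong element.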

Frontend errors are often invisible in the UI. If a button click fails silently, the LLM hallucinates that it worked.

  • CWG hooks into the browser’s Console and Network streams.
  • If a 500 API Error occurs after a click, CWG feeds that error log back into the LLM’s context window immediately.
  • Result: The LLM sees “Click failed: 500 Internal Server Error” and self-corrects (e.g., “I will try reloading the page”).
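The feedback loop can be sketched as a simple error buffer. In CWG the hooks attach to the browser’s Console and Network streams via the extension; reportResponse here is a hypothetical stand-in called manually.

```javascript
// Sketch of the error feedback loop as a simple buffer. In CWG the hooks
// attach to the browser's Console and Network streams via the extension;
// reportResponse is a hypothetical stand-in called manually here.
const errorLog = [];

function reportResponse(url, status) {
  if (status >= 400) {
    errorLog.push(`Click failed: ${status} error from ${url}`);
  }
}

// Simulate the API calls a "Save" click fires:
reportResponse("/api/save", 500);
reportResponse("/api/telemetry", 204); // success: nothing logged

console.log(errorLog); // [ 'Click failed: 500 error from /api/save' ]
```

Only failures reach the LLM’s context window, so the feedback stays token-cheap while making silent errors visible.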
Feature | Raw HTML Scraping | Vision (Screenshots) | Gasoline MCP
Token Cost | 🔴 Very High | 🟡 High | 🟢 Low (Optimized JSON)
Speed | 🟢 Fast | 🔴 Slow (Image Processing) | 🟢 Instant (Text)
Shadow DOM | 🔴 Invisible | 🟢 Visible | 🟢 Visible & Interactive
Dynamic Content | 🔴 Misses updates | 🟡 Can see updates | 🟢 Live MutationObserver
Click Reliability | 🟡 CSS Selectors (Brittle) | 🟡 Coordinate Guessing | 🟢 Stable ID System

Why document.body.innerHTML Ruins LLM Context Windows

Gasoline MCP gives AI coding assistants real-time browser context via the Model Context Protocol. One of the hardest problems it solves is this: how do you represent a web page to an LLM without blowing up the context window?

The most common answer in the wild is wrong.

Many MCP tools, browser automation scripts, and AI coding workflows grab DOM content the obvious way:

document.body.innerHTML

This dumps the entire raw HTML of the page into the LLM’s context window. Every ad banner. Every tracking pixel. Every inline style. Every SVG path definition. Every base64-encoded image. Every third-party script tag. Every CSS class name generated by your framework’s hash function.

A typical web page might contain 500KB of raw HTML. The actual meaningful content — the text, the form fields, the error messages your AI assistant needs to see — might be 5KB. That’s 99% waste in a context window with hard token limits.
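A back-of-the-envelope check of those numbers, using the common rule of thumb of roughly 4 characters per token:

```javascript
// Back-of-the-envelope check, assuming ~4 characters per token
// (a common rule of thumb for English text and markup).
const CHARS_PER_TOKEN = 4;
const htmlTokens = (500 * 1024) / CHARS_PER_TOKEN;  // 500KB of raw HTML
const contentTokens = (5 * 1024) / CHARS_PER_TOKEN; // 5KB of real content
const wastePct = (1 - contentTokens / htmlTokens) * 100;
console.log(`${htmlTokens} tokens, ${Math.round(wastePct)}% waste`);
// 128000 tokens, 99% waste
```

A single 500KB dump is already on the order of an entire 128K-token context window, before a single word of conversation or code.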

Consider a React dashboard page. A SaaS admin panel with a sidebar, a data table, some charts, and a modal.

Approach | Token Count | Meaningful Content
document.body.innerHTML | ~200,000 tokens | ~2,000 tokens
Accessibility tree | ~3,000 tokens | ~2,000 tokens

With innerHTML, you are burning 99% of your context budget on <div class="css-1a2b3c"> wrappers, Webpack chunk references, SVG coordinate data, and analytics scripts. In a model with a 128K token context window, a single innerHTML dump can consume more than the entire window — leaving zero room for conversation history, system prompts, or the code your assistant is actually working on.

Worse, the signal-to-noise ratio is so low that the LLM struggles to locate the relevant content even when it fits. Buried somewhere in 200K tokens of markup is the error message you need it to read.

Gasoline takes a fundamentally different approach. Instead of raw HTML, it uses the accessibility tree — the structured, semantic representation that browsers build for screen readers.

The accessibility tree contains only meaningful elements:

  • Headings and document structure
  • Buttons, links, and interactive controls
  • Form fields with their labels and current values
  • Text content that a user would actually read
  • ARIA labels and roles that describe element purpose
  • State information — checked, expanded, disabled, selected

It strips out everything else. No CSS. No scripts. No SVG paths. No base64 blobs. No tracking pixels. What remains is a clean, hierarchical representation of what the page actually shows and does.

Beyond the full accessibility tree, Gasoline provides a query_dom MCP tool that lets AI assistants query specific elements using CSS selectors:

query_dom(".error-message")
query_dom("form#login input")
query_dom("[role='alert']")

Instead of dumping the entire page and hoping the LLM finds the relevant piece, the assistant can request exactly what it needs. A targeted query might return 50 tokens instead of 200,000.

This changes the interaction model from “here’s everything, good luck” to “ask for what you need.”
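To make the contrast concrete, here is roughly what a targeted query boils down to. Element objects stand in for real DOM nodes; in the browser this would be document.querySelectorAll plus serialization, and the element shapes here are illustrative.

```javascript
// Sketch of what a targeted query returns versus a full dump. Element
// objects stand in for real DOM nodes; in the browser this would be
// document.querySelectorAll(selector) plus serialization.
const elements = [
  { classes: ["toolbar"], text: "File Edit View" },
  { classes: ["error-message"], text: "Email is required" },
  { classes: ["footer"], text: "© 2024 Example Corp" },
];

// Simplified class-name matcher standing in for a full CSS selector engine.
function queryDom(className) {
  return elements
    .filter((el) => el.classes.includes(className))
    .map((el) => el.text);
}

console.log(queryDom("error-message")); // [ 'Email is required' ]
```

The toolbar and footer never reach the model; only the matching element’s text does, which is why a targeted query can cost tens of tokens instead of tens of thousands.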

Why is dumping raw HTML so harmful? Three reasons:

  1. Token waste. Raw HTML is mostly structural noise — closing tags, class attributes, data attributes, script contents. LLMs pay per token. You are paying to process markup that carries zero information about your bug.

  2. Signal dilution. Even when it fits in context, the LLM must locate a needle in a haystack. Error messages, form validation failures, and visible text get buried under layers of generated markup. Model attention is a finite resource.

  3. Fragility. innerHTML output changes with every framework update, CSS-in-JS hash rotation, and ad network injection. The representation is unstable and framework-dependent. The accessibility tree is stable because it represents semantics, not implementation.

Gasoline captures the accessibility tree directly from the browser via its Chrome extension. When an AI assistant calls the get_console_logs, get_accessibility_tree, or query_dom MCP tools, Gasoline returns structured, token-efficient data:

  • Accessibility tree: Full semantic structure of the page, typically 50-100x smaller than innerHTML
  • DOM queries: Targeted CSS selector queries returning only matching elements
  • Console logs: Errors and warnings already captured in real time, no DOM parsing needed

The result: your AI assistant gets the information it needs to debug your application without consuming the context window budget it needs to actually reason about the problem.

npx gasoline-mcp@latest

One command. Zero dependencies. Your AI assistant gets clean, structured browser context instead of raw HTML noise.

Learn more about DOM queries ->