llm

2 posts with the tag “llm”

Technical Deep Dive: Why Cook With Gasoline MCP?

Most LLM browser tools fail because they feed the model one of two things:

Raw HTML: This is token-expensive and full of noise (<div class="wrapper-v2-flex...">). The LLM gets lost in the “soup” of utility classes and nesting.

document.body.innerText: This flattens the page, losing all structure. A “Submit” button becomes just the word “Submit” floating in a void; the LLM has no idea it’s clickable or which form it belongs to.


CWG is an MCP server that acts as a “Vision Processing Unit” for the LLM.

Instead of scraping HTML, CWG serializes the Accessibility Object Model (AOM). This is the same API screen readers use to navigate the web.

  • Signal, Not Noise: We strip away thousands of <div> and <span> wrappers, exposing only semantic elements: buttons, inputs, headings, and landmarks.
  • The Result: A 50,000-token HTML page becomes a clean, 2,000-token JSON structure that preserves hierarchy and interactivity.
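The pruning idea can be sketched in a few lines. Everything here is illustrative (the node shape, the role list, and the output format are assumptions, not CWG’s actual schema): semantic nodes survive, generic wrappers vanish, and their children bubble up.

```javascript
// Hypothetical sketch of AOM-style pruning; the node shape, role list,
// and output format are illustrative, not CWG's actual schema.
const SEMANTIC_ROLES = new Set(["button", "textbox", "heading", "link", "main"]);

function serialize(node) {
  const children = (node.children || []).flatMap((child) => serialize(child));
  if (SEMANTIC_ROLES.has(node.role)) {
    return [{ role: node.role, name: node.name, children }];
  }
  // Generic wrappers (divs, spans) disappear; their children bubble up.
  return children;
}

// A checkout form buried in wrapper divs:
const snapshot = {
  role: "generic",
  children: [
    { role: "heading", name: "Checkout" },
    { role: "generic", children: [{ role: "button", name: "Submit" }] },
  ],
};

console.log(JSON.stringify(serialize(snapshot)));
// [{"role":"heading","name":"Checkout","children":[]},{"role":"button","name":"Submit","children":[]}]
```

The wrapper divs contribute zero tokens to the output; only the heading and the button remain, still in hierarchical order.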

Modern enterprise apps (Salesforce, Adobe, Google Cloud) use Web Components and Shadow DOM to encapsulate styles.

  • The Problem: Standard scrapers (and innerText) hit a “shadow root” and stop. They literally cannot see inside your complex UI components.
  • The CWG Fix: Our serializer recursively pierces open Shadow Roots, flattening the component tree into a single, logical view for the AI.
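The piercing technique looks roughly like this. Nodes are modeled as plain objects so the sketch is self-contained; in a real browser you would read the actual element.shadowRoot property, which is non-null only for open roots.

```javascript
// Sketch of piercing open shadow roots. Nodes are plain objects so this
// runs anywhere; in a browser you would read the real element.shadowRoot
// property (non-null only for open roots) instead.
function collectText(node, out = []) {
  if (node.text) out.push(node.text);
  // Descend into a hosted open shadow root; a closed root (null in the
  // real DOM) stays opaque, exactly as a standard scraper sees it.
  const kids = node.shadowRoot ? node.shadowRoot.children : node.children;
  for (const child of kids || []) collectText(child, out);
  return out;
}

const widget = {
  tag: "fancy-button", // a Web Component hiding its UI in a shadow root
  shadowRoot: { children: [{ tag: "button", text: "Buy now" }] },
};
const page = { tag: "main", children: [{ tag: "h1", text: "Store" }, widget] };

console.log(collectText(page)); // [ 'Store', 'Buy now' ]
```

A scraper that stops at the shadow boundary would report only “Store”; the recursive walk surfaces the “Buy now” button too.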

When Claude or ChatGPT wants to click a button, it usually guesses a CSS selector (e.g., button[class*="blue"]). This is brittle; if you change a class name, the agent breaks.

  • Our Approach: CWG injects ephemeral, stable IDs (e.g., [cwg-id="12"]) into the DOM map it sends to the LLM.
  • The Loop:
    • LLM reads: Button “Save” [cwg-id="12"]
    • LLM commands: click("12")
    • CWG executes the click exactly on that element, regardless of CSS changes.
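A minimal sketch of that loop, with hypothetical names (register, click, and the id scheme are assumptions, not CWG’s public API):

```javascript
// Illustrative sketch of the stable-ID loop; register/click and the id
// scheme are assumptions, not CWG's public API.
let nextId = 0;
const idMap = new Map();

// Serialization step: each interactive element gets an ephemeral id.
function register(element) {
  const id = String(++nextId);
  idMap.set(id, element);
  return { id, role: element.role, name: element.name };
}

// Command step: the LLM clicks by id, never by CSS selector.
function click(id) {
  const el = idMap.get(id);
  if (!el) throw new Error(`Stale id: ${id}`);
  el.clicked = true; // in a real browser: el.click()
  return el;
}

const save = { role: "button", name: "Save" };
const entry = register(save); // the LLM sees { id: '1', role: 'button', name: 'Save' }
click(entry.id); // works even if the button's classes change
```

Because the map is rebuilt on every snapshot, the ids are ephemeral by design: stale ids fail loudly instead of clicking the wrong element.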

Frontend errors are often invisible in the UI. If a button click fails silently, the LLM hallucinates that it worked.

  • CWG hooks into the browser’s Console and Network streams.
  • If a 500 API Error occurs after a click, CWG feeds that error log back into the LLM’s context window immediately.
  • Result: The LLM sees “Click failed: 500 Internal Server Error” and self-corrects (e.g., “I will try reloading the page”).
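The feedback loop can be sketched as a simple error buffer. In CWG the hooks attach to the browser’s Console and Network streams via the extension; reportResponse here is a hypothetical stand-in called manually.

```javascript
// Sketch of the error feedback loop as a simple buffer. In CWG the hooks
// attach to the browser's Console and Network streams via the extension;
// reportResponse is a hypothetical stand-in called manually here.
const errorLog = [];

function reportResponse(url, status) {
  if (status >= 400) {
    errorLog.push(`Click failed: ${status} error from ${url}`);
  }
}

// Simulate the API calls a "Save" click fires:
reportResponse("/api/save", 500);
reportResponse("/api/telemetry", 204); // success: nothing logged

console.log(errorLog); // [ 'Click failed: 500 error from /api/save' ]
```

Only failures reach the LLM’s context window, so the feedback stays token-cheap while making silent errors visible.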
Feature | Raw HTML Scraping | Vision (Screenshots) | Gasoline MCP
Token Cost | 🔴 Very High | 🟡 High | 🟢 Low (Optimized JSON)
Speed | 🟢 Fast | 🔴 Slow (Image Processing) | 🟢 Instant (Text)
Shadow DOM | 🔴 Invisible | 🟢 Visible | 🟢 Visible & Interactive
Dynamic Content | 🔴 Misses updates | 🟡 Can see updates | 🟢 Live MutationObserver
Click Reliability | 🟡 CSS Selectors (Brittle) | 🟡 Coordinate Guessing | 🟢 Stable ID System

Why document.body.innerHTML Ruins LLM Context Windows

Gasoline MCP gives AI coding assistants real-time browser context via the Model Context Protocol. One of the hardest problems it solves is this: how do you represent a web page to an LLM without blowing up the context window?

The most common answer in the wild is wrong.

Many MCP tools, browser automation scripts, and AI coding workflows grab DOM content the obvious way:

document.body.innerHTML

This dumps the entire raw HTML of the page into the LLM’s context window. Every ad banner. Every tracking pixel. Every inline style. Every SVG path definition. Every base64-encoded image. Every third-party script tag. Every CSS class name generated by your framework’s hash function.

A typical web page might contain 500KB of raw HTML. The actual meaningful content — the text, the form fields, the error messages your AI assistant needs to see — might be 5KB. That’s 99% waste in a context window with hard token limits.
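A back-of-the-envelope check of those numbers, using the common rule of thumb of roughly 4 characters per token:

```javascript
// Back-of-the-envelope check, assuming ~4 characters per token
// (a common rule of thumb for English text and markup).
const CHARS_PER_TOKEN = 4;
const htmlTokens = (500 * 1024) / CHARS_PER_TOKEN;  // 500KB of raw HTML
const contentTokens = (5 * 1024) / CHARS_PER_TOKEN; // 5KB of real content
const wastePct = (1 - contentTokens / htmlTokens) * 100;
console.log(`${htmlTokens} tokens, ${Math.round(wastePct)}% waste`);
// 128000 tokens, 99% waste
```

A single 500KB dump is already on the order of an entire 128K-token context window, before a single word of conversation or code.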

Consider a React dashboard page. A SaaS admin panel with a sidebar, a data table, some charts, and a modal.

Approach | Token Count | Meaningful Content
document.body.innerHTML | ~200,000 tokens | ~2,000 tokens
Accessibility tree | ~3,000 tokens | ~2,000 tokens

With innerHTML, you are burning 99% of your context budget on <div class="css-1a2b3c"> wrappers, Webpack chunk references, SVG coordinate data, and analytics scripts. In a model with a 128K token context window, a single innerHTML dump can consume more than the entire window — leaving zero room for conversation history, system prompts, or the code your assistant is actually working on.

Worse, the signal-to-noise ratio is so low that the LLM struggles to locate the relevant content even when it fits. Buried somewhere in 200K tokens of markup is the error message you need it to read.

Gasoline takes a fundamentally different approach. Instead of raw HTML, it uses the accessibility tree — the structured, semantic representation that browsers build for screen readers.

The accessibility tree contains only meaningful elements:

  • Headings and document structure
  • Buttons, links, and interactive controls
  • Form fields with their labels and current values
  • Text content that a user would actually read
  • ARIA labels and roles that describe element purpose
  • State information — checked, expanded, disabled, selected

It strips out everything else. No CSS. No scripts. No SVG paths. No base64 blobs. No tracking pixels. What remains is a clean, hierarchical representation of what the page actually shows and does.

Beyond the full accessibility tree, Gasoline provides a query_dom MCP tool that lets AI assistants query specific elements using CSS selectors:

query_dom(".error-message")
query_dom("form#login input")
query_dom("[role='alert']")

Instead of dumping the entire page and hoping the LLM finds the relevant piece, the assistant can request exactly what it needs. A targeted query might return 50 tokens instead of 200,000.

This changes the interaction model from “here’s everything, good luck” to “ask for what you need.”
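To make the contrast concrete, here is roughly what a targeted query boils down to. Element objects stand in for real DOM nodes; in the browser this would be document.querySelectorAll plus serialization, and the element shapes here are illustrative.

```javascript
// Sketch of what a targeted query returns versus a full dump. Element
// objects stand in for real DOM nodes; in the browser this would be
// document.querySelectorAll(selector) plus serialization.
const elements = [
  { classes: ["toolbar"], text: "File Edit View" },
  { classes: ["error-message"], text: "Email is required" },
  { classes: ["footer"], text: "© 2024 Example Corp" },
];

// Simplified class-name matcher standing in for a full CSS selector engine.
function queryDom(className) {
  return elements
    .filter((el) => el.classes.includes(className))
    .map((el) => el.text);
}

console.log(queryDom("error-message")); // [ 'Email is required' ]
```

The toolbar and footer never reach the model; only the matching element’s text does, which is why a targeted query can cost tens of tokens instead of tens of thousands.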

Why is dumping raw HTML so harmful? Three reasons:

  1. Token waste. Raw HTML is mostly structural noise — closing tags, class attributes, data attributes, script contents. LLMs pay per token. You are paying to process markup that carries zero information about your bug.

  2. Signal dilution. Even when it fits in context, the LLM must locate a needle in a haystack. Error messages, form validation failures, and visible text get buried under layers of generated markup. Model attention is a finite resource.

  3. Fragility. innerHTML output changes with every framework update, CSS-in-JS hash rotation, and ad network injection. The representation is unstable and framework-dependent. The accessibility tree is stable because it represents semantics, not implementation.

Gasoline captures the accessibility tree directly from the browser via its Chrome extension. When an AI assistant calls the get_console_logs, get_accessibility_tree, or query_dom MCP tools, Gasoline returns structured, token-efficient data:

  • Accessibility tree: Full semantic structure of the page, typically 50-100x smaller than innerHTML
  • DOM queries: Targeted CSS selector queries returning only matching elements
  • Console logs: Errors and warnings already captured in real time, no DOM parsing needed

The result: your AI assistant gets the information it needs to debug your application without consuming the context window budget it needs to actually reason about the problem.

npx gasoline-mcp@latest

One command. Zero dependencies. Your AI assistant gets clean, structured browser context instead of raw HTML noise.

Learn more about DOM queries ->