
Stop Wasting LLM Tokens on WordPress HTML: ParseLess for CLI

Piping a WordPress page into Claude Code, Codex or Gemini CLI burns 95%+ of tokens on theme markup. ParseLess strips it and returns clean Markdown.

June 4, 2026
7 min read
Tags: ParseLess, Claude Code, WordPress, LLM, Markdown, developer tools, open source

Last month, I needed Claude Code to automate refactoring of ~3,000-word posts for a client. Simple task: read a post, analyze competitors, do minor rewrites, maybe suggest two new H2s.

The client launched the automation. Claude Code came back with the rewrite and a warning: 19,800 input tokens used.

The problem: the post's actual content came to only around 975 tokens.

I knew that the math was bad. I didn't know it was that bad.

Where Claude Code tokens were being spent

If you've ever opened a WordPress page in DevTools, you already know where. The HTML for a single post on a typical theme is a layer cake of <div> wrappers, navigation menus, sidebar widgets, Elementor scaffolding, Gutenberg block markup, schema.org JSON-LD, and cookie banner scripts. If you are using third-party page builders and have bad devs on your side, you probably have a severe case of what I like to call “div matryoshka”: divs inside divs inside divs inside… divs.

The text you actually wrote, the part the LLM needs, is somewhere in the middle, surrounded by a wall of structural noise.

When you pipe that into Claude Code, Cursor, Antigravity, n8n, or any other tool that feeds your content into an LLM, the model dutifully parses all of it. Every <div class="elementor-row">. Every nav link. Every footer column with the obligatory "Quick Links" heading.

Then it throws 95% of it away and reads your text.

The CLI ingests the entire page source just to use a few paragraphs.

A 25x token reduction with one query parameter

ParseLess is a WordPress plugin I just released that solves this in the least clever way possible: when an AI bot or CLI tool asks for a page, the plugin serves Markdown instead of HTML.

Append ?format=md to any post URL on a site running the plugin:

curl -s "https://yoursite.com/my-post/?format=md" | claude -p "summarize this"

You get back the post as clean Markdown. Headings, paragraphs, lists, tables, code blocks with their language preserved, images, and links. No widget HTML. No page builder scaffolding. No theme. No extra plugins.

I ran the same three pages through Claude Code, OpenAI Codex, and Gemini CLI to make sure the savings were real and not a quirk of one tokenizer:

| Page | HTML size | Markdown size | Reduction |
|---|---|---|---|
| Elementor SEO guide | ~102 KB / ~25,500 tokens | 5.4 KB / ~1,350 tokens | 94.7% (19x) |
| WP-AutoInsight PageSpeed guide | ~101 KB / ~25,300 tokens | 4.0 KB / ~1,000 tokens | 96.0% (25x) |
| Default "hello world" post | ~108 KB / ~27,000 tokens | 3.6 KB / ~900 tokens | 96.7% (30x) |

Three different CLIs, same results: roughly 95% of the tokens go to markup, not to your content. That's a huge saving for SEO agencies and content writers.
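For anyone who wants to redo the arithmetic on their own pages, here is a minimal sketch. The `estimate_tokens` and `reduction` helpers are hypothetical names, not part of ParseLess, and the 4-bytes-per-token ratio is only a rule of thumb (it happens to match the figures in the table, but real tokenizers vary).

```shell
# estimate_tokens: reads bytes on stdin, prints a rough token count.
# ASSUMPTION: ~4 bytes per token, a heuristic rather than a real tokenizer.
estimate_tokens() {
  wc -c | awk '{ printf "%d\n", $1 / 4 }'
}

# reduction HTML_TOKENS MD_TOKENS: prints the saving as "PCT% (Nx)".
reduction() {
  awk -v html="$1" -v md="$2" 'BEGIN {
    printf "%.1f%% (%.0fx)\n", (1 - md / html) * 100, html / md
  }'
}

# The three rows of the table, recomputed from their token counts.
reduction 25500 1350   # 94.7% (19x)
reduction 25300 1000   # 96.0% (25x)
reduction 27000 900    # 96.7% (30x)
```

To measure a live page, you could pipe `curl -s "https://yoursite.com/my-post/"` into `estimate_tokens`, do the same with `?format=md` appended, and pass both numbers to `reduction`.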

Page builders make this worse: the Elementor pages above ship around 100 KB of HTML for around 1,000 tokens of real content. Lightweight themes do a little better, but the ratio is still atrocious.

What the plugin changes in a workflow

Bulk content review. The client used to be able to fit maybe four or five posts into a Claude Code context window before hitting the limit. With ParseLess, he can pipe in dozens at a time and ask for things like consistency checks across the whole archive: is he using "freelancer" and "consultant" interchangeably, are his CTAs aligned, which posts contradict each other on a topic?

Cross-referencing writing. Drafting a new post and want to link back to relevant older work? Pipe a curated list of past URLs into Claude Code through ParseLess, then ask it to find the three most relevant existing posts and where in the new draft to link them. The whole archive fits.

Client work. When a client asks me to audit their content strategy, I can ingest their entire site into a Claude Project without watching the token meter spin. The audit gets to be about the content, not about how much of the blog I could afford to read.

How it works

The plugin hooks into template_redirect. When a request comes in, it checks two things: is this an AI bot User-Agent (GPTBot, ClaudeBot, PerplexityBot, and a dozen others), or does the URL include ?format=md? If either is true, ParseLess runs the post content through the the_content filter (the same one WordPress uses for normal rendering, so page builder shortcodes and blocks get processed correctly), then converts the resulting HTML into Markdown.
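The two-trigger check described above can be sketched in a few lines of shell. This is an illustration of the decision only, not the plugin's PHP source: the `wants_markdown` function is hypothetical, the substring matching is an assumption, and the bot list contains just the three names mentioned here.

```shell
# Hypothetical sketch of the "serve Markdown?" decision (not ParseLess code).
# wants_markdown USER_AGENT URL -> exit status 0 if Markdown should be served.
wants_markdown() {
  # Trigger 1: the URL carries the explicit format=md query parameter.
  case "$2" in
    *"?format=md"*|*"&format=md"*) return 0 ;;
  esac
  # Trigger 2: the User-Agent contains a known AI bot name.
  # ASSUMPTION: plain substring match against three example names.
  case "$1" in
    *GPTBot*|*ClaudeBot*|*PerplexityBot*) return 0 ;;
  esac
  # Everyone else (browsers, Googlebot, ...) keeps getting HTML.
  return 1
}

wants_markdown "Mozilla/5.0 (compatible; GPTBot/1.0)" "https://yoursite.com/post/" && echo "markdown"
wants_markdown "Mozilla/5.0 (compatible; Googlebot/2.1)" "https://yoursite.com/post/" || echo "html"
```

From the command line, the first trigger can be exercised with the `?format=md` parameter, and the second by sending a bot User-Agent with curl's `-A` option.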

The result is cached as a transient. The next time a bot or CLI hits the same post, you get a single transient read. The conversion only runs once per post until it's edited.
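The convert-once, read-many pattern can be illustrated with a file-based stand-in. This is a sketch under stated assumptions: WordPress transients are not files, and `get_markdown`, `invalidate`, and `fake_convert` are hypothetical names used only to show the shape of the logic.

```shell
# Hypothetical file-based stand-in for the transient cache (not ParseLess code).
CACHE_DIR=$(mktemp -d)

# get_markdown POST_ID CONVERTER [ARGS...]
# Prints the cached Markdown, running the converter only on a cache miss.
get_markdown() {
  post_id=$1; shift
  cache_file="$CACHE_DIR/$post_id.md"
  if [ ! -f "$cache_file" ]; then
    "$@" > "$cache_file"   # the expensive conversion happens here, once
  fi
  cat "$cache_file"
}

# invalidate POST_ID: drop the cached copy, e.g. when the post is edited.
invalidate() {
  rm -f "$CACHE_DIR/$1.md"
}

fake_convert() { echo "# Converted Markdown"; }
get_markdown 42 fake_convert   # miss: runs fake_convert and stores the result
get_markdown 42 fake_convert   # hit: reads the stored file only
```

After `invalidate 42`, the next `get_markdown 42 ...` call runs the converter again, which mirrors the "once per post until it's edited" behavior.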

Human visitors and regular search crawlers like Googlebot are never touched. They get the full HTML site, exactly as before. Nothing about the visible site changes. So you get fewer tokens and keep all the SEO and AEO juice.

The CLI ergonomics

?format=md works with anything that can fetch a URL. Some patterns I lean on:

# Feed a single post into Claude
curl -s "https://yoursite.com/post-slug/?format=md" | claude -p "rewrite the intro to be punchier"

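# Inspect the raw response, including the YAML frontmatter if you've enabled it
# (jq reads JSON, not YAML, so it can't parse the frontmatter directly)
curl -s "https://yoursite.com/post-slug/?format=md"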

# Multiple posts at once for cross-referencing
for slug in post-a post-b post-c; do
  curl -s "https://yoursite.com/$slug/?format=md"
  echo "---"
done | claude -p "find contradictions between these posts"

If you want metadata along with the content, there's an optional YAML frontmatter setting that prepends title, URL, author, date, categories, tags, and excerpt to every Markdown response. Useful if your prompt depends on knowing when the post was written or what category it was filed under.
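If a script needs one of those fields, a small awk filter is enough for simple cases. The sketch below assumes the frontmatter is a leading block delimited by `---` lines with flat `key: value` pairs; the exact key names ParseLess emits are not confirmed here, so `title` and `date` are placeholders, and `frontmatter_field` is a hypothetical helper.

```shell
# frontmatter_field FIELD: reads Markdown on stdin, prints FIELD's value.
# ASSUMPTION: frontmatter is a leading "---" block of flat "key: value" lines.
frontmatter_field() {
  awk -v key="$1" '
    NR == 1 && $0 != "---" { exit }        # no frontmatter at all
    NR > 1 && $0 == "---"  { exit }        # closing delimiter: stop looking
    NR > 1 && index($0, key ": ") == 1 {   # line starts with "key: "
      print substr($0, length(key) + 3)
      exit
    }
  '
}

# Offline example with a hand-written sample response.
sample='---
title: Hello World
date: 2026-06-04
---
# Hello World

Body text.'

printf '%s\n' "$sample" | frontmatter_field date   # prints 2026-06-04
```

Quoted or multi-line YAML values (lists of tags, for instance) are not handled by this filter; a real YAML parser is the safer choice for those.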

Finding your own content programmatically

If you're scripting against your own archive, ParseLess 0.5.0 added a sitemap at /botfood-sitemap.xml that lists every public post with its Markdown URL. It's advertised in robots.txt so AI crawlers can find it, but it's also useful if you want a CLI tool to ingest your full site without manually maintaining a URL list.

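# Pull every post URL from the AI sitemap and fetch each one as Markdown
# (match <loc> entries only, so the XML namespace URL is not fetched by mistake;
# drop the "?format=md" suffix if the sitemap URLs already include it)
curl -s https://yoursite.com/botfood-sitemap.xml \
  | grep -oE '<loc>[^<]+' \
  | sed 's/<loc>//' \
  | xargs -I {} curl -s "{}?format=md"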

Useful when you've added new posts since the last time you ran a bulk operation, and you don't want to remember which ones.
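The URL-extraction step is easy to check offline before pointing it at a live site. This sketch assumes the sitemap follows the standard sitemap protocol, with each URL inside a `<loc>` element; `sitemap_urls` is a hypothetical helper name and the sample XML is hand-written.

```shell
# sitemap_urls: reads sitemap XML on stdin, prints one URL per line.
# Only <loc> contents are matched, so the xmlns namespace URL is ignored.
sitemap_urls() {
  grep -oE '<loc>[^<]+' | sed 's/<loc>//'
}

# Hand-written sample in the standard sitemap format (an assumption about
# what /botfood-sitemap.xml looks like).
sample_sitemap='<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yoursite.com/post-a/</loc></url>
  <url><loc>https://yoursite.com/post-b/</loc></url>
</urlset>'

printf '%s\n' "$sample_sitemap" | sitemap_urls
```

Matching on `<loc>` rather than on any `https?://` string keeps the namespace declaration in the `<urlset>` tag out of the URL list.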

What I'd still want to fix

The plugin is at version 0.5.0. It works, but a few things are on my list.

Code blocks preserve the language for standard Gutenberg code blocks and most popular syntax highlighters, but some niche plugins use their own markup and won't come through cleanly. If you hit one, the filter is the right place to patch it.

There's no rate limiting on the ?format=md endpoint. The transient cache makes that mostly fine, but if you're on a small VPS and worried about someone scripting a bulk grab of your archive, that's a reasonable concern. I'll likely add an optional throttle in a future version.

Try it on your own posts

ParseLess is GPL, free, and in the official WordPress directory. Install, activate, and you're done. No configuration is needed for the CLI use case, but yes, the plugin has a Settings page so you can fine-tune everything to your needs.

If you build something useful on top of it, or you find an edge case where the Markdown output isn't quite right, let me know. I'm actively iterating, and edge cases are easier to fix when someone tells me about them.

If your team is moving content workflows onto LLMs and you're tired of paying for HTML you don't need, that's also a problem I help companies solve. Now go check how many tokens your own site is costing you.
