Featured

Claude, Honestly: What Actually Changes When You Use It in Production

An honest look at Anthropic's Claude from someone who ships production AI systems. What it does well, where it fails, and when it's worth switching.

April 21, 2026
9 min read
Tags
Tutorial, WordPress, AI, free software

Claude, Honestly: What Actually Changes When You Use It in Production

In early 2024, I started building a WordPress plugin called WP-AutoInsight. The first version only talked to OpenAI. Then I added Anthropic's Claude. Then Google Gemini. Then Perplexity. I did this because clients kept asking for it, but also because I wanted to know, for myself, which of these tools was actually different and which were just wearing different costumes.

Almost two years and 5,000+ downloads later, I have an answer. Claude is different. Not dramatically, not the way Anthropic's marketing pages want you to believe, but in ways that show up when you use it for real work over weeks instead of minutes.

This post is for people who already use ChatGPT and wonder whether it's worth opening a second tab or installing a second app. If you've never touched an LLM, that's a different article.

What Claude is, in one paragraph

Claude is a chatbot made by Anthropic. You ask for something, and it answers. It can also read files, write files, run code, search the web, and hold context inside a conversation. If you've used ChatGPT, the interface will feel familiar, so I'll skip the signup walkthrough. That leaves the interesting question: what does Claude do that ChatGPT doesn't, and do those differences matter for what you actually do?

Where Claude wins

Writing. Everyone mentions this, and, believe me, it's true. Out of the box, Claude's prose has more personality than ChatGPT's. ChatGPT defaults to a corporate beige that takes real effort to wash out; it screams "made by AI" everywhere. Claude starts closer to something a human might write and drifts toward beige if you let it. Both need guardrails and some training to find a voice and style that matches yours. Claude just needs fewer of them to sound like a real person.

Long documents. I regularly paste full Brazilian legal texts and regulations into Claude when writing about laws or data protection, so it can analyze them and help me write. Claude handles 100+ page inputs without breaking a sweat. ChatGPT is getting better at this, but Claude still feels steadier here.

Following nuanced instructions. If I tell Claude "don't use em-dashes, avoid these fifteen words, match the register of these three pasted articles," it actually does most of that most of the time. ChatGPT tends to regress within a few exchanges and needs constant reminding.

Code review. It's not just code generation; it's also code review, and Claude excels at both. When I paste a messy plugin function and ask "what's wrong with this," Claude is usually more honest about the real problem. ChatGPT more often produces a polished-sounding answer that misses it, or overengineers a solution.

Generating code is easy. Telling you your code is bad and suggesting a good solution isn't.

Where ChatGPT still wins

Voice mode is better on ChatGPT. It is not even a competition.

Image generation: Claude doesn't do it. Use ChatGPT or Gemini.

Quick shallow searches. ChatGPT with search or Perplexity feels faster for "what happened yesterday" questions. Claude's search is capable but slower.

Multimodal one-off tasks. Let’s say you want to add a photo and ask questions about it. ChatGPT is smoother there.

The point here is: you probably don't need to replace ChatGPT. You might want both, using each where it excels.

What Claude is bad at

Every Claude tutorial I've read focuses on what it can do. That's not useful. What's useful is knowing where it fails, because that's where you either waste time or ship something broken.

Claude, like any current LLM, hallucinates facts with a straight face. Dates, quotes, statistics, WordPress hook names. It will invent a Brazilian law that doesn't exist and cite an article number. I verify every factual claim before anything I write with Claude goes public.

Claude, again like any current LLM, agrees too much. Ask "is this a good idea?" and you'll get a yes. Ask later "is this a bad idea?" and you'll also get a yes. Anthropic has acknowledged this (it's a known behaviour called sycophancy), and you can push against it by explicitly asking for counterarguments and honesty. But you have to know it's there.

Claude is strangely bad at counting. Word counts, character counts, list lengths. If precision matters, count it yourself or make it run a script inside Claude Code.
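When a count actually matters, the check is one trivial script away, the kind of thing you can paste into Claude Code and have it run instead of trusting the model's arithmetic. A minimal sketch in Python (my own illustration, not anything Claude generates for you):

```python
def text_stats(text: str) -> dict:
    """Return the counts an LLM tends to fumble: words, characters, lines."""
    return {
        "words": len(text.split()),
        "chars": len(text),
        "lines": text.count("\n") + 1 if text else 0,
    }

print(text_stats("Claude is strangely bad at counting."))
```

Ten seconds to run, and you never have to wonder whether "exactly 150 words" was actually 163.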

Claude forgets the middle of long conversations. Not because the context window is full, but because attention degrades across long sessions. When a conversation grows long, I ask Claude to summarize it, open a new chat, paste the essentials there, and continue. Don't fight the degradation.

None of these kill Claude as a tool. They just mean you treat it as a capable junior who needs review, not as a source of truth.

How I actually use it

I don't use Claude the way the tutorials tell you to. I use it the way someone uses a tool after two years of testing: knowing its strengths and its weaknesses.

For writing, I keep a Claude Project (Anthropic's name for a separate folder of reference files and instructions) with a dozen of my published articles, a list of words I refuse to let it use, and a running voice-calibration document. Every draft starts there. This project is the difference between Claude writing like me and Claude writing like every other AI blog post on the internet.

For WordPress development, I use Claude Code in the terminal. I don't use the chat interface for plugin work anymore. The chat is fine for one-off questions. For real code, Claude Code reads my project codebase, proposes changes, and stays grounded in what's actually there instead of hallucinating functions that might exist. The linked post covers my terminal setup if you want the specifics.

For research, I don't trust Claude. I use it to structure what I already know and flag what I should look up. Then I look it up. I've caught Claude confidently wrong often enough that I treat everything it says about dates, names, and numbers as a hypothesis.

For everyday thinking out loud, it's useful precisely because it pushes back when you ask it to. "Argue against this" is probably the most valuable prompt I use. Not "rewrite this": argue against it. You learn more from one genuine objection than from ten agreeable rewrites.

It also works well for coding: don't just ask for a review, ask for an adversarial review, where it hunts for problems in your codebase and challenges your implementation plans.

The thing that matters more than the model

People obsess over which model is best. Claude Opus, GPT-5, Gemini, whatever dropped this week. It mostly doesn't matter.

What matters is context. What you feed the model, what examples you show it, what you tell it to avoid, what voice you're aiming for. A mid-tier model with good context beats the state-of-the-art model with a lazy prompt every single time. I've tested this on my own plugin: the same prompt across four APIs produces broadly similar garbage. Same prompt across four APIs with fifteen reference documents produces four broadly usable drafts.
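That context layer doesn't need to be fancy. Here's a minimal sketch of the idea in Python; the function name, parameters, and structure are my own illustration, not WP-AutoInsight's actual code:

```python
def build_prompt(task: str, reference_docs: list[str], banned_words: list[str]) -> str:
    """Assemble a context-rich prompt: examples first, constraints second, task last."""
    sections = []
    # Show the model what "good" looks like before asking for anything.
    for i, doc in enumerate(reference_docs, start=1):
        sections.append(f"--- Reference example {i} ---\n{doc}")
    # Negative constraints: what the output must avoid.
    if banned_words:
        sections.append("Never use these words: " + ", ".join(banned_words))
    # The task itself goes last, anchored to the examples above.
    sections.append(f"Task: {task}\nMatch the voice of the reference examples above.")
    return "\n\n".join(sections)

prompt = build_prompt(
    task="Write a 200-word intro about WordPress caching.",
    reference_docs=["<your published article #1>", "<your published article #2>"],
    banned_words=["delve", "landscape", "leverage"],
)
```

Twenty lines of glue, and it does more for output quality than any model upgrade I've shipped.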

This is also where most AI integrations stall after the demo. The demo works because someone crafted a specific input. Production fails because nobody built the context layer around it.

If you take one thing from this post, let it be this: stop asking which AI is smartest or fastest and start asking which AI you've taught the most.

Should you pay for it

The free plan is enough to decide if Claude fits how you work. Two weeks is plenty. If you don't reach for it by the end of week two, cancel the idea and move on.

The paid plan is worth it if you write for a living, code daily, or process long documents weekly. I pay for it because I do all three. If you use AI twice a week to summarize a meeting, the free plan is fine.

I don't have a strong opinion about Max plans. If you constantly hit the Pro ceiling during the week, you already know whether you need more.

Where this series goes next

This was the honest overview. Three follow-ups I'm writing separately, because each deserves its own piece:

How I calibrate Claude's voice for writing, including the early mistake that produced a RoboCop 2 level of paralyzed output. (Too many conflicting instructions will freeze the model the same way they froze Murphy.)

Claude Projects and context files in practice — the thing most Claude tutorials skip and most users never set up. Projects are where Claude goes from "generic helper" to "actually useful for your specific work."

Claude Code for WordPress plugin development, building on the terminal setup post. What Claude Code actually changes about plugin architecture, testing, and the parts of development that used to be tedious.

I'll link each one from here as it goes live.

If your AI project has stalled

Most of the Claude-vs-ChatGPT conversation is beside the point if you're trying to ship something real. The hard part isn't picking a model. It's building the context layer, the error handling, the fallbacks, and the human-in-the-loop review that keep the thing working when traffic hits it.

That's what I do for a living. If your team has an AI integration that impressed in the demo and stalled before production, that's usually what I'm brought in to fix. Drop me a line at me@phalkmin.me or through the contact form and tell me where it's stuck. I'll tell you honestly whether I can help.

And if you've been curious about Claude and haven't opened it: open it. Paste something you've written. Ask it to rewrite it sharper and tell you what it changed. That's the test. If the answer is good, you'll know. If it's generic, you'll know that too.

Either answer is useful.


© 2026 Paulo H. Alkmin. All rights reserved.