How I Built a Production RAG Chatbot for a WooCommerce Store

A real architecture breakdown of an n8n RAG chatbot for a regulated WooCommerce store: Pinecone integration, GPT-4o orchestration, what broke, and what worked.

March 25, 2026
7 min read
Tags: RAG, WooCommerce, n8n, chatbot, Pinecone, OpenAI, AI automation

One of my clients sells regulated products in Canada. Their customer support queue was full of the same questions: how do I dose this, can I take it with my medication, where's my order? A chatbot seemed obvious. What wasn't obvious was how to build one that wouldn't hallucinate dosing information to someone asking about drug interactions.

That constraint (accuracy matters more than convenience) shaped every architecture decision.

Why RAG Instead of a Stuffed System Prompt

The first instinct with LLM chatbots is to dump product knowledge into the system prompt and call it done. For a small catalog and static FAQ content, it works. Until it doesn’t and the chatbot hallucinates.

The FAQ content here included medical guidance: what medications interact with the client’s products, who shouldn't use these products at all, and dosing schedules for different conditions. That content changes as the company learns more, gets new compliance guidance, or updates products. Hardcoding it into a system prompt means every content update requires an n8n workflow edit and a re-deploy.

It also means the context window fills up fast, and at some point the model starts deprioritizing content buried at the bottom of a 10,000-token prompt, which leads to wrong answers.

A proper RAG setup inverts this. The knowledge base lives in a vector database. The agent queries it at runtime. Updating the knowledge base is just an ingestion workflow.

The retrieval-augmented generation approach also gives you something a stuffed prompt can't: a clean audit trail of what the model retrieved before answering. When a customer asks about mixing psilocybin with SSRIs and the bot gets it wrong, you can check whether the right document was returned. With a prompt, you're guessing.
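The retrieve-then-answer pattern with an audit trail can be sketched in a few lines. Here, `retrieve` and `generate` are hypothetical stand-ins for the Pinecone query and the LLM call; the point is that every answer carries a record of exactly which chunks the model saw before responding.

```python
# Minimal sketch: every answer is bundled with the chunks that were
# retrieved for it, so a wrong answer can be traced back to retrieval.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditedAnswer:
    question: str
    retrieved_chunks: list[str]
    answer: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def answer_with_audit(question, retrieve, generate):
    chunks = retrieve(question)                     # vector-store lookup
    answer = generate(question, chunks)             # LLM call grounded in chunks
    return AuditedAnswer(question, chunks, answer)  # chunks are inspectable later

# Demo with stub functions standing in for Pinecone and the model:
fake_retrieve = lambda q: ["FAQ: do not combine with SSRIs without medical advice."]
fake_generate = lambda q, c: "Please consult your doctor before combining these."

result = answer_with_audit("Can I mix this with my SSRI?", fake_retrieve, fake_generate)
```

When the answer is wrong, `result.retrieved_chunks` tells you immediately whether the failure was retrieval (wrong document) or generation (right document, wrong summary).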

Two Pinecone Indexes, Not One

The ingestion workflow creates two separate Pinecone indexes: products and knowledge-base.

This separation is deliberate. Product catalog and FAQ content have different update cadences. A product gets added or repriced every few weeks. The FAQ content, especially the medical guidance, changes less frequently but matters more when it does change. Keeping them in separate indexes means you can re-embed the product catalog without touching the FAQ embeddings, and vice versa.

It also means retrieval stays clean. A customer asking "what's in the Product X capsules" should hit product data. A customer asking "can I take this if I have bipolar disorder?" should hit the FAQ. Mixing them in a single index risks the model surfacing a product description in response to a safety question.

The embedding model is text-embedding-3-large at 3072 dimensions. That's more expensive per token than the default text-embedding-3-small. For accurate dosing information and medical guidance in a regulated context, retrieval quality is not where you cut costs.
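The ingestion split can be expressed as a simple routing rule. The index names and embedding model below come from the setup described above; the document `source` labels and the client calls in the usage comment are illustrative assumptions, not the actual workflow.

```python
# Route each document to one of the two Pinecone indexes by source.
# Failing loudly on an unknown source beats silently mixing catalog
# and safety content in one index.
EMBED_MODEL = "text-embedding-3-large"  # 3072 dimensions

INDEX_FOR_SOURCE = {
    "woocommerce_product": "products",
    "faq": "knowledge-base",
    "medical_guidance": "knowledge-base",
}

def route_document(doc: dict) -> str:
    """Pick the target index for a document based on its source label."""
    try:
        return INDEX_FOR_SOURCE[doc["source"]]
    except KeyError:
        raise ValueError(f"Unknown document source: {doc.get('source')!r}")

# Usage (embedding and upsert calls omitted; they'd use the OpenAI and
# Pinecone SDKs against the index name returned here):
#   index_name = route_document({"source": "faq", "text": "..."})
```

Because the routing is explicit, re-embedding the product catalog is just re-running ingestion for documents whose source maps to `products`, leaving the FAQ embeddings untouched.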

The n8n Layer

The main chatbot runs as a single n8n workflow with an AI agent at its center. But the real architecture is three workflows: the main agent, a WooCommerce order lookup sub-workflow, and a HelpScout support ticket sub-workflow.

This separation matters more than it might seem. The agent has five tools available: the Pinecone knowledge base, the WooCommerce sub-workflow, the HelpScout sub-workflow, a Google Sheets logger, and a date/time utility. Each sub-workflow is independently testable, independently debuggable, and can be updated without touching the main agent.

If the WooCommerce API changes, I update one workflow. If HelpScout adds a new field I want to capture, I update one workflow. The main agent workflow doesn't need to know about any of it.

n8n handles this with the toolWorkflow node type, which lets the agent call another workflow as a tool, passing structured parameters and receiving structured output. The agent decides when to call which tool — but the system prompt is explicit about priority order.
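The toolWorkflow contract is easy to model: each sub-workflow is a callable that takes structured parameters and returns structured output. This sketch is a simplification of what the n8n agent node does internally, not its actual implementation; the tool names and stub bodies are hypothetical.

```python
# Each "tool" is a registered callable with a dict-in, dict-out
# contract, mirroring how the agent calls sub-workflows.
from typing import Callable

TOOLS: dict[str, Callable[[dict], dict]] = {}

def tool(name: str):
    """Decorator that registers a function under a tool name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("order_lookup")
def order_lookup(params: dict) -> dict:
    # Would invoke the WooCommerce sub-workflow; stubbed here.
    return {"status": "ok", "orders": []}

@tool("helpscout")
def helpscout(params: dict) -> dict:
    # Would invoke the HelpScout sub-workflow; stubbed here.
    return {"status": "ok", "ticket_id": None}

def call_tool(name: str, params: dict) -> dict:
    """Dispatch a tool call; unknown tools return a structured error."""
    if name not in TOOLS:
        return {"status": "error", "message": f"unknown tool {name!r}"}
    return TOOLS[name](params)
```

The structured error for an unknown tool matters: the agent gets something it can reason about and relay, instead of the workflow dying mid-conversation.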

The WooCommerce Integration

Order lookup is a two-step API call. The agent receives a customer email from the conversation and passes it to the WooCommerce sub-workflow. That workflow hits /wp-json/wc/v3/customers?email=... to resolve the email to a customer ID, then hits /wp-json/wc/v3/orders?customer=... to retrieve the orders.

Two calls, but clean. The customer ID step also serves as a natural authentication check: if no customer record exists for that email, the workflow returns "No user found" and the agent responds accordingly. The agent only surfaces order information when the customer provides their own email, which handles the privacy concern without needing to build a separate auth flow.

The returned order data is filtered down to the fields that matter for support: order ID, status, creation date, modification date, and shipping address. The agent is instructed to summarize the last three completed orders in plain language, not to dump raw JSON at the customer.

The HelpScout Bridge

When the bot can't answer a question, it doesn't just end the conversation: it offers to create a support ticket. That offer routes to the HelpScout sub-workflow.

The sub-workflow takes three parameters: email, a function flag (check_ticket or create_ticket), and a message body. If check_ticket, it queries HelpScout for existing conversations from that email and returns them. If create_ticket, it creates a new conversation with status "pending."
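The dispatch on the function flag is simple enough to sketch directly. The three parameters and the `pending` status come from the description above; `hs_check` and `hs_create` are hypothetical stand-ins for the HelpScout API calls.

```python
# Route a HelpScout request on its function flag. Unknown flags raise
# rather than silently doing nothing.
VALID_FLAGS = {"check_ticket", "create_ticket"}

def helpscout_router(email: str, function: str, message: str, hs_check, hs_create):
    if function not in VALID_FLAGS:
        raise ValueError(f"unknown function flag: {function!r}")
    if function == "check_ticket":
        # Return existing conversations for this email, if any.
        return hs_check(email)
    # New conversations go in as "pending" so the support team picks
    # them up in their normal queue.
    return hs_create(email, message, status="pending")
```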

The most important detail: the system prompt instructs the agent to include the full conversation in transcript format, not a summary. The support team receives a complete record of what the customer asked, what the bot said, and where it got stuck.

This is the part that support teams actually care about. A ticket that says "customer had a question the bot couldn't answer" is useless. A ticket with a full transcript is actionable in 30 seconds.

The Part That Is Easy to Miss

Most modern LLMs are capable models. They are also, by default, eager to be helpful. Left to its own devices, the model will answer product questions from its training data rather than calling the knowledge base tool first. For most use cases, that's fine. For a store selling regulated psychedelics, "from training data" and "accurate" are not synonyms.

The system prompt went through several iterations and tests to address this. The current version includes a section labeled MANDATORY WORKFLOW in all caps, a STRICT RULES section with explicit instructions not to answer company or product questions without searching the knowledge base, and a statement that the agent is "failing its primary function" if it responds without querying Pinecone’s Knowledge Base first.

That's not just for show. It's an enforcement mechanism: the model may only answer dosing or usage questions with information that appears on the product page or in the FAQ.

The lesson: the knowledge base tool call has to be treated as a constraint, not a preference. LLMs default to using their parametric knowledge because it's fast and usually sufficient. "Usually" isn't good enough when the question is about drug interactions.

The two-model setup helps. The main agent runs on GPT-4o. The knowledge base retrieval tool runs on GPT-4.1-mini, which handles the vector store query and formats the returned chunks. This cost split also creates a clear boundary: the expensive model reasons and orchestrates, and the cheaper model does the retrieval processing.

The Feedback Loop

Any question the bot can't answer gets logged to a Google Sheet with the exact text of the question, the bot's response, the customer's email if they provided it, a timestamp, and a category the model assigns (Shipping, Product, Medical, Returns, General, Other).

This sheet is how the knowledge base improves over time. The team reviews it weekly, identifies patterns in unanswered questions, and adds new FAQ entries. Those entries get re-embedded and added to Pinecone. The next customer who asks the same question gets an actual answer.

Without this loop, the knowledge base stagnates at its initial state. You get a chatbot that handles the questions you anticipated and fails on everything else. With it, the bot gets measurably better at the questions your actual customers are asking.

The Architecture Is the Product

A chatbot that's accurate about sensitive health questions, looks up live WooCommerce orders, creates support tickets with full transcripts, and logs its own knowledge gaps isn't a simple widget. The value isn't in the chat interface — that's two lines of embed code. The value is in the orchestration: the indexes that stay current, the sub-workflows that handle each concern in isolation, and the feedback loop that tells you what to build next.


If you're building something similar, a WooCommerce store with more support volume than you can handle manually and product knowledge that needs to stay accurate, this is the kind of system I build. Contact me and let's build your new support system with AI.


© 2026 Paulo H. Alkmin. All rights reserved.