Adding AI Interpretations: Building a Serverless API Architecture
If the same interpretation appears every time a card is flipped, interest dies by the second reading. Real-time AI interpretation wasn't a nice-to-have. It was essential. The question was cost.
It Started with Static Text
In the early version, each of the 78 cards had a pre-written interpretation. Counting upright and reversed meanings, that meant 156 text blocks stored in a JSON file, matched and displayed whenever a card was flipped.
The advantages were obvious. Zero API calls, zero cost, zero latency. The interpretation appeared the instant the card turned. But there was a fatal flaw: once you'd seen an interpretation, it was the same every time. The heart of tarot reading is that "even the same card carries a different message depending on context," and static text couldn't deliver that.
The Moment I Decided to Go AI
The tipping point was the three-card reading. When Past, Present, and Future cards were drawn, listing their individual static interpretations was useful enough, but weaving a narrative that connects the three was impossible. "Because this card appeared in the past position, the present card takes on this particular meaning" -- that kind of contextual interpretation requires AI.
The problem was cost. At OpenAI GPT-4 pricing, each interpretation consumed hundreds of tokens. With hundreds of daily readings, expenses would add up fast. Spending tens of dollars a month on a side project wasn't an option.
Discovering Groq API
Groq uses custom LPU chips for extremely fast LLM inference. The critical detail: it offers a free tier. Up to 14,400 requests per day, running the LLaMA 3.3 70B model.
LLaMA 3.3 70B performs impressively for an open-source model. For creative text generation like tarot interpretations, it wasn't far behind GPT-4. More importantly, Groq's inference speed was exceptional. While typical LLM APIs take two to five seconds to respond, Groq returned results in under one second in most cases. That's a huge UX difference.
Is 14,400 requests per day enough? For a side project, absolutely. You'd need ten requests per minute around the clock to hit that limit. Early-stage traffic wouldn't come close.
The API Key Security Challenge
Calling an LLM API from a frontend-only app requires an API key. The problem is that frontend code runs in the browser, and everything embedded in the code is visible to users.
Environment variables don't help -- they get baked into the build. Open the browser DevTools, check the Network tab, and the API key is right there. Someone could grab it and burn through your free quota in minutes.
The conclusion was straightforward: I needed a middle layer. The frontend calls my server, and my server attaches the API key before forwarding the request to Groq. A classic proxy pattern.
Cloudflare Workers: The Free Serverless Answer
When "I need a server" is the conclusion, most developers think AWS Lambda or Vercel Serverless Functions. But Cloudflare Workers has a decisive advantage: 100,000 free requests per day and execution at edge locations worldwide for fast response times.
The Workers code is simple. When a request arrives from the frontend, the Worker forwards the body to the Groq API, adding the API key to the headers, then relays Groq's response back to the frontend. That's the entire logic.
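A minimal sketch of such a proxy Worker, assuming the key is stored as a Workers secret named GROQ_API_KEY (the endpoint is Groq's OpenAI-compatible chat completions path):

```typescript
// Minimal proxy sketch. GROQ_API_KEY is assumed to be configured as a
// Workers secret (wrangler secret put GROQ_API_KEY), never in the bundle.
const GROQ_URL = "https://api.groq.com/openai/v1/chat/completions";

// Pure helper: attach the secret key to the frontend's JSON payload.
function buildUpstreamRequest(apiKey: string, body: string) {
  return {
    url: GROQ_URL,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body,
    },
  };
}

// The Worker itself just forwards the request and relays the response.
export default {
  async fetch(request: Request, env: { GROQ_API_KEY: string }) {
    const { url, init } = buildUpstreamRequest(env.GROQ_API_KEY, await request.text());
    return fetch(url, init);
  },
};
```

Because the key only ever exists inside the Worker, nothing sensitive ships in the frontend bundle.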
CORS configuration was also necessary. Only requests from my domain are allowed; everything else is blocked. This also prevents other sites from secretly piggybacking on my Workers URL.
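The origin check can be sketched like this (ALLOWED_ORIGIN is a placeholder, not the project's actual domain):

```typescript
// CORS sketch: only the app's own origin may call the Worker.
// ALLOWED_ORIGIN is a placeholder -- substitute your own domain.
const ALLOWED_ORIGIN = "https://your-username.github.io";

// Returns the CORS headers for an allowed origin, or null to reject.
function corsHeaders(origin: string | null): Record<string, string> | null {
  if (origin !== ALLOWED_ORIGIN) return null;
  return {
    "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
  };
}
```

One caveat worth noting: browsers enforce CORS, so this blocks other websites from calling the Worker from their pages, but it doesn't stop server-to-server scripts that set an arbitrary Origin header.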
Prompt Engineering: Setting the Tone of Tarot
The prompt design was just as important as the technical architecture. Telling an LLM "interpret this tarot card" produces a dry, encyclopedia-style explanation. The appeal of a tarot reading lies in a tone that's mystical yet warm.
The key instructions in my prompt were: "You are an experienced tarot reader. Interpret with a mystical yet warm tone. Use the symbolism and imagery of each card, and frame your reading in a way that gives the querent courage. Even negative cards should be reframed as opportunities for growth."
The "gives courage" part was crucial. People who seek tarot readings are usually going through uncertainty or anxiety. When the Death card appears, there's a world of difference between "This signifies an ending and a new beginning" and simply "Something is going to end."
For multi-card spreads like Three-Card or Celtic Cross, I added instructions to interpret the relationships between cards. "Connect all three cards into a single cohesive narrative." That one line of instruction dramatically elevated the quality of AI interpretations.
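Putting the pieces above together, the prompt assembly might look like this sketch (the system prompt paraphrases the instructions described above; the helper name and card format are illustrative):

```typescript
// Sketch of the prompt assembly. SYSTEM_PROMPT paraphrases the
// instructions described in the text; the spread format is illustrative.
const SYSTEM_PROMPT =
  "You are an experienced tarot reader. Interpret with a mystical yet warm tone. " +
  "Use the symbolism and imagery of each card, and frame your reading in a way " +
  "that gives the querent courage. Even negative cards should be reframed as " +
  "opportunities for growth. For multi-card spreads, connect all cards into a " +
  "single cohesive narrative.";

type DrawnCard = { position: string; name: string; reversed: boolean };

// Build the chat messages for a spread of one or more cards.
function buildMessages(cards: DrawnCard[]) {
  const spread = cards
    .map((c) => `${c.position}: ${c.name}${c.reversed ? " (reversed)" : ""}`)
    .join("\n");
  return [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: `Interpret this spread:\n${spread}` },
  ];
}
```

The same helper serves a single-card flip and a Three-Card spread alike; only the card array changes.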
Transitioning to Cloudflare Workers AI
Groq API was working well, so why switch to Cloudflare Workers AI? Two reasons.
First, architectural simplification. Using Groq meant depending on two external services: Cloudflare Workers for proxying and Groq for inference. Cloudflare Workers AI consolidates both proxy and inference on a single platform. One fewer point of failure.
Second, Workers AI's free tier was sufficient. It provides daily free allocations in neuron units, and for lightweight models handling side-project-level traffic, it was more than enough. Model performance was slightly below LLaMA 3.3 70B, but for the specific use case of tarot interpretation, the practical difference was negligible.
The migration was surprisingly simple. I replaced the Groq API call in my Workers code with Workers AI's AI binding. The frontend code didn't change at all, not a single line. The proxy sits between the frontend and the LLM, so the frontend doesn't know or care which model runs behind it. That's the beauty of the proxy architecture.
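The migrated handler can be sketched as follows; the binding name (AI) and the model id are assumptions, so check your wrangler.toml and Cloudflare's model catalog for the actual values:

```typescript
// Sketch of the handler after migrating to the Workers AI binding.
// The binding name (AI) and model id are assumptions, not confirmed values.
interface Env {
  AI: { run(model: string, input: unknown): Promise<unknown> };
}

const handler = {
  async fetch(request: Request, env: Env) {
    // Same request shape as before: only the inference call changes,
    // from an outbound fetch to Groq to an in-platform binding call.
    const { messages } = (await request.json()) as { messages: unknown };
    const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", { messages });
    return new Response(JSON.stringify(result), {
      headers: { "Content-Type": "application/json" },
    });
  },
};

export default handler;
```

Since the frontend still POSTs the same JSON body to the same Worker URL, nothing client-side needs to change.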
Running Production on Free Infrastructure
To sum up, the entire AI interpretation infrastructure runs for free. Groq API provides 14,400 daily requests, Cloudflare Workers provides 100,000 daily requests, Workers AI offers free allocations, and GitHub Pages is free. Total server cost: zero.
There are limits, of course. Exceeding free-tier quotas means service interruption. But given realistic side-project traffic, hitting those limits would itself be a sign of success. When that day comes, it's not too late to consider paid plans or alternative architectures.
"Start for free, solve problems when they arise." For side projects, I can't think of a more rational strategy.
What's Next
With AI interpretations in place, the project felt substantially more complete. But an app nobody can find might as well not exist. In Part 13, I cover the belated realization of SPA's inherent SEO limitations, and the scramble to fix things with pre-rendering and meta tags.