Deployment and Infrastructure — Running a Live Service at $0

The Reality of $0 Monthly Server Cost

The biggest enemy of side projects is server cost. No matter how good the idea, if monthly hosting and API bills become a burden, the service eventually gets shut down. Between "what you want to build" and "what you can sustain" lies a wall called cost.

This Saju app's total operating cost is $0. Aside from the domain name, everything is free: hosting, API, CDN, and SSL. Here's how that's possible, and an honest look at what we gave up in return.

Architecture: Static + Serverless

The entire infrastructure rests on two pillars.

First, static hosting via GitHub Pages. We serve the React app's build output — HTML, CSS, JS, and images — through GitHub Pages. It's free for public repositories, includes a built-in CDN, and HTTPS is automatically configured.

Second, the AI API proxy via Cloudflare Workers. If the browser called the Groq API directly, the API key would be exposed. Cloudflare Workers sits in between as a proxy. The client sends requests to Workers, Workers attaches the API key and calls Groq API, then streams the response back to the client.
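The key-attaching step can be sketched as a small pure function. This is a minimal illustration, not the project's actual Workers code; the Groq URL and field names are assumptions based on Groq's OpenAI-compatible API.

```javascript
// Hypothetical sketch: build the upstream request the Worker sends to Groq.
// The API key comes from a Workers environment variable, never from the client.
const GROQ_URL = "https://api.groq.com/openai/v1/chat/completions";

function buildGroqRequest(apiKey, clientBody) {
  return {
    url: GROQ_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // secret attached server-side
        "Content-Type": "application/json",
      },
      // Request a streaming response so tokens can be relayed as they arrive.
      body: JSON.stringify({ ...clientBody, stream: true }),
    },
  };
}
```

In the real Worker, the handler would pass `upstream.body` (a ReadableStream) straight into the `Response` returned to the client, which is what makes the relay streaming-capable.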

There's no server in this architecture. No traditional backend server, no database, no file storage. Saju calculations run in the client's browser (the rule engine is pure JavaScript), and AI interpretation is delegated to an external API through a serverless function (Workers).

GitHub Pages: The Static Hosting Choice

We chose GitHub Pages for straightforward reasons: free, reliable, and simple to deploy.

Push build artifacts to a specific branch (gh-pages) and deployment happens automatically. With GitHub Actions, we set up a CI/CD pipeline: every push to main triggers build, sitemap generation, pre-rendering, and deployment in sequence.
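A workflow for that pipeline might look like the following sketch. The npm script names and the third-party deploy action are assumptions, not the project's actual configuration.

```yaml
# Hypothetical GitHub Actions workflow; script names and actions are illustrative.
name: Deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run build            # vite build
      - run: npm run generate-sitemap # sitemap generation
      - run: npm run prerender        # pre-render key pages to static HTML
      - uses: peaceiris/actions-gh-pages@v4
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./dist
```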

Custom domain setup is simple too. Enter the custom domain in repository settings, add a CNAME record in DNS, done. GitHub automatically provisions HTTPS certificates via Let's Encrypt.

GitHub Pages has clear limitations. It only serves static files — no server-side logic. There's a recommended site size limit (1GB) and a bandwidth soft limit (100GB/month). For a Saju app, these limits are essentially unreachable. The entire build output including pre-rendered HTML is in the tens of megabytes, and a personal side project rarely exceeds 100GB of monthly bandwidth.

Cloudflare Workers: The Serverless AI Proxy

Cloudflare Workers is the most critical infrastructure piece in this project. The entire AI interpretation flow passes through it.

Workers serves three roles.

First, API key protection. The Groq API key is stored as a Workers environment variable. The client only needs to know the Workers endpoint — it never accesses the actual API key. Keeping the key out of frontend code is essential.

Second, streaming proxy. Workers receives Groq API's streaming response (Server-Sent Events) and relays it directly to the client. Workers natively supports streaming response bodies, so the relay adds negligible latency from first token to last.

Third, CORS handling. Since the frontend on GitHub Pages and Cloudflare Workers are on different domains, Workers must set CORS (Cross-Origin Resource Sharing) headers. We implemented CORS middleware in Workers, specifying the allowed origin (GitHub Pages' domain) and necessary headers and methods.
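The CORS piece can be sketched as two small helpers. This is a minimal sketch, not the project's actual middleware; the allowed origin is a placeholder for the app's GitHub Pages domain.

```javascript
// Hypothetical CORS helpers; "example.github.io" stands in for the real domain.
const ALLOWED_ORIGIN = "https://example.github.io";

const CORS_HEADERS = {
  "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
  "Access-Control-Allow-Methods": "POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type",
};

// Browsers send an OPTIONS preflight before the cross-origin POST.
function handlePreflight() {
  return new Response(null, { status: 204, headers: CORS_HEADERS });
}

// Attach CORS headers to an outgoing response without mutating the original.
function withCors(response) {
  const headers = new Headers(response.headers);
  for (const [k, v] of Object.entries(CORS_HEADERS)) headers.set(k, v);
  return new Response(response.body, { status: response.status, headers });
}
```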

Cloudflare Workers' free tier allows 100,000 requests per day. Each Saju interpretation requires 1-2 Workers requests (comprehensive interpretation plus optional detailed interpretation), so theoretically we could handle 50,000+ daily analyses. A personal side project is extremely unlikely to reach this limit.

Groq API: Fast and Free AI

The actual engine behind AI interpretation is Groq API. Groq runs open-source models like LLaMA on their proprietary LPU (Language Processing Unit) hardware, and their defining characteristic is extremely fast inference.

Two reasons we chose Groq's free tier: the cost is $0, and the inference speed makes streaming feel responsive. Time to First Token (TTFT) is very short, and token generation speed is fast, giving users the experience of an "instantly starting response."

The free tier has limitations — requests per minute (RPM) caps and daily token limits. Specific limits vary by model and time period, but they're sufficient for a personal project.

Still, we needed a strategy for when we approach the free tier's limits.

Rate Limiting Strategy

To efficiently manage free API limits, we applied multiple layers of rate limiting.

First, client-side limiting. If the same chart is re-requested within a short window, we return results from local cache. This prevents unnecessary API calls from users who "spam the refresh button."
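The client-side cache can be sketched as a TTL-keyed map, assuming the chart serializes to a stable key. The names and the five-minute TTL are illustrative, not the project's actual values.

```javascript
// Hypothetical client-side result cache keyed by a serialized chart.
const CACHE_TTL_MS = 5 * 60 * 1000; // reuse results for 5 minutes (illustrative)
const cache = new Map();

function getCached(chartKey) {
  const entry = cache.get(chartKey);
  if (!entry) return null;
  if (Date.now() - entry.at > CACHE_TTL_MS) {
    cache.delete(chartKey); // expired: fall through to a fresh API call
    return null;
  }
  return entry.result;
}

function putCached(chartKey, result) {
  cache.set(chartKey, { result, at: Date.now() });
}
```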

Second, Workers-side limiting. Excessive requests from the same IP within a short period receive a 429 (Too Many Requests) response. This guards against malicious use or bot traffic.
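The Workers-side check can be sketched as a sliding-window counter. In a real Worker the counters would need Durable Objects or KV to survive across isolates; this in-memory map illustrates only the windowing logic, and the limits are illustrative.

```javascript
// Hypothetical sliding-window rate limiter per IP.
const WINDOW_MS = 60 * 1000; // 1-minute window (illustrative)
const MAX_REQUESTS = 10;     // per IP per window (illustrative)
const hits = new Map();

function isRateLimited(ip, now = Date.now()) {
  // Keep only timestamps inside the current window, then count them.
  const recent = (hits.get(ip) || []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_REQUESTS) return true; // caller responds with 429
  recent.push(now);
  hits.set(ip, recent);
  return false;
}
```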

Third, error handling. When Groq API returns a rate limit response (429), Workers translates it into a user-friendly message: "The service is currently busy. Please try again shortly." Instead of technical error codes, users see an understandable explanation.
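The translation step amounts to mapping upstream status codes to user-facing copy. A minimal sketch, with the 429 message mirroring the one quoted above and the fallback text illustrative:

```javascript
// Hypothetical upstream-error-to-message mapping.
function friendlyError(upstreamStatus) {
  if (upstreamStatus === 429) {
    return "The service is currently busy. Please try again shortly.";
  }
  // Generic fallback for other upstream failures (illustrative wording).
  return "Something went wrong while generating the interpretation. Please try again.";
}
```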

Fourth, model routing as a bonus. The high-performance/lightweight model split mentioned in our cost optimization also helps with rate limiting. Lightweight models often have more generous token limits, so routing individual category interpretations through them uses the overall quota more efficiently.
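The routing rule itself is a one-line decision. The model IDs below are real Groq model names, but their use here is an assumption, not the project's actual configuration:

```javascript
// Hypothetical model routing: comprehensive readings get the large model,
// per-category interpretations go to the lightweight one to conserve quota.
const HEAVY_MODEL = "llama-3.3-70b-versatile";
const LIGHT_MODEL = "llama-3.1-8b-instant";

function pickModel(task) {
  return task === "comprehensive" ? HEAVY_MODEL : LIGHT_MODEL;
}
```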

The Build Pipeline

The complete pipeline from build to deployment:

In development, vite dev runs a local server. Hot Module Replacement (HMR) makes code changes reflect instantly, keeping the development experience smooth.

In the build phase, three steps run sequentially: vite build for production React build, generate-sitemap for automatic sitemap creation, and prerender for rendering key pages to static HTML.

In the deployment phase, build artifacts are pushed to the gh-pages branch. GitHub Actions automates this entire flow — push to main triggers build, sitemap, pre-render, and deploy in sequence.

Cloudflare Workers deploys separately. The wrangler CLI pushes Workers code, and environment variables (API keys, etc.) are configured in the Cloudflare dashboard. Workers code changes can be deployed independently of the frontend.

The Trade-offs of $0

Free infrastructure obviously has trade-offs. Here they are, honestly.

First, scalability ceiling. If traffic surges, you can hit GitHub Pages' bandwidth limit, Cloudflare Workers' request limit, and Groq API's token limit simultaneously. Not a realistic concern for a side project, but if something goes viral, response options are limited.

Second, model selection constraints. The models available on Groq's free tier are limited. You can't use top-tier models like Claude Sonnet or GPT-4o. LLaMA-based models produce good interpretations, but there is a quality gap compared to top-tier models.

Third, service dependency. GitHub Pages, Cloudflare Workers, Groq API — we depend on three free services. If any one changes its free policy or discontinues service, we're affected. Paid services carry similar risks, but free services come without SLAs (Service Level Agreements).

Fourth, no server-side logic. User authentication, data storage, analysis history — these features are difficult without a server. Currently the app stores data only in local storage, so switching browsers or devices means losing data.

Why We Accepted the Trade-offs

The reason is clear: a side project's first goal is to ship.

Setting up AWS or GCP for perfect infrastructure, building databases, creating authentication systems — this is how projects never get finished. Ship core features within free infrastructure constraints first; if traffic actually grows, migrate to paid infrastructure then.

This wasn't a "we'll deal with it later" attitude. The architecture was designed from the start to be swappable. Changing the AI API behind Cloudflare Workers from Groq to another provider means modifying only the Workers endpoint code — zero frontend changes. Moving from GitHub Pages to Vercel or Netlify just means deploying the build output elsewhere.
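One way to picture that swappability: the Worker resolves its upstream from a single config object, so changing providers touches one place and the frontend never notices. A sketch under assumed names, not the project's actual code:

```javascript
// Hypothetical provider table inside the Worker; URLs and models are illustrative.
const PROVIDERS = {
  groq: {
    url: "https://api.groq.com/openai/v1/chat/completions",
    model: "llama-3.1-8b-instant",
  },
  // Adding or swapping a provider is an edit here, invisible to the frontend.
};

function upstreamFor(name) {
  const p = PROVIDERS[name];
  if (!p) throw new Error(`unknown provider: ${name}`);
  return p;
}
```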

This "swappability" was the key design principle that made $0 infrastructure a viable choice.

The Reality of Running a Live Service

Here's our experience running a live service on $0 infrastructure.

Stability was better than expected. We experienced virtually no GitHub Pages downtime, and Cloudflare Workers was rock-solid. Groq API had occasional slowdowns but never a complete outage.

Speed was satisfying. GitHub Pages' CDN makes static asset loading fast, and Groq's fast inference speed made AI interpretation streaming feel responsive. The entire flow from initial page load to analysis results was smooth.

The biggest advantage was near-zero operational burden. No servers to monitor, no security patches to apply, no scaling to worry about. Push code, it deploys automatically, and three free services handle the rest. For side projects, this "minimization of operational overhead" is the decisive factor in keeping a service alive long-term.

What This Choice Means

Running a live service at $0 total cost is a technical achievement, but the more important meaning is this: anyone with an idea can ship a live service. The era of abandoning projects due to server costs is over. By combining free infrastructure like GitHub Pages, Cloudflare Workers, and Groq API, coding skill and ideas alone are enough to put a service into the world.

Of course, costs will arise as traffic grows. But that's a "good problem." Having costs because users are growing is far better than shutting down because there are no users. Start at $0 and scale costs with growth — this is the realistic infrastructure strategy for side projects.

Coming Up Next

We're three-quarters through the 20-part series. We've covered design, domain modeling, the calculation engine, AI interpretation, UI, SEO, and deployment, the entire journey of the Saju app. Starting with the next installment, we look back at the whole project: the evolution of our AI collaboration methodology across two projects (Tarot and Saju), and a mapping of what only humans can do in complex domains.
