---
title: "How This Site Is Built"
description: "The stack behind artificialcuriositylabs.ai: Astro on S3 + CloudFront, DNS on Cloudflare, email forwarding via SES, and an AI-readable content layer. Every decision explained."
canonical_url: "https://artificialcuriositylabs.ai/posts/how-this-site-is-built/"
md_url: "https://artificialcuriositylabs.ai/posts/how-this-site-is-built.md"
published_at: "2026-05-05T12:00:00.000Z"
modified_at: "2026-06-06T00:00:00.000Z"
tags:
  - "infrastructure"
  - "build-log"
---

## TL;DR

- Astro on S3 + CloudFront with OAC, DNS on Cloudflare (saves $49/year vs Route 53 on a `.ai` domain), email via SES + Lambda forward — total cost under $5/month with no managed platform dependency.
- Every post URL returns a 404 out of the box: CloudFront's `DefaultRootObject` only applies to `/`, not subdirectories — fix is a 15-line CloudFront Function that appends `index.html` before S3 sees the request.
- `llms-full.txt` is auto-generated at build time and concatenates every post into one Markdown file, so any agent gets the complete site corpus in a single HTTP GET with zero crawling required.
- GitHub Actions deploys run two-pass S3 syncs: static assets get `max-age=31536000,immutable`; HTML and `llms*.txt` get no-cache — because Astro hashes asset filenames but not HTML.
- AWS credentials never touch the repo — GitHub Actions uses OIDC to assume an IAM role at runtime, and local deploys can use the default AWS credential chain on EC2 with no profile wiring.

---

I spent a day building this site from scratch. Not because there aren't easier options — there are plenty — but because I wanted to own the infrastructure and understand every layer. Here's what I built and why.

## The constraint that drove every decision

I didn't want a managed platform. Ghost, Squarespace, Substack — they're all fine until they're not. Pricing changes. Features get enshittified. Export formats break. The moment you depend on someone else's persistence layer for your writing, you're renting, not owning.

The goal: my content in plain markdown files, my infrastructure, my control. Total cost under $5/month.

## The stack

**Static site generator: Astro**

Astro compiles everything to static HTML at build time. No server, no runtime, no database. The site is just files. I used the Astro Paper theme as a base — dark mode default, clean typography, built-in search.

The build script does three things: generates `llms-full.txt` (more on that below), runs the Astro build, then generates the search index with Pagefind. One command.

**Hosting: S3 + CloudFront**

The built files go into an S3 bucket with public access completely disabled. CloudFront sits in front of it using Origin Access Control — only CloudFront can read from S3, nothing else. ACM handles the SSL cert.

**DNS: Cloudflare**

DNS is on Cloudflare, not Route 53. Route 53 charges $129/year for a `.ai` domain; Cloudflare charges $80, a saving of $49/year. That price gap is the main reason for the split. The apex domain and www point at the CloudFront distribution via Cloudflare's DNS, so the hosting layer doesn't change. A secondary benefit: Cloudflare's token model makes it easy to give external tools (MCP servers, automation) scoped API access without handing over full account credentials.

Result: HTTPS enforced, HTTP redirects automatically, global CDN edge caching, and the bucket itself is locked down.

One gotcha I hit immediately: every post URL returned a 404. The files were in S3, the deploy worked fine — but `/posts/my-post/` returned nothing.

The problem is subtle. Astro (like most static site generators) builds each post as `/posts/my-post/index.html`: an `index.html` file inside a subdirectory. S3 treats every path as a literal object key and doesn't resolve directories to index files. CloudFront's `DefaultRootObject` setting only applies to the root path `/` and doesn't cascade to subdirectories. So when CloudFront asked S3 for `/posts/my-post/`, S3 found no object at that exact key and returned a 403 (a private bucket reports a missing key as access denied when the caller has no list permission), and CloudFront served the 404 page.

The fix is a CloudFront Function, a small JavaScript function that runs at the CDN edge on every incoming request before it reaches S3. It checks the URI. If it ends with `/`, it appends `index.html`. If it contains no dot (and so has no file extension), it appends `/index.html`. That's it: about 15 lines of code.

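```js
function handler(event) {
  var request = event.request;
  var uri = request.uri;

  if (uri.endsWith('/')) {
    // Directory-style URL (/posts/my-post/): point at the index file inside it.
    request.uri += 'index.html';
  } else if (!uri.includes('.')) {
    // No dot anywhere in the path, so treat it as a directory without the
    // trailing slash (/posts/my-post) and add both the slash and index.html.
    request.uri += '/index.html';
  }

  // Anything containing a dot (e.g. /styles/app.css) passes through unchanged.
  return request;
}
```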

Attach it to the distribution as a viewer-request handler and every URL on the site resolves correctly. This is a one-time infrastructure fix — not something you repeat per deploy.

If you're building a static site on S3 + CloudFront with OAC, add this function before you go live. It's the kind of thing that works fine locally (dev servers handle it automatically) and breaks silently in production.
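
One way to sanity-check the rewrite rule before attaching it is to run the handler locally against a few sample URIs. The sketch below repeats the handler and feeds it a hand-built event object. The `rewrite` helper and the sample paths are illustrative, and a real CloudFront event carries more fields than the single `uri` used here.

```js
// Same handler as the CloudFront Function, repeated so this file runs on its own.
function handler(event) {
  var request = event.request;
  var uri = request.uri;
  if (uri.endsWith('/')) {
    request.uri += 'index.html';
  } else if (!uri.includes('.')) {
    request.uri += '/index.html';
  }
  return request;
}

// Hypothetical helper: wrap a path in a minimal event and return the rewritten URI.
function rewrite(path) {
  return handler({ request: { uri: path } }).uri;
}

// Illustrative sample paths, not taken from the real site.
['/', '/posts/my-post/', '/posts/my-post', '/assets/app.css'].forEach(function (path) {
  console.log(path + ' -> ' + rewrite(path));
});
```

One edge worth knowing about: because the second branch looks for a dot anywhere in the path, a slug that contains one (say `/posts/v1.2-notes`) passes through unrewritten. The rule relies on slugs being dot-free.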

**Email: SES + Lambda**

(Note: email forwarding infra was set up for the domain, but the address is not currently listed as a public contact method — use X or LinkedIn instead. The setup details below are kept for the build history.)

I wanted `info@artificialcuriositylabs.ai` to work as a real email address without running a mail server. The setup:

1. SES receives inbound email for the domain
2. A receipt rule stores incoming messages in S3
3. A Lambda function rewrites the headers and forwards to Gmail, preserving the Reply-To so replies look native

It took about an hour to wire up. The DNS side is a set of DKIM CNAME records plus an MX record in Cloudflare DNS that points at the SES inbound endpoint. The Lambda function is about 80 lines of Node.js. It works exactly like having an inbox without actually having one.
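
The header rewrite in step 3 can be sketched as a pure function over the raw message text. This is a minimal illustration, not the site's actual Lambda: the `rewriteHeaders` name, the list of stripped headers, and the choice to add `Reply-To` only when the sender didn't set one are all assumptions. The S3 read and the SES send are left out. The rewrite is needed because SES only sends from verified identities, so the original `From` can't be reused as-is.

```js
// Hypothetical helper: rewrite the header block of a raw email so it can be
// re-sent from a verified address. Assumes CRLF line endings.
function rewriteHeaders(rawMessage, verifiedFrom) {
  // Headers end at the first blank line; everything after is the body.
  var split = rawMessage.indexOf('\r\n\r\n');
  var headerBlock = split === -1 ? rawMessage : rawMessage.slice(0, split);
  var body = split === -1 ? '' : rawMessage.slice(split);

  // Unfold continuation lines so each header occupies exactly one line.
  var headers = headerBlock.replace(/\r\n[ \t]+/g, ' ').split('\r\n');

  var originalFrom = '';
  var hasReplyTo = false;
  headers.forEach(function (line) {
    if (/^from:/i.test(line)) originalFrom = line.slice(5).trim();
    if (/^reply-to:/i.test(line)) hasReplyTo = true;
  });

  var rewritten = headers
    // Assumed strip list: headers that would conflict with re-sending.
    .filter(function (line) {
      return !/^(return-path|sender|dkim-signature):/i.test(line);
    })
    // Send from the verified address instead of the original sender.
    .map(function (line) {
      return /^from:/i.test(line) ? 'From: ' + verifiedFrom : line;
    });

  // Keep an existing Reply-To; otherwise route replies to the original sender.
  if (!hasReplyTo && originalFrom) rewritten.push('Reply-To: ' + originalFrom);
  return rewritten.join('\r\n') + body;
}
```

With this shape, hitting reply in Gmail addresses the original sender rather than the forwarding address, which is what makes the replies look native.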

## The decision I'm most glad I made: llms.txt

There's an emerging standard for AI-readable content — `llms.txt` as an index file, similar to `robots.txt`, that tells AI crawlers what's on the site and where. I added two files:

- `/llms.txt` — a curated index of pages, topics, and permissions
- `/llms-full.txt` — every blog post concatenated into a single file, auto-generated at build time

The second one is the interesting one. Any AI system that fetches `llms-full.txt` gets the complete text of everything I've published, in one request, structured and clean. It's a better interface for AI consumption than crawling individual HTML pages.

The generator script reads from the blog content directory, strips frontmatter, and concatenates everything with headers separating posts. Runs in under a second as part of the normal build.
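
The core of that generator can be sketched in a few lines. This is an illustration under assumptions, not the site's actual script: the function names, the `# title` header, and the `---` separator between posts are hypothetical, and reading the files from the blog content directory is left out so the logic stays pure.

```js
// Remove a leading YAML frontmatter block (the part fenced by '---' lines).
function stripFrontmatter(markdown) {
  var match = /^---\r?\n[\s\S]*?\r?\n---\r?\n?/.exec(markdown);
  return match ? markdown.slice(match[0].length) : markdown;
}

// Hypothetical shape: posts is an array of { title, markdown } objects.
function buildLlmsFull(posts) {
  return posts
    .map(function (post) {
      // One header per post, followed by the body without frontmatter.
      return '# ' + post.title + '\n\n' + stripFrontmatter(post.markdown).trim();
    })
    .join('\n\n---\n\n') + '\n';
}

// Illustrative input, not real posts.
var sample = [
  { title: 'First', markdown: '---\ntitle: "First"\n---\n\nBody one.' },
  { title: 'Second', markdown: 'Body two.' }
];
console.log(buildLlmsFull(sample));
```

Keeping the transform separate from file I/O also makes it easy to test without touching the filesystem.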

I don't know exactly how this will get used — but making the content machine-readable is a zero-cost decision with asymmetric upside.

## The deploy workflow

**Production deploy is GitHub Actions.** Push to `main` triggers `.github/workflows/deploy.yml` when content or build inputs change (`src/**`, `public/**`, `scripts/**`, `astro.config.ts`, `package.json`). The workflow:

1. **Validates configuration** — fails fast if required repository secrets or variables are missing
2. **Assumes an AWS role via GitHub OIDC** — [`aws-actions/configure-aws-credentials`](https://github.com/aws-actions/configure-aws-credentials) exchanges the workflow's short-lived GitHub identity token for temporary AWS credentials at runtime
3. **`npm run build`** — generates `llms-full.txt`, runs the Astro build, generates the Pagefind search index
4. **Two S3 syncs** — static assets (JS, CSS, images) get long-lived cache headers (`max-age=31536000,immutable`); HTML, XML, and `llms*.txt` get no-cache headers so browsers always fetch the latest
5. **CloudFront invalidation** — `/*` to flush the CDN edge cache
6. **Blog markdown sync** — `src/data/blog/*.md` copied to an S3 `blog/` prefix (source for a Bedrock knowledge base)
7. **KB ingestion trigger** — starts a Bedrock re-sync; marked non-blocking so a job already in progress doesn't fail the deploy

The two-pass sync matters because the cache strategy is different per file type. Static assets are content-addressed (Astro hashes filenames), so they can be cached indefinitely. HTML is not — you want readers to see the new post immediately.
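
The split between the two passes comes down to a per-file decision. Here is a minimal sketch of that decision, assuming the two groups are told apart by file extension. The extension list, the function names, and the literal `no-cache` value are assumptions; the workflow's actual sync filters aren't shown in this post.

```js
var IMMUTABLE = 'max-age=31536000,immutable';
var NO_CACHE = 'no-cache'; // assumed value for the "no-cache headers"

// Assumed list of static asset types (JS, CSS, images).
var STATIC_EXTENSIONS = ['js', 'css', 'png', 'jpg', 'jpeg', 'gif', 'svg', 'webp'];

// Pick the Cache-Control value for one object key.
function cacheControlFor(key) {
  var name = key.split('/').pop();
  var dot = name.lastIndexOf('.');
  var ext = dot === -1 ? '' : name.slice(dot + 1).toLowerCase();
  // Anything not on the static list (HTML, XML, llms*.txt) falls back to no-cache.
  return STATIC_EXTENSIONS.indexOf(ext) !== -1 ? IMMUTABLE : NO_CACHE;
}

// Group keys into the two sync passes.
function planSync(keys) {
  var plan = { immutable: [], noCache: [] };
  keys.forEach(function (key) {
    (cacheControlFor(key) === IMMUTABLE ? plan.immutable : plan.noCache).push(key);
  });
  return plan;
}
```

Defaulting unknown file types to no-cache is the conservative choice here: a file that is wrongly revalidated costs a request, while a file that is wrongly marked immutable can stay stale in browsers for a year.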

**Local deploy is the same pipeline, different credential path.** `./deploy.sh` runs the build and S3/CloudFront steps on your machine. It requires two environment variables — `S3_BUCKET` and `CLOUDFRONT_DISTRIBUTION_ID`. `AWS_PROFILE` is optional: set it on a laptop if you need to force a specific local AWS CLI profile, or leave it unset on EC2 so the default credential chain uses the instance profile. Use local deploy when you want to ship without pushing to `main`, or when debugging a failed Actions run.
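
The preflight logic described above is small enough to sketch. `deploy.sh` itself is a shell script whose contents aren't shown here, so this expresses the same check in JavaScript as an illustration; the function names are hypothetical.

```js
// The two variables the local deploy requires, per the description above.
var REQUIRED = ['S3_BUCKET', 'CLOUDFRONT_DISTRIBUTION_ID'];

// Return the names of required variables that are unset or blank.
function missingVars(env) {
  return REQUIRED.filter(function (name) {
    return !env[name] || String(env[name]).trim() === '';
  });
}

// AWS_PROFILE is optional: without it, the default credential chain applies.
function describeCredentials(env) {
  return env.AWS_PROFILE ? 'profile ' + env.AWS_PROFILE : 'default credential chain';
}

// Example run against the current process environment.
var missing = missingVars(process.env);
console.log(missing.length ? 'Missing: ' + missing.join(', ') : 'Config OK');
console.log('Credentials: ' + describeCredentials(process.env));
```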

The split between Cloudflare DNS and CloudFront hosting means Cloudflare never touches the content. DNS resolves to the CloudFront distribution, CloudFront pulls from S3, and deploy invalidates CloudFront directly. Cloudflare sees none of this; it just answers the DNS queries that point the domain at the distribution.

**What is not in the repo:** bucket name, distribution ID, and Knowledge Base IDs live in GitHub repository variables and secrets. The workflow file names what must exist; the values stay out of git.

## The question this raises

Static sites feel like going backward until you realize what you're trading away: runtime complexity, database dependencies, server costs, someone else's uptime SLA. The question isn't "why would you use a static site in 2026" — it's "why would you add a server if you don't need one?"

## Timeline (initial setup)

**2026-05-05 — Day one**

- Domain registered: `artificialcuriositylabs.dev` via Route 53 ($17/year).
- AWS personal account setup: root MFA enabled, root keys deleted, IAM admin user, CloudTrail (multi-region), IAM Access Analyzer, monthly budget alerts.
- Static site stack: Astro Paper theme (dark default + toggle), S3 bucket (private), CloudFront with OAC, ACM SSL, initial Route 53 records, `llms.txt` + `llms-full.txt` (AI-readable layer, build-time generated).
- Email forwarding: SES inbound + Lambda forwarder for `info@artificialcuriositylabs.ai` (infra set up; not currently listed for public contact, use socials instead), DKIM/MX in DNS.
- Homepage: custom hero + "Now" section (3 cards for writing/building/experimenting).

**Subsequent updates (captured in posts and deploys)**

- DNS switched to Cloudflare (cost + token model for agent/MCP access).
- GitHub Actions deploy workflow (validates vars, assumes deploy role via OIDC, two-pass S3 sync for cache headers, CloudFront invalidation, blog markdown sync to S3 for Bedrock KB, non-blocking KB ingestion trigger).
- Local `./deploy.sh` parity with CI.
- Ongoing: component refreshes (e.g. Surface for Now cards), messaging alignment (Now + About), syndication pipelines added for distribution, site structure simplification (retired redundant Build Log and Archives pages to reduce duplication).

The full change history lives in the repo. The dedicated Build Log page has been retired — its value is now distributed across this post, the blog posts themselves (many tagged build-log), the homepage "Now" cards, and the agent-infra work documented elsewhere.
