Changelog

Shipping Log

A record of what is actually shipping on this site and the infrastructure behind it. Published entries are written in my own voice; drafts are auto-generated from git history and curated before going live.

April 10, 2026
Meta · Feature

Added this changelog

The site now has a changelog at /changelog — partly as a record of what's shipping, partly as live evidence that the stack is actually being used.

A portfolio that claims to ship should be able to show you what it's shipped. This page is driven by MDX files in content/changelog/, edited through TinaCMS, with an auto-draft script that reads git commit history across the relevant repos and emits draft entries for curation before publishing. Low-friction for me, honest for you.
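The real script isn't reproduced here, but the shape of it is simple enough to sketch: a Node script that turns recent git history into draft MDX files under content/changelog/. In this sketch the drafts/ subfolder name, the frontmatter fields, and the single-repo scope are assumptions; the actual script walks several repos and a richer frontmatter shape.

```ts
// scripts/changelog-draft.ts -- minimal sketch of the auto-draft step.
// Paths, frontmatter fields, and single-repo scope are assumptions, not the real script.
import { execSync } from "node:child_process";
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

const DRAFT_DIR = join("content", "changelog", "drafts");

// Recent commits as "hash<TAB>date<TAB>subject" lines.
const log = execSync('git log --since="1 week ago" --pretty=format:"%h%x09%as%x09%s"', {
  encoding: "utf8",
});

mkdirSync(DRAFT_DIR, { recursive: true });

for (const line of log.split("\n").filter(Boolean)) {
  const [hash, date, subject] = line.split("\t");
  // One draft MDX file per commit, ready to be curated (or deleted) in TinaCMS.
  const mdx = [
    "---",
    `title: "${subject.replace(/"/g, '\\"')}"`,
    `date: "${date}"`,
    "draft: true",
    "---",
    "",
    subject,
    "",
  ].join("\n");
  writeFileSync(join(DRAFT_DIR, `${date}-${hash}.mdx`), mdx);
}
```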

If you're looking at the first few entries and thinking "those are from tonight" — yes, they are. The first entry is the Qwen migration that made Reader fast enough to feel conversational. The second is the bug fix that made Reader actually render structured answers. This one is the meta-entry explaining why they're here.

The plan: new entries are James-voice rewrites of what actually changed. Raw git-log commit messages go in the drafts folder; the curated published versions read like a normal changelog, not a commit dump.

April 10, 2026
Fix · Infrastructure

Rewrote the chat proxy route, killed a markdown-eating filter

The Next.js → reader-agent proxy had quietly drifted to the wrong endpoint and was running a filter that stripped any response starting with a heading, numbered list, or bold text. Both fixed.

The Next.js API route for chat was proxying to the wrong backend path, and was running a legacy SSE filter that inspected each streamed chunk for patterns like ^##, ^\d+\.\s, and ^\s*\*\* — meaning any response that opened with a markdown heading, numbered list, or bolded phrase was being dropped before it reached the browser. The filter was left over from an earlier model that emitted visible "Thinking Process" prefixes; once the underlying LLM stopped doing that, the filter was pure downside.
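For concreteness, the removed check looked roughly like this. This is a reconstruction from the patterns above, not the original source:

```ts
// Rough reconstruction of the legacy SSE chunk filter (not the original code).
// Any streamed chunk whose text matched one of these openings was dropped.
const LEGACY_BLOCKLIST: RegExp[] = [
  /^##/,       // markdown heading
  /^\d+\.\s/,  // numbered list item
  /^\s*\*\*/,  // bold text
];

function shouldDropChunk(text: string): boolean {
  return LEGACY_BLOCKLIST.some((pattern) => pattern.test(text));
}
```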

Both are gone. The proxy is now a clean passthrough of reader-agent's SSE stream, so everything Reader writes — headings, bolded terms, numbered steps, the whole markdown vocabulary — reaches the UI as-is and gets rendered properly by the existing react-markdown layer.
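The replacement route is essentially a fetch-and-forward. A minimal sketch, where the reader-agent base URL, the /chat/stream path, and the env var name are assumptions:

```ts
// app/api/chat/route.ts -- passthrough proxy, minimal sketch.
// READER_AGENT_URL and the /chat/stream path are assumptions; substitute the real values.
export async function POST(req: Request) {
  const upstream = await fetch(`${process.env.READER_AGENT_URL}/chat/stream`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: await req.text(),
  });

  // Hand the SSE body straight back to the browser: no chunk inspection, no regex filter.
  return new Response(upstream.body, {
    status: upstream.status,
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache, no-transform",
    },
  });
}
```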

Side effect: Reader can now actually answer "what's your architecture?" with structured bullet lists and headers, instead of silently returning empty responses.

April 10, 2026
Infrastructure · Feature

Moved Reader to fully local inference on Qwen3.5-35B-A3B

The site's AI agent no longer calls a cloud API for chat. All inference now runs on my NVIDIA DGX Spark via vLLM, fronted by LiteLLM for routing and chat-template config.

Reader now answers you with a Qwen3.5-35B-A3B MoE model, AWQ-quantized, served by vLLM on my DGX Spark and fronted by a local LiteLLM proxy. Zero cloud dependency for the chat path.

The A3B part is where the speed comes from: it's a 35-billion-parameter sparse mixture-of-experts, but only ~3B parameters are active per token. For each response you read, the model is doing roughly the compute of a 3B dense model — which is why it streams noticeably faster than the Gemma 27B setup it replaced, and faster than the cloud Opus calls I was routing through before.

LiteLLM sits in front as a thin routing layer so I can swap the underlying vLLM process out, or fall back to a cloud model, without touching agent code. The LiteLLM config also disables Qwen's visible thinking mode (enable_thinking: false) — otherwise the model wraps answers in a "Thinking Process" preamble that's interesting for debugging but noise for visitors.
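From the agent's side, the whole stack is just one OpenAI-compatible endpoint. A sketch of that call, assuming the openai client, a local LiteLLM proxy on port 4000, and a "reader-local" model alias (the alias name, base URL, and env var are mine, not the real config):

```ts
// Minimal sketch of the agent-side call. Only the LiteLLM proxy needs to know which
// vLLM process (or cloud fallback) actually serves the "reader-local" alias.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4000/v1", // LiteLLM proxy
  apiKey: process.env.LITELLM_API_KEY ?? "local",
});

const stream = await client.chat.completions.create({
  model: "reader-local",
  messages: [{ role: "user", content: "What's your architecture?" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```

Swapping the backend means editing the LiteLLM config, not this call, and per the setup above the thinking-mode switch lives in that config too, so the agent never sees the preamble.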

Cost per visitor interaction: effectively zero. The silicon is already running.