Infrastructure/Coming Soon
Running Production LLMs on a DGX Spark
What it actually takes to run large language models locally. Hardware choices, memory architecture, inference optimization, and why 128GB of unified memory changes the game.
Blog
Thoughts on building with AI, running infrastructure, and the messy reality of making things work.
Envelope budgeting is a solved problem. So why did I spend months building a new app? Because the existing solutions either cost too much or do too little.
Everyone is building RAG. Most of it is bad. Here is what I learned building a retrieval pipeline that handles real-world government data.
Content managed with Tina CMS. New posts coming soon.