The case for local
Steady workloads with predictable input sizes, data that can't leave the building, and tasks where a mid-tier open-source model is good enough: classification, tagging, extraction.
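For the classification case, the request shape is simple enough to sketch. A minimal example below, assuming an Ollama server on its default port; the model tag "gemma" and the label set are placeholders, and the JSON-reply prompt is one convention among several:

```python
import json

# Hypothetical endpoint; Ollama listens on 11434 by default.
OLLAMA_URL = "http://localhost:11434/api/chat"

def build_classify_request(text: str, labels: list[str]) -> dict:
    """Build the JSON payload for a single-label classification call."""
    return {
        "model": "gemma",   # placeholder model tag, not a recommendation
        "stream": False,    # one complete response, no token stream
        "format": "json",   # ask the runtime for machine-parseable output
        "messages": [
            {"role": "system",
             "content": "Classify the user text. Reply as JSON: "
                        '{"label": "<one of: ' + ", ".join(labels) + '>"}'},
            {"role": "user", "content": text},
        ],
    }

payload = build_classify_request(
    "Invoice #448 overdue by 30 days", ["billing", "support", "spam"]
)
print(json.dumps(payload, indent=2))
```

The same payload works for tagging and extraction; only the system prompt and the expected JSON shape change.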
The hardware floor
A single workstation with enough VRAM to hold the model weights plus headroom for the KV cache. Gemma 4, Nemotron 3 Super, or a GPT-OSS variant. Ollama or llama.cpp as the runtime.
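"Enough VRAM plus headroom" can be estimated on the back of an envelope: weights at the quantized bit width, plus an allowance for KV cache and runtime buffers. The 20% headroom figure below is an assumption, not a measured number:

```python
def vram_needed_gb(params_b: float, bits_per_weight: float,
                   headroom: float = 0.20) -> float:
    """Rough VRAM requirement in GB for a params_b-billion-parameter model.

    headroom covers KV cache and runtime buffers; 0.20 is an assumed
    figure, and long contexts can need considerably more.
    """
    weights_gb = params_b * bits_per_weight / 8  # 1B params at 8 bits ~ 1 GB
    return weights_gb * (1 + headroom)

# A 12B model at 4-bit quantization: ~6 GB of weights, ~7.2 GB with headroom.
print(round(vram_needed_gb(12, 4), 1))  # → 7.2
```

This is why quantization level matters as much as parameter count when picking hardware: the same 12B model at 8-bit needs roughly twice the card.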
Where local still loses
Frontier reasoning, long-context research, anything that needs Opus-grade quality. And at small scale, per-token pricing from hosted open-source inference providers is often cheaper than owning the hardware anyway.
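The cost claim is easy to sanity-check with break-even arithmetic. The figures below ($3,000 workstation, $0.30 per million tokens hosted) are illustrative assumptions, not quotes, and the model ignores power, depreciation, and ops time, which flatters the local side:

```python
def breakeven_mtok(hardware_cost: float, price_per_mtok: float) -> float:
    """Millions of tokens at which owned hardware matches hosted spend.

    Ignores electricity, depreciation, and admin time, so the real
    break-even point is higher than this returns.
    """
    return hardware_cost / price_per_mtok

# Assumed figures: a $3,000 workstation vs $0.30 per million tokens hosted.
# You need ~10,000M (10B) tokens through the box just to match the bill.
print(breakeven_mtok(3000, 0.30))
```

At small scale that volume rarely materializes, which is the point: local wins on data locality and predictability, not on price.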