ML engineer. Building Vajra (LLM inference) at Georgia Tech, consulting on applied AI. Previously at Barcelona Supercomputing Center and CaixaBank.
Why simple chat benchmarks are not enough for evaluating inference performance, and how to model agentic workloads.
Part 3 of the State Space Models series: selective state spaces, Mamba, and Mamba-2.
Part 2 of the State Space Models series: how SSMs became neural networks, and the path to S4.
Part 1 of the State Space Models series: what SSMs are, where they came from, and why they matter.