$ whoami
tirth — compiler engineer @ NVIDIA
Welcome. Pull up a terminal.
This blog is where I write about the things I spend my days (and nights) thinking about: compilers, GPUs, parallel programming, and the dark arts of making code go fast.
Who is this for?
Anyone who finds themselves asking:
- Why does this shader compile slower than it runs?
- What does a GPU actually do with my code?
- How does a compiler decide when to vectorize a loop?
- What happens inside nvcc when you hit Enter?
If any of those questions make you lean forward in your chair, you're in the right place.
What I'll write about
Mostly things that come up at work, or that I'm learning:
- Compiler internals — IR transformations, optimization passes, register allocation
- GPU architecture — warps, occupancy, memory hierarchy, the whole machine
- MLIR / LLVM — because that's where I live
- Performance engineering — profiling, bottlenecks, the gap between theory and practice
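To make the first bullet concrete: here's a toy sketch (mine, not real LLVM or MLIR code) of a constant-folding pass over a tiny expression IR, the flavor of transformation an optimizer applies thousands of times before code generation.

```python
# Toy IR: expressions are either constants, opaque variables (strings),
# or binary operations over sub-expressions.
from dataclasses import dataclass

@dataclass
class Const:
    value: float

@dataclass
class BinOp:
    op: str       # "*" or "+"
    lhs: object
    rhs: object

def fold(node):
    """Recursively replace constant subexpressions with single constants."""
    if isinstance(node, BinOp):
        lhs, rhs = fold(node.lhs), fold(node.rhs)
        if isinstance(lhs, Const) and isinstance(rhs, Const):
            if node.op == "*":
                return Const(lhs.value * rhs.value)
            if node.op == "+":
                return Const(lhs.value + rhs.value)
        return BinOp(node.op, lhs, rhs)
    return node

# (3.0 * 2.0) + x  folds to  6.0 + x
expr = BinOp("+", BinOp("*", Const(3.0), Const(2.0)), "x")
print(fold(expr))  # BinOp(op='+', lhs=Const(value=6.0), rhs='x')
```

Real passes do the same dance on much richer IRs, with the extra burden of preserving floating-point semantics while they rewrite.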
A taste: why compilers matter for GPUs
The gap between what you write and what the GPU executes is enormous. Consider this:
# You write this
output = input * 2.0 + bias
By the time this reaches silicon, the compiler has:
- Lowered it through multiple IR levels
- Decided how to vectorize across SIMD lanes
- Scheduled instructions to hide memory latency
- Allocated registers to avoid spills
- Emitted machine code for a specific SM architecture
Each of those steps is a rabbit hole. We'll go down most of them.
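As a rough mental model of where that one-liner ends up (a sketch, not how any real compiler or GPU is implemented): the expression becomes a single fused multiply-add, executed in lockstep by every lane of a 32-wide warp, one element per lane.

```python
# Toy warp model: 32 lanes execute the same instruction on different data.
WARP_SIZE = 32  # lanes per warp on current NVIDIA GPUs

def fma(a, b, c):
    # The hardware runs this as one fused multiply-add instruction: a * b + c
    return a * b + c

def run_warp(inputs, bias):
    """Each lane handles one element; all lanes run the same instruction."""
    return [fma(inputs[lane], 2.0, bias) for lane in range(WARP_SIZE)]

data = [float(i) for i in range(WARP_SIZE)]
out = run_warp(data, bias=1.0)
print(out[:4])  # [1.0, 3.0, 5.0, 7.0]
```

The real machine adds everything this model hides: memory coalescing, latency hiding across warps, register pressure. Those are the rabbit holes.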
See you on the other side of the instruction boundary.
$ exit
logout