Atharva Bhutani
Implemented Multi-head Latent Attention with Rotary Positional Embeddings. Super fun.
Programming can be frustrating.