Pavan Balaji

Research Scientist and Technical Lead, Meta Platforms

About Me

Dr. Pavan Balaji is a Distinguished Research Scientist at Meta’s Superintelligence Laboratory (MSL), where he serves as the overall technical lead for AI communication libraries for GPUs and Meta’s internal silicon. His work includes key initiatives such as NCCLX and Torchcomms, which serve as core infrastructure for Meta’s AI workloads. Additionally, he has contributed to Meta’s largest GPU supercomputers, including the Grand Teton and Catalina systems. These 100K+ GPU platforms form the backbone for Meta’s AI training and serving infrastructure across Generative AI models (Muse Spark, Llama), Instagram, Facebook Ads, and Reels.

Before joining Meta, Dr. Balaji held appointments as a Senior Computer Scientist and Group Lead at Argonne National Laboratory and as an Institute Fellow of the Northwestern-Argonne Institute of Science and Engineering at Northwestern University. He contributed to the design and software implementation of several projects on communication runtime systems (MPI, UCX), threading models (lightweight threads such as Argobots, OpenMP), and heterogeneous memory systems. Particularly noteworthy are the MPICH project (used by thousands of supercomputers around the world, including the three US exascale supercomputers: Aurora, Frontier, and El Capitan), the UCX project (R&D100 award winner in 2019), and the Argobots project (R&D100 award finalist in 2020, and a driving piece of software for numerous supercomputers and commercial products such as Intel DAOS).

Dr. Balaji has held several other leadership roles in the community, serving on the board of directors or advisory board for numerous domestic and international projects, including UCX (US), Cilkplus (US), EPEEC (Europe), and Exascale Technologies (China). He has also served on the organizing committees of numerous high-profile conferences and journals, including IEEE/ACM SC (technical program chair), IEEE Cluster (general co-chair), IEEE/ACM CCGrid (general co-chair, program chair), and IEEE TPDS (associate editor-in-chief).