Research Scientist and Technical Lead, Meta Platforms
Dr. Pavan Balaji is a Principal Research Scientist at Meta AI, where he serves as the technical lead for two areas: (1) GPU training systems (architectural design and performance analysis); and (2) AI communication libraries for Meta’s various hardware systems (GPUs and Meta internal silicon). Dr. Balaji helped build some of Meta’s largest AI supercomputing systems, such as the recent Grand Teton architecture, which power Meta’s internal AI workloads, including recommendation and ranking models and generative AI models such as Llama.
Before joining Meta, Dr. Balaji held appointments as a Senior Computer Scientist and Group Lead at Argonne National Laboratory and as an Institute Fellow of the Northwestern-Argonne Institute of Science and Engineering at Northwestern University. He contributed to the design and software implementation of a number of projects on communication runtime systems (MPI, UCX), threading models (lightweight threads such as Argobots, OpenMP), and heterogeneous memory systems. Particularly noteworthy are the MPICH project (used by thousands of supercomputers around the world, including the three US Exascale supercomputers: Aurora, Frontier, and El Capitan), the UCX project (R&D100 award winner in 2019), and the Argobots project (R&D100 award finalist in 2020, and a driving piece of software for numerous supercomputers and commercial products such as Intel DAOS).
Dr. Balaji has held several other leadership roles in the community, serving on the board of directors or advisory board for numerous domestic and international projects, including UCX (US), Cilkplus (US), EPEEC (Europe), and Exascale Technologies (China). He has also served on the organizing committees and editorial boards of numerous high-profile conferences and journals, including IEEE/ACM SC (technical program chair), IEEE Cluster (general co-chair), IEEE/ACM CCGrid (general co-chair, program chair), and IEEE TPDS (associate editor-in-chief).