Pavan Balaji

Research Scientist and Technical Lead, Meta Platforms

About Me

I am a Research Scientist and Technical Lead at Meta Platforms, where I drive two separate technical directions: (1) early industry partnerships for developing large-scale supercomputers in support of our AI workloads; and (2) design and optimization of HPC networks, communication libraries, and their supporting ecosystem. I contribute to the AI hardware/software codesign aspects of the Meta ecosystem in a pathfinding research role. I help codesign our AI application requirements for communication, network, and scale-up/scale-out with various hardware and software solutions (both external and homegrown), and I influence vendor architecture and software roadmaps to align with Meta’s long-term workload requirements.

Before joining Meta, I held appointments as a Senior Computer Scientist and Group Lead at Argonne National Laboratory and as an Institute Fellow of the Northwestern-Argonne Institute of Science and Engineering at Northwestern University. At Argonne, I led two groups. The first group focused on Programming Models and Runtime Systems and mainly covered areas related to communication runtime systems, threading models, accelerator models, big data and cloud computing systems, and other related technologies. The second group focused on Future Architectures for AI and mainly covered technologies related to AI, machine learning, and deep learning, particularly in the area of AI for science. I also contributed to the design and software implementation of a number of projects on communication runtime systems (MPI, UCX), threading models (lightweight threads such as Argobots, OpenMP), and heterogeneous memory systems. Particularly noteworthy are the MPICH project (used by thousands of supercomputers around the world, including the three upcoming US Exascale supercomputers: Aurora, Frontier, and El Capitan), the UCX project (R&D100 award winner in 2019), and the Argobots project (R&D100 award finalist in 2020, and a driving piece of software for numerous supercomputers and commercial products such as Intel DAOS).

I have held several other leadership roles in the community, serving on the board of directors or advisory board for numerous domestic and international projects, including UCX (US), Cilkplus (US), EPEEC (Europe), Exascale Technologies (China), and various others. I have also served as a general chair, technical program chair, or editor for numerous high-profile conferences and journals, including IEEE/ACM SC in 2019 (technical program chair), IEEE Cluster in 2015 (general co-chair), and IEEE/ACM CCGrid in 2015 (general co-chair), among others. I also served as an advisor for Argonne’s computing facility on the programming-model aspects of its upcoming supercomputers, including Aurora, which is expected to be one of the early US Exascale supercomputers.