Pavan Balaji

Applied Research Scientist, Facebook

About Me

I am an Applied Research Scientist, Technical Lead, and Manager at Facebook, where I lead the HPC Network Communications and Early Industry Partnerships Group. Before joining Facebook, I held appointments as a Senior Computer Scientist and Group Lead at Argonne National Laboratory and as an Institute Fellow of the Northwestern-Argonne Institute of Science and Engineering at Northwestern University.

At Facebook, I contribute to the AI hardware/software codesign aspects of the Facebook ecosystem in a pathfinding research role. I am responsible for two broad areas: (1) codesigning our AI application requirements for communication, network, and scale-up/scale-out with various hardware and software solutions (both external and homegrown); and (2) influencing and codesigning vendor architecture and software roadmaps to align with Facebook’s long-term workload requirements.

In my previous job at Argonne, I led two groups. The first group focused on Programming Models and Runtime Systems and mainly covered areas related to communication runtime systems, threading models, accelerator models, big data and cloud computing systems, and other related technologies. The second group focused on Future Architectures for AI and mainly covered technologies related to AI, machine learning, and deep learning, particularly in the area of AI for science. I also contributed to the design and software implementation of a number of projects on communication runtime systems (MPI, UCX), threading models (lightweight threads such as Argobots, OpenMP), and heterogeneous memory systems. Particularly noteworthy are the MPICH project (used by thousands of supercomputers around the world, including the three upcoming US Exascale supercomputers: Aurora, Frontier, and El Capitan), the UCX project (R&D100 award winner in 2019), and the Argobots project (R&D100 award finalist in 2020, and a driving piece of software for numerous supercomputers and commercial products such as Intel DAOS).

I have held several other leadership roles in the community, serving on the board of directors or advisory board for numerous domestic and international projects, including UCX (US), Cilkplus (US), EPEEC (Europe), and Exascale Technologies (China). I have also served as a general chair, technical program chair, or editor for numerous high-profile conferences and journals, including IEEE/ACM SC in 2019 (technical program chair), IEEE Cluster in 2015 (general co-chair), and IEEE/ACM CCGrid in 2015 (general co-chair). I also served as an advisor for Argonne’s computing facility on the programming models aspects of its upcoming supercomputers, including Aurora, which is expected to be one of the first US Exascale supercomputers.