Research Scientist and Technical Lead, Meta Platforms
PUBLICATIONS
James S. Dinan and Pavan Balaji. Scalable Computing and Communications: Theory and Practice. Chapter on Parallel Programming Models for Scalable Computing. Editors: Samee Ullah Khan, Lizhe Wang, and Albert Y. Zomaya. John Wiley & Sons Publishing, 2012.
Pavan Balaji, Darius T. Buntinas and Dries Kimpe. Scalable Computing and Communications: Theory and Practice. Chapter on Fault Tolerance Techniques for Scalable Computing. Editors: Samee Ullah Khan, Lizhe Wang, and Albert Y. Zomaya. John Wiley & Sons Publishing, 2012.
Pavan Balaji, Wu-chun Feng and Qian Zhu. Scalable Computing and Communications: Theory and Practice. Chapter on Virtualization Techniques for Graphics Processing Units. Editors: Samee Ullah Khan, Lizhe Wang, and Albert Y. Zomaya. John Wiley & Sons Publishing, 2012.
Dhabaleswar K. Panda, Pavan Balaji, Sayantan Sur and Matthew Koop. Attaining High Performance Communication: A Vertical Approach. Chapter on Commodity High Performance Interconnects. Editor: Ada Gavrilovska. CRC Press, 2009.
Wu-chun Feng and Pavan Balaji. Attaining High Performance Communication: A Vertical Approach. Chapter on Ethernet vs. Ethernot. Editor: Ada Gavrilovska. CRC Press, 2009.
Pavan Balaji, P. Sadayappan and Mohammad Kamrul Islam. Market-Oriented Grid and Utility Computing. Chapter on Techniques on Providing Hard Quality of Service Guarantees in Job Scheduling. Editors: Rajkumar Buyya and Kris Bubendorfer. Wiley Publishers, 2008.
Jianqiu Ge, Jintao Meng, Ning Guo, Yanjie Wei, Pavan Balaji, and Shengzhong Feng. Counting Kmers for Biological Sequences at Large Scale. International Journal of Interdisciplinary Sciences: Computational Life Sciences (INSC). pp. 99–108, March 2020.
Adrian Castello, Rafael Mayo Gual, Sangmin Seo, Pavan Balaji, Enrique S. Quintana-Orti, and Antonio J. Pena. Analysis of Threading Libraries for High Performance Computing. IEEE Transactions on Computers (TC), Vol. 69, Issue 9, pp. 1279–1292, 2020. [pdf]
Tao Gao, Yanfei Guo, Pietro Cicotti, Yutong Lu, Pavan Balaji, and Michela Taufer. Memory-Efficient and Skew-Tolerant MapReduce over MPI for Supercomputing Systems. IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 31, Issue 12, pp. 2734–2748, 2020. [pdf]
Sarunya Pumma, Min Si, Wu-chun Feng, and Pavan Balaji. Scalable Deep Learning via I/O Analysis and Optimization. ACM Transactions on Parallel Computing (ToPC), Vol. 6, Issue 2, 2019. [pdf]
Abdelhalim Amer, Huiwei Lu, Pavan Balaji, Milind Chabbi, Yanjie Wei, Jeff Hammond, and Satoshi Matsuoka. Lock Contention Management in Multithreaded MPI. ACM Transactions on Parallel Computing (ToPC), Vol. 5, Issue 3, 2018. [pdf]
Min Si, Antonio J. Pena, Jeffrey R. Hammond, Pavan Balaji, Masamichi Takagi, and Yutaka Ishikawa. Dynamic Adaptable Asynchronous Progress Model for MPI RMA Multiphase Applications. IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 29, Issue 9, pp. 1975–1989, 2018. [pdf]
Sangmin Seo, Abdelhalim Amer, Pavan Balaji, Cyril Bordage, George Bosilca, Alex Brooks, Philip Carns, Adrian Castello, Damien Genet, Thomas Herault, Shintaro Iwasaki, Prateek Jindal, Laxmikant V. Kale, Sriram Krishnamoorthy, Jonathan Lifflander, Huiwei Lu, Esteban Meneses, Marc Snir, Yanhua Sun, Kenjiro Taura, and Peter H. Beckman. Argobots: A Lightweight Low-Level Threading and Tasking Framework. IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 29, Issue 3, pp. 512–526, 2018. [pdf]
Adrian Castello, Rafael Mayo Gual, Kevin Sala, Vicenc Beltran, Pavan Balaji and Antonio J. Pena. On the Adequacy of Lightweight Thread Approaches for High-Level Parallel Programming Models. International Journal of Future Generation Computer Systems (FGCS), Vol. 84, pp. 22–31, 2018. [pdf]
Adrian Castello, Antonio J. Pena, Rafael Mayo Gual, Judit Planas, Enrique S. Quintana-Orti, and Pavan Balaji. Exploring the Interoperability of Remote GPGPU Virtualization using rCUDA and Directive-based Programming Models. Elsevier Journal of Supercomputing (JoS), Vol. 74, Issue 11, pp. 5628–5642, 2018. [pdf]
Andrew A. Chien, Pavan Balaji, Nan Dun, Aiman Fang, Hajime Fujita, Kamil Iskra, Zachary A. Rubenstein, Ziming Zheng, Jeffrey R. Hammond, Ignacio Laguna, David F. Richards, Anshu Dubey, Brian van Straalen, Mark Hoemmen, Michael A. Heroux, Keita Teranishi, and Andrew R. Siegel. Exploring Versioned Distributed Arrays for Resilience in Scientific Applications: Global View Resilience. International Journal of High Performance Computing Applications (JHPCA), 31(6), pp. 564–590, 2017. [pdf]
Boyu Zhang, Trilce Estrada, Pietro Cicotti, Pavan Balaji, Michela Taufer. Enabling Scalable and Accurate Clustering of Distributed Ligand Geometries on Supercomputers. International Parallel Computing (ParCo) Journal, Vol. 63, pp. 38–60, 2017. [pdf]
Ashwin M. Aji, Antonio J. Pena, Pavan Balaji, Wu-chun Feng. MultiCL: Enabling Automatic Scheduling for Task-Parallel Workloads in OpenCL. International Parallel Computing (ParCo) Journal, Vol. 58, pp. 37–55, 2016. [pdf]
James S. Dinan, Pavan Balaji, Darius T. Buntinas, David J. Goodell, William D. Gropp, Rajeev S. Thakur. An Implementation and Evaluation of the MPI 3.0 One-Sided Communication Interface. Journal of Concurrency and Computation: Practice and Experience (CCPE), Vol. 28, Issue 17, pp. 4385–4404, 2016. [pdf]
Humayun Arafat, James S. Dinan, Sriram Krishnamoorthy, Pavan Balaji, and P. Sadayappan. Work Stealing for GPU-accelerated Parallel Programs in a Global Address Space Framework. Journal of Concurrency and Computation: Practice and Experience (CCPE), Vol. 28, Issue 13, pp. 3637–3654, 2016. [pdf]
Saif Ur-Rehman Malik, Samee Ullah Khan, Nikos Tziritas, Joanna Kolodziej, Albert Y. Zomaya, Sajjad A. Madani, Nasro Min-Allah, Lizhe Wang, Cheng-Zhong Xu, Qutaibah Marwan Malluhi, Jonathan E. Pecero, Pavan Balaji, Abhinav Vishnu, Rajiv Ranjan, Sherali Zeadally, and Hongxiang Li. Performance Analysis of Data Intensive Cloud Systems based on Data Management and Replication: A Survey. Journal of Distributed and Parallel Databases (DAPD). Vol. 34, No. 2, pp. 179–215, 2016. [pdf]
Junaid Shuja, Kashif Bilal, Sajjad A. Madani, Mazliza Othman, Rajiv Ranjan, Pavan Balaji, and Samee Ullah Khan. Survey of Techniques and Architectures for Designing Energy Efficient Data Centers. IEEE Systems Journal. Vol. 10, No. 2, pp. 507–519, 2016. [pdf]
Abdul Hameed, Alireza Khoshkbarforoushha, Rajiv Ranjan, Prem Prakash Jayaraman, Joanna Kolodziej, Pavan Balaji, Sherali Zeadally, Qutaibah Marwan Malluhi, Nikos Tziritas, Abhinav Vishnu, Samee Ullah Khan, and Albert Y. Zomaya. A Survey and Taxonomy on Energy Efficient Resource Allocation Techniques for Cloud Computing Systems. Journal of Computing. Vol. 98, No. 7, pp. 751–774, 2016. [pdf]
Antonio J. Pena and Pavan Balaji. A Data-oriented Profiler to Assist in Data Partitioning and Distribution for Heterogeneous memory in HPC. International Parallel Computing (ParCo) Journal. Vol. 51, pp. 46–55, 2016. [pdf]
Torsten Hoefler, James S. Dinan, Rajeev S. Thakur, Brian Barrett, Pavan Balaji, William D. Gropp, and Keith Underwood. Remote Memory Access Programming in MPI-3. ACM Transactions on Parallel Computing (ToPC). Vol. 2, Number 2, July Issue, pages 9:1–9:26, 2015. [pdf]
Ashwin M. Aji, Lokendra S. Panwar, Feng Ji, Karthik Murthy, Milind Chabbi, Pavan Balaji, Keith R. Bisset, James S. Dinan, Wu-chun Feng, John Mellor-Crummey, Xiaosong Ma, and Rajeev S. Thakur. MPI-ACC: Accelerator-Aware MPI for Scientific Applications. IEEE Transactions on Parallel and Distributed Systems (TPDS). Vol. PP, Issue 99, pages 1–14, 2015. [pdf]
Ryan E. Grant, Mohammad Rashti, Pavan Balaji, and Ahmad Afsahi. Scalable Connectionless RDMA over Unreliable Datagrams. International Parallel Computing (ParCo) Journal. Vol. 48, pages 15–39, 2015. [pdf]
Jintao Meng, Bingqiang Wang, Yanjie Wei, Shengzhong Feng, and Pavan Balaji. SWAP-Assembler: Scalable and Efficient Genome Assembly Towards Thousands of Cores. BMC Bioinformatics Journal. Vol. 15 (Suppl. 9), 2014. [pdf]
James S. Dinan, Pavan Balaji, David J. Goodell, Douglas Miller, Marc Snir and Rajeev S. Thakur. Enabling Communication Concurrency Through Flexible MPI Endpoints. International Journal of High Performance Computing Applications (JHPCA); special issue for the Euro MPI Users’ Group Meeting (Euro MPI), Vol. 28, Issue 4, pp. 390–405. [pdf]
Marc Snir, Robert W. Wisniewski, Jacob A. Abraham, Sarita V. Adve, Saurabh Bagchi, Pavan Balaji, Jim Belak, Pradip Bose, Franck Cappello, Bill Carlson, Andrew A. Chien, Paul Coteus, Nathan A. Debardeleben, Pedro Diniz, Christian Engelmann, Mattan Erez, Saverio Fazzari, Al Geist, Rinku K. Gupta, Fred Johnson, Sriram Krishnamoorthy, Sven Leyffer, Dean Liberty, Subhashish Mitra, Todd Munson, Robert Schreiber, Jon Stearley, and Eric Van Hensbergen. Addressing Failures in Exascale Computing. International Journal of High Performance Computing Applications (JHPCA), Vol. 28, Issue 2, pp. 129–173, 2014. [pdf]
John Jenkins, James S. Dinan, Pavan Balaji, Tom Peterka, Nagiza F. Samatova, Rajeev S. Thakur. Processing MPI Derived Datatypes on Noncontiguous GPU-Resident Data. IEEE Transactions on Parallel and Distributed Systems (TPDS), pp. 2627–2637, Vol. 25, Issue 10, 2013. [pdf]
Torsten Hoefler, James S. Dinan, Darius T. Buntinas, Pavan Balaji, Brian Barrett, Ronald Brightwell, William D. Gropp, Vivek Kale and Rajeev S. Thakur. MPI+MPI: A New, Hybrid Approach to Parallel Programming with MPI Plus Shared Memory. Springer Journal of Computing; special issue for the Euro MPI Users’ Group Meeting (Euro MPI), pp. 1121–1136, Vol. 95, 2013. [pdf]
Hameed Hussain, Nasro Min-Allah, Samee Ullah Khan, Abdul Hameed, Saif Ur-Rehman Malik, Limin Zhang, Nasir Ghani, Joanna Kolodziej, Albert Y. Zomaya, Cheng-Zhong Xu, Pavan Balaji, Abhinav Vishnu, Frederic Pinel, Jonathan E. Pecero, Pascal Bouvry, and Ammar Rayes. A Survey on Resource Allocation in High Performance Distributed Computing Systems. International Parallel Computing (ParCo) Journal, pp. 709–736, Vol. 39, Issue 11, 2013. [pdf]
Jue Hong, Pavan Balaji, Gaojin Wen, Bibo Tu, Junming Yan, Cheng-Zhong Xu, and Shengzhong Feng. Implementation and Evaluation of Container-based Job Management for Fair Resource Sharing. Lecture Notes in Computer Science and General Issues; special issue for the International Supercomputing Conference (ISC), pp. 290–301, Vol. 7905, 2013. (conference: Jun. 16–20, 2013, Leipzig, Germany.) [pdf] [slides]
Abhinav Vishnu, Shuaiwen Song, Andres Marquez, Kevin Barker, Darren Kerbyson, Kirk W. Cameron and Pavan Balaji. Designing Energy Efficient Communication Runtime Systems: A View from PGAS Models. Journal of Supercomputing (JoS), pp. 691–709, Vol. 63, Issue 3, 2013. [pdf]
Giorgio Luigi Valentini, Walter Lassonde, Samee Ullah Khan, Nasro Min-Allah, Sajjad A. Madani, Juan Li, Limin Zhang, Lizhe Wang, Nasir Ghani, Joanna Kolodziej, Hongxiang Li, Albert Y. Zomaya, Cheng-Zhong Xu, Pavan Balaji, Abhinav Vishnu, Frederic Pinel, Jonathan E. Pecero, Dzimitry Kliazovich, and Pascal Bouvry. An Overview of Energy Efficiency Techniques in Cluster Computing Systems. Springer Journal of Cluster Computing; special issue on Green Computing and Communications, pp. 3–15, Vol. 16, Issue 1, 2013. [pdf]
Pavan Balaji, Rinku K. Gupta, Abhinav Vishnu and Peter H. Beckman. Mapping Communication Layouts to Network Hardware Characteristics on Massive-Scale Blue Gene Systems. Springer Journal of Computer Science on Research and Development; special issue for the International Supercomputing Conference (ISC), pp. 247–256, Vol. 26, Issue 3–4, 2011. (conference: Jun. 18–23, 2011, Hamburg, Germany.) [pdf] [slides]
Pavan Balaji, Darius T. Buntinas, David J. Goodell, William D. Gropp, Torsten Hoefler, Sameer Kumar, Ewing L. (Rusty) Lusk, Rajeev S. Thakur and Jesper Larsson Träff. MPI on Millions of Cores. Parallel Processing Letters (PPL) Journal; special issue for the Euro MPI Users’ Group Meeting (Euro MPI), pp. 45–60, Vol. 21, Issue 1, 2011. [pdf]
Pavan Balaji, Wu-chun Feng, Heshan Lin, Jeremy Archuleta, Satoshi Matsuoka, Andrew Warren, Joao Carlos Setubal, Ewing L. (Rusty) Lusk, Rajeev S. Thakur, Ian Foster, Daniel S. Katz, Shantenu Jha, Kevin Shinpaugh, Susan Coghlan, and Daniel A. Reed. Global-scale Distributed I/O with ParaMEDIC. Journal of Concurrency and Computation: Practice and Experience (CCPE), pp. 2266–2281, Vol. 22, Issue 16, 2010. [pdf]
Pavan Balaji, Anthony K. Chan, William D. Gropp, Rajeev S. Thakur and Ewing L. (Rusty) Lusk. The Importance of Non-Data-Communication Overheads in MPI. International Journal of High Performance Computing Applications (IJHPCA); special issue for the Euro MPI Users’ Group Meeting (Euro MPI), pp. 5–15, Vol. 24, Issue 1, 2010. [pdf]
Pavan Balaji, Darius T. Buntinas, David J. Goodell, William D. Gropp and Rajeev S. Thakur. Fine-Grained Multithreading Support for Hybrid Threaded MPI Programming. International Journal of High Performance Computing Applications (IJHPCA); special issue for the Euro MPI Users’ Group Meeting (Euro MPI), pp. 49–57, Vol. 24, Issue 1, 2010. [pdf]
Jesper Larsson Träff, Andreas Ripke, Christian Siebert, Pavan Balaji, Rajeev S. Thakur and William D. Gropp. A Pipelined Algorithm for Large, Irregular All-gather Problems. International Journal of High Performance Computing Applications (IJHPCA); special issue for the Euro MPI Users’ Group Meeting (Euro MPI), pp. 58–68, Vol. 24, Issue 1, 2010. [pdf]
Pavan Balaji, Anthony K. Chan, Rajeev S. Thakur, William D. Gropp and Ewing L. (Rusty) Lusk. Toward Message Passing for a Million Processes: Characterizing MPI on a Massive Scale Blue Gene/P. Springer Journal of Computer Science on Research and Development; special issue for the International Supercomputing Conference (ISC), pp. 11–19, Vol. 24, Issue 1, 2009. Best Paper Award at ISC. (conference: June 23–26, 2009, Hamburg, Germany.) [pdf] [slides]
Ping Lai, Pavan Balaji, Rajeev S. Thakur and Dhabaleswar K. Panda. ProOnE: A General Purpose Protocol Onload Engine for Multi- and Many-Core Architectures. Springer Journal of Computer Science on Research and Development; special issue for the International Supercomputing Conference (ISC), pp. 133–142, Vol. 23, Issue 3, 2009. (conference: June 23–26, 2009, Hamburg, Germany.) [pdf] [slides]
Pavan Balaji, Wu-chun Feng and Dhabaleswar K. Panda. Bridging the Ethernet-Ethernot Performance Gap. IEEE Micro Journal; special issue on High-Performance Interconnects, pp. 24–40, Vol. 26, Issue 3, 2006. [pdf]
Hyun-Wook Jin, Pavan Balaji, Chuck Yoo, Jin-Young Choi and Dhabaleswar K. Panda. Exploiting NIC Architectural Support for Enhancing IP based Protocols on High Performance Networks. Journal of Parallel and Distributed Computing (JPDC); special issue on Design and Performance of Networks for Super-, Cluster- and Grid-Computing, pp. 1348–1365, Vol. 65, Issue 11, 2005. [pdf]
Mohammad Kamrul Islam, Pavan Balaji, P. Sadayappan and Dhabaleswar K. Panda. QoPS: A QoS based scheme for Parallel Job Scheduling (extended journal version). IEEE Springer LNCS Journal Series, pp. 252–268, Vol. 2862, 2003. [pdf]
Kaiming Ouyang, Min Si, Atsushi Hori, Zizhong Chen and Pavan Balaji. CAB-MPI: Exploring Interprocess Work-Stealing towards Balanced MPI Communication. IEEE/ACM International Conference on High Performance Computing, Networking, Storage, and Analysis (SC). Nov. 9–19, 2020, virtual event. [pdf] [slides] [video]
Rohit Zambre, Aparna Chandramowlishwaran and Pavan Balaji. How I Learned to Stop Worrying about User-Visible Endpoints and Love MPI. International Conference on Supercomputing (ICS). June 29–July 2, 2020, Barcelona, Spain. [pdf] [slides] [video]
Shintaro Iwasaki, Abdelhalim Amer, Kenjiro Taura, Sangmin Seo, and Pavan Balaji. BOLT: Optimizing OpenMP Parallel Regions with User-Level Threads. International Conference on Parallel Architectures and Compilation Techniques (PACT). Sep. 21–25, 2019, Seattle, Washington. Best Paper Award. [pdf] [slides]
Xiaomin Zhu, Yunhui Zeng, Yanjie Wei, Shengzhong Feng, and Pavan Balaji. An Auto Code Generator for Stencil on SW26010. IEEE International Conference High Performance Computing and Communications (HPCC). Aug. 10–12, 2019, Zhangjiajie, China.
Seonmyeong Bak, Yanfei Guo, Pavan Balaji and Vivek Sarkar. Optimized Execution of Parallel Loops via User-Defined Scheduling Policies. International Conference on Parallel Processing (ICPP). Aug. 5–8, 2019, Kyoto, Japan. [pdf] [slides]
Joshua Davis, Tao Gao, Sunita Chandrasekaran, Heike Jagode, Anthony Danalis, Pavan Balaji, Jack Dongarra, and Michela Taufer. Characterization of Power Usage and Performance in Data-Intensive Applications using MapReduce over MPI. International Conference on Parallel Computing (ParCo). Sep. 10–13, 2019, Prague, Czech Republic. [pdf] [slides]
Abdelhalim Amer, Charles Archer, Michael Blocksome, Chongxiao Cao, Michael Chuvelev, Hajime Fujita, Maria Garzaran, Yanfei Guo, Jeffrey R. Hammond, Shintaro Iwasaki, Kenneth J. Raffenetti, Mikhail Shiryaev, Min Si, Kenjiro Taura, Sagar Thapaliya, and Pavan Balaji. Software Combining to Mitigate Multithreaded MPI Contention. ACM International Conference on Supercomputing (ICS). Jun. 26–28, 2019, Phoenix, Arizona. [pdf] [slides]
Tao Gao, Yanfei Guo, Boyu Zhang, Pietro Cicotti, Yutong Lu, Pavan Balaji, and Michela Taufer. On the Power of Combiner Optimizations in MapReduce over MPI Workflows. IEEE International Conference on Parallel and Distributed Systems (ICPADS). Dec. 11–13, 2018, Sentosa, Singapore. [pdf] [slides]
Rohit Zambre, Aparna Chandramowlishwaran and Pavan Balaji. Scalable Communication Endpoints for MPI+Threads Applications. IEEE International Conference on Parallel and Distributed Systems (ICPADS). Dec. 11–13, 2018, Sentosa, Singapore. Best Poster Award. [pdf] [slides] [poster]
Giuseppe Congiu and Pavan Balaji. Evaluating the Impact of High-Bandwidth Memory on MPI Communications. IEEE International Conference on Computer and Communications (ICCC). Dec. 7–10, 2018, Chengdu, China. [pdf] [slides]
Kenneth J. Raffenetti, Neelima Bayyapu, and Pavan Balaji. Locality-Aware PMI Usage for Efficient MPI Startup. IEEE International Conference on Computer and Communications (ICCC). Dec. 7–10, 2018, Chengdu, China. [pdf] [slides]
Shintaro Iwasaki, Abdelhalim Amer, Kenjiro Taura, and Pavan Balaji. Lessons Learned from Analyzing Dynamic Promotion for User-level Threading. IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC). Nov. 11–16, 2018, Dallas, Texas. [pdf] [slides]
Sudheer Chunduri, Scott Parker, Pavan Balaji, Kevin Harms, and Kalyan Kumaran. Characterization of MPI Usage on a Production Supercomputer. IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC). Nov. 11–16, 2018, Dallas, Texas. [pdf] [slides]
Atsushi Hori, Min Si, Balazs Gerofi, Masamichi Takagi, Jai Dayal, Pavan Balaji, and Yutaka Ishikawa. Process-in-Process: Techniques for Practical Address-Space Sharing. ACM International Conference on High Performance Distributed Computing (HPDC). Best Paper Award. Jun. 11–15, 2018, Tempe, Arizona. [pdf] [slides]
Seyed Hessamedin Mirsadeghi, Jesper Larsson Traff, Pavan Balaji and Ahmad Afsahi. Exploiting Common Neighborhoods to Optimize MPI Neighborhood Collectives. IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC). Dec. 18–21, 2017, Jaipur, India. [pdf] [slides]
Tao Gao, Yanfei Guo, Yanjie Wei, Bingqiang Wang, Yutong Lu, Pietro Cicotti, Pavan Balaji and Michela Taufer. Bloomfish: A Highly Scalable Distributed K-mer Counting Framework. IEEE International Conference on Parallel and Distributed Systems (ICPADS). Dec. 15–17, 2017, Shenzhen, China. [pdf] [slides]
Sarunya Pumma, Min Si, Wu-chun Feng and Pavan Balaji. Parallel I/O Optimizations for Scalable Deep Learning. IEEE International Conference on Parallel and Distributed Systems (ICPADS). Dec. 15–17, 2017, Shenzhen, China. [pdf] [slides]
Robert Latham, Leonardo Arturo Bautista Gomez and Pavan Balaji. Portable Topology-Aware MPI-I/O. IEEE International Conference on Parallel and Distributed Systems (ICPADS). Dec. 15–17, 2017, Shenzhen, China. [pdf] [slides]
Lena Oden and Pavan Balaji. Hexe: A Toolkit for Heterogeneous Memory Management. IEEE International Conference on Parallel and Distributed Systems (ICPADS). Dec. 15–17, 2017, Shenzhen, China. [pdf] [slides]
Min Si and Pavan Balaji. Process-based Asynchronous Progress Model for MPI Point-To-Point Communication. IEEE International Conference on High Performance Computing and Communications (HPCC). Dec. 18–20, 2017, Bangkok, Thailand. [pdf] [slides]
Sarunya Pumma, Min Si, Wu-chun Feng and Pavan Balaji. Towards Scalable Deep Learning via I/O Analysis and Optimization. IEEE International Conference on High Performance Computing and Communications (HPCC). Dec. 18–20, 2017, Bangkok, Thailand. [pdf] [slides]
Kenneth J. Raffenetti, Abdelhalim Amer, Lena Oden, Charles Archer, Wesley Bland, Hajime Fujita, Yanfei Guo, Tomislav Janjusic, Dmitry Durnov, Michael Blocksome, Min Si, Sangmin Seo, Akhil Langer, Gengbin Zheng, Masamichi Takagi, Paul Coffman, Jithin Jose, Sayantan Sur, Alexander Sannikov, Sergey Oblomov, Michael Chuvelev, Masayuki Hatanaka, Xin Zhao, Paul Fischer, Thilina Rathnayake, Matt Otten, Misun Min, and Pavan Balaji. Why is MPI so Slow? Analyzing the Fundamental Limits in Implementing MPI-3.1. IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC). Nov. 12–17, 2017, Denver, Colorado. [pdf] [slides]
Xiaohui Duan, Kai Xu, Yuandong Chan, Christian Hundt, Bertil Schmidt, Pavan Balaji and Weiguo Liu. S-Aligner: Ultrascalable read mapping on Sunway Taihu Light. IEEE International Conference on Cluster Computing (Cluster). Sep. 5–8, 2017, Hawaii, USA. [pdf] [slides]
Adrian Castello, Sangmin Seo, Rafael Mayo Gual, Pavan Balaji, Enrique S. Quintana-Orti, and Antonio J. Pena. GLT: A Unified API for Lightweight Thread Libraries. International European Conference on Parallel and Distributed Computing (EuroPar). Aug. 28–Sep. 1, 2017, Santiago, Spain. [pdf] [slides]
Adrian Castello, Sangmin Seo, Rafael Mayo Gual, Pavan Balaji, Enrique S. Quintana-Orti, and Antonio J. Pena. GLTO: On the Adequacy of Lightweight Thread Approaches for OpenMP Implementations. International Conference on Parallel Processing (ICPP). Aug. 14–17, 2017, Bristol, United Kingdom. [pdf] [slides]
Hoang-Vu Dang, Sangmin Seo, Abdelhalim Amer, and Pavan Balaji. Advanced Thread Synchronization for Multithreaded MPI Implementations. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). May 14–17, 2017, Madrid, Spain. [pdf] [slides]
Nikela Papadopoulou, Lena Oden, and Pavan Balaji. A Performance Study of UCX over InfiniBand. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). May 14–17, 2017, Madrid, Spain. [pdf] [slides]
Yanfei Guo, Charles Archer, Michael Blocksome, Scott Parker, Wesley Bland, Kenneth J. Raffenetti, and Pavan Balaji. Memory Compression Techniques for Network Address Management in MPI. IEEE International Parallel and Distributed Processing Symposium (IPDPS). May 29–June 2, 2017, Orlando, Florida. [pdf] [slides]
Tao Gao, Yanfei Guo, Boyu Zhang, Pietro Cicotti, Yutong Lu, Pavan Balaji, and Michela Taufer. Mimir: Memory-Efficient and Scalable MapReduce for Large Supercomputing Systems. IEEE International Parallel and Distributed Processing Symposium (IPDPS). May 29–June 2, 2017, Orlando, Florida. [pdf] [slides]
Jichi Guo, Qing Yi, Jiayuan Meng, Junchao Zhang, and Pavan Balaji. Compiler-Assisted Overlapping of Communication and Computation in MPI Applications. IEEE International Conference on Cluster Computing (Cluster). Sep. 12–16, 2016, Taipei, Taiwan. [pdf] [slides]
Adrian Castello, Antonio J. Pena, Sangmin Seo, Rafael Mayo, Pavan Balaji, and Enrique S. Quintana-Orti. A Review of Lightweight Thread Approaches for High Performance Computing. IEEE International Conference on Cluster Computing (Cluster). Sep. 12–16, 2016, Taipei, Taiwan. [pdf] [slides]
Xin Zhao, Pavan Balaji, and William D. Gropp. Scalability Challenges in Current MPI One-Sided Implementations. International Symposium on Parallel and Distributed Computing (ISPDC). Jul. 8–10, 2016, Fuzhou, China. [pdf] [slides]
Sayan Ghosh, Jeffrey R. Hammond, Antonio J. Pena, Pavan Balaji, Assefaw Gebremedhin, and Barbara Chapman. One-Sided Interface for Matrix Operations using MPI-3 RMA: A Case Study with Elemental. International Conference on Parallel Processing (ICPP). Aug. 16–19, 2016, Philadelphia, Pennsylvania. [pdf] [slides]
Jintao Meng, Sangmin Seo, Pavan Balaji, Yanjie Wei, Bingqiang Wang and Shengzhong Feng. SWAP-Assembler 2: Optimization of De Novo Genome Assembler at Extreme Scale. International Conference on Parallel Processing (ICPP). Aug. 16–19, 2016, Philadelphia, Pennsylvania. [pdf] [slides]
Hajime Fujita, Kamil Iskra, Pavan Balaji, and Andrew A. Chien. Versioning Architectures for Local and Global Memory. IEEE International Conference on Parallel and Distributed Systems (ICPADS). Dec. 16–19, 2015, Melbourne, Australia. [pdf] [slides]
Antonio J. Pena, Wesley Bland, and Pavan Balaji. VOCL-FT: Introducing Techniques for Efficient Soft Error Coprocessor Recovery. IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC). Nov. 15–20, 2015, Austin, Texas. [pdf] [slides]
Yanfei Guo, Wesley Bland, Pavan Balaji, and Xiaobo Zhou. Fault Tolerant MapReduce-MPI for HPC Clusters. IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC). Nov. 15–20, 2015, Austin, Texas. [pdf] [slides]
Karthikeyan Vaidyanathan, Dhiraj D. Kalamkar, Kiran Pamnany, Jeffrey R. Hammond, Pavan Balaji, Dipankar Das, Jongsoo Park, and Balint Joo. Improving Concurrency and Asynchrony in Multithreaded MPI Applications using Software Offloading. IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC). Nov. 15–20, 2015, Austin, Texas. [pdf]
Huiwei Lu, Sangmin Seo, and Pavan Balaji. MPI+ULT: Overlapping Communication and Computation with User-Level Threads. IEEE International Conference on High Performance Computing and Communications (HPCC). Aug. 24–26, 2015, New York, USA. [pdf] [slides]
Ashwin M. Aji, Antonio J. Pena, Pavan Balaji and Wu-chun Feng. Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL. IEEE International Conference on Cluster Computing (Cluster). Sep. 8–11, 2015, Chicago, USA. [pdf] [slides]
Adrian Castello, Antonio J. Pena, Rafael Mayo Gual, Pavan Balaji and Enrique S. Quintana-Orti. Exploring the Suitability of Remote GPGPU Virtualization for the OpenACC Programming Model Using rCUDA. Short paper. IEEE International Conference on Cluster Computing (Cluster). Sep. 8–11, 2015, Chicago, USA. [pdf] [slides]
Hajime Fujita, Kamil Iskra, Pavan Balaji and Andrew A. Chien. Empirical Comparison of Three Versioning Architectures. Short paper. IEEE International Conference on Cluster Computing (Cluster). Sep. 8–11, 2015, Chicago, USA. [pdf] [slides]
Andrew A. Chien, Pavan Balaji, Peter H. Beckman, Nan Dun, Aiman Fang, Hajime Fujita, Kamil Iskra, Zachary A. Rubenstein, Ziming Zheng, Robert Schreiber, Jeffrey R. Hammond, James S. Dinan, Ignacio Laguna, David F. Richards, Anshu Dubey, Brian van Straalen, Mark Hoemmen, Michael Heroux, Keita Teranishi and Andrew R. Siegel. Versioned Distributed Arrays for Resilience in Scientific Applications: Global View Resilience. International Conference on Computational Science (ICCS). June 1–3, 2015, Reykjavik, Iceland. [pdf] [slides]
Min Si, Antonio J. Pena, Jeffrey R. Hammond, Pavan Balaji, Masamichi Takagi and Yutaka Ishikawa. Casper: An Asynchronous Progress Model for MPI RMA on Many-Core Architectures. IEEE International Parallel and Distributed Processing Symposium (IPDPS). May 25–29, 2015, Hyderabad, India. [pdf] [slides]
Abdelhalim Amer, Huiwei Lu, Yanjie Wei, Pavan Balaji and Satoshi Matsuoka. MPI+Threads: Runtime Contention and Remedies. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). Feb. 7–11, 2015, San Francisco, California. [pdf] [slides]
David Ozog, Allen D. Malony, Jeffrey R. Hammond and Pavan Balaji. WorkQ: A Many-Core Producer/Consumer Execution Model Applied to PGAS Computations. IEEE International Conference on Parallel and Distributed Systems (ICPADS). Dec. 16–19, 2014, Hsinchu, Taiwan. [pdf] [slides]
Judicael A. Zounmevo, Xin Zhao, Pavan Balaji, William D. Gropp, and Ahmad Afsahi. Nonblocking Epochs in MPI One-Sided Communication. IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC). Best Paper Finalist. Nov. 16–21, 2014, New Orleans, Louisiana. [pdf] [slides]
Zhezhe Chen, James S. Dinan, Zhen Tang, Pavan Balaji, Hua Zhong, Jun Wei, Tao Huang, and Feng Qin. MC-Checker: Detecting Memory Consistency Errors in MPI One-Sided Applications. IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC). Nov. 16–21, 2014, New Orleans, Louisiana. [pdf] [slides]
Antonio J. Pena and Pavan Balaji. Toward the Efficient Use of Multiple Explicitly Managed Memory Subsystems. IEEE International Conference on Cluster Computing (Cluster). Sep. 22–26, 2014, Madrid, Spain. [pdf] [slides]
Junchao Zhang, Bill Long, Kenneth J. Raffenetti, and Pavan Balaji. Implementing the MPI-3.0 Fortran 2008 Binding. The Euro MPI Users’ Group Conference (Euro MPI/Asia). Sep. 9–12, 2014, Kyoto, Japan. [pdf] [slides]
Min Si, Antonio J. Pena, Pavan Balaji, Masamichi Takagi and Yutaka Ishikawa. MT-MPI: Multithreaded MPI for Many-core Environments. ACM International Conference on Supercomputing (ICS). June 10–13, 2014, Munich, Germany. [pdf] [slides]
Chaoran Yang, Wesley Bland, Pavan Balaji, and John Mellor-Crummey. Portable, MPI-Interoperable Coarray Fortran. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). Feb. 15–19, 2014, Orlando, Florida. [pdf] [slides]
Xin Zhao, Pavan Balaji, William D. Gropp, Rajeev S. Thakur. Optimization Strategies for MPI-Interoperable Active Messages. IEEE International Conference on Scalable Computing and Communications (ScalCom). Best Paper Award. Dec. 21–22, 2013, Chengdu, China. [pdf] [slides]
Lokendra S. Panwar, Ashwin M. Aji, Jiayuan Meng, Pavan Balaji, and Wu-chun Feng. Online Performance Projection for Clusters with Heterogeneous GPUs. IEEE International Conference on Parallel and Distributed Systems (ICPADS). Dec. 15–18, 2013, Seoul, Korea. [pdf] [slides]
Xin Zhao, Pavan Balaji, William D. Gropp, and Rajeev S. Thakur. MPI-Interoperable Generalized Active Messages. IEEE International Conference on Parallel and Distributed Systems (ICPADS). Dec. 15–18, 2013, Seoul, Korea. [pdf] [slides]
Pavan Balaji and Dries Kimpe. On the Reproducibility of MPI Reduction Operations. IEEE International Conference on High Performance Computing and Communications (HPCC). Nov. 13–15, 2013, Zhangjiajie, China. [pdf] [slides]
David Ozog, Jeffrey R. Hammond, James S. Dinan, Pavan Balaji, Sameer Shende, and Allen D. Malony. Inspector-Executor Load Balancing Algorithms for Block-Sparse Tensor Contractions. International Conference on Parallel Processing (ICPP). Oct. 1–4, 2013, Lyon, France. [pdf] [slides]
Md. Ziaul Haque Olive, Qing Yi, James S. Dinan, and Pavan Balaji. Enhancing Performance Portability of MPI Applications Through Annotation-Based Transformations. International Conference on Parallel Processing (ICPP). Oct. 1–4, 2013, Lyon, France. [pdf] [slides]
Antonio J. Pena, Ralf Gunter Correa Carvalho, James S. Dinan, Pavan Balaji, Rajeev S. Thakur and William D. Gropp. Analysis of Topology-Dependent MPI Performance on Gemini Networks. The Euro MPI Users’ Group Conference (EuroMPI). Sep. 15–18, 2013, Madrid, Spain. [pdf] [slides]
James S. Dinan, Pavan Balaji, David J. Goodell, Douglas Miller, Marc Snir and Rajeev S. Thakur. Enabling MPI Interoperability Through Flexible Communication Endpoints. The Euro MPI Users’ Group Conference (EuroMPI). Sep. 15–18, 2013, Madrid, Spain. [pdf] [slides]
Palden Lama, Yan Li, Ashwin M. Aji, Pavan Balaji, James S. Dinan, Shucai Xiao, Yunquan Zhang, Wu-chun Feng, Rajeev S. Thakur and Xiaobo Zhou. pVOCL: Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments. International Conference on Distributed Computing Systems (ICDCS). July 8–11, 2013, Philadelphia, Pennsylvania. [pdf] [slides]
Ashwin M. Aji, Lokendra S. Panwar, Feng Ji, Milind Chabbi, Karthik Murthy, Pavan Balaji, Keith R. Bisset, James S. Dinan, Wu-chun Feng, John Mellor-Crummey, Xiaosong Ma and Rajeev S. Thakur. On the Efficacy of GPU-Integrated MPI for Scientific Applications. ACM International Symposium on High Performance Parallel and Distributed Computing (HPDC). Jun. 17–21, 2013, New York, New York. [pdf] [slides]
Xin Zhao, Darius T. Buntinas, Judicael A. Zounmevo, James S. Dinan, David J. Goodell, Pavan Balaji, Rajeev S. Thakur, Ahmad Afsahi and William D. Gropp. Towards Asynchronous, MPI-Interoperable Active Messages. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). May 13–16, 2013, Delft, Netherlands. [pdf] [slides]
Jing Zhang, Heshan Lin, Pavan Balaji and Wu-chun Feng. Optimizing Burrows-Wheeler Transform-Based Sequence Alignment on Multicore Architectures. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). May 13–16, 2013, Delft, Netherlands. [pdf]
Jeffrey R. Hammond, James S. Dinan, Pavan Balaji, Ivo Kabadshow, Sreeram Potluri and Vinod Tipparaju. OSPRI: An Optimized One-Sided Communication Runtime for Leadership-Class Machines. International Conference on Partitioned Global Address Space Programming Models (PGAS). Oct. 10–12, 2012, Santa Barbara, California. [pdf] [slides]
John Jenkins, James S. Dinan, Pavan Balaji, Nagiza F. Samatova and Rajeev S. Thakur. Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments. IEEE International Conference on Cluster Computing (Cluster). Sep. 28–30, 2012, Beijing, China. [pdf] [slides]
Torsten Hoefler, James S. Dinan, Darius T. Buntinas, Pavan Balaji, Brian Barrett, Ronald Brightwell, William D. Gropp, Vivek Kale, Rajeev S. Thakur. Leveraging MPI’s One-Sided Communication Interface for Shared-Memory Programming. The Euro MPI Users’ Group Conference (EuroMPI). Sep. 23–26, 2012, Vienna, Austria. [pdf] [slides]
James S. Dinan, David J. Goodell, William D. Gropp, Rajeev S. Thakur, and Pavan Balaji. Efficient Multithreaded Context ID Allocation in MPI. The Euro MPI Users’ Group Conference (EuroMPI). Sep. 23–26, 2012, Vienna, Austria. [pdf] [slides]
Feng Ji, Ashwin M. Aji, James S. Dinan, Darius T. Buntinas, Pavan Balaji, Rajeev S. Thakur, Wu-chun Feng and Xiaosong Ma. DMA-Assisted, Intranode Communication in GPU Accelerated Systems. IEEE International Conference on High Performance Computing and Communications (HPCC). June 25–27, 2012, Liverpool, UK. [pdf] [slides]
Ashwin M. Aji, James S. Dinan, Darius T. Buntinas, Pavan Balaji, Wu-chun Feng, Keith R. Bisset and Rajeev S. Thakur. MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-Based Systems. IEEE International Conference on High Performance Computing and Communications (HPCC). June 25–27, 2012, Liverpool, UK. [pdf] [slides]
James S. Dinan, Pavan Balaji, Jeffrey R. Hammond, Sriram Krishnamoorthy and Vinod Tipparaju. Supporting the Global Arrays PGAS Model Using MPI One-Sided Communication. IEEE International Parallel and Distributed Processing Symposium (IPDPS). May 21–25, 2012, Shanghai, China. [pdf] [slides]
Shucai Xiao, Pavan Balaji, James S. Dinan, Qian Zhu, Rajeev S. Thakur, Susan Coghlan, Heshan Lin, Gaojin Wen, Jue Hong and Wu-chun Feng. Transparent Accelerator Migration in a Virtualized GPU Environment. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). May 13–16, 2012, Ottawa, Canada. [pdf] [slides]
Shucai Xiao, Pavan Balaji, Qian Zhu, Rajeev S. Thakur, Susan Coghlan, Heshan Lin, Gaojin Wen, Jue Hong and Wu-chun Feng. VOCL: An Optimized Environment for Transparent Virtualization of Graphics Processing Units. International Conference on Innovative Parallel Computing (InPar). May 12–14, 2012, San Jose, California. [pdf] [slides]
Rui Wang, Erlin Yao, Pavan Balaji, Darius T. Buntinas, Mingyu Chen and Guangming Tan. Building Algorithmically Nonstop Fault Tolerant MPI Programs. IEEE International Conference on High Performance Computing (HiPC). Dec. 18–21, 2011, Bangalore, India. [pdf] [slides]
Gaojin Wen, Jue Hong, Cheng-Zhong Xu, Pavan Balaji, Shengzhong Feng and Pingchuang Jiang. Energy-aware Hierarchy Scheduling of Applications in Large Scale Data Centers. International Conference on Cloud and Service Computing (CSC). Dec. 12–14, 2011, Hong Kong, China. [pdf] [slides]
James S. Dinan, Sriram Krishnamoorthy, Pavan Balaji, Jeffrey R. Hammond, Manoj Krishnan, Vinod Tipparaju and Abhinav Vishnu. Noncollective Communicator Creation in MPI. The Euro MPI Users’ Group Conference (EuroMPI); special session on Improving MPI User and Developer Interaction (IMUDI). Sep. 18–21, 2011, Santorini, Greece. [pdf] [slides]
Mohammad J. Rashti, Jonathan Green, Pavan Balaji, Ahmad Afsahi and William D. Gropp. Multi-core and Network Aware MPI Topology Functions. The Euro MPI Users’ Group Conference (EuroMPI). Sep. 18–21, 2011, Santorini, Greece. [pdf] [slides]
Ryan E. Grant, Mohammad J. Rashti, Pavan Balaji and Ahmad Afsahi. RDMA Capable iWARP over Datagrams. IEEE International Parallel and Distributed Processing Symposium (IPDPS). May 16–20, 2011, Anchorage, Alaska. [pdf] [slides]
Mohammad J. Rashti, Ryan E. Grant, Pavan Balaji and Ahmad Afsahi. iWARP Redefined: Scalable Connectionless Communication over High-Speed Ethernet. IEEE International Conference on High Performance Computing (HiPC). Dec. 19–22, 2010, Goa, India. [pdf] [slides]
Abhinav Vishnu, Huub Van Dam, Wibe De Jong, Pavan Balaji and Shuaiwen Song. Fault Tolerant Communication Runtime Support for Data Centric Programming Models. IEEE International Conference on High Performance Computing (HiPC). Dec. 19–22, 2010, Goa, India. [pdf] [slides]
Yang Jiao, Heshan Lin, Pavan Balaji and Wu-chun Feng. Power and Performance Characterization of Computational Kernels on the GPU. IEEE/ACM International Conference on Green Computing and Communications (GreenCom). Dec. 18–20, 2010, Hangzhou, China. [pdf] [slides]
Abhinav Vishnu, Shuaiwen Song, Andres Marquez, Kevin Barker, Darren Kerbyson, Kirk W. Cameron, Pavan Balaji. Designing Energy Efficient Communication Runtime Systems for Data Centric Programming Models. IEEE/ACM International Conference on Green Computing and Communications (GreenCom). Dec. 18–20, 2010, Hangzhou, China. [pdf] [slides]
David J. Goodell, Pavan Balaji, Darius T. Buntinas, Gabor Dozsa, William D. Gropp, Sameer Kumar, Bronis R. de Supinski and Rajeev S. Thakur. Minimizing MPI Resource Contention in Multithreaded Multicore Environments. IEEE International Conference on Cluster Computing (Cluster). Sep. 20–24, 2010, Heraklion, Crete, Greece. [pdf] [slides]
Pavan Balaji, Darius T. Buntinas, David J. Goodell, William D. Gropp, Jayesh Krishna, Ewing L. (Rusty) Lusk and Rajeev S. Thakur. PMI: A Scalable Parallel Process-Management Interface for Extreme-Scale Systems. The Euro MPI Users’ Group Conference (Euro MPI). Sep. 12–15, 2010, Stuttgart, Germany. [pdf] [slides]
Gabor Dozsa, Sameer Kumar, Pavan Balaji, Darius T. Buntinas, David J. Goodell, William D. Gropp, Joseph Ratterman and Rajeev S. Thakur. Enabling Concurrent Multithreaded MPI Communication on Multicore Petascale Systems. The Euro MPI Users’ Group Conference (Euro MPI). Sep. 12–15, 2010, Stuttgart, Germany. [pdf] [slides]
Jayesh Krishna, Pavan Balaji, Ewing L. (Rusty) Lusk, Rajeev S. Thakur and Fab Tiller. Implementing MPI on Windows: Comparison with Common Approaches on Unix. The Euro MPI Users’ Group Conference (Euro MPI). Sep. 12–15, 2010, Stuttgart, Germany. [pdf] [slides]
Ryan E. Grant, Pavan Balaji and Ahmad Afsahi. A Study of Hardware Assisted IP over InfiniBand and its Impact on Enterprise Data Center Performance. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). Mar. 28–30, 2010, White Plains, NY. [pdf] [slides]
James S. Dinan, Pavan Balaji, Ewing L. (Rusty) Lusk, P. Sadayappan and Rajeev S. Thakur. Hybrid Parallel Programming with MPI and Unified Parallel C. ACM International Conference on Computing Frontiers (CF). May 17–19, 2010, Bertinoro, Italy. [pdf] [slides]
Pavan Balaji, Harish Naik and Narayan Desai. Understanding Network Saturation Behavior on Large-Scale Blue Gene/P Systems. IEEE International Conference on Parallel and Distributed Systems (ICPADS). Dec. 8–10, 2009, Shenzhen, China. [pdf] [slides]
Ryan E. Grant, Ahmad Afsahi and Pavan Balaji. An Evaluation of ConnectX Virtual Protocol Interconnect for Data Centers. IEEE International Conference on Parallel and Distributed Systems (ICPADS). Dec. 8–10, 2009, Shenzhen, China. [pdf] [slides]
Ajeet Singh, Pavan Balaji and Wu-chun Feng. GePSeA: A General-Purpose Software Acceleration Framework for Lightweight Task Offloading. International Conference on Parallel Processing (ICPP). Sep. 22–25, 2009, Vienna, Austria. [pdf] [slides]
Narayan Desai, Darius T. Buntinas, Daniel Buettner, Pavan Balaji and Anthony K. Chan. Improving Resource Availability by Relaxing Network Allocation Constraints on the Blue Gene/P. International Conference on Parallel Processing (ICPP). Sep. 22–25, 2009, Vienna, Austria. [pdf] [slides]
Pavan Balaji, Darius T. Buntinas, David J. Goodell, William D. Gropp, Sameer Kumar, Ewing L. (Rusty) Lusk, Rajeev S. Thakur and Jesper Larsson Träff. MPI on a Million Processors. The Euro PVM/MPI Users’ Group Conference (Euro PVM/MPI). Outstanding Paper Award. Sep. 7–10, 2009, Espoo, Finland. [pdf] [slides]
Gopalakrishnan Santhanaraman, Pavan Balaji, Karthik Gopalakrishnan, Rajeev S. Thakur, William D. Gropp and Dhabaleswar K. Panda. Natively Supporting True One-sided Communication in MPI on Multi-core Systems with InfiniBand. IEEE International Symposium on Cluster Computing and the Grid (CCGrid). May 18–21, 2009, Shanghai, China. [pdf] [slides]
Pavan Balaji, Sitha Bhagvat, Rajeev S. Thakur and Dhabaleswar K. Panda. Sockets Direct Protocol for Hybrid Network Stacks: A Case Study with iWARP over 10G Ethernet. International Conference on High Performance Computing (HiPC). Dec. 17–20, 2008, Bangalore, India. [pdf] [slides]
Anthony K. Chan, Pavan Balaji, William D. Gropp and Rajeev S. Thakur. Communication Analysis of Parallel 3D FFT for Flat Cartesian Meshes on Large Blue Gene Systems. International Conference on High Performance Computing (HiPC). Dec. 17–20, 2008, Bangalore, India. [pdf] [slides]
Mithlesh Kumar, Vineeta Chaube, Pavan Balaji, Wu-chun Feng and Hyun-Wook Jin. Making a Case for Proactive Flow Control in Optical Circuit-Switched Networks. International Conference on High Performance Computing (HiPC). Dec. 17–20, 2008, Bangalore, India. [pdf] [slides]
Heshan Lin, Pavan Balaji, Ruth Poole, Carlos Sosa, Xiaosong Ma and Wu-chun Feng. Massively Parallel Genomic Sequence Search on the Blue Gene/P Architecture. IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC). Nov. 15–21, 2008, Austin, Texas. [pdf] [slides]
Thomas R. W. Scogland, Ganesh Narayanaswamy, Pavan Balaji and Wu-chun Feng. Asymmetric Interactions in Symmetric Multi-core Systems: Analysis, Enhancements and Evaluation. IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC). Nov. 15–21, 2008, Austin, Texas. [pdf] [slides]
Narayan Desai, Pavan Balaji, P. Sadayappan and Mohammad Kamrul Islam. Are Non-Blocking Networks Really Needed for High-End-Computing Workloads? IEEE International Conference on Cluster Computing (Cluster). Best Paper Award. Sep. 29 – Oct. 1, 2008, Tsukuba, Japan. [pdf] [slides]
Pavan Balaji, Anthony K. Chan, William D. Gropp, Rajeev S. Thakur and Ewing L. (Rusty) Lusk. Non-Data-Communication Overheads in MPI: Analysis on Blue Gene/P. The Euro PVM/MPI Users’ Group Conference (Euro PVM/MPI). Outstanding Paper Award. Sep. 7–10, 2008, Dublin, Ireland. [pdf] [slides]
Pavan Balaji, Darius T. Buntinas, David J. Goodell, William D. Gropp and Rajeev S. Thakur. Toward Efficient Support for Multithreaded MPI Communication. The Euro PVM/MPI Users’ Group Conference (Euro PVM/MPI). Sep. 7–10, 2008, Dublin, Ireland. [pdf] [slides]
Jesper Larsson Träff, Andreas Ripke, Christian Siebert, Pavan Balaji, Rajeev S. Thakur and William D. Gropp. A Simple, Pipelined Algorithm for Large, Irregular All-gather Problems. The Euro PVM/MPI Users’ Group Conference (Euro PVM/MPI). Sep. 7–10, 2008, Dublin, Ireland. [pdf] [slides]
Ganesh Narayanaswamy, Pavan Balaji and Wu-chun Feng. Impact of Network Sharing in Multi-core Architectures. IEEE International Conference on Computer Communication and Networks (ICCCN). Aug. 3–7, 2008, St. Thomas, U.S. Virgin Islands. [pdf] [slides]
Pavan Balaji, Wu-chun Feng and Heshan Lin. Semantics-based Distributed I/O with the ParaMEDIC Framework. ACM/IEEE International Symposium on High Performance Distributed Computing (HPDC). Jun. 23–27, 2008, Boston, Massachusetts. [pdf] [slides]
Pavan Balaji, Wu-chun Feng, Heshan Lin, Jeremy Archuleta, Satoshi Matsuoka, Andrew Warren, Joao Carlos Setubal, Ewing L. (Rusty) Lusk, Rajeev S. Thakur, Ian Foster, Daniel S. Katz, Shantenu Jha, Kevin Shinpaugh, Susan Coghlan and Daniel A. Reed. Distributed I/O with ParaMEDIC: Experiences with a Worldwide Supercomputer. International Supercomputing Conference (ISC). Outstanding Paper Award. Jun. 17–20, 2008, Dresden, Germany. [pdf] [slides]
Pavan Balaji, Wu-chun Feng, Jeremy Archuleta, Heshan Lin, Rajkumar Kettimuthu, Rajeev S. Thakur and Xiaosong Ma. Semantics-based Distributed I/O for mpiBLAST. Short paper. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). Feb. 20–23, 2008, Salt Lake City, Utah. [pdf] [poster]
Pavan Balaji, Wu-chun Feng, Sitha Bhagvat, Dhabaleswar K. Panda, Rajeev S. Thakur and William D. Gropp. Analyzing the Impact of Supporting Out-of-Order Communication on In-order Performance with iWARP. IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC). Nov. 10–16, 2007, Reno, Nevada. [pdf] [slides]
Pavan Balaji, Wu-chun Feng, Jeremy Archuleta and Heshan Lin. ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing. IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC). Storage Challenge Award. Nov. 10–16, 2007, Reno, Nevada. [pdf] [slides]
Pavan Balaji, Sitha Bhagvat, Dhabaleswar K. Panda, Rajeev S. Thakur and William D. Gropp. Advanced Flow-control Mechanisms for the Sockets Direct Protocol over InfiniBand. IEEE International Conference on Parallel Processing (ICPP). Sep. 10–14, 2007, Xi’an, China. [pdf] [slides]
Mohammad Kamrul Islam, Pavan Balaji, Gerald Sabin and P. Sadayappan. Analyzing and Minimizing the Impact of Opportunity Cost in QoS-aware Job Scheduling. IEEE International Conference on Parallel Processing (ICPP). Sep. 10–14, 2007, Xi’an, China. [pdf] [slides]
Ganesh Narayanaswamy, Pavan Balaji and Wu-chun Feng. An Analysis of 10-Gigabit Ethernet Protocol Stacks in Multicore Environments. IEEE International Symposium on High-Performance Interconnects (HotI). Aug. 22–24, 2007, Palo Alto, California. [pdf] [slides]
Pavan Balaji, Darius T. Buntinas, Satish Balay, Barry F. Smith, Rajeev S. Thakur and William D. Gropp. Nonuniformly Communicating Noncontiguous Data: A Case Study with PETSc and MPI. IEEE International Parallel and Distributed Processing Symposium (IPDPS). Mar. 26–30, 2007, Long Beach, California. [pdf] [slides]
Pavan Balaji, Wu-chun Feng, Qi Gao, Ranjit Noronha, Weikuan Yu and Dhabaleswar K. Panda. Head-to-TOE Comparison for High Performance Sockets over Protocol Offload Engines. IEEE International Conference on Cluster Computing (Cluster). Sep. 26–30, 2005, Boston, Massachusetts. [pdf] [slides]
Wu-chun Feng, Pavan Balaji, Christopher Baron, Laxmi N. Bhuyan and Dhabaleswar K. Panda. Performance Characterization of a 10-Gigabit Ethernet TOE. IEEE International Symposium on High Performance Interconnects (HotI). Aug. 17–19, 2005, Palo Alto, California. [pdf] [slides]
Sundeep Narravula, Pavan Balaji, Karthikeyan Vaidyanathan, Hyun-Wook Jin and Dhabaleswar K. Panda. Architecture for Caching Responses with Multiple Dynamic Dependencies in Multi-Tier Data-Centers over InfiniBand. IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid). May 9–12, 2005, Cardiff, UK. [pdf] [slides]
Pavan Balaji, Karthikeyan Vaidyanathan, Sundeep Narravula, Hyun-Wook Jin and Dhabaleswar K. Panda. On the Provision of Prioritization and Soft QoS in Dynamically Reconfigurable Shared Data-Centers over InfiniBand. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). Mar. 20–22, 2005, Austin, Texas. [pdf] [slides]
Mohammad Kamrul Islam, Pavan Balaji, P. Sadayappan and Dhabaleswar K. Panda. Towards Provision of Quality of Service Guarantees in Job Scheduling. IEEE International Conference on Cluster Computing (Cluster). Sep. 20–23, 2004, San Diego, California. [pdf] [slides]
Pavan Balaji, Sundeep Narravula, Karthikeyan Vaidyanathan, Savitha Krishnamoorthy, Jiesheng Wu and Dhabaleswar K. Panda. Sockets Direct Protocol over InfiniBand in Clusters: Is it Beneficial? IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). Mar. 10–12, 2004, Austin, Texas. [pdf] [slides]
Rohan Kurian, Pavan Balaji and P. Sadayappan. Opportune Job Shredding: An Efficient Approach for Scheduling Parameter Sweep Applications. Los Alamos Computer Science Institute (LACSI) Symposium. Oct. 12–14, 2003, Santa Fe, New Mexico. [pdf] [slides]
Pavan Balaji, Jiesheng Wu, Tahsin Kurc, Ümit V. Çatalyürek, Dhabaleswar K. Panda and Joel Saltz. Impact of High Performance Sockets on Data Intensive Applications. IEEE International Symposium on High Performance Distributed Computing (HPDC). Jun. 22–24, 2003, Seattle, Washington. [pdf] [slides]
Rinku K. Gupta, Pavan Balaji, Jarek Nieplocha and Dhabaleswar K. Panda. Efficient Collective Operations using Remote Memory Operations on VIA-based Clusters. IEEE International Parallel and Distributed Processing Symposium (IPDPS). Apr. 22–26, 2003, Nice, France. [pdf] [slides]
Pavan Balaji, Piyush Shivam, Peter Wyckoff and Dhabaleswar K. Panda. High Performance User-level Sockets over Gigabit Ethernet. IEEE International Conference on Cluster Computing (Cluster). Sep. 23–26, 2002, Chicago, Illinois. [pdf] [slides]
Jan Ciesko, Noah Evans, Stephen Olivier, Howard Pritchard, Shintaro Iwasaki, Kenneth J. Raffenetti, and Pavan Balaji. Implementing Flexible Threading Support in Open MPI. International Workshop on Exascale MPI (ExaMPI). Nov. 13th, 2020, Atlanta, Georgia. [pdf] [slides]
Abdelhalim Amer, Satoshi Matsuoka, Miquel Pericas, Naoya Maruyama, Kenjiro Taura, Rio Yokota and Pavan Balaji. Scaling FMM with Data-Driven OpenMP Tasks on Multicore Architectures. International Workshop on OpenMP (IWOMP). Oct. 5-7, 2016, Nara, Japan. [pdf] [slides]
Xiaomin Zhu, Junchao Zhang, Kazutomo Yoshii, Shigang Li, Yunquan Zhang, and Pavan Balaji. Analyzing MPI-3.0 Process-Level Shared Memory: A Case Study with Stencil Computations. Workshop on Parallel Programming Model for the Masses (PPMM); held in conjunction with IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid). May 4, 2015, Shenzhen, China. [pdf] [slides]
Abdelhalim Amer, Huiwei Lu, Pavan Balaji, and Satoshi Matsuoka. Characterizing MPI and Hybrid MPI+Threads Applications at Scale: Case Study with BFS. Workshop on Parallel Programming Model for the Masses (PPMM); held in conjunction with IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid). May 4, 2015, Shenzhen, China. [pdf] [slides]
Sangmin Seo, Robert Latham, Junchao Zhang, and Pavan Balaji. Implementation and Evaluation of MPI Nonblocking Collective I/O. Workshop on Parallel Programming Model for the Masses (PPMM); held in conjunction with IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid). May 4, 2015, Shenzhen, China. [pdf] [slides]
Akio Shimada, Atsushi Hori, Yutaka Ishikawa and Pavan Balaji. User-level Process towards Exascale Systems. Information Processing Society of Japan (IPSJ) workshop. Dec. 9th, 2014. [pdf] [slides]
Wesley Bland, Kenneth J. Raffenetti and Pavan Balaji. Simplifying the Recovery Model of User-Level Failure Mitigation. International Workshop on Exascale MPI (ExaMPI); held in conjunction with the IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC). Nov. 17th, 2014, New Orleans, Louisiana. [pdf] [slides]
Antonio J. Pena and Pavan Balaji. A Framework for Tracking Memory Accesses in Scientific Applications. International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2); held in conjunction with the International Conference on Parallel Processing (ICPP). Sep. 12th, 2014, Minneapolis, Minnesota. [pdf] [slides]
Ralf Gunter Correa Carvalho, David J. Goodell, James S. Dinan, and Pavan Balaji. Optimizing Charm++ over MPI. Annual Workshop on Charm++ and its Applications. April 15-16, 2013, Urbana-Champaign, Illinois. [pdf] [slides]
Ashwin M. Aji, Pavan Balaji, James S. Dinan, Wu-chun Feng and Rajeev S. Thakur. Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming. Workshop on Accelerators and Hybrid Exascale Systems (AsHES); held in conjunction with the IEEE International Parallel and Distributed Processing Symposium (IPDPS). May 20th, 2013, Boston, Massachusetts. [pdf] [slides]
Rinku K. Gupta, Kamil Iskra, Kazutomo Yoshii, Peter H. Beckman and Pavan Balaji. Introspective Fault Tolerance for Exascale Systems. U.S. Department of Energy Advanced Scientific Computing Research, OS and Runtime Technical Council Workshop. Oct. 4–5, 2012, Washington, DC. [pdf] [slides]
Feng Ji, James S. Dinan, Darius T. Buntinas, Pavan Balaji, Xiaosong Ma and Wu-chun Feng. Optimizing GPU-to-GPU intra-node communication in MPI. Workshop on Accelerators and Hybrid Exascale Systems (AsHES); held in conjunction with the IEEE International Parallel and Distributed Processing Symposium (IPDPS). May 25th, 2012, Shanghai, China. [pdf] [slides]
Jeffrey A. Stuart, Pavan Balaji, and John D. Owens. Extending MPI to Accelerators. Workshop on Architectures and Systems for Big Data (ASBD); held in conjunction with the International Conference on Parallel Architectures and Compilation Techniques (PACT). Oct. 10th, 2011, Galveston Island, Texas. [pdf] [slides]
Abhinav Vishnu, Manoj Krishnan and Pavan Balaji. Dynamic Time-Variant Connection Management for PGAS Models on InfiniBand. Workshop on Communication Architecture for Scalable Systems (CASS); held in conjunction with the IEEE International Parallel and Distributed Processing Symposium (IPDPS). May 16th, 2011, Anchorage, Alaska. [pdf] [slides]
Pavan Balaji, Sitha Bhagvat, Hyun-Wook Jin and Dhabaleswar K. Panda. Asynchronous Zero-copy Communication for Synchronous Sockets in the Sockets Direct Protocol (SDP) over InfiniBand. Workshop on Communication Architecture for Clusters (CAC); held in conjunction with the IEEE International Parallel and Distributed Processing Symposium (IPDPS). Apr. 25th, 2006, Rhodes Island, Greece. [pdf] [slides]
Venkatram Vishwanath, Pavan Balaji, Wu-chun Feng, Jason Leigh, Dhabaleswar K. Panda. A Case for UDP Offload Engines in LambdaGrids. Workshop on Protocols for Fast Long-Distance Networks (PFLDnet). Feb. 2-3, 2006, Nara, Japan. [pdf] [slides]
Pavan Balaji, Hyun-Wook Jin, Karthikeyan Vaidyanathan and Dhabaleswar K. Panda. Supporting iWARP Compatibility and Features for Regular Network Adapters. Workshop on Remote Direct Memory Access (RDMA): Applications, Implementations and Technologies (RAIT); held in conjunction with IEEE International Conference on Cluster Computing (Cluster). Sep. 26th, 2005, Boston, Massachusetts. [pdf] [slides]
Hyun-Wook Jin, Sundeep Narravula, Gregory Brown, Karthikeyan Vaidyanathan, Pavan Balaji and Dhabaleswar K. Panda. Performance Evaluation of RDMA over IP Networks: A Study with the Ammasso Gigabit Ethernet NIC. Workshop on High Performance Interconnects for Distributed Computing (HPI-DC); held in conjunction with IEEE International Symposium on High Performance Distributed Computing (HPDC). Jul. 24th, 2005, Research Triangle Park, North Carolina. [pdf] [slides]
Karthikeyan Vaidyanathan, Pavan Balaji, Hyun-Wook Jin and Dhabaleswar K. Panda. Workload driven analysis of File Systems in Shared Multi-Tier Data-Centers over InfiniBand. Workshop on Computer Architecture Evaluation using Commercial Workloads (CAECW); held in conjunction with IEEE International Symposium on High Performance Computer Architecture (HPCA). Feb. 12th, 2005, San Francisco, California. [pdf] [slides]
Pavan Balaji, Hemal V. Shah and Dhabaleswar K. Panda. Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck. Workshop on Remote Direct Memory Access (RDMA): Applications, Implementations and Technologies (RAIT); held in conjunction with IEEE International Conference on Cluster Computing (Cluster). Sep. 20th, 2004, San Diego, California. [pdf] [slides]
Pavan Balaji, Karthikeyan Vaidyanathan, Sundeep Narravula, Savitha Krishnamoorthy, Hyun-Wook Jin and Dhabaleswar K. Panda. Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand. Workshop on Remote Direct Memory Access (RDMA): Applications, Implementations and Technologies (RAIT); held in conjunction with IEEE International Conference on Cluster Computing (Cluster). Sep. 20th, 2004, San Diego, California. [pdf] [slides]
Sundeep Narravula, Pavan Balaji, Karthikeyan Vaidyanathan, Savitha Krishnamoorthy, Jiesheng Wu and Dhabaleswar K. Panda. Supporting Strong Coherency for Active Caches in Multi-Tier Data-Centers over InfiniBand. Workshop on System Area Networks (SAN); held in conjunction with IEEE International Symposium on High Performance Computer Architecture (HPCA). Feb. 14th, 2004, Madrid, Spain. [pdf] [slides]
Mohammad Kamrul Islam, Pavan Balaji, P. Sadayappan and Dhabaleswar K. Panda. QoPS: A QoS based scheme for Parallel Job Scheduling. Job Scheduling Strategies for Parallel Processing (JSSPP) workshop; held in conjunction with IEEE International Symposium on High Performance Distributed Computing (HPDC). Jun. 24th, 2003, Seattle, Washington. [pdf] [slides]
Sarunya Pumma, Min Si, Wu-chun Feng and Pavan Balaji. I/O Bottleneck Investigation in Deep Learning Systems. International Conference on Parallel Processing (ICPP). Best Student Poster Award. Aug. 13–16, 2018, Eugene, Oregon. [pdf] [poster]
Sarunya Pumma, Min Si, Wu-chun Feng and Pavan Balaji. Parallel I/O Optimizations for Scalable Deep Learning. The Euro MPI Users’ Group Conference (Euro MPI/USA). Sep. 25–28, 2017, Chicago, USA.
Shintaro Iwasaki, Abdelhalim Amer, Kenjiro Taura and Pavan Balaji. Optimistic Threading Techniques for MPI+ULT. The Euro MPI Users’ Group Conference (Euro MPI/USA). Sep. 25–28, 2017, Chicago, USA.
Rohit Zambre, Abdelhalim Amer, Aparna Chandramowlishwaran and Pavan Balaji. Evaluating Multiple Endpoints for MPI with libibverbs. The Euro MPI Users’ Group Conference (Euro MPI/USA). Sep. 25–28, 2017, Chicago, USA.
Wesley Bland, Huiwei Lu, Sangmin Seo, and Pavan Balaji. Lessons Learned Implementing User Level Failure Mitigation in MPICH. IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid). May 4–7, 2015, Shenzhen, China. [pdf] [slides] [poster]
Kenneth J. Raffenetti, Antonio J. Pena, and Pavan Balaji. Toward Implementing Robust Support for Portals 4 Networks in MPICH. IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid). May 4–7, 2015, Shenzhen, China. [pdf] [slides] [poster]
Antonio J. Pena and Pavan Balaji. Understanding Data Access Patterns Using Object-Differentiated Memory Profiling. IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid). May 4–7, 2015, Shenzhen, China. [pdf] [slides] [poster]
Xin Zhao, Pavan Balaji and William D. Gropp. Runtime Support for Irregular Computation in MPI-Based Applications. Doctoral Symposium. IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid). May 4–7, 2015, Shenzhen, China. [pdf] [slides] [poster]
Min Si, Pavan Balaji and Yutaka Ishikawa. Techniques for Enabling Highly Efficient Message Passing on Many-Core Architectures. Doctoral Symposium. IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid). May 4–7, 2015, Shenzhen, China. [pdf] [slides] [poster]
Jintao Meng, Yanjie Wei, Sangmin Seo, and Pavan Balaji. SWAP-Assembler 2: Scalable Genome Assembler towards Millions of Cores – Practice and Experience. Doctoral Symposium. IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid). May 4–7, 2015, Shenzhen, China.
Hajime Fujita, Nan Dun, Aiman Fang, Zachary A. Rubenstein, Ziming Zheng, Kamil Iskra, Jeffrey R. Hammond, Anshu Dubey, Pavan Balaji, Andrew A. Chien. Using Global View Resilience (GVR) to add Resilience to Exascale Applications. IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC). Best Poster Finalist. Nov. 16–21, 2014, New Orleans, Louisiana. [pdf] [poster]
Min Si, Yutaka Ishikawa, and Pavan Balaji. Optimizing MPI Implementation on Massively Parallel Many-Core Architectures. IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC). Doctoral Symposium Early Research Showcase. Nov. 17–22, 2013, Denver, Colorado. [pdf] [poster]
David Ozog, Jeffrey R. Hammond, James S. Dinan, Pavan Balaji, Sameer Shende, Allen D. Malony. Inspector-Executor Load Balancing Algorithms for Block-Sparse Tensor Contractions. ACM International Conference on Supercomputing (ICS). June 10–14, 2013, Eugene, Oregon. [pdf]
Zachary A. Rubenstein, Hajime Fujita, Guoming Lu, Aiman Fang, Ziming Zheng, Andrew A. Chien, Pavan Balaji, Kamil Iskra, Peter H. Beckman, James S. Dinan, Jeffrey R. Hammond, Robert Schreiber. The Global View Resilience Model. Greater Chicago Area System Research Workshop (GCASR). May 3rd, 2013, Evanston, Illinois. [poster]
Jintao Meng, Bingqiang Wang, Yanjie Wei, Shengzhong Feng, Jiefeng Cheng and Pavan Balaji. SWAP-Assembler: A Scalable De Bruijn Graph Based Assembler for Massive Genome Data. International Conference on Research in Computational Molecular Biology (RECOMB). Apr. 7–10, 2013, Beijing, China. [poster]
James S. Dinan, Pavan Balaji, Jeffrey R. Hammond, Sriram Krishnamoorthy, and Vinod Tipparaju. High-Level, One-Sided Programming Models on MPI: A Case Study with Global Arrays and NWChem. IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC). Nov. 12–18, 2011, Seattle, Washington. [pdf] [poster]
Jeffrey R. Hammond, Sreeram Potluri, Zheng (Cynthia) Gu, Alex Dickson, James S. Dinan, Ivo Kabadshow, Pavan Balaji, and Vinod Tipparaju. Fast One-Sided Communication on Supercomputers and Application to Three Scientific Codes. IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC). Nov. 12–18, 2011, Seattle, Washington. [poster]
Rajeev S. Thakur, Pavan Balaji, Darius T. Buntinas, David J. Goodell, William D. Gropp, Torsten Hoefler, Sameer Kumar, Ewing L. (Rusty) Lusk and Jesper Larsson Träff. MPI at Exascale. Department of Energy SciDAC workshop. Jul. 11-15th, 2010, Chattanooga, Tennessee. [pdf] [slides]
Wu-chun Feng, Pavan Balaji and Ajeet Singh. Network Interface Cards as First-Class Citizens. Workshop on The Influence of I/O on Microprocessor Architecture (IOM); held in conjunction with the IEEE International Symposium on High Performance Computer Architecture (HPCA). Feb. 15th, 2009, Raleigh, North Carolina. [pdf] [slides]
Karthikeyan Vaidyanathan, Sundeep Narravula, Pavan Balaji and Dhabaleswar K. Panda. Designing Efficient Systems Services and Primitives for Next-Generation Data-Centers. Workshop on the National Science Foundation Next Generation Software (NSFNGS) Program; held in conjunction with the IEEE International Parallel and Distributed Processing Symposium (IPDPS). Mar. 26th, 2007, Long Beach, California. [pdf] [slides]
Pavan Balaji, Karthikeyan Vaidyanathan, Sundeep Narravula, Hyun-Wook Jin and Dhabaleswar K. Panda. Designing Next Generation Data-centers with Advanced Communication Protocols and Systems Services. Workshop on the National Science Foundation Next Generation Software (NSFNGS) Program; held in conjunction with the IEEE International Parallel and Distributed Processing Symposium (IPDPS). Apr. 25th, 2006, Rhodes Island, Greece. [pdf] [slides]
Jintao Meng, Ning Guo, Jianqiu Ge, Yanjie Wei, and Pavan Balaji. Scalable Assembly for Massive Genomic Graphs. Scalable Computing Challenge Finalist. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). May 14–17, 2017, Madrid, Spain. [pdf] [slides]
Boyu Zhang, Trilce Estrada, Pietro Cicotti, Pavan Balaji, Michela Taufer. Accurate Scoring of Drug Conformations at the Extreme Scale. Scalable Computing Challenge Winner. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). May 4–7, 2015, Shenzhen, China. [pdf] [slides]
Min Si, Antonio J. Pena, Jeffrey R. Hammond, Pavan Balaji, and Yutaka Ishikawa. Scaling NWChem with Efficient and Portable Asynchronous Communication in MPI RMA. Scalable Computing Challenge Finalist. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). May 4–7, 2015, Shenzhen, China. [pdf] [slides]
Sonia R. Sachs, Katherine Yelick, Saman Amarasinghe, Mary Hall, Richard Lethin, Keshav Pingali, Dan Quinlan, Vivek Sarkar, John Shalf, Robert Lucas, Pavan Balaji, Pedro C. Diniz, Alice Koniges, and Marc Snir. Exascale Programming Challenges Workshop Report. The ASCR Programming Models Workshop, July, 2011. [pdf]
Jack A. Gilbert, Folker Meyer, Dion Antonopoulos, Pavan Balaji, Christopher T. Brown, Narayan Desai, Jonathan A. Eisen, Dirk Evers, Dawn Field, Wu-chun Feng, Daniel Huson, Janet Jansson, Rob Knight, James Knight, Eugene Kolker, Kostas Konstantinidis, Joel Kostka, Nikos Kyrpides, Rachel Mackelprang, Alice McHardy, Christopher Quince, Jeroen Raes, Alexander Sczyrba, Ashley Shade, and Rick Stevens. Meeting Report: The Terabase Metagenomics Workshop and the Vision of an Earth Microbiome Project. Institute of Computing in Science (ICiS) Workshop on the Earth Microbiome Project (EMP), 2010. [pdf]
Karthikeyan Vaidyanathan, Sitha Bhagvat, Pavan Balaji and Dhabaleswar K. Panda. Understanding the Significance of Network Performance in End Applications: A Case Study with EtherFabric and InfiniBand. Technical Report, OSU-CISRC-2/06-TR19, The Ohio State University. Feb, 2006. [pdf]
Karthikeyan Vaidyanathan, Pavan Balaji, Jiesheng Wu, Hyun-Wook Jin and Dhabaleswar K. Panda. An Architectural Study of Cluster-based Multi-tier Data-Centers. Technical Report, OSU-CISRC-5/04-TR25, The Ohio State University. May, 2004. [pdf]
Savitha Krishnamoorthy, Pavan Balaji, Karthikeyan Vaidyanathan, Hyun-Wook Jin and Dhabaleswar K. Panda. Dynamic Reconfigurability Support for providing Soft Quality of Service Guarantees in Multi-Tier Data-Centers over InfiniBand. Technical Report, OSU-CISRC-2/04-TR10, The Ohio State University. Feb, 2004. [pdf]
MPI: A Message-Passing Interface Standard, Version 3.1. The Message Passing Interface Forum, Jun. 4th, 2015. [pdf]
MPI: A Message-Passing Interface Standard, Version 3.0. The Message Passing Interface Forum, Sep. 21st, 2012. [pdf]
MPI: A Message-Passing Interface Standard, Version 2.2. The Message Passing Interface Forum, Sep. 4th, 2009. [pdf]
MPI: A Message-Passing Interface Standard, Version 2.1. The Message Passing Interface Forum, Jun. 23rd, 2008. [pdf]
Abhinav Vishnu, Pavan Balaji, and Yong Chen. Guest Editors’ Introduction. Special Issue on Parallel Programming Models and Systems Software with the International Journal of Supercomputing (JoS), 2014.
Zhiyi Huang and Pavan Balaji. Guest Editors’ Introduction. Special Issue on Programming Models and Applications for Multicores and Manycores with the International Parallel Computing (ParCo) journal, 2013.
Yong Chen, Pavan Balaji and Abhinav Vishnu. Guest Editors’ Introduction. Special Issue on Parallel Programming Models and Systems Software with the International Parallel Computing (ParCo) journal, 2013.
Pavan Balaji and Satoshi Matsuoka. Guest Editors’ Introduction. Special Issue on Applications for the Heterogeneous Computing Era with the International Journal of High Performance Computing Applications (IJHPCA), 2013.
Pavan Balaji and Rajkumar Buyya. Guest Editors’ Introduction. Special Issue on Cluster, Cloud and Grid Computing with the International Journal of Future Generation Computer Systems (FGCS), 2013.
Abhinav Vishnu, Pavan Balaji and Yong Chen. Guest Editors’ Introduction. Special Issue on Programming Models and Systems Software with the International Journal of Supercomputing, 2012.
Pavan Balaji and Jiayuan Meng. Guest Editors’ Introduction. Special Issue on Applications for the Heterogeneous Computing Era with the International Journal of High Performance Computing Applications (IJHPCA), 2012.
Pavan Balaji and Abhinav Vishnu. Guest Editors’ Introduction. Special Issue on Programming Models, Software and Tools for High-End Computing with the International Journal of High Performance Computing Applications (IJHPCA), 2011.
Pavan Balaji and Abhinav Vishnu. Guest Editors’ Introduction. Special Issue on Programming Models and Systems Software Support for High-End Computing Applications with the International Journal of High Performance Computing Applications (IJHPCA), 2010.
Wu-chun Feng and Pavan Balaji. Guest Editors’ Introduction. Special Issue on Tools and Environments for Multicore and Many-core Architectures with IEEE Computer, 2009.