Job Opportunities
The Co-Design group at Facebook has a number of full-time openings for Research Scientist, Software Engineer, and Engineering Manager roles.
About the Team
The AI SW/HW Co-Design team is part of Facebook's Technology and Strategy organization. The team's mission is to explore, develop, and help productize high-performance software and hardware technologies for AI at datacenter scale. The team co-optimizes software (e.g., algorithms and numerics) and hardware (e.g., platform and network) to arrive at a balanced system design. Because developing new systems requires understanding the performance bottlenecks of existing ones, the team also optimizes the performance of AI models running in production. This work has yielded TCO wins across all key AI services.
The team makes an impact through both pathfinding and productization: it builds prototypes to demonstrate the value of new ideas, and it works closely with many partner teams to bring those ideas into production.
If you are interested in learning more about the kind of work we do, here are some of our recent publications:
- DL Models and benchmarks
- Algorithms
- Systems
- Quantization techniques
AI System Codesign Research Scientist or Software Engineer Roles
Roles and Responsibilities:
- HPC systems software expertise
- [For Software Engineer roles] Strong programming ability, spanning the spectrum from prototype software to production code
- [For Research Scientist roles] Proven track record of publishing in top-tier conferences
- Performance at the core: ability to identify and remove software and hardware performance bottlenecks
- Computing at scale: enabling large-scale software deployments in production
- Ability to work with, and develop software solutions for, cutting-edge (including off-market) hardware systems
- Ability to work across team boundaries
Qualifications/Experience/Skills Required (expertise in only a subset of these is fine):
- Experience with some subset of the following HPC systems software: accelerator (GPU/ASIC) kernel development and optimization (e.g., NVIDIA, AMD, Intel, or other accelerators); CPU-based threading models (e.g., OpenMP, TBB, Pthreads); HPC communication libraries (e.g., NCCL, MPI); parallel runtime systems; numerical libraries (e.g., mixed-precision linear algebra, tensor-based frameworks); performance enablement, tracing, profiling, and debugging
- Scientific computing or other forms of HPC with an AI/ML/DL
emphasis
- Extensive programming experience; knowledge of C/C++ and Python (highly skilled in at least one of them)
- Scaling research across different modalities
- Experience with AI workload optimizations (compression, quantization, and pruning techniques; graph-based systems)
- Experience with distributed AI training and inference
(data-center-scale distributed training/inference is preferred)
- Performance, programmability, and efficiency at datacenter scale (data-access optimizations such as prefetching and caching; scalable frameworks for efficient use of high-performance hardware; high-performance, fault-tolerant middleware; network and communication-fabric optimization)
To Apply
For full-time positions, please apply here. Unfortunately, the posting is shared among multiple teams, so if you are interested in the roles above, please also reach out to Pavan Balaji, Maxim Naumov, or Sam Naghshineh, expressing your interest and the specific kind of role you are looking for.
For internship positions: Bachelor's and Master's students, please apply here; Ph.D. students, please apply here.