
Manager - AI ML Software Dev. Triton Compiler

Advanced Micro Devices, Inc.
$150,560.00/Yr.-$225,840.00/Yr.
United States, Texas, Austin
7171 Southwest Parkway
Apr 19, 2025


WHAT YOU DO AT AMD CHANGES EVERYTHING

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences - the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world's most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

AMD together we advance_

THE ROLE:

This Manager - AI ML Triton Compiler role offers the unique opportunity to shape the future of AI and machine learning by optimizing deep learning frameworks and GPU performance for AMD's cutting-edge hardware. As the Manager of Triton Compiler, you'll be at the forefront of enhancing GPU kernels for some of the most important workloads while driving innovation in high-performance computing environments. Join a dynamic team where your work will directly impact the performance and scalability of AI models on some of the most powerful GPUs in the industry.

THE PERSON:

We are looking for a visionary leader with deep technical expertise in GPU kernel development, compiler development, enhancement and optimization, and AI/ML frameworks. The ideal candidate is a hands-on problem solver, with a proven track record of managing high-performance software teams and delivering groundbreaking solutions that push the boundaries of deep learning technology.

KEY RESPONSIBILITIES:

  • Develop and Optimize Triton Compiler: In-depth experience in enhancing and optimizing compiler infrastructure including Triton, MLIR, LLVM and related AMD tooling and libraries.
  • Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations.
  • Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance.
  • Collaborate with GPU Library Teams: Experience in collaborating closely with internal teams to analyze and improve training and inference performance on AMD GPUs.
  • Collaborate with Open-Source Maintainers: Strong experience working with framework maintainers to ensure code changes are aligned with requirements and integrated upstream.
  • Work in Distributed Computing Environments: Expert in optimizing deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems.
  • Utilize Cutting-Edge Compiler Tech: Understand how to effectively leverage advanced compiler technologies to improve deep learning performance.
  • Optimize Deep Learning Pipeline: Expert at enhancing the full pipeline, including integrating graph compilers.
  • Software Engineering Best Practices: Knowledgeable about sound engineering principles to ensure robust, maintainable solutions, and able to guide their application strategically and effectively.
  • Lead and Manage Software Development Teams: Experienced in leading and managing a team of software developers, focusing on compiler optimization for deep learning workloads. Able to foster a collaborative and innovative environment, ensuring the successful execution of complex software projects.

PREFERRED EXPERIENCE:

  • GPU Kernel Development & Optimization: Strong technical experience in designing and optimizing GPU kernels for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM). Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming to maximize performance for AI operations, leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance.
  • Deep Learning Integration: Expert level experience in integrating optimized GPU performance into machine learning frameworks such as TensorFlow and PyTorch, with a focus on accelerating model training and inference while maximizing scaling and throughput.
  • Software Engineering: Expert in Python and C++, with deep experience in debugging, performance tuning, and test design to ensure high-quality, maintainable software solutions in AI and ML contexts.
  • Compiler Optimization: Solid understanding of compiler theory, with hands-on experience using tools like LLVM and ROCm for kernel and system-level performance optimization. Ability to apply compiler techniques to deep learning workloads to enhance overall efficiency.
  • Leadership & Team Management: Demonstrated experience in leading, mentoring, and growing teams of software developers, fostering a collaborative culture that encourages innovation and excellence. Strong skills in project management, resource allocation, and delivering results under tight deadlines.
  • Professional Experience: Professional experience in technical software development, with a focus on GPU optimization, performance engineering, and/or framework development.

ACADEMIC CREDENTIALS:

  • Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field.

Benefits offered are described in AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.
