
AI Systems Engineer

Lenovo
United States, North Carolina, Morrisville
Dec 10, 2025


General Information
Req #
WD00091874
Career area:
Artificial Intelligence
Country/Region:
United States of America
State:
North Carolina
City:
Morrisville
Date:
Wednesday, December 10, 2025
Working time:
Full-time
Additional Locations:
* United States of America - North Carolina - Morrisville

Why Work at Lenovo
We are Lenovo. We do what we say. We own what we do. We WOW our customers.
Lenovo is a US$69 billion revenue global technology powerhouse, ranked #196 in the Fortune Global 500, and serving millions of customers every day in 180 markets. Focused on a bold vision to deliver Smarter Technology for All, Lenovo has built on its success as the world's largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices (PCs, workstations, smartphones, tablets), infrastructure (server, storage, edge, high performance computing and software defined infrastructure), software, solutions, and services. Lenovo's continued investment in world-changing innovation is building a more equitable, trustworthy, and smarter future for everyone, everywhere. Lenovo is listed on the Hong Kong stock exchange under Lenovo Group Limited (HKSE: 992) (ADR: LNVGY).
To find out more, visit www.lenovo.com and read the latest news via our StoryHub.

Description and Requirements

Lenovo is seeking a highly motivated AI Systems Performance Engineer to contribute to the design, development, and exploration of our next-generation AI systems. As a Systems Performance Engineer, you will be responsible for system-wide performance analysis and optimization of large-scale AI workloads, with a focus on LLM inference and agentic/orchestrated systems. You'll work across the stack, from model graphs and runtime kernels to memory hierarchies, interconnects, and distributed deployment, to understand and improve latency, throughput, and cost on heterogeneous hardware. You will also design, build, and scale agentic AI systems: multi-step agents, orchestration layers for LLMs and tools, and the surrounding infrastructure that lets foundation models safely interact with real products and users. This is an exciting opportunity to gain hands-on experience with cutting-edge AI systems while collaborating with experienced engineers, researchers, and product teams to help advance Lenovo's Hybrid AI vision and deliver Smarter Technology for All.

Responsibilities

  • End-to-end performance analysis
    Analyze performance of LLM and agentic workloads across the full stack: models, runtimes, compilers, kernels, memory, interconnect, and distributed deployment.
  • Model- and context-aware tuning
    Characterize and optimize performance for models of varying size and context length, including tradeoffs around batch size, KV cache management, quantization, and latency vs. throughput.
  • Memory & microarchitectural analysis
    Profile memory usage and access patterns across CPU, GPU, and accelerators; identify bottlenecks related to cache behavior, memory bandwidth, and compute utilization; propose and validate optimizations.
  • Networking & distributed systems
    Study and improve performance in heterogeneous distributed systems (multi-node, multi-accelerator), considering different networking conditions (latency, bandwidth, congestion); tune sharding, pipelining, and routing strategies.
  • Benchmarking & methodology
    Design, implement, and maintain benchmarks and load tests for LLM and agentic workloads under realistic traffic patterns and SLAs.
  • Optimization & experimentation
    Collaborate with ML, platform, and infrastructure teams to prototype and roll out optimizations (e.g., kernel-level improvements, scheduling changes, batching policies, caching strategies).
  • Observability & capacity planning
    Build and refine dashboards, alerts, and reports that surface key performance and efficiency metrics; provide data-driven guidance for capacity planning and hardware selection.
  • Cross-functional collaboration
    Work closely with model, runtime, and platform teams to translate performance findings into architectural improvements and product-impacting changes.

Qualifications

  • 2+ years of industry experience in systems performance engineering, ML infrastructure, HPC, or related fields.
  • Master's degree or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related technical field.
  • Strong understanding of computer architecture: CPU/GPU pipelines, caches, memory hierarchies, vector/SIMD, and accelerators.
  • Experience profiling and optimizing performance of complex systems using tools such as perf, VTune, Nsight, rocprof, or similar.
  • Strong coding skills in C++ and/or Python.
  • Experience working with Linux-based systems, shell scripting, and standard tooling.
  • Familiarity with containerized environments and orchestration (e.g., Docker, Kubernetes).
  • Experience working with ML workloads (preferably deep learning) in frameworks like PyTorch, TensorFlow, or JAX.
  • Conceptual understanding of LLM inference, including batching, token generation, and context window behavior.
  • Understanding of distributed systems concepts (RPC, load balancing, fault tolerance) and basic networking fundamentals (latency, bandwidth, throughput).
  • Strong data analysis skills; comfortable working with logs, traces, and metrics.
  • Ability to clearly communicate findings and tradeoffs to both engineering and non-engineering stakeholders.

Bonus Points

  • Hands-on experience optimizing LLM inference or other large-scale deep learning workloads on GPUs or specialized accelerators.
  • Experience with heterogeneous systems (e.g., mixtures of CPU, GPU, NPU/ASIC) and cluster-scale deployment.
  • Familiarity with LLM-specific optimization techniques (KV cache strategies, quantization, tensor/sequence parallelism, speculative decoding, etc.).
  • Experience with large-scale observability stacks (Prometheus, Grafana, OpenTelemetry) for performance monitoring.
  • Prior work on high-performance computing (HPC), networking-intensive systems, or real-time/low-latency services.

#LATC

We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment because of race, color, sex, age, religion, sexual orientation, gender identity, national origin, status as a veteran, basis of disability, or any other federal, state, or local protected class.

