Software Engineer II
United States, Washington, Redmond
Overview

Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the team behind Microsoft's expanding cloud infrastructure and is responsible for powering Microsoft's "Intelligent Cloud" mission. SCHIE delivers the core infrastructure and foundational technologies (server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions) for more than 200 of Microsoft's online businesses worldwide, including Bing, MSN, Office 365, Xbox Live, Teams, OneDrive, and the Microsoft Azure platform. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide, and we are looking for passionate, high-energy engineers to help achieve that mission.

Within Azure, the Hardware Health Service team supports this mission by designing, implementing, and operating globally scalable cloud services that monitor the fleet's hardware health and predict anomalies and pending failures. We focus on delivering the solutions our cloud service platforms require at the lowest possible total cost of ownership (TCO), while providing a great customer experience even when the underlying hardware is unreliable. Azure Hardware Health Service is looking for a Software Engineer II to join our team!
Responsibilities

Develops and operates large-scale, low-latency, high-throughput cloud services.
Drives highly complex, mission-critical solutions that involve multiple Azure services.
Evaluates AI technologies (such as LLMs, SLMs, and embeddings) and architectures (such as orchestration patterns and retrieval-augmented generation, or RAG) when developing solutions. Specifies or implements AI platform improvements such as fine-tuning or training custom ML models.
Defines and measures the success and impact of requested analytics and reporting features using quantitative measures.
Contributes to data analysis and feedback integration for product engineering decisions, and acts as a Designated Responsible Individual (DRI) for monitoring and restoring system functionality within the Service Level Agreement (SLA) timeframe.
Participates in live service operations and supports telemetry data integration for insights into system behavior, with a focus on performance, reliability, and safety.
Supports the identification of dependencies and the design documentation for product features, learns about system interactions and back-end dependencies, and contributes to architectural processes under guidance.
Produces code to test hypotheses for technical solutions and assists with technical validation efforts. Collaborates on quality assurance plans, augments test cases, and integrates automation into testing, while understanding the security and compliance implications of system architecture.
Ensures compliance with security, privacy, safety, and accessibility standards; uses developer tools for code creation and debugging; contributes to automation in production and deployment; and proactively seeks knowledge to improve product availability, reliability, efficiency, and performance at scale.
Understands and applies Microsoft's responsible AI practices to ensure systems meet our commitments to our customers.