Senior Site Reliability Engineer - CTJ - POLY

Microsoft
United States, Washington, Redmond
Nov 26, 2024
Overview

Microsoft has an exciting opportunity for a Senior Site Reliability Engineer in the Cloud+Artificial Intelligence (C+AI) Silver SQL Team. This team is responsible for deploying and operating the Azure SQL family of services within Azure Government clouds. In this role, you will have the opportunity to work with engineers who enable a broad set of Azure services to be consumed by customers in highly secured and regulated industries. The systems and software you build will be required to meet the security policy and assurance requirements of both public and private sector customers.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

The scale of our operations is enormous. Microsoft's products and services are overwhelmingly consumed online, and billions of people use them every day. We need people who enjoy analyzing complex problems, coming up with creative solutions, and working in focused teams to build things no one has thought of before, all in the service of production reliability.

Demonstrates experience in distributed systems design, interactions between cloud technology layers and components, common dependencies at scale, and the code that defines infrastructures.
Can identify and recommend optimal configurations of cloud technology solutions and modify the code base that defines systems or cloud technologies to improve the reliability and operability of supported products with minimal guidance from other engineers.
Develops an understanding of the code, features, and operations of specific products at scale as required to contribute to incremental improvements in product availability, reliability, efficiency, observability, and/or performance; participates in on-boarding, code/design reviews, and regular meetings with the engineering teams that develop and/or manage those products.
Researches and maintains an awareness of industry trends, advances in distributed systems and cloud technologies, new tools, and/or processes for maintaining and improving product availability, reliability, efficiency, observability, and/or performance.
Contributes to the implementation of new solutions within their team by identifying ways they can be applied to solve persistent problems.
Leverages technical experience in large-scale distributed systems and specific products, as well as objective insights drawn from the analyses of production telemetry data, to suggest changes or add-ons to product features or code to improve the availability, reliability, efficiency, observability, and performance of product components or features supported by their team.
Develops and tests basic changes to optimize code and improve the observability, reliability, and operability of a defined range of platform, system, or product components or features with direction from other engineers.
Engages with product engineering teams by participating in code/design reviews, regular meetings, on-call rotations, and incident responses throughout product development and operations cycles; leverages technical experience on underlying systems/platforms and insights drawn from engagements with product engineering teams and telemetry analyses to propose potential improvements in code base and designs across components and features of one or more products.
Embody our Culture and Values