Observability Engineer
Spectraforce Technologies
United States, Virginia, Richmond
Mar 24, 2026
Job Title: Observability Engineer
Location: McLean, VA or Richmond, VA, or Chicago, IL (hybrid three days per week)
Duration: 4 months

Job Description: Observability Engineer
SRE background required, AWS, Python/Java. Expertise in observability tools like Splunk, New Relic, Observe (must have).
Working on journey mapping on DFS intent.
Focus: Full-Stack Observability, System Traceability, & Executive Health Scoring

Role Summary
We are seeking a hands-on Observability Specialist to accelerate the adoption of our Observe-based platform. The ideal candidate possesses an SRE mindset: the ability to explore how complex systems interact and identify the exact data sets needed to provide a 360-degree view of the environment. You will bridge the gap between disparate Lines of Business (LOBs) to build E2E traceability and unified "Health Indices" that reduce mean-time-to-detect (MTTD) from hours to minutes.

Technical Skill Requirements

1. Core Observability & Tooling
Platform Expertise: Deep experience with modern observability platforms. While we use Observe, proficiency in New Relic, Splunk, or Databricks is required for rapid ramp-up.
Query & Data Fluency: Expert-level ability to write complex queries (SQL-based or proprietary like NRQL/SPL) to aggregate API success rates, latency, and crash-free session data.
Dashboard Architecture: Proven track record of building "Drill-Down" architectures, moving from high-level user journeys (e.g., Login) directly into microservice-level logs and traces.

2. The Modern Tech Stack
Infrastructure: Hands-on experience with AWS (ECS/Fargate/Lambda) and Docker.
Languages: Ability to navigate and instrument code in Python or Java.
Integrations: Familiarity with GraphQL for data fetching and Jenkins for CI/CD pipeline monitoring.
Instrumentation: Hands-on experience with OTel, and familiarity with New Relic APM or Datadog APM.

3. SRE & Systems Architecture Mindset
Cross-Domain Traceability: Experience monitoring digital customer engagement across disparate system boundaries (e.g., Comms, Phone, and Backend APIs) to expose "silent failures."
Telemetry Mapping: Ability to map technical metrics to business outcomes, specifically creating Unified Health Indices for Senior Leadership (SLT).
Root Cause Analysis (RCA): Skill in configuring alerts and correlations that enable instant pinpointing of failures within complex user flows.