Observability & DevOps Specialist

Role Overview

The Observability & DevOps Specialist is responsible for improving the reliability, visibility, and deployment maturity of enterprise database platforms and their supporting infrastructure. This role focuses on standardizing monitoring and observability practices, integrating telemetry into CI/CD pipelines, and ensuring operational workflows are scalable, auditable, and compliant with regulatory requirements.

The specialist works closely with database, infrastructure, application, and security teams to define and implement practical DevOps and observability patterns that reduce operational risk and improve system resilience. The role supports both current-state stabilization and longer-term modernization initiatives.

Key Responsibilities

Observability & Monitoring
Assess, design, and improve monitoring and observability across enterprise database platforms such as Oracle, SQL Server, Snowflake, and cloud-native databases.
Evaluate and optimize existing monitoring tools (e.g., Splunk, Oracle Enterprise Manager (OEM), Spotlight, Azure Monitor, Datadog) to improve alert fidelity, reduce noise, and centralize visibility.
Define standardized metrics, logs, and alerting practices for database performance, availability, backups, replication, and disaster recovery health.
Build dashboards and operational views that provide a single source of truth for database health and incident response.
Develop runbooks and response playbooks to enable faster diagnosis and reduce mean time to resolution (MTTR).

CI/CD & DevOps Integration
Integrate database observability and health validation into CI/CD pipelines to ensure changes are safely promoted across environments.
Support adoption of version-controlled database changes, controlled promotions, and rollback strategies.
Collaborate with application and platform teams to align database DevOps practices with existing delivery processes and approval workflows.
Ensure automation workflows support auditability, traceability, and separation of duties.

Governance, Security & Compliance
Ensure observability and DevOps practices align with regulatory, audit, and security standards.
Validate consistent enforcement of audit logging, encryption, access controls, and retention requirements.
Contribute to governance artifacts such as risk registers, dependency matrices, decision logs, and operational standards.
Partner with security and compliance teams to document controls and supporting operational evidence.

Continuous Improvement & Enablement
Identify opportunities for automation and self-healing to reduce manual effort and operational risk.
Support tool rationalization initiatives by identifying redundant or underutilized monitoring platforms.
Document repeatable patterns, operational guidance, and dashboards that can be sustained by internal teams.
Support knowledge transfer and enablement so teams can confidently own and evolve observability and automation practices.

Required Qualifications
7+ years of experience in observability, DevOps, SRE, or infrastructure operations roles.
Hands-on experience supporting enterprise database platforms (e.g., Oracle, SQL Server, Snowflake).
Strong experience with monitoring, logging, and alerting platforms such as Splunk, Azure Monitor, OEM, Datadog, or similar tools.
Practical experience integrating monitoring and telemetry into CI/CD pipelines.
Solid understanding of database availability, replication, backup, disaster recovery, and performance tuning concepts.
Experience working in environments with regulatory, audit, or compliance considerations.
Strong collaboration skills across database, infrastructure, application, and security teams.

Preferred Qualifications
Experience in highly regulated industries.
Familiarity with Infrastructure as Code tools such as Terraform, ARM templates, or CloudFormation.
Experience with database schema migration and deployment tooling (e.g., Liquibase, Flyway).
Demonstrated success improving alert quality and reducing operational noise.
Experience supporting hybrid environments spanning on-premises and cloud platforms.

Success Indicators
Reduced alert fatigue and an improved signal-to-noise ratio across monitored platforms.
Centralized dashboards providing clear, actionable system health views.
CI/CD pipelines that include embedded database health and compliance checks.
Improved operational readiness through consistent audit logging and evidence capture.
Teams confidently operating and evolving observability and automation practices independently.