Job Details
Job Description: Senior Site Reliability Engineer (SRE), Full-Stack Observability
Location: Charlotte, North Carolina (Hybrid)
We are seeking a highly skilled Senior Site Reliability Engineer (SRE) with extensive experience in full-stack observability for data applications across SaaS, hybrid cloud, and on-prem environments. The ideal candidate will be responsible for monitoring, troubleshooting, and optimizing ETL pipelines and integrations, ensuring data integrity, and supporting SaaS platforms. This role requires deep expertise in observability tools, automation, and change management.
Key Responsibilities
Design, implement, and maintain comprehensive observability solutions for data applications across SaaS, hybrid cloud, and on-prem environments.
Monitor ETL pipelines and integrations for failures, latency spikes, data loss, and data integrity issues.
Build proactive alerts and dashboards to detect and resolve issues before they impact business operations.
Perform performance tuning and optimization of ETL processes and data workflows.
Provide strong SRE support for ETL operations, including incident response, root cause analysis, and resolution.
Lead change management and coordination for system updates, deployments, and infrastructure changes.
Collaborate with development, operations, and business teams to ensure seamless platform support and reliability.
Required Skills and Experience
8+ years of experience in full-stack observability for data applications in SaaS, hybrid cloud, and on-prem environments.
Proven track record in monitoring ETL pipelines, integrations, and data workflows.
Expertise in building proactive alerts and dashboards for ETL failures, latency spikes, data loss, and data integrity issues.
Strong experience in ETL SRE support, including incident management and performance tuning.
Proficiency in observability tools such as Grafana, Splunk, Prometheus, and AppDynamics.
Strong scripting skills in Python and Bash.
Experience with automation tools like Ansible.
Deep knowledge of ETL tools and frameworks.
Experience supporting SaaS platforms and ensuring high availability and reliability.
Preferred Qualifications
Experience with cloud platforms (AWS, Azure, Google Cloud Platform).
Familiarity with containerization and orchestration tools (Docker, Kubernetes).
Knowledge of CI/CD pipelines and DevOps practices.
Strong communication and collaboration skills.