Job Details
SRE Lead Engineer
Position: Contract (W2)
Duration: 12+ Months
Location: Austin, TX [Hybrid]
Job description:
Around 10-12 years of hands-on SRE experience with cloud technologies, development, SRE toolsets, and automation.
Own the design, configuration, deployment, and optimization of Dynatrace for enterprise-wide observability.
Define monitoring standards, best practices, and governance to ensure consistency and scalability.
Strong skills in APM, distributed tracing, synthetic & real user monitoring, log monitoring, and Davis AI configuration.
Experience deploying and tuning OneAgent, building end-to-end PurePath tracing, and leveraging Smartscape topology for proactive performance monitoring and root-cause analysis.
Experience integrating Dynatrace with incident management, automation, and cloud platforms (AWS, Azure, Google Cloud Platform).
Strong problem-solving skills and ability to work in cross-functional, fast-paced environments.
Collaborate with application and infrastructure teams to troubleshoot performance issues and implement permanent fixes.
Build correlation mechanisms and dashboards that provide end-to-end visibility of requests from external to internal applications.
Strong hands-on experience with cloud technology (AWS): Control Tower, project setup, account creation, RDS, SSO.
Solid understanding of and hands-on experience with Docker/Kubernetes.
Good experience with Linux commands, GitLab CI/CD setup, and Terraform (state management, etc.).
Experience setting up monitoring and alerting with Splunk, Prometheus, Grafana, Kibana, ELK, etc.
Good understanding of observability frameworks that use programmatic SLI/SLO blueprints to standardize the collection of golden signals.
Automation experience (data refresh, releases, DB snapshots) using Ansible or a scripting language.
Experience with Groovy DSL, Java, Python, and YAML, and with microservices architecture.
Good understanding of and hands-on experience with MQ and Kafka.
Experience with Databases (Oracle, MySQL)
Good to have: any relevant professional certification, such as Certified Site Reliability Engineer (CSRE), Certified Kubernetes Administrator (CKA), AWS Certified DevOps Engineer - Professional, or Google Cloud Professional Cloud DevOps Engineer.