Job Details
Job Title: Site Reliability Engineer
Location: Hartford, CT (Hybrid)
Contract: 6+ months
Job Description
Dynatrace, Splunk, CDN, Akamai
These systems are mission critical: they have to be up and running, otherwise the whole enterprise suffers.
- Ideally someone with Splunk ITSI (IT Service Intelligence) and Splunk Core at an admin level
- Dynatrace is desirable but not required
- Everything involves automation, so a background in automation and Infrastructure as Code (Terraform) is helpful
- Experience with pipelines (Jenkins) is helpful
- We won't be coding a lot
- Would love a generalist who adapts easily to new and emerging technology
Insurance's RE&A Observability team is seeking a highly motivated and experienced Senior Reliability Engineer with expertise in Splunk, Dynatrace, CDN, and other industry observability tools.
The Senior Reliability Engineer will be responsible for ensuring the reliability of IT services focused on the developer experience. This role requires a "build-to-manage" mindset, strong problem-solving skills, and innovative thinking applied to the design, build, test, deployment, change, and maintenance of services, leveraging deep engineering expertise. Previous experience with, and management of, AI-based systems is desired.
Key success metrics include service stability, effective software delivery, utilization of customer-based observability practices, and achievement of top-quartile operating norms within tools.
The Senior Reliability Engineer will also contribute to the ongoing advancement of the RE Insights practice within and beyond their area of responsibility.
Guide the use of best-in-class software engineering standards and design practices for instrumenting code and application technology stacks for Splunk, Dynatrace, and Akamai. Enable the generation of relevant metrics on overall technology health, including availability, performance, quality, technical debt, and resiliency.
Function as the go-to technical expert for the applications supported, requiring depth and breadth of knowledge in Splunk ITSI and related technologies. Provide expertise in applications, integration, interfaces, and the business domain to drive insights and improvements.
Leverage Splunk ITSI and Dynatrace Davis AI capabilities to enhance predictive analytics and automated incident response. Utilize AI-driven insights to proactively identify and address potential issues, ensuring optimal performance and reliability of IT services.
Insights Solution Responsibilities:
Enable alerting, monitoring, service intelligence, noise reduction, self-healing, dashboards (user journeys), and overall insights using Splunk ITSI and Dynatrace to support all LOBs within the organization.
Enhance the delivery flow by engineering solutions with Splunk ITSI and Dynatrace to increase delivery speed while adhering to technology standards for sustained reliability.
Progressively implement preventative controls and drive increased automation and self-healing capabilities using Splunk ITSI and Dynatrace. Continue to improve cost efficiency baselines.
Promote and implement innovative solutions leveraging the capabilities of Splunk ITSI and Dynatrace.
IT Ops Responsibilities:
Ensure operational excellence. Independently drive the triaging and service restoration of all high-impact incidents to minimize the mean time to service restoration and the impact on the business. Demonstrate end-to-end ownership.
Partner with infrastructure teams to design and implement intelligent incident routing, enhanced monitoring and alerting capabilities, and automated service restoration processes. Take proactive measures to prevent high-impact incidents.
Achieve and maintain the continuity of Hartford and third-party assets that support a business function. Be accountable for keeping the IT application and infrastructure metadata repositories current.
Systems thinking: an end-to-end, broad understanding of enterprise architectures and distributed systems.
Highly collaborative; partners with peers and stakeholders, with a passion for delighting customers.
Hands-on experience with performance and observability tools such as Splunk ITSI (IT Service Intelligence), Dynatrace, Splunk, CloudWatch, CloudTrail, and related tools.
Strong solution architecture orientation to enable expedient troubleshooting, issue resolution, and root-cause removal in a hybrid cloud environment.
Experience with continuous integration and DevOps methodologies; preferred tools include GitHub, Jenkins, Nexus, Rally, SonarQube, and Akamai.
Keeps abreast of new market technologies and is adept at learning and adopting new models. Promotes and applies continuous learning.
Knowledge of complex traditional and modern enterprise architectures and systems. Strong hybrid cloud experience (private and public) across various service delivery models (SRE, IaaS, PaaS, SaaS).
Effective communication (verbal and written), collaboration, and negotiation skills, working in diverse teams across business units.