Job Details
Role: Site Reliability Engineer
Duration: 6-month contract
The Site Reliability Engineer will play a key role in building a sustainable platform by developing systems for analyzing environments, predicting and resolving issues, and supporting the production environment. The engineer will automate system administration tasks, use monitoring tools, and collaborate with other teams to ensure seamless infrastructure and application integration. The ideal candidate has experience in system administration, DevOps tools, monitoring, and troubleshooting; is proficient in scripting and in maintaining Java applications; and is familiar with Docker and Kubernetes.
Responsibilities:
Collaborate with Infrastructure and Application teams to build a sustainable platform.
Automate system administration tasks using scripting tools such as Python or Shell.
Utilize monitoring and automation tools to analyze real-time issues.
Implement and manage monitoring and metrics in Prometheus and Grafana, and integrate them with ServiceNow.
Work with cross-functional teams to ensure seamless operation of healthcare infrastructure and applications.
Support other teams' infrastructure needs on an as-needed basis.
Develop and use tools for continuous delivery automation.
Perform system administration on Linux (CentOS) and Windows Server.
Work with DevOps tools and environments such as TeamCity/Jenkins and Git.
Implement and manage monitoring solutions.
Provide off-hours support on a rotational basis.
Be available to discuss and resolve technical issues and escalations.
Automate the detection and resolution of recurring issues in the production environment.
Use application performance monitoring and observability tools such as AppDynamics, Splunk, and Dynatrace.
Identify and automate self-remediation processes to enhance system reliability and performance.
Experience:
5 to 8 years of production support experience.
Strong experience in system administration on Linux (RHEL, etc.) and Windows Server environments.
Proven track record in automating system administration tasks using scripting tools such as Python or Shell.
Hands-on experience with monitoring and automation tools to analyze real-time issues.
Experience implementing and managing monitoring and metrics.
Collaboration with cross-functional teams, using ServiceNow and Jira, to ensure seamless operation of infrastructure and applications.
Support for other teams' infrastructure needs while maintaining compliance with healthcare regulations.
Development and use of tools for continuous delivery automation.
Provision of off-hours support on a rotational basis.
Education:
Bachelor's degree in Engineering or equivalent experience
Master's degree preferred
Thanks & regards,
Karthik Kanaji (find me on LinkedIn)
Aspire IT Solutions Inc