Job Details
Job Title: Release Operation Engineer/SRE
Contract: W2 only (must be able to work on our payroll); no C2C
Location: Woonsocket, RI (Onsite)
Visa Status: OPT &(No or H1Bs please)
Relevant Experience (in Yrs): 6+ Years
Note: Candidates with 5+ years of hands-on experience in Java and Google Cloud Platform (GCP) are typically easier to select during the hiring process.
Job Description:
Release Management: Coordinate and manage release cycles for observability platforms. Ensure smooth and timely releases with minimal disruption to services. Work with partners to migrate legacy monitoring to modern solutions. Work with the observability engineering team to address new requirements as they arise, either by leveraging existing solutions or by developing new ones.
Incident/Request Management: Troubleshoot and resolve incidents related to observability platforms. Manage escalated customer issues and requests, ensuring timely and effective resolution. Document incident remediation activities and automate them where possible.
Performance Optimization: Continuously monitor and enhance platform performance to support growing scale and complexity.
Collaboration and Communication: Collaborate with cross-functional infrastructure, application, and business stakeholders to ensure observability solutions align with the broader IT strategy and infrastructure requirements. Communicate effectively with team members, management, and other stakeholders.
Continuous Improvement: Identify opportunities for process optimization and efficiency gains. Stay current with industry trends and best practices to continuously improve observability operations.
Customer Focus: Ensure high levels of customer satisfaction by effectively managing customer relationships. Provide excellent customer service and support for observability solutions.
Compliance and Security: Ensure observability platforms comply with organizational policies and security standards. Implement tools and processes to detect and remediate configuration drift and security risks.
Documentation and Reporting: Maintain comprehensive documentation of the observability platform, Product DOU, processes, and procedures.
Technical Expertise:
5+ years of experience in IT operations, with significant responsibilities in system monitoring, performance tuning, and troubleshooting enterprise applications.
4+ years in a Site Reliability Engineering (SRE) role managing modern observability solutions.
5+ years of development experience on enterprise-class applications: JavaScript/Java, SQL, Spring Boot, and microservices.
5+ years managing and implementing observability and event management platforms (e.g., AppDynamics, Splunk, Prometheus, Grafana).
5+ years of experience with cloud computing platforms (e.g., AWS, Azure, Google Cloud Platform) and with containers and container orchestration (e.g., Docker, Kubernetes).
Familiarity with CI/CD pipelines and automation tools (e.g., Jenkins, GitLab, Argo CD).
Experience developing and implementing monitoring and logging standards for infrastructure, platforms, and applications.
Experience establishing and implementing event correlation policies and related rules to enrich event data and reduce TTD and TTR.