
Site Reliability Engineer (Level 2)

Hybrid in Farmington Hills, MI, US • Posted 30+ days ago • Updated 9 hours ago

Contract Corp-to-Corp

Contract W2

Hybrid

$50 - $58/hr

Job Details

Skills

Datadog
Tidal scheduler
PagerDuty
Cloud operations
Infrastructure as Code
Cloud platforms
Bash
Python
SRE
Site Reliability Engineering

Summary

The Site Reliability Engineer (Level 2) is responsible for operating and enhancing the performance, availability, and reliability of cloud and on-premises infrastructure. This individual consults on complex observability scenarios, streamlines server and batch operations, and contributes to the efficient management of data center resources. The role also works with cross-functional teams to identify opportunities for process improvement, implement best practices, and support critical business operations.

Essential Tasks/Major Duties:

Develop, implement, and maintain observability tools to monitor cloud and on-premises systems.
Create dashboards, alerts, and reports to track system health, performance, and availability.
Proactively use observability tools to identify opportunities for improvement.
Analyze metrics and logs to identify trends, prevent potential issues, and optimize system performance.
Collaborate with FinOps teams to monitor resource utilization and ensure cost-effective operations across cloud environments.
Support the lifecycle of cloud and on-premises servers, including provisioning, patching, configuration, and decommissioning.
Troubleshoot and resolve server-related issues, ensuring minimal downtime. Implement and enforce server security policies and compliance requirements.
Schedule, monitor, and manage batch processes to ensure timely execution of critical tasks.
Identify and resolve batch failures or delays, coordinating with relevant teams to ensure smooth operations.
Optimize batch jobs for improved performance and resource utilization.
Manage on-site and remote data center operations, ensuring proper functioning of hardware, power, cooling, and network infrastructure.
Coordinate with vendors and service providers for hardware maintenance, replacements, and upgrades.
Maintain accurate inventory of data center assets and ensure compliance with organizational standards.
Participate in on-call rotations to address system incidents and outages promptly.
Conduct root cause analysis and implement solutions to prevent recurrence of issues. Document and communicate incident resolution processes to relevant stakeholders.
Work closely with cross-functional teams, including DevOps, Networking, and Application Development, to implement and maintain system integrations.
Create and maintain comprehensive documentation for configurations, processes, and incident resolutions.
Provide training and support to team members and other departments.

Knowledge, Skills & Abilities:

Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent experience.
3 years of experience working with monitoring and observability tools (e.g., Datadog, PagerDuty).
Datadog Fundamentals certification or equivalent experience required.
PagerDuty Administrator certification or equivalent experience required.
3 years of experience in cloud operations or server management roles.
AWS Certified SysOps Administrator certification or equivalent experience required.
3 years of progressive server administration experience (Windows, Linux).
3 years of experience in designing, implementing, and managing IT workload automation solutions to optimize scheduling, orchestration, and execution of enterprise workflows across on-prem and cloud environments.
Experience leveraging artificial intelligence to drive innovation and solve complex problems. Demonstrated ability to utilize AI-driven solutions that optimize processes, enhance decision-making, or create transformative business outcomes.
3 years of experience with cloud platforms (AWS, Azure, OCI).
AWS Certified Cloud Practitioner certification or equivalent experience required.
Strong experience with data center infrastructure and knowledge of best practices.
Proficiency in scripting and automation tools (Python, Bash, PowerShell).
Strong understanding of networking and identity management in cloud environments.
Working knowledge of security best practices and compliance standards.
Working knowledge of agile methodologies.
Excellent troubleshooting, problem-solving, and communication skills.

Salary/Rate: $50-$55/hr, depending on experience level. This is a contract position; candidates are expected to work 40 hours per week.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 10123255
Position Id: 105672

Similar Jobs

Cloud Operations Analyst

Farmington Hills, Michigan • Today

TEKsystems has a client that is seeking a Cloud Operations Analyst. Summary Statement: The Cloud Operations Analyst is responsible for leading the management, optimization, and automation of cloud and on-premises infrastructure to ensure seamless operations and business continuity. This role includes driving improvements in observability, server and batch operations, and data center management while proactively identifying and resolving performance and reliability issues. The Cloud Operations Anal

Full-time

USD 40.00 - 60.00 per hour

Full-Stack Engineer

Remote or Southfield, Michigan • 17d ago

The Journeyman will be responsible for designing, developing, and supporting full stack applications in a cloud-native AWS environment. The role requires strong hands-on development experience, adherence to Agile practices, and a focus on quality, security, and performance across all deliverables. Role Title: Full-Stack Engineer Location: Remote Clearance Requirement: Public Trust required Required Skills and Experience Minimum of five years of software development experience with hands-on desig

Contract

competitive

Java FullStack Developer

Dearborn, Michigan • Today

Stefanini Group is hiring! Stefanini is looking for a Java FullStack Engineer, Dearborn, MI (Onsite) For quick apply, please reach out to Adil Khan at / We are looking for a candidate who is focused on developing and maintaining reusable software components that serve the needs of product developers in the organization. They are responsible for designing, implementing, integrating and maintaining the underlying infrastructure and software applications that support developer productivity and self

Contract

$61 - $66/hr

Cloud Operations Analyst

Farmington Hills, Michigan • Today

Top Skills' Details: Enterprise Windows-based systems administration (5+ years); Cloud experience, AWS preferred (3+ years); Observability experience. Key Responsibilities: The role involves maintaining a legacy workload automation platform (Tidal), focusing on observability (Datadog, PagerDuty, etc.), and general Windows-based systems administration, including VM management moving to AWS. While Tidal experience is part of the role, the preference is for candidates with modern skill sets in clou

Full-time

USD 85,000.00 - 110,000.00 per year

Search all similar jobs