Site Reliability Engineer (Level 2)

Hybrid in Farmington Hills, MI, US • Posted 30+ days ago • Updated 9 hours ago
Contract Corp To Corp
Contract W2
Hybrid
$50 - $58/hr

Job Details

Skills

  • Datadog
  • Tidal scheduler
  • PagerDuty
  • Cloud operations
  • Infrastructure as Code
  • Cloud platforms
  • Bash
  • Python
  • SRE
  • Site Reliability Engineering

Summary

The Site Reliability Engineer (Level 2) is responsible for operating and enhancing the performance, availability, and reliability of cloud and on-premises infrastructure. This individual consults on more complex observability scenarios, streamlines server and batch operations, and contributes to the efficient management of data center resources. The role also works with cross-functional teams to identify opportunities for process improvement, implement best practices, and support critical business operations.

Essential Tasks/Major Duties:

  • Develop, implement, and maintain observability tools to monitor cloud and on-premises systems. 
  • Create dashboards, alerts, and reports to track system health, performance, and availability. 
  • Proactively use observability tools to identify opportunities for improvement.
  • Analyze metrics and logs to identify trends, prevent potential issues, and optimize system performance. 
  • Collaborate with FinOps teams to monitor resource utilization and ensure cost-effective operations across cloud environments. 
  • Support the lifecycle of cloud and on-premises servers, including provisioning, patching, configuration, and decommissioning. 
  • Troubleshoot and resolve server-related issues, ensuring minimal downtime.
  • Implement and enforce server security policies and compliance requirements.
  • Schedule, monitor, and manage batch processes to ensure timely execution of critical tasks. 
  • Identify and resolve batch failures or delays, coordinating with relevant teams to ensure smooth operations. 
  • Optimize batch jobs for improved performance and resource utilization. 
  • Manage on-site and remote data center operations, ensuring proper functioning of hardware, power, cooling, and network infrastructure. 
  • Coordinate with vendors and service providers for hardware maintenance, replacements, and upgrades. 
  • Maintain accurate inventory of data center assets and ensure compliance with organizational standards. 
  • Participate in on-call rotations to address system incidents and outages promptly. 
  • Conduct root cause analysis and implement solutions to prevent recurrence of issues.
  • Document and communicate incident resolution processes to relevant stakeholders.
  • Work closely with cross-functional teams, including DevOps, Networking, and Application Development, to implement and maintain system integrations. 
  • Create and maintain comprehensive documentation for configurations, processes, and incident resolutions.
  • Provide training and support to team members and other departments. 

Knowledge, Skills & Abilities:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent experience.
  • 3 years of experience working with monitoring and observability tools (e.g., Datadog, PagerDuty). 
  • Certified Datadog Fundamentals or equivalent experience required. 
  • Certified PagerDuty Administrator or equivalent experience required. 
  • 3 years of experience in cloud operations or server management roles. 
  • Certified AWS SysOps Administrator or equivalent experience required. 
  • 3 years of progressive server administration experience (Windows, Linux). 
  • 3 years of experience in designing, implementing, and managing IT workload automation solutions to optimize scheduling, orchestration, and execution of enterprise workflows across on-prem and cloud environments. 
  • Experience leveraging artificial intelligence to drive innovation and solve complex problems. Demonstrated ability to utilize AI-driven solutions that optimize processes, enhance decision-making, or create transformative business outcomes. 
  • 3 years working with cloud platforms (AWS, Azure, OCI). 
  • Certified AWS Cloud Practitioner or equivalent experience required. 
  • Strong experience with data center infrastructure and knowledge of best practices. 
  • Proficiency in scripting and automation tools (Python, Bash, PowerShell). 
  • Strong understanding of networking and identity management in cloud environments. 
  • Working knowledge of security best practices and compliance standards. 
  • Working knowledge of agile methodologies. 
  • Excellent troubleshooting, problem-solving, and communication skills. 

 

Salary/Rate: $50-$55/hr (depending on experience level). This is a contract position; candidates are expected to work 40 hours/week.

  • Dice Id: 10123255
  • Position Id: 105672