Senior Site Reliability Engineer - Hybrid (Owings Mills, MD; 2 days onsite, 3 days remote)

Overview

On Site

Hybrid

$OPEN

Contract - Independent

Contract - W2

Contract - 6+ Month(s)

50% Travel

Skills

Python

AWS

Bash

PowerShell

Kubernetes

Grafana

SRE

Job Details

Cerebra Consulting Inc is a System Integrator and IT Services Solution provider with a focus on Big Data, Business Analytics, Cloud Solutions, Amazon Web Services, Salesforce, Oracle EBS, PeopleSoft, Hyperion, Oracle Configurator, Oracle CPQ, Oracle PLM, and Custom Application Development. Utilizing solid business experience, industry-specific expertise, and proven methodologies, we consistently deliver measurable results for our customers. Cerebra has partnered with leading enterprise software companies and cloud providers such as Oracle, Salesforce, and Amazon, and leverages these partner relationships to deliver high-quality, end-to-end solutions targeted to the needs of each customer.

Senior Site Reliability Engineer

Location: Hybrid (Owings Mills, MD; 2 days onsite, 3 days remote)

Contract Duration: 6 months (potential rate increase pending approval)

Role Overview

This team is responsible for engineering scalable, resilient hybrid cloud solutions across AWS and on-premises environments. The ideal candidate will have strong technical expertise and will develop automation tooling, observability solutions, and SRE consulting practices to drive reliability and efficiency.

Key Responsibilities

Design and implement automated systems and services to ensure availability, reliability, and scalability across cloud and on-premises environments.
Develop monitoring and alerting frameworks using tools like Prometheus, Grafana, and New Relic for real-time analysis of system health.
Automate operational processes using Terraform, Ansible, Python, Groovy, PowerShell, and Bash to reduce manual toil.
Define Service Level Indicators (SLIs), Service Level Objectives (SLOs), Error Budgets, & Burn Rate Alerts for proactive system reliability.
Collaborate with development & engineering teams to embed reliability best practices, mentor stakeholders, and drive adoption of SRE principles.
Conduct system performance analysis to determine operational trends, enhance observability, and improve resilience strategies.
Participate in continuous improvement efforts, generating new reliability standards across multi-functional domains.
Troubleshoot complex incidents & drive resolution alongside support and operations teams.
Lead documentation efforts for infrastructure automation best practices, ensuring operational knowledge is easily accessible and scalable.
Engage in an on-call rotation, proactively improving automation and alerting capabilities.

Required Qualifications

Strong experience in monitoring & alerting with Prometheus, Grafana, and New Relic.
Container orchestration expertise in AWS ECS, Fargate, and Kubernetes.
Docker container development experience.
Scripting experience in Python, Groovy, PowerShell, Bash, and Perl.
Proven track record of building dashboards with Grafana, Prometheus, and StatsD for system insights.
Extensive AWS cloud experience, including operating within GitOps frameworks for automation.
Deep infrastructure & systems engineering knowledge, including Unix/Linux, networking, storage, and monitoring stacks.
Hands-on automation expertise with Terraform and Ansible.
Excellent written & oral communication skills to drive collaboration.
Adaptable, quick learner with strong interpersonal skills.
Experience managing off-hour implementations.

Preferred Qualifications

Bachelor's degree in Computer Science or related field.
7+ years of systems design, programming, implementation, and integration experience.
3+ years of AWS platform experience.
Relevant certifications (AWS, Kubernetes).

Interview Process & Logistics

First round: Zoom interview.
Second round: In-person onsite interview.
Candidates must use their own device to connect via Citrix.

Thanks,

Sai Revanth

Email | revanth.patnala

