Senior Site Reliability Engineer - Hybrid (Owings Mills, MD; 2 days onsite, 3 days remote)

Overview

On Site
Hybrid
$OPEN
Contract - Independent
Contract - W2
Contract - 6+ Month(s)
50% Travel

Skills

Python
AWS
Bash
PowerShell
Kubernetes
Grafana
SRE

Job Details

Cerebra Consulting Inc is a system integrator and IT services solution provider with a focus on Big Data, Business Analytics, Cloud Solutions, Amazon Web Services, Salesforce, Oracle EBS, PeopleSoft, Hyperion, Oracle Configurator, Oracle CPQ, Oracle PLM, and Custom Application Development. Utilizing solid business experience, industry-specific expertise, and proven methodologies, we consistently deliver measurable results for our customers. Cerebra has partnered with leading enterprise software companies and cloud providers such as Oracle, Salesforce, and Amazon, and leverages these partner relationships to deliver high-quality, end-to-end solutions targeted to the needs of each customer.

Senior Site Reliability Engineer
Location: Hybrid (Owings Mills, MD; 2 days onsite, 3 days remote)
Contract Duration: 6 months (potential rate increase pending approval)
Role Overview
This team is responsible for engineering scalable, resilient hybrid cloud solutions across AWS and on-premises environments. The ideal candidate will have strong technical expertise and will develop automation tooling, observability solutions, and SRE consulting practices to drive reliability and efficiency.
Key Responsibilities
  • Design and implement automated systems and services to ensure availability, reliability, and scalability across cloud and on-premises environments.
  • Develop monitoring and alerting frameworks using tools like Prometheus, Grafana, and New Relic for real-time analysis of system health.
  • Automate operational processes using Terraform, Ansible, Python, Groovy, PowerShell, and Bash to reduce manual toil.
  • Define Service Level Indicators (SLIs), Service Level Objectives (SLOs), error budgets, and burn-rate alerts for proactive system reliability.
  • Collaborate with development & engineering teams to embed reliability best practices, mentor stakeholders, and drive adoption of SRE principles.
  • Conduct system performance analysis to determine operational trends, enhance observability, and improve resilience strategies.
  • Participate in continuous improvement efforts, generating new reliability standards across multi-functional domains.
  • Troubleshoot complex incidents & drive resolution alongside support and operations teams.
  • Lead documentation efforts for infrastructure automation best practices, ensuring operational knowledge is easily accessible and scalable.
  • Engage in an on-call rotation, proactively improving automation and alerting capabilities.
Required Qualifications
  • Strong experience in monitoring & alerting with Prometheus, Grafana, and New Relic.
  • Container orchestration expertise in AWS ECS, Fargate, and Kubernetes.
  • Docker container development experience.
  • Scripting experience in Python, Groovy, PowerShell, Bash, and Perl.
  • Proven track record building dashboards with Grafana, Prometheus, and StatsD for system insights.
  • Extensive AWS cloud experience, operating in GitOps frameworks for automation.
  • Deep infrastructure & systems engineering knowledge, including Unix/Linux, networking, storage, and monitoring stacks.
  • Hands-on automation expertise with Terraform and Ansible.
  • Excellent written & oral communication skills to drive collaboration.
  • Adaptable, quick learner with strong interpersonal skills.
  • Experience managing off-hour implementations.
Preferred Qualifications
  • Bachelor's degree in Computer Science or related field.
  • 7+ years of systems design, programming, implementation, and integration experience.
  • 3+ years of AWS platform experience.
  • Relevant certifications (AWS, Kubernetes).
Interview Process & Logistics
  • First round: Zoom interview.
  • Second round: In-person onsite interview.
  • Candidates must use their own device to connect via Citrix.

Thanks,

Sai Revanth
Email | revanth.patnala
