Overview
On Site
Full Time
Job Details
Site Reliability Engineer
Primary Focus Areas: Cloud Infrastructure, System & Network Administration, Monitoring, Governance, Risk, and Compliance
Position Level: P4 - Advanced
Work Location: McLean, VA or Wilmington, NC
Overview
We are looking for a skilled and driven cloud-focused Site Reliability Engineer to help ensure the performance, scalability, and reliability of our AWS-based cloud systems. In this role, you'll work closely with engineering and operations teams to improve system stability and developer efficiency through automation, monitoring, and incident management.
Key Responsibilities
- Build and manage robust, scalable, and secure AWS-based cloud infrastructure.
- Create and support monitoring systems, alert mechanisms, and dashboards to ensure uptime and service health.
- Use infrastructure-as-code tools like Terraform (with Terragrunt), CDK, and CloudFormation to automate provisioning and configuration.
- Set up and manage CI/CD workflows to facilitate efficient code deployment and enhance development processes.
- Take ownership of incident resolution, conduct thorough root cause analysis, and develop long-term solutions to recurring problems.
- Partner with engineering teams to fine-tune performance, bolster reliability, and manage cloud costs effectively.
- Promote operational excellence and guide architectural decisions for infrastructure enhancements.
- Develop and maintain disaster recovery strategies to guarantee system continuity in crisis scenarios.
Education Requirements
- Required: Bachelor's degree in Computer Science
- Preferred: Master's degree in Computer Science or related field
Experience Requirements
- Required: Minimum of 5 years of relevant experience
- Preferred: 8 years of experience in a similar role
Core Competencies
Required Skills:
- Experience with Argo CD and Argo Workflows
- Proficiency in infrastructure-as-code: Terraform and Terragrunt
- Kubernetes and Linkerd knowledge
- In-depth experience with AWS services (EKS, Fargate, Aurora)
- Strong background in security and compliance
- Containerization tools such as Docker
- Monitoring and logging technologies
- Scripting or programming language proficiency
- Database administration
- Source control using Git
- Hands-on experience in incident response and management
Preferred Skills:
- Familiarity with Datadog
- Knowledge of Cloudflare services
- Understanding of the mortgage industry
- Experience with advanced AWS cloud security tools (e.g., GuardDuty, Security Hub)
- Disaster recovery strategy experience
- Experience with automation tools like Camunda
Certifications
- Required: AWS Certified Solutions Architect - Associate
- Preferred: AWS Certified DevOps Engineer - Professional