Site Reliability Engineer

Overview

On Site
USD 175,000.00 - 200,000.00 per year
Full Time

Skills

IaaS
Reliability Engineering
Microsoft Azure
Scalability
Management
High Availability
Performance Tuning
Design Patterns
Root Cause Analysis
Delegation
Technical Drafting
Mentorship
Requirements Elicitation
Stakeholder Engagement
DevOps
Amazon Web Services
Cloud Computing
Amazon EC2
Scripting
GitLab
Jenkins
Terraform
Dynatrace
Microservices
Docker
Kubernetes
Analytical Skill
Problem Solving
Communication
Collaboration
Computer Science
Financial Services
Asset Management
Capital Market
Python
Django
Pandas
NumPy
SQL
Regulatory Compliance
Optimization
Continuous Integration
Continuous Delivery
Incident Management
Documentation
Quality Assurance
MEAN Stack
Customer Service
Training And Development
SAP BASIS

Job Details

Software Guidance & Assistance, Inc. (SGA) is searching for a Site Reliability Engineer (SRE) for a Direct Placement assignment with one of our premier financial services clients in Midtown New York City. The schedule is hybrid, with 2-3 days onsite per week.

The firm is seeking a hands-on and analytical Site Reliability Engineer to design, build, and maintain reliable, secure, and scalable cloud infrastructure and observability systems. The ideal candidate will have deep experience in AWS, Python, CI/CD automation, and monitoring frameworks, supporting mission-critical applications in a modern cloud ecosystem. This role plays a key part in enhancing system reliability, performance, and resilience across enterprise platforms.

Responsibilities
  • Architect, deploy, and maintain AWS and Azure infrastructure focused on reliability, scalability, and cost efficiency.
  • Design and manage monitoring, logging, and alerting systems to ensure high availability and rapid incident response.
  • Build and maintain CI/CD pipelines (GitLab, Jenkins) for continuous software delivery and automation.
  • Implement and maintain Infrastructure as Code (IaC) using CDK, Terraform, or CloudFormation.
  • Collaborate with development teams to improve deployment processes and production reliability.
  • Contribute to the application codebase to improve resiliency, performance, and observability.
  • Maintain detailed documentation for architectures, design patterns, and configurations.
  • Partner with Dev, QA, and AppSecOps teams to promote automation, consistency, and improved security posture.
  • Perform incident triage and root cause analysis, and develop permanent solutions to production issues.
  • Continuously improve standards, tools, and processes for platform reliability and efficiency.


Work Allocation (Approximate)
  • 60% - Hands-on development, automation, and operations
  • 25% - Technical design, architecture, and mentoring
  • 15% - Collaboration, requirements gathering, and stakeholder engagement


Required Skills
  • 8+ years of hands-on experience in Site Reliability, DevOps, or Platform Engineering roles.
  • Strong experience with AWS Cloud Services (ECS, EC2, Lambda, IAM, CloudWatch, etc.).
  • Proficiency in Python for automation, scripting, and infrastructure integration.
  • Solid understanding of CI/CD pipelines using GitLab or Jenkins.
  • Hands-on experience with Infrastructure as Code (CDK, Terraform, or CloudFormation).
  • Expertise in monitoring and observability tools (Datadog, Dynatrace, ELK).
  • Working knowledge of microservices, serverless architectures, and containerization (Docker, ECS, Kubernetes).
  • Strong analytical, troubleshooting, and problem-resolution skills.
  • Excellent communication and collaboration abilities in cross-functional teams.
  • Bachelor's degree in Computer Science or related field (or equivalent experience).


Preferred Skills
  • Experience in financial services, asset management, or capital markets environments.
  • Familiarity with Python/Django, data libraries (Pandas, NumPy), and SQL.
  • Knowledge of security, compliance, and cost optimization best practices.
  • Proven ability to identify and implement process and automation improvements.


Initial Success Criteria (First 6 Months)
The successful candidate will demonstrate measurable impact by:
  • Implementing or optimizing monitoring and alerting systems to improve visibility and response time.
  • Playing a key role in modernizing CI/CD pipelines to enhance automation and release consistency.
  • Contributing hands-on code and tooling improvements that increase platform reliability and performance.
  • Establishing incident response processes and documentation for key production systems.
  • Building strong working relationships across engineering, QA, and AppSecOps teams to promote shared reliability ownership.

SGA is a technology and resource solutions provider driven to stand out. We are a women-owned business. Our mission: to solve big IT problems with a more personal, boutique approach. Each year, we match consultants like you to more than 1,000 engagements. When we say let's work better together, we mean it. You'll join a diverse team built on these core values: customer service, employee development, and quality and integrity in everything we do. Be yourself, love what you do and find your passion at work. Please find us at .

SGA is an Equal Opportunity Employer and does not discriminate on the basis of Race, Color, Sex, Sexual Orientation, Gender Identity, Religion, National Origin, Disability, Veteran Status, Age, Marital Status, Pregnancy, Genetic Information, or Other Legally Protected Status. We are committed to providing access, equal opportunity, and reasonable accommodation for individuals with disabilities in employment, and our services, programs, and activities. Please visit our company to request an accommodation or assistance regarding our policy.
