Electronic Consulting Services, Inc (ECS Federal)

Site Reliability Engineer (SRE) / Operations Engineer

ARLINGTON, VA, US • Posted 1 hour ago • Updated 1 hour ago

Full Time

On-site

USD $145,000.00 - 180,000.00 per year

Job Details

Skills

Bridging
Software Engineering
IT Operations
Root Cause Analysis
Recovery
Operational Efficiency
Release Management
Collaboration
Scalability
Standard Operating Procedure
System Documentation
Management
Capacity Management
Performance Tuning
Regulatory Compliance
Continuous Improvement
Operational Risk
Computer Science
Information Technology
Information Systems
Cloud Computing
Splunk
Dynatrace
Amazon Web Services
Configuration Management
Satellite
Red Hat Linux
Continuous Integration
Continuous Delivery
DevOps
GitLab
Computer Networking
Orchestration
Kubernetes
Linux
Unix Administration
Scripting
Python
Bash
Reliability Engineering
Service Level
Budget
Incident Management
Agile
DevSecOps
Communication
SAP BASIS
Law
FOCUS

Summary

Job Description

ECS is seeking a Site Reliability Engineer (SRE) / Operations Engineer to work in our Arlington, VA office / remote.

The Site Reliability Engineer (SRE) / Operations Engineer is responsible for ensuring the reliability, availability, performance, and operational efficiency of enterprise applications and supporting infrastructure. This role bridges software engineering and IT operations by applying engineering practices, automation, and monitoring to maintain stable systems and rapidly resolve operational issues. The SRE/Ops Engineer works closely with development, security, and platform teams to support system deployments, manage incidents, improve observability, and implement resilient architectures that support continuous delivery and mission-critical operations.

Responsibilities

Maintain the reliability, availability, and performance of production systems and cloud-based services.

Monitor system health using observability tools (metrics, logs, and tracing) and respond to alerts and incidents.

Participate in incident response, troubleshooting, and root cause analysis to restore service and prevent recurrence.

Implement automation and infrastructure-as-code to improve operational efficiency and reduce manual intervention.

Support deployment pipelines and release management processes to enable reliable and repeatable software delivery.

Collaborate with development teams to improve application resiliency, scalability, and operational readiness.

Develop and maintain operational runbooks, standard operating procedures, and system documentation.

Manage system capacity planning, performance tuning, and scaling strategies.

Ensure systems meet security, compliance, and organizational operational standards.

Contribute to continuous improvement initiatives by identifying opportunities to reduce operational risk and technical debt.

Salary Range: $145,000 - $180,000

General Description of Benefits

Required Skills

U.S. Citizenship
Ability to obtain at minimum a Public Trust suitability designation.

Bachelor's degree in Computer Science, Engineering, Information Technology, Information Systems, or a related field.

Minimum of seven (7) years of related experience.

Desired Skills

Experience supporting production systems in cloud or hybrid environments (e.g., AWS).
Proficiency with monitoring and observability tools (e.g., Splunk, Dynatrace, AWS Red Hat Console).

Experience with infrastructure automation and configuration management tools (e.g., Red Hat Satellite Server, Red Hat OpenShift 4).

Familiarity with CI/CD pipelines and DevOps practices using tools such as GitLab.

Strong troubleshooting skills across application, infrastructure, and networking layers.

Experience with containerization and orchestration technologies (e.g., Kubernetes).

Knowledge of Linux/Unix system administration and scripting (e.g., Python, Bash, or similar).

Understanding of reliability engineering principles such as service level objectives (SLOs), error budgets, and incident management.

Ability to work collaboratively in cross-functional teams supporting Agile or DevSecOps environments.

Strong written and verbal communication skills to document processes and coordinate during operational events.

#ECS1

ECS is an equal opportunity employer and does not discriminate or allow discrimination on the basis of any characteristic protected by law. All qualified applicants will receive consideration for employment without regard to disability, status as a protected veteran or any other status protected by applicable federal, state, or local jurisdiction law.

ECS is a leading mid-sized provider of technology services to the United States Federal Government. We are focused on people, values and purpose. Every day, our 3200+ employees focus on providing their technical talent to support the Federal Agencies and Departments of the US Government to serve, protect and defend the American People.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 10112MAN
Position Id: 3472
