Overview
On Site
Full Time
Skills
Real-time
Threat Analysis
COTS
Software Configuration
Microsoft Excel
Critical Thinking
Cyber Security
Continuous Improvement
Reliability Engineering
Scalability
Service Level
Root Cause Analysis
DevOps
Computer Science
Grafana
Splunk
Cloud Computing
Amazon Web Services
Scripting
Python
Bash
Nagios
Microservices
Orchestration
Docker
Kubernetes
Collaboration
Development Testing
Software Development
Conflict Resolution
Problem Solving
Analytical Skill
Attention To Detail
CISA
Agile
SAFe
Application Lifecycle Management
JIRA
Confluence
Continuous Delivery
Jenkins
CircleCI
GitLab
GitHub
Continuous Integration
Configuration Management
Ansible
Puppet
Progress Chef
Terraform
Network Security
System Administration
Linux
Operating Systems
Version Control
Disaster Recovery
Backup
Replication
SAP BASIS
Law
FOCUS
Job Details
Job Description
ECS is seeking a Site Reliability Engineer (SRE) - Senior to work in our Arlington, VA office. Please Note: This position is contingent upon contract award.
Program Description
ECS is seeking talented professionals to join our successful and growing team in building the next-generation Threat Intelligence Enterprise Service (TIES) solution. The TIES Program is the Cybersecurity and Infrastructure Security Agency's (CISA) dynamic approach to fulfilling its federally mandated cyber information sharing responsibilities and ensuring real-time automated threat intelligence reaches key security partners. The TIES product is an integrated suite of multiple Commercial Off-the-Shelf (COTS) products, software configuration packages, and custom code that work together as a solution tailored to meet CISA requirements.
We seek driven professionals who excel in a dynamic, fast-paced, and highly collaborative environment, where critical thinking, problem-solving, and a mission-focused approach are essential. A passion for continuous learning, improvement, and cybersecurity is vital.
As a small team committed to radically improving government, every member directly shapes ECS's direction and success. We take pride in our stewardship, holding deep responsibility for the solutions we develop. Collaboration is at the heart of our work, both within our team and alongside our federal partners.
Role & Responsibilities:
ECS is seeking a Site Reliability Engineer (SRE) - Senior to play a key role in defining and implementing the SRE requirements for the TIES program to ensure the reliability, availability, and performance of our critical production environments.
The Senior SRE will contribute to a culture of continuous improvement, identifying areas for enhancement, and driving initiatives to improve system reliability, scalability, and efficiency.
The successful candidate will have demonstrated hands-on experience designing, implementing, and maintaining solutions to ensure that systems, including infrastructure and applications, are resilient, highly available, and performant. The Senior SRE will also play a critical role in defining and measuring the Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for our solution.
The Senior SRE will be responsible for setting up comprehensive logging, monitoring, and alerting solutions using the Elastic stack and other tools as necessary to ensure the continuous performance of services. Additionally, they will respond to incidents, perform root cause analyses, and implement solutions to prevent recurrences. The Senior SRE will work in close collaboration with other SRE team members, developers, testers, infrastructure engineers, DevOps engineers, and other stakeholders to integrate reliability and observability into the software development lifecycle.
Required Skills
- U.S. citizenship with ability to obtain Public Trust Suitability
- Bachelor's degree in Computer Science, Engineering, or a related field (or 4 additional years of related experience)
- 6+ years of experience as a Site Reliability Engineer (SRE) or equivalent
- 6+ years of demonstrated experience designing, implementing, and maintaining observability solutions to include logging, monitoring, and alerting
- 6+ years of hands-on experience with SRE tools (e.g., Elastic, Prometheus, Grafana, Splunk, etc.)
- 3+ years defining and measuring SLOs and SLIs
- 3+ years of relevant experience using cloud platforms (AWS GovCloud preferred)
- 3+ years of hands-on programming or scripting (e.g., Python, Bash, etc.)
- Experience with enterprise monitoring tools (e.g., AppDynamics, ScienceLogic, Nagios, etc.)
- Strong knowledge of microservices, containerization, and orchestration tools (Docker, Kubernetes)
- Proven ability to collaborate with cross-functional teams (development, testing, and product) to integrate reliability and observability into the software development lifecycle
- Strong problem-solving and analytical skills
- Proactive, detail-oriented approach to identifying inefficiencies and implementing improvements
Desired Skills
- Prior DHS CISA mission experience
- Experience working in an Agile/SAFe environment using ALM tools (Jira, Confluence, or similar)
- Strong understanding of CI/CD principles and platforms (Jenkins, CircleCI, GitLab, GitHub Actions, Argo, Travis CI, etc.)
- Expertise in configuration management tools (Ansible, Puppet, Chef)
- Experience with infrastructure as code (Terraform, CloudFormation)
- In-depth understanding of networking, security, and system administration of Linux operating systems
- Knowledge of version control platforms and branching strategies
- Knowledge of disaster recovery planning, backup strategies, and data replication
ECS is an equal opportunity employer and does not discriminate or allow discrimination on the basis of any characteristic protected by law. All qualified applicants will receive consideration for employment without regard to disability, status as a protected veteran, or any other status protected by applicable federal, state, or local jurisdiction law.
ECS is a leading mid-sized provider of technology services to the United States Federal Government. We are focused on people, values and purpose. Every day, our 3800+ employees focus on providing their technical talent to support the Federal Agencies and Departments of the US Government to serve, protect and defend the American People.