Overview
On Site
100k - 150k
Full Time
Skills
Ideation
Onboarding
Collaboration
Data Engineering
Provisioning
Continuous Integration and Delivery (CI/CD)
Python
Jenkins
Open Source
Cloud Computing
Computer Science
Management
Big Data
Apache Hadoop
Apache Spark
Apache Kafka
Linux
Ansible
Docker
Kubernetes
Communication
SAP BASIS
Job Details
Site Reliability Engineer
The candidate will be involved in all aspects of the data platform, including ideation, design, implementation, deployment, customer onboarding, and support. This requires regular cross-team collaboration with the Data Engineering, Infrastructure, Engineering, Security, and Operations teams. As part of the team, the candidate is expected to take ownership of the data platform, regularly interacting with internal customers and proactively identifying, prioritizing, and delivering on their common data platform needs.
The company is located in Reston, VA, and the position follows a hybrid model.
What You Will Be Doing:
- Architecting, deploying, and managing large-scale data platforms (Kafka, Spark, Hadoop, Druid) running on top of Kubernetes
- Automating cluster provisioning, scaling, and monitoring (CI/CD) using Ansible, Python, and Jenkins
- Participating in technical designs for software solutions that combine open-source, commercial, and custom-developed components
- Ensuring platform SLOs by collecting, visualizing, and alerting on relevant telemetry
- Upgrading large-scale data platforms, improving system capabilities and security while ensuring minimal customer impact
- Troubleshooting complex issues in large, distributed environments
- Staying up to date with industry best practices and standards for data platforms, with a focus on hybrid cloud environments
- Supporting data platform customers
- Participating in the on-call rotation, monitoring production systems and responding to incidents
What You Will Need:
- Bachelor's degree in Computer Science or a related technical field, or an equivalent combination of education and experience
- 5+ years of experience managing big data platforms (Hadoop, Spark, Kafka, Druid)
- Excellent understanding of Linux configuration and administration
- Strong automation experience: not just developing automation, but knowing why we automate and what to automate
- Strong understanding of infrastructure-as-code tools such as Ansible
- Experience with Docker or Kubernetes in a production environment
- Strong written and verbal communication skills - able to clearly and succinctly describe complex issues
This position doesn't provide sponsorship.