Overview
On Site
100k - 150k
Full Time
Skills
Ideation
Onboarding
Collaboration
Data Engineering
Provisioning
Continuous Integration and Delivery (CI/CD)
Python
Jenkins
Open Source
Cloud Computing
Computer Science
Management
Big Data
Apache Hadoop
Apache Spark
Apache Kafka
Linux
Ansible
Docker
Kubernetes
Communication
SAP BASIS
Job Details
Site Reliability Engineer
The candidate will be involved in all aspects of the data platform, including ideation, design, implementation, deployment, customer onboarding, and support. This requires regular cross-team collaboration with the Data Engineering, Infrastructure, Engineering, Security, and Operations teams. As part of the team, the candidate is expected to take ownership of the data platform, regularly interacting with internal customers and proactively identifying, prioritizing, and delivering on their common data platform needs.
The company is located in Reston, VA, and the position follows a hybrid model.
What You Will Be Doing:
- Architecting, deploying, and managing large-scale data platforms (Kafka, Spark, Hadoop, Druid) running on top of Kubernetes
- Automating cluster provisioning, scaling, and monitoring (CI/CD) using Ansible, Python, and Jenkins
- Participating in technical designs for software solutions that combine open-source, commercial, and custom-developed components
- Ensuring platform SLOs by collecting, visualizing, and alerting on relevant telemetry
- Upgrading large-scale data platforms, improving system capabilities and security while ensuring minimal customer impact
- Troubleshooting complex issues in large, distributed environments
- Staying up to date with industry best practices and standards for data platforms, with a focus on hybrid cloud environments
- Supporting data platform customers
- Participating in the on-call rotation, monitoring production systems and responding to incidents
What You Will Need:
- Bachelor's degree in Computer Science or a related technical field, or an equivalent combination of education and experience
- 5+ years of experience managing big data platforms (Hadoop, Spark, Kafka, Druid)
- Excellent understanding of Linux configuration and administration
- Strong automation experience: not just developing automation, but knowing why we automate and what to automate
- Strong understanding of infrastructure-as-code tools such as Ansible
- Experience with Docker or Kubernetes in a production environment
- Strong written and verbal communication skills - able to clearly and succinctly describe complex issues
This position doesn't provide sponsorship.