Principal SRE

Overview

On Site

Full Time

Skills

Google Cloud

Software engineering

Change management

Reliability engineering

Systems design

Engineering design

Issue resolution

Data processing

Systems engineering

Performance testing

Service level management

Analytical skill

Shell scripting

Graph databases

RabbitMQ

Apache Mesos

IT service management

Data

Leadership

Automation

Operations

Mentorship

Design

Workflow

Collaboration

Orchestration

Specification

Management

Documentation

IaaS

PaaS

SaaS

Art

Cloud computing

Ansible

Microsoft Azure

Amazon Web Services

Regulatory Compliance

Planning

ProVision

Microservices

IMPACT

DevOps

API

Testing

Budget

Software deployment

Banking

Training

Linux

Unix

Writing

Python

Ruby

SQL

Database

PostgreSQL

MongoDB

Apache Cassandra

Streaming

Messaging

Apache Kafka

Apache HTTP Server

Kubernetes

Docker

Oracle Application Express

SAP BASIS

Law

Innovation

Recruiting

Job Details

Job#: 2007675

Job Description:
Summary - Principal SRE - Hybrid 3 days/week - Charlotte, NC - Phoenix, AZ - Dallas, TX - W2 Only, no C2C
*Candidate must be able to work on the client's W2 without sponsorship now or in the future*
*Candidate must be able to commute to one of the following locations 3 days a week - Charlotte, NC - Phoenix, AZ - Dallas, TX*
The Principal Site Reliability Engineer (SRE) on our dynamic SRE team is a subject matter expert and SRE professional whose key focuses are analyzing complex data and distributed systems, anticipating problems, and finding ways to mitigate risks to the environment. Incorporating knowledge of business drivers, the principal SRE will effect change, lead and drive the SRE charter with innovative improvements, and facilitate best practices in using software engineering to enable automation and efficiency in all aspects of platform change management and operations. The main responsibilities include optimizing day-to-day activities to reliably support product rollout and operation through automation, and mentoring other lead, senior and staff SREs toward adopting and implementing the DevSecOps culture. As a principal SRE, you will provide oversight for production operations and launch execution for major initiatives, and will develop and engineer solutions to optimize system reliability.
You will identify opportunities to design, build and implement innovative solutions to solve unique platform and infrastructure problems in order to optimize product delivery and operations workflow and enhance platform production stability for the products. You will collaborate with other senior and lead team members within and outside of ITSO to evangelize the SRE mindset and system design toward optimizing the performance and availability of our environment.
Responsibilities

Lead the design, build and implementation of orchestration and tooling solutions so that workflows and tasks are carried out with high efficiency and free of defects.
Establish operational best practices for structuring, automating, building, deploying and monitoring complex distributed software products and environments.
Collaborate with other engineering teams to ensure the reliability and traceability of software releases and of deployments of software and infrastructure changes.
Create and maintain platform operational engineering design specifications to aid the maintenance and smooth operation of software environments.
Collaborate with other engineering teams to triage alerts & diagnose/resolve critical issues, and manage implementation of changes.
Collaborate with other engineering teams in the coordination, documentation and tracking of critical incidents, ensuring rapid and complete issue resolution and an appropriate closed loop with customers and other key stakeholders.
Lead, grow and mentor other SRE team members.
Evangelize the SRE mindset and mentor others in reliability and SRE best practices.
Maintain a strong understanding of IaaS, PaaS, and SaaS offerings for building and maintaining a state-of-the-art, cloud-based environment for massive-scale data processing.
Ensure that implementations and solutions are fully documented, and that solutions are deployed with fully operationalized processes to support the solution lifecycle.
Other tasks as assigned

Minimum Requirements

Ability to read and write code in Ansible
10+ years of experience in systems engineering or software engineering
Advanced knowledge in at least 3 of the following key areas: cloud native and IaaS architecture (Azure preferred; Google Cloud Platform or AWS accepted), including performance testing, monitoring and operations; design (compliance, security); cloud engineering (planning, provisioning); container orchestration; microservice architecture and engineering.
Strong understanding of business technology drivers and their impact on architecture and engineering design, performance and monitoring
Subject matter expert in Site Reliability Engineering (SRE) and DevOps philosophies, technologies, platforms and tools, SLA management, incident resolution, and automation.
API-first design, implementation and testing experience.
Demonstrated ability to conceptualize, launch and deliver multiple engineering projects on time and within budget
Demonstrated ability to understand and troubleshoot complex problems under pressure
Strong understanding of cloud native architecture and microservices design and deployment patterns
Strong data analytical skills
Banking industry experience a plus

Skills/Training Required

Expert level of Linux/Unix skills and shell scripting.
Minimum of 7 years of experience automating tasks, building cloud native software in a microservice architecture style, and writing tools in Python, Go, Ruby, or C#.
Excellent knowledge of at least 3 leading SQL and NoSQL database technologies (Postgres, MongoDB, graph databases, Cassandra, etc.).
Hands-on advanced experience with implementing and supporting streaming and messaging technologies such as RabbitMQ, Kafka, Apache Pulsar, Azure or Google Cloud Platform.
Minimum of 7 years of experience working with a container orchestration platform; Kubernetes preferred, but other platforms such as Apache Mesos or Docker Swarm will also be considered.
Expert knowledge of distributed tracing, with hands-on experience implementing and operating any one of these: Jaeger, OpenTelemetry, OpenTracing, Zipkin.
Expert knowledge of service mesh technologies; Istio preferred, but others including Linkerd and Consul will be considered.

EEO Employer
Apex Systems is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law. Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law. If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation in using our website for a search or application, please contact our Employee Services Department.

Apex Systems is a world-class IT services company that serves thousands of clients across the globe. When you join Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing in Talent Satisfaction in the United States and Great Place to Work in the United Kingdom and Mexico.
