Lead-Site Reliability Engineer

  • Posted 8 hours ago | Updated 8 hours ago

Overview

Remote
$100,000 - $120,000
Full Time

Skills

docker
devops
SLA

Job Details

Senior Lead-Site Reliability Engineer

Location: Remote

Duration: Full time

Proven experience in technical project management, leading and managing DevOps/Agile projects and ensuring they are delivered on time, within scope, and on budget.

Maintain strong customer engagement and build processes that let all relevant team members engage with the customer.

Collaborate with stakeholders to define, measure, and track Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
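As an illustration of the SLO/SLI tracking described above, here is a minimal Python sketch of an availability SLI and the remaining error budget for a request-based SLO. The function names, the request-count inputs, and the 99% target are assumptions chosen for the example; the posting does not specify how the team measures these.

```python
def availability_sli(total_requests: int, failed_requests: int) -> float:
    """Availability SLI: fraction of requests that succeeded."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return (total_requests - failed_requests) / total_requests


def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the error budget still unspent.

    1.0 means no budget used, 0.0 means fully spent, and a negative
    value means the SLO has been violated for this window.
    slo_target is a hypothetical objective such as 0.99 (99%).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # Number of failures the SLO tolerates over this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures
```

For example, with a 99% target over 1,000 requests, 5 failures would leave roughly half of the error budget, while 20 failures would overspend it.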

Work closely with cross-functional teams to plan, design, and implement reliability improvements and automation initiatives.

Facilitate post-incident reviews (PIRs), ensuring action items are identified and followed through.

Ensure that development, staging, and production environments are correctly configured.

Drive initiatives to automate manual tasks and improve system observability and monitoring. Facilitate knowledge sharing across teams to ensure best practices are followed and operational knowledge is captured.

Ensure all dependencies (libraries, services, etc.) are installed and compatible. Compile the code and create build artifacts.

Use containerization (e.g., Docker) to package applications with their dependencies. Ensure compliance with organizational security policies.

Decide on a deployment strategy (e.g., blue-green deployment, canary releases, rolling updates). Define rollback procedures in case the deployment fails.

Ensure changes are reviewed, approved, and documented.

Use CI/CD pipelines (e.g., Jenkins, GitLab CI, CircleCI) to automate the build, test, and deployment process.

Ensure tests (unit, integration, end-to-end) are passing before deploying to production.

Implement automated rollback mechanisms to revert to the previous stable version in case of a failed deployment.
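A rough Python sketch of what such an automated rollback could look like: activate the new version, run a few health checks, and revert to the previous stable version on the first failure. The `activate` and `is_healthy` callables and the version strings are hypothetical stand-ins for whatever deployment tooling and health probes the team actually uses.

```python
from typing import Callable


def deploy_with_rollback(new_version: str,
                         previous_version: str,
                         activate: Callable[[str], None],
                         is_healthy: Callable[[], bool],
                         checks: int = 3) -> str:
    """Deploy new_version and return the version left running.

    activate and is_healthy are hypothetical hooks: activate switches
    traffic to a version, is_healthy probes the running service.
    """
    activate(new_version)
    for _ in range(checks):
        if not is_healthy():
            # First failed check: revert to the last stable version.
            activate(previous_version)
            return previous_version
    return new_version
```

A real pipeline would typically wait between checks and record the outcome; those details are left out to keep the sketch short.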

Ensure load balancers are properly configured to distribute traffic evenly across instances.

Maintain up-to-date deployment playbooks, runbooks, and architecture diagrams. Continuously refine deployment processes based on feedback from previous deployments.


About Kodeva LLC