Job Details
Sr. Site Reliability Engineer
Location: Remote, US
Visa: USC
The Sr. Site Reliability Engineer ensures system reliability, scalability, and automation by implementing DevOps best practices and optimizing CI/CD pipelines in large-scale production environments.
Design, implement, and maintain scalable, reliable, and secure infrastructure supporting mission-critical applications and services.
Lead end-to-end CI/CD pipeline automation using Jenkins, GitLab, Bitbucket, and GitHub Enterprise, ensuring seamless integration and delivery processes.
Manage and optimize artifact repositories and security scanning with JFrog Artifactory and Xray.
Collaborate with development, QA, and operations teams to ensure high availability, performance, and reliability of services.
Drive continuous improvement initiatives, automate repetitive tasks, and mentor junior engineers.
Provide after-hours support for production releases and participate in on-call rotations as needed.
Document processes, runbooks, and architectural decisions to ensure knowledge sharing and operational transparency.
Required Skills You'll Bring:
10+ years of experience in DevOps, Site Reliability Engineering, or related roles in large-scale production environments.
Proficiency with developer tools such as the Atlassian suite, Jenkins, GitLab, GitHub, SonarQube, and JFrog.
Strong scripting abilities (Python, Bash, or similar) and Infrastructure as Code (Terraform, CloudFormation, Ansible) for automation and tooling.
Experience with cloud platforms (AWS, Azure, or Google Cloud Platform).
Solid understanding of containerization (Docker, Kubernetes) and microservices deployment.
Desired Skills You'll Bring:
Bachelor's or master's degree in Computer Science, Engineering, or equivalent job-related experience.
AI experience (GitHub Copilot, OpenAI, MCP, agentic AI) is preferred but not required.
Strong analytical and problem-solving skills with a passion for automation, reliability, and continuous improvement.
Ability to troubleshoot complex production issues and lead incident response efforts.