Overview
On Site
Hybrid
$60,000 - $70,000
Contract - W2
Contract - 12 Month(s)
Skills
Site Reliability Engineering
Release Engineering
DevOps
automated deployment strategies
Job Details
Cloud DevOps Engineer / Senior Reliability Engineer 5+ yrs (W2) - Remote
Preferred: Candidates local to California
RESPONSIBILITIES:
- Ensure operational stability, availability, performance, and scalability of cloud-hosted systems across production and development environments supporting multiple agile teams
- Provide real-time monitoring, alerting, incident response, and health checks for infrastructure and applications across all cloud layers (OS, app, DB)
- Implement and maintain dashboards, visualizations, and reports for system health, event management, and cost optimization using native CSP tools
- Manage cloud resource thresholds and automate capacity planning, forecasting, and resource optimization strategies
- Perform incident and event management (SIEM) operations, and support issue diagnosis, resolution, and reporting including RCA documentation
- Track, document, and report monthly issues, including system performance, stability, ticket volumes, and time-to-resolution metrics
- Monitor resource utilization (CPU, memory, disk space) across all deployed VMs, containers, and PaaS components
- Contribute to the implementation of the Enterprise FinOps framework, including forecasting, budget control, and right-sizing analysis
- Support deployment automation and ensure systems are resilient, repeatable, and scalable via Infrastructure as Code (IaC)
- Integrate operations with DevSecOps, MLOps, and CI/CD pipelines for seamless deployment and management
- Execute system health checks daily or at an agreed frequency, and maintain operational runbooks and SOPs
REQUIRED EXPERIENCE & QUALIFICATIONS:
- Bachelor's degree in Computer Science, Software Engineering, or related field
- 5+ years of experience in IT systems engineering, systems development, and systems coding and programming
- Deep expertise with AWS, Azure, or Google Cloud Platform services, including monitoring, logging, compute, storage, and networking
- Proficiency in Infrastructure as Code (IaC) tools like Terraform, AWS CloudFormation, or Azure Bicep
- Hands-on experience with monitoring and APM tools such as CloudWatch, Azure Monitor, Datadog, Prometheus, Grafana, New Relic, etc.
- Solid understanding of incident response, change management, and ITIL-based operational support
- Familiarity with CI/CD toolchains and automation platforms (Jenkins, GitHub Actions, GitLab, ArgoCD)
- Strong scripting skills (Python, PowerShell, Bash) for automation and orchestration
- Advanced experience implementing DevSecOps using GitOps or similar tools
- Experienced in developing, testing, and maintaining containerized applications
- Expert knowledge of source version control, build/release tools and methodologies, CI/CD pipelines, and the software build process for large enterprises with many complex applications
- Experience with FinOps practices, cost modeling, forecasting, and optimization tools within cloud platforms
- Understanding of federal compliance and security frameworks (e.g., FedRAMP, NIST, JISF Rev 5)
- ITIL, AWS SysOps, or Google Professional Cloud DevOps Engineer certifications are a plus