Overview
On Site
Full Time
Part Time
Accepts corp to corp applications
Contract - Independent
Contract - W2
Skills
IaaS
Incident Management
Terraform
Ansible
Collaboration
Software Engineering
Knowledge Sharing
Computer Science
Software Development
Python
Java
C++
Amazon Web Services
Google Cloud Platform
Google Cloud
Microsoft Azure
Network Security
Storage
Data Management
Linux
Computer Networking
File Systems
Cloud Computing
Dashboard
Continuous Integration and Development
Continuous Integration
Continuous Delivery
Automated Testing
Provisioning
Management
Root Cause Analysis
SANS
Service Level
Reliability Engineering
Job Details
Hiring: W2 Candidates Only
Visa: Open to any visa type with valid work authorization in the USA
Level: Mid to Lead positions
Key Responsibilities:
- Design, build, and maintain highly available, scalable, and secure cloud infrastructure on platforms such as AWS, Google Cloud Platform, or Azure.
- Develop and implement automation for provisioning, monitoring, scaling, and incident response using Infrastructure-as-Code tools (e.g., Terraform, CloudFormation, Ansible).
- Monitor system reliability, capacity, and performance; proactively detect and address issues before they impact users.
- Respond to production incidents, participate in on-call rotations, and lead post-incident reviews to drive root cause analysis and reliability improvements.
- Collaborate with software engineering and security teams to ensure new services and features are production-ready and meet reliability standards.
- Build and maintain tools for deployment, monitoring, and operations; automate manual processes to reduce toil.
- Document operational processes and system architectures to ensure knowledge sharing and repeatability.
- Continuously evaluate and implement new technologies to improve system reliability, security, and efficiency.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
- 3+ years of experience in software development with proficiency in at least one programming language (e.g., Python, Go, Java, C++).
- Experience administering cloud platforms (AWS, Google Cloud Platform, Azure), including networking, security, containerization, storage, data management, and serverless technologies.
- Solid understanding of Linux systems, networking fundamentals, virtualized and distributed systems, file systems, and system processes and configurations.
- Deep understanding of observability tools (monitoring, alerting, and logging) in cloud environments.
- Ability to set up and maintain monitoring dashboards, alerts, and logs.
- Familiarity with Continuous Integration/Continuous Deployment (CI/CD) tools for automated testing, deployments, provisioning, and observability.
- Ability to manage and respond to incidents, perform root cause analysis, and conduct post-mortem reviews.
- Understanding of setting, monitoring, and maintaining Service-Level Objectives (SLOs) and Service-Level Agreements (SLAs) for system reliability.