Overview
On Site
Depends on Experience
Contract - W2
Skills
DevOps
Amazon Web Services
Authentication
Google Cloud
Microservices
Microsoft Azure
JIRA
Kubernetes
Docker
Dashboard
Continuous Integration
Google Cloud Platform
Orchestration
Python
Scalability
Scripting
ServiceNow
Splunk
Terraform
Virtual Machines
Continuous Monitoring
Cloud Computing
Environment Management
Grafana
IT Service Management
Artificial Intelligence
Job Details
Responsibilities:
- Extensive experience with IT infrastructure, cloud platforms (AWS, Azure, Google Cloud Platform), and modern DevOps/SRE methodologies.
- Hands-on expertise with monitoring and observability tools: Grafana, Prometheus, Splunk.
- Familiarity with ITSM and operational tools such as ServiceNow and OpsRamp.
- Experience with project and incident tracking tools like JIRA.
- Proficiency in scripting and automation using Python, Bash, Terraform, and Ansible.
- Strong understanding of CI/CD pipelines, containerization (Docker), and container orchestration (Kubernetes).
- Performs environment management, automated server (VM) provisioning, and pipeline configuration.
- Delivers software to improve the availability, scalability, latency, and efficiency of Client services.
- Creates, manages, and uses dashboards for continuous monitoring and health checks of applications and the underlying infrastructure, and uses the monitoring feedback to improve the quality of services in nonproduction environments.
- Contributes to future improvements of software delivery processes and operations, e.g., cloud enablement and the use of microservices with containerization.
- Integrate Dynatrace with CI/CD pipelines, alerting tools, ITSM systems, and incident automation frameworks.
- Tune alert thresholds, baselines, and AI-driven anomaly detection to reduce noise and improve actionable insights.
- Deep understanding of login authentication mechanisms using Ping, ForgeRock, and SiteMinder technologies, including session management and cookie management.
- Implement correlation mechanisms and dashboards that provide end-to-end visibility of requests from external to internal applications.
- Evangelize SRE evolution within IT operations and promote a culture of engineering excellence and best practices.
- Define best practices and principles for SRE, including incident management, monitoring, alerting, and automation.
- Collaborate with development teams on resiliency to ensure that services and applications are designed with operational reliability in mind.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.