SRE Engineer / DevOps Engineer / SRE/DevOps Engineer - Full Time Only

Overview

On Site

BASED ON EXPERIENCE

Full Time

Skills

DevOps

Management

Root Cause Analysis

Release Management

Capacity Management

Load Balancing

Microservices

Configuration Management

.NET

Java

Database

Network

Continuous Integration

Continuous Delivery

Linux

Unix

Change Management

Sprint

User Stories

Debugging

Knowledge Base

Documentation

SOP

Software Performance Management

Splunk

AppDynamics

Apache Kafka

Computer Networking

TCP/IP

SSL

TLS

IPsec

Virtual Private Network

Firewall

Scripting

Shell

Windows PowerShell

Python

Google Cloud Platform

Google Cloud

Amazon Web Services

Microsoft Azure

Cloud Computing

Job Details

Title: Site Reliability Engineer (SRE) / DevOps Engineer
Location: Austin, TX | Onsite (client round in person)
Employment type: FTE

Job Summary:

Seasoned Site Reliability Engineer (SRE) with 5+ years of experience supporting complex, large-scale distributed systems. Highly skilled in managing production failures, conducting root cause analysis, and driving effective remediation. Strong communicator with expertise in logging, monitoring, and release management, complemented by automation proficiency and the ability to learn quickly.

Role & Responsibilities: Onshore lead role. Responsible for delivery, availability, latency, performance efficiency, change management, monitoring, observability, emergency response, and capacity management.

Years of experience needed: 5+ years

Technical Skills:

Expertise in large-scale production systems and technologies such as load balancing, monitoring, distributed systems, microservices, and configuration management.
Skills: .NET, Java, or C#.
Solid hands-on experience troubleshooting and fixing application failures, application performance degradation, code issues, cloud platform issues, batch failures, infrastructure failures, database failures, and network failures.
Hands-on experience performing production deployments using CI/CD, with exposure to deployment strategies.
Experience troubleshooting Linux/Unix systems.
Monitor application, service, and batch availability.
Act quickly on application (performance, availability) and batch job failures.
Perform the required analysis (Code/Log) and escalate to the Engineering team as required.
Initiate and drive the Techlines during outages, major incidents, and batch abends, and ensure service restoration as quickly as possible.
Handle incident, problem, release, and change management effectively.
Own and deliver the user stories assigned as part of the sprint.
The user stories range from application code debugging, issue analysis, and code fixes to knowledge base creation, documentation of SOPs, production deployments, pre- and post-patching/maintenance activities, and service requests.
Build monitoring solutions using APM tools such as Splunk, AppDynamics, ThousandEyes, ITRS, AppMetrics, Moogsoft, Kafka, etc.
Automate day-to-day operational tasks.
Take part in Exit reviews to ensure best practices are followed and the right code is deployed to production systems.
Provide feedback and recommend improvements that make the systems more stable.
Strong understanding of networking concepts (TCP/IP, SSL/TLS, IPsec, VPN, etc.), firewalls, and load balancers.
Experience in scripting: Shell, PowerShell, or Python.
Strong experience working with any cloud-based infrastructure (PCF, Google Cloud Platform, AWS, Azure, or others).

Thanks & Regards
Viplav Mandal

Tanisha Systems Inc.
99 Wood Ave South, Suite # 308, Iselin, NJ 08830

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
