Overview
Remote
Depends on Experience
Accepts corp to corp applications
Contract - Independent
Contract - W2
Skills
Production Support
Site Reliability Engineer
Dashboard
Observability
Monitoring
Network
AppDynamics
Splunk
Grafana
Firewall
Docker
Kubernetes
Containerization
AWS
Azure
Wireshark
Job Details
Role: Site Reliability Engineer
Location: 100% Remote
Type: Contract Position
Experience Required: 12+ years
Job description:
Production support expertise with SRE observability experience:
- Proactive issue identification using observability tools.
- Skills in using different monitoring & observability tools to track system performance.
- Production support activities, including proactive identification of issues using observability tools and correlating inputs from various dashboards & tools to drive resolution.
- Experience in swiftly identifying probable failure points by analyzing multiple inputs: logs, observability dashboards, recent application changes, infra changes, network changes, etc.
- Basic troubleshooting on every layer of the tech stack: application, database, infra (container platforms), and network.
Communication:
Excellent communicator. Expected to actively lead and triage proactively identified issues and incidents on calls where VPs and SVPs are also present.
Flexibility to work in a 24x7 environment.
Technical expertise:
- Analysis of issues via Splunk (including Splunk APM and Splunk O11y), AppDynamics, Grafana, RedMetrics, ThousandEyes
- Debugging of issues in VMs, Load balancers, Firewalls, API Gateways, DB, Network, Linux / Unix
- Debugging of issues in Containerization, Docker, Kubernetes, AWS, PCF, Azure
- Analysis of issues via APM, NMON, and Wireshark
- Experience in UEM and synthetic monitoring setup
Optional skills:
- ServiceNow (including AIOps, tools for Self-Heal and automated playbooks)
- Development experience in some of these technologies: Java, Python, AWS, Azure, Oracle, Cassandra, SQL Server, MySQL, and MongoDB
Detailed Job Description:
- System Administration: Strong knowledge of infrastructure, including command-line tools and system internals.
- Networking: Understanding of network protocols, configurations, and troubleshooting.
- Computing: Understanding of cloud computing, including cloud architecture (on-prem and public) and services.
- Application Management: Familiarity with continuous integration and continuous deployment processes and tools.
- Monitoring and Observability: Skills in using monitoring tools to track system performance and detect issues across all backend systems, databases, and APIs.
- Problem-Solving: Ability to diagnose and resolve complex issues quickly and efficiently.
- Collaboration: Strong communication skills to work effectively with cross-functional teams.
- Adaptability: Flexibility to handle changing priorities and technologies.
- Attention to Detail: Precision in managing configurations and deployments to avoid errors.
Required Skills & Qualifications:
- 8+ years of experience as a Site Reliability Engineer; experience in the telecom domain is a must.
- Strong proficiency in Java and Go (must-have); experience with Python is a plus.
- Hands-on experience with Kubernetes, CI/CD, containerization, and service mesh.
- Expertise in observability tools: Splunk, AppDynamics, or similar platforms.
- Experience with the AWS and Azure cloud platforms is a must.
- Solid background in software engineering and application architecture.
- Experience in application performance monitoring, scalability, and high availability.
- Knowledge of telecom systems and domain-specific challenges is highly preferred.
- Strong problem-solving and debugging skills across distributed systems.
- Excellent communication and collaboration abilities in a remote work environment.