Role: Sr. DevOps Engineer, Observability
Location: Plano, TX
Duration: 12 months, with possible extension and/or conversion to full time
Software Development: Writing Python code for network tools and protocols, writing automation scripts (Ansible), and managing APIs.
Network Design & Implementation: Ensuring optimal performance and security.
Automation: Creating self-service tools and scripts to simplify network management and reduce manual intervention.
Troubleshooting: Diagnosing complex issues at the intersection of software and hardware.
Must-have skills:
Observability / New Relic / Python / AWS (ECS) / Splunk
Role Description:
The manager recently took over this team and needs someone who has done a mix of 60% observability, 20% triage, and 20% automation (mostly Python).
Responsible for the stability and reliability of internal applications and auto finance applications. There will be no training on the tools themselves, only on the environment and any customizations the team has made. The candidate should be productive after a few days on the job.
Heavy focus on observability: the work starts with automation for failover, and triage work will follow. You should be able to look at the logs and connect the dots.
The triage work is mostly for low-severity incidents.
Experience working with dashboards and with finding and creating alerts.
This is a daytime role, 8-5, Monday through Friday, but may require some off-hours work as needed.
Example of the troubleshooting needed: an application is failing; go into the AWS console and find out why. After an alert is triggered, dig into the logs and triage.
Should have at least 2-3 years of experience doing this and 1-2 years of observability experience.
There are 7 people on the manager's team; this will be one of the more senior-level observability engineers on the team.
Industry experience is not as important.
LOB: Supporting Auto finance and all internal applications