Hybrid: Hybrid
Duration: Contract to Hire
Interview Process:
- 1st: Zoom
- 2nd: Onsite
Bachelor's Degree: Yes
Industry Background: Plus
Years of experience: 2 - 5 years
Shifts:
Morning: 8am - 5:00pm
Evening: 12:30 - 8:am
Weekend: On-call (Remote)
Must:
Hands-on Linux and Unix experience!
o Needs to be comfortable troubleshooting at the OS level
Production support experience
o Incident handling, debugging live systems, and on-call experience
Scripting/automation experience with Python
o Not a Software Engineer, but must be able to automate repetitive tasks
Communication skills MUST BE THERE!
o Will be working hands-on with Dev teams and the business
ServiceNow experience from a ticketing perspective
Knowledge of ITIL Principles
Plus:
Good understanding of Java, Go, C++, Scala, etc.
Grafana and Snowflake
Any cloud experience
Agentic AI background or knowledge
Description
Must haves:
Good hands-on Linux and SQL experience
Knowledge of Python (more from a code-reading perspective, as opposed to scripting)
This is a pipeline requisition to be used for all open consultant positions for Reliability & Production Engineering (RPE) in Utah. Please submit candidates who satisfy requirements for either Production Support or System Reliability Engineering (SRE) roles.
See below for detailed job descriptions:
1- PRODUCTION SUPPORT ANALYST
As a Production Support Analyst, your responsibilities will include, but not be limited to:
- Monitoring for and resolving issues across the entire tech stack: hardware, software, application, and network. A majority of your time will be devoted to production support activities.
- Working closely with engineering/development teams to address repetitive issues, reduce operational effort, and reduce the likelihood of future service disruptions.
- Partnering with business users and other technology teams to manage significant events such as business continuity/disaster recovery tests, IPOs, stock splits, and major infrastructure changes.
- Defining and refining standard operating procedures for everything from monitoring to troubleshooting complex code and infrastructure issues.
- Identifying and driving opportunities to improve platform supportability through automation.
- Advocating for reliability priorities in application design reviews and operational readiness exercises for new and existing services.
- Participating in weekend and off-hours on-call rotation.
- Collaborating with business users and striving to understand their needs and problems.
Qualifications External
What skills and experience do I need?
You should apply if you have at least a Bachelor's degree in Computer Science or other technical discipline(s), plus hands-on experience with any combination of the following:
- 3-5+ years practical experience in production systems support or application development.
- Hands-on experience managing systems in a large-scale distributed Unix/Linux environment is essential.
- Effective communicator who is comfortable speaking in front of both internal/external groups as well as business clients.
- Demonstrated ability to troubleshoot problems and debug to conclusively identify root causes.
- Knowledge of ITIL Principles. ITIL certification is a plus.
- Knowledge of Unix/Linux operating system level concepts such as processes, memory allocation, and networking, with an understanding of how applications are affected by these, and ability to debug and troubleshoot accordingly.
- Automation-related experience is particularly valued, using scripting languages such as Python, bash, Perl, and/or Ruby. Higher-level compiled languages such as C++, C#, Java, Scala, and Go are a big plus.
- Working ability to interact with message transport platforms and protocols (MQ, CPS, XML, FIX) and distributed database technologies (DB2, Sybase, Mongo, GreenPlum, Postgres, KDB).
- Autosys scheduling and batch processing concepts.
- Experience with source code and binary repositories, build tools, and CI/CD (Git, Artifactory, Jenkins, Docker, etc.) and data streaming technologies such as Spark and Kafka.
- Hands-on experience with enterprise tool sets such as Grafana, Splunk, Dynatrace, AppDynamics, etc.
- Awareness of, and ability to reason through, modern software and systems architectures, including load-balancing, queueing, caching, distributed systems failure modes generally, microservices, etc.
2- SYSTEM RELIABILITY ANALYST
As a System Reliability Analyst, your responsibilities will include, but not be limited to:
- Working closely with engineering/development teams to design, build, optimize, and maintain systems.
- Troubleshooting issues across the entire technology stack: hardware, software, application, and network.
- Aggressively targeting toil and operational risk, and deploying solutions to reduce these.
- Broadening infrastructure and application observability.
- Proactively identifying and addressing active or potential risks to system reliability.
- Advocating for reliability priorities in application design reviews and operational readiness exercises for new and existing services.
Qualifications External
What skills and experience do I need?
You should apply if you have at least a Bachelor's degree in Computer Science or other technical discipline(s), plus hands-on experience with any combination of the following:
- 3-5+ years practical experience in production systems support or application development.
- Hands-on experience managing systems in a large-scale distributed Unix/Linux environment is essential.
- Automation-related experience is required, using scripting languages such as Python, bash, Perl, and/or Ruby. Higher-level compiled languages such as C++, C#, Java, Scala, and Go are a big plus.
- Deep knowledge of and hands-on experience applying the principles of System/Site Reliability Engineering (SRE).
- Practical experience designing and instrumenting SLO/SLI dashboards is particularly valuable.
- Hands-on experience with enterprise tools such as AppDynamics, Grafana, Splunk, and Dynatrace.
- Experience with Puppet, Ansible, Chef, GitHub, or any automation/configuration/release management tools.
- Awareness of, and ability to reason through, modern software and systems architectures, including load-balancing, databases, queueing, caching, distributed systems failure modes, microservices, Cloud, etc.
- Working ability to interact with message transport platforms and protocols (MQ, CPS, XML, FIX) and distributed database technologies (DB2, Sybase, Mongo, GreenPlum, Postgres, KDB).
- Autosys scheduling and batch processing concepts.
- Deep understanding of infrastructure and operating system concepts such as processes, memory allocation, and networking, with an understanding of how applications are affected by the above, and ability to debug and troubleshoot accordingly.