Location: Charlotte, NC
Salary: $61.00 USD Hourly - $66.00 USD Hourly
Description: Site Reliability Engineer (SRE) - Systems Operations Engineer
We are not accepting C2C or 1099 arrangements.
Location: Charlotte, NC or Irving, TX (Hybrid: 3 days onsite)
Employment Type: Contract (18 months, extension possible; conversion eligible)
Minimum Qualifications
- 2+ years of experience in Site Reliability Engineering (SRE) or related production engineering roles
- Experience with Kubernetes or OpenShift (OCP)
- Strong understanding of databases (SQL/NoSQL)
- Experience with monitoring, alerting, and observability tools
- Proficiency in at least one programming language (e.g., Python, Java)
Preferred Qualifications
- Experience with AutoSys or job scheduling tools
- Background in financial services or corporate banking environments
- Interest in or exposure to AI/ML, AIOps, or automation technologies (RPA, chatbots)
- Experience defining and managing SLOs, SLIs, and error budgets
- Familiarity with microservices architecture and distributed systems
About the Role
As a Site Reliability Engineer, you will help ensure the reliability, scalability, and efficiency of critical enterprise systems supporting Shared Services Operations Technology. You will work across domains including payments, regulatory operations, financial crimes (KYC/AML), and real estate systems.
You will play a key role in transitioning the organization from reactive "firefighting" to proactive reliability engineering, improving system stability, automation, and observability across a portfolio of ~85 applications.
Responsibilities
- Design and implement automation solutions to reduce manual operational work and improve system efficiency
- Build and enhance observability frameworks including monitoring, logging, and alerting
- Define and implement SLOs/SLIs and track service performance and reliability
- Improve system resilience using cloud and containerization technologies (Kubernetes/OCP)
- Lead efforts to reduce incidents and improve uptime through root cause analysis and postmortems
- Collaborate with engineering and operations teams to shift reliability practices left
- Support and optimize CI/CD pipelines and automation workflows
- Contribute to adoption of AIOps, predictive monitoring, and self-healing systems
- Drive implementation of non-functional requirements (NFRs) such as scalability, availability, and performance
What You'll Work On
- Modernizing legacy systems with improved reliability standards
- Establishing SRE best practices across multiple business domains
- Increasing proactive engineering efforts from ~40% to ~80%
- Supporting high-impact systems related to financial crimes, payments, and regulatory operations
- Implementing automation using AI/ML, RPA, and intelligent monitoring tools
Work Environment & Expectations
- Hybrid model: 3 days onsite required
- Participation in on-call or weekend rotation (approximately once every 1-2 months)
- Fast-paced environment with a focus on continuous improvement and reliability engineering maturity
Why Join
- Opportunity to influence large-scale reliability transformation initiatives
- Work on mission-critical financial systems
- Exposure to modern SRE practices, cloud platforms, and AI-driven operations
- Collaborative team environment with strong engineering focus
Contact: This job and many more are available through The Judge Group. Please apply with us today!