Location: Charlotte, NC
Salary: $61.00 USD Hourly - $66.00 USD Hourly
Description: Job Title: Senior Site Reliability Engineer (Systems Operations Engineer)
Location: Charlotte, NC or Irving, TX
Schedule: Hybrid - 3 days per week onsite (mandatory)
Contract: 18 months (with possible extension and eligibility for conversion)
About the Role
We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to support key Shared Services Operations Technology platforms, including Payment Evaluations, Regulatory Operations, Financial Crimes, and Business & Real Estate Evaluation. You will be part of a team responsible for maintaining availability, performance, and reliability across ~85 applications that support KYC, AML, and other critical financial-crimes-related workloads.
This role blends software engineering, systems operations, and cloud-native reliability practices to drive automation, enhance resilience, and support modernization across a large enterprise ecosystem. You will also help evolve AIOps capabilities, including predictive alerting, self-healing workflows, and AI/ML-driven incident analysis.
Occasional weekend work or overtime may be required for critical system support.
What You'll Do
Site Reliability & Operations
- Lead SRE practices that enhance system availability, performance, and scalability across multi-cloud environments.
- Support and improve critical applications and customer journeys; lead incident response and blameless postmortems.
- Conduct root-cause analysis and drive long-term remediation of recurring issues.
- Define and enforce operational readiness and Non-Functional Requirements (NFRs) during platform modernization.
Automation & Tooling
- Design and implement automation to eliminate operational toil and improve service reliability.
- Build frameworks for automated SLO/SLI tracking, availability metrics, error budgeting, and customer impact analysis.
- Implement self-healing and autonomic systems using AI/ML, RPA, and intelligent monitoring.
Monitoring, Observability & AIOps
- Develop and enhance monitoring, alerting, and observability capabilities.
- Drive adoption of AIOps platforms to support anomaly detection, predictive alerting, and automated incident resolution.
Collaboration & Leadership
- Collaborate with platform teams, product owners, and technology partners across the COO Technology organization.
- Mentor peers and champion SRE best practices across engineering teams.
- Identify process gaps across domains and recommend scalable, long-term improvements.
Required Qualifications
- 5+ years in Systems Engineering, Site Reliability Engineering, Technology Architecture, or related fields (or equivalent military/training/education experience).
- 2+ years working as part of an SRE team.
- Strong written and verbal communication skills.
Technical Skills
Software Development
- Proficiency in Python and/or Java/J2EE.
- Experience with REST APIs, microservices, Kafka/MQ, and modern integration patterns.
- Familiarity with JavaScript frameworks (React, Bootstrap).
- Strong SQL skills and database schema design experience.
Infrastructure & Cloud
- Expertise with Linux and container orchestration (Kubernetes, OpenShift/OCP strongly preferred).
- Experience with PCF, AWS, Google Cloud Platform, or Azure environments.
CI/CD & Automation
- Tools: Jenkins, GitLab, SonarQube, Artifactory, Ansible.
Observability & AIOps
- Tools: Grafana, Prometheus, Splunk/ELK, AppDynamics, Elastic, ThousandEyes, Aternity, Google Cloud Logging.
- AIOps Platforms: Moogsoft, AI/ML-based analytics frameworks.
Operations & Data
- ITSM Tools: ServiceNow, Remedy, IBM Netcool.
- Databases: Oracle, DB2, SQL Server, MongoDB, Hadoop/Cloudera, Spark, Teradata.
Foundational AI Knowledge
- Understanding of common AI/ML concepts (classification, regression, clustering, anomaly detection).
- Ability to work with structured/unstructured data for model evaluation.
- Awareness of ethical/operational considerations in AI systems.
- Experience integrating AI into automation workflows is a plus.
Preferred Qualifications
- Experience with AutoSys.
- Prior experience in corporate banking or financial services.
- Strong interest in AI-driven operations and AIOps.
Contact: This job and many more are available through The Judge Group. Please apply with us today!