Senior Site Reliability Engineer (OpenShift + Agentic AI)
Charlotte, NC/Dallas, TX/Phoenix, AZ
Skills: OpenShift Container Platform; Python development and small language model development
Sr. SRE expertise in one of the platforms (OCP, Azure, or Google Cloud Platform)
Adding skills and tools for agentic/MCP development with Python
Please focus on these two qualifications and submit more candidates.
Job Description
Skills:
OpenShift Container Platform; Python
OpenShift Systems Operations Engineer and Agentic AI Developer Focus
Advanced OpenShift Engineering & Operations (4 to 6 years)
o Deep expertise with OpenShift, including building clusters with pipelines, diagnosing, debugging, remediation, upgrades, patching, and root cause analysis (RCA).
o Strong experience with Operators, Diagnostics hub operators, etcd, the API server, OVN-Kubernetes networking, workloads, and cluster internals.
Python development and OCP API skills for Remediation & Automation tasks (> 5 years)
o Strong, production-quality Python engineering skills.
o Hands-on experience with the OpenShift Python client and Operator / Kubernetes Python libraries.
o Ability to build automated remediation workflows and operational tools in Python.
o High code quality – Unit Tests, Vulnerability Management
o Packaging and Deploying Experience – Helm Charts, CI/CD, GitSaaS workflows
o System Design Experience
o Technical Documentation
Agentic AI Development for OCP (> 3 years)
o Experience developing agentic AI workflows aimed at OpenShift operations and remediation.
o Strong skills with frameworks such as LangChain, Google ADK/Agents, or similar multi-agent orchestration tools.
o Ability to develop skills and tools that let agentic AI complete tasks on the OpenShift platform.
o Ability to build tool calling agents that interact with OpenShift safely and deterministically.
o Experience building agentic AI systems without using managed ML/AI platforms such as AWS SageMaker, Bedrock, Vertex AI, or similar hosted orchestration services.
** Candidate should have developed and deployed self-hosted, Python-based AI/agentic services end to end.
API Automation Platform (FastAPI / Flask)
o Ability to build, extend, and maintain a Python-based automation platform (FastAPI or Flask) that executes OpenShift actions.
o Experience designing agent tools/endpoints for AI-driven workflows.
Job Responsibilities
You may:
o Consult as an expert to develop or influence initiatives and resources for highly complex business and technical needs across Engineering.
o Consult on the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas, delivering solutions that are long-term and large-scale and that require vision, creativity, innovation, and advanced analytical and inductive thinking.
o Provide expertise to client senior leadership on innovative Engineering business solutions.
o Strategically engage with client personnel.