Job Details
Skills Required:
Scripting, Automation, Root Cause Analysis, Troubleshooting (Problem Solving), Cloud Architecture, IT Solutions, GitHub, Cloud Infrastructure, Change Management, Technical Analysis, Developer, Tekton, Utilization Management, Kubernetes
Additional Information:
- Conduct capacity planning and forecasting for the OpenShift Virtualization (OSV) platform, including compute, memory, storage, and network resources, to ensure scalability and prevent resource exhaustion.
- Analyze resource utilization trends and make recommendations for infrastructure scaling, consolidation, or optimization.
- Collaborate with application teams and stakeholders to understand future demand and project capacity needs.
- Develop and maintain capacity models and reports to support strategic planning.
- Develop automation solutions (scripts, playbooks) for repetitive OSV tasks, including configuration changes, VM management, auditing, remediation, and integration with ticketing systems.
- Leverage automation to deliver operator updates and changes efficiently at scale.
- Implement Site Reliability Engineering (SRE) principles and practices to improve overall platform stability, performance, and operational efficiency.
- Deploy and audit Role-Based Access Control (RBAC).
- Manage namespaces and resource quotas.
- Implement and maintain comprehensive end-to-end observability solutions (monitoring, logging, tracing) for the OSV environment, including integration with tools like Dynatrace and Prometheus/Grafana.
- Explore and implement Event-Driven Architecture (EDA) for enhanced real-time monitoring and response.
- Develop capabilities to flag and report abnormalities and identify "blind spots" in observability.
- Perform deep-dive Root Cause Analysis (RCA), potentially utilizing available tooling, to quickly identify and resolve issues across the global compute environment.
- Find the "needle in a haystack": locate unhealthy components anywhere in the global compute environment for faster time to resolution.
- Monitor VM health, resource usage, and performance metrics proactively.
- Monitor for unusual activity that might indicate a compromise or misconfiguration.
- Solution Design & Consulting.
- Knowledge Management.
V2Soft is an Equal Opportunity Employer (EOE). We welcome applicants from all backgrounds, including individuals with disabilities and veterans.