Overview
Remote
Depends on Experience
Accepts corp to corp applications
Contract - Independent
Contract - W2
Contract - 12 Month(s)
Skills
IBM
IBM SmartCloud
SaaS
Incident Management
Management
Network
Kubernetes
Performance Tuning
Job Details
Job Title: IBM Cloud SME (Infrastructure Management, 24/7 Support)
Location: Remote
Duration: 3+ Months
Environment: Production & Test (IBM Cloud)
Job Overview
Seeking a hands-on IBM Cloud SME to fully manage and support two cloud environments (prod and test) for a critical SaaS project. The role requires 24/7 active monitoring and ownership of IBM Cloud infrastructure, ensuring updates, patches, security, performance, and incident response are delivered proactively. Application management is NOT required; you'll collaborate with the application team for infra-related queries.
Responsibilities
- Own infrastructure operations for 2 IBM Cloud environments (production & test) running a Watsonx.gov SaaS implementation.
- Perform all cloud maintenance tasks: patching, upgrades, monitoring, backups, DR exercises, and performance tuning.
- Troubleshoot and resolve networking, accessibility, performance, or service issues as they arise.
- Set up and maintain cloud network configurations: VPN, firewalls, access policies, and security controls.
- Monitor environments 24/7 (shift-based or on-call coverage) and ensure prompt issue resolution.
- Work with the application team and stakeholders to facilitate integration and answer infra-related questions.
- Maintain clear documentation, incident logs, and proactive status reporting.
- Carry out scheduled maintenance, updates, and DR drills in coordination with stakeholders.
- Ensure environments remain compliant with security and operational standards.
Requirements
- Bachelor's/Master's degree in IT, Computer Science, or equivalent experience.
- 5+ years of hands-on experience with IBM Cloud infrastructure; Watsonx, Kubernetes, SaaS experience highly preferred.
- Proven ability to manage enterprise cloud environments (production/test), including monitoring, upgrades, patching, networking, and troubleshooting.
- Familiarity with cloud-native monitoring and automation tools (IBM, ELK, Prometheus, CloudWatch, etc.).
- Strong networking skills (firewalls, VPNs, subnetting, access control).
- Excellent communication and stakeholder management skills.
- Availability for 24/7 support and incident response.