Cloud Platform Engineer
Location: Charlotte, NC
Type of Hire: C2C (Corp-to-Corp)
No. of Positions: 2
Key Skills:
Must-Have Skills (Mandatory):
- Google Cloud Platform, Azure (multi-cloud preferred)
- Terraform (strong hands-on IaC)
- Cloud Networking & Hybrid Connectivity (VPN, VPC/VNet peering, private endpoints)
- Landing Zones & Cloud Governance (Org Policies, guardrails)
- Kubernetes (GKE), OpenShift (OCP)
- Platform Engineering / Internal Developer Platforms
- Observability (monitoring, logging, tracing)
- SRE concepts (SLOs, SLIs, reliability engineering)
- Python (automation)
- HashiCorp Vault (secrets management)
GenAI / Advanced Skills (Strongly Preferred):
- GenAI Platforms / LLMs
- RAG (Retrieval-Augmented Generation)
- MLOps / LLMOps pipelines
Key Responsibilities (Keywords for Search):
- Build enterprise cloud platforms (Google Cloud Platform + Azure)
- Implement Terraform-based reusable modules
- Design landing zones & governance frameworks
- Enable hybrid/multi-cloud connectivity
- Manage Kubernetes platforms (GKE/OCP)
- Build Internal Developer Portals (self-service infra)
- Define SLOs, reliability patterns, observability
- Support GenAI/LLM workloads and platform enablement
Google Cloud Platform, Azure, Terraform, Cloud Networking, Landing Zones, Org Policy / Governance, HashiCorp Vault, Hybrid Connectivity, Kubernetes, GKE, OpenShift (OCP), Platform Engineering, Observability, SRE / SLOs, Python, Internal Developer Portals, GenAI Platforms, LLMs, RAG, MLOps/LLMOps
Responsibilities:
Design, build, and operate secure, scalable Google Cloud Platform (GKE) and OpenShift (OCP) platforms to support deployment of GenAI models, LLMs, and RAG workloads.
Provision and manage cloud infrastructure using Terraform, including landing zones, networking, org policies, and hybrid connectivity across Google Cloud Platform and Azure.
Enable MLOps/LLMOps pipelines for model deployment, monitoring, and lifecycle management, integrating Arize AI and GenAI platforms.
Implement platform engineering best practices, including Kubernetes-based abstractions, internal developer portals, and self-service environments.
Ensure platform security, governance, and secrets management using HashiCorp Vault, IAM, and policy-as-code.
Establish observability, SLOs, and SRE practices to ensure reliability and performance of GenAI and platform services.
Collaborate with data scientists, ML engineers, and application teams to onboard new LLMs, APIs, and inference services efficiently.