Lead Infrastructure Admin – AI Cloud Services

Remote • Posted 16 hours ago • Updated 16 hours ago
Contract Corp To Corp
Contract Independent
Contract W2
No Travel Required
Remote
$50 - $55/hr

Job Details

Skills

  • Cloud Computing
  • FOCUS
  • Machine Learning (ML)

Summary

Job Title: Lead Infrastructure Admin – AI Cloud Services (Azure & AWS)

Location: California, USA (Remote; candidates must be located in California)
Duration: 6+ Months Contract
Start Date: 05/15/2026

Job Summary

We are seeking an experienced Infrastructure Administrator / Cloud Engineer with strong expertise in supporting AI/ML cloud environments across both Azure and AWS platforms.

The ideal candidate should have hands-on experience managing scalable cloud infrastructure, AI model hosting environments, Kubernetes clusters, DevOps automation, and Infrastructure as Code for enterprise AI services.

This role will focus heavily on LLM infrastructure setup, GPU workloads, cloud security, CI/CD enablement, and AI platform administration.

Healthcare industry exposure is highly preferred.


Mandatory Experience

  • Minimum 5 years of experience in Infrastructure Administration / Cloud Infrastructure Engineering
  • Strong recent experience supporting AI services infrastructure on Azure & AWS
  • Experience working in centralized support / triaging teams
  • Experience supporting production-grade AI/ML platforms, model training, and inference workloads

Must Have Technical Skills

  • Microsoft Azure
  • AWS Cloud Computing
  • DevOps / CI-CD
  • Artificial Intelligence Infrastructure Support
  • Azure Platform Services
  • Cloud Services Administration
  • Kubernetes Clusters
  • Infrastructure as Code
  • LLM Infrastructure Setup

Key Responsibilities

1. Cloud Infrastructure Management

  • Design, deploy, and manage cloud infrastructure supporting AI/ML workloads on AWS and Azure.
  • Manage services such as:
    • EC2
    • Azure Virtual Machines
    • GPU Instances
    • EKS / AKS
    • ECS
    • VPC
    • S3
    • Lambda
    • Route 53
    • Kubernetes Clusters
  • Configure networking, storage, compute, and security services for AI environments.
  • Ensure high availability, reliability, scalability, and fault tolerance.

2. AI / ML Platform Support

  • Deploy and maintain enterprise AI/ML services including:
    • Amazon SageMaker
    • Azure Machine Learning
    • Azure AI Foundry
  • Build and maintain AI model training and inference environments.
  • Support Data Scientists, ML Engineers, and AI teams with optimized GPU/cloud infrastructure.
  • Assist with LLM deployment environments and GenAI service administration.

3. Automation / Infrastructure as Code

  • Implement Infrastructure as Code using:
    • Terraform
    • Terragrunt
    • CloudFormation
    • ARM Templates / Bicep
    • Dockerfiles
  • Automate provisioning, patching, configuration management, and environment scaling.

4. Containerization & Orchestration

  • Deploy and manage containerized AI workloads using:
    • Docker
    • Kubernetes
    • Amazon EKS
    • Azure Kubernetes Service (AKS)
    • Amazon ECS
  • Manage cluster administration, scaling, pod monitoring, and deployment troubleshooting.

5. Monitoring & Performance Optimization

  • Monitor AI infrastructure using:
    • CloudWatch
    • Azure Monitor
    • Datadog
    • Prometheus
  • Optimize:
    • GPU utilization
    • Cloud cost
    • AI workload performance
    • Resource consumption

6. Security & Compliance

  • Implement:
    • IAM / RBAC
    • Network Security Groups
    • Encryption
    • Secrets Management
  • Ensure enterprise compliance and secure cloud governance.

7. DevOps & CI/CD Integration

  • Integrate AI workloads with CI/CD pipelines.
  • Support automated deployment of ML models and AI services.
  • Work with:
    • GitHub Actions
    • Terraform pipelines
    • Container deployment automation

Required Qualifications

  • Bachelor’s Degree in Computer Science / Information Systems / Related Field
  • 5+ years in Cloud Infrastructure / Infrastructure Administration
  • Strong Linux Administration
  • Scripting experience in:
    • Python
    • Bash
    • PowerShell
  • Strong experience with Terraform / Terragrunt
  • Experience with Docker & Kubernetes
  • Experience with GitHub Actions
  • Hands-on experience with LLM/GenAI Infrastructure setup
  • Experience in incident triaging / centralized cloud operations

  • Dice Id: 90915884
  • Position Id: 8958832