Senior Databricks AI Platform SRE

Overview

On Site
Depends on Experience
Accepts corp to corp applications
Contract - W2
Contract - 12 Month(s)

Skills

Access Control
Agile
Amazon Web Services
Artificial Intelligence
Auditing
Build Automation
Cloud Computing
Collaboration
Command-line Interface
Communication
Computer Networking
Continuous Delivery
Continuous Integration
Data Governance
Data Science
Databricks
Debugging
DevOps
Distributed Computing
Encryption
Experience Design
Finance
GitHub
Good Clinical Practice
Google Cloud Platform
Grafana
Health Care
Identity Management
Java
Kanban
Linux
Machine Learning (ML)
Machine Learning Operations (ML Ops)
Management
Mentorship
Microsoft Azure
Network
Onboarding
Orchestration
Provisioning
Python
Quality Assurance
RBAC
Regulatory Compliance
Scrum
Storage
Terraform
Unity
Virtual Private Cloud
Workflow
Databricks AI

Job Details

Position: Senior Databricks AI Platform SRE

Location: Atlanta, GA

Duration: Long Term

Rate: Open (hourly)

Job Description

We are looking for a Senior Databricks AI Platform SRE to join our Platform SRE team. This role will be critical in designing, building, and optimizing a scalable, secure, and developer-friendly Databricks platform to enable Machine Learning (ML) and Artificial Intelligence (AI) workloads at enterprise scale.

You will partner with ML engineers, data scientists, platform teams, and cloud architects to automate infrastructure, enforce best practices, and streamline the end-to-end ML lifecycle using modern cloud-native technologies.

Total experience: 5+ years. Bachelor's or master's degree in Computer Science, Engineering, or a related field.

Responsibilities:

Design and implement secure, scalable, and automated Databricks environments to support AI/ML workloads.

Develop infrastructure-as-code (IaC) solutions using Terraform for provisioning Databricks, cloud resources, and network configurations.

Build automation and self-service capabilities using Python, Java and APIs for platform onboarding, workspace provisioning, orchestration and monitoring.

Collaborate with data science and ML teams to define compute requirements, governance policies, and efficient workflows across dev/QA/prod environments.

Integrate Databricks offerings with cloud-native services on Azure/AWS.

Champion CI/CD and GitOps for managing ML infrastructure and configurations.

Ensure compliance with enterprise security and data governance policies using RBAC, audit controls, encryption, network isolation, and policy enforcement.

Monitor platform performance, reliability, and usage, and drive improvements to optimize cost and resource utilization.

Required Skills:

Proven experience with Terraform for building and managing infrastructure.

Strong programming skills in Python and Java.

Hands-on experience with cloud networking, identity and access management, key vaults, monitoring, and logging in Azure.

Hands-on experience with Databricks (workspace management, clusters, jobs, MLflow, Delta Lake, Unity Catalog, Mosaic AI).

Deep understanding of Azure or AWS infrastructure (e.g., IAM, VNets/VPCs, storage, networking, compute, key management, monitoring).

Strong experience in distributed system design, development, and deployment using Agile/DevOps practices.

Experience with CI/CD pipelines (GitHub Actions or similar).

Experience implementing monitoring and observability using Prometheus, Grafana, or Databricks-native solutions.

Good communication skills, strong teamwork experience, and the ability to mentor and develop more junior developers, including participating in constructive code reviews.

Preferred Skills:

Experience in multi-cloud environments (AWS/Google Cloud Platform) is a bonus.

Experience working in highly regulated environments (finance, healthcare, etc.) is desirable.

Experience with Databricks REST APIs and SDKs

Knowledge of MLflow, Mosaic AI, and MLOps tooling

Experience working with teams using Scrum, Kanban, or other agile practices

Proficiency with standard Linux command line and debugging tools

Azure or AWS Certifications

Please send your resume in Word format, along with the following details, or call me for more information:

Name in Full:

Contact Details:

Email ID:

Current Location:

Relocation:

Availability:

Expected Billing Rate:

Work Authorization:

LinkedIn Profile:

DOB (Month and Day):

Zip Code:

Skype ID:

Employer Details if Any:

Regards,
