Summary:
The Terraform AWS Engineer is responsible for designing, implementing, and maintaining infrastructure-as-code (IaC) solutions to support scalable machine learning inference workloads across multiple AWS accounts. This role focuses on automating the provisioning of ML pipelines, endpoints, monitoring systems, and security configurations using Terraform.
Key Responsibilities:
* Develop and maintain Terraform modules for provisioning ML inference pipelines (real-time, batch, serverless, asynchronous)
* Automate deployment of SageMaker endpoints, Lambda functions, Step Functions, and associated monitoring/logging infrastructure
* Implement cross-account infrastructure for model registry, approval workflows, and endpoint deployment
* Provision and manage AWS resources, including S3 buckets, CloudTrail trails, EventBridge rules, CloudWatch log groups, and API Gateway APIs
* Ensure infrastructure is secure, scalable, and compliant with organizational standards
* Collaborate with ML engineers, security architects, and cloud architects to align infrastructure with model lifecycle workflows
* Support backup and restore processes for model artifacts and logs
* Maintain documentation and version control of Terraform configurations
Required Skills:
* Strong experience with Terraform and AWS services (SageMaker, Lambda, Step Functions, CloudWatch, S3, IAM, API Gateway)
* Proficiency in infrastructure-as-code principles and CI/CD pipelines
* Familiarity with ML workflows and model deployment processes
* Experience with multi-account AWS environments and cross-account resource sharing
* Knowledge of security best practices in cloud environments
Preferred Qualifications:
* Experience in MLOps or ML platform engineering
* Familiarity with monitoring tools and alerting systems
* Understanding of model governance and compliance frameworks