TASKS & DUTIES:
· Monitor database and system performance using CloudWatch metrics, alarms, and logs; troubleshoot issues proactively (see the CloudWatch alarm sketch below).
· Develop, deploy, and optimize AI/ML solutions using AWS AI services including SageMaker and Bedrock, supporting model training, inference, and integration into production systems.
· Automate operational tasks using AWS Lambda, Systems Manager (SSM), and Infrastructure-as-Code tools such as CloudFormation or Terraform (see the Lambda/SSM sketch below).
· Design, build, and maintain scalable, fault-tolerant data processing and analytics workflows on AWS using services such as API Gateway, S3, EC2, RDS, Lambda, Glue, Athena, DynamoDB, EMR, Kinesis, and DataSync.
· Design and integrate agentic AI systems, including LLM-based agents, multi-agent workflows, and autonomous orchestration pipelines using frameworks such as LangChain and LangGraph (see the agent workflow sketch below).
· Implement ETL/ELT pipelines and data architectures that support machine learning, analytics, and intelligent agent-based applications (see the PySpark ETL sketch below).
· Support CI/CD pipelines for AI models and data workflows using Jenkins and container platforms such as ECS, EKS, or Kubernetes (see the ECS deployment sketch below).
· Apply security best practices across AI and data platforms, including IAM least-privilege access, encryption, audit logging, and compliance controls (see the IAM policy sketch below).
· Maintain technical documentation for AI architectures, data pipelines, infrastructure configurations, and operational runbooks.
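As a concrete illustration of the monitoring duty above, here is a minimal sketch, assuming boto3 and using placeholder names for the RDS instance, alarm, and SNS topic, of creating a CloudWatch alarm on database CPU utilization:

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # Alert when average CPU on a database instance stays above 80% for
    # three consecutive 5-minute periods. All names and ARNs are placeholders.
    cloudwatch.put_metric_alarm(
        AlarmName="rds-high-cpu",
        Namespace="AWS/RDS",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "prod-db"}],
        Statistic="Average",
        Period=300,
        EvaluationPeriods=3,
        Threshold=80.0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
    )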
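For the automation duty, a minimal sketch of a Lambda handler that uses Systems Manager Run Command to restart a service on tagged EC2 instances; the tag and shell command are hypothetical examples:

    import boto3

    ssm = boto3.client("ssm")

    def handler(event, context):
        # Target instances by tag rather than hard-coded instance IDs.
        response = ssm.send_command(
            Targets=[{"Key": "tag:Role", "Values": ["app-server"]}],
            DocumentName="AWS-RunShellScript",
            Parameters={"commands": ["systemctl restart myapp"]},
        )
        return response["Command"]["CommandId"]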
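For the data workflow and ETL/ELT duties, a minimal PySpark sketch that reads raw JSON from S3, filters and reshapes it, and writes partitioned Parquet back out; bucket paths and column names are placeholders:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("example-etl").getOrCreate()

    # Extract: raw events landed in S3 by an upstream producer.
    raw = spark.read.json("s3://example-raw-bucket/events/")

    # Transform: drop malformed rows and derive a partition key.
    cleaned = (
        raw.filter(F.col("event_type").isNotNull())
           .withColumn("event_date", F.to_date("event_timestamp"))
    )

    # Load: partitioned Parquet for Athena/Glue consumers.
    cleaned.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3://example-curated-bucket/events/"
    )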
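For the agentic AI duty, a framework-agnostic sketch of the handoff pattern that frameworks such as LangChain and LangGraph formalize: one agent gathers material and a second drafts from it. call_llm is a hypothetical stand-in for a real model invocation (e.g., Bedrock), so the sketch runs without credentials:

    def call_llm(prompt: str) -> str:
        # Stand-in for a real model call; returns a canned reply.
        return f"[model reply to: {prompt[:40]}...]"

    def researcher(task: str) -> str:
        # Agent 1: gather supporting facts for the task.
        return call_llm(f"Gather key facts for this task:\n{task}")

    def writer(task: str, notes: str) -> str:
        # Agent 2: synthesize a response from the researcher's notes.
        return call_llm(f"Using these notes:\n{notes}\nDraft a response to:\n{task}")

    def run_workflow(task: str) -> str:
        # Sequential handoff; real orchestrators layer branching, retries,
        # and state management on top of this pattern.
        notes = researcher(task)
        return writer(task, notes)

    print(run_workflow("Summarize last week's pipeline failures"))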
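For the CI/CD duty, a minimal sketch of one step a Jenkins pipeline might run after pushing a new model-serving image: forcing an ECS service to roll out the latest task definition. Cluster and service names are placeholders:

    import boto3

    ecs = boto3.client("ecs")

    # Redeploy the service so running tasks pick up the newest image.
    ecs.update_service(
        cluster="ml-serving",
        service="model-api",
        forceNewDeployment=True,
    )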
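For the security duty, a minimal sketch of least-privilege IAM: a policy granting read-only access to a single S3 prefix, created with boto3. The bucket, prefix, and policy name are placeholders:

    import json

    import boto3

    iam = boto3.client("iam")

    # Scope access to one action on one prefix, not the whole account.
    policy_document = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-data/curated/*",
        }],
    }

    iam.create_policy(
        PolicyName="read-curated-data",
        PolicyDocument=json.dumps(policy_document),
    )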
REQUIRED SKILLS:
· Minimum 7 years of hands-on AWS experience: EC2, RDS, S3, CloudWatch, CloudTrail, IAM, KMS, AWS Backup, and Lambda.
· Minimum 7 years of experience in Linux/Unix administration and automation scripting (Bash/shell and Python).
· Minimum 7 years of experience with Infrastructure as Code (IaC) and automation tools, including CloudFormation, Terraform, and Ansible, for provisioning and maintaining infrastructure.
· Minimum 7 years of experience with AWS networking: VPC, subnets, NACLs, security groups, Route 53, and multi-AZ architectures.
· Minimum 5 years of experience with CI/CD pipelines, Jenkins, and IaC for deploying AI agents and ML models into production, monitoring autonomous workflows, and supporting MLOps using Kubernetes, ECS, or EKS.
· Minimum 4 years of experience architecting, building, and maintaining scalable data processing workflows using AWS managed services and Python (including PySpark); strong understanding of data architecture and ETL/ELT patterns.
· Minimum 4 years of experience working with AWS AI/ML services such as SageMaker, Bedrock, and vector databases (e.g., OpenSearch).
· Strong understanding of machine learning algorithms, NLP concepts, and deep learning frameworks such as TensorFlow, PyTorch, or Hugging Face.