Position Overview:
The AI/ML Engineer is a key technical contributor driving AI transformation initiatives. This role focuses on building and deploying intelligent, cloud-native applications, from GenAI-powered systems and retrieval-augmented assistants to data-driven automation workflows.
Working at the intersection of machine learning, cloud engineering, and educational innovation, the engineer translates complex needs into scalable, secure, and maintainable AWS-native AI systems that enhance teaching, learning, and operations across global online programs.
Key Responsibilities:
AI Application & Systems Development
- Own the design and end-to-end implementation of AI systems combining GenAI, narrow AI, and traditional ML models (e.g., regression, classification).
- Implement retrieval-augmented generation (RAG), multi-agent, and protocol-based AI systems (e.g., MCP).
- Integrate AI capabilities into production-grade applications using serverless and containerized architectures (AWS Lambda, Fargate, ECS).
- Fine-tune and optimize existing models for specific educational and administrative use cases, focusing on performance, latency, and reliability.
- Build and maintain data pipelines for model training, evaluation, and monitoring using AWS services such as Glue, S3, Step Functions, and Kinesis.
Cloud & Infrastructure Engineering
- Architect and manage scalable AI workloads on AWS, leveraging services such as SageMaker, Bedrock, API Gateway, EventBridge, and IAM-based security.
- Build microservices and APIs to integrate AI models into applications and backend systems.
- Develop automated CI/CD pipelines ensuring continuous delivery, observability, and monitoring of deployed workloads.
- Apply containerization best practices using Docker and manage workloads through AWS Fargate and ECS for scalable, serverless orchestration and reproducibility.
- Ensure compliance with Stanford policies and regulatory standards (FERPA, GDPR) for secure data handling and governance.
Collaboration, Culture & Continuous Improvement
- Collaborate closely with cross-functional teams to deliver integrated and impactful AI solutions.
- Use Git-based version control and code review best practices as part of a collaborative, agile workflow.
- Operate within an agile, iterative development culture, participating in sprints, retrospectives, and planning sessions.
- Continuously learn and adapt to emerging AI frameworks, AWS tools, and cloud technologies.
- Contribute to documentation, internal knowledge sharing, and mentoring as the team scales.
Required Qualifications:
Education & Certifications:
- Bachelor's degree in Computer Science, AI/ML, Data Engineering, or a related field (Master's preferred).
- AWS certification preferred (Solutions Architect, Developer, or equivalent); Professional-level certification a plus.
Experience:
- 3+ years of experience developing and deploying AI/ML-driven applications in production.
- 2+ years of hands-on experience with AWS-based architectures (serverless, microservices, CI/CD, IAM).
- Proven ability to design, automate, and maintain data pipelines for model inference, evaluation, and monitoring.
- Experience with both GenAI and traditional ML techniques in applied, production settings.
Technical Skills:
- Languages: Python (required); familiarity with Go, Rust, R, or TypeScript preferred.
- AI/ML Frameworks: PyTorch, TensorFlow, LangChain, LlamaIndex, or similar.
- Cloud & Infrastructure: AWS SageMaker, Bedrock, Lambda, ECS/Fargate, API Gateway, EventBridge, Glue, S3, Step Functions, IAM, CloudWatch.
- Infrastructure as Code: AWS CloudFormation.
- DevOps & Tools: Git, Docker, AWS Fargate, ECS, CI/CD (GitHub Actions, CodePipeline).
- Data Systems: SQL/NoSQL, vector databases, and AWS-native data services.
Desired Attributes:
- Strong understanding of data engineering fundamentals and production-quality AI system design.
- Passion for applying AI to improve educational outcomes and operational efficiency.
- Excellent problem-solving, debugging, and communication skills.
- Demonstrated ability to learn rapidly, adapt to new technologies, and continuously improve.
- Commitment to ethical AI, data privacy, and transparency.
- Collaborative mindset with proven success in agile, team-based environments.
- Thrives in a fast-paced, evolving environment, proactively seeking opportunities to upskill and enhance processes.
Success Metrics:
- Timely delivery of scalable, maintainable AI solutions.
- High system uptime, performance, and cost-efficiency of deployed workloads.
- Consistent adoption of best practices in CI/CD, monitoring, and version control.
- Positive stakeholder feedback and contribution to team documentation, learning, and innovation initiatives.
Working Conditions:
- Hybrid work model (2–3 days on campus).
- Collaborative, agile team culture with regular code reviews and paired development.