Job Details
You'll be part of a high-impact team pushing the boundaries of cloud-native AI in a real-world enterprise setting. This is not a prompt-engineering sandbox or a resume keyword trap. If you've merely dabbled in SageMaker, mentioned RAG on LinkedIn, or read about vector search, this isn't the right fit. We're looking for candidates who have architected, developed, and supported AI/ML services in production environments.
This is a builder's role within our Public Cloud AWS Engineering team. We aren't hiring buzzword lists or conference attendees. If you've built something you're proud of, especially if it involved real infrastructure, real data, and real users, we'd love to talk. If you're still learning, that's great too, but this isn't an entry-level role or a theory-only position.
Duties & Responsibilities:
- Develop and maintain modular AI services on AWS using Lambda, SageMaker, Bedrock, S3, and related components built for scale, governance, and cost-efficiency.
- Contribute to the end-to-end development of RAG pipelines that connect internal datasets (e.g., logs, S3 docs, structured records) to inference endpoints using vector embeddings; an illustrative sketch follows this list.
- Build and fine-tune LLM-based applications, including Retrieval-Augmented Generation (RAG) systems, using LangChain and other frameworks.
- Tune retrieval performance using semantic search techniques, careful metadata handling, and prompt patterns that inject retrieved context into model inputs.
- Work within the software release lifecycle, including CI/CD pipelines, GitHub-based SDLC, and infrastructure as code (Terraform).
- Support the development and evolution of reusable platform components for AI/ML operations.
- Create and maintain technical documentation for the team to reference and share with our internal customers.
- Communicate clearly in spoken and written English with the team and our internal customers.
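To make the RAG responsibilities above concrete, here is a minimal retrieval sketch in Python against Bedrock. It is purely illustrative: the model ID (amazon.titan-embed-text-v2:0), the sample documents, and the embed/retrieve helpers are assumptions made for this sketch, not a description of our production pipeline.

    import json
    import boto3
    import numpy as np

    # Assumes AWS credentials and a region with Bedrock access are configured.
    bedrock = boto3.client("bedrock-runtime")

    def embed(text):
        # Request a vector embedding for the text from a Titan embedding model.
        resp = bedrock.invoke_model(
            modelId="amazon.titan-embed-text-v2:0",
            body=json.dumps({"inputText": text}),
        )
        return np.array(json.loads(resp["body"].read())["embedding"])

    # Stand-ins for internal datasets (logs, S3 docs, structured records).
    docs = [
        "Runbook: rotate IAM access keys every 90 days.",
        "S3 lifecycle policy moves logs to Glacier after 30 days.",
    ]
    doc_vecs = [embed(d) for d in docs]

    def retrieve(query, k=1):
        # Rank documents by cosine similarity to the query embedding.
        q = embed(query)
        scores = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v)) for v in doc_vecs]
        return [docs[i] for i in np.argsort(scores)[::-1][:k]]

    # Inject the retrieved context into the prompt sent to the generation model.
    question = "How often do we rotate IAM keys?"
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

A production pipeline swaps the in-memory list for a real vector store and adds chunking and metadata filters; the point here is only the embed, rank, and inject flow.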
Required Knowledge, Skills and Abilities:
- 7-10 years of proven software engineering experience with a strong focus on Python and Go.
- Strong background in document tokenization, embeddings, and common text-representation and language-model techniques (e.g., Word2Vec, FastText, TF-IDF, BERT, GPT, ELMo, LDA, Transformer architectures), along with experience building NLP pipelines (a short illustrative sketch follows this list).
- Direct, hands-on development of RAG, semantic search, or LLM-augmented applications using frameworks and ML tooling such as Transformers, PyTorch, TensorFlow, and LangChain, not just experimentation in a notebook.
- Deep expertise with AWS services, especially Bedrock, SageMaker, ECS, and Lambda.
- Proven experience fine-tuning large language models, building datasets, and deploying ML models to production.
- Demonstrated experience with AWS Organizations and policy guardrails (SCPs, AWS Config).
- Demonstrated experience with Infrastructure as Code best practices, including building Terraform modules for AWS.
- Strong background in Git-based version control, code reviews, and DevOps workflows.
- Demonstrated success delivering production-ready software with release pipeline integration.
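As a toy illustration of the text-representation background called for above, here is a TF-IDF example; the choice of scikit-learn and the invented corpus are assumptions, picked only for brevity:

    from sklearn.feature_extraction.text import TfidfVectorizer

    # Invented corpus standing in for real log lines or documents.
    corpus = [
        "lambda handler timed out after 30 seconds",
        "lambda cold start latency increased",
        "s3 access denied for log bucket",
    ]

    # Tokenizes the corpus and weights each term by TF-IDF.
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(corpus)

    print(vectorizer.get_feature_names_out())  # learned vocabulary
    print(matrix.shape)                        # (n_documents, n_terms)

Neural approaches (Word2Vec, BERT, and so on) replace the weighting step with learned vectors, but the tokenize-then-represent pipeline shape is the same.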
Must Have Skills:
- AWS services: Bedrock, SageMaker, ECS, and Lambda
- Demonstrated proficiency in Python and Go (Golang)
- Large language models (LLMs)
- Natural Language Processing (NLP)
- Retrieval-Augmented Generation (RAG)
Nice to Have Skills:
- AWS or relevant cloud certifications.
- Policy as Code development (e.g., Terraform Sentinel).
- Experience optimizing cost-performance in AI systems (FinOps mindset).
- Data science background or experience working with structured/unstructured data.
- Awareness of data privacy and compliance best practices (e.g., PII handling, secure model deployment).
- Experience with Node.js.