Role: AI Engineer
Location: Remote
Preferred: W2 only.
About the Role:
We are hiring an AI Engineer to be the lead technical contributor on a personalization and
ranking engagement for a large-scale consumer marketplace. You will set the technical
direction, make the key modeling decisions, and stay hands-on throughout. You will be a senior
technical point of contact with the customer — explaining trade-offs, managing expectations,
and turning results into clear recommendations. You will lead a rigorous, POC-first program:
engineering user-level features from behavioral data, integrating LLM-generated user profiles
into a deep-learning ranking model, and driving the work from offline validation through
production-readiness.
What You’ll Do:
* Own the technical strategy for a personalization program on a production
recommendation/ranking system, making the architecture and modeling decisions and
being accountable for the results.
* Stay hands-on: build the features, train the models, run the experiments, and write the
critical code.
* Set the technical bar and support other engineers through design reviews, mentorship,
and pairing.
* Act as a senior technical point of contact with the customer, communicating progress,
risks, and results to both engineers and senior stakeholders, and managing expectations
through ambiguity.
* Design and run a structured, parallel-track proof-of-concept that measures the incremental
lift of GenAI-based profiles over well-engineered behavioral ML features.
* Engineer user-level features from large-scale behavioral data (category/product affinity,
time-of-day and price-sensitivity patterns, per-user click/conversion history, recency-frequency
signals).
* Integrate LLM-generated user profiles into ranking models, including embedding
generation, projection-layer tuning, gating, and ablation to ensure the signal is properly
weighted.
* Own the deep-learning ranking model (multi-task CTR/CVR architectures such as shared-bottom
MTL), including feature integration, hyperparameter optimization (Bayesian/grid
search), and bias correction (position/popularity).
* Define and run the offline evaluation framework — NDCG, MRR, Precision/Recall at K —
with segment-level analysis and ablation studies across user cohorts.
* Establish the path to production: model serving and scheduled inference integration,
shadow-mode testing, A/B framework readiness, and guardrail metrics.
* Deliver clear technical documentation and lead knowledge-transfer sessions so the
customer’s teams can operate and iterate independently after handoff.
Required Qualifications:
* 10+ years in applied machine learning / data science, with deep hands-on experience in
recommender systems, learning-to-rank, or large-scale personalization.
* Practical experience building with LLMs in production: generating and integrating model-derived
features or profiles, working with embeddings, and reasoning about evaluation,
latency, and cost.
* Experience with Amazon Bedrock or comparable managed LLM platforms for production
inference.
* Hands-on experience with segment- or cohort-based personalization, including measuring
performance at the segment level rather than relying on aggregate metrics.
* Experience designing cold-start strategies for users or items with limited history.
* Strong communication skills — able to explain modeling decisions, trade-offs, and results
clearly to engineers, data scientists, and senior business stakeholders, and to manage
expectations through ambiguity.
* Customer-facing or stakeholder-facing experience: building trust, navigating competing
priorities, and serving as a senior technical voice in high-stakes conversations.
* A track record of technical leadership through mentoring engineers, driving design
decisions, and setting standards.
* Strong track record of taking ML models from experimentation to production, owning the
offline-to-online validation story (ranking metrics, ablations, segment analysis, shadow
testing, A/B readiness).
* Deep, hands-on expertise in deep learning for ranking/recommendation — multi-task
learning, embedding-based architectures — with a major framework (TensorFlow or
PyTorch).
* Strong feature engineering on large behavioral datasets using the modern data stack
(PySpark, SQL, distributed data lakes).
* Rigorous experimental methodology — hyperparameter optimization, bias correction, and
a disciplined, hypothesis-driven approach to measuring true lift.
* Hands-on AWS experience across the ML lifecycle, and strong proficiency in Python.
Preferred Qualifications:
* Experience personalizing ranking for marketplaces or consumer platforms at scale (e-commerce,
food delivery, media, or similar).
* MLOps maturity: model versioning, monitoring, and reproducible training pipelines.
* Advanced degree in Computer Science, Machine Learning, Statistics, or a related
quantitative field.
* Prior experience in a client-facing consulting or professional-services delivery
environment.