Job Description: Data Scientist - Data Annotation AI Specialist
Location: Canada (Remote)
Duration: 3-month contract
The client's AI group is seeking a Data Annotation AI Specialist to join a team dedicated to building and supporting generative AI, machine learning, deep learning, and data science solutions across the organization. The position could be based out of our Chicago or NY offices. This role will lead the evaluation, selection, and onboarding of a data annotation platform and establish best-in-class annotation workflows for our NLP and CV initiatives, bridging product, data science, MLOps, and compliance to ensure high-quality labeled datasets that accelerate model development for tasks such as text classification, entity extraction, unstructured data extraction, document summarization, and prompt/response curation.
What We Offer:
This will be a high-impact role with significant visibility, where the candidate will work on some flagship Fitch products.
The candidate will have an excellent opportunity to work in the cutting-edge fields of AI, NLP, computer vision, and MLOps/LLMOps.
Fitch promotes an excellent work culture and is known for providing a good work-life balance.
We'll Count on You To:
Platform Evaluation and Onboarding:
o Assess and compare data annotation platforms (e.g., Labelbox, Prodigy, Snorkel, Scale AI, SuperAnnotate, LightTag, custom open-source stacks) against business and technical requirements.
o Lead proof-of-concept trials; define evaluation criteria (quality, throughput, cost, security, privacy, compliance, UI/UX, workflow features, integrations, auditability).
o Drive vendor due diligence, security reviews, and coordinate procurement/contracting with Legal, Security, and Procurement.
o Plan and execute platform deployment, integrations (SSO, data lakes, MLOps pipelines), and role-based access controls.
Workflow and Taxonomy Design:
o Collaborate with NLP and CV scientists and product owners to define labeling taxonomies, guidelines, and rubrics for tasks such as NER, data extraction, intent classification, topic modeling, toxicity/BI risk tagging, and document QA.
o Establish annotation protocols, inter-annotator agreement (IAA) measures, and quality gates; design multi-pass review processes and adjudication steps.
o Develop gold standards and calibration sets; maintain versioning and change management of label schemas.
Quality Management:
o Implement QA metrics and dashboards (precision/recall on labeled subsets, IAA, disagreement analysis, drift detection, sampling strategies).
o Design active learning and human-in-the-loop strategies to continually improve data quality and labeling efficiency.
o Conduct audits, bias checks, and error analyses; enforce data governance and documentation (data sheets, model cards inputs).
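For context on the IAA measurement mentioned above, here is a minimal sketch of Cohen's kappa for two annotators (function name and sample labels are illustrative, not from the posting):

```python
def cohen_kappa(a: list, b: list) -> float:
    """Cohen's kappa for two annotators' labels on the same items:
    chance-corrected agreement, (p_o - p_e) / (1 - p_e)."""
    assert len(a) == len(b) and a, "both annotators must label the same items"
    n = len(a)
    # Observed agreement: fraction of items where the annotators match.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from each annotator's label distribution.
    labels = set(a) | set(b)
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if p_e == 1.0:
        return 1.0  # degenerate case: both annotators use a single label
    return (p_o - p_e) / (1 - p_e)

ann_1 = ["pos", "pos", "neg", "neg", "pos"]
ann_2 = ["pos", "neg", "neg", "neg", "pos"]
print(round(cohen_kappa(ann_1, ann_2), 3))  # ≈ 0.615
```

In practice a platform or library (e.g., scikit-learn's `cohen_kappa_score`) would compute this; the sketch only shows what the metric measures.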
Operations and Scale:
o Build and manage a hybrid workforce model (in-house annotators, expert reviewers, external vendors) including training, SLAs, throughput planning, and budget tracking.
o Create training materials and onboarding programs for annotators, SMEs, and reviewers; run calibration sessions and periodic reviews.
o Optimize throughput and cost with workflow automation, pre-labeling, heuristics, and annotation tooling features.
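The pre-labeling and heuristics mentioned above can be sketched as a simple rule-based pre-labeler that proposes spans for annotators to confirm or correct (patterns and label names here are illustrative assumptions):

```python
import re

def pre_label(text: str, patterns: dict) -> list:
    """Propose candidate spans via regex heuristics so annotators
    review suggestions instead of labeling from scratch."""
    spans = []
    for label, pattern in patterns.items():
        for m in re.finditer(pattern, text):
            spans.append({"start": m.start(), "end": m.end(), "label": label})
    # Sort by position so suggestions appear in reading order.
    return sorted(spans, key=lambda s: s["start"])

# Hypothetical heuristics for a document-extraction task.
patterns = {
    "DATE": r"\b\d{4}-\d{2}-\d{2}\b",
    "MONEY": r"\$\d+(?:\.\d{2})?",
}
text = "Invoice dated 2024-05-01 for $1200.50 is due."
print(pre_label(text, patterns))
```

Suggestions like these are typically loaded into the annotation tool as pre-annotations, cutting per-item labeling time while keeping a human in the loop.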
Integration and MLOps:
o Integrate the annotation platform with data pipelines, model training loops, experiment tracking, and storage (e.g., Databricks, Snowflake, AWS/Google Cloud Platform/Azure, MLflow).
o Implement programmatic interfaces (APIs/SDKs) for data ingestion/export, schema management, and reproducibility.
o Collaborate on dataset curation, splitting strategies, and governance (PII handling, encryption, retention policies).
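The programmatic export step above can be sketched as a converter from a platform export to training-ready examples. The export schema here (JSON lines with `text` and `annotations` fields) is a hypothetical example, not any specific vendor's format:

```python
import json

def to_training_examples(export_jsonl: str) -> list:
    """Convert a hypothetical annotation export (JSON lines of
    {"text": ..., "annotations": [{"start", "end", "label"}]})
    into spaCy-style (text, {"entities": [...]}) training tuples."""
    examples = []
    for line in export_jsonl.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        entities = [(a["start"], a["end"], a["label"])
                    for a in record["annotations"]]
        examples.append((record["text"], {"entities": entities}))
    return examples

export = '{"text": "Acme Corp filed on 2024-05-01.", "annotations": [{"start": 0, "end": 9, "label": "ORG"}]}'
print(to_training_examples(export))
```

A real pipeline would pull this export via the platform's API/SDK and version the result alongside the label schema for reproducibility.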
What You Need to Have:
7+ years of experience in data annotation, data operations, or applied NLP/CV/ML, with direct responsibility for building and managing labeling programs.
Hands-on experience with annotation platforms and workflows for NLP tasks; familiarity with enterprise deployment considerations (SSO, RBAC, audit, SOC2).
Strong understanding of NLP and CV techniques: tokenization, embeddings, NER, text classification, sentiment, summarization, prompt engineering, and evaluation.
Proficiency in Python and data tooling (Pandas, spaCy, Hugging Face, NLTK); experience using APIs/SDKs to automate annotation and active learning loops.
Experience defining label taxonomies, guidelines, and measuring IAA; practical knowledge of QA methodologies and error/bias analysis.
Familiarity with cloud platforms (AWS/Google Cloud Platform/Azure), data governance, and secure data handling.
Excellent communication skills; ability to collaborate with data scientists, product managers, engineers, SMEs, and vendors.
What Would Make You Stand Out:
Experience with large language model (LLM) data curation, RLHF/RLAIF pipelines, and prompt/response quality evaluation.
Background in financial services, risk analytics, or regulated industries with strong compliance requirements.
Prior experience building hybrid annotation teams and managing external vendors.
Knowledge of annotation for multilingual NLP and document-heavy workflows (PDF parsing, OCR).