Data Analytics Engineer (AI/ML Focus)
Responsibilities
• Data Pipeline & Infrastructure Development: Build, maintain, and scale data pipelines (ETL or ELT) using tools like Apache Spark, Airflow, and Kafka to support AI and ML workloads.
• AI-Ready Data Preparation: Transform messy, unstructured data (text, images, video) into structured datasets suitable for model training, including feature engineering and vector database ingestion.
• ML Model Productionization: Partner with data scientists to deploy ML models, expose them through APIs, and implement MLOps practices such as data drift monitoring.
• Analytics and Visualization: Write SQL queries and build dashboards (Tableau, Power BI, Looker) that deliver actionable business insights, filling an analytics engineering role.
• Data Governance & Quality: Ensure data quality, reliability, and the security of sensitive data (PII and PHI) within AI systems, in compliance with regulations such as GDPR and HIPAA.
• Cloud and Data Management: Operate within cloud environments (AWS, Azure, Google Cloud) using services such as S3, Redshift, and Glue, or platforms such as Databricks.
Key Skills and Qualifications
• Programming Languages: Expert-level Python and advanced SQL are required. Experience with Java or Scala is preferred for large-scale distributed systems.
• ML Frameworks: Familiarity with libraries such as PyTorch, TensorFlow, or scikit-learn for data manipulation and model interaction.
• Data Engineering Tools: Experience with Apache Spark, Kafka, Airflow, dbt, and vector databases (Pinecone, Milvus).
• Cloud Platforms: Hands-on experience with AWS (Glue, SageMaker) or Google Cloud Platform.
• Analytical Skills: Strong ability to perform exploratory data analysis (EDA) and interpret complex datasets.
• Soft Skills: Strong communication skills to bridge technical data engineering work and business stakeholders.