Senior Data Engineer Machine Learning

  • Posted 60+ days ago | Updated 2 days ago

Overview

Remote
Depends on Experience
Full Time

Skills

AWS Glue
AWS serverless
AutoML
Azure Data Factory
Cloud Architect
ETL
NLP
Python
Data Integration
data pipeline
data lakes
data warehouse
serverless
metadata
data pipelines
data lake
machine learning
artificial intelligence

Job Details

Halvik is a highly successful company that puts people first, and we are looking for someone just like you. We are committed to delivering smarter IT-driven solutions bolstered by quality and innovation to help our customers succeed. Come be a part of something truly special!
 
What You Will Do:
This position is responsible for building data pipelines in AWS for data lakes and data warehouses. The individual will implement data pipelines into an existing cloud-based data lake and data warehouse using a variety of tools, including AWS Glue, Azure Data Factory, SQL, and/or Python scripts. Specific responsibilities on this project, inside an AWS environment, include: (i) implementing data pipelines for batch and streaming data sources, from external feeds into a cloud-based data lake and eventually into data warehouses; (ii) implementing data cataloging to share metadata for datasets in the data lake; (iii) using AWS serverless components in the data pipeline architecture; and (iv) using infrastructure-as-code (IaC) tools to deploy the pipelines within AWS.
 
What You Need:
A Bachelor’s degree and one of the following certifications are required: AWS Cloud Architect Certification, AWS ML Certification, or AWS Data Analytics Certification.
 
10 years of relevant experience. A minimum of 5 years of hands-on Data Integration experience creating and maintaining efficient scripts and data pipelines to clean, transform, and ingest data from a variety of formats into database tables, data warehouses, or data lake repositories. Experience building data pipelines using AWS serverless components; using AWS Glue to build, maintain, and monitor ETL jobs; using Python to implement ETL scripts; and using AWS Secrets Manager to manage credentials.
 
Experience with AI/ML, image recognition, object detection, NLP, etc., with one of the leading cloud providers is a plus. Hands-on experience building and training models with Databricks AutoML is required.
 
Exposure to Amazon Textract or Amazon Comprehend is a plus.
Python experience with API integration is preferred.