"Data Engineer" AND "NLP"

Overview

Hybrid
Depends on Experience
Contract - W2
Contract - 12 Month(s)

Skills

Amazon Web Services
Apache Spark
Cloud Computing
Computer Science
Continuous Delivery
Continuous Integration
Data Engineering
Data Processing
Data Quality
Data Science
Data Storage
Docker
Git
Google Cloud Platform
Kubernetes
Machine Learning (ML)
Management
Microsoft Azure
NLTK
Natural Language Processing
Python
Unstructured Data
Version Control
Workflow

Job Details

Greetings from ABAL Technologies Inc.

Role: Data Engineer with NLP

Location: North Carolina - Hybrid

Job Summary:

We are looking for a Data Engineer with experience in Natural Language Processing (NLP) to join our team. The ideal candidate will design, build, and maintain scalable data pipelines and infrastructure to support NLP-based applications. You'll work closely with data scientists, machine learning engineers, and product teams to extract insights from unstructured text data.

Key Responsibilities:

  • Build and maintain efficient data pipelines for text data processing
  • Collect, clean, and organize structured and unstructured data (e.g., documents, social media, logs)
  • Implement and optimize NLP pipelines using libraries like spaCy, NLTK, or Hugging Face Transformers
  • Work with large-scale datasets and ensure data quality and integrity
  • Collaborate with ML and data science teams to deploy NLP models into production
  • Optimize the performance of data workflows and NLP model inference
  • Monitor and troubleshoot data pipelines and infrastructure

Required Skills:

  • Strong experience in Python and SQL
  • Knowledge of NLP libraries (e.g., spaCy, NLTK, Hugging Face, gensim)
  • Experience with data processing and orchestration frameworks (e.g., Apache Spark, Airflow, or similar)
  • Understanding of text preprocessing techniques (tokenization, stemming, lemmatization, etc.)
  • Familiarity with cloud platforms (AWS, Google Cloud Platform, or Azure) and data storage solutions
  • Knowledge of version control (Git) and CI/CD practices

Preferred Qualifications:

  • Bachelor's or Master's degree in Computer Science, Data Engineering, or related field
  • Experience deploying NLP models in production environments
  • Familiarity with containerization tools (Docker, Kubernetes)
  • Exposure to MLflow or other model management tools

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.