Senior Data Engineer

Overview

On Site
Depends on Experience
Contract - W2
Contract - Independent

Skills

Python
PySpark
Big data
SQL
Data engineering

Job Details

Please get in touch with us if you are available for the immediate opportunity below.

Please email me your resume and contact number at your earliest convenience.

Role: Senior Data Engineer

Location: Austin, TX

Duration: Long term

Role, Skills, Responsibilities:

We are looking for a Data Engineer with experience in implementing and scaling data collection, storage, processing, and filtering for fine-tuning large language models (LLMs) within Conversational Engineering.

This role offers the opportunity to collaborate closely with the ML engineers, software engineers, and data scientists who build our AI systems.

In this role, you will:

  • Design, build, and manage scalable data pipelines for collecting, storing, processing, and filtering large volumes of text data for fine-tuning LLMs.
  • Develop and optimize data storage architectures to handle the massive scale of data required for training state-of-the-art language models.
  • Implement efficient data preprocessing, cleaning, and feature extraction techniques to ensure high-quality data for model training.
  • Collaborate with machine learning engineers and researchers to understand their data requirements and provide tailored solutions for LLM fine-tuning.
  • Design and implement robust and fault-tolerant systems for data ingestion, processing, and delivery.
  • Optimize data pipelines for performance, scalability, and cost-efficiency, leveraging distributed computing frameworks and cloud platforms.
  • Ensure the security, privacy, and compliance of data according to industry best practices and regulatory requirements.

Skills:

  • 7+ years of experience as a data engineer, with a strong background in designing and building large-scale data pipelines.
  • Deep expertise in distributed computing frameworks such as Apache Spark, Hadoop, or Flink, and hands-on experience optimizing data processing at scale.
  • Proficiency in programming languages commonly used in data engineering, such as Python, and a solid understanding of data structures and algorithms.
  • Extensive experience with cloud platforms such as AWS, Google Cloud, or Azure for data storage, processing, and management.
  • Familiarity with various data storage technologies, including distributed file systems and object stores (e.g., HDFS, S3), databases (e.g., Cassandra, HBase), and data warehouses (e.g., Redshift, BigQuery).
  • Hands-on experience with ETL orchestration tools such as Apache Airflow, Dagster, or Prefect for managing complex data workflows.
  • Knowledge of natural language processing (NLP) techniques and experience with text data preprocessing, normalization, and feature extraction.
  • Passion for staying up to date with the latest advancements in data engineering and NLP, and eagerness to apply innovative techniques to solve challenging problems.
  • Strong problem-solving skills, attention to detail, and good communication skills.

Must-have skills:

  • Python
  • PySpark
  • Big data
  • SQL
  • Data engineering

Thanks

Nirmal

Aria Consulting Services LLC

Certified Minority Business Enterprise

E-Verify Certified