Principal Data Engineer

Overview

On Site

USD 150,000.00 - 180,000.00 per year

Full Time

Skills

Recruiting

Unstructured data

Vector Databases

Continuous integration

Design thinking

Attention to detail

Problem solving

Organizational skills

Data processing

Generative Artificial Intelligence (AI)

Data quality

Build automation

Extract, transform, load (ETL)

Data security

Cloud storage

IT service management

Data

Apex

Management

Video

PySpark

Python

Machine Learning (ML)

Amazon Web Services

Storage

Cloud computing

NoSQL

SQL

Databricks

Snowflake schema

GitHub

JIRA

Confluence

Software deployment

Jenkins

Terraform

Splunk

Dynatrace

Analytical skill

Microsoft SharePoint

Design

XML

HTML

PDF

PPT

Policies

Metadata management

Strategy

Leadership

Research

SAP BASIS

Law

Innovation

Collaboration

Training

Job Details

Job#: 2013168

Job Description:

Apex Systems, together with its parent company ASGN Inc., is the 2nd largest IT staffing agency in the country.

Apex has an opportunity open for a Principal Data Engineer. If interested in discussing the position further, please send an MS Word version of your resume to Corey Smith at and mention Job ID 2013168

Job Title: Principal Data Engineer
Location: Parsippany, NJ (3 days onsite, 2 days remote)
Compensation: $150-180k salary and $80/hr on Apex W2
Duration: 3-month contract-to-perm role (direct hire)
Interview Process: 2 rounds total. The first is a 1-hour technical video meeting with the client. The second is an onsite meeting at the client office in Parsippany, NJ.
Must have:

8+ years of overall experience; strong in PySpark/Python, with an understanding of ML concepts and experience with data pipelines, unstructured data, and Databricks.
Good experience working with AWS and its services, including performance-related analysis.
Experience in building data ingestion pipelines for both structured and unstructured data, for storage and optimal retrieval.
Experience working with cloud data stores and NoSQL, graph, and vector databases.
Good experience with languages such as Python, SQL, and PySpark
Experience working with Databricks and Snowflake technologies.
Experience with relevant code repository and project tools such as GitHub, JIRA, and Confluence
Working experience with continuous integration and continuous deployment (CI/CD), with hands-on expertise in Jenkins, Terraform, Splunk, and Dynatrace.
Highly innovative with aptitude for foresight, systems thinking and design thinking, with a bias towards simplifying processes.
Detail-oriented individual with strong analytical, problem-solving, and organizational skills.
Ability to communicate clearly with both technical and business teams.

Responsibilities:
Build a data ingestion framework and data pipelines to ingest unstructured and structured data from various data sources, such as SharePoint, Confluence, chatbots, Jira, and external sites, into our existing One Data platform.

Work closely with cross-functional teams, including product managers, data scientists, and engineers, to understand project requirements and objectives and ensure alignment with overall business goals.

Design a scalable target-state architecture for data processing based on document content (data types may include, but are not limited to, XML, HTML, DOC, PDF, XLS, JPEG, TIFF, and PPT), including PII/CII handling, policy-based hierarchy rules, and metadata tagging.

Design, develop, and deploy optimal data pipelines, including an incremental data ingestion strategy, taking advantage of leading-edge technologies through experimentation and iterative refinement.

Design and implement vector databases to efficiently store and retrieve high-dimensional vectors.

Conduct research to stay up to date with the latest advancements in generative AI services and identify opportunities to integrate them into our products and services.

Implement data quality and validation checks to ensure accuracy and consistency of data.

Build automation that effectively and repeatably ensures quality, security, integrity, and maintainability of our solutions.

Monitor and troubleshoot data pipeline performance, identifying and resolving bottlenecks and issues.

Define and implement data access policies; implement and maintain data security measures and access policies for cloud storage buckets and vector databases.

EEO Employer

Apex Systems is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law. Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law. If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation in using our website for a search or application, please contact our Employee Services Department at or .

Apex Systems is a world-class IT services company that serves thousands of clients across the globe. When you join Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing in Talent Satisfaction in the United States and Great Place to Work in the United Kingdom and Mexico.
