Principal Data Engineer

  • Parsippany, NJ
  • Posted 60+ days ago | Updated 11 hours ago

Overview

On Site
USD 150,000.00 - 180,000.00 per year
Full Time

Skills

Recruiting
Unstructured data
Vector Databases
Continuous integration
Design thinking
Attention to detail
Problem solving
Organizational skills
Data processing
Generative Artificial Intelligence (AI)
Data quality
Build automation
Extract, transform, load (ETL)
Data security
Cloud storage
IT service management
Data
Management
Video
PySpark
Python
Machine Learning (ML)
Amazon Web Services
Storage
Cloud computing
NoSQL
SQL
Databricks
Snowflake
GitHub
JIRA
Confluence
Software deployment
Jenkins
Terraform
Splunk
Dynatrace
Analytical skill
Microsoft SharePoint
Design
XML
HTML
PDF
PPT
Policies
Metadata management
Strategy
Leadership
Research
Innovation
Collaboration
Training

Job Details

Job#: 2013168

Job Description:

Together with its parent company, ASGN Inc., Apex Systems is the 2nd largest IT staffing agency in the country.

Apex has an opportunity open for a Principal Data Engineer. If you are interested in discussing the position further, please send an MS Word version of your resume to Corey Smith at and mention Job ID 2013168.

Job Title: Principal Data Engineer
Location: Parsippany, NJ (hybrid: 3 days on-site, 2 days remote)
Compensation: $150-180k salary; $80/hr on Apex W2
Duration: 3-month contract-to-permanent (direct hire) role
Interview Process: 2 rounds total. The first is a 1-hour technical video meeting with the client. The second is an on-site meeting at the client's office in Parsippany, NJ.
Must have:
  • 8+ years of overall experience, with strong PySpark/Python skills, an understanding of ML concepts, and experience creating data pipelines for unstructured data and working with Databricks.
  • Good experience working with AWS and its services; experience with performance-related analysis.
  • Experience building data ingestion pipelines for structured and unstructured data, for both storage and optimal retrieval.
  • Experience working with Cloud data stores, NoSQL, Graph and Vector databases.
  • Good experience with languages such as Python, SQL, and PySpark
  • Experience working with Databricks and Snowflake technologies.
  • Experience with code repository and project management tools such as GitHub, JIRA, and Confluence.
  • Working experience with continuous integration and continuous deployment (CI/CD), with hands-on expertise in Jenkins, Terraform, Splunk, and Dynatrace.
  • Highly innovative, with an aptitude for foresight, systems thinking, and design thinking, and a bias toward simplifying processes.
  • Detail-oriented individual with strong analytical, problem-solving, and organizational skills.
  • Ability to communicate clearly with both technical and business teams.

Responsibilities:
  • Build a data ingestion framework and data pipelines to ingest unstructured and structured data from various sources, such as SharePoint, Confluence, chatbots, Jira, and external sites, into our existing One Data platform.
  • Work closely with cross-functional teams, including product managers, data scientists, and engineers, to understand project requirements and objectives and ensure alignment with overall business goals.
  • Design a scalable target-state architecture for data processing based on document content (data types may include, but are not limited to, XML, HTML, DOC, PDF, XLS, JPEG, TIFF, and PPT), including PII/CII handling, policy-based hierarchy rules, and metadata tagging.
  • Design, develop, and deploy optimal data pipelines, including an incremental data ingestion strategy, taking advantage of leading-edge technologies through experimentation and iterative refinement.
  • Design and implement vector databases to efficiently store and retrieve high-dimensional vectors.
  • Conduct research to stay up to date with the latest advancements in generative AI services and identify opportunities to integrate them into our products and services.
  • Implement data quality and validation checks to ensure accuracy and consistency of data.
  • Build automation that effectively and repeatably ensures quality, security, integrity, and maintainability of our solutions.
  • Monitor and troubleshoot data pipeline performance, identifying and resolving bottlenecks and issues.
  • Define and implement data access policies, and implement and maintain data security measures for cloud storage buckets and vector databases.


EEO Employer

Apex Systems is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law. Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law. If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation in using our website for a search or application, please contact our Employee Services Department at or .

Apex Systems is a world-class IT services company that serves thousands of clients across the globe. When you join Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing in Talent Satisfaction in the United States and Great Place to Work in the United Kingdom and Mexico.

About Apex Systems