Sr Data Engineer

Remote • Posted 4 hours ago • Updated 4 hours ago
Contract W2
6 Months
No Travel Required
Remote
$70 - $75/hr

Job Details

Skills

  • GCP
  • Python
  • Azure Devops
  • CI/CD Pipeline
  • ETL/ELT
  • Docker

Summary

**** Please note that candidates/consultants must be on our W2; we cannot work C2C for this position. ****

 

Title: Sr. Software Engineer (Data Engineer)

Duration: 6 Months

Client: Mayo Clinic

Req ID: 37363153

Remote

Scope: The resources will support an engineering team tasked with building a research data platform that will ingest research-generated data and make it discoverable.

Data Engineering Skills & Experience:

-Create, verify, and maintain data replication scripts

-Create, verify, and maintain data validation, processing, and ingestion pipelines

-Deploy and automate the execution of data replication scripts and data pipelines in cloud infrastructure

-Create and maintain data catalogs that describe datasets and their contents (e.g. files, file types, tables/views, columns, fields)

-Create, verify, and maintain dashboards and reports that characterize ingested datasets

-Create, verify, and maintain data validation scripts/APIs that verify the production dataset contains the correct number of samples/records, that expected values/fields/columns are populated, and that values are of the correct data type, format, and range

-Deploy and automate the execution of data validation scripts/APIs

-Create and maintain user documentation (dataset descriptions, tutorials, code examples, etc.)

-Define entitlements, user groups, roles, and permissions utilized to grant access to datasets
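
For illustration only, the validation checks listed above (record count, populated fields, data type, format, and range) could be sketched in Python roughly as follows. This is a minimal sketch, not the team's actual tooling: the FieldRule class, the validate_records function, and the field names sample_id, collected_on, and concentration are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FieldRule:
    """Hypothetical rule describing what one field is expected to look like."""
    name: str
    dtype: type
    pattern: Optional[str] = None      # regex the whole value must match (strings only)
    min_value: Optional[float] = None  # inclusive lower bound
    max_value: Optional[float] = None  # inclusive upper bound


def validate_records(records: list[dict[str, Any]],
                     rules: list[FieldRule],
                     expected_count: int) -> list[str]:
    """Return a list of human-readable validation errors (empty if all checks pass)."""
    errors = []

    # Check 1: the dataset contains the expected number of records.
    if len(records) != expected_count:
        errors.append(f"expected {expected_count} records, found {len(records)}")

    for i, record in enumerate(records):
        for rule in rules:
            value = record.get(rule.name)

            # Check 2: the expected field is populated.
            if value is None or value == "":
                errors.append(f"record {i}: '{rule.name}' is not populated")
                continue

            # Check 3: the value has the expected data type.
            if not isinstance(value, rule.dtype):
                errors.append(f"record {i}: '{rule.name}' is not of type {rule.dtype.__name__}")
                continue

            # Check 4: string values match the expected format.
            if rule.pattern is not None and isinstance(value, str):
                if re.fullmatch(rule.pattern, value) is None:
                    errors.append(f"record {i}: '{rule.name}' has an unexpected format")

            # Check 5: numeric values fall inside the expected range.
            if rule.min_value is not None and value < rule.min_value:
                errors.append(f"record {i}: '{rule.name}' is below {rule.min_value}")
            if rule.max_value is not None and value > rule.max_value:
                errors.append(f"record {i}: '{rule.name}' is above {rule.max_value}")

    return errors


# Hypothetical rules for an instrument-generated dataset.
example_rules = [
    FieldRule("sample_id", str, pattern=r"S-\d{4}"),
    FieldRule("collected_on", str, pattern=r"\d{4}-\d{2}-\d{2}"),
    FieldRule("concentration", float, min_value=0.0, max_value=100.0),
]
```

Returning a list of error messages rather than raising on the first failure lets a scheduled run report every problem in a dataset at once; in a real pipeline the records would more likely be read from the platform's storage than passed in as Python dictionaries.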

Programming Languages:

Primary pipeline development language will be Python.

Some datatypes and formats may require the use of other languages (e.g. Java, R) because the libraries/frameworks/SDKs needed to work with those datatypes and formats are not available in Python.

Operating Systems:

Primary operating system for data pipeline execution will be Linux, with data pipelines packaged, deployed, and run as containers.

Data source systems could be Windows or Linux based.

Infrastructure:

Primary data platform and data pipeline execution infrastructure will be hosted on Google Cloud Platform (GCP) utilizing cloud native technologies (e.g. Google Cloud Storage, BigQuery, Google Batch, Dataflow, Cloud SQL).

Data will be replicated from various on-premises sources that include laboratory instruments, network shared drives, and Windows desktops attached to instruments.

Development Tools:

Sprints, features, and tasks will be managed in Azure DevOps.

Code will be managed and versioned in Azure DevOps based Git repositories.

Code will be compiled, packaged, and deployed utilizing Azure DevOps build pipelines.

Data pipelines will be packaged, deployed, and run in Docker containers.

Docker container images will be stored and versioned in Google Cloud Artifact Registry repositories.

Veracode will be utilized to scan source code for vulnerabilities and Prisma Cloud will be utilized to scan containers.

The standard integrated development environment will be JetBrains (PyCharm, IntelliJ, etc.) or VS Code.

Preferred Candidates:

-Experience working on healthcare, life science, or scientific research projects

-A degree or domain knowledge in a life science related field (biochemistry, genetics, biology, etc.)

-Experience with Google Cloud Platform based infrastructure and services

100% remote. Mayo will provide equipment.

Education: Bachelor's Degree in Computer Science/Engineering or related field with 5 years of experience as noted below; OR an Associate's degree in Computer Science/Engineering or related field with 7 years of experience

  • Dice Id: 10118474
  • Position Id: Mayo06202623
Contact the job poster
Suresh Viswanathan

Recruiter @ Xylo Technologies, Inc.