Data Engineer Python 9462891

CSI (Consultant Specialists Inc.)
Python, Artificial Intelligence, Clarify, Data Management, Data Quality, Data Science, Pipeline, ETL, BI, Data, Data Pipeline, Machine Learning, PySpark, AWS, SQL, Cloud, Clinical, Drug Development
Contract W2, 12 Months
Depends on Experience
Work from home not available; travel not required

Job Description

Required Skills:
ARTIFICIAL INTELLIGENCE
CLARIFY
DATA MANAGEMENT
DATA QUALITY
DATA SCIENCE

Duties:

The Research and Early Development (gRED), Early Clinical Development Informatics (ECDi) department is seeking an experienced Data Engineer who will be responsible for designing, developing and optimizing ETL / data pipelines to support a variety of machine learning, predictive analytics, systems and BI solutions in support of the organization's goals to digitize and optimize clinical trials.

This individual will work within ECDi's Information Management Office (IMO).

The role will require cross-functional interactions with Data Management Leads, Predictive Analytics Analysts, Artificial Intelligence Scientists and Information Technology teams across multiple projects to implement data solutions in ECDi's data lake and data warehouse called gCORE.

A great candidate can translate the unique needs and requirements of a diverse set of stakeholders across both data lake and data warehouse use cases, and is eager to solve complex data challenges by selecting the best-fit solution.

Must be self-motivated, passionate about data management and analytics and able to extrapolate customer needs with minimal direction.

Responsibilities:

Understand the current-state data landscape, use cases and existing data lake and data warehouse setup

Work with Business Analysts, Data Analysts, Data Scientists and AI Engineers to identify infrastructure and data roadmap needs and propose the appropriate strategy in partnership with other IMO engineers

Assemble large, complex data sets in a format fit for each use case

Architect, develop and optimize ETL pipelines using Python, Spark, EMR, Docker and Airflow

Develop and optimize big data pipelines for data scientists (requires a basic understanding of data science concepts and ML)

Write generic Python/PySpark modules for processing data from various data sources (XML, Parquet, CSV, relational)

Perform hands-on physical and logical database design and modeling in the context of data warehousing (currently using AWS Redshift)

Perform hands-on infrastructure design of ECDi's AWS data lake and data warehouse environment (gCORE), including continuous exploration and recommendation of new technologies and best practices

Research and recommend innovative methods and systems to manage data for business improvement

Participate in internal governance to drive the data quality business cycle and roadmap

Skills:
5+ years of programming experience (including functional programming); must be advanced in Python

3+ years of experience designing, building and maintaining production data pipelines and/or data warehouses

Demonstrable experience working with different database types, including columnar data stores, SQL databases and graph databases, and the ability to select the right tool for the job

Experience building and optimizing big data pipelines using Spark

Experience with AWS cloud services: S3, EC2, EMR, RDS, Redshift, Lambda, EKS

Solid understanding of how to design robust data workflows including optimization and user experience

Strong analytical and problem-solving skills

Excellent oral and written communication skills

Able to work in teams and collaborate with others to clarify requirements

Strong coordination and project management skills to handle complex projects

Experience developing and working with XML, JSON, and external web services

Preferred Qualifications:

Clinical drug development domain knowledge

Experience working with clinical and biomedical data types (clinical patient data, omics, imaging, etc.)

Competencies in applied statistics to solve business needs

Knowledge of industry data standards used in drug development, particularly in clinical development

Education:
Bachelor's or Master's degree in computer science or software engineering

Additional Skills:
DATA SOURCES
DATA WAREHOUSE
DATA WAREHOUSES
DATA WAREHOUSING
DATABASE
DATABASE DESIGN
DRUG DEVELOPMENT
EC2
EMR
ENGINEER
EXPLORATION
GOVERNANCE
JSON
OPTIMIZATION
PREDICTIVE ANALYTICS
PROBLEM-SOLVING
PROJECT MANAGEMENT
PYSPARK
PYTHON
SQL
TRANSLATE
USE CASES
USER EXPERIENCE
WEB SERVICES
XML
AMAZON ELASTIC COMPUTE CLOUD
B2B SOFTWARE
BI
BIOMEDICAL
BUSINESS INTELLIGENCE
CLINICAL TRIALS
DATABASES
DOCKER
ETL
IMAGING

Posted By

Jackie Felipe



Company Information

Consultant Specialists, Inc. is a small, specialized firm, built and operated by staffing industry veterans with many years of experience. We are a full-service technical staffing firm focused on providing technical consultants on a contract basis to companies in San Francisco, San Jose and across Northern California. As a small firm, CSI is extremely flexible and fast while remaining cost competitive. Recruiting the best of the best gets more done in less time. We rely on our comprehensive internal database, rather than soliciting resumes, to fill positions. This gives our clients the best IT resources at competitive prices and provides our consultants with access to the most desirable positions.
Dice Id : 10106409
Position Id : DataEngrPython
Originally Posted : 1 month ago