Data Engineer FTE, $150-160K, 20% bonus, $20K sign-on, 401(k), benefits -- direct full-time hire by client --

airflow, docker, kubernetes, sagemaker, Python, PySpark, Spark, SQL, Data engineer, data scientist, AWS S3, Amazon Redshift, Pycharm, API, Amazon EC2, Amazon RDS, Amazon S3, Amazon Web Services, Apache Kafka, Apache Spark, Big data, Cloud, Data flow, Data engineering, Computer science, EMR, ETL, JSON, NoSQL, Machine learning, Scala, Scripting, PostgreSQL, Workflow management, XML, Web services, Data science
Full Time
$150,000 - $160,000
Travel not required

Job Description

Full-time employee role (direct hire) with our customer. Work can be done offsite for now due to COVID; later, work will need to be done onsite in the Bay Area.

Responsibilities

  • Assemble large, complex data sets into the format that fits each use case
  • Strong knowledge of file-based data lake platforms; perform incremental updates to the data lake (a minimal sketch follows this list)
  • Knowledge of Delta Lake technology is highly desired
  • Expertise in Parquet-based data lake technology
  • Write generic Python/PySpark modules for processing data from various data sources (XML, Parquet, CSV, relational)
  • Demonstrable experience architecting, developing, and optimizing ETL pipelines using Python, Spark, EMR, and Airflow
  • Develop and optimize big data pipelines for data scientists (requires a basic understanding of data science concepts and ML)
  • Research and recommend innovative new methods and systems to manage data for business improvement
  • Participate in internal governance to drive the data quality business cycle and roadmap
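
For illustration only, the sketch below shows the kind of incremental Parquet load into an S3-based data lake this role involves; the bucket names, paths, and watermark column are hypothetical placeholders, not details from the posting.

    # Minimal PySpark sketch of an incremental Parquet data lake load (hypothetical paths/columns).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("incremental-load").getOrCreate()

    # Read the latest raw batch (CSV here; XML or relational sources would use
    # the appropriate reader or a JDBC connection instead).
    batch = (
        spark.read
        .option("header", True)
        .csv("s3://example-raw-bucket/orders/")
    )

    # Keep only records newer than the last processed watermark so reruns stay idempotent.
    last_watermark = "2024-01-14"  # in practice, read from a metadata/state store
    incremental = batch.filter(F.col("updated_at") > F.lit(last_watermark))

    # Append the increment to the partitioned Parquet lake on S3.
    (
        incremental.write
        .mode("append")
        .partitionBy("order_date")
        .parquet("s3://example-lake-bucket/orders/")
    )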

Required Skills

Python, Spark, ETL/data engineering, and an S3-based data lake on AWS; development and management of Airflow-based data flows (a minimal DAG sketch follows).
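
As an illustration of the Airflow piece, a minimal daily DAG that submits a PySpark job might look like the sketch below; the DAG id, schedule, and script path are assumptions, not taken from the posting.

    # Minimal Airflow DAG sketch for a daily Spark ETL job (ids and paths are illustrative).
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="example_daily_etl",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Submit the PySpark job that builds the day's increment of the data lake.
        run_spark_etl = BashOperator(
            task_id="run_spark_etl",
            bash_command="spark-submit /opt/jobs/incremental_load.py --date {{ ds }}",
        )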

  • Bachelor's or Master's degree in computer science or software engineering
  • 3+ years of programming experience (including functional programming); must be advanced in Python
  • Experience building and optimizing big data pipelines using Spark
  • Experience with AWS cloud services (S3, EC2, EMR, RDS, Redshift) as well as PySpark and Airflow
  • Experience with relational SQL and NoSQL databases, including Postgres
  • Solid understanding of how to design robust data workflows including optimization and user experience
  • Strong analytical and problem-solving skills
  • Excellent oral and written communication skills
  • Able to work in teams and collaborate with others to clarify requirements
  • Strong coordination and project management skills to handle complex projects
  • Experience developing and working with XML, JSON, and external web services
  • Strong experience working with a variety of relational SQL and NoSQL databases
  • Strong experience working with big data tools: Hadoop, Spark, Kafka, etc.
  • Experience with at least one cloud provider solution (AWS, GCP, Azure)
  • Strong experience with object-oriented and functional scripting languages: Python, Java, C++, Scala, etc.
  • Ability to work in Linux environment
  • Experience working with APIs
  • Strong knowledge of data pipeline and workflow management tools
  • Expertise in standard software engineering methodology, e.g. unit testing, code reviews, design documentation
  • Experience creating ETL processes that prepare data for consumption appropriately
  • Experience in setting up, maintaining and optimizing databases for production usage in reporting, analysis and ML applications
  • Ability to work in a collaborative environment and interact effectively with both technical and non-technical team members
  • Relevant working experience with Docker and Kubernetes preferred
  • Ability to work with ML frameworks preferred
Dice Id : 10126850
Position Id : S-DE-DS
Originally Posted : 1 year ago