Role: Sr. Data Engineer - ETL (F2F Interview)
Location: Denver, CO (local candidates only)
Duration: Long-term
Main Skill:
10+ years of experience in the software development industry.
We need data engineering experience: building ETLs using Spark and SQL; building real-time and batch pipelines using Kafka/Firehose; building pipelines with Databricks/Snowflake; and ingesting multiple data formats such as JSON, Parquet, and Delta.
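For flavor only, a minimal PySpark batch ETL of the kind described above might look like the following sketch (the bucket paths, column names, and schema are hypothetical, not part of this posting):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example-etl").getOrCreate()

# Ingest raw JSON events (hypothetical path and schema).
events = spark.read.json("s3://example-bucket/raw/events/")

# Light transformation: drop bad records and derive a date partition.
cleaned = (
    events
    .filter(F.col("event_id").isNotNull())
    .withColumn("event_date", F.to_date("event_ts"))
)

# Write out as partitioned Parquet for downstream consumers.
cleaned.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/events/"
)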
Job Description:
Looking for a highly technical, hands-on Data Engineer III for our Data Lake Team who can independently lead data engineering projects and proactively improve process efficiency, recommending process and system improvements where applicable. The Data Engineer III will be responsible for understanding not only data pipelines but also event streaming applications, and for building systems that handle massive amounts of data while making it consumable by other application teams, users, and data scientists.
You will assist in the design and architecture of highly scalable, fault-tolerant infrastructure that processes millions of operations per minute from millions of TVs, efficiently stores petabytes of data, and provides fast insights from that data. You will also work with teams across the enterprise to bring their data into our big data ecosystem, monitor data quality for cleanliness, and fix discrepancies: ensure data accuracy through validation tasks, perform root cause analysis, implement solutions for data preparation and cleanliness, review data at all granular and aggregate levels, and manage versioning.
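To make the real-time side concrete, here is a minimal Spark Structured Streaming sketch that consumes a Kafka topic and lands micro-batches as Parquet (the broker address, topic name, and paths are hypothetical, and the spark-sql-kafka connector package is assumed to be on the classpath):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example-stream").getOrCreate()

# Subscribe to a Kafka topic of TV telemetry events (hypothetical names).
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "tv-events")
    .load()
)

# Kafka delivers bytes; decode the value payload to a string.
decoded = raw.select(F.col("value").cast("string").alias("json_payload"))

# Land micro-batches as Parquet, with a checkpoint for fault tolerance.
query = (
    decoded.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/stream/events/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
    .start()
)
query.awaitTermination()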
About You:
- You have a BS or MS in Computer Science or a similar, relevant field
- You work well in a collaborative, team-based environment
- You are an experienced engineer with 3+ years of experience
- You have a passion for big data structures
- You possess strong organizational and analytical skills related to working with structured and unstructured data operations
- You have experience implementing and maintaining high performance / high availability data structures
- You are most comfortable operating within cloud-based ecosystems
- You enjoy leading projects and mentoring other team members
Specific Skills:
- Over 10 years of experience in the software development industry.
- Experience or knowledge of relational SQL and NoSQL databases
- High proficiency in Python, PySpark, SQL, and/or Scala
- Experience in designing and implementing ETL processes
- Experience in managing data pipelines for analytics and operational use
- Strong understanding of in-memory processing and data formats (Avro, Parquet, JSON, etc.)
- Experience or knowledge of AWS cloud services: EC2, MSK, S3, RDS, SNS, SQS
- Experience or knowledge of stream-processing systems, e.g., Storm, Spark Structured Streaming, Kafka consumers
- Experience or knowledge of data pipeline and workflow management tools, e.g., Apache Airflow, AWS Data Pipeline (a minimal Airflow sketch appears after this list)
- Experience or knowledge of big data tools, e.g., Hadoop, Spark, Kafka
- Experience or knowledge of software engineering tools/practices, e.g., GitHub, VS Code, CI/CD
- Experience or knowledge in data observability and monitoring
- Hands-on experience in designing and maintaining data schema lifecycles.
- Bonus: experience with tools like Databricks, Snowflake, and ThoughtSpot
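As referenced in the list above, here is a minimal Apache Airflow sketch of the kind of daily ETL workflow this role owns (assuming Airflow 2.4+; the DAG id, schedule, and shell commands are placeholders, not our actual pipelines):

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

# A minimal DAG wiring an extract step before a transform step.
with DAG(
    dag_id="example_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    transform = BashOperator(task_id="transform", bash_command="echo transform")
    # Run transform only after extract succeeds.
    extract >> transform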