Louisville, KY
Job Description:
>> Architect and develop scalable data pipelines using Databricks, Apache Spark, and related technologies.
>> Collaborate with data scientists, analysts, and business stakeholders to understand requirements and deliver robust solutions.
>> Optimize ETL workflows for performance, reliability, and cost efficiency on cloud platforms (Azure, AWS, or Google Cloud Platform).
>> Implement data governance, security, and compliance best practices within the Databricks environment.
>> Mentor junior engineers and contribute to code reviews, architectural decisions, and platform enhancements.
>> Develop and maintain documentation for data pipelines, architecture, and operational procedures.
>> Troubleshoot and resolve complex technical issues related to data ingestion, transformation, and storage.
Requirements:
>> Extensive hands-on experience with Databricks and Apache Spark (PySpark, Scala, or SQL).
>> Proficiency in at least one programming language (Python, Scala, or Java).
>> Strong understanding of cloud data architectures (Azure Data Lake, AWS S3, or Google Cloud Storage) and their surrounding ecosystems.
>> Experience with CI/CD tools and DevOps practices for data engineering.
>> Familiarity with data warehousing concepts and tools (Delta Lake, SQL, etc.).
>> Solid grasp of distributed computing, performance tuning, and big data best practices.
>> Excellent communication and problem-solving skills.