Overview
On-site
$60,000 - $80,000
Full Time
Skills
Python
PySpark
HTML/CSS
JavaScript
Angular & React
SQL
Django
ETL
Big Data
AWS & GCP
Linux/Unix
Git
ML
API
Testing & Debugging
Apache Airflow
Job Details
Core Technical Skills
- Python Programming: Strong grasp of Python fundamentals, including data structures, object-oriented programming, and functional programming concepts. Proficiency in libraries like Pandas, NumPy, and PyArrow is also crucial for data manipulation and analysis.
- PySpark Expertise: Deep understanding of PySpark's architecture and functionality, including DataFrames, RDDs, Spark SQL, transformations, actions, and optimization techniques.
- SQL: Proficiency in SQL for data querying, manipulation, and analysis, as PySpark often integrates with SQL-like operations.
- Data Engineering Principles: Knowledge of data warehousing, ETL (Extract, Transform, Load) processes, data modeling, and data governance.
- Big Data Technologies: Familiarity with the Hadoop ecosystem (HDFS, YARN) and other big data tools such as Kafka and Hive.
- Cloud Platforms: Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform, particularly their data storage and processing services.
- Database Management: Understanding of database systems (both relational and NoSQL) for data storage and retrieval.
- Operating Systems: Familiarity with Linux/Unix environments, as they are commonly used in big data infrastructure.
- Version Control: Proficiency in Git for code management and collaboration.
Additional Technical Skills
- Machine Learning: Knowledge of machine learning algorithms and libraries (e.g., MLlib in Spark) for building and deploying models on large datasets.
- Real-time Data Processing: Experience with stream processing frameworks like Spark Streaming or Kafka Streams.
- API Development: Ability to develop and integrate APIs for data access and exchange.
- Testing and Debugging: Skills in writing unit tests and debugging PySpark applications.
- Automation and Orchestration: Experience with workflow automation tools like Apache Airflow.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.