Job Details
Key Responsibilities:
• Design, develop, and optimize scalable data pipelines using Apache Spark, Python, and Hadoop.
• Implement robust data ingestion, transformation, and storage solutions for large-scale datasets.
• Collaborate with cross-functional teams to understand business requirements and translate them into technical solutions.
• Manage and deploy Big Data tools and frameworks including Kafka, Hive, HBase, and Flink.
• Ensure data quality, integrity, and availability across distributed systems.
• Conduct performance tuning and benchmarking of Big Data applications.
• Implement data governance practices, including metadata management and data lineage tracking.
• Stay current with emerging technologies and integrate them into the data ecosystem as needed.
Required Qualifications:
• 6+ years of experience in software development with a focus on Big Data technologies.
• Strong programming skills in Python.
• Hands-on experience with Hadoop, Spark, Kafka, and NoSQL databases.
• Experience building and maintaining ETL/ELT pipelines.
• Familiarity with cloud platforms (AWS, Google Cloud Platform, or Azure) is a plus.
• Excellent problem-solving and communication skills.
Preferred Skills:
• Experience migrating ETL frameworks from proprietary tools (e.g., Ab Initio) to open-source platforms such as Spark.
• Knowledge of machine learning and data analytics tools.
• Experience with financial services or core banking systems is a strong plus.