Design, develop, and maintain ETL pipelines using Pentaho Data Integration (PDI/Kettle).
Develop Python scripts and applications for data processing, automation, and integration tasks.
Build and optimize data pipelines for large datasets from multiple sources such as APIs, databases, and flat files.
Integrate Pentaho ETL workflows with Python-based data processing frameworks.
Perform data transformation, cleansing, and validation to ensure data quality and accuracy.
Work with SQL databases (Oracle, MySQL, PostgreSQL, SQL Server) for data extraction and transformation.
Troubleshoot ETL jobs and data workflows and tune them to improve performance.
Collaborate with data engineers, BI teams, and business stakeholders to deliver data solutions.
Maintain documentation for ETL processes, workflows, and data architecture.
Implement best practices for data governance, security, and performance optimization.
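Illustrative only: a minimal Python sketch of the kind of cleansing, validation, and Pentaho-Python integration work described in the responsibilities above. The file paths, column names, PDI install location, and transformation parameter are assumptions for the sake of the example, not details from this posting.

```python
import subprocess

import pandas as pd

# Load raw records extracted from a source system (path and column names are hypothetical).
raw = pd.read_csv("/data/landing/customers.csv")

# Basic cleansing: normalize email casing/whitespace, drop duplicates, treat empty strings as NA.
raw["email"] = raw["email"].str.strip().str.lower()
clean = raw.drop_duplicates().replace({"": pd.NA})

# Simple validation rule: rows without a customer_id go to a reject file instead of loading.
valid = clean.dropna(subset=["customer_id"])
clean[clean["customer_id"].isna()].to_csv("/data/rejects/customers_rejected.csv", index=False)

# Hand the validated file to a PDI transformation via pan.sh; the .ktr path,
# parameter name, and Pentaho install location are placeholders.
valid.to_csv("/data/staging/customers_clean.csv", index=False)
subprocess.run(
    [
        "/opt/pentaho/data-integration/pan.sh",
        "-file=/etl/transformations/load_customers.ktr",
        "-param:INPUT_FILE=/data/staging/customers_clean.csv",
        "-level=Basic",
    ],
    check=True,
)
```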
Strong programming experience in Python (Pandas, NumPy); PySpark is a plus.
Hands-on experience with Pentaho Data Integration (PDI/Kettle).
Experience in ETL development and data integration projects.
Strong SQL skills and experience with relational databases.
Experience with data transformation, data warehousing, and data pipelines.
Knowledge of REST APIs and data ingestion techniques.
Familiarity with Linux/Unix environments.
Strong debugging and performance tuning skills.
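Illustrative only: a minimal Python sketch of the REST API ingestion and relational-database loading skills listed above. The endpoint URL, response shape, connection string, and staging table name are assumptions for the sake of the example.

```python
import pandas as pd
import requests
from sqlalchemy import create_engine

# Pull recently updated records from a REST endpoint
# (URL, query parameter, and response shape are placeholders).
resp = requests.get(
    "https://api.example.com/v1/orders",
    params={"updated_since": "2024-01-01"},
    timeout=30,
)
resp.raise_for_status()
orders = pd.DataFrame(resp.json()["results"])

# Light transformation before loading: type coercion and column selection.
orders["order_date"] = pd.to_datetime(orders["order_date"])
orders = orders[["order_id", "customer_id", "order_date", "total_amount"]]

# Append into a PostgreSQL staging table (connection string and table name are placeholders).
engine = create_engine("postgresql+psycopg2://etl_user:secret@db-host:5432/warehouse")
orders.to_sql("stg_orders", engine, schema="staging", if_exists="append", index=False)
```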