Job Details
Role: Cloud Data Engineer
Location: McLean, VA (5 days onsite)
Duration: 6+ month contract
Job Description:
In this project, raw data about Loans and Public Finances is collected from various sources and stored on Amazon S3 in raw format. Multiple ETL processes ingest the data from the S3 location into the target Snowflake database.
PySpark, Talend, IICS, Python, and shell scripts are used to perform ETL operations on the data, and the results are stored in the Snowflake warehouse for consumers. The BMC Control-M tool orchestrates the ETL jobs in the data migration pipeline. Bitbucket stores the code, and Jenkins pipelines promote it to higher environments.
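As an illustration of the S3-to-Snowflake hop in this pipeline, here is a minimal PySpark sketch. The bucket, table, and connection settings are hypothetical placeholders, and the real jobs also involve Talend, IICS, and shell steps not shown here.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("loans-raw-to-snowflake").getOrCreate()

    # Read raw loan files landed on S3 (assumes the S3A connector is configured).
    raw = spark.read.option("header", "true").csv("s3a://example-raw-bucket/loans/")

    # Example transformation: normalize column names and add a load timestamp.
    cleaned = (raw.toDF(*[c.strip().lower() for c in raw.columns])
                  .withColumn("load_ts", F.current_timestamp()))

    # Write to Snowflake via the Spark-Snowflake connector; credentials would be
    # added from a secrets store, not hard-coded.
    sf_options = {
        "sfURL": "example.snowflakecomputing.com",
        "sfUser": "etl_user",
        "sfDatabase": "LOANS_DB",
        "sfSchema": "STAGE",
        "sfWarehouse": "ETL_WH",
    }
    (cleaned.write.format("net.snowflake.spark.snowflake")
            .options(**sf_options)
            .option("dbtable", "LOANS_STAGE")
            .mode("append")
            .save())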
Modernized Data Ingestion: Actively contributed to migrating legacy flat file ingestion processes to the Informatica Intelligent Cloud Services (IICS) framework.
Pioneered API Integration: First to design, develop, and implement an IICS data ingestion pipeline utilizing external API calls.
Control-M Orchestrations: Designed and developed Control-M orchestration workflows for multiple use cases, including IICS, BYOL, File Watcher, Stored Procedures on Snowflake, Talend, and OS commands/scripts.
Snowflake Integration: Developed Snowflake SQL stored procedures to generate and load data for DMC into Splunk dashboards.
Performance Optimization: Enhanced performance of data ingestion by optimizing stored procedures, achieving a 50% improvement.
IICS and Snowflake Integration: Built multiple IICS workflows to execute stored procedures within Snowflake environments; the invocation pattern is sketched below.
Problem Solving and Innovation: Identified and implemented workarounds to overcome platform limitations in Control-M and IICS.
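The stored-procedure items above share a common invocation pattern, sketched here with the Snowflake Python connector. The procedure and connection names are hypothetical; in the actual project the calls are issued from IICS tasks or Control-M jobs rather than a standalone script.

    import snowflake.connector

    # Hypothetical connection parameters; real jobs read these from config/secrets.
    conn = snowflake.connector.connect(
        account="example_account",
        user="etl_user",
        password="...",          # in practice, sourced from a secrets manager
        warehouse="ETL_WH",
        database="LOANS_DB",
        schema="STAGE",
    )
    try:
        cur = conn.cursor()
        # CALL executes a Snowflake stored procedure; LOAD_DMC_METRICS is a
        # hypothetical name standing in for the real procedures.
        cur.execute("CALL LOAD_DMC_METRICS(%s)", ("2024-01-01",))
        print(cur.fetchone())    # a stored procedure returns a single result row
    finally:
        conn.close()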
Involved in testing using an automated test framework.
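A representative automated check in this kind of framework is a source-to-target reconciliation. The pytest sketch below is a hypothetical example of that idea, not the project's actual suite; table and connection names are placeholders.

    import snowflake.connector

    def _count(cur, table):
        # Row count for a fully qualified table name (illustrative helper).
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]

    def test_stage_matches_target():
        # Hypothetical reconciliation: rows loaded into the target table should
        # match the stage table they were built from.
        conn = snowflake.connector.connect(
            account="example_account", user="etl_user", password="...",
            warehouse="ETL_WH", database="LOANS_DB",
        )
        try:
            cur = conn.cursor()
            assert _count(cur, "STAGE.LOANS_STAGE") == _count(cur, "CORE.LOANS")
        finally:
            conn.close()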
Experienced in managing manual deployment activities, including handling change tickets in ServiceNow, coordinating with deployment teams, and maintaining code baselines. Proficient in ensuring seamless deployment processes and artifact management to support development and production environments.
Loaded and retrieved unstructured data.
Experienced in working with Amazon Web Services (AWS): EC2 for compute, S3 for storage, and Step Functions for monitoring.
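A minimal boto3 sketch of that AWS usage follows; the bucket, instance ID, and execution ARN are hypothetical.

    import boto3

    # S3 as the storage mechanism: land a raw extract file.
    s3 = boto3.client("s3")
    s3.upload_file("loans_extract.csv", "example-raw-bucket",
                   "loans/loans_extract.csv")

    # EC2 for compute: check the state of a worker instance running the ETL.
    ec2 = boto3.client("ec2")
    status = ec2.describe_instance_status(InstanceIds=["i-0123456789abcdef0"])

    # Step Functions for monitoring: poll a state-machine execution's status.
    sfn = boto3.client("stepfunctions")
    execution = sfn.describe_execution(
        executionArn="arn:aws:states:us-east-1:123456789012:execution:etl:run-1"
    )
    print(execution["status"])   # RUNNING, SUCCEEDED, or FAILED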
Developed AutoSys jobs for running ingestion shell scripts.
Experience in Object-Oriented Design, Analysis, Development, Testing, and Maintenance.
Responsible for managing and reviewing Hadoop log files.
Followed Agile methodology: interacted directly with the client to provide and receive feedback on features, suggested and implemented optimal solutions, and tailored the application to customer needs.
Optimized data pipeline performance by 30% by rewriting Python modules for efficiency, enhancing data ingestion into Snowflake.
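One common way to get that kind of speedup, assuming the original modules inserted rows one at a time, is to switch to bulk loading with the connector's write_pandas helper, as in this sketch; table and connection names are placeholders.

    import pandas as pd
    import snowflake.connector
    from snowflake.connector.pandas_tools import write_pandas

    # Hypothetical connection; real settings come from project config/secrets.
    conn = snowflake.connector.connect(
        account="example_account", user="etl_user", password="...",
        warehouse="ETL_WH", database="LOANS_DB", schema="STAGE",
    )

    df = pd.read_csv("loans_extract.csv")

    # write_pandas stages the frame as compressed files and issues COPY INTO,
    # which is far faster than row-by-row INSERT statements.
    success, n_chunks, n_rows, _ = write_pandas(conn, df, "LOANS_STAGE")
    conn.close()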
Collaborated with cross-functional teams to design and develop ETL workflows using Talend and PySpark, ensuring accurate data transformation and storage in Snowflake.
Designed and executed BMC Control-M jobs to automate and monitor ETL processes, ensuring high reliability and efficiency in data migration pipelines.
Pioneered the integration of Jenkins Pipeline for automated code deployment, reducing manual intervention by 50%.
Actively involved in troubleshooting and resolving production issues to ensure system stability.