Develop services to enable data ingestion from, and synchronization with, systems that expose the required data access mechanisms, ensuring near-real-time updates
Ingest data from multiple sources using PySpark AWS Glue jobs and other cloud ELT pipelines
Design and implement an event-driven architecture using Amazon EventBridge, Kafka, or SNS/SQS for real-time data streaming
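The SNS/SQS variant of the event-driven pattern above can be sketched in-process. This is a minimal illustration only: the topic and queue names are made up, and a real deployment would publish via boto3 to SNS (or an EventBridge rule) with SQS queues as subscribers rather than in-memory queues.

```python
import queue

class Topic:
    """Stands in for an SNS topic: delivers each event to every subscriber."""
    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def subscribe(self, q):
        self.subscribers.append(q)

    def publish(self, event):
        # Fan-out: every subscribed queue gets its own copy of the event,
        # so downstream consumers process independently and at their own pace.
        for q in self.subscribers:
            q.put(event)

topic = Topic("OrderEvents")          # illustrative topic name
analytics_q = queue.Queue()           # consumer 1: analytics pipeline
billing_q = queue.Queue()             # consumer 2: billing service
topic.subscribe(analytics_q)
topic.subscribe(billing_q)

topic.publish({"order_id": 42, "status": "created"})

print(analytics_q.get())  # {'order_id': 42, 'status': 'created'}
print(billing_q.get())    # {'order_id': 42, 'status': 'created'}
```

The decoupling shown here is the point of the pattern: the publisher never knows which consumers exist, so new consumers can be added without touching producer code.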
Design, implement, and maintain scalable data pipelines that integrate both on-prem and AWS cloud environments.
Develop efficient Python scripts and applications using libraries like pandas, NumPy, etc., to handle and process large datasets.
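One common technique for handling datasets larger than memory with pandas is chunked reading. A minimal sketch, assuming a CSV source; the column names and values here are invented for illustration.

```python
import io
import pandas as pd

# Stand-in for a large file on disk or in S3; in practice you would pass
# a file path (or s3:// URL) to pd.read_csv instead of a StringIO buffer.
csv_data = io.StringIO(
    "user_id,amount\n"
    "1,10.0\n"
    "2,5.5\n"
    "1,4.5\n"
    "3,20.0\n"
)

total = 0.0
# chunksize yields DataFrames of at most 2 rows each, so the whole
# dataset never has to fit in memory at once.
for chunk in pd.read_csv(csv_data, chunksize=2):
    total += chunk["amount"].sum()

print(total)  # 40.0
```

Aggregations that can be computed incrementally (sums, counts, min/max) work naturally with this pattern; operations that need the full dataset (global sorts, medians) require a different approach, such as Spark.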
Work with various NoSQL databases (e.g., MongoDB, Cassandra, DynamoDB) to support high-performance data storage and retrieval.
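For DynamoDB specifically, high-performance retrieval usually hinges on key design rather than query tuning. The sketch below shows the common single-table convention of composite partition/sort keys; the `USER#`/`ORDER#` prefixes are a naming convention, not a DynamoDB API, and the helper functions are hypothetical.

```python
def user_key(user_id: str) -> dict:
    # A user item keyed on itself: PK and SK both identify the user.
    return {"PK": f"USER#{user_id}", "SK": f"USER#{user_id}"}

def order_key(user_id: str, order_id: str) -> dict:
    # Orders share the user's partition key, so a single Query on
    # PK = "USER#42" returns the user item plus all of that user's
    # orders in one round trip, with no join.
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{order_id}"}

print(user_key("42"))
print(order_key("42", "1001"))
```

The same item-collection idea applies, with different syntax, to wide-column stores like Cassandra (partition key plus clustering columns).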
Develop and deploy applications in a cloud-native architecture, leveraging modern cloud technologies for scalability and resilience.
Continuously monitor data workflows and systems, troubleshoot issues, and optimize performance for reliability and scalability; transition existing pipelines to Microsoft SQL Server
Experience updating Terraform scripts to add new resources, modify existing infrastructure, or optimize configurations.
Ability to collaborate with DevOps and infrastructure teams to ensure infrastructure changes meet operational and security standards.
Expertise in writing, optimizing, and debugging complex SQL queries to support data extraction, transformation, and loading processes.
Skilled in identifying and resolving performance bottlenecks in SQL scripts to ensure efficient data processing.
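The most common SQL bottleneck of the kind described above is a selective `WHERE` clause forcing a full table scan. The sketch below demonstrates the fix with SQLite's `EXPLAIN QUERY PLAN` (chosen only because it ships with Python; the table and index names are invented, and the exact plan wording varies by engine and version).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
# 1000 rows, 100 distinct customers -> 10 orders per customer.
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

query = "SELECT COUNT(*) FROM orders WHERE customer_id = ?"

# Before indexing: the planner has no choice but to scan every row.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (7,)).fetchone()[-1]
print(plan)  # e.g. "SCAN orders"

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# After indexing: the planner searches the index instead of scanning.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (7,)).fetchone()[-1]
print(plan)  # e.g. "SEARCH orders USING COVERING INDEX idx_orders_customer ..."

print(conn.execute(query, (7,)).fetchone()[0])  # 10
```

Reading the plan before and after a change is the core debugging loop: the same technique applies in SQL Server (`SET SHOWPLAN_ALL`) and PostgreSQL (`EXPLAIN ANALYZE`), with engine-specific output.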
Collaborate with the business application owner on the existing data architecture, including data ingestion, data pipelines, business logic, data consumption patterns, and analytics requirements.
Design and document the target data, pipeline, processing, and analytics architecture.
Identify opportunities for optimization and consolidation.
Collaborate with the data team on decomposition of business logic and data transformation.