Job Title: Sr. Data Engineer
Experience Level: 13-15+ Years
Location: Fort Mill, SC; NY; or Austin, TX (Hybrid, 3-4 days on-site)
Employment Type: Contract
Role Summary:
We are seeking a Mid–Senior Data Engineer with strong expertise in AWS-based data engineering, real-time streaming technologies, and enterprise-grade data quality frameworks. The ideal candidate will design, build, and optimize scalable batch and streaming data pipelines, implement robust data validation and monitoring processes, and support mission-critical analytics platforms.
Key Responsibilities:
• Develop and maintain scalable ETL/ELT pipelines using AWS Glue, PySpark, and Python
• Build event-driven workflows using AWS Lambda
• Design and manage real-time streaming solutions using Kafka, KSQL, and Apache Flink
• Implement and enforce comprehensive data quality frameworks, including validation, profiling, monitoring, and reconciliation
• Optimize data processing performance, scalability, reliability, and cost in cloud environments
• Collaborate with cross-functional teams to deliver reliable, production-grade data platforms and ensure data integrity across the pipeline
Required Skills:
• Strong hands-on experience with Python and PySpark
• Proven expertise in AWS Glue, Lambda, and other cloud-native data services
• Solid experience with the Kafka ecosystem (topics, partitions, consumer groups, streaming patterns)
• Demonstrated experience building and supporting data quality frameworks (validation rules, reconciliation checks, profiling, anomaly detection)
• Strong understanding of distributed data processing and scalable architecture patterns
Good-to-Have Skills:
• Experience with Apache Flink for real-time stream processing and stateful computations
• Knowledge of KSQL or other streaming SQL engines
• Exposure to CI/CD pipelines, IaC (Terraform/CloudFormation), and DevOps practices
• Familiarity with data lake/lakehouse architectures and table formats such as Iceberg, Delta, or Hudi
• Experience working in enterprise or financial data environments