Job Title: Senior Data Engineer - AWS
Location: Richardson, TX 75082 (Onsite-Hybrid)
Must Have Skills:
⦁ 5+ years of experience designing and deploying big data applications and ETL jobs using PySpark APIs/Spark SQL.
⦁ Strong experience with AWS services across multiple domains:
    ⦁ Collection: Kinesis, DMS
    ⦁ Storage: S3, RDS, Redshift, DynamoDB
    ⦁ Analytics & ML: Glue, EMR, Athena, SageMaker, Bedrock
    ⦁ Compute: EC2, Lambda, ECS
    ⦁ Security: IAM, KMS, SSE
⦁ Proficiency in SQL and relational databases (Oracle, SQL Server, Teradata), with expert-level query-tuning skills.
⦁ Hands-on experience with Python development, REST APIs (AWS API Gateway, Node.js), and CI/CD pipelines using GitHub.
⦁ Familiarity with file formats (JSON, Parquet, Avro) and Linux/Unix shell scripting.
⦁ Exposure to Docker/Kubernetes, Delta Lake APIs, and data quality frameworks.
⦁ AWS certification (Developer Associate or higher) preferred.
Detailed Job Description:
Key Responsibilities:
⦁ Architect and maintain data pipelines using AWS native services (Glue, Kinesis, Lambda, S3, Redshift).
⦁ Design and optimize data models on AWS Cloud leveraging Redshift, RDS, and S3.
⦁ Implement ETL/ELT workflows and PySpark jobs for data ingestion, transformation, and storage.
⦁ Operationalize self-service data preparation tools (e.g., Trifacta) on AWS.
⦁ Conduct performance engineering for large-scale data lakes in production environments.
⦁ Participate in design workshops and provide trade-off analyses and recommendations for solution architecture.
⦁ Mentor engineers on coding best practices, problem-solving, and AWS service utilization.
⦁ Define code review processes, deployment strategies, and ensure compliance with security standards.
⦁ Collaborate with System Architect and Scrum Master to manage dependencies, risks, and blockers.
⦁ Support test strategy, defect resolution, and root cause analysis during warranty periods.
⦁ Maintain documentation in Confluence and ensure team alignment on standards and practices.