Job Details
A face-to-face interview is mandatory.
We are seeking a highly skilled and motivated Software Engineer with a background in building and optimizing large-scale data platforms. This role designs, develops, and maintains our next-generation data infrastructure, enabling the robust data ingestion, processing, and analytics capabilities critical to Charter Communications' operations.
Detailed Job Duties:
I. Data Platform Development & Engineering:
Data Ingestion & Orchestration (NiFi): Design, develop, and maintain robust and scalable data ingestion pipelines using Apache NiFi to extract, transform, and load data from diverse internal and external sources (e.g., operational databases, APIs, logs, streaming data) into the data lake.
Big Data Processing (Spark): Implement complex ETL/ELT processes and data transformations using Apache Spark (PySpark or Scala) on AWS EMR or similar distributed computing environments, ensuring efficient processing of petabyte-scale datasets.
Data Lake Table Management (Iceberg): Work extensively with the Apache Iceberg table format to build and manage large-scale analytical tables within the AWS S3-based data lake. This includes defining table schemas, managing schema evolution, optimizing partition strategies, and leveraging features like time travel for data versioning and recovery (see the illustrative sketch after this section's duties).
Data Modeling: Design and implement optimized data models (dimensional, denormalized, star/snowflake schemas) within the Iceberg framework to support various business intelligence, reporting, and analytical workloads.
API Development: Develop and maintain RESTful APIs or data services to expose curated data from the data lake to downstream applications and analytical tools.
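To make the Spark and Iceberg duties above concrete, here is a minimal, illustrative PySpark sketch. It assumes an EMR (or similar) cluster launched with the Iceberg Spark runtime; the catalog name "lake", the S3 paths, and the table lake.analytics.usage_events are hypothetical placeholders, not references to an actual Charter system.

```python
# Illustrative only: the catalog name "lake", S3 paths, and table names are
# hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes an EMR (or similar) cluster launched with the Iceberg Spark runtime;
# here "lake" is configured as a Hadoop catalog over the S3 warehouse.
spark = (
    SparkSession.builder
    .appName("usage-events-etl")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3://example-data-lake/warehouse")
    .getOrCreate()
)

# Read raw events landed by NiFi, apply simple transformations, and write the
# result as a partitioned Iceberg table.
raw = spark.read.parquet("s3://example-data-lake/raw/usage_events/")
curated = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("account_id").isNotNull())
)
(
    curated.writeTo("lake.analytics.usage_events")
           .partitionedBy(F.col("event_date"))
           .createOrReplace()
)

# Schema evolution and time travel (Spark 3.3+ SQL syntax) on the Iceberg table.
spark.sql("ALTER TABLE lake.analytics.usage_events ADD COLUMNS (device_type STRING)")
spark.sql(
    "SELECT count(*) FROM lake.analytics.usage_events "
    "TIMESTAMP AS OF '2024-01-01 00:00:00'"
).show()
```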
II. AWS Cloud Infrastructure & Automation:
Infrastructure as Code (IaC): Develop and maintain infrastructure as code using tools like AWS CloudFormation or Terraform for automated provisioning and management of cloud resources, ensuring consistency and repeatability (see the sketch after this section's duties).
Pipeline Automation: Implement and optimize continuous integration/continuous deployment (CI/CD) pipelines for data applications and infrastructure using tools like Jenkins, GitLab CI/CD, or AWS CodePipeline.
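As a rough illustration of the infrastructure-as-code duty above, the sketch below uses the AWS CDK in Python (which synthesizes to CloudFormation) rather than a raw CloudFormation or Terraform template; the stack and bucket names are hypothetical.

```python
# Illustrative IaC sketch using the AWS CDK (Python); names are hypothetical.
from aws_cdk import App, RemovalPolicy, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct

class DataLakeStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Versioned, encrypted bucket backing the data lake warehouse,
        # with public access blocked (least-privilege defaults).
        s3.Bucket(
            self,
            "WarehouseBucket",
            bucket_name="example-data-lake-warehouse",  # hypothetical
            versioned=True,
            encryption=s3.BucketEncryption.S3_MANAGED,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
            removal_policy=RemovalPolicy.RETAIN,
        )

app = App()
DataLakeStack(app, "DataLakeStack")
app.synth()
```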
III. Performance Optimization & Troubleshooting:
Performance Tuning: Proactively identify and resolve performance bottlenecks within NiFi data flows, Spark jobs, and Iceberg table queries. This includes optimizing data partitioning, file formats (e.g., Parquet, ORC), memory configurations, and resource allocation.
Monitoring & Alerting: Implement comprehensive monitoring, logging, and alerting solutions (e.g., AWS CloudWatch, Prometheus, Grafana) for data pipelines and infrastructure to ensure proactive issue detection and resolution (see the sketch after this section's duties).
Troubleshooting & Root Cause Analysis: Diagnose and resolve complex data pipeline failures, data quality issues, and infrastructure problems, performing root cause analysis to implement long-term preventative measures.
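A minimal sketch of the monitoring-and-alerting duty, using boto3 to publish a custom CloudWatch metric from a pipeline run and to alarm on it; the namespace, metric, dimension values, and SNS topic ARN are hypothetical.

```python
# Illustrative CloudWatch monitoring sketch; all names and ARNs are hypothetical.
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish a custom metric from a pipeline run, e.g., the number of records
# that failed validation in the latest batch.
cloudwatch.put_metric_data(
    Namespace="DataPlatform/Pipelines",
    MetricData=[
        {
            "MetricName": "FailedRecords",
            "Dimensions": [{"Name": "Pipeline", "Value": "usage_events"}],
            "Value": 42,
            "Unit": "Count",
        }
    ],
)

# Alarm when any records fail validation within a 5-minute window.
cloudwatch.put_metric_alarm(
    AlarmName="usage-events-failed-records",
    Namespace="DataPlatform/Pipelines",
    MetricName="FailedRecords",
    Dimensions=[{"Name": "Pipeline", "Value": "usage_events"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:data-platform-alerts"],
)
```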
IV. Data Governance, Quality & Security:
Data Quality: Implement data validation rules and mechanisms within NiFi and Spark pipelines to ensure the accuracy, completeness, and consistency of data entering and residing in the data lake (see the sketch after this section's duties).
Security & Compliance: Adhere to and implement security best practices (e.g., IAM roles, encryption, least privilege) and data governance policies to protect sensitive data within AWS and across all data processing stages.
Documentation: Create and maintain clear, concise technical documentation for data pipelines, data models, architectural designs, and operational procedures.
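As a rough sketch of the data-quality duty, the PySpark snippet below splits a staged batch into valid and quarantined records using a small set of assumed validation rules; the column names, rules, and S3 paths are hypothetical.

```python
# Illustrative data-quality gate in PySpark; columns, rules, and paths are
# hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-events-dq").getOrCreate()

batch = spark.read.parquet("s3://example-data-lake/staging/usage_events/")

# Assumed rule set: required keys present, parseable timestamp, non-negative usage.
rules = (
    F.col("event_id").isNotNull()
    & F.col("account_id").isNotNull()
    & F.to_timestamp("event_ts").isNotNull()
    & (F.col("bytes_used") >= 0)
)

# coalesce() treats a NULL rule result (e.g., a NULL bytes_used) as a failure,
# so no record silently falls through both branches.
flagged = batch.withColumn("dq_pass", F.coalesce(rules, F.lit(False)))
valid = flagged.filter(F.col("dq_pass")).drop("dq_pass")
rejected = flagged.filter(~F.col("dq_pass")).drop("dq_pass")

# Completeness summary that can feed monitoring (e.g., a custom CloudWatch metric).
total, bad = batch.count(), rejected.count()
print(f"validated {total} records, rejected {bad}")

# Quarantine rejected rows for root-cause analysis rather than dropping them.
rejected.write.mode("append").parquet("s3://example-data-lake/quarantine/usage_events/")
valid.write.mode("append").parquet("s3://example-data-lake/curated/usage_events/")
```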
Required Qualifications:
3+ years of professional experience as a Software Engineer.
Hands-on experience with AWS cloud services (e.g., S3, EMR, IAM, CloudWatch).
Proven experience designing, developing, and managing data flows with Apache NiFi.
Practical experience working with Apache Iceberg for building and managing data lake tables.
Solid understanding of data warehousing concepts, data lakes, ETL/ELT processes, and data modeling.
Proficiency in at least one programming language, such as Python or Scala.
Experience with version control systems (e.g., Git).
Excellent problem-solving, analytical, and communication skills.