Job Details
A face-to-face interview is mandatory.
We are seeking a highly skilled and motivated Software Engineer with a background in building and optimizing large-scale data platforms. This role designs, develops, and maintains our next-generation data infrastructure, enabling the robust data ingestion, processing, and analytics capabilities critical to Charter Communications' operations.
Detailed Job Duties:
I. Data Platform Development & Engineering:
Data Ingestion & Orchestration (NiFi): Design, develop, and maintain robust and scalable data ingestion pipelines using Apache NiFi to extract, transform, and load data from diverse internal and external sources (e.g., operational databases, APIs, logs, streaming data) into the data lake.
Big Data Processing (Spark): Implement complex ETL/ELT processes and data transformations using Apache Spark (PySpark or Scala) on AWS EMR or similar distributed computing environments, ensuring efficient processing of petabyte-scale datasets.
Data Lake Table Management (Iceberg): Work extensively with the Apache Iceberg table format to build and manage large-scale analytical tables within the AWS S3-based data lake. This includes defining table schemas, managing schema evolution, optimizing partition strategies, and leveraging features like time travel for data versioning and recovery (see the illustrative sketch after this section's duties).
Data Modeling: Design and implement optimized data models (dimensional, denormalized, star/snowflake schemas) within the Iceberg framework to support various business intelligence, reporting, and analytical workloads.
API Development: Develop and maintain RESTful APIs or data services to expose curated data from the data lake to downstream applications and analytical tools.
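To make the Spark and Iceberg duties above concrete, here is a minimal, illustrative PySpark sketch. It assumes an EMR (or similar) cluster launched with the Iceberg Spark runtime; the catalog name "lake", the S3 paths, and the table lake.analytics.usage_events are hypothetical placeholders, not references to an actual Charter system.

```python
# Illustrative only: the catalog name "lake", S3 paths, and table names are
# hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes an EMR (or similar) cluster launched with the Iceberg Spark runtime;
# here "lake" is configured as a Hadoop catalog over the S3 warehouse.
spark = (
    SparkSession.builder
    .appName("usage-events-etl")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3://example-data-lake/warehouse")
    .getOrCreate()
)

# Read raw events landed by NiFi, apply simple transformations, and write the
# result as a partitioned Iceberg table.
raw = spark.read.parquet("s3://example-data-lake/raw/usage_events/")
curated = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("account_id").isNotNull())
)
(
    curated.writeTo("lake.analytics.usage_events")
           .partitionedBy(F.col("event_date"))
           .createOrReplace()
)

# Schema evolution and time travel (Spark 3.3+ SQL syntax) on the Iceberg table.
spark.sql("ALTER TABLE lake.analytics.usage_events ADD COLUMNS (device_type STRING)")
spark.sql(
    "SELECT count(*) FROM lake.analytics.usage_events "
    "TIMESTAMP AS OF '2024-01-01 00:00:00'"
).show()
```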
II. AWS Cloud Infrastructure & Automation:
Infrastructure as Code (IaC): Develop and maintain infrastructure as code using tools like AWS CloudFormation or Terraform for automated provisioning and management of cloud resources, ensuring consistency and repeatability (see the sketch after this section's duties).
Pipeline Automation: Implement and optimize continuous integration/continuous deployment (CI/CD) pipelines for data applications and infrastructure using tools like Jenkins, GitLab CI/CD, or AWS CodePipeline.
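As a rough illustration of the infrastructure-as-code duty above, the sketch below uses the AWS CDK in Python (which synthesizes to CloudFormation) rather than a raw CloudFormation or Terraform template; the stack and bucket names are hypothetical.

```python
# Illustrative IaC sketch using the AWS CDK (Python); names are hypothetical.
from aws_cdk import App, RemovalPolicy, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct

class DataLakeStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Versioned, encrypted bucket backing the data lake warehouse,
        # with public access blocked (least-privilege defaults).
        s3.Bucket(
            self,
            "WarehouseBucket",
            bucket_name="example-data-lake-warehouse",  # hypothetical
            versioned=True,
            encryption=s3.BucketEncryption.S3_MANAGED,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
            removal_policy=RemovalPolicy.RETAIN,
        )

app = App()
DataLakeStack(app, "DataLakeStack")
app.synth()
```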
III. Performance Optimization & Troubleshooting:
Performance Tuning: Proactively identify and resolve performance bottlenecks within NiFi data flows, Spark jobs, and Iceberg table queries. This includes optimizing data partitioning, file formats (e.g., Parquet, ORC), memory configurations, and resource allocation.
Monitoring & Alerting: Implement comprehensive monitoring, logging, and alerting solutions (e.g., AWS CloudWatch, Prometheus, Grafana) for data pipelines and infrastructure to ensure proactive issue detection and resolution (see the sketch after this section's duties).
Troubleshooting & Root Cause Analysis: Diagnose and resolve complex data pipeline failures, data quality issues, and infrastructure problems, performing root cause analysis to implement long-term preventative measures.
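A minimal sketch of the monitoring-and-alerting duty, using boto3 to publish a custom CloudWatch metric from a pipeline run and to alarm on it; the namespace, metric, dimension values, and SNS topic ARN are hypothetical.

```python
# Illustrative CloudWatch monitoring sketch; all names and ARNs are hypothetical.
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish a custom metric from a pipeline run, e.g., the number of records
# that failed validation in the latest batch.
cloudwatch.put_metric_data(
    Namespace="DataPlatform/Pipelines",
    MetricData=[
        {
            "MetricName": "FailedRecords",
            "Dimensions": [{"Name": "Pipeline", "Value": "usage_events"}],
            "Value": 42,
            "Unit": "Count",
        }
    ],
)

# Alarm when any records fail validation within a 5-minute window.
cloudwatch.put_metric_alarm(
    AlarmName="usage-events-failed-records",
    Namespace="DataPlatform/Pipelines",
    MetricName="FailedRecords",
    Dimensions=[{"Name": "Pipeline", "Value": "usage_events"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:data-platform-alerts"],
)
```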
IV. Data Governance, Quality & Security:
Data Quality: Implement data validation rules and mechanisms within NiFi and Spark pipelines to ensure the accuracy, completeness, and consistency of data entering and residing in the data lake (see the sketch after this section's duties).
Security & Compliance: Adhere to and implement security best practices (e.g., IAM roles, encryption, least privilege) and data governance policies to protect sensitive data within AWS and across all data processing stages.
Documentation: Create and maintain clear, concise technical documentation for data pipelines, data models, architectural designs, and operational procedures.
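As a rough sketch of the data-quality duty, the PySpark snippet below splits a staged batch into valid and quarantined records using a small set of assumed validation rules; the column names, rules, and S3 paths are hypothetical.

```python
# Illustrative data-quality gate in PySpark; columns, rules, and paths are
# hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-events-dq").getOrCreate()

batch = spark.read.parquet("s3://example-data-lake/staging/usage_events/")

# Assumed rule set: required keys present, parseable timestamp, non-negative usage.
rules = (
    F.col("event_id").isNotNull()
    & F.col("account_id").isNotNull()
    & F.to_timestamp("event_ts").isNotNull()
    & (F.col("bytes_used") >= 0)
)

# coalesce() treats a NULL rule result (e.g., a NULL bytes_used) as a failure,
# so no record silently falls through both branches.
flagged = batch.withColumn("dq_pass", F.coalesce(rules, F.lit(False)))
valid = flagged.filter(F.col("dq_pass")).drop("dq_pass")
rejected = flagged.filter(~F.col("dq_pass")).drop("dq_pass")

# Completeness summary that can feed monitoring (e.g., a custom CloudWatch metric).
total, bad = batch.count(), rejected.count()
print(f"validated {total} records, rejected {bad}")

# Quarantine rejected rows for root-cause analysis rather than dropping them.
rejected.write.mode("append").parquet("s3://example-data-lake/quarantine/usage_events/")
valid.write.mode("append").parquet("s3://example-data-lake/curated/usage_events/")
```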
Required Qualifications:
3+ years of professional experience as a Software Engineer.
Hands-on experience with AWS cloud services (e.g., S3, EMR, IAM, CloudWatch).
Proven experience designing, developing, and managing data flows with Apache NiFi.
Practical experience working with Apache Iceberg for building and managing data lake tables.
Solid understanding of data warehousing concepts, data lakes, ETL/ELT processes, and data modeling.
Proficiency in at least one programming language, such as Python or Scala.
Experience with version control systems (e.g., Git).
Excellent problem-solving, analytical, and communication skills.