Job Description: Principal Data Engineer
Location: Johnston, RI
Role Overview
We are seeking a principal-level Java engineer to design and build enterprise-grade real-time and batch data processing systems using Java, Spark, Kafka, and a microservices architecture. The role focuses on event-driven pipelines, API development (both building and consuming APIs), and high-volume streaming platforms.
Key Responsibilities
Architect, design, and implement enterprise-grade Java-based data platforms and distributed processing systems
Build and maintain production-ready Spark applications (Java) for batch and real-time processing
Design and evolve Kafka-based event streaming and ingestion pipelines
Develop and consume REST APIs within a microservices architecture
Lead architecture decisions to ensure scalability, reliability, and regulatory compliance
Apply strong object-oriented design and engineering practices
Mentor engineers on performance tuning and production readiness
Design and implement master data management (MDM) solutions (match, merge, and survivorship logic)
Ensure data quality, observability, and system stability
Support production deployments and operational handoffs
Required Skills & Experience
10-12+ years of experience in Java/backend or data engineering
Hands-on experience building real-time data pipelines (Kafka, Spark Streaming/Flink)
Solid knowledge of relational databases and SQL data warehouses (PostgreSQL, Redshift, Snowflake) and NoSQL databases (MongoDB or similar)
Strong Kafka and event-driven architecture experience
Strong microservices experience (Spring Boot, REST APIs)
Experience in API development and API consumption
Hands-on Spark experience (batch and streaming)
Strong SQL and data modeling skills
AWS experience (S3, Glue, EMR, Redshift)
Experience in regulated or data-governance environments
CI/CD, Git, Docker/Kubernetes familiarity
Preferred
Scala or Python experience
Talend/DataStage exposure
Data lake experience (Iceberg/Parquet)
Frontend/API integration exposure
Experience supporting large-scale production systems
Mandatory Screening Criteria
Candidates must have hands-on experience building real-time/event-driven data pipelines using Kafka and Spark/Flink, along with strong microservices and API development experience.