Principal Data Platform Engineer
New Iron is leading the search for a Principal Data Platform Engineer to join a Fortune 500 company specializing in industrial manufacturing.
This is a direct-hire remote role for a customer based in Charlotte, NC. Remote candidates are strongly encouraged to apply.
You will join a core platform development team designing and implementing high-performance data pipelines using cutting-edge technologies.
Your primary responsibility will be to architect reliable, large-scale data ingestion pipelines that land inbound data from various data stores into our client's on-premises and cloud-based data lakes.
You will join a development team supporting advanced analytics projects through data validation, automated data profiling, and CI/CD practices that ensure the maintainability of inbound data flows.
If you are an expert in query languages and data cleansing, and have experience working on large-scale data engineering projects, we'd love to talk to you!
Responsibilities may include:
- Design and implement highly performant data ingestion pipelines in Apache Spark, moving data from landed to cleansed zones across batch, micro-batch, and streaming workloads, including unstructured data
- Verify that inbound ingestion pipelines run reliably, identify unusual data conditions, and initiate corrective actions when needed
- Verify that pipelines are architecturally and operationally integrated with outbound engineering, data contextualization, and production pipelines
- Work with specialists across data domains who understand how their data is delivered, collaborating to collect, land, and prepare data at scale
- Deliver and present proof-of-concept implementations to stakeholders and functional teams across the company interested in adopting your code for their own projects
- Participate in code reviews, improve software engineering standards and best practices, and share knowledge with peers
Requirements:
- 10+ years of experience with software development
- Full stack experience developing large scale distributed systems
- Experience with a modern JVM language such as Java or Scala
- Experience with Python
- Expert in Agile development and CI/CD environments
- Experience developing and maintaining ETL and ELT pipelines for Data Warehousing (On-prem and Cloud)
- Production experience with SQL and DDL
- Strong hands-on experience with Spark core architecture and related storage technologies (e.g., S3, Parquet, Delta Lake) and similar tools
- Expert with the Apache Spark platform for developing batch, micro-batch, and streaming ingestion pipelines, leveraging all levels of the API (e.g., SparkContext, DataFrames, Datasets, GraphFrames, Spark SQL, Spark ML) in Scala and PySpark
- DevOps experience with AWS services
- Experience with Terraform, CloudFormation, Git, Jira etc.
- Expert using query languages (e.g., SQL or Spark SQL)
- Expert with Data Cleansing tools
- Expert with traditional relational and polyglot persistence technologies
- Values and prioritizes well-designed, testable, extensible, and maintainable code
- Excellent technical communication, collaboration, and time-management skills
Nice to Have:
- Master's degree in Computer Science or a related field
- Past experience with full-stack app development (frontend/backend and microservices)
- Strong experience with Kubernetes and Docker
- Expert with AWS services such as S3, EC2, DMS, RDS, Redshift, DynamoDB, CloudTrail, EKS, IAM, and CloudWatch
- Expert with data management fundamentals and data storage
- Familiarity with Oracle, Microsoft SQL, SSIS, SSRS
- Familiarity with enterprise ETL and integration tools such as Informatica and MuleSoft
- Familiarity with open-source data integration and DAG tools such as NiFi, Airflow, StreamSets, etc.
- Familiarity with data sources and integration solutions used in manufacturing enterprises, such as Maximo, PI Integrator, etc.
- Familiarity with reporting and analysis tools such as Power BI, Tableau, etc.
Candidates must be authorized to work in the United States on a full-time basis for any employer. Principals only. Recruiters, please do not contact this job poster.