Role: Data Architect/Application Architect
Location: Remote (Pittsburgh, PA)
Duration: 6 to 12 Months
MOI: Telephonic & Video
Primary Skills: Git, Docker, ETL/ELT, Advanced SQL/KQL, and IaaS
Job Description:
We are seeking an experienced Data Engineer contractor to support our steel manufacturing operations. This individual will design, build, and optimize data pipelines and infrastructure, enabling advanced analytics, process automation, and data-driven decision-making. The Data Engineer will work closely with data scientists, process engineers, and IT teams to ensure data reliability and actionable insights across the manufacturing lifecycle.
Key Responsibilities:
Develop and maintain scalable, reliable data pipelines for industrial data (e.g., real-time streaming, time-series, IoT/sensor, MES, and ERP system data).
Integrate data from diverse sources (databases, cloud, and on-premises systems) and engineer workflows for efficient ETL/ELT processing and data validation.
Collaborate with architects, data engineers, data scientists, analysts, and business stakeholders to define and deliver solutions.
Collaborate with IT admins, network/security engineers, and cross-functional teams to support stable production operations and troubleshoot infrastructure issues (including managing and integrating IaaS, PaaS, and SaaS solutions).
Manage the backlog, support QA/testing, and communicate requirements with business stakeholders in the steel manufacturing domain.
Mentor team members by providing guidance, technical coaching, and opportunities for skill growth, and encourage best practices across teams through code reviews.
Build and maintain data infrastructure in compliance with data governance and security best practices.
Requirements:
Bachelor’s degree in Computer Science or a related field and 5+ years of experience as a Data Engineer.
Strong experience building, maintaining, and optimizing ETL/ELT data pipelines using Python, Pandas, and PySpark, and orchestrating workflows with tools such as Apache Airflow and the Kedro framework.
Advanced SQL/KQL query development and optimization across Oracle, MSSQL, and MySQL databases (hosted on-premises or via PaaS offerings).
Experience developing and consuming RESTful APIs built with Flask and FastAPI for data services and integration.
Proficiency in Linux shell scripting for automation and data workflow management.
Experience with DevOps practices, including CI/CD for data pipelines and use of tools such as Git, Docker, and Infrastructure-as-Code (IaC) frameworks for provisioning and deployment.
Hands-on experience deploying solutions across multiple clouds (OCI, Azure, Google Cloud Platform), including setting up cross-cloud data integration and transfer.
Experience with cloud platforms (OCI, Azure, Google Cloud) and big data tools (Spark, Hadoop, Kafka, Databricks).
Understanding of data modeling, data profiling, data quality, data lake/warehouse architectures, and data ingestion from operational technology systems.
Familiarity with industrial protocols, time-series databases (e.g., OSIsoft PI), and manufacturing data sources (MES, PLC).
Strong troubleshooting, process automation, and root-cause analysis skills.
Skills:
Responsibility Area: Preferred Tools & Skills
Data Ingestion Pipelines: Python, PySpark, Airflow, Kedro, Linux shell scripting
API Development: Flask, FastAPI, RESTful design
Data Storage & Querying: SQL (Oracle, MSSQL, MySQL), KQL (Azure Data Explorer), big data (Hadoop, Oracle BDS), OSIsoft PI
Cloud Integration: Multi-cloud platforms (OCI, Azure, Google Cloud Platform), data sharing across clouds (Databricks)
Real-Time Data Streaming: Kafka, Azure Event Hubs, EMQX
Reporting Tools: Tableau, OAC, Power BI
Collaboration: Wiki, Azure DevOps Boards, MS Office 365
Data Governance & Quality: Data profiling/validation tools (Pandas Profiling), data quality validation and monitoring (e.g., Great Expectations), lineage tracking (cloud data catalogs)