This is a full-time, direct-hire Data Engineer role focused on Microsoft Fabric and Azure data engineering. The position is based about 20 minutes north of Downtown Pittsburgh, follows a hybrid schedule, and requires local residency.
- Design, develop, and maintain scalable, production-grade data pipelines and integrations using Microsoft Fabric, Azure Data Factory, Fabric Data Factory, Azure Databricks, Azure Event Hubs, OneLake, Fabric Lakehouse, and Fabric Data Warehouse.
- Build analytics-ready datasets to support pricing, supply chain, POS sales, customer behavior analytics, executive dashboards, and AI/ML workloads.
- Implement dual-engine data pipelines leveraging Azure Data Factory for structured batch workloads and Azure Event Hubs / Kafka for real-time event ingestion.
- Support multiple ingestion patterns including batch ETL/ELT, CDC/database mirroring, streaming ingestion, API-based integrations, and SaaS connectors.
- Develop near real-time analytics solutions using Eventstream and Real-Time Intelligence capabilities in Microsoft Fabric.
- Design and optimize PySpark workloads in Azure Databricks and Fabric Spark to process high-volume historical datasets, XML/JSON log files, streaming transactional events, and operational telemetry data.
- Build scalable transformation logic that supports both streaming and batch architectures.
- Model and transform enterprise data using ANSI SQL, T-SQL, dbt, and Lakehouse design principles.
- Design star and snowflake schemas, semantic models, and curated analytical datasets to enable governed self-service analytics across the organization.
- Maintain and optimize Azure Data Lake Storage Gen2 environments, including Delta Lake formats, ACID-compliant patterns, schema evolution, partitioning, and performance tuning.
- Support enterprise Lakehouse architecture leveraging Microsoft Fabric OneLake.
- Partner closely with Analytics and Business stakeholders to deliver Power BI dashboards, executive scorecards, KPI reporting, and self-service analytics solutions, including semantic models, Direct Lake datasets, row-level security, and data governance standards.
- Enable Copilot-driven analytics and AI-assisted reporting capabilities on top of governed datasets.
- Deploy and manage cloud infrastructure using Terraform, Azure Resource Manager (ARM), and Infrastructure-as-Code practices.
- Automate CI/CD workflows for data pipelines and analytics assets using Azure DevOps, Git, and Docker.
- Orchestrate and schedule enterprise workflows with Azure Data Factory, Fabric Pipelines, Managed Apache Airflow, and Control-M (where applicable).
- Implement robust data observability, including automated monitoring and alerting for batch failures, streaming interruptions, data quality issues, schema drift, and pipeline latency.
- Build checksum and reconciliation frameworks between source systems and analytics platforms to support enterprise data governance and operational resiliency initiatives.
- Bachelor’s or Master’s degree in Data Science, Computer Science, Information Systems, Engineering, Statistics, Mathematics, or a related technical field.
- Experience delivering and supporting Power BI analytics, including semantic models, Direct Lake datasets, and row-level security.
- Hands-on work with Fabric Real-Time Intelligence, OneLake, REST APIs, XML/JSON processing, and event-driven architectures.
- Exposure to AI/ML workloads and tools such as Azure OpenAI, Copilot integrations, and predictive analytics solutions.
- Experience supporting large-scale enterprise analytics environments with complex operational datasets and strict SLAs.