Overview
On Site
Accepts corp to corp applications
Contract - W2
Skills
Cribl workers
Vector agents
NiFi processors
Job Details
Lead the architecture and implementation of a comprehensive observability strategy across the entire SIEM modernization ecosystem, spanning data pipeline layers (Cribl, Vector, NiFi), event transport (Event Hubs), intermediate storage (Blob), and multiple downstream platforms (Splunk, Snowflake, ADX, Log Analytics, Anvilogic).
Design and build end-to-end telemetry and traceability for data events as they move across platforms, enabling real-time visibility into ingestion, transformation, routing, and storage processes.
Develop and maintain dashboards and alerting mechanisms to detect:
o Faults and failures (e.g., dropped messages, ingestion lag, retry loops)
o Latency or throughput bottlenecks across pipelines
o Schema mismatches or format errors
o Duplicate, delayed, or missing data
o Data quality anomalies at point of ingestion and final storage
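As an illustration of the duplicate, delayed, and missing data checks listed above, the following is a minimal sketch only. It assumes each event carries a unique ID, a monotonically increasing sequence number, and emit/receive timestamps; the `Event` fields, the `audit_events` helper, and the 60-second delay threshold are hypothetical and not part of the role description.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; real field names depend on the pipeline.
    event_id: str
    sequence: int
    emitted_at: float   # seconds since epoch at the source
    received_at: float  # seconds since epoch at the destination


def audit_events(events: Iterable[Event], max_delay_s: float = 60.0) -> dict:
    """Classify events as duplicate or delayed and list missing sequence numbers."""
    seen_ids = set()
    seen_seqs = set()
    duplicates, delayed = [], []
    for e in events:
        if e.event_id in seen_ids:
            # Same ID arrived more than once.
            duplicates.append(e.event_id)
            continue
        seen_ids.add(e.event_id)
        seen_seqs.add(e.sequence)
        if e.received_at - e.emitted_at > max_delay_s:
            delayed.append(e.event_id)
    missing = []
    if seen_seqs:
        # Gaps between the lowest and highest observed sequence numbers.
        full_range = set(range(min(seen_seqs), max(seen_seqs) + 1))
        missing = sorted(full_range - seen_seqs)
    return {"duplicates": duplicates, "delayed": delayed, "missing": missing}


sample = [
    Event("a", 1, 0.0, 1.0),
    Event("b", 2, 0.0, 100.0),  # arrives 100 s after emission
    Event("a", 1, 0.0, 1.0),    # repeated ID
    Event("d", 4, 0.0, 2.0),    # sequence 3 never arrived
]
report = audit_events(sample)
```

In practice a check like this would likely run over a sliding window per source, with the results feeding a dashboard or alert rule rather than being held in memory.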
Instrument each pipeline component (e.g., Cribl workers, Vector agents, NiFi processors) with health and performance metrics, using native exporters, APIs, or custom collectors.
Ensure observability tooling is in place for Azure Event Hubs, including partition health, consumer group lag, and throttling events.
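Consumer group lag can be approximated per partition as the gap between the last enqueued sequence number and the consumer's last checkpointed sequence number. The sketch below assumes both values have already been collected into plain dictionaries keyed by partition ID; the function names, input shapes, and threshold are hypothetical, and no Azure SDK calls are shown.

```python
from typing import Dict, List, Optional


def partition_lag(
    last_enqueued: Dict[str, int],
    checkpoints: Dict[str, int],
) -> Dict[str, Optional[int]]:
    """Return lag per partition; None means no checkpoint was found."""
    lag: Dict[str, Optional[int]] = {}
    for partition_id, last_seq in last_enqueued.items():
        checkpoint = checkpoints.get(partition_id)
        if checkpoint is None:
            # Lag is unknown when the consumer group has never checkpointed.
            lag[partition_id] = None
        else:
            lag[partition_id] = max(last_seq - checkpoint, 0)
    return lag


def partitions_to_alert(lag: Dict[str, Optional[int]], threshold: int) -> List[str]:
    """Flag partitions whose lag exceeds the threshold or is unknown."""
    return sorted(
        pid for pid, value in lag.items() if value is None or value > threshold
    )


lag = partition_lag({"0": 100, "1": 50, "2": 10}, {"0": 90, "2": 10})
alerts = partitions_to_alert(lag, threshold=5)
```

Treating a missing checkpoint as alert-worthy is a design choice made for this sketch; a newly created consumer group would need to be excluded or handled separately.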
Monitor Blob storage utilization and access patterns to identify ingest failures, access permission issues, or object lifecycle gaps.
Implement and enforce correlation IDs or tracing metadata to follow data across systems and detect where in the pipeline an issue originates.
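One way to picture the correlation ID approach: tag each event once at ingestion, have every stage append a hop record, and then look for the first expected stage that never recorded a hop. This is a rough sketch under stated assumptions; the stage order, the `correlation_id` and `hops` field names, and the helper functions are all hypothetical.

```python
import time
import uuid
from typing import List, Optional

# Hypothetical stage order, for illustration only.
EXPECTED_STAGES = ["cribl", "event_hubs", "blob", "splunk"]


def tag_event(event: dict) -> dict:
    """Attach a correlation ID and an empty hop list if not already present."""
    tagged = dict(event)
    tagged.setdefault("correlation_id", str(uuid.uuid4()))
    tagged.setdefault("hops", [])
    return tagged


def record_hop(event: dict, stage: str, now: Optional[float] = None) -> dict:
    """Return a copy of the event with a hop appended for the given stage."""
    hop = {"stage": stage, "ts": time.time() if now is None else now}
    return {**event, "hops": [*event.get("hops", []), hop]}


def first_missing_stage(
    event: dict, expected: List[str] = EXPECTED_STAGES
) -> Optional[str]:
    """Return the first expected stage with no hop, or None if all were seen."""
    seen = {hop["stage"] for hop in event.get("hops", [])}
    for stage in expected:
        if stage not in seen:
            return stage
    return None


event = tag_event({"message": "login failed"})
event = record_hop(event, "cribl", now=1.0)
event = record_hop(event, "event_hubs", now=2.0)
stalled_at = first_missing_stage(event)
```

Here `stalled_at` is `"blob"`, which suggests the event was lost or is still in flight between Event Hubs and Blob storage. The hop timestamps can also be used to measure latency per stage.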
Integrate monitoring solutions with Grafana, Azure Monitor, and Power BI to support multiple stakeholder needs (technical, operational, and executive-level views).
Partner closely with Security Engineering, Platform Engineering, and Data Engineering to ensure observability insights are actionable and result in measurable improvements.