Hello,
Hope you are doing well.
I have an urgent opening for a Grafana Architect in Basking Ridge, NJ with one of our direct clients. Please let me know if you are interested in the role below.
**Kindly share your updated resume**
Job Title: Grafana Architect
Location: Basking Ridge, NJ
Duration: 3+ Months
We are seeking a seasoned Grafana Architect with strong expertise in designing and implementing observability and monitoring solutions across multi-cloud (AWS, Azure, Google Cloud Platform) and on-premise environments. The ideal candidate will have deep hands-on experience with Grafana, Prometheus, Loki, Tempo, and integrations with various telemetry sources. You will own the end-to-end observability strategy, architectural governance, and implementation, and evangelize best practices across teams.
Key Responsibilities
- Architect and implement scalable observability solutions across hybrid/multi-cloud and on-premise environments using Grafana OSS/Enterprise.
- Define monitoring strategies, SLOs/SLIs, dashboards, alerts, and reporting mechanisms for infrastructure, applications, and services.
- Integrate Grafana with Prometheus, Loki, Tempo, InfluxDB, Elasticsearch, cloud-native tools (e.g., AWS CloudWatch, Azure Monitor, Google Cloud Operations Suite), and on-prem systems.
- Lead design and implementation of custom plugins, data sources, and dashboards for cross-platform observability.
- Build and standardize templates, alerting rules, and RBAC models within Grafana Enterprise.
- Collaborate with DevOps, SRE, Cloud, and App teams to define observability needs and onboard them into the platform.
- Define and implement monitoring as code (MaC) practices using Terraform/Ansible for observability infrastructure.
- Govern and optimize telemetry collection (logs, metrics, traces) for performance, cost, and usability.
- Lead capacity planning, HA/DR design, performance tuning, and upgrades for the Grafana stack.
- Provide thought leadership on OpenTelemetry, distributed tracing, log aggregation, and AIOps capabilities.
- Conduct training, documentation, and internal community engagement around observability tools.
Required Skills & Experience
- 5+ years of hands-on experience with Grafana, including dashboard design, plugin development, and user management.
- Strong expertise with Prometheus, Loki, Tempo, Alertmanager, and OpenTelemetry.
- Proven experience designing multi-cloud (AWS, Azure, Google Cloud Platform) observability frameworks.
- Experience integrating with on-premise systems (e.g., vSphere, bare-metal monitoring, SNMP, legacy tools).
- Hands-on experience with Terraform, Helm, Ansible, and GitOps practices for monitoring infrastructure.
- Strong scripting and automation skills (Python, Bash, etc.).
- In-depth knowledge of monitoring standards and telemetry formats (Prometheus metrics, OTLP, JSON logs).
- Proficient in SRE principles (SLOs, SLIs, error budgets, alerting strategy).
- Experience with RBAC, LDAP/SAML integration, and Grafana Enterprise features.
- Strong troubleshooting skills in distributed systems and observability pipelines.
- Excellent communication, stakeholder management, and leadership skills.
Nice to Have
- Experience with AIOps/ML-based anomaly detection in observability.
- Knowledge of security and compliance considerations in monitoring (e.g., SOC2, PCI).
- Exposure to SIEM tools like Splunk, Chronicle, or Elastic Security.
- Experience with Kafka, Fluent Bit, Vector, or similar log forwarding pipelines.
Certifications (Preferred)
- Grafana Certified Observability Professional
- AWS/Google Cloud Platform/Azure Solutions Architect Associate or Professional
- Certified Kubernetes Administrator (CKA)
Regards,
Akash Gangwar
Veridian Tech Solutions, Inc.
11931 Wick Chester Lane Suite 150
Houston, TX, 77043