Job Title: VictoriaMetrics Architect
Remote
Required Qualifications:
VictoriaMetrics — Expert Level
Candidates must have hands-on production experience running VictoriaMetrics at scale.
· Significant production experience operating VictoriaMetrics at scale: VMCluster deployments handling sustained, high-cardinality workloads in live environments. This is the non-negotiable baseline for the role.
· VMCluster internals at depth: the write path from VMInsert through VMStorage replication, the query fan-out and merge behavior of VMSelect, and the performance implications of topology decisions on ingestion throughput and query latency.
· Active time series lifecycle management: how time series are created, sustained, and expired; the relationship between cardinality and memory pressure; and the ability to diagnose and remediate a cardinality explosion in a production environment.
· MetricsQL fluency: advanced aggregation, rollup window semantics, subquery patterns, and query design that reduces load on VMStorage at scale.
· VMAgent at depth: scrape configuration, stream aggregation for edge-side cardinality reduction, rate limiting, deduplication, and write buffering for continuity during upstream unavailability.
· VMAuth multi-tenancy: per-tenant routing via VMUser custom resources, token-based authentication, and read/write path segregation.
· VMAlert and VMAnomaly: alerting and recording rule design, anomaly model selection, and integration with enterprise alert dispatch systems.
· Federation design: global query layer architecture, cross-cluster deduplication, and remote_write performance tuning under high-cardinality ingestion at sustained scale.
· Storage architecture: retention modelling, downsampling, backup and restore, and capacity planning for time-series workloads.
· VictoriaMetrics Operator: lifecycle management of all VM custom resource definitions and upgrade strategy on OpenShift.
Red Hat OpenShift — Production Depth
· Substantial Kubernetes experience with a material portion on Red Hat OpenShift in bare-metal or on-premises enterprise environments — not exclusively managed cloud Kubernetes.
· OpenShift security model: Security Context Constraints, Network Policy, namespace RBAC, and the constraints that apply to stateful, high-throughput workloads.
· StatefulSet lifecycle, PersistentVolumeClaim management, and StorageClass selection for write-intensive time-series workloads.
· OCP upgrade path management and the implications for Operator compatibility and cluster monitoring interactions.
· Multi-cluster OpenShift topology: hub-and-spoke architectures, cross-cluster networking, and remote scrape or remote_write connectivity across cluster boundaries.
· Comfort designing for IPv6 and dual-stack network environments — increasingly common in carrier-grade infrastructure deployments.
GitOps and CI/CD Delivery
· GitOps-native delivery as a professional standard: all platform configuration managed in Git, no manual changes to production cluster state, and a clear promotion gate model from lab through to production.
· ArgoCD at production scale: application hierarchy design, sync policy configuration, health checks for custom resources, and multi-cluster application deployment.
· Kustomize overlay strategy for multi-cluster and multi-tenant deployments — base definitions with environment-specific patches.
· GitLab CI/CD pipeline design: manifest validation, environment promotion gates, and automated operator upgrade pipelines.
· Terraform or equivalent infrastructure-as-code for provisioning supporting platform resources.
Security and Identity
· HashiCorp Vault at production depth: dynamic secrets, Vault Secrets Operator synchronisation, token lifecycle management, and PKI secrets engine integration for certificate issuance.
· Enterprise PKI: TLS certificate lifecycle, automated renewal, and CA distribution to distributed cluster workloads.
· OIDC and OAuth2 integration: platform service authentication via an enterprise identity provider, service account token federation, and the elimination of static credential patterns.
· Zero Trust design as a default: every interface between platform components authenticated and encrypted; no implicit trust between tenants, ingestion sources, or query consumers.
Telecommunications and Network Observability
· Proven experience designing or operating observability platforms for telecommunications infrastructure — 5G core, RAN, transport, or carrier-grade edge environments.
· FCAPS framework alignment: mapping Fault, Configuration, Accounting, Performance, and Security monitoring requirements to metric taxonomies, alerting rules, and operational dashboards.
· Heterogeneous vendor telemetry integration: Prometheus exporter compatibility assessment, OpenMetrics format validation, and labelling standardization across multi-vendor sources.
· Multi-vendor, multi-tenant metrics ingestion design: label isolation strategy, per-vendor cardinality allocation, and data segregation enforcement at the proxy and routing layer.
· Enterprise NOC integration: alert routing design from evaluation engine through to ticketing or event management platforms, deduplication, suppression, and severity mapping.
Alchemy: Transforming Your Professional Vision into Reality
Since our inception in 2013, Alchemy has been dedicated to reshaping organizational performance through innovative IT services. With a vision to empower businesses seeking a transformative edge, we’ve positioned ourselves at the forefront of digitization and software modernization.
Our name reflects our mission: to transmute technology into gold-standard solutions for our esteemed clients. We proudly serve a diverse range of sectors, including IT and ITES, BFSI, Telecom and Media, Automotive, Manufacturing, Energy, Oil and Gas, Real Estate, Retail, Healthcare, and more.
With a global footprint spanning the USA, India, Europe, Canada, Singapore, Japan, and parts of Central and West Africa, we harness a unique blend of competencies, frameworks, and cutting-edge technologies. Together, we drive growth and innovation across industries, helping organizations turn their visions into reality.
Alchemy – Connecting Talent with Opportunities (Diversity, Equity and Inclusion)