The Kubernetes Networking Platform Senior Engineer will lead the design, delivery, and operation of networking capabilities across the enterprise Kubernetes platform. This includes critical components such as ingress controllers, service mesh, DNS, and traffic management.
This engineer will join a team responsible for building a secure, scalable, and observable networking layer that enables application teams to seamlessly connect, communicate, and expose services within and outside the cluster. The ideal candidate brings deep experience in distributed systems and networking, and is passionate about building platform abstractions that simplify complexity for developers while maintaining enterprise-grade reliability and security.
We are transforming the way technology is managed at Client. Automation, DevOps, and product-oriented platform engineering are the new standards as we enable rapid innovation, speed-to-market, and resilient operations. As we are early in this journey, strong technical leadership and the ability to influence and elevate others are critical.
CANDIDATE PROFILE
Required:
- Undergraduate degree in an engineering or computer science discipline and/or equivalent experience/certification
- 6+ years of technology experience, including:
  - 3+ years in a platform, infrastructure, or systems engineering role
  - 3+ years working with public cloud platforms (AWS, Azure, Google Cloud Platform)
- Strong experience with Kubernetes, including:
  - Networking fundamentals (CNI, service discovery, load balancing)
  - Kubernetes networking primitives (Services, Ingress, NetworkPolicy)
- Hands-on experience with Kubernetes networking components, such as:
  - Gateway and Ingress controllers (e.g., kgateway, NGINX, ALB, or similar)
  - Service mesh technologies (e.g., Istio, Cilium, or similar)
  - DNS systems (CoreDNS, ExternalDNS, or enterprise DNS integration)
- Experience designing and operating highly available, distributed systems (99.99% uptime) with attention to latency, resiliency, and failure modes
- Strong troubleshooting skills across layers (application, network, infrastructure)
- Proven ability to implement Infrastructure as Code and automation using tools such as Terraform, Helm, and GitOps workflows
- An "automate first" mindset: continuously identifying and eliminating manual processes
- Experience working within a platform-as-a-product model, including:
  - Treating internal platform capabilities as products
  - Gathering feedback from users (application teams)
  - Iterating based on adoption and usability
- Strong collaboration habits, including:
  - Code reviews as a primary mechanism for quality and knowledge sharing
  - Writing clear, user-focused documentation
  - Contributing to and evolving engineering standards across teams
- Comfort using AI-powered development tools (e.g., coding assistants, copilots, or similar) to accelerate development, troubleshooting, and documentation
- Ability to critically evaluate AI-generated output, ensuring correctness, security, and alignment with platform standards
- Experience leveraging AI tools to:
  - Accelerate Infrastructure as Code development
  - Troubleshoot complex system and networking issues
  - Improve documentation and developer experience
- Strong engineering judgment to determine when to rely on AI vs. when to deep dive manually, especially in complex distributed systems and production incidents
Preferred:
- Experience with one or more high-level programming languages (Go, Python, Java, or similar)
- Deep understanding of:
  - Layer 4 / Layer 7 networking concepts
  - Traffic routing, load balancing strategies, and API gateway patterns
  - mTLS, zero-trust networking, and secure service-to-service communication
- Experience operating service mesh at scale (traffic shaping, retries, circuit breaking, observability)
- Familiarity with cloud-native networking integrations:
  - AWS (ALB/NLB, VPC Lattice, Route 53, PrivateLink)
  - Hybrid or multi-cluster networking patterns
- Experience with observability tooling (metrics, logs, tracing) for network and service performance
- Ability to influence platform adoption and drive best practices across application teams
- Experience supporting or implementing self-healing, resilient infrastructure patterns
CORE WORK ACTIVITIES
Kubernetes & Platform Engineering
- Design, build, and operate Kubernetes networking capabilities including ingress, service mesh, and DNS
- Develop and maintain standardized, self-service networking patterns for application teams
- Implement and manage traffic routing strategies, including canary deployments, blue/green releases, and failover mechanisms
- Ensure secure communication through network policies, mTLS, and zero-trust principles
- Continuously improve platform reliability, scalability, and performance through automation and observability
- Troubleshoot complex networking issues across distributed systems and drive root cause analysis
- Partner with security, platform, and application teams to define and enforce networking standards
- Build tooling and automation to improve developer experience and reduce operational overhead
- Maintain clear and consumable documentation for platform users
- Stay current with emerging trends in Kubernetes and cloud-native networking
Platform Operations, Reliability & Security
- Serve in the on-call rotation
- Support monitoring, logging, and observability integrations (Prometheus, Grafana, ELK, OpenTelemetry)
- Implement and maintain platform-level security controls, including OPA/Gatekeeper, secrets management (e.g., Vault), and IAM guardrails
- Participate in disaster recovery, backup/restore operations, and upgrade cycles
- Review issues, logs, and metrics to identify trends and propose improvements
- Maintain clear and complete documentation for system configurations and operational procedures
Collaboration & Leadership
- Participate in architectural discussions to help application teams make efficient platform decisions
- Provide mentorship to junior engineers and contribute to peer reviews
- Support interviewing and help foster a modern engineering culture
- Collaborate with cross-functional teams, including software engineering, cloud operations, and security
Managing Priorities and Delivery
- Contribute to planning, prioritization, and organization of engineering work to meet delivery timelines
- Provide technical leadership for successful platform feature delivery
- Assist with evaluating vendor solutions and tooling, providing recommendations to leadership
- Communicate technical concepts clearly to stakeholders, both technical and non-technical
- Understand business priorities and contribute to delivering against performance and budget goals
- Perform other reasonable duties as assigned