Senior Cloud Platform Engineer
Overview
Senior Cloud Platform Engineer responsible for designing, building, and operating scalable, secure, and cost-efficient cloud infrastructure. Focus on platform enablement, infrastructure automation, CI/CD, event-driven streaming, and operational excellence across multi-account AWS environments, empowering engineering teams through self-service tooling and reliable, well-governed cloud foundations.
Key Responsibilities
Design and operate scalable, highly available, multi-account cloud infrastructure (AWS)
Build and maintain Infrastructure-as-Code modules and standards using Terraform
Develop reusable platform patterns, landing zones, and golden paths for engineering teams
Optimize and operate CI/CD pipelines (Jenkins, GitHub Actions, Harness)
Enable developer self-service and reduce manual intervention through automation
Manage Kubernetes platforms (EKS): networking, scaling, upgrades, and workload onboarding
Operate and support Kafka-based event streaming platforms: topics, schemas, connectors, and cluster reliability
Build and integrate REST APIs and self-service tooling to streamline platform workflows
Implement cloud security and governance (IAM, OAuth/OIDC, Okta, SSL/TLS, secrets management)
Drive cloud cost optimization, capacity planning, and FinOps practices
Implement observability: metrics, logging, tracing, alerting, and SLOs
Lead incident response, troubleshooting, and root cause analysis across platform and runtime systems
Partner with application teams to troubleshoot infrastructure, deployment, and runtime issues
Drive continuous improvement using operational insights and user feedback
Enhance documentation, runbooks, and platform usability
Technical Skills
Cloud: AWS (EKS, EC2, VPC/Networking, IAM, S3, RDS, Lambda)
IaC: Terraform (modules, state management, policy-as-code)
CI/CD: GitHub Actions, Harness
APIs & Integration: REST APIs (design, development, integration), Async APIs
Containers & Orchestration: Docker, Kubernetes (EKS)
Event Streaming: Kafka, Confluent (topics, schemas, Kafka Connect, cluster linking)
Monitoring/Observability: Datadog, CloudWatch
Security: OAuth, OIDC, Okta, SSL/TLS, IAM, secrets management
Programming/Scripting: Java/Python
Key Competencies
Strong troubleshooting and problem-solving across distributed systems
Ability to translate operational issues into durable platform improvements
Systems-thinking approach to reliability, security, and cost
Effective collaboration and technical mentorship across engineering teams
Must Have & Desired Skills
Must Have:
* Cloud: AWS (EKS, EC2, VPC/Networking, IAM, S3, RDS, Lambda)
* IaC: Terraform (modules, state management, policy-as-code)
* CI/CD: GitHub Actions, Harness
* APIs & Integration: REST APIs (design, development, integration), Async APIs
* Containers & Orchestration: Docker, Kubernetes (EKS)
* Security: OAuth, OIDC, Okta, SSL/TLS, IAM, secrets management
Desired:
* Event Streaming: Kafka, Confluent (topics, schemas, Kafka Connect, cluster linking)
* Monitoring/Observability: Datadog, CloudWatch
* Programming/Scripting: Java/Python