SRE Key Responsibilities
Design and manage multi-account AWS infrastructure (VPC, Route Tables, EC2, ECS, EKS 1.33, RDS, DynamoDB, ElastiCache Valkey, S3, Transit Gateway, Resource Access Manager, Lambda, CloudFormation, AWS Backup)
Configure load balancing and traffic management (ELB, NLB, Target Groups with gRPC, Route53, Global Accelerator, CloudFront)
Implement security and compliance controls (IAM, IAM Identity Center, SCP, GuardDuty, WAF, CloudTrail, ACM, Secrets Manager, Okta integration)
Manage Cloudflare infrastructure (Zero Trust, Argo Smart Routing, DNS, Workers, Load Balancer, Bot Management, WAF, Rules & Policies, Cache)
Manage S3 with Access Policies, Lifecycle Policies, S3 Storage Lens optimization, and cross-region replication
Operate messaging and notification services (SNS, SES, SQS)
Architect and manage multi-cluster EKS environments with HA and cross-region DR scenarios using Istio service mesh, Network Policies, Karpenter, HPA, KEDA, Argo CD, Argo Rollouts
Implement and maintain Argo CD for multi-cluster application management with HA and cross-region DR configurations
Configure Argo CD ApplicationSets for managing applications across multiple EKS clusters
Implement ECR with global cross-region replication for container image distribution and disaster recovery
Implement Aurora Global Database for cross-region DR, manage Aurora RDS (MySQL and PostgreSQL) and standalone MySQL/PostgreSQL instances for development
Design and maintain RDS cross-region replication, automated backups, failover strategies, and upgrade procedures
Establish and maintain DevOps practices including change management, release management and deployment strategies
Build resilient CI/CD pipelines with cross-region artifact replication, automated testing, and failover capabilities
Develop and maintain GitHub Actions shared internal workflows and reusable actions for standardized deployments
Implement change approval workflows, deployment gates, and release coordination processes
Implement Crossplane for automated feature environment creation, upgrades, and AWS resource provisioning
Deploy applications using Helm, Kustomize with overlay patches, Jsonnet, and Crossplane for infrastructure orchestration
Maintain platform operators (External DNS, External Secrets, Reloader) and custom CRDs
Build comprehensive observability stack and dashboards (Grafana, Thanos/Prometheus, Loki, Alertmanager, OpenTelemetry, Alloy/Tempo/Beyla/Pyroscope)
Configure exporters (Blackbox, MySQL, Redis, YACE CloudWatch, Cloudflare, Node Exporter, Prometheus Pushgateway)
Support data platforms (Kafka/Kafka UI, MinIO, Airflow, JupyterHub, Dask, Superset, Imply, AWS Glue, Athena, QuickSight, Bedrock)
Optimize CI/CD with GitHub Actions, Actions Runner Controller (ARC), runs-on.com, GitHub Rulesets
Manage mobile app delivery pipelines (Unity Build Management, Fastlane, Google Play Developer, Apple Developer/Enterprise, Applivery)
Implement and maintain all infrastructure using Terraform/OpenTofu with Scalr, backporting existing resources into code
Automate operational tasks wherever possible; create comprehensive runbooks for non-automatable procedures
Conduct thorough post-mortem analysis after incidents, documenting learnings and implementing preventive measures
Drive cost optimization initiatives using S3 Storage Lens, CloudWatch metrics, rightsizing recommendations, and resource lifecycle management
Develop automation in Bash, Python, Go, C#/.NET (Unity Game Engine)
Maintain developer experience tooling (Backstage, ClickUp, Miro, shared GitHub Actions/workflows)
Integrate monitoring and alerting (PagerDuty, Cronitor, Wiz, CloudWatch)
Core Expertise:
Multi-account AWS architecture with Transit Gateway, Resource Access Manager, VPC design, and Route Tables
Kubernetes/EKS high availability with cross-region disaster recovery scenarios
Multi-cluster EKS management with service mesh (Istio), autoscaling (Karpenter, KEDA), GitOps (Argo CD)
Argo CD enterprise deployment for multi-cluster application management with HA and cross-region DR
Argo CD ApplicationSets, app-of-apps patterns with Helm, and cluster management strategies
ECR global cross-region replication strategies for container image distribution and DR
Cloudflare enterprise features (Zero Trust, Argo Smart Routing, DNS management, Workers, Load Balancer, Bot Management, Cache optimization, WAF Rules & Other Security Policies)
Aurora Global Database implementation and management for cross-region DR
Aurora RDS (MySQL and PostgreSQL engines) and standalone MySQL/PostgreSQL instance management
RDS cross-region replication, automated failover, disaster recovery, and version upgrade strategies
DevOps best practices including change management, release management, and deployment coordination
Resilient CI/CD pipelines with automated testing, cross-region artifact distribution, and failover
GitHub Actions shared workflows and reusable actions development for internal use
Crossplane for Kubernetes-native infrastructure provisioning, feature environment automation, and upgrade orchestration
Expert-level Terraform/OpenTofu with enterprise policy management (Scalr)
Infrastructure backporting and migration from ClickOps to IaC
Complete observability stack (Prometheus, Grafana, Loki, OpenTelemetry, distributed tracing)
Data pipeline orchestration (Kafka, Airflow) and analytics platforms (Superset, Imply)
GitHub Actions with self-hosted runners (ARC, runs-on.com)
Proficiency in Python, Bash, Go, and C#/.NET for automation development
Security implementations (IAM, SCP, Okta, WAF, GuardDuty, Wiz)
Mobile CI/CD (Unity, Fastlane, Apple/Google distribution & Applivery during Development)
Disaster recovery planning, testing, and automation (AWS Backup, cross-region strategies)
AI/ML infrastructure experience (AWS Bedrock)
Cost optimization strategies and QuickSight for AWS cost review
Post-mortem facilitation and blameless incident analysis
Runbook creation and maintenance for operational procedures
Technical Skills:
Container orchestration with advanced networking and progressive delivery
Infrastructure as Code and GitOps methodologies with automation-first mindset
Change management workflows, approval gates, and release orchestration
CI/CD pipeline design with automated testing, security scanning, and deployment strategies
Incident response, on-call management, post-mortem analysis, DR execution
Crossplane composition design and custom resource definitions
Custom CRD and operator development in Kubernetes
Event-driven architecture (Lambda, SQS, SNS, SES)
Real-time analytics and BI platforms
Developer portal management (Backstage)
Multi-region failover automation and orchestration
Cost analysis and optimization using native AWS tools
Automation of repetitive operational tasks
Technical documentation and runbook authoring
Database performance tuning and optimization (Aurora, MySQL, PostgreSQL)
Argo CD backup, restore, and disaster recovery procedures
Cloudflare Workers development & deployment using Wrangler
Soft Skills: Strong troubleshooting, cross-functional communication, self-directed, documentation-focused, cost-conscious, continuous improvement mindset
ACI (Advanced Computing International) is a Global Technology Services, Products & Solutions Company focused on designing and delivering the next generation applications and digital experiences for businesses and consumers. We specialize in Big Data & Analytics, Digital Transformation, IT Service Management, Cognitive Solutions, Artificial Intelligence, IOT & Future Networks, DevOps, Enterprise Applications & Managed Infrastructure Services & Industry Specific Solutions.
Leveraging the insights gained from working on innovative solutions and disruptive technologies, ACI develops solutions that enhance business performance, accelerate time-to-market for products and applications, harmonize consumer experiences, and streamline business operations. ACI works with clients across different business sectors: Financial Services, Healthcare, Manufacturing, Hi-Tech, Media, Utilities, Public Sector, Retail, Telecom, E-commerce & Logistics, and Higher Education. ACI's core DNA is built on innovation and co-existence to build a collaborative ecosystem where companies and consumers win.