Position: AWS Architect
Location: Dallas, TX
Role overview
The Senior AWS Cloud/Infrastructure Engineer/Architect will own the design, implementation, and operation of large-scale, secure, and highly available platforms on AWS, with a strong focus on event-driven architectures, streaming (MSK/MSF), caching data stores, data lake/table formats such as Iceberg, observability with OpenTelemetry, and container orchestration on EKS.
This role is hands-on and customer-facing, partnering with architecture, platform, data, and product teams to build scalable foundations and enable high-velocity delivery.
Key responsibilities
- Design and implement cloud-native architectures on AWS using services such as VPC, EC2, EKS, S3, RDS/Aurora, IAM, CloudWatch, and KMS, following Well-Architected and security best practices.
- Lead the design and operation of event-driven systems using Amazon MSK (Managed Streaming for Apache Kafka) and/or managed streaming frameworks (e.g., Kinesis/Kafka-based MSF), including topic design, partitioning, consumer groups, schema evolution, and backpressure handling.
- Architect and manage caching layers and in-memory data stores (e.g., Amazon ElastiCache for Redis/Memcached or similar) to improve performance, reduce latency, and offload downstream databases.
- Implement and support data lakehouse patterns using Apache Iceberg or similar table formats on object storage (e.g., S3), including table design, partitioning, schema evolution, and performance optimization for analytical and near-real-time workloads.
- Design, provision, and operate Kubernetes clusters on Amazon EKS, including node groups, autoscaling, networking, ingress, service mesh (where applicable), secrets management, and multi-environment separation.
- Implement full-stack observability using OpenTelemetry (traces, metrics, logs), integrating with centralized telemetry backends, defining SLOs/SLIs, and enabling deep visibility into distributed, event-driven workloads.
- Build and maintain Infrastructure-as-Code (IaC) using tools such as Terraform and/or AWS CloudFormation, enforcing reusable modules, environment parity, and Git-based workflows.
- Establish and enhance CI/CD pipelines for infrastructure and application deployments on AWS/EKS/MSK, including automated testing, security scans, canary/blue-green releases, and rollback strategies.
- Ensure platform security, compliance, and governance, including IAM roles and policies, network segmentation, encryption in transit/at rest, secrets management, and audit logging.
- Monitor and optimize cost, performance, and resilience of AWS environments; drive capacity planning, rightsizing, and architectural improvements for high availability and disaster recovery.
- Troubleshoot complex production incidents across EKS, MSK, event pipelines, caching tiers, and data platforms, driving root cause analysis and long-term remediation.
- Mentor engineers, champion engineering best practices, and collaborate with architects and product teams to align platform roadmaps with business goals.
Required skills and experience
- 10+ years of hands-on experience in cloud engineering, infrastructure engineering, or platform/SRE roles, with at least 5 years focused primarily on AWS.
- Strong expertise with core AWS services: VPC, IAM, EC2, EKS/ECS, S3, RDS/Aurora, CloudWatch/CloudTrail, KMS, and networking (subnets, routing, security groups, NACLs, load balancers).
- Proven production experience with Amazon MSK or equivalent Kafka-based managed streaming platforms (MSF), including cluster operations, capacity planning, security, and observability.
- Practical experience with event-driven and streaming architectures (e.g., Kafka/Kinesis + consumers, stream processing, CQRS, pub/sub patterns) in mission-critical systems.
- Hands-on experience with caching data stores and distributed caches (e.g., Redis, Memcached, ElastiCache), including eviction strategies, key design, and cache-aside/write-through patterns.
- Experience implementing or operating data lake or lakehouse solutions on S3 or similar, using Apache Iceberg or comparable table formats (e.g., Delta Lake, Hudi), and integrating with analytics/processing engines.
- Strong Kubernetes and EKS background, including cluster lifecycle management, Helm or similar packaging, autoscaling, network policies, and container security baselines.
- Deep understanding of observability, distributed tracing, and telemetry; hands-on with OpenTelemetry SDKs/collectors and integration into logging/metrics/tracing backends.
- Proficiency with IaC tools such as Terraform and/or CloudFormation, plus strong Git and DevOps practices around code review, branching, and automated testing.
- Solid scripting or programming skills (e.g., Python, Bash, Go, or similar) for automation, tooling, and glue code around AWS, MSK, EKS, and observability stacks.
- Strong knowledge of security, networking, and compliance in cloud environments, including least-privilege IAM, network isolation, certificate management, and secrets rotation.
- Excellent communication and stakeholder management skills, with experience collaborating in cross-functional teams and mentoring engineers at mid-level and below.
Nice-to-have qualifications
- Experience with service meshes (e.g., Istio, Linkerd) on EKS for traffic management, mTLS, and advanced observability.
- Exposure to big-data/analytics ecosystems around Iceberg or similar (e.g., Spark, Flink, Trino, Athena, Glue, EMR) and streaming ETL pipelines.
- Hands-on experience with additional managed streaming services (e.g., Amazon Kinesis, Azure Event Hubs, Google Cloud Platform Pub/Sub) in multi-cloud or hybrid environments.
- AWS certifications such as AWS Certified Solutions Architect Professional, DevOps Engineer Professional, or specialty certifications in Security or Advanced Networking.
- Prior experience in SRE, platform engineering, or reliability-focused roles with strong emphasis on SLOs, error budgets, and incident management.
Best Regards,
Deepak Gulia
Sr. Talent Acquisition-USA
100 Campus Drive, Suite 420, Florham Park, NJ 07932