Hope you are doing great! Please review the requirement below and let me know as soon as possible if you are interested in the position, and send your updated resume in Word format.
Role: Site Reliability Engineer (SRE) with AWS
Location: Richmond, VA (onsite; local candidates only)
Contract on W2
We are seeking a highly skilled Site Reliability Engineer (SRE) with deep experience in DevOps, performance testing, and AWS (ideally at the AWS Solutions Architect level). This individual will play a key role in ensuring the reliability, scalability, and performance of our systems. The role includes architecture discussions, CI/CD automation, monitoring and alerting, and production incident resolution.
Key Responsibilities:
- Design and implement scalable, resilient cloud-native infrastructure on AWS.
- Own the SRE function: availability, latency, performance, monitoring, emergency response, and capacity planning.
- Collaborate with engineering and product teams to improve system reliability, speed, and performance.
- Set up, maintain, and improve CI/CD pipelines using tools such as Jenkins, GitHub Actions, or AWS CodePipeline.
- Perform load and stress testing, analyze performance bottlenecks, and provide remediation strategies.
- Lead incident response and root cause analysis, and develop robust postmortem practices.
- Define and implement infrastructure as code (IaC) using Terraform or CloudFormation.
- Manage the observability stack (logs, metrics, and traces) using tools such as CloudWatch, Datadog, Prometheus, Grafana, or Splunk.
- Participate in the on-call rotation and improve runbooks and self-healing systems.
Required Skills & Qualifications:
- Expert-level (10/10) AWS knowledge is a must, including hands-on experience building secure, scalable cloud architectures.
- AWS Certified Solutions Architect certification (Associate or Professional) is highly preferred.
- Proven SRE and DevOps experience with a strong problem-solving mindset.
- Proficiency in performance testing tools such as JMeter, Gatling, k6, or Locust.
- Deep understanding of distributed systems, microservices, and cloud-native architectures.
- Strong knowledge of containerization using Docker and orchestration tools like Kubernetes or ECS.
- Expertise in Infrastructure as Code tools such as Terraform and CloudFormation.
- Familiarity with modern CI/CD pipelines and automation tools.
- Proficiency in scripting with Python, Bash, or Go.
- Strong experience with monitoring and alerting tools.
- Solid knowledge of networking, security, and cloud cost optimization.
Nice to Have:
- Experience in chaos engineering or resilience testing.
- Exposure to multi-account AWS environments, AWS Control Tower, and service control policies (SCPs).
- Familiarity with service mesh (Istio, Linkerd) and API gateways.
- Prior experience working in regulated environments (e.g., financial services).