What Working at Hexaware Offers:
Hexaware is a dynamic and innovative IT organization committed to delivering cutting-edge solutions to our clients worldwide. We pride ourselves on fostering a collaborative and inclusive work environment where every team member is valued and empowered to succeed.
Hexaware provides access to a vast array of tools that enhance and advance your professional profile. We complete the circle with excellent growth opportunities, chances to collaborate with highly visible customers and to work alongside talented colleagues, and a healthy work-life balance.
With an ever-expanding portfolio of capabilities, we look closely at what motivates us. Although technology is at the core of our solutions, it is still the people and their passion that fuel Hexaware's commitment to creating smiles.
At Hexaware, we encourage you to challenge yourself to achieve your full potential and propel your growth. We trust and empower you to disrupt the status quo and innovate for a better future. We encourage an open and inspiring culture that fosters learning and brings talented, passionate, and caring people together.
We are always interested in, and want to support, the professional and personal you. We offer a wide array of programs to help you expand your skills and supercharge your career. We help you discover your passion, the driving force that makes you smile and lets you innovate, create, and make a difference every day.
The Hexaware Advantage: Your Workplace Benefits
Health benefits with low-cost employee premium.
Range of voluntary benefits such as Legal, Identity Theft, and Critical Care Coverage.
Training and upskilling opportunities through Udemy and Hexavarsity.
Job Title: Site Reliability Engineer (SRE)
Location: McLean, VA
Job Summary
We are seeking a Site Reliability Engineer (SRE) who combines software engineering expertise with IT operations to ensure the reliability, availability, scalability, and performance of critical systems and services.
Key Responsibilities
System Reliability: Design, implement, and maintain automated solutions to ensure high availability, resiliency, and scalability of applications and services.
Incident Management: Respond to production incidents, develop protocols to minimize downtime, conduct post-mortems, and implement preventive measures.
Monitoring & Observability: Set up and manage monitoring systems to track performance metrics, ensuring system health and addressing potential issues proactively.
Performance Optimization: Analyze system performance, identify bottlenecks, and optimize for speed, scalability, and resource utilization.
Automation: Leverage automation tools to reduce manual interventions and ensure efficiency, repeatability, and minimal human error.
Collaboration: Work closely with stakeholders to support new features, deployments, and compliance initiatives.
Capacity Planning: Forecast resource needs and plan for future growth to maintain system stability and scalability.
Documentation: Create and maintain up-to-date documentation for systems, processes, and troubleshooting procedures.
Continuous Improvement: Stay current with emerging technologies and practices to design and deliver best-in-class solutions.
Required Qualifications
Strong sense of accountability and ownership to identify and drive improvements.
Excellent communication skills to convey complex information clearly and persuasively.
Ability to work independently and collaboratively in a fast-paced environment, including evenings/weekends as needed.
Technical Expertise:
o End-to-end observability solutions (Elastic Observability, Elastic APM, Distributed Tracing, OpenTelemetry).
o Linux/Unix system administration and cloud infrastructure (AWS, Azure, Google Cloud).
o Programming/scripting languages and frameworks (Java, Python, Go, Bash, Spring Boot, PySpark).
o Data management and data warehousing (MongoDB, Snowflake, SQL).
o CI/CD tools and configuration management (Jenkins, Ansible, Terraform).
o Containerization and orchestration (Docker, Kubernetes, EKS).
o Networking, databases, and distributed systems.
Experience with incident response and post-mortem processes.
Bachelor's degree in Computer Science, Information Technology, or equivalent experience.
Privacy Statement:
The information you provide will be used in accordance with the terms of our Privacy Policy and will be used specifically for the business/processing purpose of the event. You should be aware that we may share your details with our approved vendors for this event to be handled successfully.