Title of Position: Senior Java Site Reliability Engineer
Location: McLean, VA - Hybrid
Duration: Long Term Contract
Employment: Contract
Rate: Negotiable
Interview type: Video Interview
Key Responsibilities
· Support and maintain highly available production platforms across cloud and distributed environments. Drive incident management, root cause analysis, problem management, and platform stability initiatives.
· Monitor and maintain uptime of Java applications and microservices.
· Proactively identify and resolve application performance bottlenecks.
· Conduct root cause analysis (RCA) for application outages and incidents.
· Implement resiliency patterns including circuit breakers, retries, and failover mechanisms.
· Lead reliability engineering efforts focused on system availability, performance optimization, and operational excellence. Implement and enhance observability solutions including monitoring, logging, alerting, and incident response automation.
· Collaborate with development, infrastructure, and cloud engineering teams to improve deployment reliability and operational efficiency. Support infrastructure modernization, cloud transformation, and platform automation initiatives.
· Coordinate disaster recovery testing, resiliency validation, capacity planning, and production readiness reviews. Provide technical leadership and mentor offshore/onshore engineering teams.
Required Experience
· 16–20 years of experience in Site Reliability Engineering (SRE), Production Engineering, Platform Engineering, or Application Support.
· Strong experience supporting large-scale enterprise production environments. Proven background in incident management, problem management, and operational support.
· Experience working within banking, financial services, fintech, or other highly regulated industries. Hands-on experience supporting mission-critical applications with stringent availability and performance requirements.
Required Skills
· Java
· Linux/Unix Administration
· Kubernetes and Container Platforms
· Docker
· Cloud Platforms (AWS, Azure, or Google Cloud Platform)
· CI/CD Tools (Jenkins, GitHub Actions, GitLab CI/CD, Argo CD)
· Infrastructure as Code (Terraform, Ansible)
· Monitoring & Observability Tools (Splunk, Datadog, Grafana, Prometheus, Moogsoft)
· ServiceNow, JIRA, Confluence
· Python, Bash, or Shell Scripting
· SQL and Database Troubleshooting
· Application Performance Monitoring (APM)
· Production Release Management
· Disaster Recovery and High Availability Architectures
Education
· Bachelor's degree in Computer Science, Information Systems, Engineering, or a related technical discipline
About Arthur Grand
Arthur Grand is an IT services firm specializing in Digital Transformation initiatives for Federal, Commercial, State & local customers. Since 2012, Arthur Grand has been successfully supporting and delivering IT services to its customers in the areas of enterprise modernization and transformation, with a core focus on emerging technologies including Cloud Solutions (AWS, Azure), Agile Development, Custom Programming, Full Stack Development, DevOps, DevSecOps, CI/CD, Web Development, Mobile App Development, Data Visualization, Data Warehousing, Financial/ERP System Implementation, and Infrastructure Management. Arthur Grand’s culture of delivery excellence, combined with a commitment to bringing the best talent to the services it provides, has earned the company an unparalleled reputation for delivering transformative results.
We are a minority-owned staff augmentation and technology consulting company.
To retain our valued employees, we strive to keep them engaged in challenging, interesting work, offer market-relevant benefits, and provide continued opportunities for professional growth. Please send your resume for immediate consideration.
If you are interested in the above opportunity, we warmly invite you to join our team.
Arthur Grand Technologies is an Equal Opportunity Employer (including disability/vets)