Hi,
Title: Java Developer with SRE
Location: Onsite 3 days a week in Atlanta, GA (Primary)
Duration: 12-month contract with possibility of extension
Experience: 5+ Years
Experience fostering a culture of learning through the development and sharing of skills, knowledge, processes, and tools
A driving passion for finding solutions to hard problems at scale and operationalizing them
Exceptional critical thinking and communication skills, with a passion for leveraging documentation as a tool for constant improvement
Experience analyzing, troubleshooting, triaging and resolving application alerts, incidents, and issues affecting Production & Non-Production systems in a complex architecture & ecosystem of front-end, middleware, and back-end systems
Experience performing 7x24 Monitoring and developing Monitoring Dashboards & Alerts
Experience performing Incident Management, Incident Communications, Root Cause Analysis, Problem Management and Change Management
Experience analyzing, troubleshooting and resolving front-end, API, and back-end system performance and security issues
Experience developing, managing and troubleshooting code deployment pipelines
Experience designing, building, and optimizing automated pipelines with automated testing and automated security controls
Experience working in Agile Scrum teams, with demonstrated success leading improvements
Experience with Application Performance Monitoring and observability tools and technologies
Architecture
Understand the system and application architecture
Guide the architecture and development teams on how to make applications highly available, reliable, performant and secure at global scale
Partner with architecture team to ensure operability, measurability, and manageability are accounted for in business features and enablers
Conduct proactive 7x24 Monitoring and develop Monitoring Dashboards & Alerts
Perform Incident Management, Incident Communications, Root Cause Analysis and Problem Management
Conduct monitoring, analysis and prevention of performance & security issues
Collaborate with product owners and managers to establish service level objectives for applications and agreed consequences if the objectives are not being met
Collaborate with development team members to swarm, troubleshoot, and resolve problems
Drive the Root Cause Analysis of production and non-production issues and other failures within the application and system
Design, build, and champion automated solutions to optimize application/service/platform uptime with minimal human intervention
Create and implement standards and best practices, driving adoption across development teams and external vendors as applicable
Research, design, and develop solutions that meet internal and external compliance and security requirements and standards for Site Security & Reliability Engineering
Depending on assignment, the candidate may be asked to help manage security risk by preventing, detecting, and remediating security incidents, vulnerabilities, and/or compliance items within internal and external applications via automated and manual techniques