Job Title: SRE-DevOps Engineer
About CodeForce 360
Choosing a career is among the most critical decisions one can make, and it is important to weigh factors such as a company's record of success since its inception. But when you come across a company whose reputation is backed by an illustrious run of success since the day it began, you don't need to think of anything else. That's precisely what some of our employees and prospective employees think when they come across CodeForce 360.
|Standard Skill Requirements||Specific Details||Years||Ins.||Proficiency|
|SDLC||10+ yrs experience in full-stack application development & maintenance in DevOps/SRE role||7+||7+||Expert|
|Skill Name||Custom Skill Requirements||Years||Ins.||Proficiency|
|AWS Developer skills||5+ yrs experience in AWS services to perform troubleshooting and development activities for platform/application enhancements||7+||7+||Advanced|
|DevOps skills||Knowledge of DevOps methodologies and the tools involved, such as CI/CD concepts, CI/CD tools (Jenkins, CodePipeline, etc.), and automation and configuration tools (Puppet, Ansible, etc.)||7+||7+||Expert|
Site Reliability & DevOps Engineer
- The Site Reliability & DevOps Engineer is accountable for the availability, reliability, and performance of the services and platforms in a highly transactional 24x7 environment. When the error budget is below the threshold/within tolerance limits, the SRE works on application development and bug-fix activities as part of DevOps responsibilities.
Role & Responsibilities:
- Help build a Site Reliability Engineering culture by sharing best practices, approaches, documentation, and code with other engineering teams
- Define and set up KPIs to monitor Error Budgets
- Implement strategies to ensure Error Budgets stay above the defined acceptance levels
- Define and implement response mechanisms when Error Budget thresholds are breached
- Apply automation and software to any tasks or parts of the system that are performed manually or would otherwise benefit from it;
- Troubleshoot complicated issues involving OS, networking, and databases in a cloud-based SaaS environment; handle live production incidents; and debug/troubleshoot infrastructure and application issues, including development and testing;
- Monitor application performance, take steps to improve overall application performance and stability, and follow through with implementation (design, develop, and test);
- Conduct system analysis and configuration management, and develop improvements for system software performance, availability, and reliability;
- Design, write, ship, and motivate the creation of software and systems to increase observability, product reliability and organizational efficiency;
- Work closely with software engineers and QA engineers to ensure the system meets non-functional requirements such as performance, security, and availability;
- Document your system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available to those who need it;
- Maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure;
- Keep up-to-date with security and proactively identify, diagnose, and solve complex security issues.
- Design, develop, and test Java, Spring Boot, and GraphQL-based REST/JSON web services deployed on AWS ECS Fargate.
- Design, develop, and test TypeScript and Node.js-based REST/JSON web services deployed on AWS Lambda.
- Design, develop, and test AWS AppSync-based GraphQL services.
- Design, develop, and test Terraform-based Infrastructure as Code scripts to automate AWS infrastructure setup.
Qualifications:
- Bachelor's Degree in Computer Science or related; or equivalent combination of education and experience
- 10+ yrs experience in full-stack application development & maintenance in DevOps/SRE role
- 3+ yrs experience in the above-mentioned AWS services to perform troubleshooting and development activities for platform/application enhancements
- Proficient in scripting languages such as PowerShell and/or Python
- Troubleshooting utilizing built-in browser tools
- Ability to distill technical and complex principles or scenarios to all levels of our organization
- Knowledge of DevOps methodologies and the tools involved, such as CI/CD concepts, CI/CD tools (Jenkins, CodePipeline, etc.), and automation and configuration tools (Puppet, Ansible, etc.), a plus.
- Knowledge of public clouds (Google Cloud Platform, AWS, Azure), including experience implementing projects on them, a plus.
- Ability to self-govern workload and show discipline around priority and time management, even while working remotely or in the absence of direct management for an extended period of time
- Ability and willingness to adapt to new application stacks and new technology concepts as the business evolves over time
- Excellent communication skills, both verbal and written
- Ability to collaborate with local and remote teams in different time zones
- Ability to present/lead technical discussions.
- SRE practice setup, including standards, guidelines, metrics, etc.
- Solution design
- Issue resolution, including documentation, code development, testing, and deployment
- Solutions and code reviewed and signed off by the Product Owner and App Engineer
How to Apply
Job ID: JPC - 129613
For more information, please contact below:
Qualified individuals will be contacted for an interview.