Duration: Very long term; initial contract of 2 years with extension
Hiring for multiple roles
The AWS Cloud Operations Engineer provides a wide variety of systems administration and cloud engineering support functions. Operational tasks are conducted primarily in the AWS public cloud, with some work in traditional data centers. You will be part of a back-end systems support team for our growing portfolio of cloud-based software applications. You will use infrastructure monitoring tools and respond to alerts to continuously improve the stability of our systems. Emphasis will be placed on operational duties; tasks will include automating and maintaining cloud services, supporting application development teams on the cloud, performing security- and performance-related compliance and monitoring tasks, and conducting research and POCs to bring enhancements to the environments. The role will support troubleshooting of incidents and change requests. There will also be tasks associated with onboarding application teams to, and supporting them on, the enterprise DevOps CI/CD infrastructure tool sets. The role will contribute to the documentation of run books, guidelines, and best practices. You will participate in a support rotation schedule that includes off-hours support.
- Deploy and support automated AWS cloud-based tools and environments in support of application teams.
- Analyze and respond to incidents and problems, including the development of automated monitoring and remediation to maintain uptime and expected service levels. This includes cloud infrastructure, applications, middleware, and other third-party software.
- Analyze and resolve problems associated with operating systems and middleware, for example Red Hat Linux, JBoss, Apache, Tomcat, Windows Server, IIS, etc.
- Manage, configure, respond to, and resolve AWS security alerts, including vulnerabilities and patch management.
- Design, generate and interpret operational reports related to system health status, capacity management and system performance management.
- Determine root cause for incidents, correlate recurring incidents to systemic problems, and drive towards resolution.
- Contribute to the build-out of cloud infrastructure, for example, working with services such as load balancers, gateways, firewalls, subnets, security groups, and storage options.
- Use scripting and automation tools to improve efficiency and performance and to reduce costs, for example CloudFormation, Terraform, Unix Shell, Python, PowerShell, Ansible, etc.
- Work closely with application teams following Agile methods and principles.
- Collaborate on the design, documentation, and publication of engineering standards, principles, guidelines, and best practices.
- Seek opportunities to increase efficiency through research and investigation, application team input, automation options, POCs, etc.
- Experience with core AWS services like EC2, S3, SNS, Lambda, CloudWatch and CloudTrail.
- Experience in the design, development, and implementation of AWS-based infrastructure solutions using AWS APIs, and Python with boto3.
- Strong scripting experience in Python and PowerShell/Bash.
- Windows and Linux system administration across the OS, middleware, and application layers.
- Server, network, and storage performance benchmarking and optimization.
- In-depth understanding of the operational dependencies of applications, networks, systems, security, and policy.
- Experience with cloud orchestration tools like AWS CloudFormation and/or Terraform, with an emphasis on creating modular architecture.
- Experience with AWS IAM.
- Proficient in using Git branching, push/pull requests, and advanced Git workflows.
- Experience with Jenkins, Ansible or similar tools.
- Experience with application build technologies.
- Demonstrated knowledge of DevOps principles. Hands-on experience required.
- Strong networking knowledge, preferably including DNS, subnets, routing, security groups, listing, firewalls, and other networking infrastructure.
- Experience with CDK, Control Tower, and AWS Control Tower Customization Solution.
- Experience in containerization and orchestration using Docker, Kubernetes, or Fargate/EKS/ECS.
- Familiarity with analytics and log aggregation tools such as Splunk or Microsoft BI.