Job Description:
Job Title: Cloud Operation Support Engineer
Location: Berkeley Heights, NJ (Onsite M-F)
Type: Full Time with Infinite Computer Solutions
Job Overview:
Join our dynamic team as a CloudOps Engineer for our new Azure-based product. In this role, you will be instrumental in building and maintaining the technology that powers this small-business payments solution, with a particular focus on the Microsoft Azure Cloud ecosystem. Your contributions will directly impact the success of companies worldwide. Our Technology team comprises experts dedicated to evaluating and enhancing current systems while innovating for the future.
You will leverage your analytical and troubleshooting skills to ensure seamless project participation, maintain support continuity, and handle rotating on-call escalations. Your responsibilities will include leading the detection and resolution of issues affecting the delivery of industry-leading financial solutions. Teamwork and creativity are crucial in this role, as you will be the go-to point for escalations, tasked with resolving customer-impacting issues in a dynamic and fast-paced environment.
Essential Role Responsibilities:
- Provide hands-on support for existing environments, including software installation, patch installation, upgrades, query writing, configuration, security, system monitoring and tuning, disaster recovery planning, and release deployments.
- Collaborate with client services to understand customer needs, provide technical support, and ensure seamless integration and operation of our solutions, enhancing overall client satisfaction and experience.
- Implement tools and automation for build, configuration management, continuous integration (CI), deployment, and application monitoring.
- Automate and evolve infrastructure, deployment strategies, and testing to support a quick turnaround of deployments.
- Work closely with Engineering to ensure all relevant KPIs are implemented within the monitoring framework.
- Participate in all Production Support activities during incidents and outages. Be a hands-on technical resource capable of resolving all technical issues within lower and upper environments and making recommendations for performance and capacity improvements.
- Participate in capacity planning, provisioning, performance and stability tuning, and scaling of the application infrastructure.
- Efficiently resolve issues in a 24x7 environment, ensuring minimal impact and rapid resolution.
The ideal candidate should have hands-on expertise in the following areas:
- Azure Platform Operations (build and maintain resources in non-production and production environments; debug and resolve issues)
- Kubernetes (upgrading and managing clusters, not just deploying applications)
- Linux administration (resolve Linux issues, understand networking, and fix connectivity issues)
- Terraform (some experience writing Terraform code, provisioning and maintaining infrastructure, and fixing Terraform issues)
- Experience handling production environments