Overview
Skills
Job Details
Job Description
Participate in on-call rotations for 24/7 Azure Production support
Respond to and resolve production incidents and outages
Conduct root cause analysis for major incidents and implement preventive measures.
Provide advanced technical support for Azure Databricks, including troubleshooting and resolving complex issues.
Monitor system logs, detect issues, and resolve technical problems related to the Kubernetes infrastructure.
Maintain and optimize Azure infrastructure for production environments.
Monitor system health, performance, and security.
Implement and manage backup and disaster recovery solutions
Perform capacity planning and cost optimization
Automate routine tasks and create self-healing systems
Implement and maintain security controls and compliance measures
Collaborate with development teams to improve application performance and scalability
Manage and optimize Azure resources utilization
Stay updated with new Azure features and best practices
Manage and optimize costs through FinOps practices
Assist in the design and implementation of new Azure-based solutions
Document processes, configurations, and system architectures
Strong understanding of Azure services, including Azure Kubernetes Service (AKS)
Create / Maintain Scripts: Bash and PowerShell, to maintain the infrastructure.
Create / Maintain Terraform Scripts
Mentor junior team members and share knowledge
Azure Administrator Associate (AZ-104) & (AZ-204)
Azure Network Engineer Associate (AZ-700)