Job Details
Position: Apache Druid
Location: Remote
Duration: Contract (C2C)
Job Description:
Ensure high availability and reliability of production systems.
Implement and maintain robust monitoring and alerting systems.
Participate in on-call rotations to respond to incidents and outages.
Conduct post-incident reviews and implement preventative measures.
Automation and Infrastructure as Code (IaC):
Automate infrastructure provisioning, configuration, and deployment using IaC tools (e.g., Terraform, Ansible).
Develop and maintain CI/CD pipelines to streamline software releases.
Optimize and automate data pipelines and workflows.
Apache Druid Management:
Manage and optimize Apache Druid clusters for high performance and scalability.
Troubleshoot Druid performance issues and implement solutions.
Design and implement Druid data ingestion and query optimization strategies.
Apache Airflow Orchestration:
Design, develop, and maintain Airflow DAGs for data orchestration and workflow automation.
Monitor Airflow performance and troubleshoot issues.
Optimize Airflow workflows for efficiency and reliability.
Monitoring and Logging:
Implement and maintain comprehensive monitoring and logging solutions (e.g., Prometheus, Grafana, ELK stack).
Analyze metrics and logs to identify performance bottlenecks and potential issues.
Create and maintain dashboards for visualizing system health and performance.
Collaboration and Communication:
Collaborate with development, data, and operations teams to ensure smooth operations.
Communicate effectively with stakeholders regarding system status and incidents.
Document processes and procedures.
Thanks,
Nitesh