Job Details
We are hiring for an Automation Engineer. Please let me know if you are looking for a job change. Thanks.
Job Title: Automation Engineer
Location: Remote (must support PST working hours)
Duration: 1+ year, contract-to-hire (CTH)
Job Summary:
We are seeking a skilled and proactive Automation Engineer to join our infrastructure and operations team. In this role, you will build and maintain automation frameworks and operational tooling that ensure consistency, visibility, and high performance across our infrastructure. You will work closely with the infrastructure, SRE, and monitoring teams to improve system reliability, reduce manual toil, and integrate observability platforms, using tools such as Ansible, VictoriaMetrics, Prometheus, SaltStack, CloudVision, BigPanda, and Kentik.
Key Responsibilities:
- Work on NVIDIA's in-house automation framework, which converts workflows documented as SOPs (standard operating procedures) into automated steps.
- Design, develop, and maintain automation playbooks using Ansible and SaltStack to manage operational workflows.
- Integrate and scale infrastructure monitoring and alerting platforms such as Prometheus, BigPanda, and Kentik.
- Collaborate with network teams to automate and orchestrate tasks using Arista CloudVision APIs, related tooling, and automation pipelines.
- Implement and optimize observability dashboards, alert routing, and remediation logic across systems and tools.
- Participate in incident response, root cause analysis, and postmortem documentation.
- Champion automation best practices, infrastructure-as-code principles, and continuous improvement of deployment pipelines.
- Build scalable solutions to reduce manual operational burden and support environment growth.
Required Qualifications:
- 5+ years of experience as an Automation, DevOps, or SRE Engineer in large-scale infrastructure environments.
- Strong hands-on experience with Ansible and SaltStack for configuration management and automation.
- Solid understanding of Prometheus metrics collection, exporters, and alerting rules.
- Experience integrating and managing observability and incident management platforms like BigPanda and Kentik.
- Working knowledge of Arista CloudVision and associated automation tooling or APIs.
- Familiarity with CI/CD pipelines, Git-based workflows, and scripting languages (e.g., Python, Bash).
- Comfortable working with Linux-based systems and cloud/hybrid environments.