Looking for architects, design engineers, and automation SMEs for the role below. This is a fully remote position.
Automation Engineers and Architects (Monitoring & Alerting Focus)
Key Activities:
Unified Monitoring Framework: Build an integrated solution leveraging existing tools (Splunk, Dynatrace, security platforms) and local logs from Windows/Linux servers, infrastructure components (storage arrays, SAN switches, network devices), databases (Oracle, SQL Server, MySQL, MongoDB), backup systems (Rubrik, Data Domain, InfiniBox), compute nodes (Dell servers), VMware environments, IBM Power/AIX, and IBM LinuxONE.
Automated Alerting & Proactive Response: Develop intelligent alerting mechanisms and automated remediation workflows to reduce manual intervention and accelerate incident resolution.
Data Integration & Gap Closure: Aggregate and normalize data from multiple sources, including platform tools and local logs, to fill visibility gaps and provide actionable insights.
Dashboard Development: Create a common GUI-based dashboard for real-time monitoring, alerting, and reporting across all infrastructure layers.
Skills & Tools: Use Ansible, Python, PowerShell, shell scripting, and GUI development to deliver scalable automation solutions.
Business Impact:
- Improved Reliability: Proactive detection and automated remediation reduce outages and service degradation.
- Operational Efficiency: Significant reduction in manual monitoring and troubleshooting efforts.
- Enhanced Security & Compliance: Centralized visibility into logs and alerts enables faster response to security events.
- Scalability: A common framework supports growth and complexity without proportional increases in headcount.
Job Skill Requirements and Experience:
Core Technical Skills:
Automation & Scripting:
- Proficiency in Python, Ansible, PowerShell, and shell scripting (Bash/Korn).
- Ability to develop automation workflows for monitoring, alerting, and remediation.
Monitoring & Logging Tools:
- Hands-on experience with Splunk, Dynatrace, and other enterprise monitoring platforms.
- Familiarity with log aggregation and parsing from multiple sources (OS, applications, infrastructure components).
Infrastructure Knowledge:
- Strong understanding of Linux (RHEL) and Windows Server environments.
- Exposure to VMware, IBM Power/AIX, and IBM LinuxONE systems.
- Knowledge of storage arrays, SAN switches, network switches, and IP traffic monitoring.
- Experience with backup platforms (Rubrik, Data Domain, InfiniBox).
- Familiarity with database systems (Oracle, SQL Server, MySQL, MongoDB).
GUI Development:
- Ability to build dashboard interfaces for real-time monitoring and alerting (for example, with Python web frameworks such as Flask or Django, or similar).
Additional Skills:
Data Integration:
- Ability to aggregate and normalize data from multiple sources for unified alerting.
Security & Compliance Awareness:
- Understanding of security logs and compliance requirements for infrastructure monitoring.
Problem-Solving & Creativity:
- Ability to identify gaps in current monitoring and design innovative solutions.
Experience:
- 5+ years in infrastructure automation or systems engineering roles.
- Proven track record in building automation frameworks and monitoring solutions.
- Experience working in large-scale, distributed environments with global teams.
- Prior involvement in proactive alerting and automated remediation projects is highly desirable.