INFRASTRUCTURE AUTOMATION ENGINEER at ALPHARETTA, GA

  • ALPHARETTA, GA
  • Posted 1 day ago | Updated 3 hours ago

Overview

On Site
Accepts corp to corp applications
Contract - Independent
Contract - W2
Contract - 6+ Month(s)

Skills

MONITORING
OBSERVABILITY
Prometheus
Grafana

Job Details

Position Name: INFRA AUTOMATION - OBSERVABILITY AND MONITORING

Location: ALPHARETTA, GA

Job Type: C2C/W2

Years of Experience Needed: 8-20 years

JOB SUMMARY

An observability engineer designs, implements, and maintains systems to monitor, analyze, and report on the health and performance of software applications and infrastructure, ensuring high availability, performance, and security. They are crucial in understanding complex IT systems and proactively addressing potential issues.

KEY RESPONSIBILITIES:

  • Designing and Implementing Observability Pipelines: Observability engineers create robust pipelines to collect, aggregate, and analyze data from various sources.
  • Monitoring and Alerting: They establish monitoring systems and alerts to detect anomalies and performance issues in real time.
  • Metric and Instrumentation Standards: They define common metric standards for every stage of the application lifecycle, along with instrumentation and scripting standards aligned with OpenTelemetry (OTel).
  • Data Analysis and Visualization: They analyze telemetry data (logs, metrics, traces) to gain insights into system behavior and identify trends.
  • Incident Response: They investigate and troubleshoot incidents, using observability data to understand the root cause and implement solutions.
  • Collaboration and Communication: They collaborate with development, SRE, and other teams to ensure observability practices are integrated into workflows and to share insights.
  • Staying Up-to-Date: They stay current with the latest trends in observability, logging, monitoring, and cloud technologies.
  • Documentation and Knowledge Sharing: They create comprehensive documentation for observability systems and processes and share knowledge with other teams.

SKILLS AND KNOWLEDGE:

  • Strong understanding of distributed systems: They need to understand the complexities of modern architectures, including microservices, cloud-native environments, and hybrid infrastructure.
  • Proficiency in observability tools: They are familiar with tools for logging, metrics, and tracing, such as ELK Stack, Prometheus, Grafana, and distributed tracing systems.
  • Data analysis and visualization skills: They can analyze telemetry data to identify trends and patterns and create visualizations to communicate insights.
  • Scripting and automation: They can automate tasks and create scripts to manage observability infrastructure.
  • Problem-solving skills: They can diagnose and troubleshoot system issues using observability data.
  • Communication skills: They can effectively communicate technical information to both technical and non-technical audiences.
  • Experience with cloud platforms: They have experience with cloud platforms like AWS, Azure, and Google Cloud Platform.
  • Understanding of IT service management practices: They understand IT service management practices like change management, release management, incident management, and problem management.