Job Details
Job Title: Sr. Monitoring Engineer/Sr. Observability Engineer
Location: Headquarters / Telecommute
Classification (HR only): Exempt Non-Exempt
Reports To (Title): COO, Widescope Consulting and Contracting
JOB SUMMARY
The statements below are not intended to be all-inclusive of the duties and responsibilities of the position. Based on leadership decisions and business needs, all other duties as assigned will be expected for each position.
Widescope Consulting and Contracting is proud to serve our nation's military and Veterans. We support federal agencies in advancing the United States health care system and improving the overall health and well-being of those who serve or have served our country. Our health services are designed to help people live healthier lives.
This position is part of Widescope's Technology Product Management and Planning organization. The team manages all aspects of the product lifecycle from initial definition and planning through production, launch, and retirement. The focus is technical, overseeing strategy, design, development, and the end-of-life process for new, existing, or acquired products. We are looking for a senior-level Observability Engineer. The team is responsible for enterprise infrastructure, application, and network monitoring across on-prem, hybrid, and multiple cloud environments. The selected candidate will join a team of skilled engineers with a broad background in enterprise monitoring and observability. As an Observability Engineer, you will focus on maintaining the reliability, scalability, and availability of our log management solution and our metrics and observability platform, both of which rely heavily on automation (Terraform, Ansible, and scripts). This role also requires maintaining the performance KPIs of our solutions and defining their SLOs.
Role Capabilities:
- Maintain and deploy monitoring and alerting.
- Design, configure, and maintain a log aggregation solution at large scale.
- Set up and manage ingestion pipelines and data transformations.
- Bring an automation-first mindset to every task.
- Monitoring and Alerting: Build and maintain robust monitoring systems using tools like ELK, Dynatrace, Prometheus, and Grafana to detect potential issues early and trigger alerts for timely response.
- Maintain associated documentation as it applies to our audit and certification requirements.
- Participate in troubleshooting, capacity planning, and performance analysis activities.
- Research new monitoring requirements and, in many cases, write code to meet them.
- Apply strong expertise in setting up monitoring policies, rules, and templates, and in writing scripts to meet monitoring requirements.
JOB QUALIFICATIONS
Required:
- BS/MS in CS/engineering or equivalent, OR 5+ years of experience.
- 3+ years of experience working directly with monitoring tools as an Admin, SME, or Architect, preferably with Dynatrace and/or ELK.
- Hands-on experience designing data pipelines using Logstash and/or Fluent Bit/Fluentd.
- Fluent in writing scripts in Python and either Bash or PowerShell to automate tasks.
- Experience with Terraform and Ansible, including syntax, best practices, and managing complex configurations across multiple commercial and government clouds to build and manage infrastructure and applications.
- Very good working knowledge of the Linux OS.
- Highly self-motivated and self-directed.
- Good analytical and problem-solving/troubleshooting abilities.
Preferred:
- Knowledge of SNMP, tcpdump, and tracing.
- Knowledge of AIOps platforms.
- Other scripting experience (JavaScript, Java, PowerShell, or others).