Overview
Remote
On Site
$70 - $80
Full Time
Skills
Observability
Monitoring
Site Reliability
Infrastructure
Performance Optimization
Job Details
Client - Financial Services
Title - Observability Engineer (Lead)
Location - Charlotte, NC OR Dallas, TX OR Baltimore, MD OR Salt Lake City, UT (Hybrid)
Type - Contract to hire
Day-to-Day Responsibilities:
- Identify gaps in logging, observability & measurement of performance of critical transactions. Help establish new instrumentation so we have more visibility into performance.
- Review end-to-end performance of critical transactions to identify patterns, trends, bottlenecks & risks
- Make recommendations to improve performance of critical transactions and optimize capacity. Collaborate with cross-functional teams to align on improvements, prioritize focus areas and implement.
- Troubleshoot issues in production and lower environments to identify their causes and brainstorm options to resolve them
- Develop and execute a comprehensive capacity plan for select technology services
- Assist the organization in conducting load & performance testing on critical systems to identify potential bottlenecks
- Develop and maintain documentation related to capacity planning and performance testing
- Provide leadership to establish a performance and capacity management practice
Qualifications
- Prior enterprise-level experience
- Proven ability to enable change and lead IT efforts
- Bachelor's degree in Computer Science, Information Systems, Mathematics or a related field
- 10+ years of experience with multiple types of technology and various operating systems including cloud computing environments and virtualization technologies
- 5+ years of experience writing code and using command line interfaces, including experience writing SQL statements and performing system administration functions
- 5+ years of experience evaluating transactional performance and system capacity issues & limitations
- Experience building instrumentation for logs, observability & monitoring to provide visibility into transaction performance and bottlenecks
- Experience aggregating and analyzing data to draw conclusions about performance risks & issues
- Experience using a combination of monitoring tools and system logs to troubleshoot incidents, identify the cause and determine how to resolve them
- Experience forecasting how much capacity is needed to accommodate potential future increases in traffic
- Experience identifying improvement recommendations and implementing improvements to transactional performance and capacity challenges
- Excellent communication skills and ability to summarize complex technical details into a story executives can understand
- Ability to work collaboratively in a team environment
- Ability to thrive in a fast-paced, dynamic environment