Site Reliability Engineering Manager

Microsoft Windows, Instrumentation, Leadership, Linux, Management skills, Metrics, Networking, Effective communication, Engineering, Golang, IT management, Internet, Node.js, Accountability, Capacity management, Communication skills, Decision-making, Time management, Solution delivery, Status reports, System requirements, Unix, QA, Real-time, Process management, Product development, Productivity, Project planning, Python, Focus, Organizational skills, Performance analysis, Planning, Presentations, Reliability engineering, Reporting, Scalability, Software development, SRE, Site Reliability, Site Reliability Manager, DevOps, Manager, SaaS
Full Time
$160,000+
Work from home available

Job Description

As a Site Reliability Manager, you will drive system reliability, developer productivity, and faster time to market by reducing the technical debt of the services your SRE team supports. We seek a manager who is passionate about system reliability to influence and drive the strategic SRE mission.

About the role

  • The successful candidate will have an outstanding record of professional experience and will thrive in an environment that demands accountability. They must have significant technology management and product development experience. They must also have strong planning, organizational, and communication skills, and be a key driver in helping the team understand the big picture.
  • Proven leader of technology solutions in a high-volume transaction environment.
  • Accomplished leader with 5+ years managing regional and global areas.
  • Have excellent time management, communication, decision-making, presentation, and organizational skills.
  • Maintain excellent written and verbal communication with clients, employees, and the management chain, including status reports, project plans, and presentations.
  • Ability to lead across functions and motivate matrixed staff.

Responsibilities

  • Engage with and influence development, operations, and product groups, evangelizing SRE practices to align technology service and solution delivery.
  • Drive quality accountability within the organization through well-defined processes, metrics, and goals for process quality. This includes leading effective postmortems and ensuring that action items are followed up.
  • Manage the availability, latency, scalability, and efficiency of applications by building reliability engineering into our development life cycle, with a focus on fault-tolerant approaches.
  • Drive capacity planning, performance analysis, instrumentation, and other non-functional system requirements.
  • Define and report progress on strategic initiatives and project-level tasks to all stakeholders, including senior executives and clients, using communication approaches suited to each constituency.
  • Implement metrics driven processes to ensure service quality targets are met.

Skills Needed

  • Understanding of how to influence peers and other leaders to build a culture of reliability and transparency.
  • Strong management skills, with a servant leadership mindset.
  • Expert knowledge of all aspects of designing, developing, and managing large real-time systems.
  • Project and process management.
  • Prior successful experience as a systems performance engineer or site/systems reliability engineer.
  • Mastery of Linux/Unix.
  • Knowledge of Windows operating systems.
  • Knowledge of Node.js, Python, or Golang programming.
  • Mastery of fault-tolerant approaches in large-scale distributed environments and high-performance systems.
  • Demonstrated experience working in large, complex systems environments.
  • Deep understanding of internet and networking protocols.
  • A passion for performance excellence and robustness, and an engineering mindset.

Dice ID: 80121007
Position ID: 7107826
Originally Posted: 2 months ago