Technical Operations Manager, Production/Web/SaaS

Remote • Posted 60+ days ago • Updated 11 hours ago

Full Time

No Travel Required

Remote

Depends on Experience

Job Details

Skills

API Management
Amazon DynamoDB
Amazon Kinesis
Amazon RDS
Business Continuity Planning
Google Cloud Platform
DevOps
Database
Disaster Recovery
Apache Tomcat
Amazon Web Services
Authentication
Manufacturing
Microsoft Azure
MongoDB
Legacy Systems
High Availability
ITIL
IaaS
Cloud Computing
Amazon S3
Business Intelligence
Product Engineering
Team Leadership
Technical Direction
Value Engineering
SQL

Summary

KEYS TO THE OPEN POSITION FROM THE PRESIDENT

Someone with experience at a smaller startup company (revenue in the 15-40 million range).

Director of Technology Operations:

Responsibilities:

SaaS delivery
Release management
Reliability/uptime (SRE/DevOps)
DevSecOps practices
Feedback loop to product/dev
Successful migration of (a portion of) the Windows/web hybrid to SaaS
Manage 3-5 staff over time (including hiring) and work with third-party certified AWS/Azure partners

Technical Operations Manager, Web/SaaS

Job Overview
We are seeking an exceptional Senior Manager of Site Reliability Engineering (SRE) to lead our global SRE organization and drive operational excellence across our multi-cloud SaaS platform. This role is critical to our mission of delivering reliable, scalable, and performant solutions to thousands of customers worldwide. The successful candidate will be an integral part of a fast-growing manufacturing automation software company.

Success Metrics:

Customer Impact: Reduced MTTR and improved customer satisfaction scores
Reliability: Achievement of 99.9%+ uptime SLAs across all products and regions
Team Growth: Successful scaling of global SRE organization with low attrition
Proactive Prevention: Reduction in incident frequency through automated detection and prevention
Cross-functional Collaboration: Improved partnership metrics with Product, Engineering, and Customer Success teams

About Us

Our company is a leading provider of innovative manufacturing quality management software (QMS) and supplier quality management software (SQS) that transforms how the world's most demanding industries operate. For over a decade, we've empowered aerospace giants, automotive manufacturers, medical device companies, and energy sector leaders to eliminate quality incidents, reduce costs, and consistently hit delivery targets, all while maintaining the highest quality standards and compliance.

Responsibilities

Leadership & Strategy

Lead and scale a global SRE organization spanning multiple time zones
Develop and execute SRE strategy aligned with business objectives and customer success metrics
Drive cultural transformation toward reliability-first engineering practices across the organization
Partner closely with Customer Success to ensure a customer-centric approach to all SRE initiatives
Establish and maintain SLAs, SLOs, and error budgets that balance reliability with feature velocity

Incident Management & Response

Lead enterprise-wide incident management, ensuring rapid detection, response, and resolution
Serve as the executive point of contact during critical incidents
Drive comprehensive root cause analysis (RCA) processes with actionable prevention strategies
Establish and maintain 24/7 on-call rotation and escalation procedures across global teams
Develop and execute disaster recovery and business continuity plans

Technical Leadership

Provide technical direction for complex, multi-cloud infrastructure spanning AWS, Azure, and Google Cloud Platform
Oversee reliability engineering for our entire product portfolio
Lead application performance monitoring initiatives
Drive modernization efforts and ensure optimal performance across geographically distributed data centers
Drive best practices in tuning SQL and NoSQL data platforms

Platform Reliability

Ensure high availability and performance of services including: AWS (ECS, ECR, RDS, Aurora, SQS, SNS, Kinesis, S3, DynamoDB, OpenSearch), Authentication (Auth0/Okta CIC), Integration platforms (Workato), BI (Looker), API management (Apigee), Legacy systems (Tomcat, MongoDB)
Manage reliability for thousands of customers in North America and EU

Operational Excellence

Establish an observability standardization strategy (Sumo Logic, New Relic, and Grafana)
Drive automation initiatives to reduce manual operational overhead
Implement chaos engineering and reliability testing practices
Lead capacity planning and performance optimization efforts
Establish a metrics-driven culture focused on measuring customer impact

Qualifications

Leadership Experience

15+ years in SRE, DevOps, or Infrastructure Engineering roles with 5+ years in senior positions
Proven track record of scaling global engineering teams across multiple time zones
Experience leading teams through high-stakes incident response and customer escalations
A growth mindset shaped at a smaller company is a strong plus
Strong organizational skills with ability to influence cross-functional stakeholders

Technical Expertise

Deep expertise in multi-cloud environments (AWS primary, Azure secondary, Google Cloud Platform preferred)
Extensive experience with containerization, orchestration, and modern deployment practices
Strong background in database technologies
Proficiency with observability tools (New Relic, Grafana, Sumo Logic, or similar)
Experience with large-scale Java applications and legacy system modernization

SRE & Operations

Demonstrated success implementing SRE principles in large-scale production environments
Experience with ITIL and other incident management frameworks and tools
Background in establishing and maintaining SLAs for enterprise SaaS products

Preferred

Background with authentication systems (Auth0, Okta, SAML, OAuth)
Experience with API management platforms and integration architectures
Previous exposure to CDN optimization and global content delivery
Relevant certifications in AWS, Azure, or SRE practices

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 10124613
Position Id: 8778526
Posted 30+ days ago

Contact the job poster

Bill Babik

Recruiter1 @ Case Interactive

Similar Jobs

Sr. Engineer, Site Reliability

Remote or Oregon • 16d ago

ABOUT US Syniti, part of Capgemini, tackles the hardest work in data for the world's largest organizations. We combine intelligent software with deep data expertise to help the Fortune 2000 tackle complex data challenges and drive measurable business outcomes with business-ready data. Syniti's Data First strategy transforms data from an afterthought into a strategic asset, unlocking insights, reducing risk, and fueling growth. With over 5,000 successful projects, we support the full data lifecycle

Full-time

USD 134,941.25 - 171,411.75 per year

Manager for DevOps and DBA Engineer - Assistant Vice President

Remote or Salt Lake City, Utah • Today

About the Role iCapital is looking to hire a Manager for DevOps and DBA Engineering Manager to join the Annuities Platform team. This role is responsible for leading the strategy, design, and execution of DevOps practices and database platform operations supporting enterprise SaaS solutions. This role owns CI/CD pipelines and SQL Server database systems to ensure high availability, scalability, performance, and security across production and non-production environments. As a Technical Manager, t

Full-time

USD 150,000.00 - 180,000.00 per year

Lead Site Reliability Engineer

Remote or Ohio • Today

Overview Impact the Moment Could your creative thinking build the future? A Lead Site Reliability Engineer at McGraw Hill makes a difference for learners and educators across the world. Our team needs individuals with new ideas who connect with people in innovative ways. How can you make an Impact? McGraw Hill, a leading provider of digital educational resources and content, is seeking a Lead Site Reliability Engineer to lead a team of 6 Engineers for our Digital Platform Group in supporting

Full-time

USD 124,000.00 - 155,000.00 per year

Data Operations Lead

Remote • Yesterday

At Workiva, we are building the data foundation for the next generation of AI-enabled platform. Our Data Platform Ops team is at the center of this mission, merging software development with infrastructure operations to design and manage large-scale. We are looking for a Data Operations Lead to ensure the reliability, scalability, and operational excellence of our analytics and data systems. This role is operationally critical and technically demanding. As a senior technical leader, you will be

Full-time

USD 151,000.00 - 242,000.00 per year