Site Reliability Engineering Architect (SRE Architect)
Remote • Posted 3 hours ago • Updated 3 hours ago • Contract W2
Depends on Experience


INSPYR Solutions
Job Details
Skills
- SRE
- SRE ARCHITECT
- DEVOPS ARCHITECT
- DEVOPS
- AI
- AUTOMATION
Summary
Title: Site Reliability Engineering Architect
Location: Fully Remote, EST Hours
Duration: W2 Contract, 12 months
Work Requirements: Holders
Position Overview
The Site Reliability Engineering Architect is a senior technical leader responsible for designing and evolving automation-first, AI-augmented reliability platforms for large-scale cloud environments. This role owns the architecture that enables systems to detect, decide, and act with minimal human intervention. The Architect defines how automation, intelligent systems, and engineers interact in production, ensuring reliability scales without proportional growth in operational effort. This role sets technical direction, establishes standards, and delivers platforms that reduce toil while improving resilience and delivery velocity. Hybrid or remote flexibility available.
Core Responsibilities
- Design reliability architectures that prioritize automation and intelligent decision-making over manual processes. Define patterns for fault isolation, graceful degradation, and recovery that assume automated and AI-assisted execution. Ensure reliability, security, and governance requirements are embedded directly into operational systems and workflows. Establish architectural standards that reduce complexity, human dependency, and operational risk.
- Architect event-driven automation platforms that span detection, decisioning, and execution. Design and implement workflow orchestration systems capable of handling both low-risk autonomous actions and higher-risk human-approved operations. Replace ticket-driven and static runbook processes with executable, testable automation. Standardize automation patterns across incident response, change execution, and platform operations. Ensure automation systems are resilient, observable, and auditable.
- Design and own internal AI-driven operational platforms that act as a centralized interface for reliability and automation workflows. Build systems that allow intelligent components to retrieve operational context, reason over signals, and invoke controlled actions across infrastructure and services. Establish architectures for agent coordination, capability discovery, and safe execution in production environments. Define guardrails, approval paths, observability, and auditability for AI-initiated actions. Integrate AI-driven decisioning directly into operational workflows rather than treating it as an external enhancement.
- Architect observability systems that feed automation and intelligent decision-making rather than static dashboards. Design signal pipelines that correlate metrics, logs, traces, and events into actionable context. Reduce alert fatigue through context-aware, noise-resistant detection and prioritization. Ensure every operational signal has a defined automated or AI-assisted response path. Drive continuous improvement through trend analysis and systemic remediation.
- Define governance-backed use of enterprise low-code automation platforms to accelerate operational workflows. Enable secure, scalable automation for approvals, communications, enrichment, and orchestration while preventing platform sprawl. Establish clear boundaries between low-code automation and code-first systems. Integrate enterprise automation tools with cloud-native automation and AI-driven operational platforms.
- Serve as the architectural authority for reliability, automation, and AI-driven operations. Mentor senior engineers and raise organizational maturity in automation and intelligent systems. Partner with engineering, security, and compliance teams to deliver safe, scalable operational platforms. Own reference architectures, operational standards, and long-term technical direction. Challenge designs that increase operational risk, toil, or manual dependency.
Required Qualifications
- 5+ years of experience in Site Reliability Engineering, Platform Engineering, DevOps, or Infrastructure Engineering supporting complex distributed systems.
- Proven experience designing and operating automation-heavy or autonomous operational platforms.
- Strong programming and automation skills using modern languages and frameworks.
- Hands-on experience with workflow orchestration and event-driven systems.
- Practical experience integrating AI or intelligent decision systems into production operations.
- Deep understanding of failure modes, blast radius management, and risk-aware automation.
Preferred Qualifications
- Experience designing or implementing agent-based or AI-assisted operational systems.
- Familiarity with modern AI platforms and model integration for operational use cases.
- Experience with control-plane architectures for automation and intelligent systems.
- Enterprise automation and governance experience.
- Knowledge of cost-aware reliability design, FinOps principles, and zero-trust security models.
- Relevant cloud or platform certifications.
Success Metrics
- Reduction in manual operational toil and human intervention.
- Increased adoption of automated and AI-assisted remediation.
- Faster detection, triage, and resolution of incidents.
- Improved reliability and service-level objective attainment.
- Scalable operational platforms that support growth without proportional increases in operational staffing.
About INSPYR Solutions
Technology is our focus and quality is our commitment. As a national expert in delivering flexible technology and talent solutions, we strategically align industry and technical expertise with our clients' business objectives and cultural needs. Our solutions are tailored to each client and include a wide variety of professional services, project, and talent solutions. By always striving for excellence and focusing on the human aspect of our business, we work seamlessly with our talent and clients to match the right solutions to the right opportunities. Learn more about us at inspyrsolutions.com.
INSPYR Solutions provides Equal Employment Opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, or genetics. In addition to federal law requirements, INSPYR Solutions complies with applicable state and local laws governing nondiscrimination in employment in every location in which the company has facilities.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
- Dice Id: 10228513
- Position Id: 26-00525
Company Info
About INSPYR Solutions
As a leading technology solutions company, we connect top IT talent with clients to provide innovative business solutions through our IT Staffing, Professional Services, and Infrastructure Solutions divisions. There are four elements that set us apart and serve as pillars of our company philosophy: Quality, Expertise, People, and Relationships. By always striving for excellence in these areas and focusing on the human aspect of our business, we work seamlessly together with our talent and clients to match the right solutions to the right opportunities.