Staff Software Engineer - Platform & Reliability

Remote • Posted 5 hours ago • Updated 5 hours ago
Full Time
No Travel Required
Remote
$140,000 - $160,000/yr

Job Details

Skills

  • Cloud Computing
  • Google Cloud Platform
  • Kubernetes
  • Python
  • Continuous Delivery
  • Continuous Integration
  • Artificial Intelligence
  • Leadership
  • Mentorship
  • Microservices
  • Product Engineering
  • Vertex
  • Terraform
  • Workflow
  • GCP
  • Amazon Web Services
  • CI/CD pipelines
  • CI/CD

Summary

The Sr. Staff Software Engineer - Platform & Reliability will be part of the new Product Engineering team tasked with designing and building the next generation of Agentic AI-powered products. Acting as the Technical Lead and Primary Architect, you will be a hands-on leader responsible for the team’s overall delivery of the runtime environment and automation for AI services and Agents. You will lead a small squad by decomposing complex platform requirements (such as AI-specific CI/CD, agent observability, and automated scaling) into actionable tasks while remaining deeply embedded in the codebase.

Key Responsibilities

     Technical Lead & Execution: Lead the technical delivery of the Agentic Platform by translating high-level infrastructure roadmaps into actionable development tasks. You will own task breakdown for your squad, ensuring high-quality output through technical mentorship and rigorous architectural oversight.

     Automated Agent Delivery - CI/CD: Architect and implement high-velocity CI/CD pipelines specifically designed for the lifecycle of AI Agents and services, including automated model evaluation and blue-green deployments for agentic workflows on Google Cloud Platform.

     Cloud Infrastructure Engineering: Lead the design and implementation of our cloud-native infrastructure on Google Cloud Platform using Terraform and Kubernetes (GKE). You will own the core runtime environment where autonomous agents and transactional microservices coexist.

     Agentic Observability & SRE: Apply SRE principles to build a specialized monitoring and alerting stack for AI agents. You will implement tracing for agent "reasoning loops" and ensure the reliability of the underlying vector and graph data stores.

     AI-Native SDLC Leadership: Actively use coding agents to plan, generate, and refactor platform code and Infrastructure as Code (IaC), maintaining high velocity while ensuring code quality.

     Scale & Performance: Monitor and optimize the performance and cost-effectiveness of AI workloads, ensuring our platform can handle high-frequency agent calls and multi-modal data processing.

     Security & Governance: Own the implementation of secure runtime boundaries, ensuring that both human users and AI agents operate within strict, audited permission sets.

 
 
Experience: 10+ years of Software or Platform Engineering experience, with a background as a hands-on engineer who has successfully led technical squads.
 
Technical Stack: Expert-level command of Google Cloud Platform (GKE, Vertex AI), Terraform, Kubernetes, and Python.
 
Product AI Platform: Proven track record of designing and shipping production platforms for AI/LLM workloads, including specialized CI/CD and observability for agentic architectures.
 
Reliability Mindset: Strong command of SRE principles, including experience with SLOs, error budgets, and troubleshooting complex distributed systems.
 
Cloud Infrastructure: Experience working with cloud platforms (Google Cloud Platform, AWS) and deploying containerized services that are secure and scalable.
 
Coding Agents: Demonstrated proficiency in using coding agents to accelerate the SDLC and plan and code complex engineering tasks.
  • Dice Id: 10428795
  • Position Id: 8946428