Overview
USD 129,467.00 - 194,201.00 per year
Full Time
Skills
System Imaging
LinkedIn
Scalability
IaaS
Apache HiveMind
Incident Management
Collaboration
Root Cause Analysis
Operational Excellence
Reliability Engineering
DevOps
Amazon Web Services
Amazon EC2
Remote Desktop Services
Amazon RDS
Kubernetes
Grafana
Scripting
Python
Bash
Terraform
Computer Networking
Linux
Cloud Computing
Workflow
Military
Recruiting
Artificial Intelligence
Job Details
Founded in 2015, Shield AI is a venture-backed defense technology company with the mission of protecting service members and civilians with intelligent, autonomous systems. Its products include Hivemind Enterprise (EdgeOS, Pilot, Commander, and Forge), as well as V-BAT and Sentient Vision Systems (wide-area motion imaging software). With offices in San Diego, Dallas, Washington, D.C., Abu Dhabi (UAE), Kyiv (Ukraine), and Melbourne (Australia), Shield AI's technology actively supports U.S. and allied operations worldwide. Follow Shield AI on LinkedIn, X, and Instagram.
Job Description:
As a Site Reliability Engineer at Hivemind, you will play a key role in ensuring the performance, reliability, and scalability of our cloud infrastructure. You'll be responsible for building and maintaining monitoring and alerting systems for both internal and external services, defining and evolving incident response strategies, and automating operational processes to minimize risk and eliminate toil. Your work will directly impact the stability and resilience of Hivemind's platform, helping us deliver exceptional experiences to our users.
What You'll Do:
- Design, implement, and maintain robust monitoring, logging, and alerting systems
- Define incident response procedures and participate in on-call rotations
- Identify and resolve reliability and performance issues across services
- Develop automation tools to streamline operations and reduce manual interventions
- Collaborate with engineering teams to ensure new services are production-ready
- Conduct root cause analyses and implement post-incident improvements
- Champion a culture of reliability, observability, and operational excellence
Required Qualifications:
- 5+ years of experience in Site Reliability Engineering, DevOps, or related roles
- Strong experience with AWS services (EC2, ECS/EKS, RDS, IAM, etc.)
- Deep understanding of Kubernetes and containerized deployments
- Proficiency with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, ELK)
- Strong scripting or programming skills (Python, Go, Bash, etc.)
- Experience with infrastructure-as-code (Terraform, CloudFormation, or similar)
- Solid understanding of networking, Linux systems, and distributed architectures
Preferred Qualifications:
- Experience with service meshes (e.g., Istio or Linkerd)
- Familiarity with security best practices in cloud environments
- Exposure to GitOps workflows and tools (e.g., ArgoCD or Flux)
$129,467 - $194,201 a year
Full-time regular employee offer package:
Pay within range listed + Bonus + Benefits + Equity
Temporary employee offer package:
Pay within range listed above + temporary benefits package (applicable after 60 days of employment)
Salary compensation is influenced by a wide array of factors including but not limited to skill set, level of experience, licenses and certifications, and specific work location. All offers are contingent on a cleared background and possible reference check. Military fellows and part-time employees are not eligible for benefits. Please speak to your talent acquisition representative for more information.
Shield AI is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, marital status, disability, gender identity or Veteran status. If you have a disability or special need that requires accommodation, please let us know.