Principal Platform Engineer (Site Reliability Engineer)

Overview

On Site
Depends on Experience
Full Time

Skills

Data Engineering
Continuous Delivery
DevSecOps
FOCUS
Innovation
Network Operations
Management
Preventive Maintenance
Performance Management
Project Management
System Monitoring
Team Leadership
Reporting
Service Level
Documentation
Collaboration
Dashboard
Standard Operating Procedure
Incident Management
NOC
UPS
RMF
Risk Management Framework
IaaS
Amazon Web Services
Microsoft Azure
Virtualization
Computer Networking
Dragon NaturallySpeaking
DNS
Cloud Computing
Kubernetes
Docker
Grafana
JIRA
Workflow
Terraform
DoD
Integrated Circuit
Internal Communications
IC
Cyber Security
Regulatory Compliance
Communication
Leadership
Security Clearance
Continuous Integration
IT Operations
Network
System Administration
Clarity

Job Details

Clarity Innovations is a trusted national security partner, dedicated to safeguarding our nation's interests and delivering innovative solutions that empower the Intelligence Community (IC) and Department of Defense (DoD) to transform data into actionable intelligence, ensuring mission success in an evolving world.

Our mission-first software and data engineering platform modernizes data operations, utilizing advanced workflows, CI/CD, and secure DevSecOps practices. We focus on challenges in Information Warfare, Cyber Operations, Operational Security, and Data Structuring, enabling end-to-end solutions that drive operational impact.

We are committed to delivering cutting-edge tools and capabilities that address the most complex national security challenges, empowering our partners to stay ahead of emerging threats and ensuring the success of their critical missions. At Clarity, we are people-focused and set on being a destination employer for top talent, offering an environment where innovation thrives, careers grow, and individuals are valued. Join us as we continue to lead innovation and tackle the most pressing challenges in national security.

Position Overview

The Network Operations Center (NOC) Engineer assists the NOC Lead in managing and overseeing the daily (8am - 5pm EST) operations of a classified cloud development environment, with a strong emphasis on maintaining Kubernetes-hosted services. The NOC Engineer is responsible for incident response coordination, system monitoring, team leadership, performance reporting, and the security and availability of the development environment.

Key Responsibilities
  • Carry out day-to-day operations of the classified NOC, ensuring adherence to service level agreements and system uptime requirements
  • Perform monitoring and support of cloud-based systems, networks, and containerized applications in Kubernetes clusters
  • Coordinate incident response, troubleshooting, and escalation procedures
  • Ensure timely detection, resolution, and documentation of service-impacting events
  • When the NOC Lead is absent, act as the primary point of contact for cloud system alerts, outages, and classified network incidents; communicate status to stakeholders and leadership
  • Ensure 24/7 observability of network, platform, and container-level components using tools such as Prometheus, Grafana, Fluentd, and Elastic Stack
  • Draft technical guidance for NOC staff and collaborate with engineering, cybersecurity, and cloud teams
  • Maintain situational awareness of the system through dashboards, logs, and proactive monitoring tools
  • Develop and maintain standard operating procedures, incident response plans, runbooks, and shift logs
  • Assist the NOC Lead in conducting daily stand-ups, shift handovers, and weekly ops reviews
  • Generate operational metrics and performance reports
  • Ensure compliance with federal security policies and contribute to continuous accreditation of the cloud system under RMF
  • Perform readiness drills and after-action reviews, and contribute to lessons-learned activities

Qualifications
  • Expertise in cloud infrastructure (AWS GovCloud, Azure Government, or C2S/C2E/JWCC), virtualization, and hybrid environments
  • Understanding of secure networking, load balancers, DNS in cloud-native architectures, and inter-cluster communication
  • Operational experience with Kubernetes, containerized workloads, and supporting technologies (Docker, Helm, Fluentd, Kustomize)
  • Strong understanding of monitoring tools (e.g., Prometheus, Grafana, ELK Stack) and ticketing systems (e.g., osTicket, Jira)
  • Familiarity with GitOps workflows (e.g., Flux) and infrastructure as code (e.g., Terraform)
  • Familiarity with DoD/IC cybersecurity compliance standards, ATO processes, and classified system governance
  • Excellent communication skills and the ability to clearly brief complex operational topics to leadership and mission partners

Preferred Qualifications
  • Active US TS/SCI security clearance with CI polygraph or higher
  • 5+ years of experience in IT operations or network/system administration

We are an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status.
