SRE Architect

Remote • Posted 1 day ago • Updated 1 day ago
Contract Independent
Contract W2
No Travel Required
Remote
$70 - $80/hr

Job Details

Skills

  • Splunk engineering
  • Python
  • Shell scripting
  • Automation
  • AWS cloud services
  • Kubernetes
  • Nagios

Summary

We are looking for a highly experienced Senior Observability & Site Reliability Engineer to support large-scale enterprise platforms and mission-critical applications. The ideal candidate will have deep hands-on experience in building and operating end-to-end monitoring, logging, and alerting solutions across distributed environments.

This role involves close collaboration with development, infrastructure, and operations teams to ensure platform reliability, performance visibility, and incident response effectiveness.


Key Responsibilities

  • Design, implement, and maintain enterprise observability solutions using Splunk Enterprise including dashboards, alerts, and data ingestion pipelines
  • Develop and enhance monitoring frameworks for infrastructure, applications, and web platforms
  • Automate operational processes using Linux shell scripting and Python
  • Implement intelligent alerting strategies to reduce noise and improve incident response efficiency
  • Provide L3 production support for business-critical applications and infrastructure
  • Support cloud and containerized deployments across AWS and Kubernetes environments
  • Collaborate with engineering teams to standardize logging and telemetry practices
  • Drive root cause analysis, post-incident reviews, and continuous reliability improvements
  • Build operational runbooks, disaster recovery procedures, and service continuity plans
  • Integrate monitoring and deployment workflows with CI/CD tools such as Jenkins, Git, and TeamCity
  • Support database monitoring and performance analysis across SQL Server, Oracle, DB2, and MySQL platforms
  • Participate in ITIL-based change, incident, and problem management processes

Required Skills

  • Strong hands-on expertise in Splunk engineering, administration, and architecture
  • Advanced experience in Linux / Unix environments
  • Proficiency in Python, shell scripting, and automation frameworks
  • Experience with AWS cloud services and Kubernetes / Docker platforms
  • Knowledge of monitoring tools such as Nagios and custom observability solutions
  • Experience supporting high-availability web platforms and distributed systems
  • Strong troubleshooting and production incident management skills
  • Understanding of CI/CD pipelines and deployment automation
  • Familiarity with ITIL processes and service management tools like ServiceNow

Preferred Qualifications

  • Splunk certifications (Power User / Admin / Architect)
  • Experience building large-scale telemetry platforms
  • Background in financial services or high-transaction enterprise environments
  • Experience designing intelligent alerting and automated incident workflows

Experience Level

  • 15+ years in production engineering / SRE / observability roles
  • Prior experience supporting mission-critical enterprise systems
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 10112653
  • Position Id: HK-777