Senior Site Reliability Engineer

Network, NT, Engineer, Development, Software, Engineering, Python, Java, Linux, Windows, Nomad, IBM, SQL, Oracle
Full Time

Job Description

Our purpose is to serve the nation with the single most trusted and capable health information network, built to increase patient safety, lower costs and ensure quality care.

What You're Like

You have a relentless desire to get stuff done. A true champion of the collective objective, you're energized by applying your expertise with a group of talented people to achieve something important, together. You're curious and passionate about new technologies and cutting-edge innovation, and you enjoy pulling together complex pieces of a puzzle to deliver a powerful and meaningful end-product. You're a translator between people and teams that don't use the same lingo. You don't settle for short-term gratification; you're into long-haul, incremental efforts that require endurance, patience and next-level collaboration.

What We're Like

Surescripts Network Technology and Operations (NT&O) is made up of smart people who love to work towards a common goal, often delivering an innovative, industry-leading solution to the healthcare marketplace. We pride ourselves on quality work that's grounded in complete transparency and accountability. When projects go off the rails (and they do from time to time), we rally around each other to fix them and move on. While our gratification is often the result of a months-long effort, we never tire of delivering a huge result that has an exponentially positive impact on the healthcare system, whether in quality, cost or patient safety.

OK, But Here's What It's Really Like

At Surescripts NT&O, your thinking cap will always be on. You'll be challenged to make sure that disparate, cross-functional pieces come together to create a desired result. You might swarm around some work if you think a milestone is about to be missed. You'll work to quickly understand diverse technologies, and establish and maintain relationships with groups who see and talk about things differently.

Job Summary:

The primary purpose of this position is to collaborate with the agile teams and operations staff to ensure the smooth transition of software applications from software development to production environments, and ongoing service availability. The Senior Site Reliability Engineer serves as a go-to team member on the capabilities and limits of the multi-data center production infrastructure.

Responsibilities:
  • Actively participate in the handoff between Development and Operations following our DevOps methodology. Ensure the smooth transition of software applications from software development to production environments.
  • Provide requirements for service maintainability and resiliency to Software Engineering teams.
  • Collaborate with the Software Engineering teams to define best practices promoting service reliability and fault-tolerance. Ensure best practices are part of the design.
  • Design and implement improvements that enhance service reliability, infrastructure resiliency and security, and data availability.
  • Troubleshoot and resolve issues related to Production and Staging systems configuration.
  • Develop and automate emergency recovery procedures, deployment schedules, post-maintenance validation, and other operational activities.
  • Provide expertise for all matters related to the service operations and act as a first level of escalation for any issues. Troubleshoot and provide root cause analysis for issues spanning code, network, database and systems components.
  • Collaborate with Product and Software Development teams to define Service Level Agreements (SLAs), Objectives (SLOs) and Indicators (SLIs). Collect SLI metrics and establish monitoring based on SLO thresholds and other product requirements. Develop product-specific reliability requirements to support SLOs. Understand application dependencies, and review dependency handling and health checks. Evaluate whether dependency reliability is adequate to meet SLOs.
  • Collaborate with the Software Development and Operations teams to define infrastructure requirements and architecture. Ensure the infrastructure meets performance and capacity requirements.
  • Ensure service availability during software upgrades, and infrastructure and database maintenance.
  • Provide technical leadership and mentoring to other members of the SRE team.
  • Participate in the on-call rotation.


Qualifications:
Basic Requirements:
  • Bachelor's degree in computer science, information sciences or related field, or equivalent experience.
  • 5+ years of proven development skills in one or more programming languages: Python, Java, Go, Ruby, shell scripting or similar.
  • 5+ years of software development, automation or infrastructure as code experience.
  • Ability to analyze network traces and troubleshoot application performance problems.
  • Ability to conceptualize a distributed service, its dependencies and the transactional flow.
  • Experience with Unix/Linux and Windows operating system administration and networking architecture.
  • Experience providing technical leadership and architectural guidance to Software Development teams.


Preferred Qualifications:
  • 7+ years of proven development skills in one or more programming languages: Python, Java, Go, Ruby, shell scripting or similar.
  • 7+ years of software development, automation or infrastructure as code experience.
  • Cloud infrastructure as code experience, e.g., Terraform, CloudFormation.
  • Experience with configuration management tools such as Ansible, Chef, Puppet or Salt, and application schedulers such as Kubernetes, Nomad or Docker Swarm.
  • Experience monitoring/supporting Kafka, IBM MQ.
  • Experience querying SQL and NoSQL databases. Familiarity with Oracle, Hadoop or Cassandra database architecture.
  • Experience building CI/CD tools (Jenkins, TeamCity) for a production application in an enterprise environment.
  • Demonstrated ability to triage processing bottlenecks.
  • Experience with monitoring systems: Influx, Splunk, Zenoss, AppDynamics or similar.
  • Experience troubleshooting certificate and PKI issues.


Surescripts is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate on the basis of race, color, religion, age, national origin, ancestry, disability, medical condition, marital status, pregnancy, genetic information, gender, sexual orientation, parental status, gender identity, gender expression, veteran status, or any other status protected under federal, state, or local law.
Dice Id : 10335395
Position Id : REQ1082
Originally Posted : 1 month ago