Overview
On Site
Full Time
Skills
Information Technology
Supervision
High Availability
Scripting
Capacity Management
Performance Tuning
Incident Management
Disaster Recovery
Business Continuity Planning
Evaluation
Scalability
Continuous Improvement
Reliability Engineering
Systems Architecture
Strategic Planning
Decision-making
Teamwork
Software Development
GitLab
GitHub
Critical Thinking
Problem Solving
Conflict Resolution
Customer Service
Python
Java
Version Control
Git
Docker
Linux
Computer Networking
Firewall
Root Cause Analysis
Project Management
Agile
Scrum
Application Development
DevOps
IT Management
Ansible
Terraform
Bash
Continuous Integration and Delivery
Continuous Integration
Continuous Delivery
Server Administration
Network
Mentorship
Orchestration
Amazon Web Services
Microsoft Azure
Google Cloud Platform
Google Cloud
SaaS
IaaS
PaaS
Enterprise Architecture
Artificial Intelligence
Machine Learning (ML)
Red Hat Linux
Kubernetes
Cloud Computing
Health Care
Leadership
Vulnerability Management
Technical Support
System Documentation
Information Security
Policies and Procedures
Presentations
Quality Assurance
Change Management
Documentation
Workflow
Research
Customer Relationship Management (CRM)
Management
Military
Collaboration
Innovation
Microsoft Exchange
Recruiting
Job Details
At Duke Health, we're driven by a commitment to compassionate care that changes the lives of patients, their loved ones, and the greater community. No matter where your talents lie, join us and discover how we can advance health together.
About Duke Health Technology Solutions
Pursue your passion for caring and innovation with Duke Health Technology Solutions, which is dedicated to the transformation, development, and management of enterprise information technology solutions across Duke Health. By harnessing innovative technologies such as cloud computing and artificial intelligence, and pairing them with a forward-thinking approach, Duke Health Technology Solutions is revolutionizing the future of health care at Duke Health and beyond.
Occupational Summary
The DHTS Systems Analyst-Site Reliability Engineer (SRE) is responsible for designing, implementing, and maintaining large-scale distributed systems with a focus on reliability, scalability, and performance. The SRE collaborates with development teams to ensure that applications and services are designed and operated to meet reliability targets and scale efficiently. This role involves working with OpenShift for on-premises environments and Azure Kubernetes Service (AKS) for cloud-based solutions.
Essential Tasks/Responsibilities
Level 1 (DHTS System Analyst 1)
Under direct supervision, assist in monitoring and maintaining production systems to ensure high availability and performance, including OpenShift clusters on-premises and AKS in the cloud.
Participate in on-call rotations to respond to system alerts and incidents.
Assist in troubleshooting and resolving system issues and outages across both on-premises and cloud environments.
Help implement and maintain automation scripts for routine tasks and deployments in OpenShift and AKS.
Contribute to the creation and maintenance of documentation for systems and processes.
Assist in capacity planning and performance tuning of systems in both OpenShift and AKS environments.
Participate in post-incident reviews and help implement recommendations.
Learn and apply SRE best practices and methodologies specific to container orchestration platforms.
Collaborate with development teams to improve system reliability and efficiency across on-premises and cloud infrastructures.
Level 2 (DHTS System Analyst 2)
In addition to the duties described for Level 1, the Level 2 SRE will:
Independently design and implement monitoring solutions for complex systems in OpenShift and AKS environments.
Lead incident response efforts and coordinate with multiple teams during outages, considering the nuances of both on-premises and cloud infrastructures.
Develop and implement automation solutions to improve system reliability and efficiency across OpenShift and AKS platforms.
Conduct thorough root cause analysis for incidents and propose long-term solutions that align with the organization's hybrid infrastructure strategy.
Contribute to the design and implementation of disaster recovery and business continuity plans, leveraging both on-premises and cloud resources.
Mentor junior team members and provide technical guidance on OpenShift and AKS best practices.
Participate in the evaluation and implementation of new technologies and tools that complement OpenShift and AKS environments.
Collaborate with development teams to define and implement SLIs, SLOs, and SLAs across both platforms.
Contribute to the development of architectural improvements to enhance system reliability and scalability in a hybrid infrastructure model.
Level 3 (DHTS System Analyst 3)
In addition to the duties described for Level 2, the Level 3 SRE will:
Function as a technical leader and subject matter expert in reliability engineering, with deep expertise in both OpenShift and AKS.
Lead the design and implementation of large-scale, complex distributed systems across on-premises OpenShift and cloud-based AKS environments.
Develop and implement strategies for continual improvement of system reliability, performance, and efficiency in a hybrid infrastructure model.
Lead cross-functional projects to improve overall system architecture and reliability, considering the strengths and limitations of both OpenShift and AKS.
Provide advanced troubleshooting and problem-solving for critical production issues in both on-premises and cloud environments.
Develop and maintain relationships with key stakeholders across the organization to align SRE practices with business objectives.
Drive the adoption of SRE best practices and methodologies across the organization, tailored to the specific needs of OpenShift and AKS platforms.
Contribute to the definition of technical standards and best practices for the SRE team, ensuring consistency across on-premises and cloud environments.
Mentor and provide technical leadership to junior and mid-level SREs in both OpenShift and AKS technologies.
Participate in strategic planning for infrastructure and reliability improvements, considering the long-term evolution of the hybrid infrastructure model.
Represent the SRE team in high-level technical discussions and decision-making processes related to container orchestration and cloud strategy.
Advancement to the next level requires that the employee, at a minimum, meet the following criteria:
1. Proven ability to work at the next level: This involves demonstrating the skills and competencies required for the next level of responsibility. Employees should have demonstrated that they can handle tasks and challenges typically associated with the higher position.
2. Potential to serve beyond the next level: This measure looks at the employee's long-term potential and ability to grow within the organization. The employee should have the vision, ambition, and capability to take on even greater responsibilities in the future.
3. Consistently demonstrates a values-based approach in how they work: Employees should consistently exhibit behaviors and decision-making processes that align with DUHS values. These values are integrity, teamwork, diversity, excellence, and safety. A patient-focused approach is also critical to success.
4. Is considered one of the top performers at their level across the organization: This measure evaluates the employee's overall performance and reputation within DHTS. Top performers are often recognized for their exceptional contributions, reliability, and ability to exceed expectations.
We will select the best candidates, not merely the best available.
Required Qualifications at this Level
Education
A bachelor's degree in a related field, or equivalent work experience, is preferred.
Experience
Level 1 (DHTS System Analyst 1): 0-4 years of software development experience and/or IT solutions engineering.
Level 2 (DHTS System Analyst 2): Minimum 5 years of software development experience and/or IT solutions engineering.
Level 3 (DHTS System Analyst 3): Minimum 10 years of software development experience and/or IT solutions engineering.
Required Skills and Knowledge
Level 1 (DHTS System Analyst 1)
Basic understanding of Application Development Lifecycle, ideally with DevOps focus
Familiarity with script writing (e.g., Ansible Playbooks, Helm Charts)
Basic knowledge of containerization and orchestration technologies (Docker, Kubernetes, OpenShift)
Familiarity with CI/CD technologies like GitLab CI or GitHub Actions
Basic understanding of server administration (preferably Linux)
Understanding of networking topologies, firewall rules, and certificate management
Ability to analyze customer requirements and translate them into effective solutions
Critical thinking and problem-solving skills
Strong customer service orientation
Basic troubleshooting and root cause analysis skills
Familiarity with project management and Agile/Scrum methodologies
Proficiency in at least one programming language (e.g., Python, Go, Java)
Familiarity with version control systems (e.g., Git)
Level 2 (DHTS System Analyst 2)
All Level 1 skills, plus:
Strong experience with Application Development Lifecycle, with a DevOps focus
Proficiency in script writing (e.g., Ansible Playbooks, Helm Charts)
Extensive experience with containerization and orchestration technologies (Docker, Kubernetes, OpenShift)
Strong experience with CI/CD technologies and practices
Advanced knowledge of server administration (preferably Linux)
Solid understanding of networking topologies, firewall rules, and certificate management
Proven ability to analyze complex customer requirements and translate them into effective solutions
Advanced troubleshooting and root cause analysis skills
Strong project management skills, including Agile/Scrum experience
Experience with cloud platforms (AWS, Azure, Google Cloud Platform) and services (SaaS, IaaS, PaaS, FaaS)
Knowledge of Enterprise Architecture best practices
Familiarity with AI and ML concepts
Level 3 (DHTS System Analyst 3)
All Level 2 skills, plus:
Technical leadership in application development with a DevOps/CI focus
Technical leadership in automation (Ansible, Terraform, Bash)
Extensive experience with Continuous Integration / Continuous Delivery
Extensive experience with server administration
Expert knowledge of network and security concepts
Proven ability to lead and mentor teams in adopting and optimizing container orchestration practices
Expert knowledge of cloud platforms (AWS, Azure, Google Cloud Platform) and services (SaaS, IaaS, PaaS, FaaS)
Expert knowledge of Enterprise Architecture best practices
Advanced knowledge of AI and ML concepts and their application in SRE practices
Desired Skills (All Levels)
Red Hat OpenShift certifications
CKA (Certified Kubernetes Administrator) or CKAD (Certified Kubernetes Application Developer) certifications
Experience with multi-cloud environments
Knowledge of FHIR APIs and healthcare-specific technologies
Excellent time management, organizational, and task prioritization skills
Strong presentation skills
Ability to communicate effectively with non-technical staff and members of interdisciplinary teams
Ability to interact well and effectively communicate with all levels of leadership
Experience with data and system flow diagramming
Familiarity with vulnerability management and patching for application containers
Additional Responsibilities (All Levels)
Provide application system support for team apps, including rotating 24x7 support
Develop relationships with vendors to ensure customer needs are met in a timely manner
Author and update system documentation in the developer guide to share acquired knowledge
Ensure systems conform to Duke Information Security Office policies and procedures
Assist in oral and written presentations to project teams, customers, and management
Coordinate and perform application testing
Follow established Change Management processes
Provide feedback on departmental processes and procedures and suggest improvements
Plan and coordinate system and application upgrades
Identify internal resources to build project teams as required
Perform detailed analysis and documentation of customer workflows
Collaborate with Administrative, Clinical, and Research customers to understand and meet their needs
Develop relationships with key customer management representatives
Intent:
The intent of this job description is to provide a representative summary of the types and level of duties and responsibilities that will be required of positions given this title, and it shall not be construed as a declaration of the total of the specific duties and responsibilities of any particular position. Employees may be directed to perform job-related tasks other than those specifically presented in this description.
Equal Opportunity:
Duke University is an Affirmative Action/Equal Opportunity Employer committed to providing employment opportunity without regard to an individual's age, color, disability, gender, gender expression, gender identity, genetic information, national origin, race, religion, sex, sexual orientation, or veteran status.
Duke aspires to create a community built on collaboration, innovation, creativity, and belonging. Our collective success depends on the robust exchange of ideas, an exchange that is best when the rich diversity of our perspectives, backgrounds, and experiences flourishes. To achieve this exchange, it is essential that all members of the community feel secure and welcome, that the contributions of all individuals are respected, and that all voices are heard. All members of our community have a responsibility to uphold these values.
Essential Job Function:
Certain jobs at Duke University and Duke University Health System may include essential job functions that require specific physical and/or mental abilities. Additional information and provision for requests for reasonable accommodation will be provided by each hiring department.
Duke is an Equal Opportunity Employer committed to providing employment opportunity without regard to an individual's age, color, disability, gender, gender expression, gender identity, genetic information, national origin, race, religion, sex (including pregnancy and pregnancy-related conditions), sexual orientation, or military status.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.