Overview
On Site
$55 - $70 hr
Contract - W2
Contract - Independent
Contract - 6+ mo(s)
Skills
Please refer to job description
Job Details
Join a Global Leader in Workforce Solutions: Net2Source Inc.
Who We Are
Net2Source Inc. isn't just another staffing company; we're a powerhouse of innovation, connecting top talent with the right opportunities. Recognized for 300% growth in the past three years, we operate in 34 countries with a global team of 5,500+. Our mission? To bridge the talent gap with precision: Right Talent. Right Time. Right Place. Right Price.
Job Description: Experienced DevOps/Site Reliability Engineer
***'s Digital Analytics and System Health software engineering team is seeking a talented and highly motivated DevOps / Site Reliability Engineer to join our team in Ridley Park, Pennsylvania; Hazelwood, Missouri; Plano, Texas; or Oklahoma City, Oklahoma.
The ideal candidate will possess a strong foundation in DevOps and practical experience in owning and operating platform services and underlying infrastructure to help ensure the reliability, scalability, and performance of our systems. You will work closely with a cross-functional team to implement automated monitoring, incident response, capacity planning, and runbooks. You will contribute to the evolution of our reliability practices, instrumentation, and error budgets, while gaining hands-on experience with our production systems. This role suits those who enjoy building scalable platforms, automating end-to-end processes, and improving the overall user experience. As part of the team, you will tackle a broad range of complex tasks using modern tools and methodologies, contributing to the evolution of our digital and analytics solutions.
At ***, we are all innovators on a mission to connect, protect, explore and inspire. From the seabed to outer space, you'll learn and grow, contributing to work that shapes the world. Find your future with us.
Position Responsibilities
Maintain and improve the reliability, availability, and performance of production services, with a focus on reducing incident frequency and recovery/restoration time.
Design, implement, and operate monitoring, alerting, logging, and tracing solutions to provide end-to-end visibility of systems and dependencies.
Respond to and resolve production incidents, participate in post-incident reviews, and help implement corrective actions.
Build and maintain runbooks, standard operating procedures, and automation to reduce manual toil and improve operational consistency.
Collaborate with software engineers to optimize code for reliability, scalability, and resilience, and assist with capacity planning and performance tuning.
Implement and manage CI/CD pipelines, deployment strategies, and blue/green/canary release patterns to ensure safe and rapid software delivery.
Manage infrastructure and assist with provisioning, scaling, and maintaining cloud resources.
Enforce security and compliance best practices in the production environment, including access controls, secrets management, and secure logging.
Participate in on-call coverage, rotate responsibilities, and communicate clearly with stakeholders about status and risks.
Contribute to reliability-related projects, tooling, and initiatives that improve platform health and developer experience.
Infrastructure reliability and resilience: regularly assess and improve the reliability of core infrastructure components (networking, storage, compute, databases, caching layers) with emphasis on redundancy, fault tolerance, and scalable failover strategies.
Participate in defining disaster recovery objectives (RPO, RTO), implement capabilities (backup/restore, cross-region failover, site failover), and conduct regular exercises to validate recovery procedures.
Ensure robust backup/restore procedures, perform regular backup validation, and protect critical data across regions and environments.
Forecast growth, model failure domains, and ensure capacity buffers and scalable architectures to withstand regional outages or component failures.
Basic Qualifications (Required Skills/Experience)
Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent practical experience).
5-7 years of experience in DevOps or a related field.
Strong Linux/Unix administration skills and proficiency in at least one scripting language (e.g., Python, Bash).
Experience with cloud platforms, containers, and orchestration (AWS/Azure/Google Cloud Platform, Docker/Kubernetes).
Familiarity with containerization (Docker) and container orchestration (Kubernetes).
Experience with monitoring and observability tools (Prometheus, Grafana, ELK/EFK, OpenTelemetry).
Solid understanding of incident management processes, on-call practices, and post-mortem analysis.
Knowledge of CI/CD concepts and tooling (e.g., Jenkins, GitHub Actions, GitLab CI) and automation scripting.
Strong problem-solving, debugging, and communication skills; ability to work in a collaborative, cross-functional environment.
Preferred Qualifications (Desired Skills/Experience)
Bachelor's degree in Information Technology, Computer Science, or a related field, or equivalent practical experience.
ITIL/ITSM or similar service management certifications (ITIL Foundation or equivalent) are a plus.
Knowledge of DoD or government security requirements or other regulated environments is a plus.
1+ years of experience in the aerospace industry.
Why Work With Us?
We believe in more than just jobs; we build careers. At Net2Source, we champion leadership at all levels, celebrate diverse perspectives, and empower you to make an impact. Think work-life balance, professional growth, and a collaborative culture where your ideas matter.
Our Commitment to Inclusion & Equity
Net2Source is an equal opportunity employer, dedicated to fostering a workplace where diverse talents and perspectives are valued. We make all employment decisions based on merit, ensuring a culture of respect, fairness, and opportunity for all, regardless of age, gender, ethnicity, disability, or other protected characteristics.
Awards & Recognition
America's Most Honored Businesses (Top 10%)
Fastest-Growing Staffing Firm by Staffing Industry Analysts
INC 5000 List for Eight Consecutive Years
Top 100 by Dallas Business Journal
Spirit of Alliance Award by Agile1
Ready to Level Up Your Career?
Click Apply Now and let's make it happen.