Overview
On Site
USD 104,000.00 - 202,000.00 per year
Full Time
Skills
GTP
Microsoft Excel
Technical Communication
Software Engineering
Communication
Electronic Commerce
Linux Administration
File Systems
Client/server
Firewall
Proxies
Physical Layer
Data Link Layer
Production Support
Log Analysis
MMS
JIRA
Adobe SiteCatalyst
Omniture
MEAN Stack
Recovery
Internal Communications
Workflow
Programming Languages
React.js
Node.js
Shell Scripting
Database
Scripting
Software Development
Docker
Kubernetes
DevOps
Systems Engineering
Operational Excellence
Computer Science
Incident Management
Problem Solving
Conflict Resolution
Computer Networking
Network
TCP/IP
UDP
ICMP
IP
Intellectual Property
Dragon NaturallySpeaking
DNS
OSI
Load Balancing
Unix
Linux
Grafana
Kibana
Graphite
Nagios
New Relic
Dynatrace
Cloud Computing
Microsoft Azure
Google Cloud
Google Cloud Platform
OpenStack
Version Control
Git
JavaScript
API
ServiceNow
Splunk
Java
Python
Shell
Data Science
Machine Learning (ML)
Finance
Life Insurance
Military
Exceed
Software Asset Management
English
Web Content
WCAG
Assistive Technology
Accessibility
Job Details
Position Summary...
What you'll do...
As a Site Reliability Operations Engineer within the Global Technology Platforms (GTP) Command and Control Center (CCC) Team, you will work with other CCC, SRE, DevOps, and Engineering practitioners to proactively maintain the mission-critical infrastructure, cloud platforms, micro-services, tools, and processes that ensure the highest levels of availability and reliability of Walmart's technology stack.
You're right for the job if you are comfortable with monitoring, detection, and major incident response alongside a technical team of engineers laser-focused on restoring service across complex distributed systems. To achieve this, you will draw upon your knowledge of the tech stack and tools to surface key data. You'll excel if you have enthusiasm for digging deep and a flair for sharp technical communication, prioritization, and organization. You will work directly with our Application, DevOps, and other cross-functional engineering teams to support our next-generation "always up" cloud-based technology platforms.
You will use your software engineering skills to understand the technology stack and apply this knowledge to ensure systems continue to meet production-ready standards. Operational Excellence is key! Good judgment is crucial, as you will own detection, prioritization, critical engagement, and communication of the incident process until the issue is remediated. Your ability to continuously challenge yourself and develop a strong cross-functional network with peers and stakeholders will see you excel in this role. Our goal is to protect the customer, merchant, and associate experience and deliver outstanding levels of availability across Walmart Global Technology.
About the Role
Omnichannel eCommerce production support
o Acquire in-depth technical knowledge of omnichannel cloud platforms, web traffic flows, micro-services, and service dependencies for major incident resolution.
Unix/Linux administration
o Provide support for Unix and Linux systems from Kernel to Shell and beyond, taking into consideration system libraries, file systems, and client-server protocols.
Networking knowledge and troubleshooting
o Leverage knowledge of network technologies such as different protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, CDN, OSI layers, firewalls, gateways, proxies, and load balancers.
Cloud understanding and triaging
o Provide L1 and L2 production support for multiple cloud technologies such as OpenStack, Cloud Native platform, Microsoft Azure, and Google Cloud Platform, triaging critical issues using various internal and vendor-related tools.
Alert, Monitoring, Log analysis
o Analyze monitoring graphs and alerts to identify systems causing production impact, using tools like Grafana, Prometheus, MMS, Kibana, Graphite, ServiceNow, JIRA, Dynatrace, New Relic, Omniture, Splunk, and CDN logs [Reduce MTTD - Mean Time to Detect].
Incident triage, Escalation and Resolution
o Triage site-impacting production issues by quantifying impact, severity, and urgency, analyzing systems for quick remediation, engaging the right teams for recovery [Reduce MTTE - Mean Time to Engage], and focusing on immediate restoration [Reduce MTTR - Mean Time to Restore] of large-scale enterprise systems.
Enhance Monitoring solutions
o Develop enterprise monitoring and use tooling solutions such as Grafana, Kibana, Splunk, Graphite, and New Relic to improve visibility, proactively detect issues, and restore system availability.
Enhance Alerting solutions
o Design and implement JavaScript integrations between alerting tools and service API endpoints, using tools like ServiceNow, Spotlight, and xMatters.
Develop Tools and support
o Design and develop solutions for widespread internal communications, cloud application support, and workflows for infrastructure availability issues across various internal applications, using programming languages like Java, JavaScript (React, Node.js), Python, and Shell, along with technologies like Prometheus and database query languages.
Automation and Self-healing
o Demonstrate knowledge of scripting and software development for automation and self-healing of multi-cloud environments. Help enhance existing solutions by developing automation with Docker and Kubernetes and by working with DevOps and Engineering partners.
Required Skills:
2+ years in an infrastructure, systems, engineering or development environment delivering operational excellence to highly complex distributed systems.
Bachelor's Degree in Computer Science or a related field, or relevant work experience.
Strong and demonstrable incident management skills with relevant experience in an enterprise organization.
Experience and exposure working in a 24/7 operations support environment.
Methodical and systematic problem-solving approach, combined with a solid awareness of ownership, initiative and drive.
Experience investigating, analyzing and troubleshooting large scale enterprise systems.
Networking knowledge and understanding of network concepts, such as different protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing.
Experience administering Unix/Linux in a production environment.
Experience working with and developing enterprise monitoring/tooling/logging solutions like Grafana, Kibana, Splunk, OpenObserve, Graphite, Nagios, New Relic, Dynatrace, and Prometheus.
Working knowledge of one or more cloud technologies such as Microsoft Azure, Google Cloud Platform, or OpenStack.
Experience with distributed version control like Git or similar.
Experience designing and implementing JavaScript integrations between alerting tools and service API endpoints, using tools like ServiceNow, Spotlight, Splunk, and xMatters.
Programming experience in one or more of the following languages: Go, Java, Python, Shell, etc.
Experience in data science/machine learning would be advantageous.
At Walmart, we offer competitive pay as well as performance-based bonus awards and other great benefits for a happier mind, body, and wallet. Health benefits include medical, vision and dental coverage. Financial benefits include 401(k), stock purchase and company-paid life insurance. Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty, and voting. Other benefits include short-term and long-term disability, company discounts, Military Leave Pay, adoption and surrogacy expense reimbursement, and more.
You will also receive PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes. The amount you receive depends on your job classification and length of employment. It will meet or exceed the requirements of paid sick leave laws, where applicable.
Live Better U is a Walmart-paid education benefit program for full-time and part-time associates in Walmart and Sam's Club facilities. Programs range from high school completion to bachelor's degrees, including English Language Learning and short-form certificates. Tuition, books, and fees are completely paid for by Walmart.
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to a specific plan or program terms.
For information about benefits and eligibility, see One.Walmart.
Sunnyvale, California US-11657: The annual salary range for this position is $104,000.00-$202,000.00
Bentonville, Arkansas US-09050: The annual salary range for this position is $80,000.00-$155,000.00
Additional compensation includes annual or quarterly performance bonuses.
Additional compensation for certain positions may also include:
- Stock
Minimum Qualifications...
Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.
Option 1: Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or related area.
Option 2: 3 years' experience in software engineering or related area.
Preferred Qualifications...
Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.
We value candidates with a background in creating inclusive digital experiences, demonstrating knowledge in implementing Web Content Accessibility Guidelines (WCAG) 2.2 AA standards, assistive technologies, and integrating digital accessibility seamlessly. The ideal candidate would have knowledge of accessibility best practices and join us as we continue to create accessible products and services following Walmart's accessibility standards and guidelines for supporting an inclusive culture.
Primary Location...
1345 Crossman Ave, Sunnyvale, CA 94089-1114, United States of America
Walmart and its subsidiaries are committed to maintaining a drug-free workplace and have a no-tolerance policy regarding the use of illegal drugs and alcohol on the job. This policy applies to all employees and aims to create a safe and productive work environment.