Company: QUALCOMM SEMICONDUCTORES Y SISTEMAS AVANZADOS DE BAJA CALIFORNIA
Job Area: Engineering Group, Engineering Group > Software Engineering
General Summary:
Cloud Infrastructure & Infrastructure as Code
- Design, build, and manage cloud infrastructure with a primary focus on AWS, integrated with OpenStack environments
- Build and maintain Infrastructure as Code using:
  - Terraform
  - Ansible
  - Kubernetes (manifests / Helm)
- Design infrastructure solutions for:
  - Scalability
  - High availability
  - Performance
  - Reliability
  - Cost efficiency
- Implement redundancy, failover, and disaster-recovery patterns across services and regions
- Perform capacity planning based on performance metrics, usage trends, and utilization data
Kubernetes & Platform Reliability
- Operate and scale production Kubernetes clusters in large-scale environments
- Partner with development and QA teams to:
  - Improve system reliability and resiliency
  - Automate scalability and availability mechanisms
- Apply SRE principles including:
  - Service reliability ownership
  - Proactive failure prevention
  - Continuous improvement of operational processes
- Support microservices-based and distributed system architectures
CI/CD, Automation & Operational Excellence
- Manage and evolve CI/CD pipelines (e.g., Jenkins)
- Automate infrastructure provisioning, configuration, and lifecycle management
- Write, maintain, and improve runbooks for operational processes
- Build automation to reduce manual intervention and operational toil
- Plan and execute infrastructure upgrades and maintenance activities
- Proactively identify and address technical and infrastructure debt
Data Platforms & Streaming Systems
- Operate, tune, and scale data and streaming platforms, including:
  - Kafka, ZooKeeper
  - NiFi
  - Elasticsearch
  - MySQL, Vertica
- Diagnose and resolve performance and stability issues across data pipelines
- Ensure data platform reliability, throughput, and resilience at scale
AI-Assisted SRE & Intelligent Automation
- Design and maintain knowledge-driven automated runbooks and operational bots
- Develop AI-assisted operational workflows, including:
  - Incident analysis and summarization
  - Intelligent diagnostics and remediation suggestions
  - Automation of repetitive operational decision-making
- Work with LLM-based agent frameworks (e.g., Claude Agent SDK or similar):
  - Integrate agents with logs, metrics, monitoring, and internal tools
  - Implement guard-railed, controlled-action automation for production use
- Research and propose new concepts, tools, and AI-driven approaches to improve reliability and efficiency
Monitoring, Reliability & Incident Management
- Design and operate monitoring and observability systems using:
  - Prometheus
  - Grafana
  - ELK stack
- Improve alert quality, signal-to-noise ratio, and troubleshooting efficiency
- Lead incident response activities, root cause analysis, and post-incident reviews
- Support software engineers in debugging complex production issues across distributed systems
- Embed reliability, automation, and operational readiness into system design
Experience Required
- Extensive experience operating large-scale distributed cloud systems
- Hands-on experience with AWS in production environments
- Direct experience working with OpenStack
- Strong Linux background in large-scale SaaS or production systems
- Ability to:
  - Maintain and improve existing mission-critical systems
  - Prioritize and systematically reduce technical and infrastructure debt
- Strong understanding of designing for operational excellence, not just greenfield solutions
Required Skills
- Programming: Strong experience with Python and/or Go
- Cloud & IaC: Terraform, Ansible, CloudFormation or equivalent
- Containers: Kubernetes (production experience)
- CI/CD: Jenkins and modern CI/CD practices
- Data & Streaming: Kafka, NiFi, Elasticsearch, MySQL, Vertica, ZooKeeper
- Observability: Prometheus, Grafana, ELK
- Infrastructure: Nginx, Linux internals
- AI / Automation (advantage):
  - Experience integrating AI or LLMs into operational workflows
  - Familiarity with agent-based automation concepts
Experience Guidelines
3+ years of experience in:
- managing infrastructure overall
- Linux administration in large-scale environments
- operating production systems on AWS and/or OpenStack
- managing Kubernetes in production
- using infrastructure as code
- working with CI/CD systems
Minimum Qualifications:
Bachelor's degree in Engineering, Information Systems, Computer Science, or related field and 2+ years of Software Engineering or related work experience.
OR
Master's degree in Engineering, Information Systems, Computer Science, or related field and 1+ year of Software Engineering or related work experience.
OR
PhD in Engineering, Information Systems, Computer Science, or related field.
2+ years of academic or work experience with a programming language such as C, C++, Java, or Python.
Applicants: Qualcomm is an equal opportunity employer. If you are an individual with a disability and need an accommodation during the application/hiring process, rest assured that Qualcomm is committed to providing an accessible process. You may e-mail or call Qualcomm's toll-free number found here. Upon request, Qualcomm will provide reasonable accommodations to support individuals with disabilities to be able to participate in the hiring process. Qualcomm is also committed to making our workplace accessible for individuals with disabilities. (Keep in mind that this email address is used to provide reasonable accommodations for individuals with disabilities. We will not respond here to requests for updates on applications or resume inquiries.)
Qualcomm expects its employees to abide by all applicable policies and procedures, including but not limited to security and other requirements regarding protection of Company confidential information and other confidential and/or proprietary information, to the extent those requirements are permissible under applicable law.
To all Staffing and Recruiting Agencies: Our Careers Site is only for individuals seeking a job at Qualcomm. Staffing and recruiting agencies and individuals being represented by an agency are not authorized to use this site or to submit profiles, applications or resumes, and any such submissions will be considered unsolicited. Qualcomm does not accept unsolicited resumes or applications from agencies. Please do not forward resumes to our jobs alias, Qualcomm employees or any other company location. Qualcomm is not responsible for any fees related to unsolicited resumes/applications.
If you would like more information about this role, please contact Qualcomm Careers.