Title: Application Support & Development Engineer - Hybrid
Mandatory skills:
MLOps, Machine Learning Operations,
Google Cloud Platform,
VertexAI, IAM,
Java, Python,
NoSQL database,
containerization, orchestration, Docker,
Terraform,
ML, machine learning, machine learning models, machine learning pipelines, machine learning workflows,
ML enablement, ML platforms, ML platform components, ML enablement tooling, ML platform tools, MLOps workflows, MLOps patterns,
cloud infrastructure, cloud native systems, cloud native infrastructure, cloud resources,
CI/CD pipelines, automation, Git-based workflows,
APIs, backend services,
model deployment, lifecycle patterns, reusable infrastructure, service developer platforms
Description:
We are seeking a highly skilled Application Support & Development Engineer to support and enhance the on-demand interfaces for demand forecasting and the assortment planning capabilities. This role is critical to ensuring the reliability, performance, and continuous improvement of user-facing applications.
This position combines real-time monitoring, incident response, user support, and break-fix activities with hands-on development work, including implementing enhancements, fixing defects in code, and contributing to system and pipeline improvements. The role works in close partnership with engineering and data science teams to deliver both operational excellence and incremental product development.
The ideal candidate brings a DevOps-oriented engineering mindset, combining strong software development skills with systems operations, troubleshooting expertise, and a passion for building and maintaining high-performing, data-driven applications.
Key Responsibilities:
Monitor application health, performance, and availability across distributed systems
Troubleshoot and resolve production issues, including latency, data inconsistencies, and system failures
Perform root-cause analysis and implement durable fixes, including code changes where appropriate
Design, develop, and deploy enhancements, bug fixes, and small features across applications and data pipelines
Provide timely support to end users, including ticket triage, escalation, and resolution
Collaborate with engineering and data science teams to design, implement, and validate solutions
Support, maintain, and enhance data pipelines, ensuring reliable and efficient data flow into planning and forecasting systems
Maintain and improve observability through logging, metrics, and alerting tools
Participate in incident management processes, adhering to defined SLOs/SLAs
Contribute to documentation, runbooks, and knowledge-sharing practices
Participate in code reviews and follow best practices for testing, deployment, and maintainability
Assist with and improve deployment processes and CI/CD pipelines
Skillset Details:
Application Reliability, Performance & Engineering
Strong experience in application monitoring, troubleshooting, performance tuning, and implementing fixes in distributed systems.
Familiarity with observability tools (e.g., Grafana, Prometheus, Splunk, Datadog, or equivalent).
Understanding of scalability principles and experience diagnosing and resolving latency, throughput, or memory issues in production systems.
Ability to perform root-cause analysis and translate findings into code or configuration improvements.
Software Engineering & Development
Solid proficiency in at least one backend programming language used in the system (e.g., Python, Java, Kotlin, or Scala).
Experience developing, testing, and deploying production-grade code, including bug fixes and enhancements.
Familiarity with API-based systems, microservices, and event-driven architectures.
Competence in using Git for version control, code reviews, and structured change management processes.
Data and Pipeline Engineering
Working knowledge of data pipelines and batch/stream processing tools (e.g., Apache Spark).
Experience developing or modifying ETL/ELT workflows and ensuring data quality and reliability.
Understanding of data validation, logging, and error-handling practices in ML or analytics-driven applications.
Ability to both support and enhance data pipelines feeding forecasting and planning systems.
Cloud and Deployment Infrastructure
Hands-on experience with cloud platforms (Google Cloud Platform, Azure, or AWS) for building, deploying, and managing applications.
Familiarity with containerization (Docker, Kubernetes) and CI/CD tools (e.g., Jenkins, GitLab CI).
Basic understanding of infrastructure-as-code (e.g., Terraform or Cloud Deployment Manager) preferred.
Experience contributing to deployment automation and improving release processes.
Incident Response, Operations & Continuous Improvement
Experience in production support or SRE-style operations, including ticket triage, escalation, and communication with end users.
Ability to work within Service Level Objectives (SLOs) and contribute to reducing incident frequency through engineering improvements.
Comfortable collaborating across time zones and balancing operational support with development priorities.
Communication and Collaboration
Strong written and verbal communication skills to coordinate effectively with Minneapolis engineering and data science teams.
Ability to translate user-reported issues into actionable engineering tasks and code changes.
Proactive and collaborative approach to working with cross-functional partners (merch planners, demand planning, inventory teams).
Skillset Must Have:
Application Reliability & Performance
Experience monitoring, troubleshooting, and improving distributed systems
Hands-on use of observability tools (Grafana, Prometheus, Splunk, Datadog, etc.)
Ability to diagnose and resolve production issues (latency, memory, throughput)
Strong root-cause analysis with ability to implement code/configuration fixes
Software Engineering & Development
Proficiency in at least one backend language (Python, Java, Kotlin, or Scala)
Ability to read, debug, modify, and write production code
Experience building or enhancing APIs, microservices, or event-driven systems
Version control experience (Git) and structured code review practices
Data & Pipeline Engineering
Working knowledge of data pipelines (batch and/or streaming)
Experience developing or supporting tools like Apache Spark or similar frameworks
Understanding of data validation, logging, and error handling
Ability to monitor, troubleshoot, and enhance ETL/ELT workflows
Cloud & Infrastructure
Hands-on experience with cloud platforms (AWS, Google Cloud Platform, or Azure)
Familiarity with containerization (Docker, Kubernetes)
Experience with CI/CD pipelines (Jenkins, GitLab CI, etc.)
Incident Response & Support
Production support or SRE experience (triage, escalation, resolution)
Ability to work within SLOs/SLAs and contribute to operational improvements
Experience supporting end users and handling break-fix scenarios
Skillset Nice to Have:
Experience with infrastructure-as-code (Terraform, Cloud Deployment Manager)
Exposure to ML/analytics-driven systems or forecasting platforms
Experience building or enhancing scalable data services or APIs
Advanced performance tuning and scalability optimization experience
Familiarity with retail, merchandising, or supply chain systems
Experience supporting globally distributed teams across time zones
Knowledge of automated alerting, runbooks, and operational playbooks
TECHNICAL SKILLS
Must Have
Application Reliability & Performance
Experience monitoring, troubleshooting, and tuning distributed systems
Hands-on use of observability tools (Grafana, Prometheus, Splunk, Datadog, etc.)
Ability to diagnose production issues (latency, memory, throughput)
Strong root-cause analysis and incident resolution skills
Software Engineering & Debugging
Proficiency in at least one backend language (Python, Java, Kotlin, or Scala)
Ability to read, debug, and modify existing codebases
Experience with APIs, microservices, and event-driven systems
Version control experience (Git) and structured change management
Data & Pipeline Operations
Working knowledge of data pipelines (batch and/or streaming)
Experience supporting tools like Apache Spark or similar frameworks
Understanding of data validation, logging, and error handling
Ability to monitor and troubleshoot ETL/ELT workflows
Cloud & Infrastructure
Hands-on experience with cloud platforms (AWS, Google Cloud Platform, or Azure)
Familiarity with containerization (Docker, Kubernetes)
Experience with CI/CD pipelines (Jenkins, GitLab CI, etc.)
Incident Response & Support
Production support or SRE experience (triage, escalation, resolution)
Ability to work within SLOs/SLAs and document incidents
Experience supporting end users and handling break-fix scenarios
Nice To Have
Experience with infrastructure-as-code (Terraform, Cloud Deployment Manager)
Exposure to ML/analytics-driven systems or forecasting platforms
Advanced performance tuning and scalability optimization experience
Familiarity with retail, merchandising, or supply chain systems
Experience supporting globally distributed teams across time zones
Knowledge of automated alerting, runbooks, and operational playbooks
Notes:
Hybrid preferred: onsite Wednesday/Thursday
VIVA USA is an equal opportunity employer and is committed to maintaining a professional working environment that is free from discrimination and unlawful harassment. The Management, contractors, and staff of VIVA USA shall respect others without regard to race, sex, religion, age, color, creed, national or ethnic origin, physical, mental or sensory disability, marital status, sexual orientation, or status as a Vietnam-era, recently separated veteran, Active war time or campaign badge veteran, Armed forces service medal veteran, or disabled veteran. Please contact us at for any complaints, comments and suggestions.
Contact Details:
Account co-ordinator: Ramadas Kumaresan
VIVA USA INC.
3601 Algonquin Road, Suite 425
Rolling Meadows, IL 60008