AI Infrastructure & Experience Engineer

Mountain View, CA, US • Posted 2 hours ago • Updated 2 hours ago
Contract W2
4 Months
No Travel Required
On-site
Depends on Experience

Job Details

Skills

  • API
  • ARM
  • Artificial Intelligence
  • C++
  • Caching
  • CUDA
  • Computer Hardware
  • GPU
  • Docker
  • Design Of Experiments
  • JavaScript
  • Machine Learning (ML)
  • Kubernetes
  • Optimization
  • React.js
  • Python
  • SPIN
  • Orchestration
  • Performance Metrics
  • WebSocket
  • Computer Science
  • Network
  • SEC
  • Debugging
  • Communication
  • Workflow
  • Rust

Summary

A global consumer device company based in Mountain View, CA is looking for an AI Infrastructure & Experience Engineer to join their team.

Key Responsibilities

  • Deploy and tune multiple LLMs and generative multimodal models on local inference hardware. Optimize performance metrics (TTFT, tokens/sec) via model quantization, caching strategies, and architecture-specific adjustments.
  • Leverage deep knowledge of the CUDA environment to build custom kernels, ensuring maximum utilization of low-cost GPU compute.
  • Seamlessly bridge inference backends with orchestration layers (LiteLLM, Ollama, etc.) and frontends like OpenWebUI.
  • Build functional, high-fidelity demos showcasing model memory capabilities, agentic workflows, and context-aware web search.
  • Implement communication protocols to bridge local AI compute with peripheral devices, including smart TVs, household appliances, and XR hardware.


Qualifications

  • 3 years of relevant industry experience required
  • Recent experience in model optimization required
  • Proven experience with NVIDIA ecosystems and ARM64 architecture.
  • Advanced proficiency in C++, Python, and Rust. Deep familiarity with CUDA and the ability to author/debug custom CUDA kernels for compute-intensive tasks.
  • Extensive experience with modern inference engines (llama.cpp, TensorRT-LLM, Ollama) and orchestration frameworks (LiteLLM).
  • Robust understanding of asynchronous programming (FastAPI), containerization (Docker/Kubernetes), sandbox environments, and API design for low-latency communication.
  • Ability to quickly spin up modern frontend UIs (React, Next.js, or similar) to present AI-driven intelligence to end users.
  • Familiarity with WebSockets, gRPC, and REST for device-to-device communication in a local network environment.
  • Degree in Computer Science, or a specialization in Machine Learning or Artificial Intelligence, preferred but not required.



Type: Contract
Duration: 4 months with extension
Work Location: Mountain View, CA (onsite)
Pay range: $64.00 - $79.00 (DOE)

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 10365912
  • Position Id: 9005646

Company Info

About OSI Engineering, Inc.

OSI Engineering delivers professional engineering consultants and contractors to enable you to meet your time-to-market demands. Our technical knowledge of your specific technology streamlines the process of delivering the right engineer with the right technical expertise to add value with minimal ramp-up time. Additionally, on-call access to our highly skilled engineering pool enables your business to stay ahead of the curve.

Contact the job poster
Darshana Desai

Recruiter @ OSI Engineering, Inc.