About the Role:
We are seeking a highly skilled Platform Engineer with deep expertise in NVIDIA GPU architecture, Triton Inference Server, and large-scale system deployments. The ideal candidate will have hands-on experience optimizing and scaling inference workloads, integrating AI models into production environments, and fine-tuning performance across heterogeneous systems.
location: Jersey City, New Jersey
job type: Contract
salary: $75 - $90 per hour
work hours: 8am to 5pm
education: Bachelor's degree
responsibilities:
- Design, implement, and maintain high-performance GPU-powered inference infrastructure using NVIDIA GPUs and Triton Inference Server.
- Scale and optimize inference workloads to meet production demands, ensuring low-latency, high-throughput service delivery.
- Integrate Triton with various model frameworks (e.g., TensorFlow, PyTorch, ONNX) and data pipelines.
- Develop and maintain CI/CD pipelines for model deployment and system upgrades.
- Monitor GPU utilization and system performance; troubleshoot and resolve bottlenecks across the inference stack.
- Collaborate with data scientists, DevOps, and ML engineers to ensure smooth deployment of models into scalable environments.
- Implement strategies for GPU resource sharing, batching, dynamic model loading, and A/B testing.
- Stay current with the latest in GPU-based computing, NVIDIA ecosystem tools (e.g., CUDA, TensorRT), and emerging AI infrastructure practices.
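For context on the batching and resource-sharing work described above: in Triton, these behaviors are typically configured per model in a `config.pbtxt` file. The sketch below is illustrative only — the model name, tensor shapes, and tuning values are hypothetical, not taken from any actual deployment:

```
# Hypothetical Triton model configuration (config.pbtxt)
name: "resnet50_onnx"          # example model name
backend: "onnxruntime"
max_batch_size: 32              # upper bound for server-side batching

input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]       # example image input shape
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]

# Dynamic batching: Triton groups concurrent requests into batches,
# waiting up to the queue delay to fill a preferred batch size.
dynamic_batching {
  preferred_batch_size: [ 8, 16 ]
  max_queue_delay_microseconds: 100
}

# GPU resource sharing: run two instances of this model on GPU 0.
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

Tuning `preferred_batch_size`, queue delay, and instance counts against measured latency and throughput is a core part of this role's day-to-day optimization work.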
qualifications:
- Bachelor's or Master's in Computer Science, Electrical Engineering, or a related field.
- 3+ years of experience in system engineering with a focus on GPU-based inference systems.
- Proficiency with Triton Inference Server, TensorRT, and CUDA.
- Solid experience with containerization (Docker, Kubernetes) and infrastructure-as-code tools.
- Strong understanding of Linux systems, networking, and GPU resource management.
- Experience deploying and scaling AI/ML models in production environments.
- Familiarity with profiling tools and performance optimization techniques for GPUs.
skills:
- Triton Inference Server
- TensorRT and CUDA
- Docker and Kubernetes
- Infrastructure-as-code tooling
- Linux systems and networking
- GPU resource management and profiling
- Production AI/ML model deployment and scaling
Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.
At Randstad Digital, we welcome people of all abilities and want to ensure that our hiring and interview process meets the needs of all applicants. If you require a reasonable accommodation to make your application or interview experience a great one, please contact
Pay offered to a successful candidate will be based on several factors including the candidate's education, work experience, work location, specific job duties, certifications, etc. In addition, Randstad Digital offers a comprehensive benefits package, including: medical, prescription, dental, vision, AD&D, and life insurance offerings, short-term disability, and a 401K plan (all benefits are based on eligibility).
This posting is open for thirty (30) days.