Staff Engineer AI Compute
Remote • Posted 3 hours ago • Updated 3 hours ago

Cloud Destinations LLC
Job Details
Skills
- AI Compute
- Infrastructure as code
- Kubernetes
- Helm
- Argo CD and Argo Workflows
- ML Infrastructure
Summary
- The team has undergone a transition in the past charter; they've taken on more responsibilities in the AI space. The investment will grow going forward, so the Infrastructure teams will need to support them in that respect.
- Understanding of how ML works in large companies would be a bonus, as would familiarity with Kubernetes (for example, have they built operators on Kubernetes?)
- Expertise in infrastructure as code: Chef, Puppet, Terraform, Spacelift
- Screening should focus on Kubernetes, Helm, and adopting Argo CD and Argo Workflows
- Preferred experience lower in the stack: Chef, Salt, and Puppet
- Experience working with cloud providers (the client is an AWS shop)
- The preferred AI compute experience centers on MLOps experience
- Understanding of how a GPU stack works (i.e., how it works across its lifecycle)
- Fundamentals of how a GPU gets used for ML models.
- How do they use Memory in the
- Interviews may include coding exercises (algorithms and data structures); for example, if you're given a linked-list data structure, can you reverse it?
- We have an ongoing ability to Slack the HM in the event of any questions about the tech stack
- Provide technical leadership on high-impact projects
- Influence and coach a distributed team of engineers
- Drive reliability, cost efficiency and capability enhancements for GPU fleet
- Facilitate cross-team alignment on goals, outcomes, and timelines
- Manage project priorities, deadlines, and deliverables
- Contribute to and execute the multi-year strategy for AI Compute Platform
- Design, develop, test, deploy, maintain, and enhance the AI Compute Platform
Your Expertise:
- BS, MS or Ph.D. in computer science or related field, or equivalent work experience
- 7+ years of relevant work experience in infrastructure
- 4+ years of expertise with a public cloud provider (AWS, Google Cloud Platform, Azure) and its infrastructure-as-a-service offerings (e.g., EC2).
- Experience setting technical direction, planning, and successfully executing on large projects spanning multiple teams
- Kubernetes experience is required.
- ML infrastructure experience (LLM fundamentals, tuning, optimization) is preferred.
- Dice Id: 91097117
- Position Id: 8871230
Company Info
One of the leading US-based staffing and IT consulting partners. Experience exceptional service and top-tier talent across industries. Count on us for staffing solutions that cater to the unique demands of the American market.
Our experienced recruiters ensure a seamless fit within your team, accelerating success. But we go beyond staffing and empower employees with fully sponsored certification programs, keeping them ahead. Experience comprehensive benefits including health, wellness coverage, dental insurance, vision insurance, as well as flexible hours, remote work options, and a robust 401K plan to ensure a secure future at the companies we represent.
At Cloud Destinations, we bring industry expertise and a passion for excellence. From Enterprise Cloud Strategy to Managed Infrastructure Services, Digital Transformation, BI & Data Analytics, Security, Data Engineering, and more, we navigate the IT landscape with finesse. Choose us as your trusted partner and witness transformative talent and exceptional service. Let's unlock new possibilities and drive your success in the dynamic world of IT together.
