Software Engineer II

  • Austin, TX

Overview

On Site
Full Time

Skills

Video
Storage
Python
Kubernetes
Management
Recovery
Amazon Web Services
Google Cloud
Google Cloud Platform
Cloud Computing
Git
Writing
Collaboration
PyTorch
API

Job Details

Position: Software Engineer II
Location: Austin, TX (100% remote)
Interview: Video
Duration: 9+ months

Fully remote (any timezone will work)
Working hours per week: 20 hrs (part-time)

Job Description (including duties, skills, education):
2-3 years of work experience is acceptable.
Build a framework for managing jobs across a mix of on-premises and external cloud compute. Each job is allocated a compute node, on which it loads an LLM, retrieves queries from remote storage, and generates the LLM's responses to those queries. The framework must be robust and fault-tolerant: jobs must be restartable or recoverable after a failure, and child nodes must be shut down cleanly when a job succeeds.

Required skills include:
* Experience working with Python
* Experience working with LLM inference libraries (e.g. vLLM, transformers, or nemotron)
* Experience with Kubernetes, including scheduling tasks on the available infrastructure and coordinating containers
* Experience building robust distributed applications that allow graceful failure and recovery
* Experience working with cloud platforms such as AWS or Google Cloud Platform
* Experience using basic Git functionality, including branching, pull requests, code reviews, and merging
* Experience writing unit tests
* Excellent collaboration skills

Bonus skills include:
* Experience with PyTorch
* Experience with API design

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.