Do something big and innovative! Stretch your creative muscles and work on big issues. Since 1989, we have developed technology environments, applications, and tools by providing experienced teams to implement, enhance, and maintain our clients' essential systems and applications. Come join the Scalence team!
Job Title: ML Serving Operations Analyst
Duration: 12+ months
Location: Remote - Pacific work hours (must be local to the Bay Area)
Pay rate: Up to $55/hr, W2 with benefits
Job Summary:
The Resource Management team is responsible for end-to-end resource planning and provisioning on our client's infrastructure, including Budgeting, Compute, Storage, Accelerators & Network, and Data Center infrastructure resources, to support Engineering (Eng) and Site Reliability Engineering (SRE) service-related requests. This role is responsible for handling tactical execution tasks that cannot yet be automated, in order to improve service response times and reduce risk to the client's infrastructure.
The ideal candidate will have an engineering degree, such as computer science, experience running terminal commands, and a strong understanding of SQL and computer hardware terminology.
Requirements:
1. Respond to Pool Minding Alerts to proactively keep production service pools healthy and reduce reliability risk.
2. Manage Resource Requests from SRE/Eng to the FTE team for all infrastructure services.
3. Manage Supply Planning Operations, including ordering weekly resources (Machine Orders), writing the weekly health reports, monitoring in-progress orders, and escalating in case of SLO slippage for critical growth dependencies.
4. Establish migration execution plans to move services between locations to mitigate data center constraints.
5. Execute replacement plans for large-scale infrastructure projects, e.g. cluster turndowns, cluster migrations due to limited data center space, and service rebalancing due to resource constraints.
6. Assist in Special Projects (e.g. building data pipelines for automated reporting & metrics management).
7. Update vendor playbooks as processes change, subject to FTE review and approval.
Other requirements:
1. Required to attend weekly meetings with the client stakeholders and any additional meetings that the client deems necessary.
2. Required to provide written reports such as: Weekly Supply/Demand fulfillment status report; Weekly Flexpool low inventory alert report; Weekly Operation ticket queue report on aging tickets and reasons; and Operational project status report.
3. Respond to resource ticket requests.
4. Manage resource pool alerts and machine orders.
5. Support pool migrations.
6. Perform data analysis to measure operational performance.