Sr Site Reliability Engineer

Overview

On Site
Accepts corp to corp applications
Contract - Independent
Contract - W2
Contract - 6 months

Job Details

MatchPoint Solutions is a fast-growing, young, energetic global IT-Engineering services company with clients across the US. We provide technology solutions to clients such as Uber, Robinhood, Netflix, Airbnb, Google, Sephora, and more! More recently, we have expanded internationally to Canada, China, Ireland, the UK, Brazil, and India. Through our culture of innovation, we inspire, build, and deliver business results, from idea to outcome. We keep our clients on the cutting edge of the latest technologies and provide solutions by using industry-specific best practices and expertise.

We are excited to be continuously expanding our team. If you are interested in this position, please send over your updated resume. We look forward to hearing from you!

Job Title: Sr Site Reliability Engineer (IPP Team)

Number of Roles: 2 - two locations

Target Start Date: Beginning of Feb

Location: Austin, TX or Santa Clara, CA (some on-site work required)

Rate: $65-78/hr on W2

Responsibilities:

  • Develop frameworks and scripts to automate workflows and deployments in the private cloud environment.
  • Deploy and maintain a large farm of machines using world-class Configuration Management and Infrastructure as Code (IaC) tools such as Chef, Ansible, and Terraform.
  • Develop extensive monitoring systems that provide a fast, reliable, real-time pulse of the various infrastructure subsystems (Zabbix, Grafana).
  • Participate in on-call & rotational L1 support for round-the-clock monitoring and remediation of the infrastructure.
  • Work closely with multi-functional teams across the globe to ensure service uptime and SLAs are maintained.
  • Solve complex problems involving multi-site infrastructure scaling, leading GPU product bring-up in the infrastructure, integrating GPU test suites into the infrastructure harness, etc.
  • Automate and tune the performance of regression test frameworks, and create self-healing/automated recovery solutions for multi-geo regression farms.
  • Assist in the roll-out and deployment of new development features aimed at supporting the latest NVIDIA hardware and technologies.

Looking for:

  • Bachelor's or Master's Degree in Computer Science or Software Engineering, or equivalent demonstrable experience.
  • 2+ years of relevant experience in cloud technologies.
  • Ability to analyze and debug source code to triage, root-cause, and resolve issues in the infrastructure, and to collaborate with the development teams on improving the build and test infrastructure.
  • Familiarity with the setup and maintenance of Linux and Windows hosts and popular open source applications such as Nginx, Apache HTTP Server, Apache Tomcat, and MySQL Server.
  • Hands-on programming experience in one or more languages, including but not limited to Python (preferred), Tcl, and Java. Unix shell proficiency is expected.
  • Experience with MySQL (NoSQL is a plus), including the ability to write complex queries.
  • Experience with version control systems such as Perforce and Git is expected.

Stand out with:

  • Experience with public clouds (AWS, Google Cloud Platform, Azure) and with VM and container virtualization technologies such as VMware, KVM, Docker, and Kubernetes.
  • Background in automating bare metal and VM provisioning.
  • Experience debugging GPU performance issues, and with embedded device software development and automation, software driver development, and CUDA/TensorRT applications.

Benefits include Medical, Dental, Vision and 401k.

MatchPoint Solutions provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training.