Site Reliability Engineer - 5 days onsite, NoHo, NYC

Overview

On Site
150k - 250k
Full Time

Skills

Financial Services
Finance
Artificial Intelligence
IaaS
High Availability
Scalability
Collaboration
DevOps
Workflow
Incident Management
Database
Documentation
Standard Operating Procedure
Computer Science
Information Technology
Amazon Web Services
Microsoft Azure
Amazon EC2
Amazon S3
Virtual Private Cloud
Management
Kubernetes
Linux Administration
Shell Scripting
Terraform
Scripting
Bash
Python
Computer Networking
TCP/IP
DNS
Firewall
Continuous Delivery
Jenkins
GitLab
Continuous Integration
GitHub
Cloud Computing
Regulatory Compliance
SAP BASIS

Job Details

Site Reliability Engineer

This company is developing AI thought partners designed to enhance human intelligence and creativity, transforming how knowledge is created and shared in financial services. The team is unapologetically ambitious and driven by a clear goal: to build the world's leading Financial AI company.

The company is located in NoHo, NYC, and the role is onsite 5 days a week.

What You Will Be Doing:
  • Cloud Infrastructure Management: Design, implement, and maintain robust cloud infrastructure on AWS and/or Azure to ensure high availability, scalability, and fault tolerance.
  • Monitoring & System Health: Leverage Datadog to build proactive monitoring and alerting systems, enabling rapid detection and resolution of performance issues.
  • Kubernetes & Container Management: Administer and optimize Kubernetes clusters, utilizing Helm for efficient package management and deployment automation.
  • Automation & Infrastructure as Code: Develop and maintain Infrastructure as Code (IaC) using Terraform; automate routine tasks with scripts written in Bash or Python.
  • Cross-Functional Collaboration: Partner with development and operations teams to foster a DevOps mindset, streamline CI/CD workflows, and implement best practices.
  • Incident Response & Troubleshooting: Diagnose and resolve complex issues across OS, networking, and database layers in cloud-based environments.
  • Documentation: Create and maintain thorough documentation of infrastructure configurations, standard operating procedures, and troubleshooting playbooks.

Required Skills & Experience:
  • Bachelor's degree in Computer Science, Information Technology, or a related discipline.
  • 3-5 years of hands-on experience with AWS and/or Azure, including services such as EC2, S3, VPC, and Lambda.
  • 2-3 years managing Kubernetes clusters in production environments.
  • 2-3 years of experience using Helm for Kubernetes application deployments.
  • 2-3 years working with monitoring platforms like Datadog.
  • 3-5 years of experience in Linux system administration and shell scripting.
  • 2-3 years of experience with Infrastructure as Code (Terraform preferred).
  • Strong scripting abilities in Bash and Python.
  • Solid understanding of networking concepts, including TCP/IP, DNS, firewalls, and load balancers.
  • Experience with CI/CD tools such as Jenkins, GitLab CI, or GitHub Actions.
  • Familiarity with cloud-native security practices and regulatory compliance standards.
Applicants must be currently authorized to work in the United States on a full-time basis now and in the future.
This position doesn't provide sponsorship.

About Motion Recruitment Partners, LLC