Infrastructure Engineer

Overview

On Site
Depends on Experience
Full Time

Skills

Apache Flink
Apache Spark
Cloud Computing
Docker
Kubernetes
Cluster Management
Workflow
Python
Machine Learning (ML)

Job Details

Key Responsibilities:
Design, build, and scale robust distributed compute infrastructure optimized for high-throughput offline processing.
Architect and develop scalable solutions leveraging frameworks like Ray, Spark, Dask, Hadoop, Flink, or Beam.
Interface directly with quantitative researchers and data-driven teams to ensure their computational workloads run efficiently.
Implement and operate cluster management technologies such as Kubernetes, Docker Swarm, YARN, SLURM, or Nomad.
Troubleshoot and optimize performance bottlenecks in large-scale distributed systems.
Develop tools to parallelize, automate, and streamline research processes and workflows.
Continuously evaluate and recommend improvements to the compute infrastructure, understanding trade-offs across different technologies and cloud platforms.

Qualifications:
Strong programming skills in Python.
Demonstrated expertise in distributed, parallel, or cloud computing frameworks.
Hands-on experience designing and scaling offline compute systems.
Proven ability to build and optimize compute clusters and distributed architectures.
Experience collaborating closely with client teams or researchers who rely on technical solutions.
Demonstrated curiosity and passion for computing infrastructure and technology.

Preferred Qualifications:
Experience working directly with quantitative or ML-focused researchers.
Prior experience designing and deploying compute infrastructure from the ground up.
Familiarity with modern, cloud-native technology stacks (e.g., AWS EMR, AWS Batch, Kubernetes).
Experience working in environments involving machine learning or data-intensive workloads.
Familiarity with vendor-specific solutions like AWS ParallelCluster, Azure Batch, or Google Kubernetes Engine.

What We Offer:
A greenfield opportunity to build and significantly influence the firm's technical infrastructure.
Exposure to a diverse set of challenging problems, large-scale datasets, and cutting-edge technologies.
An innovative, high-impact environment with significant collaboration across research and engineering teams.
The chance to work with massive datasets, including historical trading data, textual data, and alternative datasets, processing petabytes of information in an offline computational setting.
An empowering and rapidly growing organization with a culture emphasizing technical excellence and innovation.


About Application Management Services LLC