Job Description
Senior WEKA Storage Solutions Engineer (HPC Environment)
Must be an architect/designer-level engineer on WEKA.
Full-time position; the client would also consider a contractor who comes from a hedge fund or a large financial institution.
Location: Manhattan, NY (hybrid); fully remote is possible for candidates located outside the area.
Role Summary: Seeking an architect-level storage engineer to design, build, and operate large-scale WEKA distributed storage environments for high-performance computing (HPC) workloads. The role focuses on performance, scalability, automation, and operational excellence across on-premises and cloud storage infrastructures.
Key Responsibilities
- Design, build, and maintain large-scale HPC WEKA storage and compute environments to support business growth.
- Optimize performance, scalability, and capacity across high-performance storage systems.
- Automate deployment, configuration, and monitoring processes using Chef, Ansible, and Python.
- Support hybrid storage architectures spanning on-premises and cloud (AWS, Google Cloud Platform).
- Collaborate with global engineering and infrastructure teams to enforce standards and enhance system reliability.
- Troubleshoot complex performance and integration issues across hardware, OS, and distributed systems layers.
- Implement observability solutions (Prometheus, Grafana, Datadog, ELK) to monitor and tune storage performance.
- Develop and maintain infrastructure as code for storage and compute environments.
Requirements
- 5+ years in infrastructure engineering with a focus on distributed storage systems in a Linux environment.
- Proven hands-on experience with WEKA or similar parallel or distributed file systems (GPFS, Lustre, Ceph).
- Strong Python skills for automation and tool development.
- Experience managing petabyte-scale storage systems.
- Familiarity with containers, hypervisors, and public cloud infrastructure (AWS, Google Cloud Platform).
- Solid understanding of CI/CD, version control, and modern infrastructure practices.
- Bachelor's degree in Computer Science, Engineering, or related field preferred.
- Strong problem-solving skills and the ability to work independently in fast-paced environments.
Technologies & Tools: WEKA, GPFS, Lustre, Ceph, Linux, Python, Chef, Ansible, Prometheus, Grafana, Datadog, ELK, AWS, Google Cloud Platform, containers, hypervisors, CI/CD, and infrastructure-as-code tooling.