Job Details
We are seeking a highly skilled and experienced Senior Storage Engineer to join our Infrastructure Engineering team.
This role focuses on designing, deploying, and maintaining high-performance storage solutions optimized for AI/ML workloads.
The ideal candidate will have deep hands-on expertise with Weka and VAST Data storage platforms, and a proven track record of supporting large-scale data environments in AI-driven organizations.
Key Responsibilities:
Design and implement scalable storage infrastructure to support AI/ML workloads, high-throughput data pipelines, and mission-critical applications.
Serve as subject matter expert for WekaIO and VAST storage platforms, ensuring optimal performance, reliability, and cost-efficiency.
Collaborate with AI, Data Science, and Engineering teams to understand workload characteristics and storage requirements.
Monitor system performance and manage capacity planning and lifecycle for storage environments.
Lead troubleshooting and root cause analysis for storage-related issues, ensuring high availability and minimal downtime.
Automate storage provisioning, monitoring, and reporting using scripting and infrastructure-as-code tools.
Evaluate and integrate new storage technologies and solutions to support emerging business and technical requirements.
Ensure security, data integrity, and compliance across all storage systems.
Mentor junior engineers and contribute to best practices documentation and architectural standards.
Required Qualifications:
Bachelor's or Master's degree in Computer Science, Information Systems, or a related field.
7+ years of experience in enterprise storage engineering roles, with at least 2-3 years focused on AI/ML infrastructure.
Deep hands-on experience with WekaIO (WekaFS) and VAST Data storage solutions, including deployment, tuning, and support.
Strong understanding of NVMe, RDMA, InfiniBand, and parallel file systems.
Solid experience with AI/ML workloads (e.g., TensorFlow, PyTorch), GPU compute environments (e.g., NVIDIA DGX), and high-throughput storage demands.
Proficiency with Linux systems, networking, and automation (e.g., Ansible, Terraform, Python scripting).
Familiarity with cloud-based storage architectures and hybrid cloud strategies.
Excellent troubleshooting, documentation, and communication skills.
Preferred Qualifications:
Experience working with Kubernetes, containerized AI/ML workflows, or data orchestration tools.
Knowledge of data protection, backup strategies, and disaster recovery in high-performance environments.
Exposure to other file/object storage platforms (e.g., NetApp, Dell PowerScale, Pure Storage) is a plus.