Mortgage Company
AI/ML Platform Engineer
MUST BE local to the Columbus, OH area
Can work remotely BUT MUST be in the Columbus, OH area
Needed ASAP
DIRECT HIRE: must work on W2
We're looking for a Platform Engineer with proven experience in Artificial Intelligence/Machine Learning to join our team. In this role, your primary focus will be on designing, building, and optimizing robust, scalable platforms that empower our development and engineering teams to develop, deploy, and manage AI/ML solutions efficiently. You will collaborate closely with cross-functional teams to automate workflows, streamline model deployment, and ensure the reliability and performance of our AI/ML infrastructure. This is a hands-on engineering role where you'll have the opportunity to shape our AI/ML strategy, implement best practices, and drive innovation across the organization.
What You'll Do:
Artificial Intelligence / Machine Learning
- Design, build, and maintain scalable AI/ML platform infrastructure to support the full lifecycle of machine learning models, from development to deployment and monitoring.
- Develop, automate, and optimize data pipelines for efficient data ingestion, preprocessing, and feature engineering to enable robust model training and evaluation.
- Develop and enforce model governance policies around version control, lineage, and auditability to ensure compliance and reproducibility.
- Implement data validation and quality assurance frameworks to detect and handle anomalies or data drift before they impact model performance.
- Collaborate with development and engineering teams to define requirements, integrate AI/ML capabilities, and deliver end-to-end solutions.
- Optimize AI/ML workloads for hardware accelerators such as GPUs, ensuring efficient resource utilization and scalability across multiple compute nodes.
- Evaluate, select, and configure hardware components (CPUs, GPUs, memory, storage) to meet the demands of large-scale AI/ML training and inference workloads.
Platform Engineering & Automation
- Security and data protection are job zero.
- Design, develop, and maintain automated deployment workflows for software delivery.
- Implement and support CI/CD pipelines to streamline application and infrastructure changes.
- Manage and optimize version control systems to facilitate efficient development collaboration.
- Work with compute and container platforms to ensure scalable and resilient deployments.
- Utilize cloud services to enhance infrastructure automation and optimize cost efficiency.
- Enhance observability, monitoring, and logging for deployment workflows.
- Collaborate with development and operations teams to troubleshoot and improve platform performance.
- Maintain security best practices across deployment processes and infrastructure.
- Collaborate effectively with cross-functional teams to contribute to platform initiatives and provide technical support across multiple projects, ensuring alignment with broader team goals.
- Provide maintenance and support for SaaS services (e.g., ServiceNow, GitHub, Harness), ensuring system health, troubleshooting issues, and optimizing performance.
- Complete Platform Engineering efforts, assignments, and projects in a timely manner as directed by your manager.
- Attend Agile meetings, report status, and document progress in the chosen task-tracking system (e.g., Jira).
- Write clear and concise Standard Operating Procedures and team documentation in the chosen documentation system (e.g., Confluence).
- Report and communicate successes as well as shortcomings with courage.
- Other duties as assigned.
What You'll Need:
- High school diploma or equivalent required.
- Ambition to work toward obtaining relevant certifications (Terraform, Harness, AWS, etc.).
- Experience with version control platforms (e.g., GitHub, Bitbucket).
- Knowledge of compute platforms (e.g., VMware, OpenShift, Kubernetes).
- Experience with cloud services (AWS preferred).
- Proficiency in scripting and automation (e.g., Python, Go, Bash, Terraform, Ansible).
- Understanding of Infrastructure as Code (IaC) and deployment automation.
- Strong problem-solving skills and ability to work in a collaborative environment.
- Proficiency in the following infrastructure-related areas: Unix/Linux operating systems, Windows operating systems, storage (NAS/SAN) concepts, VMware and virtualization concepts, and networking fundamentals.
- Experience with ServiceNow Integration Hub.
- Understanding of application lifecycle management.
- Skills in writing, modifying, and debugging code.
- Excellent communication and interpersonal skills.
- Courageous about reporting and communicating successes as well as shortcomings.
- Strong organizational and multi-tasking skills.
- Proven track record of innovation and forward thinking.
- Extreme sense of ownership.