AI Infrastructure Architect
Location: Remote, USA
Duration: Long term
Note: The client needs a core AI Infrastructure Architect, particularly for areas requiring hands-on experience with AI infrastructure and development tools.
Job description:
Architect the Future: Lead the end-to-end design and development of our AI infrastructure, encompassing hardware, software, networking, and multi-cloud environments.
Innovate and Evaluate: Assess, select, and implement best-in-class technologies, tools, and frameworks (e.g., TensorFlow, PyTorch, Kubernetes, Docker) to build and maintain our AI platforms.
Optimize for Performance: Engineer and implement infrastructure that scales seamlessly with our evolving AI/ML needs, continuously monitoring and optimizing for both performance and cost-efficiency.
Champion Security and Compliance: Define and enforce infrastructure standards and best practices, ensuring unwavering compliance with security policies, data protection regulations, and ethical AI principles.
Build Data-Driven Pipelines: Collaborate on the architecture and implementation of highly efficient data pipelines for our AI models, from ingestion and storage to processing and management.
Lead and Inspire: Provide technical leadership and mentorship to cross-functional teams, fostering a culture of excellence and best practices in AI infrastructure.
Solve Complex Challenges: Diagnose and resolve intricate infrastructure issues, ensuring the high availability and reliability of all our AI systems.
Stay Ahead of the Curve: Keep your finger on the pulse of the latest advancements in AI, machine learning, and cloud computing, driving innovation within the organization.
Document for Success: Create and maintain comprehensive documentation for all AI infrastructure designs, implementations, and operational procedures.
What You'll Bring
Education: A Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Experience:
15+ years of experience in infrastructure architecture.
A minimum of 5 years of dedicated experience in designing and building AI-specific infrastructure.
Proven success in deploying scalable and secure AI solutions in cloud environments.
Extensive hands-on experience with containerization and orchestration technologies like Docker and Kubernetes.
Technical Prowess:
Proficiency with the command line and experience with both cloud-native and on-premise data center deployments.
A strong understanding of deep learning architectures and the latest developments in Large Language Models (LLMs).
Expertise in NVIDIA hardware and software, including performance tuning and diagnostics.
Hands-on experience with GPU systems, including performance testing, tuning, and benchmarking.
Proficiency in programming languages such as Python.
A deep understanding of cloud service models (IaaS, PaaS, SaaS) and cloud-native architectures.
In-depth knowledge of networking, storage, and security best practices in a cloud context.
Experience with Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
Familiarity with DevOps and MLOps principles and practices.
Soft Skills:
Exceptional problem-solving and analytical abilities with a data-driven approach.
Excellent communication and interpersonal skills, with the ability to articulate complex technical concepts to diverse audiences.
Proven ability to lead, mentor, and collaborate effectively within a team environment.
A strategic mindset with the ability to align technical solutions with overarching business goals.
A proactive, adaptable approach and a commitment to continuous learning in a dynamic technological landscape.