This is Srikanth from Reliable Software. We currently have an opportunity with one of our direct clients and would like to share the details with you. Please review the information below and let me know if you are interested. If you would like to be considered, kindly share your updated resume at
Job Title: Lead Infrastructure Engineer with AWS and Databricks.
Location: Remote
Duration: 6-month contract-to-hire
Job Description:
We are seeking a highly experienced Senior AWS / Databricks Infrastructure Engineer to join our platform engineering team. This individual will play a critical role in designing, scaling, and operating our cloud-based data platform, with a strong focus on AWS networking, Databricks infrastructure, and enterprise data integrations.
This is a hands-on, senior-level role requiring deep technical expertise and the ability to operate independently in a fast-paced environment. The ideal candidate is a proactive problem solver who can quickly onboard, take ownership of complex infrastructure challenges, and drive operational excellence.
Responsibilities
Cloud Infrastructure and Networking
- Design, implement, and manage complex AWS networking environments, including:
  - CIDR planning, subnet design, and IP address management
  - IPv4 exhaustion mitigation strategies and subnet sizing optimization
  - VPC architecture, peering, and multi-account connectivity
- Troubleshoot and resolve advanced networking issues across distributed cloud environments
- Ensure scalable and secure network architecture across all AWS accounts
Databricks Platform Operations
- Own and optimize Databricks deployments on AWS, including:
  - Workspace networking configuration
  - Compute cluster scaling, performance tuning, and cost optimization
  - IP allocation and release management for ephemeral job clusters
- Support large-scale, production-grade data workloads using the Databricks Lakehouse Platform
Infrastructure as Code & Automation
- Develop and manage infrastructure using Terraform across multi-environment and multi-account setups
- Implement modular, reusable infrastructure patterns aligned with best practices
- Drive automation across infrastructure provisioning and operational workflows
CI/CD & Platform Engineering
- Build, optimize, and maintain CI/CD pipelines for infrastructure, applications, and data workloads
- Improve pipeline reliability, performance, and deployment automation
- Partner with engineering teams to enable seamless continuous delivery
Kubernetes and Platform Operations
- Operate and manage EKS/Kubernetes environments:
  - Deploy workloads using Helm and kubectl
  - Configure observability, logging, and monitoring
  - Troubleshoot cluster and application issues
- Support a broad platform ecosystem including AWS, Databricks, and containerized services
Enterprise Integrations
- Manage and troubleshoot integrations between:
  - AWS services
  - Databricks
  - Snowflake, Redshift, and other enterprise platforms
- Ensure reliability and performance across interconnected systems
Operational Excellence and Governance
- Strengthen infrastructure governance processes, including:
  - RFC/change management
  - Release coordination
  - Incident response and root cause analysis
- Improve platform reliability, observability, and operational practices
Qualifications
- 5+ years of experience building and operating large-scale cloud infrastructure
- 2+ years in a data-focused DevOps, SRE, or platform engineering role
- Deep expertise in AWS architecture and networking, including multi-account environments
- Strong hands-on experience with:
  - VPC design, CIDR/IP planning, subnet management, and VPC peering
  - AWS IAM and cloud security best practices
- Proven experience managing infrastructure with Terraform at scale
- Experience building and optimizing CI/CD pipelines (e.g., GitHub Actions, Jenkins)
- Strong operational mindset with experience in incident management and production support
- Ability to work independently and ramp up quickly in complex environments
Preferred Qualifications
- Hands-on experience with the Databricks Lakehouse Platform, including:
  - Unity Catalog
  - Delta Lake
  - MLflow
- Experience administering Kubernetes (EKS) clusters in production environments
- Familiarity with large-scale data platforms and distributed systems
Educational Qualifications:
- Required: Bachelor’s degree in Computer Science, Information Technology, Computer Engineering, or a closely related field, or equivalent.
- Preferred: Master’s degree in Management Information Systems (MIS), Computer Science, Big Data, or Analytics, or equivalent.
Travel:
- Open to travel based on the nature of the engagement.
Thanks & Regards
Srikanth Donkani, Resource Manager | Reliable Software
Direct:
AI & Analytics Generative AI Machine Learning Cloud DevOps SAP Data Engineering Data Science Databricks Snowflake
Industries: Government | Healthcare | Banking | Manufacturing | Retail
ISO Cert: 9001 | 27001
Equal Employment Opportunity: Reliable Software does not discriminate in employment on the basis of race, religion, gender, sexual orientation, age, or any other basis covered by federal, state, or local law. Employment decisions are based solely on qualifications, merit, and business needs.