Site Reliability Engineer

Lake Buena Vista, FL, US • Posted 3 days ago • Updated 7 hours ago

Contract W2

On-site

$83 - $90/hr

Job Details

Skills

Reporting
IaaS
Generative Artificial Intelligence (AI)
Microsoft Dynamics AX
DaaS
JIRA
Splunk
AppDynamics
Scalability
Backup Administration
Capacity Management
Artificial Intelligence
Business Continuity Planning
Mobile Applications
UI
Node.js
TypeScript
IT Management
Adaptability
Data Management
Collaboration
User Experience
Technical Writing
Emerging Technologies
Mobile Development
Innovation
Kubernetes
Configuration Management
Terraform
Management
Google Cloud
Google Cloud Platform
Amazon Web Services
Scripting
Python
Bash
YAML
Workflow
PostgreSQL
Redis
Apache Kafka
MongoDB
Continuous Integration
Continuous Delivery
Orchestration
GitHub
GitLab
Jenkins
Microsoft Azure
DevOps
Agile
Scrum
Technical Direction
Mentorship
Communication
System Security
Identity Management
Data Security
Regulatory Compliance
Leadership
Cloud Computing
Computer Science
Information Systems

Summary

Job Title: Site Reliability Engineer
Job Location: Lake Buena Vista, Florida 32830
Job Duration: 22 months (~ 1 year and 10 months) Contract

About the Role & Team

The Lead Site Reliability Engineer will report to the Sr. Manager, Generative AI Engineering and play a key role in guiding the JedAI team's cloud infrastructure and generative AI platform reliability strategy.

You will lead infrastructure strategy across multi-cloud environments (Google Cloud Platform, AWS, and Azure) supporting our Generative AI and Conversational Experience platforms.

You'll modernize and manage applications including LiteLLM, Open WebUI, Archestra, and Arize AX, and support back-end systems such as Kafka, PostgreSQL, Redis, Vault, MongoDB, and n8n, ensuring they meet our internal UI and security standards.

What You'll Do

Plan, design, and build Helm charts and Terraform infrastructure code to maintain a 99.99% annual availability SLA.
Lead and mentor a team of Site Reliability Engineers and DevOps specialists within our AI platform.
Architect, design, and maintain infrastructure environments supporting AI and data service workloads across Google Cloud Platform (primary), AWS (secondary), and Azure (tertiary).
Identify, plan, and assign work to peer team members in Jira.
Review and provide feedback on platform sizing and volume estimations.
Assist the capacity planning team to ensure scalability boundaries are aligned with expected workloads.
Implement our observability, monitoring, alerting, and tracing best practices across platform components (Splunk, OpenTelemetry, Prometheus, AppDynamics).
Plan, design, and implement automated deployment processes via Harness.
Plan, design, and implement modern enterprise rollout patterns such as blue/green deployments, canary deployments, and feature flags.
Provide guidance to the platform architecture team with respect to solution infrastructure and scalability.
Establish and support operational maintenance processes including backups, version updates, capacity planning, and security patching.
Evaluate and pitch recommendations on emerging DevOps and SRE technologies, influencing our reliability strategy across AI & platform teams.
Ensure team compliance with our governance, security, and business continuity frameworks.
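
To put the 99.99% annual availability target in concrete terms, the short sketch below converts an availability percentage into a downtime allowance. It is illustrative arithmetic only: the function name `allowed_downtime_minutes` and the 365-day year are assumptions, and the posting does not say how the SLA is actually measured.

```python
def allowed_downtime_minutes(availability_pct: float, days: float = 365.0) -> float:
    """Return the minutes of downtime permitted over `days` at the given availability.

    Hypothetical helper for illustration; assumes a simple time-based SLA.
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = days * 24 * 60  # 525,600 minutes in a 365-day year
    return total_minutes * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    for target in (99.9, 99.99):
        print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/year")
```

Under these assumptions, 99.99% leaves roughly 52.6 minutes of downtime per year, compared with about 525.6 minutes at 99.9%.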

Why This Role is Needed

Rapid Growth and Innovation

The Digital Architecture & Engineering team is experiencing rapid growth and expansion, requiring an architect to guide the development of our mobile platform to meet evolving user needs and business objectives.

Complex Architecture

Our mobile application uses a complex architecture involving Flutter, Server Driven UI, Node.js, TypeScript, Runtime, and Cloud services (AWS/Google Cloud Platform).
This requires a deep understanding of these technologies and the ability to design a cohesive and efficient system.

Technical Leadership

We need a strong technical leader who can mentor and guide our development team, ensuring best practices, code quality, and efficient development processes.

Future Proofing

The Lead Software Architect will be responsible for designing scalable and adaptable architecture that can accommodate future growth, new features, and evolving technologies.

What You Will Do

Define and implement the overall mobile architecture, including backend integration and data management.
Lead the development of new features and functionalities, ensuring alignment with business requirements and user needs.
Collaborate with cross-functional teams (design, product, backend) to ensure seamless integration and optimal user experience.
Develop and maintain technical documentation, including architecture diagrams, design specifications, and coding standards.
Mentor and guide junior developers, fostering a culture of continuous learning and improvement.
Stay abreast of emerging technologies and trends in mobile development, identifying opportunities for innovation and improvement.

Basic Qualifications

7+ years of SRE, DevOps, or platform engineering experience.
Expert in Kubernetes operations, cluster scaling, and Helm-based configuration management.
Advanced knowledge of Terraform and Harness for automated deployment and configuration.
Proven experience managing multi-cloud services on Google Cloud Platform, AWS, and Azure.
Strong scripting in Python, Bash, and YAML for automation and reliability workflows.
Experience with PostgreSQL, Redis, Kafka, MongoDB, and Vault in production environments.
Proficiency in CI/CD orchestration technologies (Harness, GitHub Actions, GitLab, Jenkins, and Azure DevOps) with deployment automation, feature flags, and observability.
Self-motivated with strong leadership ability in Agile/Scrum environments; ability to set technical direction and mentor peers.
Strong written communication skills, particularly in clearly explaining technical topics to less-technical audiences.
Outstanding troubleshooting and diagnostic skills across distributed systems.
Deep understanding of system security, identity management, and data protection compliance models.

Preferred Qualifications

Prior leadership in hybrid cloud environments.
Experience leading large infrastructure-focused initiatives.

Education

Bachelor's degree in Computer Science, Information Systems, or equivalent relevant experience.
Master's degree preferred.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 10105282
Position Id: 881624