Sr. Site Reliability Engineer

  • Austin, TX
  • Posted 1 day ago | Updated 4 hours ago

Overview

On Site
USD 119,000.00 - 150,000.00 per year
Full Time

Skills

Business Operations
Acquisition
Financing
Repair
Workflow
Finance
Software Support
Enterprise Software
Systems Engineering
Continuous Improvement
High Availability
Disaster Recovery
Recovery
RPO
Reporting
Scripting
Terraform
Analytics
Forecasting
Regulatory Compliance
Documentation
Incident Management
Design Review
Management
Research
Computer Science
SaaS
Continuous Integration
Continuous Delivery
Scalability
Capacity Management
Network
Operating Systems
New Relic
Grafana
Analytical Skill
Conflict Resolution
Problem Solving
Attention To Detail
Microservices
Java
RabbitMQ
SQL
NoSQL
Database
Nginx
F5
Linux
Communication
Collaboration
Amazon Web Services
Cloud Computing
Apache Kafka
Snowflake
Databricks
API
Retail
Testing
Military
Law

Job Details

About Us

CDK Global is a leading provider of cloud-based software to dealerships and Original Equipment Manufacturers ("OEMs") across automotive and related industries. The Company's cloud-based, software as a service ("SaaS") platform enables dealerships to manage their end-to-end business operations including the acquisition, sale, financing, insuring, repair, and maintenance of vehicles. By automating and streamlining critical workflows, the integrated platform of modern solutions enables dealers to sell and service more vehicles by creating simple and convenient experiences for customers and improves their financial and operational performance.

At CDK Global, we are focused on connections that allow us to deliver world-class software, support, and data insights. Our values define who we are and how we show up for each other, our customers, and our communities.

Our values: Stay Curious, Own It, Be Open, Create Possibilities

The CDK Global technology team is looking for collaborative innovators passionate about making their mark on emerging enterprise software products. We are crafting and honing cloud technology for the automotive retail industry that will change the landscape for automotive dealers, original equipment manufacturers (OEMs), independent software vendors (ISVs), and their customers. One of the key roles we are looking for is a site reliability engineer, who will have the opportunity to manage the solutions and cloud infrastructures for our Unified Integration Platform. If combining software and systems engineering expertise to build and run large-scale, massively distributed, fault-tolerant, highly available, and performant enterprise-grade solutions is your passion, this role is for you.

Growth potential, flexibility, and material impact on the success and quality of next-gen solutions make CDK an excellent choice for those who thrive in challenging, fast-paced engineering environments. The possibilities for impact are endless. We have exceptional opportunities to evolve our industry by driving change through modern technologies.

If you are an engineer who is passionate about technology solutions, wants to work with the best software craftsmen in the industry, and is looking for an exciting career with a leader in the automotive retail vertical, blazing the trail on the digital frontier, you may have found your new home.

Responsibilities:

  • Engage in and improve the whole lifecycle of solutions, from inception and design, through to build/test, deployment, operation, and refinement

  • Lead the planning, execution, and continuous improvement of High Availability (HA) and Disaster Recovery (DR) testing. Own the validation of Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) across critical systems

  • Ensure our solutions are reliable, fault-tolerant, secure, efficiently scalable, available, reachable, and cost-effective

  • Establish meaningful SLAs and SLOs, proactively track compliance against them, report on trends, champion them with the relevant product teams, and help ensure compliance

  • Measure, monitor, and proactively alert on resource consumption, error rates, traffic anomalies, availability, performance, reachability, and overall system health using New Relic, Grafana, and other tools

  • Prevent disruptions to users. If a disruption does occur, respond quickly and resolve the incident efficiently

  • Expertly troubleshoot issues with distributed systems, interactions between cloud technology layers and components, and common dependencies at scale

  • Practice sustainable incident response, blameless postmortems and prompt implementation of recommended changes to prevent recurrence

  • Contribute to the development and implementation of routine maintenance automation and alerting

  • Develop and maintain automation scripts and tools using infrastructure-as-code (IaC) principles (e.g., Terraform, CloudFormation) to streamline deployments and operations

  • Administer and optimize OpenSearch clusters for log aggregation, search, and analytics. Develop and maintain efficient indexing and search strategies

  • Forecast capacity requirements and plan for future growth to ensure adequate resources are available

  • Implement and maintain security best practices for AWS infrastructure and applications. Ensure compliance with relevant security standards and regulations

  • Recommend optimal configurations for cloud technology solutions and modify the code base that defines systems or cloud technologies to improve the reliability, availability, efficiency, observability, performance, and operability of supported products

  • Collaborate well with cross-functional teams across product, architecture, engineering, infrastructure, and security to ensure that reliability standards are integrated into the development and deployment of all solutions

  • Maintain up-to-date documentation on system configurations, incident response protocols, and operational best practices

  • Earnestly participate in code/design reviews and regular meetings with the engineering teams that develop and/or manage the products in question

  • Research and maintain an awareness of industry trends, advances in distributed systems and cloud technologies, tools, and/or processes for maintaining and improving product availability, reliability, efficiency, observability, and/or performance

  • Contribute to the implementation of new solutions within the team by identifying ways they can be applied to solve persistent problems

  • Ensure that uniform enterprise-wide architecture and design standards are adhered to

Minimum qualifications:

  • Bachelor's degree, or equivalent experience, in Computer Science, Engineering, or related field, with 8+ years of relevant experience with large-scale enterprise-grade solutions

  • A strong background in architecture/design and currently working in a similar role, in a forward-thinking and fast-paced business

  • 4+ years of professional SRE experience relevant to the responsibilities listed above, including event-driven architectures and cloud-native, distributed, and SaaS solutions

  • 4+ years of experience with CI/CD pipelines, infrastructure as code, proactive monitoring, smart alerting, ensuring performance/scalability, and proactive capacity management of enterprise-grade solutions

  • Expertise in troubleshooting across the entire stack: network, server, operating system, and application

  • Expertise with monitoring and alerting tools (e.g., New Relic, Prometheus, Grafana)

  • Strong analytical and problem-solving skills, with a keen attention to detail

  • Experience with microservices, Java, Node, Kafka/RabbitMQ, SQL and NoSQL databases, Istio, NGINX, F5, AWS API Gateway, ECS, and any infrastructure-as-code tooling

  • Experience deploying, maintaining, and troubleshooting containerized applications

  • A level of comfort with Linux

  • Solid communication and collaboration skills

Preferred qualifications:

  • Certification in AWS or related cloud technologies

  • Experience with Confluent Kafka, Snowflake, Databricks, and Apigee API Gateway

  • Automotive retail experience

Salary Range: $119,000 - $150,000

CDK Global is committed to fair and equitable compensation practices. Compensation packages are based on several factors, including but not limited to skills, experience, certifications, and work location. The total compensation package for this position may also include an annual performance bonus, benefits, and/or other applicable incentive compensation plans. We offer medical, dental, and vision benefits in addition to:
  • Paid Time Off (PTO)
  • 401K Matching Program
  • Tuition Reimbursement

At CDK, we believe inclusion and diversity are essential in inspiring meaningful connections to our people, customers and communities. We are open, curious and encourage different views, so that everyone can be their best selves and make an impact.

CDK is an Equal Opportunity Employer committed to creating an inclusive workforce where everyone is valued. Qualified applicants will receive consideration for employment without regard to race, color, ancestry, national origin, gender, sexual orientation, gender identity, gender expression, marital status, creed or religion, age, disability (including pregnancy), results of genetic testing, service in the military, veteran status or any other category protected by law.

Applicants for employment in the US must be authorized to work in the US. CDK may offer employer visa sponsorship to applicants.