Site Reliability Engineering (SRE) Lead

  • Equifax
  • Alpharetta, GA
  • 2 hours ago
Engineering, Computer, Web, Software, Engineer, Systems, WebSphere, Python, Perl, QA, Exchange
Full Time
Work from home not available; travel not required

Job Description

Who is Equifax?

Equifax is a global information solutions company that uses trusted unique data, creative analytics, technology and industry expertise to power organizations and individuals around the world by transforming knowledge into insights that help make more informed business and personal decisions.

Regardless of location or role, the individual and collective work of our people makes a difference in our business.

We are looking for individuals who can help us disrupt the marketplace. You will do this by delivering leading-edge technology that builds and delivers unparalleled customized insights that enrich both the performance of businesses and the lives of consumers.

We will give you the opportunity to drive innovation and automation across the enterprise. This will include tool and process integrations across all business units within Equifax globally.
Job Description
Site Reliability Engineering (SRE) at Equifax is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. The SRE Lead ensures that services, both internally critical and externally visible systems, have reliability and uptime appropriate to users' and customers' needs and a fast rate of improvement, while keeping an ever-watchful eye on capacity and performance.

Equifax's SRE culture of diversity, intellectual curiosity, and problem solving is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We urge them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on significant projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

You help build it, you help run it, and you help own it!

  • Oversee production operations of our systems, and develop and engineer solutions that maximize system reliability and automation.
  • Perform root cause analysis of incidents and prevent recurrence through the creative design and development of technical solutions as well as process improvements.
  • Lead a global team responsible for critical business functions and partner with other infrastructure, operations, and development teams to identify and implement automation opportunities to drive down toil, reduce technical debt, and improve system reliability.
  • Engage in and improve the whole lifecycle of software development services, from inception and design through deployment, operation, and refinement.
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health in a 24x7 environment.
  • Scale systems sustainably through mechanisms like automation and evolve systems by supporting changes that improve reliability and velocity.
  • Practice incident response and blameless postmortems.
  • Influence and create new designs, architecture, standards, and methods for large-scale systems.
  • Bind and orchestrate the system infrastructure with the application layer to enable high availability, clustering, load balancing, and integration;
  • Provide technical guidance or support for the development or troubleshooting of systems;
  • Establish end-to-end monitoring and alerting on all critical aspects of all systems to ensure SLOs, SLIs, and SLAs are met and to receive notifications of possible issues;
  • Develop automated solutions to address potential problems before they result in a service interruption and demonstrate a passion for automation, including CI/CD automation;
  • Establish performance baselines and capacity thresholds, correlate events, and define monitoring/alerting criteria.

What makes you a great fit for this role?
  • Bachelor of Science degree in Computer Science, Engineering, or equivalent experience.
  • Good understanding of Site Reliability Engineering (SRE) and DevOps philosophies, technologies, platforms and tools, SLA management, incident resolution, and automation;
  • Expertise in designing, analyzing and troubleshooting large-scale distributed systems.
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive;
  • Ability to articulate a technical strategy to senior management in clear, concise, understandable terms;
  • Ability to debug and optimize code and automate routine tasks;
  • 5-7 years of experience in one or more of the following: Amazon Web Services, Google Cloud Platform, Kubernetes, etc.;
  • 5-7 years of experience building JavaEE applications using build tools like Maven/Ant, Subversion, JIRA, Jenkins, Bitbucket, and Chef;
  • 5-7 years of experience with continuous integration tools (experience with Jenkins, SonarQube, JIRA, Nexus, Confluence, Git/Bitbucket, Maven, Gradle, or Rundeck is a plus);
  • You've created automation using Chef, Puppet, or another SCM tool; experience with Docker and container scheduler services such as ECS or Kubernetes is desirable;
  • You've worked with Nginx, Tomcat, HAProxy, Redis, Elasticsearch, MongoDB, RabbitMQ, Kafka, and ZooKeeper;
  • 5-7 years of experience as SCM/release engineer, or in a position with similar skill sets and responsibilities (Software Engineer, Systems Engineer, Systems Administrator);
  • 5-7 years of experience performing source code control management with Subversion/Git, including branching, merging, tagging, etc.;
  • 5-7 years of experience configuring and administering JavaEE application servers (Tomcat, WebSphere, WebLogic, etc.);
  • 5-7 years of experience with scripting languages (Python, Perl, and Unix shells such as bash and ksh);
  • 5-7 years of experience configuring, building, and supporting apps and operations in a public cloud environment (AWS, Azure, GCP);
  • 5-7 years of experience with monitoring and logging tools (Elasticsearch, ELK, AppDynamics, Splunk, etc.);
  • Collaborate well with others, including developers, QA, and ownership teams, to resolve issues;
  • Knowledge of Agile / Scrum methodologies and principles;
  • Possess excellent written and verbal communication skills with the ability to communicate with team members at all levels, including business leaders;

What will make you stand out above the rest?
  • A real passion for and the ability to learn new technologies

The Perks of being an Equifax Employee?

We offer excellent compensation packages with high-reaching market salaries and 401k matching, along with the works: comprehensive healthcare packages, schedule flexibility, work from home opportunities, paid time off, and organizational growth potential.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

To speak to us about this role in more detail, apply online.
This position sits in our Alpharetta location, which includes a state-of-the-art gym / fitness center, onsite dry cleaning services, an onsite café with "Grab & Go" selections and mobile pay options, food trucks, and car wash services.


Primary Location:
USA-Atlanta JV White

Function - Tech Engineering and Service Ops

Full time

Company Information

Equifax is a global data, analytics, and technology company. We believe knowledge drives progress. We blend unique data, analytics, and technology with a passion for serving customers globally, to create insights that power decisions to move people forward. Headquartered in Atlanta, Equifax operates or has investments in 24 countries in North America, Central and South America, Europe and the Asia Pacific region. It is a member of Standard & Poor's (S&P) 500® Index, and its common stock is traded on the New York Stock Exchange (NYSE) under the symbol EFX. Equifax employs approximately 11,000 employees worldwide. 

Dice Id : 10184596
Position Id : J00089225
Originally Posted : 1 month ago
