Site Reliability Engineer - REMOTE

SRE, Java, Splunk, AWS, LoadRunner/JMeter, CI/CD, disaster recovery, monitoring
Contract W2, Contract Independent, Contract Corp-To-Corp, 24 Months
Depends on Experience

Job Description

Design and implement full-stack, Java-based custom tooling solutions aimed at automating and eliminating toil.

Instantiate the Site Reliability Engineering practice at *, igniting the practice, principles, and culture and leading by example. Assist in training skilled peers and partnering with peer platform-embedded SRE teams.

Introduce enterprise capabilities, tools, and innovation that improve availability in a multi-cloud ecosystem by evolving observability, monitoring, logging, dashboard visualization, CI/CD integration, and continuous testing (performance, smoke, regression, functional, chaos). Introduce continuous improvement, standardization and automation, and capabilities to conduct destructive and resiliency testing.

Introduce self-healing and autonomic capabilities that solve complex operational and systemic issues with precision, including building and training models, automating cognitive processes, and leveraging cutting-edge technologies to improve the availability of the products we provide to customers.

Automate key SRE metrics and IT Service Operations processes, including customer impact, % availability of critical business flows, SLO/SLI adherence, and error budget. Automate the incident process for IT Service Operations through data integration with unified communications and alerting/notification systems.

Share support responsibilities for critical applications and customer journeys onboarded to SRE, including remediation of issues through Agile.

Proven technical expertise with one or more of the following:

o Software Development: Java/J2EE, REST, microservices, messaging technologies like Kafka or MQ, JavaScript frameworks like React or Bootstrap, SQL

o OS and Platform: Linux; cloud technologies (AWS, Google Cloud Platform, or Azure); container platforms

o CI/CD and Automation: Jenkins, GitLab, SonarQube, Artifactory

o Observability and AIOps: Grafana, Prometheus, ELK or Splunk, Jaeger or Zipkin, AppDynamics, Dynatrace, or similar

Experience in one or more of the following areas is desired:

o AIOps: BigPanda, Moogsoft, Artificial Intelligence (AI) and Machine Learning (ML) frameworks

o Testing: Gremlin, Chaos Monkey, Chaos Toolkit, JMeter, BlazeMeter, LoadRunner

Excellent problem-solving skills and proactivity in resolving issues and blockers.

Dice Id : 10110032
Position Id : SRE2022
Originally Posted : 4 weeks ago