Google Site Reliability Engineer

Buffalo Grove, IL, US • Posted 8 days ago • Updated 8 days ago

Contract W2

No Travel Required

On-site

$55 - $60/hr

CogniSoft Technologies

Job Details

Skills

ServiceNow
Microsoft Visual Studio
Problem Management
PySpark
Python
Grafana
Incident Management
Machine Learning (ML)
Problem Solving
Dashboard
GitHub
Good Clinical Practice
Google Cloud Platform
Cloud Storage
Collaboration
Communication
Conflict Resolution
Ab Initio
Agile
Analytical Skill
Tidal
ZEKE
SQL
Soft Skills
Software Engineering
Splunk
Tableau
Budget
Cloud Computing
Recovery
Root Cause Analysis
SLA

Summary

Title: Google Site Reliability Engineer

Location: Buffalo Grove, IL - Hybrid

Duration: Long Term Project

Job Roles/Responsibilities:

Technical Skillset:

Knowledge of or experience with Google Cloud Platform (BigQuery, Cloud Storage, Dataproc, GKE, Airflow/Cloud Composer, Pub/Sub, Cloud Functions, Cloud SQL, etc.).
Knowledge of or experience with GitHub and Visual Studio Code.
Knowledge of or experience with Microsoft Copilot.
Knowledge of or experience with Prometheus, Grafana, and Splunk.
Knowledge of Python, PySpark, or machine learning is an added advantage.

Job Description:

Site Reliability Engineers combine software engineering with systems and infrastructure operations to build and run large, reliable, scalable services.

Role focused on:

Responsible for Incident Detection & Logging and meeting agreed SLA for incident tickets.
Responsible for Bridge Activation & Communication (P1–P2).
Postmortem preparation (within 24–72 hours) and root cause analysis.
Responsible for critical monitoring activities, problem management, and Grafana integration.
Participate in on-call rotations, handle incidents, and drive timely mitigation and recovery.
Automate operational work so services can scale without manual toil, and operate highly available, low-latency, secure systems.
Define and measure reliability through SLIs/SLOs and error budgets.
Build and maintain observability: metrics, logs, traces, dashboards, and alerts for critical services.
Tune alerting to reduce noise while ensuring rapid detection of user-impacting issues.
Lead or contribute to post-incident reviews and root cause analysis, and ensure follow-up actions are implemented to prevent recurrence.
Familiarity with Tidal, ServiceNow, xMatters, Ab Initio, Tableau, Opsgenie, and ZEKE is an added advantage.

Soft skills:

Clear written and verbal communication, particularly under pressure (e.g., during incidents).
Ability to collaborate across multiple teams and influence engineering practices through expertise rather than authority.
Strong communication, analytical, and problem-solving skills, knowledge of the entire incident management life cycle, and experience with the Agile model.

Dice Id: 10527831
Position Id: 8893401

Company Info

About CogniSoft Technologies

CogniSoft Technologies focuses on providing niche Business Intelligence and Data Analytics solutions to businesses across various industries. We are Tableau Alliance Partners. We are also an SAP-certified company.

We are a team of experienced, dedicated, hardworking, and innovative professionals who have proven their expertise in business verticals such as banking and finance, healthcare, supply chain, insurance, software development, IT consulting, and business consulting.
