Google Site Reliability Engineer
Buffalo Grove, IL, US • Posted 8 days ago • Updated 8 days ago • Contract W2
No Travel Required
On-site
$55 - $60/hr


CogniSoft Technologies
Job Details
Skills
- ServiceNow
- Microsoft Visual Studio
- Problem Management
- PySpark
- Python
- Grafana
- Incident Management
- Machine Learning (ML)
- Problem Solving
- Dashboard
- GitHub
- Good Clinical Practice
- Google Cloud Platform
- Cloud Storage
- Collaboration
- Communication
- Conflict Resolution
- Ab Initio
- Agile
- Analytical Skill
- Tidal
- ZEKE
- SQL
- Soft Skills
- Software Engineering
- Splunk
- Tableau
- Budget
- Cloud Computing
- Recovery
- Root Cause Analysis
- SLA
Summary
Title: Google Site Reliability Engineer
Location: Buffalo Grove, IL - Hybrid
Duration: Long Term Project
Job Roles/Responsibilities:
Technical Skillset:
- Knowledge of or experience with Google Cloud Platform (BigQuery, Cloud Storage, Dataproc, GKE, Airflow/Composer, Pub/Sub, Cloud Functions, Cloud SQL, etc.).
- Knowledge of or experience with GitHub and Visual Studio Code.
- Knowledge of or experience with Microsoft Copilot.
- Knowledge of or experience with Prometheus, Grafana, and Splunk.
- Knowledge of Python, PySpark, or machine learning is an added advantage.
Job Description:
Site Reliability Engineers combine software engineering with systems and infrastructure operations to build and run large, reliable, scalable services.
The role focuses on:
- Detect and log incidents, and meet the agreed SLAs for incident tickets.
- Activate the incident bridge and handle communication for P1–P2 incidents.
- Prepare postmortems (within 24–72 hours) and perform root cause analysis.
- Own critical monitoring activities, problem management, and Grafana integration.
- Participate in on-call rotations, handle incidents, and drive timely mitigation and recovery.
- Automate operational work so services can scale without manual toil, and operate highly available, low-latency, secure systems.
- Define and measure reliability through SLIs/SLOs and error budgets.
- Build and maintain observability: metrics, logs, traces, dashboards, and alerts for critical services.
- Tune alerting to reduce noise while ensuring rapid detection of user-impacting issues.
- Lead or contribute to post-incident reviews and root cause analysis, and ensure follow-up actions are implemented to prevent recurrence.
- Familiarity with Tidal, ServiceNow, xMatters, Ab Initio, Tableau, Opsgenie, and ZEKE is an added advantage.
Soft skills:
- Clear written and verbal communication, particularly under pressure (e.g., during incidents).
- Ability to collaborate across multiple teams and influence engineering practices through expertise rather than authority.
- Strong communication, analytical, and problem-solving skills, knowledge of the entire incident management life cycle, and experience with the Agile model.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
- Dice Id: 10527831
- Position Id: 8893401
Company Info
CogniSoft Technologies focuses on providing niche Business Intelligence and Data Analytics solutions to businesses across various industries. We are Tableau Alliance Partners. We are also an SAP-certified company.
We are a team of experienced, dedicated, hardworking, and innovative professionals who have proven their expertise in business verticals such as banking and finance, healthcare, supply chain, insurance, software development, IT consulting, and business consulting.