SRE Engineer
Hybrid in Austin, TX, US • Posted 7 hours ago • Updated 7 hours ago

CLPS Global
Job Details
Skills
- Amazon RDS
- Amazon Web Services
- Artificial Intelligence
- Cisco
- Cloud Computing
- Dynatrace
- Design Architecture
- Docker
- Database Security
- Customer Focus
- Database
- DevOps
- F5
- GitLab
- FOCUS
- Google Cloud
- Good Clinical Practice
- Grafana
- Gap Analysis
- Java
- Kibana
- Kubernetes
- Linux
- IT Operations
- Google Cloud Platform
- Microsoft Azure
- Scalability
- SSO
- SolarWinds
- Splunk
- Terraform
- Remote Desktop Services
- Visualization
- Open Source
- Certified Kubernetes Administrator (CKA)
- AWS Certified DevOps Engineer Professional
- Google Cloud Professional
- DevOps Engineer
- Certified Site Reliability Engineer (CSRE)
- Middleware
- Cloud Technology (AWS): Control Tower
- Project Setup
- Creating Accounts
- RDS
- Linux Commands
- GitLab CICD Setup
Summary
We are seeking a highly skilled, hands-on SRE Engineer with solid experience in observability assessment (current state, target state, and solutioning), data collection, and automation to help lead transformational initiatives across IT operations and development. In this role, you will work with various technology domain groups and cross-functional teams on observability gap analysis and solutioning (automation and manual fixes).
Responsibilities:
· Participate in the design and architecture of reliable, scalable, high-performance systems and services, with a focus on operational excellence, availability, and performance.
· Primary skill set: expertise in observability as a service, dashboard as a service, monitoring as a service, and alert as a service across all technology domains (application, infrastructure, database, security, middleware, network, etc.), with telemetry data collection using Dynatrace APM, SolarWinds, Cisco switches, F5, databases, open-source tools (Prometheus and Grafana), log aggregation (Kibana or Splunk), and AIOps tools.
· Practical experience implementing Golden Signals (latency, traffic, errors, saturation) using related telemetry sources.
· Experience performing observability current-state assessments and gap analysis.
· Experience instrumenting .NET and Java applications with the OpenTelemetry (OTel) framework.
· Configure application performance monitoring (APM), infrastructure monitoring, synthetic monitoring, RUM, and log monitoring.
· Integrate Dynatrace with CI/CD pipelines, alerting tools, ITSM systems, and incident automation frameworks.
· Tune alert thresholds, baselines, and AI-driven anomaly detection to reduce noise and improve actionable insights.
· Deep understanding of login authentication mechanisms using Ping, ForgeRock, and SiteMinder technologies (session management and cookie management).
· Define best practices and principles for SRE, including monitoring, alerting, and automation.
· Collaborate with development teams on resiliency to ensure that services and applications are designed with operational reliability in mind.
· Implement monitoring systems to assess the performance of applications and infrastructure, and proactively identify areas for optimization.
· Develop close relationships with other operational teams to integrate SRE practices and drive operational improvements across the enterprise.
· Stay up to date on industry trends, new technologies, and best practices in SRE, and apply relevant advancements to the organization.
· Build strong working relationships across different levels, with a client-focused mindset.
Qualifications:
· Around 7-10 years of hands-on SRE experience with cloud technologies, development, SRE toolsets, and automation.
· Experience performing observability current-state assessments, gap analysis, and solutioning (automation and manual fixes) across all technology domains (application, infrastructure, database, security, middleware, network, etc.).
· Strong hands-on automation experience in observability as code, dashboard as code, monitoring as code, and alert as code (instrumentation, templates, automated deployment, visualization, and alerting).
· Practical experience implementing Golden Signals (latency, traffic, errors, saturation) using related telemetry sources.
· Strong hands-on experience with a cloud technology (AWS): Control Tower, project setup, account creation, RDS, and SSO.
· Solid understanding of and hands-on experience with Docker and Kubernetes.
· Good experience with Linux commands, GitLab CI/CD setup, and Terraform (state management, etc.).
· Monitoring and alerting setup experience with Splunk, Prometheus, Grafana, Kibana, and ELK, with a preference for APM (Dynatrace).
· Good understanding of Observability Framework leveraging programmatic SLI/SLO blueprints to standardize the collection of golden signals.
· Strong skills in APM, distributed tracing, synthetic & real user monitoring, log monitoring, and Davis AI configuration
· Own the design, configuration, CI/CD deployment, and optimization of enterprise-wide observability tools.
· Experience with integration, automation, and cloud platforms (AWS, Azure, Google Cloud Platform).
· Extensive experience instrumenting applications with the OpenTelemetry (OTel) framework.
· Hands-on experience developing Dynatrace plug-and-play observability modules (OKit) for observability developers working on Java and .NET applications.
· Define monitoring standards, best practices, and governance to ensure consistency and scalability.
· Experience deploying and tuning OneAgent, building end-to-end PurePath tracing, and leveraging Smartscape topology for proactive performance monitoring and root-cause analysis.
· Collaborate with application and infrastructure teams to troubleshoot performance issues and implement permanent fixes.
Good to have:
· Any relevant professional certification: Certified Site Reliability Engineer (CSRE), Certified Kubernetes Administrator (CKA), AWS Certified DevOps Engineer Professional, or Google Cloud Professional DevOps Engineer.
- Dice Id: 91131444
- Position Id: 8868661
Company Info
CLPS is a NASDAQ-listed (Nasdaq: CLPS) and global information technology, consulting and solutions service provider focused on delivering services to global institutions in banking, insurance and the financial sectors, both in China and globally.
For more than ten years as an IT, business know-how and talent solutions provider for such clients, CLPS has expanded its service network to clients in the global financial industry, including large financial institutions from the US, Europe, Australia and Hong Kong and their PRC-based IT centers.
CLPS has created and developed a particular market niche by providing turn-key financial solutions as well as supplying its clients’ needs for talent creation and development.
We maintain 18 delivery and R&D centers, of which ten are located in China and eight globally, to serve different customers in various geographic locations. By combining onsite and onshore support and consulting with scalable and high-efficiency offsite and offshore services and processing, we are able to meet client demands in a cost-effective manner while retaining significant operational flexibility.
