Job Title - Senior Splunk Infrastructure Engineer
Location - Santa Clara, California
(Hybrid: 2 to 3 days in office; collaboration with US teams)
Job Summary
The Credit Karma Observability team is seeking a Senior Splunk Infrastructure Engineer to manage and optimize large-scale logging and monitoring platforms. The role focuses on maintaining multiple production Splunk clusters that support engineering troubleshooting and security compliance. This position requires deep expertise in Splunk Enterprise, Linux systems, SaltStack, and Google Cloud Platform (GCP). The ideal candidate will play a key role in ensuring platform reliability, scalability, and performance in a highly regulated fintech environment.
Key Responsibilities
Splunk Platform Administration
Manage health, performance, and stability of multiple Splunk environments including:
Search Head Clusters
Indexer Clusters
Heavy Forwarders
Support and maintain Splunk Enterprise Security (ES) infrastructure
Manage data pipelines, parsing rules, and routing on Heavy Forwarders
Reliability, HA & Disaster Recovery
Design, implement, and maintain High Availability (HA) and Disaster Recovery (DR) strategies
Ensure platform resilience and business continuity across regions
Configuration & Infrastructure Management
Develop and maintain Infrastructure as Code using SaltStack
Write and manage complex Salt states and formulas for Splunk and Linux VM configurations
Implement safe configuration deployment strategies, including canary testing and staged rollouts
Cloud & Systems Engineering
Provision, monitor, and scale infrastructure on Google Cloud Platform (GCP)
Perform deep Linux troubleshooting, including kernel tuning and disk I/O, memory, and network optimization
Manage Splunk infrastructure hosted primarily on virtual machines (VMs), with forwarders deployed in Kubernetes
Operations & Support
Participate in on-call rotations to ensure 24/7 platform availability
Execute maintenance windows, Splunk upgrades, and patching cycles
Maintain and update runbooks and technical documentation related to infrastructure and data pipelines
Required Qualifications
Technical Skills
5+ years of experience administering large-scale Splunk Enterprise environments
Strong hands-on experience with Indexer Clustering and Search Head Clustering
Advanced proficiency in SPL (Splunk Processing Language)
Hands-on experience with Splunk Enterprise Security (ES)
Strong expertise in SaltStack configuration management
Deep understanding of Unix/Linux internals (RHEL, CentOS, Ubuntu)
Experience troubleshooting system-level performance and resource issues
Hands-on experience with Google Cloud Platform (GCP), including GCE and networking
Proficiency in Python and/or Bash scripting
Preferred Qualifications
Experience with Terraform for infrastructure provisioning
Exposure to Kubernetes, Helm, Flux, and GitOps-based deployment models
Experience building internal platform or done-for-you infrastructure solutions
Multi-cloud experience (GCP, AWS, Azure)
Knowledge of OpenTelemetry (OTEL) and migration strategies from Splunk Universal Forwarders
Experience working in highly regulated or fintech environments
Working Hours & Collaboration
Primary Time Zone: India (IST)
Overlap Requirement: Availability from 8:00 PM to 11:00 PM IST, at least 2 days per week, for:
Pair programming with US-based teams
Onboarding and architectural discussions
Handover of critical maintenance and operational context
Key Skills
Splunk
CentOS / Linux
Data Analysis
Disaster Recovery
Workflow Management
Observability & Monitoring