ELK Admin

Overview

On Site
Depends on Experience
Accepts corp to corp applications
Contract - Independent
Contract - W2
Contract - 12 Month(s)

Skills

ELK Admin
Elasticsearch
cluster configuration
Kibana
Grafana
monitoring tools
Data Management
Security & Compliance
X-Pack security
Elastic Stack
Prometheus
CloudWatch
Infrastructure as Code (IaC)

Job Details

Role: ELK Admin

Location: San Jose CA (Hybrid)

Job Description:

Cluster Management & Operations

  • Design, deploy, and manage production Elasticsearch clusters across multiple environments (development, staging, production)
  • Perform cluster configuration, node management, and shard allocation strategies
  • Monitor cluster health, node statistics, and index performance using Kibana, Grafana, or other monitoring tools
  • Implement and manage cluster upgrades, patches, and rolling restarts with zero downtime
  • Manage multi-datacenter and cross-cluster replication/search configurations

Performance Optimization

  • Analyze and optimize query performance, including DSL queries, aggregations, and search templates
  • Design and implement effective index mapping strategies, analyzers, and tokenizers
  • Tune shard size, allocation, and rebalancing policies for performance
  • Tune JVM heap settings, garbage collection, and system-level configurations
  • Conduct capacity planning and performance forecasting

Data Management

  • Design and implement index lifecycle management (ILM) policies for data retention and rollover
  • Manage index templates, component templates, and index aliases
  • Implement data archiving, snapshot, and restore procedures
  • Monitor index growth and implement data tiering strategies (hot-warm-cold architecture)
  • Manage ingest pipelines and data transformation processes
  • Ensure data integrity and consistency across clusters

Security & Compliance

  • Implement and manage X-Pack security features including authentication and authorization
  • Configure role-based access control (RBAC) and field-level security
  • Manage SSL/TLS certificates and encrypted communications
  • Implement audit logging and security monitoring

Backup & Disaster Recovery

  • Design and implement comprehensive backup and recovery strategies
  • Configure and manage snapshot repositories (S3, NFS, Azure Blob, GCS)
  • Test disaster recovery procedures and document recovery time and recovery point objectives (RTO/RPO)
  • Implement cross-region backup strategies for business continuity
  • Automate snapshot schedules and retention policies

Monitoring & Troubleshooting

  • Set up comprehensive monitoring and alerting using Elastic Stack, Prometheus, or CloudWatch
  • Troubleshoot cluster issues including split-brain, slow queries, and memory problems
  • Analyze logs and metrics to identify and resolve performance bottlenecks
  • Respond to production incidents and participate in on-call rotation
  • Create and maintain runbooks for common operational procedures

Automation & Infrastructure as Code

  • Develop automation scripts using Python, Bash, or other scripting languages
  • Implement Infrastructure as Code (IaC) using tools such as Terraform and Ansible
  • Automate routine maintenance tasks and operational workflows
  • Integrate Elasticsearch with CI/CD pipelines
  • Develop custom tools for monitoring and management

Collaboration & Documentation

  • Work with development teams to optimize application queries and data models
  • Provide technical guidance on Elasticsearch best practices
  • Create and maintain comprehensive technical documentation
  • Conduct knowledge transfer sessions and training for team members
  • Participate in architecture reviews and capacity planning discussions