Role: Splunk Dynatrace SME Administrator
Remote
Position Summary: A Splunk Dynatrace Administrator is a key IT operations professional responsible for the end-to-end management, maintenance, and optimization of an organization's observability platforms. The primary goal of this position is to ensure high availability, performance, and reliability of IT systems by leveraging both Splunk for log analytics and security events and Dynatrace for application performance monitoring (APM).
The individual in this role acts as a Subject Matter Expert (SME) who drives the adoption, integration, and best practices for using these tools to monitor complex, hybrid cloud environments and provide actionable insights to various business, development, and security teams.
Qualifications Required:
· A Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field is preferred; equivalent professional experience is also accepted.
· 5+ years of experience in IT infrastructure, systems administration, networking, or a related technical field.
· Certifications: Relevant industry certifications are highly valued and can include:
o Splunk: Splunk Core Certified Administrator or Splunk Certified Observability Engineer.
o Dynatrace: Dynatrace Associate or Professional Certification in Administration.
o Cloud/IT: Certifications in cloud platforms (AWS, Azure) or IT Service Management (ITIL) frameworks.
Qualifications Desired:
· Deep understanding of the architecture, components, data ingestion, indexing, and search capabilities of both Splunk and Dynatrace.
· Proficiency in Splunk Search Processing Language (SPL) and potentially Dynatrace Query Language (DQL) for complex data analysis, reporting, and alerting.
· Solid experience with both Linux and Windows operating systems.
· Excellent analytical and troubleshooting abilities to quickly diagnose and resolve complex technical issues across different systems.
· Strong written and verbal communication skills to effectively collaborate with diverse technical and non-technical teams and document processes clearly.
· Ability to work effectively both independently and as part of a cross-functional team (DevOps, Security, Operations).
Essential Functions and Responsibilities:
· Onboard and integrate diverse data sources (logs, metrics, traces) from various systems (cloud, on-premise, databases) into the platforms, ensuring accurate parsing, normalization, and enrichment.
· Manage integrations between Splunk, Dynatrace, and other third-party IT Service Management (ITSM) and security tools (e.g., ServiceNow, PagerDuty) to streamline workflows and incident response.
· Design, develop, and maintain intuitive dashboards, reports, and alerts (using Splunk's SPL and Dynatrace's capabilities) to address operational, business, and security requirements.
· Act as a technical point of contact for platform issues, performing root cause analysis to quickly diagnose and resolve complex problems related to data ingestion, application performance, or system functionality.
· Collaborate closely with IT operations, security, and development teams to understand monitoring needs and provide technical guidance, support, and training to maximize the effective use of the tools.
· Create and maintain detailed documentation of operational procedures, configurations, and best practices.
· Advocate for continuous improvement in monitoring coverage, alert tuning, and observability practices across the organization.
Success Factors/Job Competencies:
· A primary indicator of success is the ability to proactively detect potential issues using AI-powered insights and automation, leading to a significant reduction in Mean Time To Detect (MTTD) and Mean Time To Respond (MTTR) when incidents occur.
· Success involves transforming raw logs, metrics, and traces into actionable, meaningful business insights through effective dashboarding, reporting, and communication to stakeholders.
· Consistently identifying and implementing automation for operational tasks (e.g., data onboarding, configuration management) and staying current with the latest features and best practices are crucial success factors.
· Success requires the ability to implement a structured methodology for data ingestion, naming conventions, and role-based access controls so that growing data volumes and complexity are managed in an organized manner.
· A successful admin fosters strong collaboration with development, security, and operations teams, breaking down silos and ensuring the monitoring solutions align with diverse team needs and goals.
Performance Standards:
· Reduce "alert fatigue" by maintaining a high signal-to-noise ratio in alerts, notifying teams only of genuine, actionable issues.
· Maintain comprehensive and up-to-date documentation for all configurations, processes, and standard operating procedures (SOPs).
· Provide effective support and training to end-users (developers, operations, security teams) to enable self-service and maximize the value derived from the tools.
· Demonstrate a reduction in major incidents resulting from proactive monitoring, alerting, and trend analysis using the platforms' AI and machine learning capabilities.