Summary:
The Associate Principal, Linux Administrator is responsible for managing and maintaining enterprise Linux server environments through hands-on daily operational activities. This role ensures the stability, performance, security, and availability of Linux infrastructure across on-premises and cloud environments. The position requires strong operational expertise, a proactive approach to system health, and the ability to work across US day and night shifts to provide continuous 24/7 coverage for critical infrastructure.
Primary Duties and Responsibilities:
To perform this job successfully, an individual must be able to perform each primary duty satisfactorily.
- Monitor, manage, and maintain enterprise Linux server environments (RHEL, CentOS, Ubuntu, Amazon Linux) on a day-to-day basis across on-premises and cloud infrastructure
- Perform routine system health checks including CPU, memory, disk utilization, and process monitoring across all Linux servers
- Respond to system alerts, service failures, and performance degradation in a timely manner; triage and resolve incidents within defined SLA windows
- Manage user accounts, groups, permissions, and SSH key administration across Linux systems
- Administer cron jobs, scheduled tasks, and system services (systemd, init) to ensure uninterrupted operations
- Perform log analysis and monitoring using tools such as journald, rsyslog, ELK Stack, Splunk, or CloudWatch to identify anomalies and recurring issues
- Execute day-to-day storage operations including LVM management, filesystem extension, NFS mount management, and disk space remediation
- Troubleshoot OS-level issues including boot failures, kernel panics, network connectivity problems, and service disruptions
- Coordinate and execute scheduled maintenance activities including reboots, service restarts, and configuration updates during approved change windows
- Apply security patches, kernel updates, and bug fixes to Linux servers in alignment with the enterprise patching schedule using Red Hat Satellite and Ansible Automation Platform (AAP)
- Validate patch deployments in non-production environments prior to production rollouts and perform post-patch validation checks to confirm system stability and service availability
- Support emergency and zero-day vulnerability patching as directed by the security team
- Enforce CIS benchmark standards and security baselines on Linux systems; remediate non-compliant configurations
- Perform periodic security scans using OpenSCAP, Lynis, or Nessus and document findings for remediation tracking
- Actively manage incidents, service requests, and change records using ServiceNow, ensuring timely updates, proper categorization, and SLA compliance
- Serve as an escalation point for Tier 1/Tier 2 Linux issues during the assigned shift
- Participate in root cause analysis (RCA) and post-incident reviews for major Linux-related outages
- Execute approved change requests during maintenance windows including patching, configuration changes, and server builds
- Maintain clear and accurate shift handover notes to ensure operational continuity across US day and night shifts
- Provision new Linux servers (physical, virtual, and cloud) following approved build standards and golden image baselines
- Configure servers post-build including network settings, storage mounts, security hardening, and application-level prerequisites
- Support AMI (Amazon Machine Image) updates and golden image refreshes for AWS EC2 instances
- Execute Ansible playbooks for configuration management, compliance enforcement, and routine operational tasks
- Proactively monitor infrastructure dashboards (Nagios, Prometheus, Grafana, CloudWatch) and act on alerts
- Identify performance bottlenecks and work with senior engineers to implement optimizations
- Perform capacity monitoring and report disk, CPU, and memory trends as inputs to capacity planning
- Work assigned US day shift (EST/CST 8 AM – 6 PM) or US night shift (EST/CST 6 PM – 6 AM) rotations to provide 24/7 operational coverage, including weekend rotation as required
- Act as the primary Linux operations contact during the assigned shift for incident response, change execution, and escalation management
- Follow runbooks and standard operating procedures (SOPs) for all operational activities and maintain shift logs with current server and service status
- Create and maintain runbooks, SOPs, knowledge base articles, and operational checklists in Confluence
- Track tasks, incidents, and project work in JIRA with accurate and timely status updates
- Collaborate with network, storage, security, and application teams to resolve cross-functional issues
- Participate in team meetings, sprint reviews, and operational planning sessions
- Participate in on-call rotation and provide support for critical systems as needed
Qualifications:
The requirements listed are representative of the knowledge, skill, and/or ability required. Reasonable accommodations may be made to enable individuals with disabilities to perform the primary functions.
- 5–8 years of progressive hands-on experience in Linux/Unix system administration in an enterprise environment
- 2+ years in a senior or lead operational role with shift-based responsibilities
- Demonstrated experience managing Linux servers in production environments at scale
- Strong hands-on experience with Ansible Automation Platform (AAP) including playbook execution and operational automation
- Working experience with Red Hat Satellite for patch deployment and content lifecycle management
- Hands-on experience with AWS Linux EC2 instances including basic instance operations, snapshots, and security group management
- Demonstrated experience implementing and maintaining CIS benchmarks and security baselines across enterprise Linux systems
- Extensive experience with enterprise Linux patching programs including change management, patch validation, and emergency patching procedures
- Working experience with ITSM platforms such as ServiceNow for incident, problem, and change management
- Proficiency with collaboration and project management tools including JIRA and Confluence
- Ability to work US day shift (EST/CST 8 AM – 6 PM) or night shift (EST/CST 6 PM – 6 AM) on a rotating or fixed schedule including weekends as required
- Excellent problem-solving abilities and analytical thinking skills
- Strong written and verbal communication skills with ability to produce clear shift handover notes and operational documentation
- Ability to multitask and prioritize effectively in a fast-paced operational environment
Technical Skills:
Required Core Skills:
- Advanced hands-on proficiency in Red Hat Enterprise Linux administration and troubleshooting
- Strong knowledge of Linux internals including kernel management, systemd, networking stack, storage subsystems, and filesystem management
- Hands-on experience with Ansible Automation Platform (AAP) for operational task automation and playbook execution
- Proficiency with Red Hat Satellite for patch deployment, content views, and lifecycle management
- Strong experience with Linux patching and patch management including:
  - Enterprise-scale patch deployment using Red Hat Satellite and Ansible
  - Patch testing and validation in non-production environments
  - Emergency and zero-day vulnerability patching procedures
  - Kernel patching strategies and post-patch system validation
  - Patch rollback and recovery procedures
  - Compliance reporting and audit trail maintenance
  - AWS Systems Manager Patch Manager for cloud-based patching
- Experience implementing CIS benchmark standards and security hardening including:
  - CIS RHEL/Ubuntu/Amazon Linux benchmark implementation (Level 1 and Level 2)
  - Automated compliance scanning using OpenSCAP, Lynis, or similar tools
  - Remediation of CIS benchmark findings and exception management
  - Security baseline enforcement using Ansible and configuration management
  - Security audit preparation and response
- Proficiency with AWS Linux EC2 including instance management, AMI operations, snapshots, launch templates, and Amazon Linux/RHEL on AWS
- Hands-on experience with log management and analysis tools (rsyslog, journald, ELK Stack, Splunk, CloudWatch)
- Proficiency with monitoring solutions (Nagios, Prometheus, Grafana, CloudWatch) for alert management and performance monitoring
- Strong knowledge of LVM, NFS, and Linux storage management
- Proficiency with ITSM and collaboration tools:
  - ServiceNow for incident management, problem management, change management, and CMDB
  - JIRA for task management, project tracking, and agile workflows
  - Confluence for documentation, knowledge management, and runbook maintenance
- Strong networking knowledge (TCP/IP, DNS, DHCP, routing, firewalls)
- Proficiency in scripting (Bash, Python) for operational automation and task scripting
Additional Technical Skills:
- Familiarity with Terraform or CloudFormation for infrastructure-as-code support tasks
- Experience with backup solutions (Veeam, AWS Backup, snapshots)
- Experience with AMI creation, customization, and lifecycle management in AWS
- Knowledge of high-availability configurations (Pacemaker, Corosync)
- Deep knowledge of containerization technologies (Docker, Podman)
- Experience with version control systems (Git, GitLab, GitHub)
- Knowledge of package management (RPM, YUM, DNF) and repository management
- Understanding of security frameworks and compliance standards (CIS benchmarks, STIG, NIST)
- Experience with security scanning tools (OpenSCAP, Nessus, Qualys, Lynis)
- Knowledge of virtualization technologies (VMware, KVM)
Education and/or Experience:
- Minimum: Bachelor’s degree in Computer Science, Information Technology, or a related field, or an equivalent combination of education and experience
- 5–8 years of relevant hands-on Linux system administration experience required
- 2+ years of hands-on experience with Ansible Automation Platform required
- 2+ years of hands-on experience with AWS Linux EC2 and cloud infrastructure required
- 2+ years of working experience with Red Hat Satellite preferred
- Demonstrated experience executing Linux patching programs and supporting disaster recovery procedures required
- Proven experience implementing CIS benchmarks and security hardening across Linux environments required
- Working experience with ITSM and collaboration tools (ServiceNow, JIRA, Confluence) for ticket management and documentation required
- Availability and willingness to work US day and/or night shift rotations including weekends required
- Red Hat Certified System Administrator (RHCSA) or Red Hat Certified Engineer (RHCE) strongly preferred
- AWS Certified SysOps Administrator or AWS Certified Solutions Architect – Associate strongly preferred
- ITIL Foundation certification a plus
- CompTIA Linux+ or Security+ a plus