Title: Senior Linux System Administrator (L3)
Location: Irving, TX (Onsite)
Type: Full-time (Tata Consultancy Services)
Role Summary
Seeking a hands-on L3 Linux Administrator to own stability, availability, and performance across large-scale Linux environments. The role demands deep troubleshooting skills, strong exposure to Veritas Clustering (VCS), SAN/NAS storage, and close coordination with data center teams for hardware incidents. The ideal candidate will work independently, lead incident resolution, and improve BAU operations through automation and best practices.
________________________________________
Key Responsibilities
Linux Administration (L3)
· Administer and troubleshoot RHEL, Oracle Linux, CentOS, and SUSE in production.
· Diagnose complex OS issues: kernel panics, boot/GRUB failures, filesystem corruption, resource contention (CPU/RAM/I/O/Network), SELinux/AppArmor denials.
· Patch and upgrade OS at scale; manage package repositories and kernel updates with rollback strategies.
· Implement and audit security hardening (firewalld/iptables, CIS benchmarks, PAM, sudo, SSH, auditd).
· Manage system services (systemd), cron/timers, users/groups, sudoers, and system-wide configuration.
Veritas Cluster Server (VCS/InfoScale)
· Install, configure, and administer VCS for HA/DR across multi-node clusters.
· Create/maintain service groups, resources, dependency trees; configure LLT/GAB, I/O fencing, and quorum.
· Integrate VxVM/VxFS (disk groups, volumes, file systems) with application failover.
· Conduct DR drills, failover testing, and root cause analysis for cluster events.
Storage: SAN & NAS
· Liaise with storage teams for LUN provisioning, zoning, masking; validate multipathing (DM Multipath/PowerPath).
· Build and maintain filesystems (ext4/xfs/VxFS), mount policies, fstab and autofs.
· Manage NFS/CIFS/SMB exports/mounts, permissions, quotas, and locking issues.
· Troubleshoot pathing, latency, and I/O bottlenecks using OS, HBA, and array-side telemetry.
Data Center & Hardware Coordination
· Coordinate with DC teams for racking/stacking, cabling, console access, and physical triage.
· Diagnose hardware faults (CPU, memory, NIC/HBA, disks/RAID/SSD, backplane, PSU, fans) and firmware/BIOS alignment.
· Raise and track OEM tickets (Dell/HP/IBM/Cisco), manage RMA, and oversee replacements and post-fix validation.
BAU Operations & Incident Management
· Act as L3 escalation for P1/P2 incidents; drive bridge calls and lead technical recovery.
· Perform deep-dive log analysis (journald, syslog, dmesg, audit logs, application logs).
· Create/run SOPs/runbooks, maintain KB articles, and implement problem management (RCA, corrective actions).
· Support on-call rotation and scheduled maintenance windows (change management, CAB, MOPs).
Networking (Host-Level)
· Troubleshoot TCP/IP, routing, VLANs/bonding/teaming, MTU, host firewalls, DNS/DHCP, NTP/chrony.
· Collaborate with network teams on L2/L3 connectivity, load balancers, and firewall rules.
________________________________________
Required Experience & Skills
· 8–12+ years in enterprise Linux system administration with proven L3 ownership.
· Strong hands-on experience with VCS (Veritas Cluster Server), VxVM, VxFS, and HA/DR patterns.
· Solid SAN/NAS experience: LUNs, zoning, multipath, NFS/SMB.
· Demonstrated success working independently and leading during critical incidents.
· Advanced troubleshooting: kernel, performance, storage, and cluster-level failures.
· Scripting proficiency (Bash; Python preferred) and familiarity with Ansible.
· Familiarity with VMware/KVM virtualization and basic cloud concepts (running Linux on AWS/Azure).
· Strong documentation discipline (SOPs, MOPs, RCAs) and experience with ITIL-aligned processes.