Job Details
We are looking for a skilled and proactive Systems Operations and Monitoring Engineer to support 24x7 infrastructure monitoring and operations, ensuring the stability, security, and performance of server, network, and database systems. The ideal candidate will be responsible for monitoring critical infrastructure components, resolving issues promptly, and maintaining compliance with Federal Information Security Modernization Act (FISMA) standards.
Education:
- Bachelor's degree in Computer Science, Information Technology, or related field (or equivalent work experience)
Responsibilities:
24x7 Infrastructure Monitoring:
Maintain continuous (24x7) monitoring and configuration of server and network operations using automated monitoring systems to ensure operational stability and uptime.
Protocol Configuration and Management:
Install, configure, and maintain services such as File Transfer Protocol (FTP), Secure File Transfer Protocol (SFTP), and Hypertext Transfer Protocol (HTTP) across multiple environments.
Troubleshooting and Incident Resolution:
Provide rapid issue diagnosis and resolution across server, network, and database platforms. Escalate and communicate issues effectively with appropriate mitigation plans.
Server and Database Health Monitoring:
Continuously monitor server and database health across the network. Generate reports and alerts for abnormal behavior or performance issues and engage necessary stakeholders.
Recovery and Root Cause Analysis:
Perform server recovery, data retrieval, and detailed root cause analysis in the event of outages or system failures, ensuring minimal downtime and business impact.
Compliance and Security:
Conduct operations in accordance with FISMA requirements, maintaining a high level of security compliance.
Documentation and Reporting:
Maintain detailed logs, incident reports, and documentation of monitoring configurations, system changes, and recovery procedures to support audit readiness and operational continuity.
Required Skills:
3 5+ years of experience in IT operations, infrastructure monitoring, or system administration in a 24x7 environment
Strong expertise in server protocols including FTP, SFTP, and HTTP
Experience with system and network monitoring tools (e.g., Nagios, SolarWinds, Zabbix)
Solid understanding of server recovery processes and backup and restore methodologies
Familiarity with FISMA compliance and federal security guidelines
Excellent troubleshooting, communication, and documentation skills
Ability to work in rotational shifts, including nights, weekends, and holidays
Preferred Skills:
Experience with Linux and Windows server environments
Familiarity with cloud infrastructure (AWS, Azure, or similar)
Automation/scripting skills (Shell, Python, PowerShell)
Previous experience in a federal or government IT environment