Overview
Remote
$55 - $60
Accepts corp to corp applications
Contract - W2
Contract - Independent
Contract - 6 Month(s)
Skills
USC Only
Job Details
Job Summary
The Website Administration & Site Reliability Manager is responsible for ensuring the stability, security, and performance of the organization's web platforms. This role oversees website infrastructure, manages system configurations, and implements best practices for uptime and reliability. Additionally, the position drives observability initiatives by deploying monitoring, alerting, and logging solutions to proactively identify and resolve issues. The ideal candidate combines technical expertise in web technologies with strong problem-solving skills and a commitment to delivering a seamless user experience.
Key Responsibilities
- Website Administration: Manage hosting environments, DNS, SSL certificates, and content delivery networks (CDNs). Ensure compliance with security standards and optimize site performance.
- Site Reliability: Implement and maintain high-availability architectures, disaster recovery plans, and automated deployment pipelines. Monitor uptime and resolve incidents promptly.
- Observability: Design and maintain monitoring dashboards, alerting systems, and log aggregation tools to provide real-time visibility into system health and performance.
- Collaboration: Work closely with development, security, and infrastructure teams to ensure smooth deployments and continuous improvement.
- Incident Management: Lead root cause analysis and post-mortem reviews to prevent recurrence of issues.
Required Skills
- Strong knowledge of web servers (e.g., Apache, Nginx), cloud platforms (AWS, Azure, Google Cloud Platform), and containerization (Docker, Kubernetes).
- Expertise in monitoring tools (e.g., Prometheus, Grafana, Datadog) and logging systems (e.g., ELK stack).
- Familiarity with CI/CD pipelines and automation frameworks.
- Excellent troubleshooting and communication skills.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.