About the Role
We are seeking a talented and motivated Site Reliability Engineer (SRE) to join a global high-frequency trading firm.
This role focuses on ensuring the reliability, scalability, and performance of critical systems while leveraging Linux, Python, and Ansible for automation and infrastructure management. The ideal candidate will have a strong background in systems engineering and automation, and a passion for improving operational efficiency.
System Reliability and Performance:
- Maintain and optimize Linux-based systems to ensure high availability and performance.
- Monitor system health and proactively address issues to minimize downtime.
- Implement robust incident response processes to resolve critical system outages efficiently.
Automation and Infrastructure Management:
- Develop and maintain automation scripts using Python to streamline repetitive tasks.
- Use Ansible to automate configuration management, software deployments, and infrastructure provisioning.
- Implement Infrastructure as Code (IaC) practices to enhance scalability and consistency.
Monitoring and Observability:
- Design and implement monitoring solutions using tools like Prometheus, Grafana, or similar platforms.
- Create dashboards and alerts to provide visibility into system performance and reliability metrics.
- Analyze logs and metrics to identify trends, bottlenecks, or areas for improvement.
Collaboration and Continuous Improvement:
- Work closely with development teams to integrate reliability best practices into the software development lifecycle (SDLC).
- Identify opportunities to improve system architecture, processes, or tools to enhance overall reliability.
Required Qualifications
- Strong experience managing Linux-based systems in production environments.
- Proficiency in Python for scripting, automation, and tool development.
- Hands-on experience with Ansible for configuration management and automation.
- Familiarity with CI/CD pipelines and DevOps practices.
- Knowledge of containerization technologies such as Docker or Kubernetes is a plus.
Key Competencies
- Strong problem-solving skills with a focus on root cause analysis and prevention.
- Excellent communication skills for collaborating with cross-functional teams.
- A proactive mindset with a passion for improving system reliability and efficiency.
Benefits
- Competitive total compensation up to $400k.
- Opportunities for professional growth through training programs and certifications.
This role is ideal for candidates who thrive in fast-paced environments, excel at automating complex workflows, and are passionate about building reliable systems that drive business success. If you're ready to make an impact as an SRE working with Linux, Python, and Ansible, we encourage you to apply!