Site Reliability Engineer (SRE)

Agile, Analysis, Architecture, Automated, Bash, DCE, Development, Diagnostic, Firewall, Genetic, GIT, Hardware, Jenkins, Libraries, Linux, Management, Metrics, Networking, Networks, Programming, Project, Protocols, Python, Scrum, Security, System Architecture, Testing, Unix, VM
Full Time
Work from home not available
Travel not required

Job Description






GTL is seeking experienced and talented Site Reliability Engineers who will engineer and maintain automated build and deployment tools and provide Site Reliability Engineering support across the organization.



One job location will be at our offices in Reston, VA, and the other will be at our offices in Sacramento, CA.



The Site Reliability Engineers will be responsible for rapidly troubleshooting production problems in systems and services running in a Linux environment, resolving performance problems, and assisting engineering with planned improvements and projects.



The selected candidate will need to conceptualize and architect highly available, redundant infrastructure for new projects entering production and for the scaling of legacy systems. The ideal candidate will have extensive experience with Infrastructure as Code, DevOps, and automated deployment tools, and will be able both to work independently and to collaborate with a diverse, cross-functional team. This role is a critical member of the Site Reliability Engineering (SRE) team, interfacing with various departments within the organization.



Responsibilities:

  • Design, build and maintain:
    • Jenkins based automated build and deployment tools
    • Chef cookbooks, recipes, and libraries for existing and new application development
    • Terraform automation for VM platforms in colocated and company-owned datacenters as well as AWS
    • AWS automation, networking, security, management, and monitoring.
  • Develops multiple projects and tasks, providing engineering and production support, organized through the Kanban Agile process; participates in daily standups and drives solutions for production issues across the organization.
  • Proactively identifies problems with requirements (lack of clarity, inconsistencies, technical limitations) for their own work and adjacent work and communicates these issues early to help course-correct.
  • Write and maintain monitoring and maintenance processes using GTL tools such as Check_Mk, Zabbix, PagerDuty, Nagios and Orion.
  • Comply with and follow system architecture processes and other requirements within the documentation tool.
  • Create, review, and respond to service tickets in the ticketing system and to system alerts in internal and external alerting and metrics systems.
  • Be part of an on-call rotation for after-hours monitoring, maintenance, and escalations.
  • Picks up new technologies and patterns quickly and contributes a leading role to new designs and solutions.
  • Reviews solutions and work products for completeness, architectural integrity, security, extensibility and maintainability.
  • Follows current and helps develop standard practices for code design, review, test, versioning, and deployment.
  • Mentor and direct more junior engineers in best practices and increased levels of responsibility.
  • Ensure that all systems and interfaces developed have a high level of security, accountability, and auditability, and comply with any PII, PCI, SOC, and other standards and policies.
  • Investigate new technologies and make recommendations based on cost, efficiency, and fit with best-in-class technologies that meet customer needs.
  • Mitigates risk by proactively designing solutions that have a high degree of automated testing, alarming, logging, and monitoring.
  • Takes a lead role in anticipating and managing technical issues, mitigating risks, escalating issues appropriately and keeping all necessary parties informed.




Qualifications:

  • Bachelor's degree in Information Technology, Network Engineering, Computer Science, Engineering or related field; 4 years of experience in lieu of a degree AND:
  • A minimum of five (5) years of infrastructure-related engineering experience, including a minimum of three (3) years of Linux experience.
  • Strong working knowledge of Unix/Linux and IP Networking, IP Networks and Protocols, hardware issue troubleshooting, and maintaining highly-available enterprise software tools.
  • Experience in the following applications and technologies:
    • Unix/Linux Administration
    • Systems Security
    • Amazon Web Services (S3, EC2, RDS) administration
    • Programming and troubleshooting command line interfaces/script writing in Python and/or Bash.
    • Continuous Integration (CI) such as Jenkins or Bamboo, and experience with leading DevOps tools to facilitate automation: Git, SonarQube, JUnit, Docker, OpenShift, container deployment, Linux, virtualization
    • Remote Access (Openswan, OpenVPN, IPsec, GRE/IPsec, etc.).
    • Experience maintaining automated tools such as Jenkins and Chef.
    • Writing and maintaining Chef libraries and related deployment systems.
    • Systems monitoring, software security hardening, firewall maintenance, and risk assessment are a plus. Architecture review policies, package management, and distributed file system maintenance (OpenAFS, Swift Object Store, S3) are a plus.
    • Diagnostic forensics, including root cause analysis using diagnostic and debugging tools such as strace, netstat, ngrep, iostat, mpstat, ps, pmap, and lsof, is a plus.
  • Excellent presentation and communication skills with a high attention to detail.
  • Capable of prioritizing tasks with an understanding of the overall project and work product, and of identifying problems with requirements.
  • Demonstrates knowledge of industry trends, products, infrastructure and our build systems.
  • Communicates effectively across functions, able to work well with Product, Design, Analytics, etc. as necessary.
  • Strong organizational skills with the ability to prioritize multiple simultaneous projects in a fast-paced environment.
  • Demonstrate a strong professional attitude with the ability to work in a collaborative, Scrum-team environment.
  • Able to take on projects and learn new concepts, working autonomously.
  • Ability to respond to after-hours issues as part of an on-call rotation schedule that includes nights and weekends.
  • Ability to travel up to 5% per month.




GTL is an innovation leader in correctional technology, education solutions that assist in rehabilitating inmates, and payment services solutions for government. GTL leads the fields of correctional technology, education, and government payment services with visionary solutions and customized products that integrate seamlessly to deliver security, financial value, and operational efficiencies while aiding inmate rehabilitation and reducing recidivism rates.



GTL is committed to a policy of Equal Employment Opportunity and will not discriminate against an applicant or employee on the basis of race, color, religion, creed, national origin or ancestry, sex, pregnancy or pregnancy-related condition, age, physical or mental disability, veteran or military status, genetic information, sexual orientation, marital status, or any other characteristic or category protected by federal, state or local laws, regulations or ordinances. The information collected by this application is solely to determine suitability for employment, verify identity and maintain employment statistics on applicants.




Dice Id : RTX1c38f5
Position Id : 286165