We are seeking a Senior DevOps Engineer to enhance our high-performance computing (HPC) services and collaborate closely with the scientific community to optimize research computing.
Join our team to build and operate cutting-edge HPC capabilities using automation and infrastructure-as-code. Apply now to contribute to innovative computational solutions in a dynamic environment.
To discover more about the Cloud practice at EPAM Georgia, visit this page.
This position offers a remote setup with the flexibility to work from anywhere in Georgia, whether that is your home, our well-equipped offices in Tbilisi and Batumi, or a coworking space in Kutaisi.
Responsibilities
- Design, implement, and maintain robust platform infrastructure using Infrastructure as Code tools such as Terraform
- Develop, deliver, and operate research computing services and applications
- Apply Site Reliability Engineering principles to manage HPC service deployment, monitoring, and incident response
- Solve complex technical problems related to HPC services and user applications
- Manage large-scale HPC, HTC, or BC computing environments for optimal performance
- Collaborate with scientific users to tailor HPC resources to research needs
- Automate deployment processes to ensure consistency across HPC infrastructure
- Maintain and administer large-scale cluster management and workload scheduling software such as Slurm, LSF, or Grid Engine
- Develop and maintain monitoring and dashboards using tools such as Prometheus and Grafana
- Work within a DevOps team environment following agile methodologies
- Operate and utilize virtualized private cloud resources such as OpenStack
- Administer large-scale parallel filesystems including Weka, GPFS, or Lustre
- Use configuration management tools like Ansible, Salt, or Puppet to manage IT operations
- Develop scripts and tools for HPC and DevOps platform operations using Bash and Python
Requirements
- 3+ years of experience with DevOps processes and automation using Infrastructure as Code tools such as Terraform
- Hands-on experience operating or engineering large-scale HPC or similar computing environments
- Proven expertise in Linux system administration including TCP/IP networking and storage subsystems
- Experience administering large-scale cluster management software such as Slurm, LSF, or Grid Engine
- Knowledge of configuration management tools like Ansible, Salt, or Puppet
- Experience working in agile DevOps teams
- Ability to develop and maintain monitoring tools such as Grafana and Prometheus
- Experience with scripting languages such as Bash and Python for automation and tool development
- Strong experience managing virtualized private cloud environments like OpenStack
- A degree in a scientific discipline, or equivalent experience in computationally intensive scientific data analysis
- Proven ability to manage relationships with third-party suppliers
- Upper-intermediate proficiency in English (B2+)
Nice to have
- Experience with container technologies such as LXD, Singularity, Docker, or Kubernetes
- Operation and configuration experience with public cloud platforms like AWS, Azure, or Google Cloud Platform
- Experience with HashiCorp tools such as Vault, Consul, and Nomad
- Development experience with programming languages such as Java, C++, Python, Ruby, or Perl
- Experience with parallel filesystems like Weka, GPFS, or Lustre
We offer/Benefits
We connect like-minded people
- Delivering innovative solutions to industry leaders, making a global impact
- Enjoyable working environment, whether it is the vibrant office or the comfort of your own home
- Opportunity to work abroad for up to two months per year
- Relocation opportunities within our offices in 55+ countries
- Corporate and social events
We invest in your growth
- Leadership development, career advising, soft skills and well-being programs
- Certifications, including Google Cloud Platform, Azure and AWS
- Unlimited access to LinkedIn Learning and Get Abstract
- Free English classes with certified teachers
We cover it all
- Participation in the Employee Stock Purchase Plan
- Monetary bonuses for engaging in the referral program
- Comprehensive medical & family care package
- Five trust days per year (sick leave without a medical certificate)
- Benefits package (sports activities, a variety of stores and services)
EPAM Georgia is a team of innovators united by a passion for technology. The dynamic and inclusive culture we embrace helps us make a positive impact on our communities, clients, and employees. Here you will collaborate with multinational teams, contribute to numerous cutting-edge projects, deliver creative solutions, and have the opportunity to learn. Our people are at the heart of our success, and we are proud to give talented people solid ground to develop and grow.
Why Choose Us
- 2024 Best Place to Work
- 2024 Sitecore's Partner Experience Awards