Job Details
Job Title: DevOps Engineer / Site Reliability Engineer
Location: Bentonville, AR (Day 1 onsite)
Duration: Long Term
Must-have skills:
Python, DevOps, Google Cloud Platform (infrastructure and storage), Kubernetes, Grafana, Splunk, Linux, UNIX, Airflow
Key Responsibilities:
Design, implement, and troubleshoot CI/CD pipelines using Google Cloud Platform and Astronomer-managed Airflow.
Set up and maintain Google Cloud Platform infrastructure including storage, roles, and permissions.
Deploy and manage Airflow environments (prod/non-prod), handle version upgrades, and assist users in DAG setup and scheduling.
Monitor and respond to alerts (xMatters/Slack), provide on-call support, and manage user tickets and queries.
Plan and execute data migrations, resolving performance and configuration issues.
Optimize cloud storage, implement data lifecycle policies, and manage storage costs and performance.
Administer containerized workloads using Kubernetes, Docker, and Helm charts.
Required Skills:
Strong background in Linux/UNIX systems administration and command-line tools.
Proficiency in Python and Bash scripting.
Hands-on experience with cloud platforms (AWS, Azure, Google Cloud, or others).
Strong experience in DevOps or Site Reliability Engineering roles.
Strong hands-on experience with Google Cloud Platform (Dataproc, IAM, Storage).
Experience managing container-based infrastructures, including Docker and Kubernetes.
In-depth knowledge of CI/CD tools and practices.
Deep understanding of DevOps principles, infrastructure as code, and cloud security best practices.
Experience with monitoring and logging tools such as Prometheus, Grafana, and Splunk.
Torque Technologies LLC is an Equal Opportunity Employer (EOE). Qualified applicants are considered for employment without regard to age, race, color, religion, sex, national origin, sexual orientation, disability, or veteran status.