Flink Administrator

Overview

Full Time

Skills

Python
SQL
Big Data
Shell Scripting
Database
Linux
PL/SQL
Oracle
DevOps
Continuous Integration/Delivery
Scripting
Splunk
Change Management
Object Oriented Programming
Networking
Amazon Web Services
GCP
Metrics
Hadoop
Jenkins
Subversion
Puppet
Chef
GitHub
GitLab
Docker
Kubernetes
Terraform
Application Deployment
Software Configuration
Fault-Tolerant
IaaS
Infrastructure Management
Streaming
Software Life Cycle
Performance Analysis
Forecasting
Packer
Welding
Inventory
Capacity Planning
Switch Capacity

Job Details

Flink Administrator
Location : Dallas TX

Who are we looking for?

As a Big Data Administrator, you will help maintain and administer on-premises and cloud-based big data platforms. You will help set up the platform, automate operations, maintain the knowledge base and runbooks, troubleshoot issues, restore service on the platform, and provide support.

Your responsibilities:

  • Build and support on-premises Hadoop and Flink (Cloudera Streaming Analytics) platform infrastructure and applications.
  • Deploy Flink-based applications on the platform and use configuration management tools (such as Ansible or SaltStack) to manage them.
  • Deploy software to improve the availability, scalability, and efficiency of the platform.
  • Facilitate capacity planning and demand forecasting, software performance analysis, and system tuning.
  • Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services.
  • Partner with development teams in defining and implementing improvements.
  • Propose solutions related to server hardware and software configuration, networking, standard internet services, scripting languages, cloud computing patterns, technology security and compliance.
  • Troubleshoot priority incidents and facilitate blameless post-mortems.
  • Perform analytics on previous incidents and usage patterns to better predict issues and take proactive actions.
  • Work with development teams throughout the software life cycle to ensure sustainable software releases.
  • Lead and participate in tests; identify bottlenecks, opportunities for optimization, and capacity demands.
  • Participate in the 24x7 support coverage as needed.
  • Measure and optimize service performance.
  • Build tooling to enable observability services and automate CI/CD pipelines.
  • Provide technical escalation and contribute to the on-call rotation.
  • Automate system monitoring to ensure uptime of production systems.
  • Troubleshoot end-to-end on private or public cloud infrastructure.
  • Provide infrastructure monitoring and reports for all performance metrics.


Technical Skills:

  • 10+ years of experience in Hadoop and YARN infrastructure management and application deployment.
  • Hands-on experience maintaining and supporting Flink/Spark streaming applications in Hadoop and cloud environments.
  • 4+ years of experience in DevOps and Shell Scripting
  • Site reliability engineer (SRE) with strong experience in monitoring, troubleshooting, and support.
  • Support rapid development and engineering productivity via release engineering, CI/CD & IaC automation, and build tools.
  • Perform health checks on applications and infrastructure to identify and proactively pre-empt issues (verification, alerts, etc.).
  • Experience with Python including Object Oriented programming.
  • Working experience with Splunk for log inventory, dashboard creation, etc., across various sources such as Linux.
  • Experience with Ansible, Puppet, and SaltStack.
  • Container administration and development utilizing Kubernetes, Docker, Mesos, or similar.
  • Infrastructure automation through Terraform, Chef, Ansible, Puppet, Packer or similar.
  • Experience with Cloud Orchestration frameworks, development and SRE support of these systems.
  • Experience with CI/CD pipelines, including VCS (Git, SVN, etc.), GitLab Runners, Jenkins, and Rundeck.
  • Oracle Database knowledge (ATP, ADW) and programming in SQL and PL/SQL.
  • Cloud network experience
  • Experience with Linux
  • Experience working with fault tolerant, highly available, high throughput, distributed, scalable systems.
  • Integration with CodeDeploy / GitHub Actions.
  • Experience with IaaS tools like CFT and Terraform.


Nice to have:

  • Experience with Kubernetes or other container orchestration frameworks.
  • Experience with public cloud-based solutions like Azure, Google Cloud Platform, and AWS.