Lead Data Engineer - Batch and Stream Processing

GM Financial
Full Time

Job Description


We are expanding our efforts into complementary data technologies for decision support, focused on ingesting and processing large data sets, including data commonly referred to as semi-structured or unstructured. Our interest is in enabling data science and search-based applications on large, low-latency data sets, processed in both batch and streaming contexts. To that end, this role will engage with team counterparts in exploring and deploying technologies for creating data sets through a combination of batch and streaming transformation processes. These data sets support both offline and inline machine learning training and model execution; other data sets support search-engine-based analytics. Exploration and deployment activities include identifying opportunities that impact business strategy, selecting data solutions software, and defining hardware requirements based on business requirements. Responsibilities also include coding, testing, and documenting new or modified scalable analytic data systems, including automation for deployment and monitoring. This role works with team counterparts to architect an end-to-end framework built on a group of core data technologies, and also develops standards and processes for data engineering projects and initiatives.
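As an illustration of the batch and streaming contexts described above, the following hypothetical Python sketch (names and data are invented, not GM Financial's stack) applies the same cleaning-and-aggregation transformation two ways: once over a full data set at rest, and once incrementally as records arrive:

```python
def clean(record):
    """Normalize one semi-structured record; drop rows missing an amount."""
    if record.get("amount") is None:
        return None
    return {"account": record["account"], "amount": float(record["amount"])}

def batch_totals(records):
    """Batch context: process the complete data set and return per-account totals."""
    totals = {}
    for rec in records:
        cleaned = clean(rec)
        if cleaned:
            totals[cleaned["account"]] = totals.get(cleaned["account"], 0.0) + cleaned["amount"]
    return totals

class StreamingTotals:
    """Streaming context: maintain the same per-account totals incrementally."""
    def __init__(self):
        self.totals = {}

    def on_record(self, rec):
        cleaned = clean(rec)
        if cleaned:
            self.totals[cleaned["account"]] = self.totals.get(cleaned["account"], 0.0) + cleaned["amount"]
        return self.totals

events = [
    {"account": "A", "amount": "10.0"},
    {"account": "B", "amount": None},   # malformed record, dropped by clean()
    {"account": "A", "amount": "5.0"},
]

batch_result = batch_totals(events)

stream = StreamingTotals()
for e in events:
    stream_result = stream.on_record(e)

# Both contexts converge on the same result for the same transformation.
assert batch_result == stream_result == {"A": 15.0}
```

The point of the sketch is that the transformation logic (`clean` plus aggregation) is shared; only the execution context differs, which is what lets one framework serve both offline training data and inline model-scoring data.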


Job Duties

  • Evaluate, research, and experiment with batch and streaming data engineering technologies in a lab setting to keep pace with industry innovation, assessing business impact and viability for use cases tied to current efforts
  • Work with data engineering related groups to inform on and showcase capabilities of emerging technologies and to enable the adoption of these new technologies and associated techniques
  • Define and refine processes and procedures for the data engineering practice
  • Work closely with data scientists, data architects, ETL developers, other IT counterparts, and business partners to identify, capture, collect, and format data from external sources, internal systems, and the data warehouse
  • Code, test, deploy, monitor, document, and troubleshoot data engineering processing and associated automation
  • Define data engineering architecture both hardware and software reflective of business requirements to be included in end-to-end solution architecture
  • Educate and develop ETL developers in data engineering to enable their transition into the data engineering practice
  • Perform other duties as assigned
  • Conform with all company policies and procedures



Knowledge

  • General working knowledge of networking concepts including TCP/IP, subnetting, routing, DHCP, DNS, and the command line
  • Knowledge of PC hardware and software
  • Broad understanding of enterprise computer hardware/software and corporate information systems
  • Strong knowledge of operating systems, applications, and associated hardware (e.g., Windows desktop and server OSs, UNIX/Linux)
  • Working knowledge of WAN security and design


Skills and Abilities

  • Ability to accept change and to adapt to shifting organizational challenges and priorities
  • Ability to coach, develop and lead others
  • Ability to evaluate problems and issues quickly, and to make recommendations for courses of action
  • Ability to make independent decisions and use sound judgment in relation to the management of team members
  • Ability to prioritize tasks and ensure their completion in a timely manner
  • Excellent analytical and troubleshooting skills
  • Strong interpersonal, verbal and written skills

Additional Knowledge, Skills and Abilities

  • Working knowledge of directed acyclic graph (DAG) stream processing using Beam, Flink, NiFi, and/or Samza
  • Kubernetes, including Spark on Kubernetes and the Spark Operator
  • Working knowledge of object storage technologies such as S3, MinIO, Ceph, and ADLS
  • Table formats such as Delta Lake, Hudi, and Iceberg
  • Python, PySpark, Spark ML, H2O, TensorFlow
  • Streaming platforms such as Kafka, Pulsar, RabbitMQ, and IBM MQ
  • Awareness or working knowledge of data processing with SAS, Salesforce, Snowflake, etc.
  • Tibco Data Virtualization
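Frameworks like Beam, Flink, and Samza build stream processing out of primitives such as windowed aggregation over keyed event streams. As a framework-free illustration (hypothetical event data, plain Python rather than any of the listed platforms), a tumbling-window count can be sketched as:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Assign each (timestamp, key) event to a fixed-size, non-overlapping
    (tumbling) window and count events per (window_start, key)."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

# Hypothetical event stream: (event-time seconds, event key)
events = [(0, "click"), (3, "click"), (5, "view"), (7, "click")]

result = tumbling_window_counts(events, window_seconds=5)
# Window [0, 5): 2 clicks; window [5, 10): 1 view, 1 click
assert result == {(0, "click"): 2, (5, "view"): 1, (5, "click"): 1}
```

Real engines add what this sketch omits, and what the role would evaluate in the lab: event-time vs. processing-time semantics, watermarks for late data, and fault-tolerant state.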


Education

  • Bachelor's degree in a related field, or equivalent work or military experience, required


Experience

  • 5-7 years of software engineering experience, including the Java, Scala, and Python programming languages, strongly preferred; a minimum of 3-5 years required
  • 2-5 years of hands-on experience building Spark ETL pipelines to process big data for a data lake ecosystem
  • 1-3 years with Docker, Git, Maven, and DevOps practices
  • 1-3 years with Jupyter/Zeppelin notebooks
  • 5-7 years with Eclipse/IntelliJ IDEs and SQL databases such as Oracle and SQL Server
  • 2-3 years with NoSQL databases such as Solr and Elasticsearch
  • 1-3 years with Business Intelligence technologies such as Power BI, Hue/Superset, and Cognos
  • 2-3 years of Linux shell scripting and automation with Terraform, Ansible, etc.
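The Spark ETL pipeline experience above follows the classic extract-transform-load pattern. As a language-agnostic illustration (standard-library Python with invented data and table names, not the actual Spark pipeline), the three stages look like:

```python
import csv
import io
import sqlite3

# Hypothetical raw input, standing in for a file landed in the data lake.
raw = "account,amount\nA,10.5\nB,not_a_number\nA,2.5\n"

def extract(text):
    """Extract: parse the raw CSV into row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: type-convert and drop malformed rows rather than fail the job."""
    out = []
    for r in rows:
        try:
            out.append((r["account"], float(r["amount"])))
        except ValueError:
            continue  # row "B,not_a_number" is discarded here
    return out

def load(rows):
    """Load: write the cleaned rows into a (here, in-memory SQLite) table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE txns (account TEXT, amount REAL)")
    con.executemany("INSERT INTO txns VALUES (?, ?)", rows)
    return con

con = load(transform(extract(raw)))
total = con.execute("SELECT SUM(amount) FROM txns WHERE account = 'A'").fetchone()[0]
assert total == 13.0
```

In a Spark pipeline the same stages map onto DataFrame reads, transformations, and writes to the lake, with the engine handling distribution and fault tolerance.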

Company Information

GM Financial, a wholly owned subsidiary of General Motors, is a global provider of auto finance solutions, with operations in the U.S., Canada, China, and Latin America. We employ more than 9,000 hard-working team members in North America, and we're always looking for new people with diverse talents. GM Financial is a workplace where dedicated people have the opportunity to work together and celebrate our successes. Our culture is based on respect, integrity, innovation and personal development.
Dice Id: 10120555
Position Id: 2020-38381
Originally Posted: 2 months ago
