Senior Data Engineer

Overview

Location: Remote / Hybrid
Compensation: Based on experience
Employment Type: Contract - W2, Contract - Independent

Skills

SANS
Microsoft Azure
Data Processing
Object-Oriented Programming
RDBMS
Software Development Methodology
Testing
Documentation
Data Management
Unstructured Data
Elasticsearch
Extraction
SQL
Apache Sqoop
Data Structure
Software Design
Debugging
Policies and Procedures
Innovation
Neo4j
GPU
OLAP
Time Series
Science
PB
Use Cases
Real-time
Batch Processing
Apache Flink
Data Science
Data Lake
Workflow
Fraud
Algorithms
Fraud Management
Repair
Storage
Network
Stacks Blockchain
Collaboration
Product Design
Operational Efficiency
Sourcing
Analytics
ASA
Scalability
Process Improvement
IT Management
Big Data
Machine Learning (ML)
Open Source
Agile
Sprint
Scrum
Quality Assurance
Customer Relationship Management (CRM)
Information Technology
Computer Science
Software Engineering
Management
Python
Distributed Computing
Distributed File System
HDFS
Apache Hive
Apache Pig
Java
J2EE
Web Services
XML
SOAP
Architectural Design
Requirements Analysis
Amazon Web Services
Cloudera
Distribution
Cloudera Impala
Apache Kafka
Streaming
Mentorship
Leadership
Structural Engineering
Apache Spark
MapReduce
NoSQL
Database
Apache HBase
Apache Cassandra
Vertica
MongoDB
Apache Hadoop
Tableau
Visualization
Data Architecture
Data Quality
Data Warehouse
Database Design
Data Modeling
Reporting
Meta-data Management
Extract
Transform
Load
Unix
Linux
Shell Scripting
Scripting
Analytical Skill
Project Management
Artificial Intelligence
Migration
Software Development
Data Analysis
Cloud Computing
Cyber Security
Google Cloud Platform
Google Cloud

Job Details

Job Title: Senior Data Engineer
Job Location: Remote
Job Type: Contract preferred

Job Description:
This position is responsible for working independently and with a team to develop, implement, and manage data pipelines and data-driven applications on Big Data / Hadoop platforms, both on-premises and in the cloud (Azure, AWS, Google Cloud).
This position designs, develops, tests, and maintains distributed data processing applications on a Big Data platform.
The candidate should have a solid understanding of object-oriented programming, programming principles, Java, Big Data, XML, RDBMS, debugging and analyzing code, and working with others to ensure a high-quality product.
The candidate will implement large-scale data ecosystems, including data management, governance, and the integration of structured and unstructured data to generate insights, leveraging cloud-based platforms.
The candidate should follow the complete SDLC, participating in the following stages: requirements, analysis, design, coding, testing, documentation, and implementation.

Responsibilities:
" Design and Implement large-scale data ecosystems including data management, governance and the integration of structured and unstructured data to generate insights leveraging on-premise and cloud based platforms
" Design, implement and deliver large scale enterprise applications using big data open source solutions such as Apache Hadoop, Apache Spark, Kafka, and ElasticSearch.
" Implement data pipelines and data-driven applications using Java, Python on distributed computing frameworks like MapReduce, Hadoop, Hive, Pig, Apache Spark, etc
" Partner with cross-functional platform teams to set up KAFKA queues, to streamline message flows and govern the transfer mechanism
" Create and maintain optimal data pipeline architecture. Assemble large, complex data sets that meet functional / non-functional business requirements. Build distributed, scalable, and reliable data pipelines that ingest and process data at scale
" Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and Big Data technologies. Execute and Automate Extract, Transform & Load (ETL) operations on large datasets using Big Data tools like Spark, Sqoop, MapReduce
" Design and develop data structures that support high performing and scalable analytic applications on one or more of these databases Hive, Impala , NoSQL Databases HBase, Apache Cassandra, Vertica, or MongoDB
" Drive software design meetings and analyze user needs to determine technical requirements.
" Consult with the end user to prototype, refine, test, and debug programs to meet needs.
" Design and implement solutions that comply with all security policies and procedures, to ensure that the highest level of system and data confidentiality, integrity and availability is maintained. " Be the leader in enhancing existing infrastructure and internalize the latest innovation in technologies like in-memory (Aerospike,Juno, Graph (Janusgraph, NEO4j), and GPU DB, real-time OLAP, time-series analysis et. al.
" Leverage automation, cognitive and science-based techniques to manage data, predict scenarios and prescribe actions.
" Design and build data services that deal with big data (> 90PB) at low latency (i.e. sub-second) for a variety of use-cases spanning near-real-time analytics and machine intelligence using both Stream and Batch processing frameworks on Hadoop ecosystem technologies (e.g. Yarn, HDFS, Presto, Spark, Flink, Beam)
" Work closely with Data science teams to integrate data, algorithms into data lake systems and automate different Machine Learning workflows and assist with data infrastructure needs
" Harnessing the power of data, machine learning, and humans and building cutting edge fraud detection, collusion detection techniques and algorithms to enhance and scaling fraud management platforms.
" Design and implement reporting and visualization for unstructured and structured data sets using visualization tools like Tableau, Zoomdata, Qlik etc
" Review existing computer systems to determine compatibility with projected or identified needs, researches and selects appropriate frameworks, including ensuring forward compatibility of existing systems. " Review, repair and modify software programs to ensure technical accuracy and reliability of the program.
" Work with cross-functional operations teams such as systems, storage, and network to design technology stacks
" Analyze and improve stability, efficiency, and scalability of the data platforms. Collaborate with architects, engineers, and business users on product design and feature
" Drive operational efficiency by maintaining data ecosystems, sourcing analytics expertise and providing Asa-Service offerings for continuous insights and improvements
" Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing data infrastructure for greater scalability, etc.
" Recommend and implement process improvements.
" Help document best practices in developing and deploying data pipelines and solutions
" Function as a front-line technical resource for "best practice and informal customer questions. Serve as technical expert on development projects
" Provide technical leadership and mentoring to engineers
" Share your work with the world-at-large via authoring blogs & articles, contributing to open source projects, and by speaking internally (and external to Clairvoyant).
" Bring new ideas in the cloud, big data, and machine learning software development. Leverage industry knowledge and stay close to technology developments in the open-source and cloud communities " Working in agile methodology, involve in Grooming, Sprint Planning, and daily Scrum meetings.
" Attend Requirements Review Meetings and provide feedback to ensure that the system meets all primary requirements as per standards.
" Work with QA and Business teams to resolve defects and issues by analyzing server logs.
" Provide project leadership for client initiatives. This would include project management functions and support services.
" Work with stakeholders including the management, Product, Data and Design teams to assist with datarelated technical issues and support their data infrastructure needs.
" Provide clear and constructive product feedback to Client Service Management teams based on customer requirements
" Bachelor's degree in Information Technology, Computer Science, Software Engineering, or related IT field required for this position
" Understand the complexities of building and managing data pipelines, data-driven applications
" Coding in Java, Python with experience in working with large data sets, experience working with distributed computing (MapReduce, Hadoop, Hadoop Distributed File System (HDFS), Hive, Pig, Apache Spark, etc (1 + years)
" Java, J2EE, Web Services, XML, SOAP (2+ years) " Application architectural design (1+ years)
" Ability to gather and document requirements, analysis, and specifications
" AWS Certification, Hadoop Certification or Spark Certification Preferred
" Strong technical expertise in Hadoop (Cloudera distribution), Impala, HBase, Kafka, Apache Spark, and Spark Streaming.
" Interpret business requirements and effectively implement into a software solution
" Take direction and mentoring from other senior developers and leadership
" Able to translate business requirements into logical and physical file structure design
" Ability to build and test rapidly Spark/Map Reduce code in a rapid, iterative manner
" Experience with NoSQL Databases HBase, Apache Cassandra, Vertica, or MongoDB
" Strong Hadoop scripting skills to process petabytes of data
" Experience designing and implementing reporting and visualization for unstructured and structured data sets.
Experience with Tableau or similar visualization tools preferred
" Understanding of the benefits of data warehousing, data architecture, data quality processes, data warehousing design and implementation, table structure, fact, and dimension tables, logical and physical database design, data modeling, reporting process metadata, and ETL processes.
" Experience in Unix/Linux shell scripting or similar programming/scripting knowledge
" Strong analytical skills regarding technical and project management issues
" Excellent communications and interpersonal skills
" Work independently and as a member of a team
" Handle multiple tasks concurrently.
" Ability to travel up to 25 percent of the time.

About Us:
InterSources Inc, a Certified Diverse Supplier, was founded in 2007 and offers innovative solutions to help clients with digital transformations across various domains and industries. Our history spans over 16 years, and today we are an award-winning global software consultancy solving complex problems with technology. We recognize that our employees and our clients are our strengths, as the diverse talents and opportunities they bring to the table enable us to grow as a global platform and are directly linked to our success. We provide strategic and technical advice, with expertise in areas including Artificial Intelligence, Cloud Migration, Custom Software Development, Data Analytics Infrastructure & Cloud Solutions, and Cyber Security Services. We make reasonable accommodations for clients and employees, and we do not discriminate based on any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, age, or marital status. We are also a Google Cloud partner company. We align strategy with execution and provide secure service solutions by developing and using the latest technologies, leveraging our resources to deliver industry-leading capabilities and making it convenient for our clients to do business with InterSources Inc. Our teams also drive growth by refining technology-driven client experiences that put users first, providing an unparalleled experience. This strengthens our clients' core technologies, enabling them to scale with flexibility, create seamless digital experiences, and build lifelong relationships.