*This position requires US Citizenship
We are seeking a hands-on, seasoned Hadoop Developer to work on a mission-critical information sharing platform that will enable the sharing of secure, accurate, and privacy-controlled data with approved stakeholders, while protecting sensitive data and preserving privacy within the federal agency components and other national security agencies. The system is deployed across the classified and unclassified domains within the Federal Agency and housed at Federal Agency centers. The individual will work closely with customers and infrastructure teams to design and implement Big Data analytic solutions on a Hadoop-based platform.
Optimize HDFS infrastructure using MapReduce and Spark-based jobs
Test and refine data throughput within the cluster using Spark and MapReduce jobs
Create custom analytic jobs to help extract knowledge and meaning from vast stores of data.
Refine a data processing pipeline focused on unstructured and semi-structured data. Support both quick-turn, rapid implementations and larger-scale, longer-duration analytic capability implementations.
Ingest data from various structured and unstructured data sources into Hadoop and other distributed Big Data systems.
Support the sustainment and delivery of an automated ETL pipeline using a suite of COTS, GOTS, and other tools.
Validate data extracted from structured and unstructured data inputs, databases, and other repositories using scripts, logs, queries, and other automated capabilities. Enrich and transform extracted data as required.
Monitor and report the data flow through the ETL process.
Perform data extractions, data purges, or data fixes in accordance with current internal procedures and policies.
Track development via user stories and decomposed technical tasks in provided issue tracking software, including JIRA.
Test and validate integration points with downstream columnar databases.
3+ years of experience with distributed, scalable Big Data systems and/or NoSQL databases, including Hadoop, Accumulo, and HBase
CDH-certified Hadoop Developer
Experience in MapReduce and Spark programming within the Hadoop Distributed File System (HDFS) and with processing large data stores (minimum 20 data nodes)
Experience with the design and development of multiple object-oriented systems (2+ years of experience with software development throughout the SDLC)
Experience with Open-Source Software or COTS products
Experience with Linux, including CentOS and Red Hat
Experience working with Scrum or other Agile methodologies
Ability to show flexibility, initiative, and innovation when dealing with ambiguous and fast-paced situations
Ability to obtain a security clearance
BS degree in CS or equivalent
Experience with Hadoop
Experience with R or Python
Experience with using repository management solutions
Experience with deploying applications in a Cloud environment
Experience with designing and developing automated analytic software, techniques, and algorithms
1600 Tysons Blvd, Suite 800 McLean, VA, 22102