CompuGain is an Information Technology and Business Consulting firm providing project-based solutions, software solutions and professional staffing services.
Cleanse, manipulate and analyze large datasets (structured and unstructured data: XML, JSON, PDF) using the Hadoop platform.
Develop Python, PySpark and Spark scripts to filter, cleanse, map and aggregate data.
Manage and implement data processes (Data Quality reports).
Develop data profiling, deduping and matching logic for analysis.
Programming experience in Python, PySpark and Spark for data ingestion.
Programming experience on a Big Data platform using Hadoop.
Present ideas and recommendations to management on the best use of Hadoop and other technologies.
5+ years of experience processing large volumes and varieties of data (structured and unstructured: XML, JSON, PDF), including writing code for parallel processing.
3+ years of programming experience in Python and Spark for data processing and analysis.
Strong SQL experience is a must.
3+ years of experience using the Hadoop platform and performing analysis.
Education Requirements: Bachelor's degree in computer science, information systems or another related field.
Duration: 6 months
Thanks and Regards
Vinay Kumar | CompuGain
P: 703 520 1734
13241 Woodland Park Rd, Ste 610 Herndon, VA, 20171