Amazon EMR & Big Data Solutions Architect

Amazon EMR, Spark, PySpark, Scala, CloudFormation, AWS CLI, Amazon Athena, AWS RedShift, PostgreSQL, Teradata, Kinesis, Firehose, Kafka, Spark Streaming, Apache Flink
Full Time, Direct Placement
Depends On Experience
Work from home not available. Travel required up to 75%.

Job Description

US citizens, Green Card holders, and others authorized to work in the US are encouraged to apply. We are unable to sponsor H-1B candidates at this time.

Job Responsibilities:

  • Interface with client project sponsors to gather, assess and interpret client needs and requirements
  • Develop a data model and Data Lake design around stated use cases to capture client's KPIs and data transformations
  • Identify one or more relevant AWS services -- especially on Amazon EMR -- and an architecture that can support client workloads/use-cases; evaluate pros/cons among the identified options before arriving at a recommended solution optimal for the client's needs.
  • Be able to explain to the client the tradeoffs among the various AWS options, and why the recommended solution(s) and architecture were chosen as optimal for the client's needs.
  • Work closely with the client and broader NorthBay Delivery team to implement in Agile fashion the architecture and chosen AWS services using AWS Best Practices and principles from the AWS Well-Architected Framework
  • Assess, document and translate goals, objectives, problem statements, etc. to our offshore team and onshore management
  • Advise on database performance, alter the ETL process, provide SQL transformations, discuss API integration, and derive business and technical KPIs
  • Help transition the implemented solution into the hands of the client, including providing documentation the client can use to operate and maintain the solution.
  • Help client with its Continuous Improvement processes to learn from each customer project, including doing project retrospectives and writing up "Lessons Learned".
  • Able to travel up to 70% (US domestic)

Qualifications - Must Haves:

  • Strong Design / Development Experience on Amazon EMR, preferably with Spark (PySpark, Scala)
  • Strong troubleshooting / admin experience with EMR-specific infrastructure code (CloudFormation), deployment via AWS CLI, and bootstrap actions.
  • Ability to implement transient infrastructure (e.g. transient EMR clusters) that leverages decoupled storage (S3) and compute. Implement these using reproducible automated mechanisms like AWS CLI scripts, CloudFormation templates, and custom code leveraging AWS SDKs.
  • Strong experience on one or more MPP Data Warehouse Platforms, preferably Amazon EMR (incl. Presto), Amazon Athena, AWS Redshift, PostgreSQL, Teradata or similar
  • Possess in-depth working knowledge and hands-on development experience in building Distributed Big Data Solutions including ingestion, caching, processing, consumption, logging & monitoring
  • Strong Development Experience on one or more event-driven streaming platforms, preferably Kinesis, Firehose, Kafka, Spark Streaming, or Apache Flink
  • Strong Data Orchestration experience using one or more of these tools: AWS Step Functions, Lambda, AWS Data Pipeline, AWS Glue orchestration, Apache Airflow, Luigi or related
  • Strong understanding and experience with Cloud Storage infrastructure, and operationalizing AWS-based storage services & solutions preferably S3 or related
  • Strong technical communication skills and ability to engage a variety of business and technical audiences, explaining features and metrics of Big Data technologies based on experience with previous solutions
  • Strong Understanding of one or more Cluster Managers (YARN, Hive, Kubernetes, Pig, etc.)

Nice to Haves:

  • Strong Data Cataloging experience, preferably using AWS Glue or a similar tool
  • Strong Development Experience on at least one NoSQL or document database
  • Experience with one or more ingestion/integration tools such as Apache NiFi, StreamSets, or related
  • Strong Development Experience on at least one Caching Tool like Amazon ElastiCache (with Redis or Memcached) or Lucene
  • Strong Understanding and experience in Big Data Audit Logging and Monitoring solutions like AWS CloudTrail and CloudWatch.

Additional Qualifications:

  • 5+ years of AWS solutions implementation and professional services experience, preferably in the Data Analytics space.
  • A passion for exploring data and extracting valuable insights.
  • Proven analytical, problem solving, and troubleshooting expertise.
  • Proficiency in SQL, preferably across a number of dialects (we commonly write MySQL, PostgreSQL, Redshift, SQL Server, and Oracle).
  • Exposure to developer tools/workflow (e.g., Git/GitHub, *nix, SSH)
  • Experience optimizing database/query performance.
  • Experience with AWS ecosystem (EC2, S3, RDS, Redshift).
  • Experience with business intelligence tools with a physical model (e.g., MicroStrategy, Business Objects, Cognos).
  • Experience with data warehousing.
  • Exposure to NoSQL-based, SQL-like technologies (e.g., Hive, Pig, Spark SQL/Shark, Impala, BigQuery)
  • Excellent verbal and written communication skills

Education and Experience:

  • Bachelor's Degree in Computer Science or Equivalent
  • Minimum five years of Big Data engineering experience on AWS

Desired Certifications:

  • AWS Solutions Architect
  • AWS Big Data Specialty
  • Or any data-centric certification

Visit our website to learn more about us.

Dice Id : gtt
Position Id : 19-00324