Data Analyst – Data Science/Big Data
Philadelphia, PA 19103
- Degree in one of the following areas: Statistics, Data Science, Computer Science, or a relevant science or engineering discipline.
- Bring a combination of mathematical rigor and analytical thinking to create recipes that extract relevant insights from billions of rows of data to meaningfully improve user experience.
- 2+ years working within an enterprise data lake/warehouse environment or big data architecture.
- Understanding of machine learning techniques and algorithms, both theoretical underpinnings and craft.
- Applied statistics skills and understanding of probability distributions, statistical testing, regression, etc.
- Experience with common data science toolkits, such as scikit-learn, matplotlib, R, ggplot, etc. (excellence in at least one of these is highly desirable).
- Great communication skills.
- Proficiency in visualization tools, such as D3.js, Tableau, Looker, Amazon QuickSight, Grafana (excellence in one or more is highly desirable).
- Proficiency in using query languages such as SQL and Hive.
- Good scripting and programming skills in Python, R, and Scala (excellence in at least one of these is highly desirable).
- Data-oriented personality.
Preferred Additional Skills:
- Experience working with Spark
- Experience with data visualization tools, such as D3.js, Tableau, Looker, Amazon QuickSight
- Experience with NoSQL databases, such as MongoDB, Redis/ElastiCache, Cassandra, HBase
Responsibilities:
- Building a strong intuitive understanding of the problem domain (Next Generation Access Networks) and identifying testable hypotheses to explain interesting phenomena in this domain.
- Selecting and transforming features, and building and optimizing classifiers using machine learning techniques.
- Integrating data from multiple sources, including third-party sources.
- Data mining using state-of-the-art methods.
- Enhancing data collection procedures to include information that is relevant for building analytic systems.
- Meeting and communicating frequently with stakeholders to interpret their needs, plan and organize work, and discuss progress and results.
- Developing actionable quantitative models in the areas of effectiveness, ROI, pricing and optimization.
- Performing extensive data exploration and ad hoc analysis, and presenting insights clearly.
- Developing and communicating goals, strategies, tactics, project plans, timelines, and key performance metrics to reach goals.
Here are some of the specific technologies we use:
- SQL, Python, R, Scala, Java
- Visualization suites (Amazon QuickSight, Grafana, ggplot, matplotlib, seaborn)
- Spark (AWS EMR, Databricks), AWS Lambda
- Avro, Parquet
- Stream Data Platforms: Kafka, AWS Kinesis
- MySQL, Cassandra, HBase, MongoDB, RDBMS
- Caching Frameworks (ElastiCache/Redis)
- Elasticsearch, Beats, Logstash, Kibana