Job Details
Role :: Data Scientist
Location :: Tampa, FL / Dallas, TX - Onsite
Type :: Full-time
Job Description
Must Have Technical/Functional Skills
- Programming & Libraries: Expert-level proficiency in Python and its core data science libraries (Pandas, NumPy, Scikit-learn). Strong proficiency in SQL for complex data extraction and manipulation.
- Machine Learning Frameworks: Hands-on experience with modern deep learning frameworks such as TensorFlow or PyTorch.
- Statistical Modeling: Deep understanding of statistical concepts and a wide range of machine learning algorithms, with proven experience in time-series forecasting and anomaly detection.
- Big Data Technologies: Demonstrable experience working with large datasets using distributed computing frameworks, specifically Apache Spark.
- Database Systems: Experience querying and working with data from multiple relational database systems (e.g., PostgreSQL, Oracle, MS SQL Server).
- Cloud Platforms: Experience building and deploying data science solutions on a major cloud platform (AWS, Google Cloud Platform, or Azure). Familiarity with their native ML services (e.g., AWS SageMaker, Google Vertex AI) is a strong plus.
- MLOps Tooling: Practical experience with MLOps principles and tools for model versioning, tracking, and deployment (e.g., MLflow, Docker); a brief illustrative sketch follows this list.
- Communication and Storytelling: Excellent verbal and written communication skills, with a proven ability to explain complex technical concepts to a non-technical audience through visual storytelling.
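
As a loose illustration of how several of the tools named above fit together, the sketch below trains a scikit-learn anomaly detector and tracks it with MLflow. The experiment name, parameters, and data are hypothetical placeholders, not part of the role description.

```python
# Illustrative sketch only: a scikit-learn anomaly detector tracked with MLflow.
# Experiment name, parameters, and data are hypothetical placeholders.
import mlflow
import numpy as np
from sklearn.ensemble import IsolationForest

mlflow.set_experiment("anomaly-detection-demo")  # hypothetical experiment name

rng = np.random.default_rng(42)
X = rng.normal(size=(1_000, 4))  # stand-in feature matrix

with mlflow.start_run():
    params = {"n_estimators": 200, "contamination": 0.01}
    model = IsolationForest(**params, random_state=42).fit(X)
    scores = model.score_samples(X)

    mlflow.log_params(params)
    mlflow.log_metric("mean_anomaly_score", float(scores.mean()))
    mlflow.sklearn.log_model(model, "isolation_forest")
```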
Roles & Responsibilities
Druid Data Modeling & Schema Design:
  - Design and implement efficient data schemas, dimensions, and metrics within Apache Druid for various analytical use cases (e.g., clickstream, IoT, application monitoring).
  - Determine optimal partitioning, indexing (bitmap indexes), and rollup strategies to ensure sub-second query performance and efficient storage (an illustrative schema sketch follows this list).
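
As a hedged illustration of the schema, rollup, and granularity choices described above, here is a minimal Druid dataSchema fragment expressed as a Python dict. The datasource name, columns, and granularities are assumptions for illustration only, not a prescribed design.

```python
# Illustrative sketch only: a Druid dataSchema fragment for a hypothetical
# clickstream datasource, showing dimensions, rolled-up metrics, and granularity.
data_schema = {
    "dataSource": "clickstream_events",  # hypothetical datasource name
    "timestampSpec": {"column": "event_time", "format": "iso"},
    "dimensionsSpec": {
        "dimensions": ["user_country", "device_type", "page_id"]
    },
    "metricsSpec": [
        {"type": "count", "name": "event_count"},
        {"type": "doubleSum", "name": "revenue_sum", "fieldName": "revenue"},
    ],
    "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",   # one time chunk of segments per hour
        "queryGranularity": "MINUTE",   # timestamps truncated to the minute
        "rollup": True,                 # pre-aggregate rows sharing dimensions + minute
    },
}
```

With rollup enabled, Druid stores one pre-aggregated row per unique dimension combination per query-granularity bucket, which is what keeps storage and scan costs low for this kind of workload.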
Data Ingestion Pipeline Development:
  - Develop and manage real-time data ingestion pipelines into Druid from streaming sources like Apache Kafka, Amazon Kinesis, or other message queues (a supervisor-spec sketch follows this list).
  - Implement batch data ingestion processes from data lakes (e.g., HDFS, Amazon S3, Azure Blob Storage, Google Cloud Storage) or other databases.
  - Ensure data quality, consistency, and exactly-once processing during ingestion.
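
Streaming ingestion of this kind is commonly configured by submitting a Kafka supervisor spec to Druid's Overlord API. The sketch below is a minimal example under assumed hostnames, topic name, and broker addresses, reusing a compact version of the hypothetical clickstream schema from the previous sketch.

```python
# Illustrative sketch only: submit a Kafka supervisor spec to Druid's Overlord
# so the cluster ingests the topic continuously. Hostnames, topic, and broker
# addresses are assumptions.
import requests

supervisor_spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": {  # compact version of the hypothetical schema above
            "dataSource": "clickstream_events",
            "timestampSpec": {"column": "event_time", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["user_country", "device_type", "page_id"]},
            "metricsSpec": [{"type": "count", "name": "event_count"}],
            "granularitySpec": {
                "type": "uniform",
                "segmentGranularity": "HOUR",
                "queryGranularity": "MINUTE",
                "rollup": True,
            },
        },
        "ioConfig": {
            "topic": "clickstream-events",  # assumed topic name
            "consumerProperties": {"bootstrap.servers": "kafka:9092"},
            "taskCount": 2,
            "useEarliestOffset": False,
        },
        "tuningConfig": {"type": "kafka"},
    },
}

resp = requests.post(
    "http://overlord:8081/druid/indexer/v1/supervisor",  # assumed Overlord URL
    json=supervisor_spec,
    timeout=30,
)
resp.raise_for_status()
```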
Query Optimization & Performance Tuning:
  - Write and optimize complex SQL queries (Druid SQL) for high-performance analytical workloads, including aggregations, filters, and time-series analysis.
  - Analyze query plans and identify performance bottlenecks, implementing solutions such as segment optimization, query rewriting, or cluster configuration adjustments (an illustrative query-and-plan sketch follows).
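
As a minimal sketch of this kind of tuning work, the example below issues a Druid SQL time-series aggregation through the Broker's HTTP SQL endpoint and then requests its plan with EXPLAIN PLAN FOR. The Broker URL and datasource name are assumptions carried over from the earlier sketches.

```python
# Illustrative sketch only: run a Druid SQL aggregation and inspect its plan via
# the Broker's HTTP SQL endpoint. The Broker URL and datasource are assumptions.
import requests

BROKER_SQL = "http://broker:8082/druid/v2/sql"  # assumed Broker address

query = """
SELECT
  TIME_FLOOR(__time, 'PT1H') AS hour_bucket,
  device_type,
  SUM(event_count) AS events
FROM clickstream_events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1, 2
ORDER BY hour_bucket
"""

# Execute the query itself.
rows = requests.post(BROKER_SQL, json={"query": query}, timeout=60).json()

# Ask Druid how it plans the query -- a starting point when hunting bottlenecks.
plan = requests.post(
    BROKER_SQL, json={"query": f"EXPLAIN PLAN FOR {query}"}, timeout=60
).json()
print(plan)
```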