Job Details
Position Type: Contract to Perm
Location: Minneapolis, MN (Fully Remote)
Description:
Specific Title of the Position: Data Scientist - Databricks/PySpark
Work Location: Remote/Telecommute
Work Hours: Approximately 8 hours/day, with flexible scheduling to accommodate remote work (e.g., 9am-5pm EST, not strictly enforced).
Position Background and Business Impact:
The Data Scientist - Databricks/PySpark will play a critical role in driving business growth and improving operational efficiency by developing and deploying data-driven solutions using Databricks, PySpark, and related technologies. This position will deliver the following for the business:
- Develop and deploy scalable data pipelines and machine learning models to drive business insights and decision-making.
- Improve data quality and consistency through data cleansing and feature engineering techniques.
- Collaborate with cross-functional teams to identify business problems and develop data-driven solutions.
Team Description:
The Data Scientist - Databricks/PySpark will work with a team of 8-10 data scientists and engineers whose skill sets include:
- Data engineering (Databricks, PySpark, SQL)
- Machine learning (Python, R, TensorFlow, PyTorch)
- Data analysis and visualization (Tableau, Power BI)
- Business acumen (healthcare industry knowledge, business operations)
The team culture is collaborative, dynamic, and focused on delivering high-quality results.
Top Responsibilities:
- Develop and deploy scalable data pipelines using Databricks and PySpark.
- Design and implement machine learning models using Python and related libraries (e.g., scikit-learn, TensorFlow).
- Collaborate with cross-functional teams to identify business problems and develop data-driven solutions.
- Perform data cleansing and feature engineering to improve data quality and consistency.
- Develop and maintain technical documentation for data pipelines and machine learning models.
- Work with stakeholders to identify and prioritize data-driven projects.
- Develop and deploy data visualizations to communicate insights to business stakeholders.
- Stay up to date with emerging trends and technologies in data science and machine learning.
- Collaborate with data engineers to ensure data quality and consistency.
- Participate in code reviews and contribute to the development of best practices for data science and engineering.
Ideal Candidate Background:
- 5+ years of experience in data science or a related field.
- Healthcare industry experience is a plus but not required.
- Strong background in data engineering (Databricks, PySpark, SQL).
- Experience with machine learning (Python, R, TensorFlow, PyTorch).
Required Skills/Attributes:
- 3+ years of experience with Databricks and PySpark.
- 2+ years of experience with Python and related libraries (e.g., scikit-learn, TensorFlow).
- 2+ years of experience with SQL and data warehousing.
- Strong understanding of data cleansing and feature engineering techniques.
- Experience with machine learning model development and deployment.
- Excellent communication and collaboration skills.
Preferred Skills/Attributes:
- Experience with cloud-based data platforms (Azure).
- Experience with Agile development methodologies.
- Certification in data science or a related field (e.g., Certified Data Scientist or Certified Analytics Professional) is a plus; no professional license or certification is required.