Data has become increasingly crucial to companies everywhere. With databases filling up with petabytes of data every week, it’s vital that companies hire data scientists to analyze it all for actionable insights.
Data scientists are in high demand. It’s not easy work, though, and it takes a certain mindset to succeed in the role. We spoke with several experts to find out what it takes to become a data scientist—and what you can expect in a data scientist interview.
What are data scientists’ must-have qualities?
Kate Druckman, head of data sciences for a major fintech company, tells Dice that successful data scientists have a few key characteristics: “Analytical rigor and statistical skills, solid coding skills in SQL/Python, and storytelling/communication skills.” (Druckman is a pseudonym for a data scientist who preferred not to use their real name.)
Adam Sugano, Executive Director of Data Analytics at the University of California, Los Angeles (UCLA), adds: “Data science is a constantly evolving field, with new tools and technologies being introduced every year that require workers in this field to constantly be learning.”
Curiosity is a key quality among data scientists, Sugano says: “Not only do they enjoy the learning process and soak up the new knowledge gained, but they immediately turn around and begin thinking about how this new tool, method, data domain, etc. can be applied to the scope of problems they have been asked to solve.”
How can you show curiosity in your application materials? Sugano often looks for voluntary participation in data competitions, or a lifelong pursuit of learning via platforms such as DataCamp. Highlighting your personal data projects or a data-science blog can also help underscore your passion for the field.
“Additionally, a data scientist needs to know how to think about a problem,” Sugano continues. “Often, I observe people on the 'business' side asking questions to data science teams that are necessary but not sufficient. The best data scientists don't just take orders but come alongside the questioner, working to understand their world so they can help frame both the problem and the question in a way that leads to better outcomes all around. This skill is near impossible to detect just by reading a resume but can be identified via discerning questions in an interview process.”
What stands out on a data scientist resume?
Knowledge of statistical methodology is foundational when it comes to prepping your application materials. “There are too many people calling themselves data scientists just because they finished a four-course sequence on Coursera or finished a 12-week Python bootcamp,” Sugano says. “Don't get me wrong, these are good places to start, but just because someone lists a Kaggle project on their resume in which they used their favorite machine learning algorithm here doesn't mean they actually know what that algorithm is doing behind the scenes.”
In other words, it’s about far more than just calling a predictive modeling function in R or Python; you have to know why you’re doing something, as well as how to interpret the results. Knowing the limitations of a tool or model is likewise key. According to Sugano, “People with training in statistics can not only call the functions that run the algorithms, but they also know how to correctly prepare the data for the model being used, how to tune the model for even better performance, and can answer direct questions about how the predictions were generated and/or what the predicted values mean.”
John Fordice, Analytics Lead at Bonsai, agrees: “The candidate should be able to articulate their passion for data science.”
Druckman cites “candidates with multi-industry experience, interdisciplinary background (math, stats, CS), strong CS background” as being of particular interest to many organizations.
What questions can you expect in a data scientist interview?
Druckman, Abhinav Unnam (Senior Data Scientist at Aviso AI), and Benn Stancil (co-founder and Chief Analytics Officer at Mode) suggest some questions you should expect to face in a data scientist job interview:
- Python coding test, which typically draws on concepts such as lists and dictionaries:
  - Find all combinations of strings in a specific URL consisting of strings meeting specific requirements.
  - Compute the total time spent across a series of overlapping time intervals by taking the union of the intervals.
- Machine Learning case interview:
  - Solve a problem statement end-to-end.
  - Define the problem statement, then come up with the solution.
  - Explain it in simple terms, including the metrics: why those, and how would you measure them?
- How would you help our sales leadership team decide if the sales team is the right size?
- How should we measure the impact of a billboard?
- How would you help an Airbnb host decide the right number of pictures to post on their profile?
- What's a p-value in layman's terms?
- Type I and Type II errors: explain in simple words.
- How do you convert a wide dataframe to a long dataframe, and vice versa, in SQL and Python?
- What's XGBoost (XGB) and why is it efficient?
- What's a random forest? How is feature importance calculated?
- What's logistic regression? How is maximum likelihood used?
- Code a logistic regression model from scratch using OOP.
- Tell me about a project you led from its inception to business impact, step-by-step.
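As an illustration of the overlapping-intervals question in the list above, here is a minimal Python sketch of one common approach: sort the intervals by start time, merge any that overlap, then sum the merged lengths. The function names `merge_intervals` and `total_time`, and the sample meeting times, are made up for this example; an interviewer may well expect a different input format or edge-case handling.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end), with start <= end


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Return the union of the intervals as a sorted list of disjoint intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # Gap before this interval: start a new merged block.
            merged.append((start, end))
    return merged


def total_time(intervals: List[Interval]) -> float:
    """Total time covered, counting overlapping stretches only once."""
    return sum(end - start for start, end in merge_intervals(intervals))


# Hypothetical meeting times, in hours on a 24-hour clock.
meetings = [(9, 11), (10, 12), (14, 15), (14.5, 16)]
print(merge_intervals(meetings))  # [(9, 12), (14, 16)]
print(total_time(meetings))       # 5
```

Sorting dominates the cost, so the sketch runs in O(n log n) time. Being able to explain that, and why sorting makes a single pass sufficient, is the kind of reasoning interviewers tend to probe.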
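For the “logistic regression from scratch using OOP” question, the sketch below shows one way it might be approached in plain Python: a small class that fits weights by gradient descent on the average negative log-likelihood, which is where maximum likelihood comes in. The class name, hyperparameter defaults, and toy data are assumptions made for illustration, not a reference implementation; a production model would normally come from an established library.

```python
import math
from typing import List


class LogisticRegression:
    """Binary logistic regression fitted by batch gradient descent."""

    def __init__(self, learning_rate: float = 0.1, n_iters: int = 1000) -> None:
        self.learning_rate = learning_rate
        self.n_iters = n_iters
        self.weights: List[float] = []
        self.bias = 0.0

    @staticmethod
    def _sigmoid(z: float) -> float:
        # Two branches keep math.exp from overflowing for large |z|.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def predict_proba(self, X: List[List[float]]) -> List[float]:
        """Estimated probability that each row belongs to class 1."""
        return [
            self._sigmoid(sum(w * x for w, x in zip(self.weights, row)) + self.bias)
            for row in X
        ]

    def fit(self, X: List[List[float]], y: List[int]) -> "LogisticRegression":
        n_samples, n_features = len(X), len(X[0])
        self.weights = [0.0] * n_features
        self.bias = 0.0
        for _ in range(self.n_iters):
            # Gradient of the average negative log-likelihood:
            # mean of (p_i - y_i) * x_ij for each weight, mean of (p_i - y_i) for the bias.
            errors = [p - t for p, t in zip(self.predict_proba(X), y)]
            for j in range(n_features):
                grad_j = sum(e * row[j] for e, row in zip(errors, X)) / n_samples
                self.weights[j] -= self.learning_rate * grad_j
            self.bias -= self.learning_rate * sum(errors) / n_samples
        return self

    def predict(self, X: List[List[float]], threshold: float = 0.5) -> List[int]:
        return [1 if p >= threshold else 0 for p in self.predict_proba(X)]


# Toy, made-up data: one standardized feature, label 1 when the feature is positive.
X_train = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y_train = [0, 0, 0, 1, 1, 1]
model = LogisticRegression().fit(X_train, y_train)
print(model.predict([[-2.0], [2.0]]))
```

In an interview, the code matters less than being able to say why the gradient takes the form `(p - y) * x` and what the predicted probabilities mean.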
Interviewing can be especially tough with some hiring managers and data scientists, particularly if the job itself is ultra-specialized. “All of my questions are tailored to the individual through a combination of the specific nature and needs of the job and the specific skills and experiences a candidate lists on their resume,” Sugano says. “Additionally, I find it beneficial to give candidates take-home assignments with real data for them to manipulate and analyze.”
That kind of process, he adds, “is a better reflection of the real world where workers have Google search, Stack Overflow, etc. at their disposal, instead of expecting them to know the answer to a limited set of programming, statistics, or probability questions (if there are 100 light bulbs in a row and...).”
Communicating your results is also incredibly important; when you sit down with the recruiter and hiring manager, be prepared to walk them through your logic behind solving problems a certain way. A big chunk of data scientists’ work is presenting the data for analysis by multiple stakeholders, including executives.
Are there online tools data scientists can use to prep for an interview?
“Yes and no,” Stancil says. “There are lots of tools for sample technical questions, and lots of online tutorials for learning technical languages. These tools are useful, and for many interviews, I think they help.”
But for highly specialized data scientist roles, such platforms may prove less of a help. “The best prep is trying to solve a problem with data,” Stancil adds. “It doesn’t need to be an important problem, but being able to talk to these experiences, the problems you ran into, and how you tried to solve them is far more useful and impressive to me than someone who can rattle off a list of predictive models they’re familiar with.”
Druckman encourages data scientists to work through “HackerRank, Leetcode, Interview.io, AlgoExpert” and the seemingly endless YouTube channels available.
Fordice adds: “Interviewkickstart.com is a great resource with a six-week course for data scientists.”
Sugano notes that if you really want to nail the interview, researching your prospective employer can yield great results: “Data scientists should be doing their research from an angle of really trying to understand a company's business model and anticipate ways this company already is or should be leveraging data to enhance business decisions. Asking questions about a company's set of data assets and how they are being leveraged today, as well as offering potentially new applications for their use, is a way for a data scientist to stand out by showing a strong interest in the company and a strong business acumen.”