Top 10 Pain Points for Data Scientists

20 May 2022

Top 10 Pain Points for Data Scientists working in the real world

- Access to relevant data

Relevant data may not be directly available to the analyst: access may need org permission or supporting infrastructure, and the process for "one off" access may differ from the one needed to refresh data regularly.

 

- Data availability

Relevant data may still need to be identified and collected (as above, the infrastructure needs to be in place before the analysis job can start).

 

- Data Integration

Data from different sources needs to be integrated into a normalised form, and specific issues like record merging, record deduplication and missing attributes need to be tackled. Schema documentation is often lacking (e.g. is "customer ID" from database A the same as "customer code" from database B?).
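
As a rough illustration of the integration issues above (record merge, deduplication, missing attributes), here is a minimal sketch using only the Python standard library. The field names, the sample rows and the key-normalisation rule are all hypothetical. In particular, the code assumes that "customer ID" and "customer code" refer to the same entity, which is exactly the kind of assumption that has to be confirmed with the data owners first.

```python
# Hypothetical records from two systems; names and values are illustrative only.
crm_rows = [
    {"customer ID": " 001 ", "name": "Ada Lovelace", "email": "ada@example.com"},
    {"customer ID": "002", "name": "Alan Turing", "email": None},
    {"customer ID": "001", "name": "Ada Lovelace", "email": "ada@example.com"},  # duplicate
]
billing_rows = [
    {"customer code": "1", "country": "UK"},
    {"customer code": "003", "country": "US"},
]


def normalise_key(raw):
    """Assumed rule: keys are numeric strings that differ only in padding/whitespace."""
    return str(int(str(raw).strip()))


def integrate(crm, billing):
    """Merge both sources into one record per normalised customer key."""
    merged = {}
    for row in crm:
        key = normalise_key(row["customer ID"])
        record = merged.setdefault(key, {"customer_key": key})
        # Deduplication: a repeated key only fills attributes that are still empty.
        for field in ("name", "email"):
            if record.get(field) is None:
                record[field] = row.get(field)
    for row in billing:
        key = normalise_key(row["customer code"])
        record = merged.setdefault(key, {"customer_key": key})
        record["country"] = row.get("country")
    # Flag missing attributes explicitly instead of silently dropping records.
    expected = ("name", "email", "country")
    for record in merged.values():
        record["missing"] = [f for f in expected if record.get(f) is None]
    return sorted(merged.values(), key=lambda r: int(r["customer_key"]))


integrated = integrate(crm_rows, billing_rows)
for rec in integrated:
    print(rec)
```

Keeping an explicit "missing" list per record makes the gaps visible to whoever consumes the integrated data, rather than hiding them behind default values.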

 

- Data Siloes

Following org siloes, data may be grouped and accessible by one department (or team, or business unit) but isolated from the rest of the org

 

- Data scientist as a vanity title

When you're hired as "data scientist" but the job is good old BI reporting

 

- Unrealistic expectations

Companies want a data scientist (because they've heard data science is cool) and they expect one person to cover multiple roles (data engineer, backend engineer, DBA, analyst, scientist and everything in between)

 

- Leadership has no data science experience (see above)

 

- No infrastructure or support in place

Your first data science hire should be a data engineer, not a data scientist

 

- Working with uncertainty

(Also the fun part of the job) Research tasks can be more difficult to estimate and time-box, especially in high-risk high-reward R&D efforts.

Need to break down complexity to reduce risk

 

- Access to business domain experts

Data scientists are expected to be expert software engineers, expert statisticians and experts in the business domain. It's more common to have a stats+SW background, so they still need to be exposed to business domain knowledge (e.g. in medical applications they need to talk to doctors, in financial applications they need to talk to traders, etc.)

 

- Friction between R&D and production

When data science / R&D is completely separate from engineering, there's friction in bringing R&D work into production. Embedded teams and engineering support are needed.

 

- Forcing the favourite "agile" methodology

Especially with R&D efforts, data science projects don't necessarily fit in the exact same frameworks used for software projects

 

- Process in place "because Google does it" (or Microsoft, Amazon, Netflix, Spotify, ...)

 

- Scalability

Techniques that work on small datasets may not be suitable for large datasets.

Also solutions that work as small prototypes may not be adequate for handling large datasets.
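
One way to picture the point above: a prototype that loads every value into memory to compute summary statistics can stop working once the dataset no longer fits. The sketch below swaps in Welford's online algorithm, which computes the mean and sample standard deviation in a single pass with constant memory. The `synthetic_readings` generator is a hypothetical stand-in for rows streamed from a large file or database, not part of any real pipeline.

```python
import math


def running_mean_std(values):
    """Welford's online algorithm: one pass over the data, constant memory."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count < 2:
        # Sample standard deviation is undefined for fewer than two values.
        return count, mean, float("nan")
    return count, mean, math.sqrt(m2 / (count - 1))


def synthetic_readings(n):
    """Hypothetical data source: yields values one at a time, never as a full list."""
    for i in range(n):
        yield float(i % 10)


count, mean, std = running_mean_std(synthetic_readings(100_000))
print(count, mean, std)
```

The same function accepts any iterable, so it can be tried on a small list during prototyping and then pointed at a streaming source without changes.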

 

- Gap in skillset

Related to "unrealistic expectations": data scientists are expected to have deep expertise in a broad variety of tools and techniques, but it's quite easy to have blind spots.

 

Collated by JBI's instructors based on course delegate feedback from the following courses:

Power BI training course

Power BI Beyond the basics training course

Python Data Analysis training course

Tableau training course

About the author: Graham Smith
