Short version

  1. Connect to your organisation's data cataloguing and data integration initiatives. 

Promote continuous data integration and data FAIR-ification principles. 

  2. Make sure the Data Scientist perspective is taken into account during metadata standardisation processes. At each step of data acquisition, evaluate whether the data is fit for purpose.  

  3. Apply best practices in exploratory data analysis; data preprocessing (data cleaning, normalising, scaling); and feature engineering. 

Promote continuous data preprocessing/feature engineering practices and data versioning. 
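The preprocessing practices above can be sketched in a few lines. This is a minimal, illustrative example only; the column names, the median-imputation strategy, and the derived feature are assumptions chosen for the sketch, not a prescription.

```python
# Minimal sketch of the preprocessing step: cleaning, scaling, and one
# engineered feature. Column names and fill strategy are illustrative.
import pandas as pd
from sklearn.preprocessing import StandardScaler

raw = pd.DataFrame({
    "age": [25, None, 47, 33],
    "income": [40_000, 52_000, None, 61_000],
})

# Data cleaning: impute missing values with the column median.
clean = raw.fillna(raw.median(numeric_only=True))

# Scaling: standardise each feature to zero mean and unit variance.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(clean), columns=clean.columns
)

# Feature engineering: derive a new feature from existing columns.
clean["income_per_year_of_age"] = clean["income"] / clean["age"]
```

Versioning the `raw`, `clean`, and `scaled` artefacts separately is what makes the "continuous preprocessing" practice auditable and repeatable.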

Use the “feature store” concept, which allows re-use of already pre-processed/cleansed data, promotes collaboration, and removes silos (TO-DO: ADD DETAIL BELOW).
Use the “data passport” concept as an extension of data provenance (TO-DO: ADD DETAIL BELOW).
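The re-use idea behind the feature store can be illustrated with a hypothetical, minimal in-memory version: expensive preprocessing runs once, and every consumer reads the same cached feature instead of rebuilding it in a silo. The class and its API here are an assumption for illustration, not a real feature-store product.

```python
# Hypothetical in-memory "feature store" sketch: a feature is registered
# with its builder function, computed once on first access, then re-used.
from typing import Any, Callable, Dict

class FeatureStore:
    def __init__(self) -> None:
        self._features: Dict[str, Any] = {}
        self._builders: Dict[str, Callable[[], Any]] = {}

    def register(self, name: str, builder: Callable[[], Any]) -> None:
        """Register how a feature is computed, without computing it yet."""
        self._builders[name] = builder

    def get(self, name: str) -> Any:
        """Return the feature, computing and caching it on first access."""
        if name not in self._features:
            self._features[name] = self._builders[name]()
        return self._features[name]

store = FeatureStore()
store.register("mean_income", lambda: sum([40_000, 52_000, 61_000]) / 3)

# Two "teams" read the same pre-computed feature; the builder runs once.
team_a = store.get("mean_income")
team_b = store.get("mean_income")
```

Production feature stores add storage, time-travel, and online/offline serving on top of this caching idea, which is exactly what removes the per-team preprocessing silos mentioned above.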

Long version

Introduction

Data science is an umbrella term that encompasses data management, data analytics, data mining, machine learning, MLOps (machine learning operations) and several other related disciplines. 

...

There are three significant steps in the added value creation process: 

  1. Data acquisition and management – the process of data collection and unification;

  2. Data analysis, predictive analysis, insights generation – the process of exploratory data analysis and pre-processing followed by predictive analysis;

  3. Model and data operations – the processes of deploying prediction models in production, making them available for the end-users.

At present, a rough estimate of the time and effort data scientists spend on each step is 80% on data acquisition and management, 15% on actual data/predictive analysis and insights generation, and, as the need arises, 5% on data/model production activities. 

...