Partitioning in Dataiku DSS

Online Events
Wed, May 20, 2020, 11:00 AM (EDT)

About this event

As datasets grow over time, it takes longer to update the flow with fresh incoming data, run preparation steps, and retrain models. Partitioning helps solve this issue. By splitting a dataset into subsets along meaningful dimensions (time or discrete dimensions), it lets you build the flow for the incremental data only, while keeping the historical data as it is.
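
To make the idea concrete, here is a minimal sketch in plain Python. It does not use the Dataiku DSS API; the function names (`partition_by`, `rebuild`, `total_sales`) and the sample sales rows are hypothetical, chosen only for illustration. It also assumes that partitions which have already been built never change, which is a simplification.

```python
from collections import defaultdict

def partition_by(rows, key):
    """Group rows into partitions keyed by the value of `key` (hypothetical helper)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)

def rebuild(partitions, cache, process):
    """Process only the partitions missing from `cache`; return the ids that were built."""
    built = []
    for part_id in sorted(partitions):
        if part_id not in cache:  # historical partitions are left untouched
            cache[part_id] = process(partitions[part_id])
            built.append(part_id)
    return built

def total_sales(rows):
    """Stand-in for an expensive processing step."""
    return sum(r["amount"] for r in rows)

# Illustrative data only: a time dimension ("day") and a discrete dimension ("country").
history = [
    {"day": "2020-05-18", "country": "FR", "amount": 10},
    {"day": "2020-05-18", "country": "US", "amount": 20},
    {"day": "2020-05-19", "country": "FR", "amount": 5},
]
cache = {}
first_build = rebuild(partition_by(history, "day"), cache, total_sales)

# A new day of data arrives: only that day's partition is processed.
new_rows = [{"day": "2020-05-20", "country": "US", "amount": 7}]
second_build = rebuild(partition_by(history + new_rows, "day"), cache, total_sales)
```

In this sketch the first run builds both historical days, and the second run builds only the new day. Passing `"country"` instead of `"day"` to `partition_by` would partition along the discrete dimension instead.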

Malick Konate (Data Scientist, Dataiku) will explain in detail what partitioning is and how DSS users can use it to improve computational performance when dealing with large volumes of data. Using the example of a retail company, he will walk us through how partitioning can be used to build historical data, target data processing at new data, and train a partitioned machine learning model for each country. This will also be an opportunity to share best practices and common pitfalls of managing dependencies.

Note: Partitioning is not available in the Community edition of Dataiku DSS.

Speaker

When

Wednesday, May 20
11:00 AM - 12:00 PM (EDT)

Host

  • Corey Strausman

    Dataiku

    Community Manager