Splitting Data with the Data Manager
When you want more control over how the dataset is divided, you can split it with a Data Manager task.
This lets you specify the criteria by which the dataset is split.
Prerequisites
The required datasets have been imported into the process.
The data used for the model has been well prepared.
A single unified dataset has been created by merging all the datasets imported into the process.
Data Manager
For more information on what you can do with the Data Manager task, see Overview of Data Exploration in the Data Manager
Procedure
Add a new Data Manager task to the process.
Drag and drop the attributes you want to use to divide the dataset onto the Filter column in the Query Manager.
Configure the filters to create the required view.
Right-click on any cell in the data sheet and select Assign view to > Test set, Training set or Validation set, as required (by default, all patterns are in the training set).
Remove the filter by selecting the filter cell in the Query Manager and pressing DELETE.
Save and compute the task.
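The logic behind this procedure can be sketched outside Rulex in a few lines of Python. This is only an illustration of the idea (every pattern starts in the training set, and the patterns matching the filter are reassigned to another set); it does not use any Rulex API, and the function name `split_by_filter`, the `value` attribute and the threshold are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def split_by_filter(
    rows: List[Row],
    matches: Callable[[Row], bool],
    target: str = "test",
) -> Dict[str, List[Row]]:
    """Assign rows that satisfy `matches` to `target`; the rest stay in training.

    Hypothetical helper: mirrors "filter the view, then assign the view to a set".
    """
    if target not in ("test", "validation"):
        raise ValueError("target must be 'test' or 'validation'")
    sets: Dict[str, List[Row]] = {"training": [], "test": [], "validation": []}
    for row in rows:
        # By default every pattern belongs to the training set.
        destination = target if matches(row) else "training"
        sets[destination].append(row)
    return sets


# Illustrative data; the attribute name and threshold are assumptions.
rows = [
    {"id": 1, "value": 20},
    {"id": 2, "value": 45},
    {"id": 3, "value": 38},
    {"id": 4, "value": 60},
]

sets = split_by_filter(rows, lambda r: r["value"] > 40, target="test")
test_ids = [r["id"] for r in sets["test"]]
training_ids = [r["id"] for r in sets["training"]]
```

Removing the filter afterwards, as in the procedure, corresponds to viewing all rows again; the set assignment itself is kept.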
Example
The following examples are based on the Adult dataset.
Scenario data can be found in the Datasets folder in your Rulex installation.
The following steps were performed:
We add a Data Manager task to visualize the initial data and to create the training and test sets.
Procedure | Screenshot |
---|---|
After importing the adult.set dataset via an Import from Text File task, add a Data Manager task to the stage to display its contents. As we can see, the original dataset contains 32561 patterns. | |
We want to divide the data from the source as follows:
We do not need a validation set, so we drag and drop the hours-per-week attribute onto the Filter column in the Query Manager and configure the filter as shown. | |
We now click on a cell in the filtered dataset and select Assign view to > Test set. Then select the cell in the Filter column and delete it. Save and compute the task. | |
The dataset is now divided into a test set and a training set, as can be checked from the corresponding drop-down list in the top right-hand corner. |