Splitting Data into Training and Test Sets
Before building a predictive model, it is recommended that you split the dataset into subsets.
Splitting lets us build a predictive model on one part of the data and then immediately test its validity on data the model has not seen.
The following datasets can be created:
the training set, used to identify patterns in the data and build the model,
the test set, used to assess the accuracy of the model, and
the optional validation set, which can be used for tuning the model parameters.
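As a rough illustration of the three subsets outside Rulex, the following Python sketch randomly partitions a list of rows. The function name `split_dataset`, the 60/20/20 proportions, and the fixed seed are assumptions made for this example; they do not describe how Rulex performs the split.

```python
import random


def split_dataset(rows, train_frac=0.6, test_frac=0.2, seed=0):
    """Randomly partition rows into training, test, and validation lists.

    Hypothetical helper: the fractions and seed are illustrative defaults.
    Whatever is left after the training and test portions becomes the
    validation set.
    """
    if train_frac < 0 or test_frac < 0 or train_frac + test_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")

    shuffled = list(rows)                   # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility

    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)

    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


rows = list(range(100))
train, test, validation = split_dataset(rows)
```

Each row ends up in exactly one subset, so the model is never assessed on data it was built from.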
Splitting methods
There are two distinct tasks for splitting datasets in Rulex:
Task name | Description | Corresponding page |
---|---|---|
Split Data | Splits the dataset randomly or sequentially. | |
Data Manager | Splits datasets according to specified criteria. | |
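To make the difference between a sequential split and a criteria-based split concrete, here is a minimal Python sketch. It is not the implementation of either Rulex task: the helpers `sequential_split` and `criteria_split`, the 80% cut point, and the `year` field are all hypothetical and chosen only for illustration.

```python
def sequential_split(rows, train_frac=0.8):
    """Keep the original order: the first rows train, the remaining rows test."""
    cut = round(len(rows) * train_frac)
    return rows[:cut], rows[cut:]


def criteria_split(rows, predicate):
    """Send rows that satisfy the predicate to one subset, the rest to the other."""
    selected = [r for r in rows if predicate(r)]
    rest = [r for r in rows if not predicate(r)]
    return selected, rest


# Illustrative records: ids 0-3 fall in 2020, 4-7 in 2021, 8-9 in 2022.
records = [{"id": i, "year": 2020 + i // 4} for i in range(10)]

# Sequential: the first 80% of rows, in order, form the training set.
seq_train, seq_test = sequential_split(records)

# Criteria-based: rows before 2022 train the model, later rows test it.
crit_train, crit_test = criteria_split(records, lambda r: r["year"] < 2022)
```

A sequential split suits ordered data such as time series, where shuffling would let the model see the future; a criteria-based split is useful when the subsets must follow a business rule rather than a proportion.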