Reshaping, Transforming and Cleaning Datasets
It is frequently necessary to perform operations on the structure or contents of datasets prior to creating a predictive model.
For example, some datasets may need to be reshaped before they are merged into a single table, attributes may need to be converted into more manageable data types, or the dataset may need to be cleaned by removing outliers or attributes that could cause confusion in the final model.
| Task | Description | Corresponding page | 
|---|---|---|
| Reshaping Tasks | ||
| Reshape To Long | Transforms attributes (columns) of a dataset into new rows. This operation is necessary when a table contains more than one key. | |
| Reshape To Wide | Transforms the values of key attributes in a dataset into new columns. This operation is necessary when a table contains more than one key. | |
| Transpose | Converts rows into columns and vice versa. | |
| Transforming Tasks | ||
| Discretize | Transforms continuous attributes into a finite set of intervals. | |
| Moving Window | Defines temporal windows of data of a specific size and shape. | |
| Cleaning Tasks | ||
| Fill/Clean | Removes attributes which could create confusion in the resulting predictive model. | |
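The reshaping tasks in the table can be illustrated outside the platform with pandas. The following is a minimal sketch, assuming a hypothetical sales table with one row per customer and one column per month; pandas' `melt`, `pivot` and `.T` play the roles of Reshape To Long, Reshape To Wide and Transpose here, and are not the tasks' own implementation.

```python
import pandas as pd

# Hypothetical wide table: one row per customer, one column per month.
wide = pd.DataFrame({
    "customer": ["A", "B"],
    "jan": [10, 20],
    "feb": [11, 21],
})

# Reshape to long: the month columns become rows keyed by (customer, month).
long = wide.melt(id_vars="customer", var_name="month", value_name="sales")

# Reshape to wide: the values of the "month" key become columns again.
back_to_wide = (
    long.pivot(index="customer", columns="month", values="sales")
    .reset_index()
)
back_to_wide.columns.name = None  # drop the leftover "month" axis label

# Transpose: rows become columns and columns become rows.
transposed = wide.set_index("customer").T
```

After `melt`, the two-row table becomes four rows (one per customer and month); `pivot` reverses this, and `transposed` has the months as rows and the customers as columns.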
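The Discretize and Moving Window tasks also have close analogues in pandas: `pd.cut` assigns each continuous value to one of a finite set of intervals, and `Series.rolling` computes a statistic over a sliding window of fixed size. In this sketch the daily temperature readings, the bin edges and the labels are hypothetical, and the window is a plain (unweighted) window of 3 samples.

```python
import pandas as pd

# Hypothetical daily temperature readings.
temps = pd.Series(
    [12.0, 15.5, 19.0, 23.5, 27.0, 30.5],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

# Discretize: map each value to an interval (0, 15], (15, 25] or (25, 35].
# The bin edges and labels are illustrative assumptions.
bands = pd.cut(temps, bins=[0, 15, 25, 35], labels=["cold", "mild", "hot"])

# Moving window: mean over each run of 3 consecutive samples.
# The first two positions have fewer than 3 samples, so they are NaN.
rolling_mean = temps.rolling(window=3).mean()
```

Other window shapes, such as triangular weights, can be requested through the `win_type` argument of `rolling`, which requires SciPy.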
Outliers
It is also important to identify and manage outliers, which are anomalous data samples that can have a negative impact on predictive models if they are not handled correctly.
Outlier management is not covered by a single task. For details, see Identifying and Managing Outliers.
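One common heuristic for identifying outliers is Tukey's interquartile range (IQR) rule, which flags values lying more than `k` times the IQR below the first quartile or above the third quartile. The sketch below applies it with pandas to hypothetical sensor readings; the helper `flag_outliers_iqr` and the default `k = 1.5` are illustrative choices, not necessarily the method described on the page referenced above.

```python
import pandas as pd


def flag_outliers_iqr(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Hypothetical readings with one anomalous sample (95).
readings = pd.Series([10, 11, 9, 10, 12, 11, 95])
mask = flag_outliers_iqr(readings)
cleaned = readings[~mask]  # here flagged samples are simply dropped
```

Dropping flagged samples is only one option; depending on the data and the model, they may instead be capped, replaced or kept and investigated.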