
Reshaping, Transforming and Cleaning Datasets

It is frequently necessary to perform operations on the structure or contents of datasets prior to creating a predictive model.

For example, it may be necessary to reshape some of the datasets before merging them into a single table, to transform attributes into more manageable data types, or to clean up the dataset by removing outliers or attributes that could cause confusion in the final model.

Task	Description	Corresponding page
Reshaping Tasks
Reshape To Long	Transforms key attributes in a dataset into new rows. This operation is necessary when a table contains more than one key.	Reshaping Datasets to Long Format
Reshape To Wide	Transforms key attributes in a dataset into new columns. This operation is necessary when a table contains more than one key.	Reshaping Datasets to Wide Format
Transpose	Converts rows into columns and vice versa.	Transposing Data
Transforming Tasks
Discretize	Transforms continuous attributes into a finite set of intervals.	Discretizing Data
Moving Window	Defines temporal windows of data of a specific size and shape.	Performing Moving Windows Statistics on Data
Cleaning Tasks
Fill/Clean	Removes attributes which could create confusion in the resulting predictive model.	Cleaning Datasets
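For readers who prefer to think in code, the following is a rough pandas analogue of the tasks in the table above. It is a minimal sketch assuming the pandas library; it does not describe how these tasks are implemented, and the data, column names and parameters (interval bounds, window size, the choice of mean imputation, and dropping constant attributes as an example of "confusing" attributes) are hypothetical.

```python
import pandas as pd

# Hypothetical data in wide format: one row per store, one column per month.
wide = pd.DataFrame({"store": ["A", "B"], "jan": [10, 20], "feb": [12, 18]})

# Reshape to long: the month columns become rows under a new "month" attribute.
long = wide.melt(id_vars="store", var_name="month", value_name="sales")

# Reshape to wide: the values of "month" become columns again.
back_to_wide = long.pivot(index="store", columns="month", values="sales").reset_index()

# Transpose: rows become columns and vice versa.
transposed = wide.set_index("store").T

# Discretize: map continuous sales values into a finite set of intervals,
# here (0, 15] -> "low" and (15, 25] -> "high".
bands = pd.cut(long["sales"], bins=[0, 15, 25], labels=["low", "high"])

# Moving window: mean over a sliding window of 3 consecutive samples.
# The first two positions have no full window, so they are NaN.
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
rolling_mean = series.rolling(window=3).mean()

# Fill/clean: fill missing values with the column mean, then drop
# attributes that carry no information (a single repeated value).
raw = pd.DataFrame({"x": [1.0, None, 3.0], "constant": [7, 7, 7]})
filled = raw.fillna({"x": raw["x"].mean()})
cleaned = filled.loc[:, filled.nunique() > 1]

print(long)
print(bands.tolist())         # ['low', 'high', 'low', 'high']
print(rolling_mean.tolist())  # [nan, nan, 2.0, 3.0, 4.0]
print(list(cleaned.columns))  # ['x']
```

Note that `melt` and `pivot` are inverses here only because each `(store, month)` pair is unique; with duplicate keys, `pivot` raises an error and an aggregating function such as `pivot_table` would be needed.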

Outliers

It is also important to correctly identify and manage outliers, which are anomalous data samples that can have a negative impact on predictive models if not handled correctly.

Outlier management is not limited to a single task. For details, see Identifying and Managing Outliers.
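As an illustration of one common way to flag outliers, the sketch below applies the interquartile range (IQR) rule, also known as Tukey's fences, using pandas. This is a swapped-in general technique, not necessarily the method described on the linked page; the function name `flag_outliers_iqr`, the multiplier `k = 1.5` and the sample data are hypothetical.

```python
import pandas as pd


def flag_outliers_iqr(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: mark values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # True where a value lies below the lower fence or above the upper fence.
    return (values < lower) | (values > upper)


measurements = pd.Series([10.0, 11.0, 12.0, 11.5, 10.5, 95.0])
mask = flag_outliers_iqr(measurements)

print(measurements[mask].tolist())  # [95.0]

# One possible way to manage the flagged samples: drop them.
without_outliers = measurements[~mask]
```

Dropping is only one option; depending on the model, flagged samples might instead be clipped to the fences or kept and marked with an extra attribute.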