Reshaping Datasets to Long Format
The reshape to long pivoting operation reshapes a dataset by transforming a set of similar attributes into a new pair of attributes, containing the name of the erased column and the value respectively.
Consequently the number of attributes in the dataset will be decreased and the number of rows will be increased accordingly.
Rulex reshaping operations are similar to creating pivot tables in Microsoft Excel, where you can create a new table with the structure you require by summarizing the data from the original table. However Rulex reshapes the original table without creating a second table, and the new table is structured automatically, increasing or decreasing the number of rows and columns, and repeating some rows to fit the new shape if necessary.
Prerequisites
you have created a process in Rulex
the required datasets have been imported into the process.
Additional tabs
The following additional tabs are provided:
Documentation tab where you can document your task,
Parametric options tab where you can configure process variables instead of fixed values. Parametric equivalents are expressed in italics in this page (PO).
Procedure
Drag and drop the Reshape To Long task onto the stage.
Connect a task that contains the attributes you want to transform to the Reshape To Long task.
Double click the Reshape To Long task. The left-hand pane displays a list of all the available attributes in the dataset, which can be ordered and searched as required. Select the attributes that will be displayed in the final table (othnames).
Configure the task options as described in the table below.
Save and compute the task.
Reshape to Long options | ||
Name | PO | Description |
---|---|---|
Number of contiguous attributes in a group | nwideatt | Specify into how many columns the values should be arranged. By default, the values of all the Attributes to be transformed in long format are inserted in a single column (with an extra column specifying which attribute the value refers to). |
Contiguous attributes in a group belong to the same final long attribute | seqwide | If selected, contiguous attributes will be used in the same final column, otherwise attributes are selected alternatively. |
Keep at least one row for each key | keepkeys | If selected, at least one row is displayed for each key, even if it contains, for example, only missing values. |
Attributes to be transformed in long format | widenames | Drag and drop the attributes that will become column values in the resulting data table. Instead of manually dragging and dropping attributes, they can be defined via filtered list. |
Example
The following examples are based on the Adult dataset.
Scenario data can be found in the Datasets folder in your Rulex installation.
Our aim is to reshape the dataset in order to have capital-gain and capital-loss attributes in separate records.
The following steps were performed:
First we import the adult dataset.
Use the Take a Look functionality to visualize the training set prior to transformation.
A Reshape To Long task is added to reshape the dataset.
Use the Take a Look functionality to visualize the training set after transformation.
Procedure | Screenshot |
---|---|
After importing the adult.set dataset with an Import from Text File task, right-click the task and select Take a look to visualize the imported data. The original dataset is made up of 32561 records, and each record may be associated to either a capital-gain or capital-loss value. We want to have separate records for capital-loss and capital-gain attributes. | |
Add a Reshape to Long task to the stage. In the attribute list ion the left, select all attributes you want to include in the final dataset. Attributes added to the transformation edit box on the right are automatically included in the final data table. Then drag and drop the capital-gain and capital-loss attributes onto the Attributes to be transformed to long format edit box. Now launch the computation on the task. | |
Right-click the task and select Take a look to visualize the resulting data. The new dataset in long format contains 65122 records. The columns related to the capital-gain and capital-loss attributes have been substituted with the following two columns:
|