Managing Attributes in Data Manager

The attributes in the dataset are displayed in the list on the left-hand side of the Data Manager. These attributes can be selected, searched and ordered directly in this list (see possible operations below).

To change their properties or work with their values, you need to switch to two other tabs:

Available operations in the Attributes list

Displayed Data

Here you can select the subset of data to display and to use for computing the statistics reported in the summary pane.

Possible values are:

  • Training set, which is used to build the model,

  • Validation set, which is used to tune model parameters,

  • Test set, which is used to assess the accuracy of the model,

  • All.

Data Info

This box displays the following information:

  • the number of patterns in the currently shown subset (training set, validation set, test set or total) and the corresponding percentage with respect to all the data,

  • the number of patterns currently shown in the main spreadsheet after custom queries have been performed, and the corresponding percentage with respect to the currently considered subset of data (training, validation, test or total),

  • the percentage of currently displayed patterns that are correctly classified (if a classification model has been applied), or

  • the mean square error and the normalized mean square error (if a regression model has been applied).
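The figures in the Data Info box are simple ratios and error measures. The sketch below shows one way they could be computed. It is illustrative only, not the Data Manager's implementation: the function names are hypothetical, and dividing by the variance of the actual values to obtain the normalized mean square error is an assumption, since the exact normalization is not stated here.

```python
from typing import Any, Sequence


def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole (0.0 when whole is empty)."""
    return 100.0 * part / whole if whole else 0.0


def accuracy_pct(y_true: Sequence[Any], y_pred: Sequence[Any]) -> float:
    """Percentage of displayed patterns whose predicted class matches the actual class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return percentage(correct, len(y_true))


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean square error over the displayed patterns."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def nmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Normalized mean square error.

    ASSUMPTION: the MSE is divided by the (population) variance of the
    actual values; the Data Manager may use a different normalization.
    """
    mean = sum(y_true) / len(y_true)
    variance = sum((t - mean) ** 2 for t in y_true) / len(y_true)
    return mse(y_true, y_pred) / variance
```

For example, if a dataset has 1,000 patterns and the training set contains 700 of them, `percentage(700, 1000)` gives 70.0; if a query then leaves 350 training patterns visible, `percentage(350, 700)` gives 50.0.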


Here you can add or remove attributes:

  • When you click the plus button, a dialog box is displayed where you can insert the number of attributes you want to add, along with a name, type and role for each new attribute.

  • To remove an attribute or group of attributes, select them (holding Ctrl and/or Shift to select a group) and click the minus button.

Attributes

This box shows the list of the attributes present in the dataset, along with their types.

Using drag and drop, you can change the position of attributes in the list and drag them into the Query, Plot and Statistic Managers.

Moving attributes

When you move attributes by drag and drop, the move is defined by the absolute final position of the moved attributes. If the same operation is repeated on a dataset with a different number of attributes, the result may therefore not be what you expected. The Move Attributes function, available by right-clicking an attribute (or group of attributes), instead moves the attribute(s) before or after a target attribute, independently of its absolute position.

You can also move an attribute to the first or last position in the attribute list.
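The difference between an absolute move and a relative move can be illustrated with a small list model. In this sketch, which is an assumption about how the operations behave and not the product's code, the attribute list is a Python list of names, the hypothetical `move_to_index` stands for a drag-and-drop move replayed at a fixed position, and the hypothetical `move_before`, `move_after`, `move_to_first` and `move_to_last` stand for the Move Attributes options.

```python
from typing import List


def _without(attrs: List[str], moved: List[str]) -> List[str]:
    """Return the attribute list with the moved attributes taken out."""
    return [a for a in attrs if a not in moved]


def move_to_index(attrs: List[str], moved: List[str], index: int) -> List[str]:
    """Absolute move: insert `moved` at a fixed position of the remaining list."""
    rest = _without(attrs, moved)
    return rest[:index] + moved + rest[index:]


def move_before(attrs: List[str], moved: List[str], target: str) -> List[str]:
    """Relative move: place `moved` immediately before `target`."""
    rest = _without(attrs, moved)
    i = rest.index(target)
    return rest[:i] + moved + rest[i:]


def move_after(attrs: List[str], moved: List[str], target: str) -> List[str]:
    """Relative move: place `moved` immediately after `target`."""
    rest = _without(attrs, moved)
    i = rest.index(target) + 1
    return rest[:i] + moved + rest[i:]


def move_to_first(attrs: List[str], moved: List[str]) -> List[str]:
    """Place `moved` at the top of the list."""
    return moved + _without(attrs, moved)


def move_to_last(attrs: List[str], moved: List[str]) -> List[str]:
    """Place `moved` at the bottom of the list."""
    return _without(attrs, moved) + moved


original = ["id", "age", "income", "target"]
other = ["region", "id", "age", "income", "target"]

# Recorded as "put target at position 1": fine on the original dataset...
print(move_to_index(original, ["target"], 1))  # ['id', 'target', 'age', 'income']
# ...but with one extra leading attribute, target lands before id.
print(move_to_index(other, ["target"], 1))     # ['region', 'target', 'id', 'age', 'income']
# The relative move keeps target right after id in both cases.
print(move_after(other, ["target"], "id"))     # ['region', 'id', 'target', 'age', 'income']
```

Replaying the fixed-position move on a dataset with an extra leading attribute puts `target` before `id`, whereas the relative move keeps it right after `id` however many attributes precede it.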

The checkbox to the left of each attribute name is for visualization purposes only. If you uncheck an attribute, it is hidden from the current main data pane but not removed from the dataset, even if you save and compute the Data Manager. You can check or uncheck a group of attributes by selecting them (using Ctrl and/or Shift) and right-clicking, as described below. This is useful when you have many columns and need to focus on a subset of them without removing the others. For more information on attributes and their types, see Datasets and Attributes.

Right-clicking any attribute allows you to:

  • check or clear the selection of a single item or group of items.

  • check or clear the selection of all items.

  • invert the selection.

  • decide whether ignored attributes should be shown.

  • show the column corresponding to the selected attribute in the main data pane. Double-clicking the attribute name produces the same result.

  • recompute the formulas related to a group of attributes.

  • change the data type of the attribute.

  • set the status of the attribute to Ignore, or reset it. Attributes flagged as Ignore are not considered in modeling operations.

  • add and delete attributes.

  • move attributes before/after a target attribute or to the top or bottom of the list.

  • split attributes according to a fixed string, a list of characters or a fixed length (nominal attributes only).

When you split a nominal attribute, several new columns are created and filled with substrings of the original string.

For example, suppose that the original string is "aa-bb;cc-dd".

Then if you split:

  • by fixed string (selecting "-"): you obtain three new columns containing "aa", "bb;cc" and "dd" respectively. A string composed of several characters can be defined, too.

  • by list of characters (selecting "-;"): you obtain four new columns containing "aa", "bb", "cc", "dd" respectively.

  • by fixed length (selecting 3): you obtain four new columns containing "aa-", "bb;", "cc-" and "dd" respectively.

The original attribute is removed after the split operation.
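The three split modes can be reproduced on the example string with standard Python string operations. This is a minimal sketch, not the Data Manager's implementation: the function names, the `name_1`, `name_2`, ... naming of the new columns and the use of `None` to pad rows that produce fewer pieces are assumptions, and the product may treat edge cases such as consecutive delimiters differently.

```python
import re
from typing import Any, Callable, Dict, List


def split_fixed_string(value: str, sep: str) -> List[str]:
    """Split on every occurrence of `sep` (one or more characters)."""
    return value.split(sep)


def split_char_list(value: str, chars: str) -> List[str]:
    """Split on any single character contained in `chars`."""
    return re.split("[" + re.escape(chars) + "]", value)


def split_fixed_length(value: str, length: int) -> List[str]:
    """Cut the string into consecutive chunks of `length` characters."""
    return [value[i:i + length] for i in range(0, len(value), length)]


def split_attribute(
    rows: List[Dict[str, Any]],
    name: str,
    splitter: Callable[[str], List[str]],
) -> List[Dict[str, Any]]:
    """Replace attribute `name` with new columns name_1, name_2, ... (hypothetical naming)."""
    pieces = [splitter(row[name]) for row in rows]
    width = max((len(p) for p in pieces), default=0)
    result = []
    for row, parts in zip(rows, pieces):
        # Copy every other attribute; the original attribute is dropped.
        new_row = {k: v for k, v in row.items() if k != name}
        for j in range(width):
            # Rows yielding fewer pieces are padded with None (assumption).
            new_row[f"{name}_{j + 1}"] = parts[j] if j < len(parts) else None
        result.append(new_row)
    return result
```

For instance, `split_fixed_string("aa-bb;cc-dd", "-")` returns `["aa", "bb;cc", "dd"]`, `split_char_list("aa-bb;cc-dd", "-;")` returns `["aa", "bb", "cc", "dd"]` and `split_fixed_length("aa-bb;cc-dd", 3)` returns `["aa-", "bb;", "cc-", "dd"]`, matching the three cases above.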

Search attribute

Here you can search for an attribute by typing a string to match.

Order attributes by

This drop-down menu allows you to sort attributes by:

  • Attribute (the original order of the dataset),

  • Name,

  • Type,

  • Ignored,

  • Role,

  • Number of values.
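Each sorting option amounts to sorting the attribute metadata by a different key. The sketch below assumes a hypothetical `Attribute` record holding the properties listed above; the ascending direction, the case-insensitive name comparison, non-ignored attributes coming before ignored ones, and the use of the original position to break ties are all assumptions rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Attribute:
    position: int    # position in the original dataset
    name: str
    data_type: str   # illustrative values, e.g. "nominal", "integer"
    ignored: bool
    role: str        # illustrative values, e.g. "input", "output"
    n_values: int    # number of distinct values


# One sort key per menu entry; ties fall back to the original position.
SORT_KEYS: Dict[str, Callable[[Attribute], Any]] = {
    "Attribute": lambda a: a.position,
    "Name": lambda a: (a.name.lower(), a.position),
    "Type": lambda a: (a.data_type, a.position),
    "Ignored": lambda a: (a.ignored, a.position),  # False sorts before True
    "Role": lambda a: (a.role, a.position),
    "Number of values": lambda a: (a.n_values, a.position),
}


def order_attributes(attrs: List[Attribute], criterion: str) -> List[Attribute]:
    """Return a new list sorted by the chosen criterion (ascending)."""
    return sorted(attrs, key=SORT_KEYS[criterion])
```

Under these assumptions, `order_attributes(attrs, "Number of values")` lists the attributes with the fewest distinct values first, and choosing "Attribute" restores the original dataset order.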