Overview of Data Exploration in the Data Manager

The Data Manager is a central task in Rulex, where many important data exploration operations are performed, such as:

  • understanding whether all the data you require for your model is already included in your data tables, or whether you need to enrich the data tables with additional attributes created through formulas.

  • aggregating multiple rows to condense information into fewer, more meaningful rows, for example by aggregating all the rows corresponding to a customer into a single row using the Group and Apply operations.

  • checking that data are clean and coherent, for example by verifying that attribute types are correctly defined. Note that an attribute may have been automatically assigned the wrong data type because one or more of its values were entered in an incorrect format. If you try to change the attribute to the correct type, Rulex will tell you which row contains the format error.

  • standardizing the way missing values are expressed (for example, missing values may be represented by the string "n/a" or by a question mark).

  • exploring data in the Statistics Manager and Plot Manager to check visually whether the data at hand are appropriate for solving your problem, and to detect and remove any abnormal data (i.e. outliers) that may distort the generated models. Outliers often contain valuable information about the process under investigation or about the data gathering and recording process. Before considering removing these points from the data, try to understand why they appeared and whether similar values are likely to continue to appear. Of course, outliers are often simply bad data points.
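In Rulex these operations are performed interactively, so no code is needed. Purely to illustrate what some of them do to the data, here is a minimal plain-Python sketch (Python 3.8+, standard library only). It is not Rulex's API: the sample rows, the field names `customer` and `amount`, the set of missing-value markers and all the helper functions are hypothetical, and outliers are flagged with Tukey's 1.5 × IQR rule, one common choice that may differ from what the Statistics Manager or Plot Manager show.

```python
import statistics

# Hypothetical sample data: two rows per customer, with amounts stored as text.
ROWS = [
    {"customer": "C1", "amount": "120.5"},
    {"customer": "C1", "amount": "n/a"},
    {"customer": "C2", "amount": "80"},
    {"customer": "C2", "amount": "?"},
    {"customer": "C3", "amount": "95"},
    {"customer": "C3", "amount": "1,000"},  # wrong format: thousands separator
    {"customer": "C4", "amount": "105"},
    {"customer": "C4", "amount": "5000"},   # suspiciously large value
]

# Assumed markers that all mean "missing" in this data set.
MISSING_MARKERS = {"", "n/a", "na", "?", "null"}


def standardize_missing(rows, field):
    """Replace every missing-value marker in `field` with None (in place)."""
    for row in rows:
        value = row[field]
        if value is not None and value.strip().lower() in MISSING_MARKERS:
            row[field] = None


def find_format_errors(rows, field):
    """Return (row index, value) for each non-missing value that is not a number."""
    errors = []
    for i, row in enumerate(rows):
        value = row[field]
        if value is None:
            continue
        try:
            float(value)
        except ValueError:
            errors.append((i, value))
    return errors


def group_and_apply(rows, key, field, func):
    """Condense all rows sharing the same `key` into one value computed by `func`."""
    groups = {}
    for row in rows:
        if row[field] is not None:
            groups.setdefault(row[key], []).append(row[field])
    return {k: func(values) for k, values in groups.items()}


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def explore(rows):
    rows = [dict(r) for r in rows]  # work on a copy; the input stays untouched
    standardize_missing(rows, "amount")
    errors = find_format_errors(rows, "amount")
    for i, _ in errors:
        rows[i]["amount"] = None    # treated as missing here; better to fix the source
    for row in rows:
        if row["amount"] is not None:
            row["amount"] = float(row["amount"])
    totals = group_and_apply(rows, "customer", "amount", sum)
    values = [r["amount"] for r in rows if r["amount"] is not None]
    return errors, totals, iqr_outliers(values)


if __name__ == "__main__":
    errors, totals, outliers = explore(ROWS)
    print("format errors (0-based row, value):", errors)
    print("total amount per customer:", totals)
    print("candidate outliers:", outliers)
```

Run as a script, it reports row 5 ("1,000") as a format error, totals of 120.5, 80.0, 95.0 and 5105.0 for C1 to C4, and 5000.0 as a candidate outlier. The outlier is only flagged, not removed: it is worth understanding why it appeared before deciding to delete it.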

The task can be used to experiment with data, checking in real time how the data change as operations are performed. Once you are satisfied with the results, these operations can be saved, computed and made available to other tasks. At this point the Data Manager processes the data according to the operations defined and listed in the History tab.

You can also export data directly from the Data Manager task, either to a file or as a new dataset.

Take a look

A similar, read-only version of the Data Manager view is displayed by right-clicking any task and selecting Take a look. You can view data, create plots, sort, filter, etc., but because the view is read-only you cannot save any of these changes or compute the task.