Finding and Replacing Values in Datasets
Sometimes it may be necessary to find and replace values that create a specific undesired outcome.
For example, imagine you have a dataset with information on parcels that need to be posted, and any parcels that weigh more than 4kg must be sent by courier. If this rule is not respected, an "overweight, cannot be sent" outcome is produced. When applied to a new dataset, the Rule Based Control task corrects these errors in the dataset, consequently modifying the way the parcel will be sent.
The task performs this operation according to the weights specified for each outcome class, and modifying the attributes that have been specified as modifiable according to selected correction methods, such as by calculating the overall mean or median or the mean or median for specified key attributes.
This task was previously called Find/Replace.
Prerequisites
the required datasets have been imported into the process
the ruleset and a dataset on which the rules can be applied must come before the Rule Based Control task in the Rulex process. The ruleset can also be imported from a different Rulex process.
Procedure
Drag and drop the Rule Based Control task onto the process.
Connect a task, which contains the ruleset and data you want to modify, to the new task.
Double click the Rule Based Control task. The left-hand pane displays a list of all the available attributes in the dataset, which can be ordered and searched as required.
Configure the options described in the table below.
Save and compute the task.
Rule Based Control options | |
Parameter Name | Description |
---|---|
Key attributes | Drag and drop here the attributes for which calculations will always be performed. Instead of manually dragging and dropping attributes, they can be defined via a filtered list. |
Attributes that cannot be modified | Drag and drop here any attributes that must not be changed. Instead of manually dragging and dropping attributes, they can be defined via a filtered list. The attributes specified here cannot be modified, even if they are added to the Attributes to be corrected list. Any attributes included in the ruleset can be modified, unless added here, even if they are added to the Attributes to be corrected list. These attributes include those NOT covered in the ruleset. |
Attributes to be corrected | Drag and drop here any attributes that can be modified. Any attributes included in the ruleset can be modified, unless included in the cannot be modified list. These attributes include those NOT covered in the ruleset. Instead of manually dragging and dropping attributes, they can be defined via a filtered list. |
Undesired output weight | Specify a weight for each outcome class. Assign a higher value to the outcome you want most to avoid. |
Correct values for ORDERED attributes with | Select how you want to correct ordered values that have caused an undesired outcome class. Calculations can be performed on all values, or values for specific attributes, if these attributes have been dragged and dropped into the Key attributes list. Possible values are: Mean, Median, Mode, Minimum, Maximum, Minimum change (ordered attributes only). Instead of manually dragging and dropping attributes, they can be defined via a filtered list. |
Correct values for TEMPORAL attributes with | Select how you want to correct temporal values that have caused an undesired outcome class. Calculations can be performed on all values, or values for specific attributes, if these attributes have been dragged and dropped into the Key attributes list. Possible values are: Mean, Median, Mode, Minimum, Maximum, Minimum change (ordered attributes only). Instead of manually dragging and dropping attributes, they can be defined via a filtered list. |
Append results | If selected, the results of this computation will be added to the dataset. Otherwise they replace the results of previous computations. |
Example
In this very basic example shipping data is corrected to ensure that parcels heavier than 4kg, that should be sent by courier, are not mistakenly shipped by mail, incurring an excess weight charge.
The shipping excel file contains the following data:
Scenario data can be found in the Datasets folder in your Rulex installation.
We have created the following Rulex process with:
An Import from Excel File (excel1) task with the excel data file above.
A Data Manager (dataman1) task to define the roles of the attributes, create new attributes and ignore irrelevant attributes.
A LLM (llm1) task to generate rules.
A Rule Manager (ruleman1) task to analyze the generated rules.
A Rule Based Control (fr1) task to substitute the attributes that lead to the undesired output class.
A final Data Manager (dataman2) task to check the results of the correction.
Description | Screenshot |
---|---|
Import the shipping.xlsx file via an Import from Text File task. | |
Add a Data Manager task to the process, and configure it as follows:
$"Overweight" = $"Weight in kg">4
$"Excess Weight" = ifelse($"Overweight"=="True" &$"Shipping mode"=="post", "incorrect", "correct") | |
Add a Logic Learning Machine task, and define the Excess Weight attribute as the output by dragging and dropping it onto Output attributes. | |
Add a Rule Manager task to the process to view the generated rules. Three rules have been generated. | |
Add a Rule Based Control task to the process, and configure it as follows:
| |
To view the changes made by the Rule Based Control task, right-click the task in the stage and select Take a look. The incorrect shipping mode for a 7kg parcel has correctly been changed from post to courier, using Rule 3. |