Using Similar Items Detector to Solve Association Problems
Rulex generates description-based and sales-based replacement rules with the Similar Items Detector task.
This task uses description-based matching, which also works for newly introduced items and so helps solve cold-start problems.
Prerequisites
The required datasets have been imported into the process.
The data used for the model has been well prepared.
A Frequent Itemsets Mining task is present in the process and provides input data for the Similar Items Detector.
Additional tabs
The following additional tabs are provided:
Documentation tab where you can document your task,
Parametric options tab where you can configure process variables instead of fixed values. Parametric equivalents are expressed in italics in this page.
Replacement rules & Results tabs, where you can see the output of the task computation. See Results table below.
Procedure
Drag and drop the Similar Items Detector task onto the stage.
Connect a task that contains frequent itemsets to the new task.
Double-click the Similar Items Detector task. The left-hand pane displays a list of all the available attributes in the dataset, which can be ordered and searched as required.
To generate description-based replacement rules, click on the Text based matching tab and configure the options as described in the table below.
To generate sales-based replacement rules, click on the Sales based matching tab and configure the options as described in the table below.
Save and compute the task.
Similar Items Detector options
Name | Parametric options | Description |
---|---|---|
Text based matching options | | |
Category attribute | popcatname | Select the attribute that represents the category from the drop-down list. This can be used to match only descriptions that belong to the same category. |
Description attribute | popdescname | Select the attribute that represents the description from the drop-down list, which will be used for text matching. |
Word separator | popwordsep | Select how words are separated from the available options. |
Minimum word length | popminwordlen | Words that are shorter than the value entered here will not be used for text matching. This helps to eliminate words such as the, a, one, at etc. |
Minimum unadjusted similarity cosine | popsimcosth | The minimum similarity of pure text matching, without considering Preferential requirements attributes. Entering 1 means the text must be identical, 0 corresponds to no match required. |
Case sensitive matching | popcasesens | If selected, the upper or lower case will be taken into consideration when matching text. |
Item key attributes | popdescname | Drag and drop the nominal attributes that uniquely identify the item from the Attributes list. Instead of manually dragging and dropping attributes, they can be defined via a filtered list. |
Preferential requirements attributes | mbaitemchildnames | Drag and drop the attributes which will influence the similarity score when they match. When they match, a weight is added to the similarity score. This weight is defined in the Preferential requirements weights. These attributes could, for example, define brand, packaging or size. Instead of manually dragging and dropping attributes, they can be defined via a filtered list. |
Ignored char list | popignoredchars | Select the characters you want to eliminate from text matching. |
Preferential requirements weights | | The weight awarded to matching Preferential requirements attributes. |
Sales based matching options | | |
Takes also sales data into account | popusetransactions | Select this option to include sales data in the task execution. |
Minimum alternativeness coefficient | popminalternativeness | The minimum degree of alternativeness required between the purchases of two items. If a pair of items does not reach the Minimum alternativeness coefficient, the corresponding replacement rule is discarded. |
Minimum volume replacement score | poprepcoeffth | The minimum percentage of orders in which a replaced item is expected to be replaceable by the replacing item. If this minimum threshold is not satisfied by a replacement rule, it is discarded. |
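To give a feel for how the text based matching options interact, the following is a minimal Python sketch of word-level cosine similarity between two descriptions. It is an illustration only and not Rulex's actual implementation: the function names (`tokenize`, `cosine_similarity`, `adjusted_score`), their parameters, and the simple additive treatment of the preferential weights are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(description, separator=" ", min_word_len=3,
             ignored_chars="", case_sensitive=False):
    """Split a description into the words used for matching (hypothetical helper)."""
    # Remove ignored characters (compare: Ignored char list).
    for ch in ignored_chars:
        description = description.replace(ch, "")
    # Fold case unless case sensitive matching is requested.
    if not case_sensitive:
        description = description.lower()
    # Split on the word separator and drop empty or too-short words.
    words = description.split(separator)
    return [w for w in words if w and len(w) >= min_word_len]


def cosine_similarity(words_a, words_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    sq_a = sum(v * v for v in a.values())
    sq_b = sum(v * v for v in b.values())
    norm = math.sqrt(sq_a * sq_b)
    # 1.0 means identical word counts, 0.0 means no shared words.
    return dot / norm if norm else 0.0


def adjusted_score(base, item_a, item_b, weights):
    """Add a weight for each preferential attribute that matches.

    Assumption: weights are simply summed onto the base similarity.
    """
    bonus = sum(w for attr, w in weights.items()
                if item_a.get(attr) is not None
                and item_a.get(attr) == item_b.get(attr))
    return base + bonus


# Hypothetical descriptions and attributes, for illustration only.
base = cosine_similarity(tokenize("red apple juice"),
                         tokenize("green apple juice"))
score = adjusted_score(base, {"brand": "Acme"}, {"brand": "Acme"},
                       {"brand": 0.25})
```

In this sketch, a pair would be kept only if `base` reaches the Minimum unadjusted similarity cosine. The preferential weights affect `score` but not that threshold check.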
Results
The results of the task are displayed in two separate tabs:
The Replacement rules tab displays the generated item sets, where:
Rule Replacement ID: the sequential ID number for replacement rules.
Category:
Replaced item ID: IDs of replaced items
Replacing item ID: IDs of replacing items
Similarity score:
The Results tab displays details on the execution of the analysis, where:
Task Identifier: the ID code for the task, internally used by the Rulex engine.
Task Name: simply the name of the task.
Elapsed time (sec): the time required for the latest computation (in seconds).
Number of generated replacement rules: the number of replacement rules generated by the task.