Using Similar Items Detector to Solve Association Problems

Rulex generates description-based and sales-based replacement rules with the Similar Items Detector task.

Description-based matching does not depend on sales history, so it can also be used with newly introduced items and helps solve cold-start problems.


Prerequisites

Additional tabs

The following additional tabs are provided:

  • Documentation tab, where you can document your task.

  • Parametric options tab, where you can configure process variables instead of fixed values. Parametric equivalents are shown in italics on this page.

  • Replacement rules and Results tabs, where you can see the output of the task computation. See Results below.


Procedure

  1. Drag and drop the Similar Items Detector task onto the stage.

  2. Connect a task that contains frequent itemsets to the new task.

  3. Double-click the Similar Items Detector task. The left-hand pane displays a list of all the attributes available in the dataset, which can be sorted and searched as required.

  4. To generate description-based replacement rules, click the Text based matching tab and configure the options as described in the table below.

  5. To generate sales-based replacement rules, click the Sales based matching tab and configure the options as described in the table below.

  6. Save and compute the task.

Similar Items Detector options

Name

Parametric options

Description

Text based matching options

Category attribute

popcatname

Select the attribute that represents the category from the drop-down list. This can be used to match only descriptions that belong to the same category.

Description attribute

popdescname

From the drop-down list, select the attribute containing the item descriptions to be used for text matching.

Word separator

popwordsep

Select how words are separated from one of the following possibilities:

  • Space

  • Tab

  • Newline

Minimum word length 

popminwordlen

Words shorter than the value entered here are not used for text matching. This helps eliminate short words such as "the", "a", "one" and "at".

Minimum unadjusted similarity cosine

popsimcosth

The minimum cosine similarity between two descriptions based on text alone, that is, without the weights added for matching Preferential requirements attributes.

A value of 1 means the texts must be identical, while 0 means no text similarity is required.

Case sensitive matching

popcasesens

If selected, matching distinguishes between upper and lower case, so "Milk" and "milk" are treated as different words.

Item key attributes

popdescname

From the Attributes list, drag and drop the nominal attributes that uniquely identify each item. Instead of manually dragging and dropping attributes, you can also define them via a filtered list.

Preferential requirements attributes

mbaitemchildnames

Drag and drop the attributes that influence the similarity score: when two items have the same value for one of these attributes, a weight is added to their similarity score. This weight is defined in the Preferential requirements weights option.

These attributes could, for example, define brand, packaging or size. 

Instead of manually dragging and dropping attributes, they can be defined via a filtered list.

Ignored char list

popignoredchars

Select the characters to be removed from the text before matching.

Preferential requirements weights


The weight added to the similarity score when a Preferential requirements attribute matches.

Sales based matching options

Takes also sales data into account

popusetransactions

Select this option to include sales data in the task execution.

Minimum alternativeness coefficient

popminalternativeness

The degree of alternativeness between the purchases of two items:

  • 1 (max) if they are never sold together

  • 0 (min) if one item is always sold with the other.

If a pair of items does not reach the Minimum alternativeness coefficient, the corresponding replacement rule is discarded.

Minimum volume replacement score

poprepcoeffth

The minimum percentage of orders in which the replaced item is expected to be replaceable by the replacing item. Replacement rules that do not satisfy this threshold are discarded.
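The text-based matching options above can be pictured with the following minimal Python sketch. It is not Rulex's implementation: it assumes that descriptions are compared as bags of words using cosine similarity, that the Minimum unadjusted similarity cosine is checked before any preferential weight is added, and that each matching preferential attribute adds its own weight. All function names, parameter names and example items are hypothetical.

```python
import math
from collections import Counter

# Hypothetical mapping of the Word separator choices to characters.
SEPARATORS = {"Space": " ", "Tab": "\t", "Newline": "\n"}


def tokenize(text, separator="Space", min_word_length=3,
             ignored_chars="", case_sensitive=False):
    """Split a description into the words used for matching."""
    # Drop every character listed in the ignored char list.
    for ch in ignored_chars:
        text = text.replace(ch, "")
    # Without case-sensitive matching, "Milk" and "milk" become the same word.
    if not case_sensitive:
        text = text.lower()
    words = text.split(SEPARATORS[separator])
    # Discard empty strings and words shorter than the minimum word length.
    return [w for w in words if w and len(w) >= min_word_length]


def cosine(words_a, words_b):
    """Cosine similarity between two bags of words (word counts)."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values()) * sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def similarity(item_a, item_b, min_cosine, pref_weights,
               category_attr=None, description_attr="description", **tok_opts):
    """Return the adjusted similarity score, or None if the pair is rejected.

    Assumption: the threshold applies to the unadjusted cosine, and each
    matching preferential attribute then adds its own weight (no capping).
    """
    # Optionally compare only items that belong to the same category.
    if category_attr and item_a[category_attr] != item_b[category_attr]:
        return None
    base = cosine(tokenize(item_a[description_attr], **tok_opts),
                  tokenize(item_b[description_attr], **tok_opts))
    if base < min_cosine:
        return None
    bonus = sum(w for attr, w in pref_weights.items()
                if item_a[attr] == item_b[attr])
    return base + bonus


if __name__ == "__main__":
    a = {"category": "dairy", "description": "Whole milk 1L bottle", "brand": "Acme"}
    b = {"category": "dairy", "description": "Skimmed milk 1L bottle", "brand": "Acme"}
    score = similarity(a, b, min_cosine=0.5, pref_weights={"brand": 0.1},
                       category_attr="category")
    print(round(score, 3))  # prints 0.767
```

In this example "1L" is dropped because it is shorter than three characters, so each description keeps three words, two of which are shared, giving an unadjusted cosine of 2/3; the matching brand then adds 0.1. Note that this sketch lets the adjusted score exceed 1; whether Rulex normalizes or caps the score is not stated here.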

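The sales-based thresholds can likewise be sketched in Python. The documentation only fixes the endpoints of the alternativeness coefficient (1 when two items are never sold together, 0 when one is always sold with the other), so the formula below is a hypothetical one chosen to satisfy those endpoints, not necessarily the one Rulex uses. The volume replacement score is taken as a precomputed fraction between 0 and 1 (rather than a percentage), because its exact definition is not given; the function names and the rule dictionary keys are also hypothetical.

```python
def alternativeness(orders, item_a, item_b):
    """Hypothetical alternativeness coefficient between two items.

    Assumption: 1 - (orders containing both) / (orders containing the less
    frequent item). This is 1 when the items are never sold together and 0
    when one item is always sold with the other; Rulex's formula may differ.
    """
    with_a = sum(1 for order in orders if item_a in order)
    with_b = sum(1 for order in orders if item_b in order)
    together = sum(1 for order in orders if item_a in order and item_b in order)
    smaller = min(with_a, with_b)
    if smaller == 0:
        # One of the items never appears: treat them as never sold together.
        return 1.0
    return 1.0 - together / smaller


def keep_rule(rule, orders, min_alternativeness, min_volume_score):
    """Keep a replacement rule only if it meets both sales-based thresholds.

    The volume replacement score is assumed to be precomputed (in [0, 1]).
    """
    alt = alternativeness(orders, rule["replaced"], rule["replacing"])
    return alt >= min_alternativeness and rule["volume_score"] >= min_volume_score


if __name__ == "__main__":
    # Each order is the set of items bought together.
    orders = [{"whole_milk", "bread"}, {"skim_milk"}, {"whole_milk"},
              {"skim_milk", "bread"}, {"bread"}]
    rule = {"replaced": "whole_milk", "replacing": "skim_milk", "volume_score": 0.4}
    print(alternativeness(orders, "whole_milk", "skim_milk"))  # 1.0
    print(alternativeness(orders, "whole_milk", "bread"))      # 0.5
    print(keep_rule(rule, orders, 0.8, 0.3))                   # True
```

Whole and skimmed milk never appear in the same order, so they score as perfect alternatives, whereas whole milk and bread share one of the two orders containing whole milk and score 0.5, which would fail a 0.8 threshold.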
Results

The results of the task are displayed in two separate tabs:

  • The Replacement rules tab displays the generated replacement rules, where:

    • Rule Replacement ID: the sequential ID number for replacement rules.

    • Category

    • Replaced item ID: IDs of replaced items

    • Replacing item ID: IDs of replacing items  

    • Similarity score

  • The Results tab displays details on the execution of the analysis, where:

    • Task Identifier: the ID code for the task, internally used by the Rulex engine.

    • Task Name: the name of the task.

    • Elapsed time (sec): the time required for the latest computation (in seconds).

    • Number of generated replacement rules: the number of replacement rules generated by the task.