Using LLM to Solve Classification Problems

Rulex solves classification problems with the Classification Logic Learning Machine task (LLM).

This task determines which class or category each record in a dataset belongs to, based on its input attributes, by generating intelligible, predictive logic-based rules. For example, it can predict whether or not a customer will renew their subscription.


Prerequisites

Additional tabs

Along with the Options tab, where the task can be configured, the following additional tabs are provided:

  • Documentation tab, where you can document your task.

  • Parametric options tab, where you can configure process variables instead of fixed values. Parametric equivalents are expressed in italics on this page.

  • Monitor and Results tabs, where you can see the output of the task computation. See the Results section below.


Procedure

  1. Drag and drop the Logic Learning Machine task onto the stage.

  2. Connect a task, which contains the attributes from which you want to create the model, to the new task.

  3. Double click the Logic Learning Machine task. The left-hand pane displays a list of all the available attributes in the dataset, which can be ordered and searched as required.

  4. Configure the options described in the table below.

  5. Save and compute the task.

Classification LLM options

Parameter Name

PO

Description

Aggregate data before processing

aggregate

If selected, identical patterns are aggregated and considered as a single pattern during the training phase.

Minimize number of conditions

minimal

If selected, rules with fewer conditions, but the same covering, are preferred.

Perform a coarse-grained training

lowest

If selected, the LLM training algorithm considers the conditions with the subset of values that maximizes covering for each input attribute. Otherwise, only one value at a time is added to each condition, thus performing a more extensive search. The coarse-grained training option has the advantage of being faster than performing an extensive search.

Prevent interval conditions for ordered attributes

nointerval

If selected, interval conditions, such as 1<x≤5, are avoided, and only conditions with > (greater than) or ≤ (less than or equal to) are generated.

Ignore attributes not present in rules

reducenames

If selected, attributes that have not been included in rules will be flagged Ignore at the end of the training process, to reflect their redundancy in the classification problem at hand.

Hold all the generated rules

holdrules

If selected, even redundant generated rules, which are verified only by training samples that are already covered by other more powerful rules, are kept.

Ignore outliers while building rules

coveroutlier

If selected, the set of remaining patterns not covered by the generated rules is ignored if its size is less than the threshold defined in the Maximum error allowed for each rule (%) option.

Consider relative error instead of absolute

relerrmax

Specify whether the relative or absolute error must be considered.

The Maximum error allowed for each rule is set by considering the proportions of samples belonging to different classes. Imagine a scenario where, for a given rule pertaining to the specific output value yo:

  • TP is the number of true positives (samples with the output value yo that verify the conditions of the rule).

  • TN is the number of true negatives (samples with output values different from yo that do not verify the conditions of the rule).

  • FP is the number of false positives (samples with output values different from yo that do verify the conditions of the rule).

  • FN is the number of false negatives (samples with the output values yo that do not verify the conditions of the rule).

In this scenario the absolute error of that rule is FP/(TN+FP), whereas the relative error is FP/Min(TP+FN, TN+FP).

Allow rules with no conditions

allowzerocond

If selected, rules with no conditions can also be generated. This may be useful, for example, if there are no examples for a specific class, as at least one rule is consequently created.

Missing values verify any rule condition

missrelax

If selected, missing values will be assumed to satisfy any condition. If there is a high number of missing values, this choice can have an important impact on the outcome.

Maximum number of trials in bottom-up mode

nbuiter

The number of times a bottom-up procedure can be repeated, after which a top-down procedure will be adopted. 

The bottom-up procedure starts by analyzing all possible cases, defining conditions and reducing the extension of the rules. If, at the end of this procedure, the error is higher than the value entered for the Maximum error allowed for each rule (%) option, the procedure starts again, inserting an increased penalty on the error. If the maximum number of trials is reached without obtaining a satisfactory rule, the procedure is switched to a top-down approach.

Maximum error allowed for each rule (%)

errmax

Set the maximum error (as a percentage) that a rule can score. The absolute or relative error is considered according to whether the Consider relative error instead of absolute option is checked or not.

Number of rules for each class (0 means 'automatic')

numrules

The number of rules for each class. If set to 0, the minimum number of rules required to cover all patterns in the training set is generated.

Maximum number of conditions for a rule

maxant

Set the maximum number of conditions in a rule.

Overlap between rules (%)

maxoverlap

Set the maximum percentage of patterns that can be shared by two rules.

Maximum nominal values

maxnomval

Set the maximum number of nominal values that can be contained in a condition. This is useful for simplifying conditions and making them more manageable, for example when an attribute has a very high number of possible nominal values. It is worth noting that overly complicated conditions also run the risk of over-fitting, where rules are too specific to the training data and not generic enough to be accurate on new data.

Differentiate multiple rules by attributes

onlyatt

If selected, when multiple rules are generated, rules which contain the same attributes in their conditions are penalized.

Allow to use complements in conditions on nominal

alsocompl

If selected, conditions on nominal attributes can be expressed as complements.

Build rules for only/all but the first/last output value

buildsel, allonly, firstlast

If selected, you create rules only for the classes you specify in this option, in combination with the options only/all and first/last.

Change roles for input and output attributes

keeproles

If selected, input and output roles can be defined in the LLM task, overwriting the roles defined in any previous Data Manager task in the process.

Prevent rules in input from being included in the LLM model

avoidrules

If selected, rules fed into the LLM task are not included in the final ruleset.

Minimum rule distance for additional rules

minruledist

The minimum difference between additional rules, taken into consideration if the Prevent rules in input from being included in the LLM model option has been selected.

Initialize random generator with seed

initrandom, iseed

If selected, a seed, which defines the starting point in the sequence, is used during random generation operations. Consequently, using the same seed each time makes each execution reproducible. Otherwise, each execution of the same task (with the same options) may produce different results, because different random numbers are generated in some phases of the process.

Append results

append

If selected, the results of this computation are appended to the dataset, otherwise they replace the results of previous computations.

Input attributes

inpnames

Drag and drop here the input attributes you want to use to form the rules leading to the correct classification of data. Instead of manually dragging and dropping attributes, they can be defined via a filtered list.

Output attributes

outnames

Drag and drop here the attributes you want to use to form the final classes into which the dataset will be divided. Instead of manually dragging and dropping attributes, they can be defined via a filtered list.

Key attributes

keynames

Drag and drop here the key attributes. Key attributes are the attributes that must always be taken into consideration in rules, and every rule must always contain a condition for each of the key attributes.

Instead of manually dragging and dropping attributes, they can be defined via a filtered list.
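
The absolute and relative error described for the Consider relative error instead of absolute option can be illustrated with a short Python sketch. It is not Rulex code: the function names and the sample counts are hypothetical, chosen only to show how the two measures diverge when the classes are unbalanced, and the sketch does not handle empty classes (which would cause a division by zero).

```python
def absolute_error(tn: int, fp: int) -> float:
    """FP / (TN + FP): false positives over all samples of the other classes."""
    return fp / (tn + fp)


def relative_error(tp: int, tn: int, fp: int, fn: int) -> float:
    """FP / Min(TP + FN, TN + FP): false positives over the smaller group."""
    return fp / min(tp + fn, tn + fp)


# Hypothetical counts for one rule predicting the output value yo:
# 100 samples have the output value yo, 900 have a different output value.
tp, fn = 80, 20
tn, fp = 880, 20

abs_err = absolute_error(tn, fp)           # 20 / 900
rel_err = relative_error(tp, tn, fp, fn)   # 20 / min(100, 900)

print(f"absolute error: {abs_err:.2%}")
print(f"relative error: {rel_err:.2%}")
```

With these counts the absolute error is about 2.2%, whereas the relative error is 20%, because the class yo is much smaller than the remaining classes.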

Results

The results of the LLM task can be viewed in two separate tabs:

  • The Monitor tab, where it is possible to view the statistics related to the generated rules as a set of histograms, such as the number of conditions, covering value, or error value. Rules relative to different classes are displayed as bars of a specific color. These plots can be viewed during and after computation operations. 

  • The Results tab, where statistics on the LLM computation are displayed, such as the execution time, number of rules, average covering etc.

Example

The following examples are based on the Adult dataset.

Scenario data can be found in the Datasets folder in your Rulex installation.

The scenario aims to solve a simple classification problem based on income ranges.

The following steps were performed:

  1. Import the adult dataset with an Import from Text File task.

  2. Split the dataset into a test and training set with a Split Data task.

  3. Generate rules from the dataset with the Classification LLM task.

  4. Analyze the generated rules with a Rule Manager task.

  5. Apply the rules to the dataset with an Apply Model task.

  6. View the results of the forecast via the Take a look function.

Procedure

After importing the adult dataset with the Import from Text File task and splitting the dataset into test and training sets (30% test and 70% training) with the Split Data task, add a Classification LLM to the process and define Income as the output attribute.
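
As a rough illustration of a 70%/30% split, the following Python sketch shuffles the rows and reserves 30% of them for testing. It shows one common way to perform such a split and is not a description of how the Split Data task works internally; split_train_test and its parameters are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple


def split_train_test(
    rows: Sequence, test_fraction: float = 0.3, seed: int = 0
) -> Tuple[List, List]:
    """Shuffle the rows and return (training rows, test rows)."""
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Ten placeholder row identifiers stand in for the dataset rows.
train, test = split_train_test(range(10))
print(len(train), len(test))
```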

Click Compute process to start the analysis:

The distribution and properties of the generated rules can be viewed in the Monitor tab of the LLM task.

The conditions sub-tab shows how many conditions rules contain, and how many rules have been generated overall.

In our example, 57 rules have been generated. The histogram shows how many rules there are for each possible number of conditions, from a minimum of 1 to a maximum of 12.


Understanding results

There are a total of 7 rules with only 1 condition, of which:

  • 2 rules with 1 condition are for class “<=50K”

  • 5 rules with 1 condition are for class “>50K”, and so on.

Analogous histograms can be viewed for covering and error, by clicking on the corresponding sub-tabs.
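
The counts behind such a histogram can be sketched in Python as follows. This is only an illustration: the list holds just the seven single-condition rules mentioned above, written as hypothetical (number of conditions, class) pairs, and does not reproduce the full rule set.

```python
from collections import Counter

# Hypothetical representation: one (number_of_conditions, class) pair per rule.
# Only the seven single-condition rules from the example are listed here.
rules = [(1, "<=50K")] * 2 + [(1, ">50K")] * 5

# One bar segment per (number of conditions, class) combination.
per_bar = Counter(rules)

# Total height of each bar, regardless of class.
per_conditions = Counter(n_conditions for n_conditions, _ in rules)

print(per_bar[(1, "<=50K")], per_bar[(1, ">50K")], per_conditions[1])
```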


Clicking on the Results tab displays a spreadsheet with:

  • the execution time (only for the LLM task),

  • some input data properties, such as the number of samples and attributes,

  • some results of the computation, such as the number of rules and rule statistics.

The rule spreadsheet can be viewed by adding a Rule Manager task.

For example, rule 2 states that if age is less than or equal to 29, capital-gain is less than or equal to 5119.000 and capital-loss is less than or equal to 1448.000, then income is <=50K.

The maximum covering value of rule 2 is greater than 36%, whereas the error is around 4.8%.

In contrast, rule 38 asserts that if workclass is Federal-gov or Self-emp-inc then income is >50K.
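
To make the if-then structure of these rules concrete, the two rules quoted above can be written as plain Python predicates. This is only a sketch: the record layout, the function names and the sample record are assumptions made for illustration, and the sketch does not reproduce how Rulex stores or applies rules.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def rule_2(r: Record) -> bool:
    """Conditions of rule 2, which concludes that income is <=50K."""
    return r["age"] <= 29 and r["capital-gain"] <= 5119 and r["capital-loss"] <= 1448


def rule_38(r: Record) -> bool:
    """Conditions of rule 38, which concludes that income is >50K."""
    return r["workclass"] in {"Federal-gov", "Self-emp-inc"}


# Each rule is paired with the class it predicts when its conditions hold.
RULES: List[Tuple[Callable[[Record], bool], str]] = [
    (rule_2, "<=50K"),
    (rule_38, ">50K"),
]


def matching_classes(record: Record) -> List[str]:
    """Return the classes predicted by every rule the record verifies."""
    return [label for condition, label in RULES if condition(record)]


# Hypothetical record, not taken from the Adult dataset.
record = {"age": 25, "capital-gain": 0, "capital-loss": 0, "workclass": "Private"}
print(matching_classes(record))  # ['<=50K']
```

A record can verify several rules at once, possibly for different classes, which is why the sketch returns a list rather than a single class.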

The forecast ability of the set of generated rules can be viewed by adding an Apply Model task to the LLM task, and computing with default options.

If required, here we could apply weights to the execution, for example if we were more interested in identifying one of the two classes.

Now right-click the Apply Model task, and select Take a look to view the results.

The application of the rules generated by LLM has added five columns containing:

  • the forecast for each pattern: pred(income)

  • the confidence relative to this forecast: conf(income)

  • the number of rules used to make the prediction: rule(income)

  • the number of the most important rule that determined the prediction: nrule(income)

  • the forecast error, i.e. 1.000 in case of misclassification and 0.000 in case of correct forecast: err(income).

The content of the parentheses is the name of the variable the prediction refers to.

Misclassified and correctly classified

Correctly classified patterns are highlighted in green in the pred column and identified by the number 0.000 in the err column.

Incorrectly classified patterns are highlighted in red in the pred column and identified by the number 1.000 in the err column.

From the summary panel on the left we can see that 81.174% of patterns have been correctly classified in the training set. Note that LLM does not reach 100% because a certain number of errors were allowed in the training phase.
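
The relationship between the err column and the percentage of correctly classified patterns can be sketched in Python as follows. The function names and the four sample values are hypothetical and serve only to show the arithmetic; they are not taken from the Adult dataset.

```python
from typing import List, Sequence


def error_column(actual: Sequence[str], predicted: Sequence[str]) -> List[float]:
    """Return 1.0 for each misclassified pattern and 0.0 for each correct one."""
    return [0.0 if a == p else 1.0 for a, p in zip(actual, predicted)]


def accuracy_percent(err: Sequence[float]) -> float:
    """Percentage of patterns whose err value is 0.0."""
    return 100.0 * (1.0 - sum(err) / len(err))


# Hypothetical actual and predicted income values for four patterns.
actual = ["<=50K", ">50K", "<=50K", "<=50K"]
predicted = ["<=50K", ">50K", ">50K", "<=50K"]

err = error_column(actual, predicted)
print(err)                    # [0.0, 0.0, 1.0, 0.0]
print(accuracy_percent(err))  # 75.0
```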


Selecting Test Set from the Displayed data drop down list shows how the rules behave on new data.

In the test set, the accuracy is about 80.7%.

Post-processing model optimization can improve test set accuracy, potentially at the expense of a slightly higher error level on the training set.