How is the Manual Structured?

The manual follows the order in which you typically work in Rulex, but it can also be used as a reference to look up specific details.

When you build a process in Rulex, there are some common steps you need to perform, regardless of the problem you are trying to solve.

  1. Import data: the essential starting point for all Rulex processes, in which you create the process and import the data you want to analyze from various data sources.

  2. Explore and structure data: during this data exploration phase you can reshape, merge, filter, sort and plot data to correctly structure your input data.

  3. Solve your problem: find a solution to your specific situation, which may involve:

    • Optimization problems (such as balancing workload distribution, or minimizing costs)

    • Supervised problems (such as classification and regression)

    • Unsupervised problems (such as clustering and association)

  4. Work with the results: apply the results to data, analyze behavior and outcome, and make any necessary tweaks.

  5. Manage and execute your process: you can create work schedules, version your process, or make changes to how it is executed.

Each phase is described in a separate section, along with its corresponding Rulex tasks and an overall introduction to its main concepts.


Sample datasets used in the manual

The examples in this guide are based on sample datasets, which you can download to try out the scenarios.

The following datasets are provided in the Datasets folder of your Rulex installation directory (i.e. ~\Rulex).

Dataset


Adult

The Adult dataset is a well-known machine learning benchmark for predicting whether or not a US citizen earns over 50K dollars a year.

A more detailed description of the data can be found on the UCI Machine Learning Repository homepage.

When importing the Adult dataset, to improve task performance, set the Get types from line option to 2.

Northwind

The Northwind database includes several tables regarding sales data for a fictitious company called Northwind Traders, which imports and exports specialty foods from around the world.

In our sample scenarios we use the datasets regarding customers, products and orders.

Further information on the data and licenses can be found here: http://northwinddatabase.codeplex.com/license

Groceries

The simplified groceries Excel file is used by the Frequent Itemsets Mining and Hierarchical Basket Analysis tasks to create a typical shopping behavior scenario.

hba

The hba files contain replacement rules extracted from imported datasets. The first contains order transaction data (hba_test), and the second contains prices and production costs (hba_test_prices_cost).

These files are used by the Assortment Optimizer task.

Shipping

This simplified Excel file is used by the Find and Replace task to create a very basic example: shipping data is corrected so that parcels heavier than 4 kg, which should be sent by courier, are not mistakenly shipped by mail and charged for excess weight.
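
To make the correction rule concrete, here is a minimal sketch in plain Python, outside Rulex. It is not how the Find and Replace task is implemented. The field names (weight_kg, method), the sample records and the helper correct_shipping_method are hypothetical and do not come from the Shipping file.

```python
# Hypothetical illustration of the Shipping scenario's correction rule.
# Field names and sample records are assumptions, not the real file layout.
WEIGHT_LIMIT_KG = 4.0


def correct_shipping_method(parcels, limit_kg=WEIGHT_LIMIT_KG):
    """Return a corrected copy of the records.

    Parcels heavier than the limit that are marked as sent by mail
    are switched to courier; all other records are left unchanged.
    """
    corrected = []
    for parcel in parcels:
        fixed = dict(parcel)  # copy, so the input records are not modified
        if fixed["weight_kg"] > limit_kg and fixed["method"] == "mail":
            fixed["method"] = "courier"
        corrected.append(fixed)
    return corrected


parcels = [
    {"id": 1, "weight_kg": 2.5, "method": "mail"},
    {"id": 2, "weight_kg": 6.0, "method": "mail"},
    {"id": 3, "weight_kg": 7.2, "method": "courier"},
]
corrected = correct_shipping_method(parcels)
```

In this sketch only the second record changes, because it is the only one that is both over the limit and marked as mail.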

Yogurt

The yogurt Excel file is used in the Mixed Integer Linear Programming task to create a scenario where a factory wants to maximize its profits by understanding the optimal amount of final product to make from its currently available material.
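
A scenario of this kind is commonly written as a generic product-mix model. The formulation below is a sketch under assumptions, not the model stored in the yogurt file: the symbols are introduced here for illustration, with x_j the quantity of final product j, p_j its unit profit, a_{ij} the amount of material i needed per unit of product j, and b_i the currently available amount of material i.

```latex
\begin{aligned}
\max_{x} \quad & \sum_{j} p_j \, x_j \\
\text{subject to} \quad & \sum_{j} a_{ij} \, x_j \le b_i \quad \text{for every material } i, \\
& x_j \in \mathbb{Z}_{\ge 0} \quad \text{for every product } j.
\end{aligned}
```

The integrality constraint is what makes the problem a (mixed) integer linear program. If some quantities may be fractional, the corresponding x_j are simply declared as non-negative reals.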

InventoryAssignment

The inventory_assignment Excel file illustrates how the Network Optimizer task can be used to improve the movement of materials between source and destination locations.