Rulex Language Functions

Rulex provides a number of functions for computing formulas in the Data Manager.

These functions are categorized according to their main use:

  • Arrays

  • Logical

  • Statistical

  • Math and trigonometry

  • Text

  • Date/Time

  • Graphs

  • System

  • Data

Mandatory and optional parameters and examples of use are displayed in Rulex when you click on the name of the function in the function bar.

When entering the parameters for a function, you can either:

  • simply respect the order of the parameters. For example, in cast($"Att1", $"Att2"), the first attribute (Att1) is the column, while the second attribute (Att2) is the value for newtype.

  • specify which parameter the value applies to by using keywords, for example root($"att1", $"att2", whichpath = "all", separator = "-"). This is particularly useful when there are many parameters, and you don't need to provide a value for all of them.
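The two calling styles mirror positional and keyword arguments in general-purpose languages. The Python sketch below is an analogy only: root_call is a hypothetical stand-in that merely records which value was bound to which parameter. The parameter names follow the root formula listed in the table; the default values other than the "-" separator are assumptions.

```python
def root_call(parent, son, group=None, whichpath="minimum",
              separator="-", weights=None, operator=None):
    """Hypothetical stand-in: report how the arguments were bound."""
    return {
        "parent": parent, "son": son, "group": group,
        "whichpath": whichpath, "separator": separator,
        "weights": weights, "operator": operator,
    }

# Positional style: values are matched to parameters by their order,
# so a placeholder is needed for 'group' to reach 'whichpath'.
by_position = root_call("att1", "att2", None, "all", "-")

# Keyword style: skip 'group' and name only the parameters that matter.
by_keyword = root_call("att1", "att2", whichpath="all", separator="-")

print(by_position == by_keyword)  # True
```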

Available functions

Function

Formula

Description

Arrays

enum

enum(group)

Enumerates the patterns inside each group for a given group of attributes.

fillDown

fillDown(column, group)

Returns a copy of the column, filling all the missing values with the last valid value, according to the groups defined in the group parameter.
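As an illustration of the fill-down behaviour described above, here is a minimal Python sketch. It is not Rulex code: the name fill_down, the use of None for missing values, and the choice to leave leading missing values untouched are all assumptions.

```python
def fill_down(column, group=None):
    """Fill None entries with the last valid value seen in the same group."""
    if group is None:
        group = [0] * len(column)  # treat the whole column as one group
    last_valid = {}  # group key -> most recent non-missing value
    result = []
    for value, key in zip(column, group):
        if value is not None:
            last_valid[key] = value
        # A missing value with no earlier valid value in its group stays missing.
        result.append(last_valid.get(key))
    return result


values = [1, None, None, 4, None, 7]
groups = ["a", "a", "b", "b", "b", "a"]
print(fill_down(values))          # [1, 1, 1, 4, 4, 7]
print(fill_down(values, groups))  # [1, 1, None, 4, 4, 7]
```

Filling upward (as fillUp does) can be sketched the same way by scanning the rows in reverse order.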

fillLinear

fillLinear(column, group)

Returns a copy of the column, filling all the missing values by linear interpolation, according to the groups defined in the group parameter.

fillUp

fillUp(column, group)

Returns a copy of the column, filling all the missing values with the subsequent valid value, according to the groups defined in the group parameter.

len

len(column)

Returns the column with all values equal to the size (i.e. the total number of elements) of the column.

perm

perm(column)

Returns a random permutation of the column.

shift

shift(column, shift, group, cyclic)

Returns the attribute column shifted by the shift value. The shift can be performed according to the groups defined in the group parameter.
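The following Python sketch shows one plausible reading of a grouped shift. The names shift_column and offset are hypothetical, and three behaviours are assumptions rather than documented facts: a positive offset moves values toward later rows, vacated rows become None, and cyclic wraps values around inside each group.

```python
def shift_column(column, offset=1, group=None, cyclic=False):
    """Shift values by 'offset' rows, independently inside each group."""
    if group is None:
        group = [0] * len(column)  # treat the whole column as one group
    # Row positions belonging to each group, in original order.
    positions = {}
    for index, key in enumerate(group):
        positions.setdefault(key, []).append(index)

    result = [None] * len(column)
    for rows in positions.values():
        n = len(rows)
        for k, row in enumerate(rows):
            source = k - offset  # position inside the group to copy from
            if cyclic:
                result[row] = column[rows[source % n]]
            elif 0 <= source < n:
                result[row] = column[rows[source]]
    return result


print(shift_column([1, 2, 3, 4], 1))               # [None, 1, 2, 3]
print(shift_column([1, 2, 3, 4], 1, cyclic=True))  # [4, 1, 2, 3]
```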

Logical



ifelse

ifelse(condition, iftrue, iffalse)

Returns the column with the value of iftrue if the condition is verified, or iffalse if not. If the value of the condition is missing, missing is returned.

isDate

isDate(string, binary)

Checks whether a string corresponds to a date value.

isDatetime

isDatetime(string, binary)

Checks whether a string corresponds to a datetime value.

isFloat

isFloat(string, binary)

Checks whether a string corresponds to a float value.

isInteger

isInteger(string, binary)

Checks whether a string corresponds to an integer value.

isMonth

isMonth(string, binary)

Checks whether a string corresponds to a month value.

isQuarter

isQuarter(string, binary)

Checks whether a string corresponds to a quarter value.

isTime

isTime(string, binary)

Checks whether a string corresponds to a time value.

isType

isType(string, type, binary)

Checks whether a string corresponds to a given type.

isWeek

isWeek(string, binary)

Checks whether a string corresponds to a week value.

Statistical



anovap

anovap(column, attclass, group, usemissing)

Returns the column with all values equal to the p value associated with the ANOVA t statistics relative to the column, according to the groups defined in the group parameter.

anovat

anovat(column, attclass, group, usemissing)

Returns the column with all values equal to the value of the ANOVA t statistics associated with the column, according to the groups defined in the group parameter.

argMax

argMax(column, group)

Returns the position (index) of the maximum value of the column, evaluated within groups defined by the group parameter if required.

argMin

argMin(column, group)

Returns the position (index) of the minimum value of the column, evaluated within groups defined by the group parameter if required.

chisquare

chisquare(column1, column2, group, usemissing)

Returns the column with the chi-square statistic computed from the contingency table of column1 and column2, according to the groups defined in the group parameter.

chisquarep

chisquarep(column1, column2, group, usemissing)

Returns the column with the p value associated with the chi-square statistic computed from the contingency table of column1 and column2, evaluated within groups defined by the group parameter if required.

cohenk

cohenk(column1, column2, group, usemissing)

Returns the Cohen K coefficient between column1 and column2, evaluated within groups defined by the group parameter if required.
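For reference, the standard definition of Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The Python sketch below implements that textbook formula; it assumes Rulex follows the same definition, ignores the group and usemissing parameters, and uses the hypothetical name cohen_kappa.

```python
from collections import Counter


def cohen_kappa(column1, column2):
    """Textbook Cohen's kappa between two equally long label columns."""
    n = len(column1)
    # Observed agreement: share of rows where both columns match.
    observed = sum(a == b for a, b in zip(column1, column2)) / n
    # Chance agreement: product of the marginal frequencies of each label.
    freq1, freq2 = Counter(column1), Counter(column2)
    expected = sum(freq1[label] * freq2[label] for label in freq1) / (n * n)
    return (observed - expected) / (1 - expected)


rater1 = ["y", "y", "n", "n"]
rater2 = ["y", "n", "n", "n"]
print(cohen_kappa(rater1, rater2))  # 0.5
```

The sketch does not guard against the degenerate case in which both columns contain a single identical label, where the chance agreement is 1 and the ratio is undefined.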

count

count(group)

Returns the number of different combinations of values in the list group.

countIf

countIf(condition, group)

Returns the count of the records satisfying a given condition, according to the groups defined in the group parameter, if required.

covariance

covariance(column1, column2, group)

Evaluates the covariance between column1 and column2, according to the groups defined in the group parameter, if required.

cumMax

cumMax(column, group)

Returns the cumulative maximum of the column, evaluated within groups defined by the group parameter if required.

cumMin

cumMin(column, group)

Returns the cumulative minimum of the column, evaluated within groups defined by the group parameter if required.

distinct

distinct(column, group)

Returns the number of distinct values of the column, evaluated within groups defined by the group parameter if required.

entropy

entropy(column, group, usemissing)

Returns the entropy of the column, evaluated within groups defined by the group parameter if required.

gini

gini(column, group, usemissing)

Returns the Gini index of the column, evaluated within groups defined by the group parameter if required.
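The entropy and Gini entries can be illustrated with the usual formulas over the relative frequencies of the values in a column. This Python sketch rests on several assumptions: base-2 logarithms for the entropy, the Gini impurity form 1 - sum(p^2) for the Gini index, no grouping, and no special handling of missing values. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(column):
    """Shannon entropy (base 2, an assumption) of the value frequencies."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def gini(column):
    """Gini impurity 1 - sum(p^2) of the value frequencies (an assumption)."""
    n = len(column)
    return 1.0 - sum((c / n) ** 2 for c in Counter(column).values())


labels = ["a", "a", "b", "b"]
print(entropy(labels))  # 1.0
print(gini(labels))     # 0.5
```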

inIqr

inIqr(column, coeff)

Returns the column with a True/False value according to the interquartile range. If $"att" is in [Q1-coeff*(Q3-Q1), Q3+coeff*(Q3-Q1)] (where Q1 and Q3 are the first and the third quartiles, respectively, and coeff is a parameter fixed by the user), inIqr returns True; otherwise it returns False. The parameter coeff is set to 1.5 by default.
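The interval test above can be sketched in Python as follows. The quartile estimator is an assumption (the "inclusive" method of the standard statistics module); Rulex may compute Q1 and Q3 differently, so boundary cases could differ. The name in_iqr is hypothetical.

```python
import statistics


def in_iqr(column, coeff=1.5):
    """True where a value lies in [Q1 - coeff*IQR, Q3 + coeff*IQR]."""
    # Quartile method is an assumption; other estimators shift Q1/Q3 slightly.
    q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - coeff * spread, q3 + coeff * spread
    return [low <= value <= high for value in column]


data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
# Here Q1 = 3 and Q3 = 7, so the accepted interval is [-3, 13].
print(in_iqr(data))
```

With these inputs only the last value (100) falls outside the interval.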

max

max(column, group)

Returns the maximum of the column, evaluated within groups defined by the group parameter if required.

max2

max2(column1, column2)

Returns the column with values equal to the maximum value between column1 and column2.

maxyoudencut

maxyoudencut(column, attclass, defclass, group)

Returns the value which maximizes the Youden index of the ROC curve defined by column and by the class attclass. The default value for the class attribute (if more than two values are present) can be specified as the optional parameter defclass. The computation can be performed according to the groups defined in the group parameter, if required.

mean

mean(column, group)

Returns the mean of the column, evaluated within groups defined by the group parameter if required.

median

median(column, group)

Returns the median of the column, evaluated within groups defined by the group parameter if required.

min

min(column, group)

Returns the minimum of the column, evaluated within groups defined by the group parameter if required.

min2

min2(column1, column2)

Returns the column with values equal to the minimum value between column1 and column2.

mode

mode(column, group, usemissing)

Returns the mode of the column, evaluated within groups defined by the group parameter if required.

movMean

movMean(column, lag, group, front)

Returns the moving average of the column, evaluated over lag consecutive rows, within groups defined by the group parameter if required.

pearson

pearson(column1, column2, group)

Returns the Pearson coefficient between column1 and column2, evaluated within groups defined by the group parameter if required.

quantile

quantile(column, quant, group, weights)

Returns the quant quantile of the column, evaluated within groups defined by the group parameter if required. A column of weights can also be defined.

roc

roc(column, attclass, defclass, group)

Returns the area under the ROC curve defined by column and by the class attclass. The default value for the class attribute (if more than two values are present) can be specified as the optional parameter defclass. All computation can be performed according to the groups defined in the group parameter.

std

std(column, group)

Returns the standard deviation of the column, evaluated within groups defined by the group parameter if required.

variance

variance(column, group)

Returns the variance of the column, evaluated within groups defined by the group parameter if required.

Math and trigonometry

abs

abs(column)

Returns the absolute value of each row of the column.

acos

acos(column)

Returns the arccosine values of each row of the column.

acosh

acosh(column)

Returns the hyperbolic arccosine of each row of the column.

asin

asin(column)

Returns the arcsine of each row of the column.

asinh

asinh(column)

Returns the hyperbolic arcsine of each row of the column.

atan

atan(column)

Returns the arctangent of each row of the column.

atanh

atanh(column)

Returns the hyperbolic arctangent of each row of the column.

baseConv

baseConv(column, basein, baseout, compflagin, compflagout)

Converts a base 10 integer, or a string that corresponds to an integer, to a different base. Optional parameters allow the user to have a two's complement code (if set to True) in the input and/or in the output value.

ceil

ceil(column)

Returns, for each row of the column, the smallest integer greater than or equal to the value.

cos

cos(column)

Returns the cosine of each row of the column.

cosh

cosh(column)

Returns the hyperbolic cosine of each row of the column.

cumProd

cumProd(column, group)

Returns the cumulative product of the column, evaluated within groups defined by the group parameter if required.

cumSum

cumSum(column, group)

Returns the cumulative sum of the column, evaluated within groups defined by the group parameter if required.

exp

exp(column)

Returns the exponential of each row of the column.

floor

floor(column)

Returns, for each row of the column, the largest integer less than or equal to the value.

isInteger

isInteger(string, binary)

Checks whether a string corresponds to an integer value.

log

log(column)

Returns the natural logarithm of each row of the column.

log10

log10(column)

Returns the base-10 logarithm of each row of the column.

prod

prod(column, group)

Returns the product of the column, evaluated within groups defined by the group parameter if required.

rand

rand(n, seed)

Returns a random column with the specified number of elements. If the number of elements is not specified, a random column is created with n elements, where n is the number of examples.

round

round(column)

Returns the nearest integer value of each row of the column.

sign

sign(column)

Returns the sign of each row of the column.

sin

sin(column)

Returns the sine of each row of the column.

sinh

sinh(column)

Returns the hyperbolic sine of each row of the column.

sqrt

sqrt(column)

Returns the square root of each row of the column.

sum

sum(column, group)

Returns the sum of the column, evaluated within groups defined by the group parameter if required.

tan

tan(column)

Returns the tangent of each row of the column.

tanh

tanh(column)

Returns the hyperbolic tangent of each row of the column.

Text

distance

distance(column1, column2, method)

Computes the distance between the values of two columns, column1 and column2, according to one of the following methods: "levenshtein" ("l"), "damerau-levenshtein" ("dl"), "lcs", "hamming".
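To make two of the listed methods concrete, the Python sketch below implements the classic Levenshtein and Hamming distances between a pair of strings. These are the textbook algorithms, not Rulex's own implementation; how Rulex treats missing values or strings of unequal length under "hamming" is not covered here, and raising an error in that case is an assumption.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def hamming(a, b):
    """Number of positions at which two equally long strings differ."""
    if len(a) != len(b):
        raise ValueError("hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


print(levenshtein("kitten", "sitting"))  # 3
print(hamming("karolin", "kathrin"))     # 3
```

Applied to two columns, either function would be evaluated row by row on the paired values.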

find

find(column, value, binary)

Returns, in each row of the result, True if the value is contained in the corresponding row of the column; otherwise False.

head

head(column, nchar)

Returns, in each row of the result, the first nchar characters of the corresponding value contained in the column.

isPrefix

isPrefix(column, value, binary)

Returns True in the rows of the column starting with the string value; otherwise False.

isSuffix

isSuffix(column, value, binary)

Returns True in the rows of the column ending with the string value; otherwise False.

isWord

isWord(substring, delimiter, binary)

Returns a column with value 1 if the string str (or the corresponding value of $"att2") is a word, separated by the specified delimiter, in the values of $"att1"; otherwise 0. The default delimiter is a space.

numExt

numExt(column, onlyint, separator)

Returns a string containing only the numerical characters of the input string. If more than one number is present, numbers are delimited by a separator decided by the user (by default "-").

pad

pad(column, len, value, where)

Returns, in each row of the result, the values of the column, padded with the string value to reach the specified length len. The string can be added at the beginning (where = "begin", the default) or at the end (where = "end") of the string, according to the value of the flag where.

phonetic

phonetic(column, component)

Returns the phonetic encoding of the strings contained in the column using the Metaphone algorithm. Phonetic may return the primary Metaphone component (component = "Primary" or component = "P") or the secondary component (component = "Secondary" or component = "S"). By default the primary component is returned.

prefix

prefix(column, value)

Returns, for each string contained in the column, the part of the string that precedes the given string value (the prefix).

replace

replace(column, oldvalue, newvalue, ntimes)

Returns, in each row of the result, the values of the column, with the first (last) ntimes occurrences of oldvalue replaced by newvalue in each row.

strip

strip(column, value, where, ischarlist)

Returns, in each row of the result, the values of the column, in which all the characters included in a given string are removed from the beginning (where = "begin"), the end (where = "end") or from both of them (where = "both", the default). The list of characters can be defined as a list or a substring according to the ischarlist parameter.

suffix

suffix(column, value, last)

Returns, for each string contained in the column, the part of the string that follows the given string value (the suffix).

tail

tail(column, nchar)

Returns, in each row of the result, the last nchar characters of the corresponding value contained in the column.

textConcat

textConcat(column, separator, group)

Returns the concatenation of the strings in the column (with a customized separator, if provided), evaluated within groups defined by the group parameter if required. The column must be nominal.

textExtract

textExtract(column, startpos, endpos)

Returns in each row of the result the part of the corresponding string in the column between the starting startpos and the ending endpos position.

textLen

textLen(column)

Returns the length of the string contained in each row of the column.

textSort

textSort(column, ascending)

Returns a copy of the column with the strings contained in each row sorted, in the order set by the ascending parameter.

Date/Time

addMonth

addMonth(date, nmonth)

Adds a given number of months to a date attribute.

addQuarter

addQuarter(date, nquarter)

Adds a given number of quarters to a date attribute.

addWorkingDays

addWorkingDays(date, nday)

Adds a given number of working days (excluding weekends) to a date attribute.
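A working-day addition that skips Saturdays and Sundays can be sketched in Python as below. The name add_working_days is hypothetical, and two behaviours are assumptions: public holidays are not considered, and a negative count moves backwards in time.

```python
from datetime import date, timedelta


def add_working_days(start, ndays):
    """Move 'ndays' working days from 'start', skipping weekends."""
    step = 1 if ndays >= 0 else -1
    remaining = abs(ndays)
    current = start
    while remaining > 0:
        current += timedelta(days=step)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


friday = date(2024, 1, 5)
print(add_working_days(friday, 1))  # 2024-01-08 (the following Monday)
print(add_working_days(friday, 5))  # 2024-01-12 (the following Friday)
```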

addYear

addYear(date, nyear)

Adds a given number of years to a date attribute.

currDate

currDate(utc)

Returns the current date according to local or UTC settings.

currDatetime

currDatetime(utc)

Returns the current datetime according to local or UTC settings.

date

date(year, month, day)

Returns a column with all values equal to the date consisting of given year, month and day.

datetime

datetime(date, time)

Returns in each row of the result the datetime value obtained by the composition of the date value contained in the date entry and the time value contained in the time entry.

day

day(date)

Returns the day value of date.

hour

hour(time)

Returns the hour value of time.

isDate

isDate(string, binary)

Checks whether a string corresponds to a date value.

isDatetime

isDatetime(string, binary)

Checks whether a string corresponds to a datetime value.

isMonth

isMonth(string, binary)

Checks whether a string corresponds to a month value.

isQuarter

isQuarter(string, binary)

Checks whether a string corresponds to a quarter value.

isTime

isTime(string, binary)

Checks whether a string corresponds to a time value.

isWeek

isWeek(string, binary)

Checks whether a string corresponds to a week value.

minute

minute(time)

Returns the minute value of time.

month

month(date)

Returns the month value of date.

second

second(time)

Returns the second value of time.

time

time(hour, minute, second)

Composes a time starting from hours, minutes and seconds.

timeZone

timeZone()

Returns the current timezone, i.e. the difference between local time and UTC time. The resulting type is time.

week

week(date)

Returns the week value of date.

weekDay

weekDay(date, mondaystart)

Returns the day of the week as an integer for each value of date. If mondaystart is True, Monday is 1 and Sunday is 7; otherwise Sunday is 1 and Saturday is 7.
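The two numbering schemes can be reproduced in Python from the ISO weekday number. This is an illustrative sketch with a hypothetical function name; the default value of mondaystart used here is an assumption.

```python
from datetime import date


def week_day(d, mondaystart=True):
    """Day of the week as an integer in either numbering scheme."""
    iso = d.isoweekday()  # Monday=1 ... Sunday=7
    if mondaystart:
        return iso
    return iso % 7 + 1  # Sunday=1, Monday=2 ... Saturday=7


sunday = date(2024, 1, 7)
print(week_day(sunday, mondaystart=True))   # 7
print(week_day(sunday, mondaystart=False))  # 1
```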

year

year(date)

Returns the year value of date.

Graphs

connComp

connComp(parent, son, group)

This function operates on a directed graph, whose relationship is defined by the parent and son parameters. It returns in each row of the result the index of the connected component associated with the node contained in the son attribute, according to the groups defined in the group parameter, if required.

leaf

leaf(parent, son, group, whichpath, separator, weights, operator)

This function operates on a directed graph, whose relationship is defined by the parent and son parameters. It returns in each row of the result the leaf of the node contained in the son attribute, according to the groups defined in the group parameter, if required.

The whichpath parameter allows the user to choose which path is to be considered:

  • the shortest one (whichpath = "minimum"),

  • the longest one (whichpath = "maximum") or

  • all the paths (whichpath = "all"). In this case the leaves are concatenated in a single string with a separator settable by the user; the default separator is "-".

It is also possible to introduce weights into the computation and control the way the result must be shown.

leafDistance

leafDistance(parent, son, group, whichpath, separator, weights, operator)

This function operates on a directed graph, whose relationship is defined by the parent and son parameters. It returns in each row of the result the inverse level (distance from the leaf) of the node contained in the son attribute, according to the groups defined in the group parameter, if required.

The whichpath parameter allows the user to choose which path is to be considered:

  • the shortest one (whichpath = "minimum"),

  • the longest one (whichpath = "maximum") or

  • all the paths (whichpath = "all", in this case the distances are concatenated in a single string with a separator settable by the user. The default separator is "-").

It is also possible to introduce weights into the computation and control the way the result must be shown.

root

root(parent, son, group, whichpath, separator, weights, operator)

This function operates on a directed graph, whose relationship is defined by the parent and son parameters. It returns in each row of the result the root of the node contained in the son attribute, according to the groups defined in the group parameter, if required.

The whichpath parameter allows the user to choose which path is to be considered:

  • the shortest one (whichpath = "minimum"),

  • the longest one (whichpath = "maximum") or

  • all the paths (whichpath = "all"). In this case the roots are concatenated in a single string with a separator settable by the user; the default separator is "-".

It is also possible to introduce weights into the computation and control the way the result must be shown.
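To illustrate how a parent/son edge list defines the graph, the Python sketch below finds, for each row, every root reachable upward from the son node and joins them with the separator, which corresponds to the whichpath = "all" case. It is only a sketch under assumptions: a missing parent (None) marks a root, node names are strings, weights, operator and group are ignored, and the name roots is hypothetical.

```python
def roots(parent, son, separator="-"):
    """For each row, join all roots reachable upward from the son node."""
    # Map every node to the set of its direct parents.
    parents_of = {}
    for p, s in zip(parent, son):
        if p is not None:
            parents_of.setdefault(s, set()).add(p)

    def find_roots(node, seen):
        # 'seen' holds the nodes already on the path, guarding against cycles.
        ups = parents_of.get(node, set()) - seen
        if not ups:
            return {node}  # no unvisited parent left: treat node as a root
        found = set()
        for up in ups:
            found |= find_roots(up, seen | {node})
        return found

    return [separator.join(sorted(find_roots(s, frozenset()))) for s in son]


parent = [None, None, "X", "Y"]
son = ["X", "Y", "Z", "Z"]
print(roots(parent, son))  # ['X', 'Y', 'X-Y', 'X-Y']
```

Picking the shortest or longest path instead would require tracking the path length to each root during the traversal.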

rootDistance

rootDistance(parent, son, group, whichpath, separator, weights, operator)

This function operates on a directed graph, whose relationship is defined by the parent and son parameters. It returns in each row of the result the level (distance from the root) of the node contained in the son attribute, according to the groups defined in the group parameter, if required.

The whichpath parameter allows the user to choose which path is to be considered:

  • the shortest one (whichpath = "minimum"),

  • the longest one (whichpath = "maximum") or

  • all the paths (whichpath = "all", in this case the distances are concatenated in a single string with a separator settable by the user. The default separator is "-").

It is also possible to introduce weights into the computation and control the way the result must be shown.

System

currDate

currDate(utc)

Returns the current date according to local or UTC settings.

currDatetime

currDatetime(utc)

Returns the current datetime according to local or UTC settings.

hostName

hostName()

Returns the hostname of the machine where Rulex is running.

ipAddress

ipAddress()

Returns the IP address of the machine where Rulex is running.

timeZone

timeZone()

Returns the current timezone, i.e. the difference between local time and UTC time. The resulting type is time.

Data

cast

cast(column, newtype, forced)

Casts a column to the specified newtype. If the flag forced is set to true (false by default), the operation is performed even if this would cause a loss of information.

catNames

catNames(indatt, values, separator, negate)

Concatenates the name of the list of attributes passed in the indatt parameter, according to the condition specified in the values entry. The separator parameter can be used to introduce a separator in the concatenation.

decideType

decideType(column)

Automatically converts a column to the correct type, based on the data.

disc

disc(column, cutoffs, rank)

Discretizes an attribute according to the provided cutoffs vector. If the flag rank is True (false by default), the rank (i.e. an integer 1,2,3,...) is returned.
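The rank output can be sketched in Python with a binary search over the sorted cutoffs. The sketch covers only the rank = True case, since the form of the non-rank output is not described here. The name disc_rank is hypothetical, and assigning a value equal to a cutoff to the upper interval is an assumption.

```python
from bisect import bisect_right


def disc_rank(column, cutoffs):
    """Rank (1, 2, 3, ...) of the interval each value falls into."""
    cutoffs = sorted(cutoffs)
    # bisect_right counts the cutoffs <= value, so a value equal to a
    # cutoff lands in the upper interval (an assumption about boundaries).
    return [bisect_right(cutoffs, value) + 1 for value in column]


print(disc_rank([1, 5, 10, 15], [5, 10]))  # [1, 2, 3, 3]
```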

discEqualFrequencies

discef(column, nvalue, rank, quantile)

Discretizes an attribute according to an equal frequency criterion.

discEqualWidth

discew(column, nvalue, rank, min, max)

Discretizes an attribute according to an equal width criterion.

discretize

discretize(column, nvalue, cutoffs, mode, rank, quantile, min, max)

Discretizes an attribute using the set of provided cut-offs, or using an equal width or equal frequency criterion.

isAttribute

isAttribute(name, binary)

Checks whether an attribute with a given name is present in the dataset.

isFloat

isFloat(string, binary)

Checks whether a string corresponds to a float value.

isType

isType(string, type, binary)

Checks whether a string corresponds to a given type.

type

type(column)

Returns the type of a column as a string.