Rulex Language Functions
Rulex provides a number of functions for computing formulas in the Data Manager.
These functions are categorized according to their main use:
Arrays
Logical
Statistical
Math and trigonometry
Text
Date/Time
Graphs
System
Data
Mandatory and optional parameters and examples of use are displayed in Rulex when you click on the name of the function in the function bar.
When entering the parameters for a function, you can either:
simply respect the order of the parameters, for example in cast($"Att1", $"Att2"), the first attribute (Att1) is the column, while the second attribute (Att2) is the value for newtype.
specify which parameter the value applies to by using keywords, for example root($"att1", $"att2", whichpath = "all", separator = "-"). This is particularly useful when there are many parameters, and you don't need to provide a value for all of them.
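The two calling styles behave much like positional and keyword arguments in Python. The sketch below is only an analogy: `root` here is a hypothetical Python stand-in that copies the parameter order listed for root in the table, and the `None` defaults are placeholders, not Rulex's actual defaults.

```python
# Hypothetical stand-in mirroring the parameter order of root() in the table.
# The None defaults are placeholders, not the real Rulex defaults.
def root(parent, son, group=None, whichpath=None, separator=None,
         weights=None, operator=None):
    """Return the parameter-to-value binding instead of computing anything."""
    return dict(parent=parent, son=son, group=group, whichpath=whichpath,
                separator=separator, weights=weights, operator=operator)

# Positional style: values are matched to parameters purely by order,
# so 'group' must be filled in before 'whichpath' can be reached.
by_position = root("att1", "att2", None, "all", "-")

# Keyword style: 'group' is skipped and only the needed parameters are named.
by_keyword = root("att1", "att2", whichpath="all", separator="-")

print(by_position == by_keyword)
```

Both calls bind the same values; the keyword form simply avoids spelling out every parameter that precedes the ones you care about.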
Available functions
Finding functions
Click on the header columns in the following table to sort by category or function.
Function | Formula | Description |
---|---|---|
Arrays | ||
enum | enum(group) | Enumerates the patterns inside each group for a given group of attributes |
fillDown | fillDown(column, group) | Returns a copy of the column, filling all the missing values with the last valid value, according to the groups defined in the group parameter. |
fillLinear | fillLinear(column, group) | Returns a copy of the column, filling all the missing values of the column with the linear interpolation, according to the groups defined in the group parameter. |
fillUp | fillUp(column, group) | Returns a copy of the column, filling all the missing values with the subsequent valid value, according to the groups defined in the group parameter. |
len | len(column) | Returns the column with all values equal to the size (i.e. the total number of elements) of the column. |
perm | perm(column) | Returns a random permutation of the column. |
shift | shift(column, shift, group, cyclic) | Returns the attribute column shifted by the shift value. The shift can be performed according to the groups defined in the group parameter. |
Logical | ||
ifelse | ifelse(condition, iftrue, iffalse) | Returns the column with the value of iftrue if the condition is verified, or iffalse if not. If the value of the condition is missing, missing is returned. |
isDate | isDate(string, binary) | Checks whether a string corresponds to a date value. |
isDatetime | isDatetime(string, binary) | Checks whether a string corresponds to a datetime value. |
isFloat | isFloat(string, binary) | Checks whether a string corresponds to a float value. |
isInteger | isInteger(string, binary) | Checks whether a string corresponds to an integer value. |
isMonth | isMonth(string, binary) | Checks whether a string corresponds to a month value. |
isQuarter | isQuarter(string, binary) | Checks whether a string corresponds to a quarter value. |
isTime | isTime(string, binary) | Checks whether a string corresponds to a time value. |
isType | isType(string, type, binary) | Checks whether a string corresponds to a given type. |
isWeek | isWeek(string, binary) | Checks whether a string corresponds to a week value. |
Statistical | ||
anovap | anovap(column, attclass, group, usemissing) | Returns the column with all values equal to the p value associated with the ANOVA t statistics relative to the column, according to the groups defined in the group parameter. |
anovat | anovat(column, attclass, group, usemissing) | Returns the column with all values equal to the value of the ANOVA t statistics associated with the column, according to the groups defined in the group parameter. |
argMax | argMax(column, group) | Returns the position of the maximum of the column as an index, evaluated within groups defined by the group parameter if required. |
argMin | argMin(column, group) | Returns the position of the minimum of the column as an index, evaluated within groups defined by the group parameter if required. |
chisquare | chisquare(column1, column2, group, usemissing) | Returns the column with the chisquare statistics computed from the contingency table of column1 and column2, according to the groups defined in the group parameter. |
chisquarep | chisquarep(column1, column2, group, usemissing) | Returns the column with the p value associated with the chisquare statistics computed from the contingency table of column1 and column2, evaluated within groups defined by the group parameter if required. |
cohenk | cohenk(column1, column2, group, usemissing) | Returns the Cohen K coefficient between column1 and column2, evaluated within groups defined by the group parameter if required. |
count | count(group) | Returns the number of different combinations of values in the list group. |
countIf | countIf(condition, group) | Returns the count of the records satisfying a given condition, according to the groups defined in the group parameter, if required. |
covariance | covariance(column1, column2, group) | Evaluates the covariance between column1 and column2, according to the groups defined in the group parameter, if required. |
cumMax | cumMax(column, group) | Returns the cumulative maximum of the column, evaluated within groups defined by the group parameter if required. |
cumMin | cumMin(column, group) | Returns the cumulative minimum of the column, evaluated within groups defined by the group parameter if required. |
distinct | distinct(column, group) | Returns the number of distinct values of the column, evaluated within groups defined by the group parameter if required. |
entropy | entropy(column, group, usemissing) | Returns the entropy of the column, evaluated within groups defined by the group parameter if required. |
gini | gini(column, group, usemissing) | Returns the Gini index of the column, evaluated within groups defined by the group parameter if required. |
inIqr | inIqr(column, coeff) | Returns the column with a True/False value according to the interquartile range. If the value of column is in [Q1-coeff*(Q3-Q1), Q3+coeff*(Q3-Q1)] (where Q1 and Q3 are the first and the third quartiles, respectively, and coeff is a parameter fixed by the user), inIqr returns True, otherwise it returns False. The parameter coeff is set to 1.5 by default. |
max | max(column, group) | Returns the maximum of the column, evaluated within groups defined by the group parameter if required. |
max2 | max2(column1, column2) | Returns the column with values equal to the maximum value between column1 and column2. |
maxyoudencut | maxyoudencut(column, attclass, defclass, group) | Returns the value which maximizes the Youden index of the ROC curve defined by column and by the class attclass. The default value for the class attribute (if more than two values are present) can be specified as the optional parameter defclass. The computation can be performed according to the groups defined in the group parameter, if required. |
mean | mean(column, group) | Returns the mean of the column, evaluated within groups defined by the group parameter if required. |
median | median(column, group) | Returns the median of the column, evaluated within groups defined by the group parameter if required. |
min | min(column, group) | Returns the minimum of the column, evaluated within groups defined by the group parameter if required. |
min2 | min2(column1, column2) | Returns the column with values equal to the minimum value between column1 and column2. |
mode | mode(column, group, usemissing) | Returns the mode of the column, evaluated within groups defined by the group parameter if required. |
movMean | movMean(column, lag, group, front) | Returns the moving average of the column, evaluated on lag consecutive rows, within groups defined by the group parameter if required. |
pearson | pearson(column1, column2, group) | Returns the Pearson coefficient between column1 and column2, evaluated within groups defined by the group parameter if required. |
quantile | quantile(column, quant, group, weights) | Returns the quant quantile of the column, evaluated within groups defined by the group parameter if required. A column of weights can also be defined. |
roc | roc(column, attclass, defclass, group) | Returns the area under the ROC curve defined by column and by the class attclass. The default value for the class attribute (if more than two values are present) can be specified as the optional parameter defclass. The computation can be performed according to the groups defined in the group parameter, if required. |
std | std(column, group) | Returns the standard deviation of the column, evaluated within groups defined by the group parameter if required. |
variance | variance(column, group) | Returns the variance of the column, evaluated within groups defined by the group parameter if required. |
Math and trigonometry | ||
abs | abs(column) | Returns the absolute value of each row of the column. |
acos | acos(column) | Returns the arccosine of each row of the column. |
acosh | acosh(column) | Returns the hyperbolic arccosine of each row of the column. |
asin | asin(column) | Returns the arcsine of each row of the column. |
asinh | asinh(column) | Returns the hyperbolic arcsine of each row of the column. |
atan | atan(column) | Returns the arctangent of each row of the column. |
atanh | atanh(column) | Returns the hyperbolic arctangent of each row of the column. |
baseConv | baseConv(column, basein, baseout, compflagin, compflagout) | Converts a base 10 integer, or a string that corresponds to an integer, to a different base. Optional parameters allow the user to have a two's complement code (if set to True) in the input and/or in the output value. |
ceil | ceil(column) | Returns the smallest integer greater than or equal to each row of the column. |
cos | cos(column) | Returns the cosine of each row of the column. |
cosh | cosh(column) | Returns the hyperbolic cosine of each row of the column. |
cumProd | cumProd(column, group) | Returns the cumulative product of the column, evaluated within groups defined by the group parameter if required. |
cumSum | cumSum(column, group) | Returns the cumulative sum of the column, evaluated within groups defined by the group parameter if required. |
exp | exp(column) | Returns the exponential of each row of the column. |
floor | floor(column) | Returns the largest integer less than or equal to each row of the column. |
isInteger | isInteger(string, binary) | Checks whether a string corresponds to an integer value. |
log | log(column) | Returns the natural logarithm of each row of the column. |
log10 | log10(column) | Returns the base-10 logarithm of each row of the column. |
prod | prod(column, group) | Returns the product of the column, evaluated within groups defined by the group parameter if required. |
rand | rand(n, seed) | Returns a random column with the specified number of elements. If the number of elements is not specified, a random column is created with n elements, where n is the number of examples. |
round | round(column) | Returns the nearest integer value of each row of the column. |
sign | sign(column) | Returns the sign of each row of the column. |
sin | sin(column) | Returns the sine of each row of the column. |
sinh | sinh(column) | Returns the hyperbolic sine of each row of the column. |
sqrt | sqrt(column) | Returns the square root of each row of the column. |
sum | sum(column, group) | Returns the sum of the column, evaluated within groups defined by the group parameter if required. |
tan | tan(column) | Returns the tangent of each row of the column. |
tanh | tanh(column) | Returns the hyperbolic tangent of each row of the column. |
Text | ||
distance | distance(column1, column2, method) | Computes the distance between the values of two columns, column1 and column2, according to one of the following methods: "levenshtein" ("l"), "damerau-levenshtein" ("dl"), "lcs", "hamming". |
find | find(column, value, binary) | Returns True in each row of the result if the value is contained in the corresponding row of the column; otherwise False. |
head | head(column, nchar) | Returns, in each row of the result, the first nchar characters of the corresponding value contained in the column. |
isPrefix | isPrefix(column, value, binary) | Returns True in the rows of the column starting with the string value; otherwise False. |
isSuffix | isSuffix(column, value, binary) | Returns True in the rows of the column ending with the string value; otherwise False. |
isWord | isWord(substring, delimiter, binary) | Returns a column with value 1 if substring appears as a word, separated by the specified delimiter, in the values of the input column; otherwise 0. The default delimiter is a space. |
numExt | numExt(column, onlyint, separator) | Returns a string containing only the numerical characters of the input string. If more than one number is present, numbers are delimited by a separator decided by the user (by default "-"). |
pad | pad(column, len, value, where) | Returns, in each row of the result, the values of the column, padded with the string value to reach the specified length len. The string can be added at the beginning (where = "begin", the default) or at the end (where = "end") of the string, according to the value of the flag where. |
phonetic | phonetic(column, component) | Returns the phonetic encoding of the strings contained in the column using the Metaphone algorithm. Phonetic may return the primary Metaphone component (component = "Primary" or component = "P") or the secondary component (component = "Secondary" or component = "S"). By default the primary component is returned. |
prefix | prefix(column, value) | Returns, for each string contained in the column, the part of the string that comes before the passed string value (the prefix). |
replace | replace(column, oldvalue, newvalue, ntimes) | Returns, in each row of the result, the values of the column, with the first (last) ntimes occurrences of oldvalue replaced by newvalue in each row. |
strip | strip(column, value, where, ischarlist) | Returns, in each row of the result, the values of the column, in which all the characters included in a given string are removed from the beginning (where = "begin"), the end (where = "end") or from both of them (where = "both", the default). The given string can be treated as a list of characters or as a substring, according to the ischarlist parameter. |
suffix | suffix(column, value, last) | Returns, for each string contained in the column, the part of the string that comes after the passed string value (the suffix). |
tail | tail(column, nchar) | Returns, in each row of the result, the last nchar characters of the corresponding value contained in the column. |
textConcat | textConcat(column, separator, group) | Returns the concatenation of the strings in the column (using a custom separator, if one is provided), evaluated within groups defined by the group parameter if required. The column must be nominal. |
textExtract | textExtract(column, startpos, endpos) | Returns in each row of the result the part of the corresponding string in the column between the starting startpos and the ending endpos position. |
textLen | textLen(column) | Returns the length of the string contained in each row of the column. |
textSort | textSort(column, ascending) | Returns a copy of the column with the strings contained in each row, sorted in the order set by the ascending parameter. |
Date/Time | ||
addMonth | addMonth(date, nmonth) | Adds a given number of months to a date attribute. |
addQuarter | addQuarter(date, nquarter) | Adds a given number of quarters to a date attribute. |
addWorkingDays | addWorkingDays(date, nday) | Adds a given number of working days (excluding weekends) to a date attribute. |
addYear | addYear(date, nyear) | Adds a given number of years to a date attribute. |
currDate | currDate(utc) | Returns the current date according to local or UTC settings. |
currDatetime | currDatetime(utc) | Returns the current datetime according to local or UTC settings. |
date | date(year, month, day) | Returns a column with all values equal to the date consisting of given year, month and day. |
datetime | datetime(date, time) | Returns in each row of the result the datetime value obtained by the composition of the date value contained in the date entry and the time value contained in the time entry. |
day | day(date) | Returns the day value of date. |
hour | hour(time) | Returns the hour value of time. |
isDate | isDate(string, binary) | Checks whether a string corresponds to a date value. |
isDatetime | isDatetime(string, binary) | Checks whether a string corresponds to a datetime value. |
isMonth | isMonth(string, binary) | Checks whether a string corresponds to a month value. |
isQuarter | isQuarter(string, binary) | Checks whether a string corresponds to a quarter value. |
isTime | isTime(string, binary) | Checks whether a string corresponds to a time value. |
isWeek | isWeek(string, binary) | Checks whether a string corresponds to a week value. |
minute | minute(time) | Returns the minute value of time. |
month | month(date) | Returns the month value of date. |
second | second(time) | Returns the second value of time. |
time | time(hour, minute, second) | Composes a time starting from hours, minutes and seconds. |
timeZone | timeZone() | Returns the current timezone, i.e. the difference between local time and UTC time. The resulting type is time. |
week | week(date) | Returns the week value of date. |
weekDay | weekDay(date, mondaystart) | Returns the day of the week as an integer for each value of date. If mondaystart is True Monday is 1 and Sunday is 7; otherwise Sunday is 1 and Saturday is 7. |
year | year(date) | Returns the year value of date. |
Graphs | ||
connComp | connComp(parent, son, group) | This function operates on a directed graph, whose relationship is defined by the parent and son parameters. It returns in each row of the result the index of the connected component associated with the node contained in the son attribute, according to the groups defined in the group parameter, if required. |
leaf | leaf(parent, son, group, whichpath, separator, weights, operator) | This function operates on a directed graph, whose relationship is defined by the parent and son parameters. It returns in each row of the result the leaf of the node contained in the son attribute, according to the groups defined in the group parameter, if required. The whichpath parameter allows the user to choose which path is to be considered. It is also possible to introduce weights into the computation and control the way the result must be shown. |
leafDistance | leafDistance(parent, son, group, whichpath, separator, weights, operator) | This function operates on a directed graph, whose relationship is defined by the parent and son parameters. It returns in each row of the result the inverse level (distance from the leaf) of the node contained in the son attribute, according to the groups defined in the group parameter, if required. The whichpath parameter allows the user to choose which path is to be considered. It is also possible to introduce weights into the computation and control the way the result must be shown. |
root | root(parent, son, group, whichpath, separator, weights, operator) | This function operates on a directed graph, whose relationship is defined by the parent and son parameters. It returns in each row of the result the root of the node contained in the son attribute, according to the groups defined in the group parameter, if required. The whichpath parameter allows the user to choose which path is to be considered. It is also possible to introduce weights into the computation and control the way the result must be shown. |
rootDistance | rootDistance(parent, son, group, whichpath, separator, weights, operator) | This function operates on a directed graph, whose relationship is defined by the parent and son parameters. It returns in each row of the result the level (distance from the root) of the node contained in the son attribute, according to the groups defined in the group parameter, if required. The whichpath parameter allows the user to choose which path is to be considered. It is also possible to introduce weights into the computation and control the way the result must be shown. |
System | ||
currDate | currDate(utc) | Returns the current date according to local or UTC settings. |
currDatetime | currDatetime(utc) | Returns the current datetime according to local or UTC settings. |
hostName | hostName() | Returns the hostname of the machine where Rulex is running. |
ipAddress | ipAddress() | Returns the IP address of the machine where Rulex is running. |
timeZone | timeZone() | Returns the current timezone, i.e. the difference between local time and UTC time. The resulting type is time. |
Data | ||
cast | cast(column, newtype, forced) | Casts a column to the specified newtype. If the flag forced is set to True (False by default), the operation is performed even if this would cause a loss of information. |
catNames | catNames(indatt, values, separator, negate) | Concatenates the names of the attributes passed in the indatt parameter, according to the condition specified in the values entry. The separator parameter can be used to introduce a separator in the concatenation. |
decideType | decideType(column) | Automatically converts a column to the correct type, based on the data. |
disc | disc(column, cutoffs, rank) | Discretizes an attribute according to the provided cutoffs vector. If the flag rank is True (False by default), the rank (i.e. an integer 1,2,3,...) is returned. |
discEqualFrequencies | discef(column, nvalue, rank, quantile) | Discretizes an attribute according to an equal frequency criterion. |
discEqualWidth | discew(column, nvalue, rank, min, max) | Discretizes an attribute according to an equal width criterion. |
discretize | discretize(column, nvalue, cutoffs, mode, rank, quantile, min, max) | Discretizes an attribute using the set of provided cut-offs, or using an equal width or equal frequency criterion. |
isAttribute | isAttribute(name, binary) | Checks whether an attribute with a given name is present in the dataset. |
isFloat | isFloat(string, binary) | Checks whether a string corresponds to a float value. |
isType | isType(string, type, binary) | Checks whether a string corresponds to a given type. |
type | type(column) | Returns the type of a column as a string. |
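
To make a few of the descriptions above concrete, here is a rough Python sketch of the behavior described for inIqr, fillDown and weekDay. It is not Rulex code: the function names `in_iqr`, `fill_down` and `week_day` are hypothetical, missing values are represented by `None` as an assumption, and the quartile estimation method (Python's "inclusive" method) is also an assumption, since the table does not say how Rulex computes Q1 and Q3.

```python
from datetime import date
from statistics import quantiles

def in_iqr(values, coeff=1.5):
    """True where a value lies in [Q1 - coeff*(Q3-Q1), Q3 + coeff*(Q3-Q1)]."""
    # Assumption: quartiles from statistics.quantiles with method="inclusive".
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - coeff * spread, q3 + coeff * spread
    return [low <= v <= high for v in values]

def fill_down(column, group=None):
    """Replace each missing value (None) with the last valid value in its group."""
    keys = group if group is not None else [None] * len(column)
    last_seen = {}  # last valid value met so far, per group key
    result = []
    for value, key in zip(column, keys):
        if value is not None:
            last_seen[key] = value
        # Stays None when no valid value has appeared yet in this group.
        result.append(last_seen.get(key))
    return result

def week_day(day, mondaystart):
    """Day of the week as an integer, numbered from Monday or from Sunday."""
    iso = day.isoweekday()  # Monday = 1 ... Sunday = 7
    return iso if mondaystart else iso % 7 + 1

print(in_iqr([1, 2, 3, 4, 100]))
print(fill_down([1, None, 5, None, None], ["a", "a", "b", "a", "b"]))
print(week_day(date(2024, 1, 1), True), week_day(date(2024, 1, 1), False))
```

In the grouped `fill_down` call, each missing value is filled from the previous valid value of the same group rather than from the row directly above it, which is the role the group parameter plays in many of the functions listed in the table.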