Importing Data from Text Files
You can import data into Rulex directly from text files by defining the basic parsing options for the file.
You can import text files in two different ways:
By dragging and dropping the text source file directly onto the stage.
By using the Import from Text File task. The advantage of using the task is that you can configure the import options and also specify whether you want to import a single file or multiple files:
Single text file: only one file is imported, with parsing and import options specified for that file.
Multiple text files: the files are concatenated to form a single table, so all files imported together must have the same structure.
Wildcards can be used in a file name or file list. For example, entering C:\Software\*.csv during a file import will import, in bulk, all the files in the Software folder with the .csv extension. Wildcards can be used in file names, but not in folder names or file extensions. Only one wildcard (*) can be used in the file path: any subsequent asterisks are considered part of the path, not wildcards. If no file extension is specified when using a wildcard, Rulex searches for files with the extensions appropriate to the task in use, for example .xlsx and .xlsm for Import from Excel File, and .csv, .txt, .tab, etc. for Import from Text File.
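The wildcard rules above can be illustrated with a short Python sketch. It is not Rulex code: the function name expand_wildcard and the TEXT_EXTENSIONS tuple are hypothetical, and the extension list is an assumption because the documentation only names .csv, .txt, .tab "etc.". Matching here is case-sensitive, whereas Windows file names are not.

```python
import os
import re

# Hypothetical subset of the extensions "appropriate" to Import from Text File.
# The documentation lists .csv, .txt, .tab "etc.", so this tuple is an assumption.
TEXT_EXTENSIONS = (".csv", ".txt", ".tab")


def expand_wildcard(pattern, default_exts=TEXT_EXTENSIONS):
    """Return the sorted list of files matching `pattern`.

    Only the first '*' in the file-name part acts as a wildcard; any later
    '*' is matched literally. Asterisks in the folder part are never expanded.
    """
    folder, name = os.path.split(pattern)
    if "*" not in name:
        return [pattern]  # no wildcard: a single, explicitly named file
    prefix, rest = name.split("*", 1)
    # re.escape keeps any further '*' in `rest` as a literal character.
    regex = re.compile(re.escape(prefix) + ".*" + re.escape(rest) + r"\Z")
    # Treat a dot after the wildcard as an explicitly specified extension.
    has_ext = "." in rest
    matches = []
    for entry in os.listdir(folder or "."):
        full = os.path.join(folder, entry)
        if not os.path.isfile(full) or not regex.match(entry):
            continue
        # No explicit extension: keep only extensions suited to the task.
        if not has_ext and os.path.splitext(entry)[1].lower() not in default_exts:
            continue
        matches.append(full)
    return sorted(matches)
```

For example, expand_wildcard(r"C:\Software\*.csv") would list every .csv file in that folder, while a pattern such as a*b*.csv only matches names that contain a literal asterisk after the "b".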
Before you start, you must have created a process in Rulex.
If files are imported in bulk, they must all have the same structure.
Drag and drop the Import from Text File task onto the central stage.
Double-click the task.
If you want to import data via a remote connection, such as an HTTP API, a SharePoint site or an FTP server, select the corresponding source from the Source file URI (uri) drop-down list and configure the connection in the Remote Connections tab.
To import a single text file:
Click Select file to browse to the text file you want to import and click Open, or manually enter the name of the file in the corresponding edit box (filename). The Table preview pane displays the data that will be imported into Rulex, and is dynamically updated each time you change any of the available options.
Configure the options as explained in the Single file options below.
To import multiple text files, click the Advanced tab and configure the options as explained in the Multiple file options below.
Save and compute the task.
Single file options:
Select the separator that delimits the values in the file to be imported.
Select the symbols used as the decimal separator (decsep) and the thousands separator (thousep) in numbers.
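As an illustration of how the two separators interact, the hypothetical Python helper below converts a number written with custom separators. It assumes decsep and thousep behave like plain character substitutions, which may differ from how Rulex's parser handles them.

```python
def parse_number(text, decsep=".", thousep=","):
    """Convert a numeric string to float using custom separators (hypothetical helper)."""
    cleaned = text.strip()
    if thousep:
        # Thousands separators carry no value, so they are simply dropped.
        cleaned = cleaned.replace(thousep, "")
    if decsep != ".":
        # Python's float() expects '.' as the decimal mark.
        cleaned = cleaned.replace(decsep, ".")
    return float(cleaned)
```

With European-style settings, parse_number("1.234,56", decsep=",", thousep=".") yields 1234.56, the same value that the default settings produce from "1,234.56".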
Enter the character (or string of characters) used to indicate missing values.
Select the symbol used to delimit strings; this symbol is not included in the imported values. For example, if you select " as the text delimiter, the string "age" is imported simply as age.
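The separator, text delimiter and missing-value options map naturally onto Python's standard csv module, which the following sketch uses for illustration only; it is not how Rulex parses files. The parse_text name and the "?" missing-value marker are assumptions, and csv.reader accepts only one-character separators and delimiters.

```python
import csv
import io


def parse_text(content, sep=",", quote='"', missing="?"):
    """Split delimited text into rows (illustrative sketch).

    Text delimiters are removed from quoted fields, and any field equal to
    `missing` (after unquoting) becomes None.
    """
    reader = csv.reader(io.StringIO(content), delimiter=sep, quotechar=quote)
    return [[None if field == missing else field for field in row] for row in reader]
```

For example, parse_text('name;age\n"Ann";?\n', sep=";") returns [["name", "age"], ["Ann", None]], and a quoted field such as "Doe; Jane" keeps its embedded separator.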
Use two contiguous separators as a single one
Select this option to force the parser to treat any group of adjacent separators as a single separator. For example, with the comma as separator, the string '1,2,,,3' is parsed as 1, 2, 3 if this option is selected, and as 1, 2, '', '', 3 if it is not.
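A minimal Python sketch of the two behaviours, ignoring text delimiters for brevity; split_fields is a hypothetical name:

```python
import re


def split_fields(line, sep=",", collapse=False):
    """Split one line on `sep`; with collapse=True, runs of adjacent separators count as one."""
    if collapse:
        # (?:...)+ repeats the whole separator, so multi-character separators also work.
        return re.split("(?:" + re.escape(sep) + ")+", line)
    return line.split(sep)
```

split_fields("1,2,,,3", collapse=True) gives ["1", "2", "3"], while split_fields("1,2,,,3") gives ["1", "2", "", "", "3"], matching the two cases described above.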
Character encoding for input file
UTF-8/UNICODE is the standard encoding used to import files, but it may not always be the preferred choice, as UTF-8 files may begin with three unprintable bytes (the byte order mark, or BOM). Here you can select ASCII or HTML instead, which are preferable when the file includes a procedure to execute.
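To see why the BOM matters, the Python sketch below decodes the same bytes with and without BOM handling. Python's built-in "utf-8-sig" codec drops a leading BOM if one is present; with plain "utf-8", the invisible character \ufeff stays attached to the first column name. This only illustrates the encoding issue and does not reproduce Rulex's behaviour; decode_text is a hypothetical name.

```python
import codecs


def decode_text(raw: bytes) -> str:
    """Decode bytes as UTF-8, dropping a leading byte order mark if present."""
    # 'utf-8-sig' strips the 3-byte BOM (EF BB BF) when present and
    # otherwise behaves exactly like plain UTF-8.
    return raw.decode("utf-8-sig")


raw = codecs.BOM_UTF8 + "age,name\n".encode("utf-8")
naive = raw.decode("utf-8")  # keeps the BOM as '\ufeff' at the start
clean = decode_text(raw)     # BOM removed
```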
Start importing from line
The line number from which the import operation will start.
Stop importing at line
The number of the line where the import operation will stop. If set to 0, all lines are imported.
Get names from line
The number of the line from which the names of the columns will be taken.
Get types from line
The number of the line from which the data types of the columns will be taken.
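The four line options can be pictured as slicing the file's lines. The Python sketch below assumes 1-based line numbers, an inclusive stop line, and that a value of 0 disables the types line; these are assumptions for illustration, and select_lines is a hypothetical name.

```python
def select_lines(lines, start=1, stop=0, names_line=1, types_line=0):
    """Pick the header, type and data lines by 1-based line number (illustrative sketch).

    stop=0 reads to the end of the file; types_line=0 means there is no type line.
    """
    names = lines[names_line - 1] if names_line else None
    types = lines[types_line - 1] if types_line else None
    last = len(lines) if stop == 0 else min(stop, len(lines))
    skip = {names_line, types_line}  # header lines are not treated as data
    data = [lines[i - 1] for i in range(start, last + 1) if i not in skip]
    return names, types, data
```

For example, with the lines "id,label", "integer,nominal", "1,a", "2,b", "3,c", calling select_lines(lines, start=3, stop=4, names_line=1, types_line=2) takes the names from line 1, the types from line 2 and only "1,a" and "2,b" as data.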
Select this option to remove spaces surrounding strings. For example, the string " age " is imported as "age".
Compress white spaces
Select this option to remove extra consecutive spaces from within strings. For example, the string "university   program" is imported as "university program".
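Both whitespace options reduce to simple string operations. A minimal Python sketch, assuming only the space character (not tabs) is affected; clean_string is a hypothetical name:

```python
import re


def clean_string(value, trim=True, compress=False):
    """Optionally collapse internal runs of spaces and strip surrounding spaces."""
    if compress:
        value = re.sub(r" {2,}", " ", value)  # two or more spaces become one
    if trim:
        value = value.strip(" ")              # remove leading and trailing spaces
    return value
```

clean_string(" age ") returns "age", and clean_string("university   program", compress=True) returns "university program".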
Remove empty rows
If selected, empty rows are automatically deleted.
Remove empty columns
If selected, empty columns are automatically deleted.
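A sketch of what the two options do to a parsed table, assuming a cell counts as empty when it is None or an empty string, and that the header row has already been separated from the data; drop_empty is a hypothetical name:

```python
def drop_empty(rows, drop_rows=True, drop_cols=True):
    """Remove rows and/or columns in which every cell is empty (None or '')."""
    def is_empty(cell):
        return cell is None or cell == ""

    if drop_rows:
        rows = [row for row in rows if not all(is_empty(c) for c in row)]
    if drop_cols and rows:
        width = max(len(row) for row in rows)
        # Keep a column index if at least one row has a non-empty cell there.
        keep = [j for j in range(width)
                if any(j < len(row) and not is_empty(row[j]) for row in rows)]
        rows = [[row[j] if j < len(row) else None for j in keep] for row in rows]
    return rows
```

Applied to [["1", "", "a"], ["", "", ""], ["2", None, "b"]], the sketch removes the second row and then the middle column, leaving [["1", "a"], ["2", "b"]].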
Add an attribute containing filename
If selected, an extra column containing the name of the source file is added to the dataset.
Use old computation data if source file is not available
If selected, data from the previous computation is used if the source file is not available.
Continue the execution if the file is missing
If selected, computation of the task continues even if the selected source files are not available.
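The precedence between the two fallback options is not spelled out above. The Python sketch below assumes the cached data is preferred when both are selected, and that continuing with a missing file yields an empty table. Both choices, and the load_with_fallback name, are assumptions for illustration.

```python
import os


def load_with_fallback(path, read, cache=None, use_cache=False, continue_if_missing=False):
    """Hypothetical sketch of the fallback options for a missing source file.

    read:  function that loads a file path into a list of rows
    cache: rows produced by the previous computation, if any
    """
    if os.path.exists(path):
        return read(path)
    if use_cache and cache is not None:
        return cache  # reuse data from the previous computation
    if continue_if_missing:
        return []     # keep the flow running with an empty table (assumption)
    raise FileNotFoundError(path)
```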
Turn off smart type recognition
If selected, the data types of attributes are not automatically recognized, and attributes keep the generic nominal type. This option is useful when manual identification is preferable, for example when there is a risk of a code being misinterpreted as a date.
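The kind of misreading this option guards against can be shown with a deliberately naive type guesser in Python. The month/year pattern %m/%y and the infer_type function are illustrative assumptions, not Rulex's recognition rules:

```python
from datetime import datetime

DATE_FORMAT = "%m/%y"  # assumed date pattern (month/two-digit year), for illustration only


def looks_like(value, kind):
    """Return True if `value` parses as the given kind ('integer' or 'date')."""
    try:
        if kind == "integer":
            int(value)
        else:
            datetime.strptime(value, DATE_FORMAT)
        return True
    except ValueError:
        return False


def infer_type(values, smart=True):
    """Toy type guesser: returns 'integer', 'date' or 'nominal'."""
    if not smart:
        return "nominal"  # smart recognition off: keep the generic nominal type
    for kind in ("integer", "date"):
        if values and all(looks_like(v, kind) for v in values):
            return kind
    return "nominal"
```

Product codes such as "03/12", "11/07" and "12/30" all happen to fit the month/year pattern, so infer_type classifies them as dates; with smart=False they stay nominal.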
Wait until the target file is present
If selected, Rulex repeatedly checks for the target file, with the frequency specified (sleeptime), until it is available.
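A polling loop of this kind might look like the Python sketch below. It assumes sleeptime is expressed in seconds and adds an optional timeout, which is not among the options described above; wait_for_file is a hypothetical name.

```python
import os
import time


def wait_for_file(path, sleeptime=1.0, timeout=None):
    """Poll until `path` exists, sleeping `sleeptime` seconds between checks.

    Returns True once the file is found, or False if `timeout` seconds
    elapse first (the timeout is an addition for this sketch).
    """
    start = time.monotonic()
    while not os.path.exists(path):
        if timeout is not None and time.monotonic() - start >= timeout:
            return False
        time.sleep(sleeptime)
    return True
```

The monotonic clock is used so that changes to the system time cannot shorten or extend the wait.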
Number of records to preview
Specify how many records the table preview will display.
Multiple file options:
Drag and drop files to concatenate
Drag and drop the required files or folders from which you want to import data. All imported files must have the same structure.
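Concatenating files with the same structure amounts to checking that their headers agree and stacking their rows. The Python sketch below is a simplification that requires identical headers in the same order (the Match columns by option may offer other strategies). The concatenate function and the optional filename column, which mirrors Add an attribute containing filename, are hypothetical.

```python
import csv
import os


def concatenate(paths, sep=",", add_filename=False):
    """Stack delimited files into a single table (illustrative sketch).

    Every file must have the same header, otherwise a ValueError is raised.
    If add_filename is True, a 'filename' column records each row's source file.
    """
    header, rows = None, []
    for path in paths:
        with open(path, newline="", encoding="utf-8-sig") as handle:
            reader = csv.reader(handle, delimiter=sep)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip completely empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path}: columns {file_header} differ from {header}")
            for row in reader:
                if add_filename:
                    row = row + [os.path.basename(path)]
                rows.append(row)
    if add_filename and header is not None:
        header = header + ["filename"]
    return header, rows
```

Rejecting mismatched headers early reflects the requirement that all concatenated files share the same structure.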
If you are importing data from a remote connection, click this option to select the files and directories from which you want to import data.
Select the required concatenation type, which may be:
Match columns by
Select whether you want to match columns by: