Plotting Data in the Data Manager
The Data Manager lets you create many different kinds of plots to help you understand your data visually.
Plots are particularly useful for detecting abnormal data, called outliers.
You can drag and drop attributes onto the Plot Manager before selecting the type of plot you want to create.
In this case a default plot is created: a histogram if you drop the attribute onto the x column, or a curve if you drop it onto the y column.
Click the Plot Manager tab in the manager pane of the required Data Manager task.
Double-click the Plot cell and select the type of plot you want to create.
Drag and drop the attributes for the required plot from the attributes list onto the x, y, weight and/or target cells.
Note that cells that cannot be used with the selected plot type are grayed out.
Click the Plot cell and configure the parameters for your required plot:
Grouped Bar Plot / Histogram
The Grouped Bar Plot is made up of rectangular bars, positioned next to each other, whose length is proportional to the values they represent.
Stacked Bar Plot / Histogram
The Stacked Bar Plot is made up of rectangular bars, where each bar displays all the values of the corresponding attribute, stacked one above the other, within a single bin.
A Box Plot is a graphical representation based on the minimum, first quartile, median, third quartile and maximum value of a quantitative data set.
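The five values behind a box plot can be computed by hand to check what the plot shows. The following is a minimal Python sketch, not the Data Manager's own implementation: the function name five_number_summary and the sample data are hypothetical, and the "inclusive" quartile convention is an assumption, since different tools compute quartiles slightly differently.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # quantiles(n=4) returns the three cut points that split the data into quarters.
    # The "inclusive" method is an assumed convention, not necessarily the tool's.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Hypothetical sample data for illustration only.
summary = five_number_summary([2, 4, 4, 5, 7, 9, 10, 12, 15])
print(summary)
```

The box spans the first to the third quartile, the line inside it marks the median, and the whiskers extend toward the minimum and maximum.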
An Area Plot graphically displays quantitative data, emphasizing the area between the axis and the curve representing the data with colors and hatching.
A Curve displays the behavior of a quantitative variable as a function of another quantitative variable.
A Heat map is a graphical representation of data where the individual values contained in a matrix are represented as colors.
A Pie Chart is a circular chart divided into sectors, illustrating numerical proportion.
A Scatter Plot displays the values of two variables of a dataset using a collection of points in Cartesian coordinates.
A P-P Plot (Probability-Probability Plot) is a probability plot used to evaluate whether a data set follows a specified distribution, by plotting the two cumulative distribution functions against each other.
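As an illustration of how the points of a P-P plot can be obtained, the sketch below pairs the theoretical cumulative probability of each sorted value with its empirical cumulative probability. It is an assumption-laden example rather than the Data Manager's method: the function pp_points, the sample data, the choice of a standard normal reference distribution, and the (i - 0.5) / n plotting position are all hypothetical choices.

```python
from statistics import NormalDist


def pp_points(values, dist):
    """Return (theoretical CDF, empirical CDF) pairs for a P-P plot."""
    data = sorted(values)
    n = len(data)
    # Assumed plotting position: the i-th smallest value gets probability (i - 0.5) / n.
    return [(dist.cdf(x), (i - 0.5) / n) for i, x in enumerate(data, start=1)]


# Hypothetical sample compared against a standard normal distribution.
sample = [-1.2, -0.4, 0.0, 0.3, 1.1]
pp = pp_points(sample, NormalDist(mu=0.0, sigma=1.0))
for theoretical, empirical in pp:
    print(f"{theoretical:.3f}  {empirical:.3f}")
```

If the data follow the reference distribution, the points lie close to the diagonal from (0, 0) to (1, 1).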
A Q-Q Plot (Quantile-Quantile Plot) is a probability plot used to compare two probability distributions by plotting their quantiles against each other.
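The points of a Q-Q plot can be sketched in a similar way, here comparing sample quantiles with the quantiles of a reference distribution. This is a minimal illustration under stated assumptions: qq_points, the sample data, the standard normal reference, and the (i - 0.5) / n plotting position are hypothetical and do not describe the Data Manager's internal computation.

```python
from statistics import NormalDist


def qq_points(values, dist):
    """Return (theoretical quantile, sample quantile) pairs for a Q-Q plot."""
    data = sorted(values)
    n = len(data)
    # Assumed plotting position: the i-th smallest value is matched with the
    # reference quantile at probability (i - 0.5) / n.
    return [(dist.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(data, start=1)]


# Hypothetical sample compared against a standard normal distribution.
sample = [-1.2, -0.4, 0.0, 0.3, 1.1]
qq = qq_points(sample, NormalDist(mu=0.0, sigma=1.0))
for theoretical, observed in qq:
    print(f"{theoretical:+.3f}  {observed:+.3f}")
```

When the two distributions agree, the points fall approximately on a straight line.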
The Lorenz curve is a graphical representation of the distribution of income or wealth. The more the curve sags below the straight diagonal line, the higher the degree of inequality of distribution.
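To make the shape of the Lorenz curve concrete, the sketch below computes its points as the cumulative share of the population against the cumulative share of the total, after sorting values in ascending order. The function lorenz_points and both data sets are hypothetical and serve only as an illustration.

```python
def lorenz_points(values):
    """Return (population share, cumulative value share) pairs, starting at (0, 0)."""
    data = sorted(values)
    total = sum(data)
    n = len(data)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, value in enumerate(data, start=1):
        running += value
        points.append((i / n, running / total))
    return points


# Hypothetical data: a perfectly equal distribution and a very unequal one.
equal = lorenz_points([10, 10, 10, 10])
unequal = lorenz_points([1, 1, 1, 37])
print(equal)
print(unequal)
```

With equal values every point lies on the diagonal, while with unequal values the points fall below it, which is the sag described above.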
A Receiver Operating Characteristic (ROC) Curve is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied.
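The effect of varying the discrimination threshold can be sketched as follows: for each threshold, records with a score at or above it are classified as positive, and the resulting false positive rate and true positive rate give one point of the curve. This is an illustrative sketch only; roc_points, the labels, and the scores are hypothetical, and it assumes that both classes are present in the data.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs for each threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing classified positive
    # Use each distinct score as a threshold, from the strictest to the loosest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical true classes (1 = positive) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
roc = roc_points(labels, scores)
for fpr, tpr in roc:
    print(f"{fpr:.2f}  {tpr:.2f}")
```

A curve that rises quickly toward the top-left corner indicates a classifier that separates the two classes well, while a curve near the diagonal indicates performance close to random guessing.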