Plotting P-P Plots
A P-P Plot (Probability-Probability Plot) is a probability plot used to evaluate whether a data set follows a specified distribution. It plots two cumulative distribution functions against each other: the empirical one computed from the data and the theoretical one of the specified distribution.
If the specified distribution is the correct model, the points of the P-P plot should lie approximately along the line y = x.
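The comparison described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not how Rulex computes the plot: the function name pp_points, the plotting position (i - 0.5) / n and the sample values are all assumptions made for the example.

```python
from statistics import NormalDist


def pp_points(data, cdf):
    """Return (theoretical, empirical) cumulative probability pairs.

    `cdf` is the cumulative distribution function of the model
    the data is compared against.
    """
    ordered = sorted(data)
    n = len(ordered)
    points = []
    for i, x in enumerate(ordered, start=1):
        # Assumed plotting position; other conventions such as
        # i / (n + 1) are also common.
        empirical = (i - 0.5) / n
        theoretical = cdf(x)
        points.append((theoretical, empirical))
    return points


# Hypothetical sample values, used only to exercise the function.
sample = [23, 31, 35, 38, 40, 42, 45, 49, 55, 62]

# Fit a normal model using the sample mean and standard deviation.
model = NormalDist.from_samples(sample)
points = pp_points(sample, model.cdf)

# Largest vertical distance from the line y = x.
max_gap = max(abs(t - e) for t, e in points)
```

A small max_gap means the points stay close to the diagonal, which is the visual criterion the P-P plot relies on.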
It contains the following attribute:
Nominal attributes are not supported.
The distribution family against which the data is compared can be selected from the following values: normal, beta or exponential.
The corresponding options must then be specified below.
Normal distribution parameters
The normal distribution is a continuous probability distribution whose values are symmetrically distributed around the average value, defined here as the Mean µ. The Standard Deviation σ measures how widely the values spread around the mean.
Normal distribution is defined as follows:
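Assuming the standard parameterization, the probability density function with mean µ and standard deviation σ can be written as:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad -\infty < x < \infty
```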
Beta distribution parameters
The beta distribution is a continuous probability distribution on the interval [0, 1], often used to model values that are themselves probabilities. Its shape is controlled by the Alpha (α) and Beta (β) values, which define the distribution as follows:
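Assuming the standard two-parameter form, the probability density function for shape parameters α > 0 and β > 0 can be written as:

```latex
f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)},
\qquad 0 \le x \le 1,
\qquad B(\alpha,\beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}
```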
Exponential distribution parameters
The exponential distribution models the time elapsed between two consecutive events, where the Lambda (λ) value specified here is the average number of events per unit of time.
The exponential distribution is given by:
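Assuming the standard rate parameterization, the probability density function and the cumulative distribution function (the quantity a P-P plot compares against) for λ > 0 can be written as:

```latex
f(x) = \lambda e^{-\lambda x},
\qquad F(x) = 1 - e^{-\lambda x},
\qquad x \ge 0
```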
The following example is based on the Adult dataset.
Scenario data can be found in the Datasets folder in your Rulex installation.
Dragging and dropping the age attribute onto an x cell and selecting P-P Plot in the Plot cell will display the P-P plot of the age attribute compared with the Normal distribution (default comparison).