Plotting P-P Plots

A P-P plot (probability-probability plot) is a probability plot used to evaluate whether a data set follows a specified distribution. It plots the empirical cumulative distribution function of the data against the cumulative distribution function of the specified distribution.

If the specified distribution is the correct model, the points of the P-P plot should lie approximately along the line y = x.
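As a rough illustration of how the points of a P-P plot can be obtained, the sketch below pairs the theoretical cumulative probability of each sorted value with its empirical cumulative probability. It is not Rulex's implementation: the function names `normal_cdf` and `pp_points`, the sample values, and the (i - 0.5) / n plotting-position convention are all assumptions made for the example.

```python
import math

def normal_cdf(x, mean, std):
    """Theoretical CDF of the normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def pp_points(values, cdf):
    """Return (theoretical, empirical) cumulative probability pairs.

    The empirical probability of the i-th smallest value (1-based) is
    taken as (i - 0.5) / n, one common plotting-position convention.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [(cdf(v), (i - 0.5) / n) for i, v in enumerate(ordered, start=1)]

# Hypothetical sample, not taken from any Rulex dataset.
sample = [4.1, 5.3, 4.8, 6.0, 5.1, 4.5, 5.6, 4.9]
mean = sum(sample) / len(sample)
std = math.sqrt(sum((v - mean) ** 2 for v in sample) / (len(sample) - 1))

points = pp_points(sample, lambda v: normal_cdf(v, mean, std))

# Largest vertical distance of any point from the line y = x.
max_gap = max(abs(t - e) for t, e in points)
```

A small `max_gap` means the points stay close to the line y = x, which is what is expected when the compared distribution fits the data well.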

It contains the following attribute:

Attribute    Mandatory    Constraints

x            Yes          Nominal attributes not supported

Properties

Category

Properties

Description

General parameters

Compared distribution

The distribution family against which the data is compared. It can be selected from the following values: normal, beta or exponential.

The corresponding options must then be specified below.

Normal distribution parameters

Mean

The normal distribution is a continuous probability distribution in which values are symmetrically distributed around the average value, defined here as the Mean µ. The Standard Deviation σ measures how widely the values are spread around the mean.

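The probability density function of the normal distribution is defined as follows:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$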

where:

  • µ is the mean or expectation of the distribution, and

  • σ is the standard deviation.

Standard deviation
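A P-P plot needs the cumulative distribution function of the compared distribution rather than its density. The following is a minimal sketch of both functions for the normal distribution, using only the Python standard library; the function names are hypothetical and do not correspond to any Rulex API.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coefficient * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def normal_cdf(x, mu, sigma):
    """Cumulative probability P(X <= x), expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
```

Because the distribution is symmetric around µ, `normal_cdf(mu, mu, sigma)` evaluates to 0.5 for any positive `sigma`.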

Beta distribution parameters

Alpha

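The beta distribution is a continuous probability distribution over the interval [0, 1], which makes it suitable for modeling probabilities and proportions. Its shape is controlled by the Alpha (α) and Beta (β) values, both of which must be positive, and the distribution is defined as follows:

$$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$$

where

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$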

Beta
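To make the roles of α and β concrete, the sketch below evaluates the standard beta density and approximates its cumulative distribution function by midpoint-rule integration. This is an illustrative assumption, not how Rulex computes it: the names `beta_pdf` and `beta_cdf` and the `steps` parameter are hypothetical, and a production implementation would normally use a regularized incomplete beta function instead of numerical integration.

```python
import math

def beta_pdf(x, alpha, beta):
    """Beta density on (0, 1); the normalizing constant B(alpha, beta)
    is built from gamma functions."""
    norm = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    return x ** (alpha - 1.0) * (1.0 - x) ** (beta - 1.0) / norm

def beta_cdf(x, alpha, beta, steps=10_000):
    """Approximate P(X <= x) by midpoint-rule integration of the density."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    width = x / steps
    # Midpoints never touch 0 or 1, so the density is always finite here.
    return width * sum(beta_pdf((k + 0.5) * width, alpha, beta) for k in range(steps))
```

With α = β = 1 the density is constant and the distribution reduces to the uniform distribution on [0, 1]. The midpoint approximation loses accuracy when α or β is below 1, because the density is then unbounded near an endpoint.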

Exponential distribution parameters

Lambda

The exponential distribution models the time elapsed between two consecutive events that occur independently at a constant average rate. The Lambda (λ) value specified here is the average number of events per unit of time.

The exponential distribution is given by:

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$
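For reference, the exponential density λe^(-λx) has the closed-form cumulative distribution function 1 - e^(-λx) for x ≥ 0, which is the quantity a P-P plot compares against. The following is a minimal sketch with hypothetical function names, assuming λ > 0:

```python
import math

def exponential_pdf(x, lam):
    """Density lam * exp(-lam * x) for x >= 0, and 0 for negative x."""
    return lam * math.exp(-lam * x) if x >= 0.0 else 0.0

def exponential_cdf(x, lam):
    """Cumulative probability P(X <= x) = 1 - exp(-lam * x) for x >= 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0.0 else 0.0
```

Since λ is the average number of events per unit of time, the average waiting time between events is 1/λ, and the median waiting time is ln(2)/λ.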

Examples

The following example is based on the Adult dataset.

Scenario data can be found in the Datasets folder in your Rulex installation.

Description

Result

Dragging and dropping the age attribute onto an x cell and selecting P-P Plot in the Plot cell will display the P-P plot of the age attribute compared with the Normal distribution (default comparison).