Computing Statistics - Single Statistics
Single statistics are statistical functions used to perform preliminary descriptive analyses.
They are univariate statistics.
Statistics on integer variables are continuous
If selected, statistics will be displayed as continuous values (e.g. 12,34 instead of a rounded off 12).
Number of total valid samples
Number of rows with valid values, not including missing values.
Gini concentration index
Computes the Gini concentration index, which is obtained from the following equation:
Dispersion and heterogeneity measures
Number of mode elements
Shows the number of cases where the value corresponds to the mode. In a statistical distribution the mode represents the most frequently observed value, and the number of mode elements corresponds to the number of occurrences of tied values.
A heterogeneity measure, which is obtained, for a categorical variable, from the following equation:
Value obtained by dividing the entropy value by its maximum theoretical value (i.e. log(n)).
Coefficient commonly applied to qualitative ordinal variables to evaluate the dispersion of values across different categories. It is estimated by the following equation:
G takes the minimum value of zero when there is only one category (maximum concentration), while if there is only one value in each stratum (maximum dispersion) .
Normalized Gini coefficient
Coefficient with the same meaning as the non-normalized statistic, but it is forced to vary between 0 and 1, simply dividing the G value by its maximum theoretical value
Range of values
The difference between the highest and the lowest values observed for the attribute X.
A dispersion measure often applied to continuous variables when their distribution is non Gaussian. It represents the difference between the 75th and the 25th percentile, corresponding to the 75% and the 25% value of the cumulative distribution of X, respectively.
Standard error of mean
A dispersion measure usually used to evaluate the precision of a mean estimate. It is obtained by the ratio between the standard deviation and the square root of the number of subjects under study.
The square root of the variance (defined below).
It is often used instead of the Variance as it has the same unit of measure as X.
Standard error of standard deviation
Provides a measure of the uncertainty of the Standard deviation estimate.
It is seldom used.
It is among the most commonly used dispersion measures. It is estimated by the following equation (sample variance):
Standard error of variance
A measure of the precision of the variance estimate.
It is seldom used.
Coefficient of variation
The ratio between the standard error and the mean of an observed distribution. It represents a dimensionless measure of dispersion and it should be calculated only for continuous attributes, which take positive values.
Mean absolute deviation
The mean of the absolute values of the difference between each value and the average of the attribute X.
It is seldom used in favor of the Variance measure as it has better statistical properties.
Median absolute deviation
The median value of the absolute values of the difference between each value and the average of the attribute X.
It is a very rarely used as a measure of dispersion.
Descriptive, location and central tendency measures
Number of distinct values
The number of non coincident (tied) values.
Number of missing values
The number of missing values.
The simple arithmetic (unweighted) mean of the observed X values:
Absolute mean value
The mean value obtained from the absolute value of each observation.
Geometric mean value
A central tendency measure that should be calculated for positive values only:
It is often applied to evaluate the average mean time needed to perform a specific task or as a measure of average speed.
Geometric absolute mean value
The geometric mean value obtained from the absolute value of each observation.
It is seldom used.
Harmonic mean value
The harmonic mean value obtained from the value of each observation.
Harmonic absolute mean value
The harmonic mean value obtained from the absolute value of each observation.
It is seldom used.
The value that makes it possible to split the X distribution into two equally sized samples (or almost equally sized, in the presence of an odd number of distinct values), corresponding respectively to the values ≤ median and > median.
If there is an odd number of distinct X values, the median belongs to the X observed distribution. For example, if there are 5 valid items of data, the third ordered value corresponds to the median. Otherwise the median value is estimated by a simple interpolation, for example if there are 6 distinct values, the average between the 3rd and the 4th ordered value is used.
The most frequent observation.
Index of the mode element
The position of the most frequently observed value in the original (unsorted) X attribute.
The valid lowest value observed.
Index of the minimum element
The position of the lowest observed value in the original (unsorted) X attribute.
The valid highest value observed.
Index of the maximum element
The position of the highest observed value in the original (unsorted) X attribute.
The sum of all valid observations.
Absolute sum value
The sum of the absolute value of each observation.
The product of all valid observations.
Absolute product value
The absolute value of the product between each observed value.
The value that separates the quarter of ordered data with the lowest values from the 75% of the remaining observations. It is also called “the 25th percentile” and is used together with the upper quartile and the median value to obtain a box plot.
For further information on boxplots see Plotting Data - Box Plots.
The value that separates the quarter of ordered data with the highest values from the 75% of the remaining observations. It is also called “the 75th percentile” and is used together with the lower quartile and the median value to obtain a box plot.
Lower whisker for boxplot
The value corresponding to three times the standard deviation below the mean. It is used as a threshold to detect outliers detection and is part of the box plot.
Upper whisker for boxplot
The value corresponding to three times the standard deviation above the mean. It is used as a threshold to detect outliers and is part of the box plot.
Symmetry and shape measures
A measure of asymmetry.
Symmetric distributions have zero skewness, whereas positive values are associated with right asymmetry.
It is obtained by the following equation:
Standard error of skewness
An estimate of the precision of the skewness value estimate.
A measure of how data are peaked. For example, kurtosis of a standard Gaussian distribution is 3.0. Distribution with a higher kurtosis is said to be leptokurtic while distributions with a lower value are called platykurtic.
The Kurtosis value is obtained by the following equation:
Standard error of kurtosis
A measure of the precision of the Kurtosis estimate.