Descriptive statistics is a set of methods for organizing, summarizing, and presenting data in a convenient and informative way, using both graphical and numerical techniques. Graphical techniques are not covered in this session; they will be explained in another writing session. Descriptive statistics describes the data being analyzed, but it cannot be used to make decisions or inferences about the population from which the data come. For decision making we need another branch of statistics, namely inferential statistics.
Descriptive statistics measures
Measures of descriptive statistics can be classified into two groups: measures of central location and measures of dispersion. The measures of central location are the mean, median, and mode, while the measures of dispersion are the variance, standard deviation, coefficient of variation, and range. These measures are explained below for both ungrouped and grouped data.
Central location measures
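The mean is the sum of all observations divided by the number of observations:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
for the population mean, where N is the number of observations in the population, and
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
for the sample mean, where n is the number of observations in the sample.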
Eleven pear trees produced the following weights of fruit (in kg):
330 284 306 268 326 346 236 422 374 292 380
Calculate the mean production of the 11 pear trees.
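As a worked illustration of the sample mean, the short Python sketch below adds up the eleven weights listed above and divides by their count. The variable names are illustrative only.

```python
# Weights of fruit (kg) from the 11 pear trees listed above.
weights = [330, 284, 306, 268, 326, 346, 236, 422, 374, 292, 380]

# Sample mean: sum of the observations divided by the number of observations.
n = len(weights)
total = sum(weights)
mean = total / n

print(f"n = {n}, total = {total} kg, mean = {mean:.1f} kg")
# prints: n = 11, total = 3564 kg, mean = 324.0 kg
```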
The mean for grouped data
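When data are presented in groups, as in a frequency table where the number of observations falling into each class is called the class frequency, the formula for the mean is
$$\bar{x} = \frac{\sum_{i} f_i m_i}{\sum_{i} f_i}$$
where f_i is the frequency of class i and m_i is the midpoint of class i.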
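The grouped mean can be sketched in a few lines of Python. The sketch assumes that each class is represented by its midpoint, and it uses a small hypothetical frequency table (not Table 1) chosen only to keep the arithmetic easy to follow; the function name `grouped_mean` is likewise just an illustration.

```python
def grouped_mean(midpoints, frequencies):
    """Estimate the mean from class midpoints and class frequencies."""
    n = sum(frequencies)  # total number of observations
    weighted_sum = sum(f * m for f, m in zip(frequencies, midpoints))
    return weighted_sum / n

# Hypothetical table (not Table 1): classes 0-10, 10-20, 20-30, 30-40.
midpoints = [5, 15, 25, 35]
frequencies = [4, 8, 6, 2]

print(grouped_mean(midpoints, frequencies))  # prints 18.0
```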
Another measure of central location that may be chosen besides the mean is the median. If the pear production data are arranged in ascending order, the middle value is 326 kg: five pear trees produced less than this and five produced more. This middle value is called the median. The median can be read off directly when the number of observations is odd. When the number of observations is even there are two middle observations, and the median is the average of those two values. To find the median, first order the data from the smallest value to the largest, then choose the middle observation of this arrangement. In other words, the median is the value that falls in the middle when the observations are arranged in order of magnitude.
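The procedure just described (order the data, then take the middle observation or the average of the two middle ones) can be sketched as follows. The function name `median` is illustrative; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

# Weights of fruit (kg) from the 11 pear trees listed above.
weights = [330, 284, 306, 268, 326, 346, 236, 422, 374, 292, 380]

print(median(weights))       # odd count: prints 326
print(median(weights[:10]))  # even count (first ten trees only): prints 316.0
```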
The median for grouped data
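For data presented in a frequency table, the median is estimated with the following formula:
$$\text{Median} = L + \frac{\frac{n}{2} - F}{f} \times c$$
where L is the lower boundary of the median class, n is the total number of observations, F is the cumulative frequency of the classes before the median class, f is the frequency of the median class, and c is the class width.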
The median class is the class that contains the median. To determine it, the observations are divided into two halves, so that 50 percent of the observations have values smaller than the median and 50 percent have greater values. In Table 1 above, the median is the 25th observation (50 divided by 2). The sum of the frequencies of the first three classes, f1 + f2 + f3, is 16. Because the frequency of the fourth class is 14, the cumulative frequency reaches 30 in the fourth class, so the 25th observation lies there. The fourth class is therefore the median class.
Example: calculate the median of the grouped data in Table 1.
A question that may arise is whether the median would still be 66.33 if we had the raw data. The answer is maybe yes, maybe no, because this method is an interpolation, used when the raw data are not known and have already been grouped. Although interpolation may not give an exact result, it gives an estimate that is close to the true median.
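The interpolation idea can be sketched in Python. The sketch assumes the common formula median = L + ((n/2 - F) / f) × c, where L is the lower boundary of the median class, F the cumulative frequency before it, f its frequency, and c the class width. The frequency table is hypothetical (not Table 1), and the function name is only an illustration.

```python
def grouped_median(lower_boundaries, class_width, frequencies):
    """Estimate the median of grouped data by linear interpolation."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the table contains no observations")
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= half:
            # This is the median class: interpolate inside it.
            return lower + (half - cumulative) / f * class_width
        cumulative += f

# Hypothetical table (not Table 1): classes 0-10, 10-20, 20-30, 30-40.
lower_boundaries = [0, 10, 20, 30]
frequencies = [4, 8, 6, 2]

print(grouped_median(lower_boundaries, 10, frequencies))  # prints 17.5
```

Here n/2 = 10, the first class holds only 4 observations, and the second class brings the cumulative frequency to 12, so the second class is the median class.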
The mode of a set of observations is the value that occurs most frequently. The concept of a mode is relevant when observation values occur more than once. The mode is the least useful of the three measures of central location. However, the mode is useful when the data set is large and varied. In the data on the 11 pear trees listed above, every value occurs only once, so those data have no mode.
A distribution or a set of grouped data may have no mode or more than one mode. A distribution with one mode is called unimodal, one with two modes is called bimodal, and one with more than two modes is called multimodal.
Example: find the modal value of the data below
a). 2, 3, 5, 7, 8.
b). 2, 5, 7, 9, 9, 9, 10, 10, 11, 12.
c). 2, 3, 4, 4, 4, 5, 5, 7, 7, 7, 9.
Data a) have no mode because all values occur with the same frequency.
Data b) have one mode; the modal value is 9 because this value occurs most frequently.
Data c) have two modes, 4 and 7, because both occur most frequently and with the same frequency.
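The three cases can be checked with a short Python sketch. It follows the convention used in this example, namely that data in which every value occurs equally often have no mode; the function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return all modal values; an empty list means the data have no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all occur equally often, none stands out.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 3, 5, 7, 8]))                      # a) prints []
print(modes([2, 5, 7, 9, 9, 9, 10, 10, 11, 12]))   # b) prints [9]
print(modes([2, 3, 4, 4, 4, 5, 5, 7, 7, 7, 9]))    # c) prints [4, 7]
```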
The mode for grouped data
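For data presented in a frequency table, the mode is estimated with the following formula:
$$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times c$$
where L is the lower boundary of the modal class, d_1 is the difference between the frequency of the modal class and the frequency of the class before it, d_2 is the difference between the frequency of the modal class and the frequency of the class after it, and c is the class width.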
The modal class is the class with the highest frequency.
Example: find the modal value of the data on 50 students in Table 1 above.
Solution: the modal class is the fourth class, because this class has the highest frequency.
To summarize the characteristics of a data set, we need another kind of measure besides central location. Usually we are interested in the dispersion of the observations around their mean, that is, the differences between the data and their mean. The mean of the squared differences is a dispersion measure called the variance. The symbol for the population variance is σ² (read: sigma squared) and for the sample variance it is s².
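The square root of the variance is the standard deviation. The standard deviation is a dispersion measure that is often used in analysis. It describes how the values of a data set are spread around their mean, that is, the heterogeneity of the data set. The formula for the standard deviation is as follows:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
for a population of N observations with mean \mu, and
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
for a sample of n observations with mean \bar{x}.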
Computed from the eleven weights listed above, the sample variance of the pear tree productions is 2,959.2 kg², so the standard deviation is 54.40 kg.
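The calculation can be reproduced with the Python sketch below, which works directly from the eleven weights as listed earlier. It shows both divisors, N for a population and n - 1 for a sample, because the result depends on which one is used; the variable names are illustrative.

```python
import math

# Weights of fruit (kg) from the 11 pear trees listed above.
weights = [330, 284, 306, 268, 326, 346, 236, 422, 374, 292, 380]

n = len(weights)
mean = sum(weights) / n

# Sum of squared differences between each observation and the mean.
sum_squares = sum((x - mean) ** 2 for x in weights)

population_variance = sum_squares / n    # divisor N
sample_variance = sum_squares / (n - 1)  # divisor n - 1
sample_sd = math.sqrt(sample_variance)

print(f"population variance = {population_variance:.1f} kg^2")
print(f"sample variance     = {sample_variance:.1f} kg^2")
print(f"sample std. dev.    = {sample_sd:.2f} kg")
```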
The variance for grouped data
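For data presented in a frequency table, the variance is estimated with the following formula:
$$s^2 = \frac{\sum_{i} f_i (m_i - \bar{x})^2}{n - 1}$$
where f_i is the frequency of class i, m_i is the midpoint of class i, \bar{x} is the mean of the grouped data, and n is the total number of observations.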
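A minimal Python sketch of the grouped variance follows. It assumes the sample form with divisor n - 1 and treats every observation in a class as if it sat at the class midpoint. The table is hypothetical (not Table 1) and the function name is only an illustration.

```python
import math

def grouped_variance(midpoints, frequencies):
    """Estimate the sample variance from class midpoints and frequencies."""
    n = sum(frequencies)
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    sum_squares = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints))
    return sum_squares / (n - 1)

# Hypothetical table (not Table 1): classes 0-10, 10-20, 20-30, 30-40.
midpoints = [5, 15, 25, 35]
frequencies = [4, 8, 6, 2]

variance = grouped_variance(midpoints, frequencies)
print(round(variance, 2))             # prints 85.26
print(round(math.sqrt(variance), 2))  # prints 9.23
```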
Coefficient of variation
The standard deviation measures the heterogeneity, or variation, of a group of data. However, if we want to compare two groups of data that have different units of measurement, the standard deviation cannot be used, because a larger standard deviation does not always mean that a group of data is more heterogeneous. To compare two groups of data regardless of their units of measurement, we can use a measure of variation called the coefficient of variation (CV). The formula for the CV can be written as follows:
$$CV = \frac{s}{\bar{x}} \times 100\%$$
where s is the standard deviation and \bar{x} is the mean of the group.
If CV1 > CV2, the first group of data has more variation, that is, it is more heterogeneous, than the second group of data.
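The comparison can be illustrated with a short Python sketch. It assumes the usual definition CV = (standard deviation / mean) × 100% with the sample standard deviation, and the two data sets are hypothetical, invented only to show two groups measured in different units.

```python
import math

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance) / mean * 100

# Hypothetical groups measured in different units.
heights_cm = [150, 160, 170, 180]
weights_kg = [50, 60, 70, 80]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"CV of heights = {cv_heights:.2f}%")
print(f"CV of weights = {cv_weights:.2f}%")
```

Both hypothetical groups have the same standard deviation (about 12.91 in their own units), yet the weights have the larger CV because their mean is smaller, so relative to their mean they vary more.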
The simplest dispersion measure for numerical data is the difference between the maximum value and the minimum value. This measure is known as the range.
Range = maximum value – minimum value.
For the 11 pear trees listed above, the range of production is 186 kg (422 – 236).
The range for grouped data
For grouped data, the range is computed either as the difference between the midpoint of the last class and the midpoint of the first class, or as the difference between the upper boundary of the last class and the lower boundary of the first class.
range = the midpoint of the last class – the midpoint of the first class
range = the upper boundary of the last class – the lower boundary of the first class.
For the statistics scores of the 50 students in Table 1, the range is
range = 94.5 – 34.5 = 60 (this method may not include extreme values)
range = 99.5 – 29.5 = 70.
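Both versions of the grouped range can be reproduced with the sketch below. The class limits are an assumption: seven classes from 30–39 to 90–99, inferred from the midpoints and boundaries quoted above rather than read from Table 1 itself.

```python
# Assumed class limits, inferred from the quoted midpoints (34.5, 94.5)
# and boundaries (29.5, 99.5); Table 1 itself may differ.
class_limits = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]

def midpoint(limits):
    """Midpoint of a class given its (lower, upper) limits."""
    lower, upper = limits
    return (lower + upper) / 2

first, last = class_limits[0], class_limits[-1]

# Version 1: difference between the midpoints of the last and first classes.
range_by_midpoints = midpoint(last) - midpoint(first)

# Version 2: difference between the outer class boundaries, which for
# integer scores lie half a unit outside the class limits.
range_by_boundaries = (last[1] + 0.5) - (first[0] - 0.5)

print(range_by_midpoints)   # prints 60.0
print(range_by_boundaries)  # prints 70.0
```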
See you in other writing sessions, and enjoy statistics.
Please send any questions to the e-mail address: firstname.lastname@example.org
*) The writer is a lecturer at the Institute of Statistics, Jakarta, Indonesia.