DESCRIPTIVE STATISTICS

By M. A. Yulianto *)

Descriptive statistics is a set of methods for organizing, summarizing, and presenting data in a convenient and informative way, using both graphical and numerical techniques.  Graphical techniques are not covered in this session; they will be explained in another writing session.  Descriptive statistics can describe the data being analyzed, but it cannot by itself support decisions or inferences about the population from which the data come.  For decision making we need another branch of statistics, namely inferential statistics.

Descriptive statistics measures

Measures of descriptive statistics can be classified into two groups: measures of central location and measures of dispersion.  The measures of central location are the mean, the median, and the mode.  The measures of dispersion are the variance, the standard deviation, the coefficient of variation, and the range.  Each measure is explained below for both ungrouped and grouped data.

Central location measures

 

The mean

The population mean is written with the symbol μ (read: mu), and the sample mean with the symbol x̄ (read: x bar).  The formulas are as follows.

For the population mean:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

where N is the number of observations in the population.

For the sample mean:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where n is the number of observations in the sample.

Example:

Eleven pear trees produced the following weights of fruit (in kg):

330, 284, 306, 268, 326, 346, 236, 422, 374, 292, 380

Calculate the mean production of the 11 pear trees.

Solution:

$$\bar{x} = \frac{3564}{11} = 324 \text{ kg}$$
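The mean asked for in this example can be computed with a short Python sketch.  The function name `sample_mean` is an illustrative choice and does not come from the original text.

```python
# Pear production per tree in kg, copied from the example.
weights = [330, 284, 306, 268, 326, 346, 236, 422, 374, 292, 380]


def sample_mean(values):
    """Return the sum of the observations divided by their count."""
    return sum(values) / len(values)


print(sample_mean(weights))  # 324.0
```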

The mean for grouped data

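When data are presented in groups, for example in a frequency table where the number of observations falling into each class is called the class frequency, the formula for the mean is

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$

where x_i is the midpoint of class i, f_i is the frequency of class i, and k is the number of classes.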
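A minimal Python sketch of the mean for grouped data, assuming each class is represented by its midpoint and weighted by its frequency.  The frequency table below is hypothetical and made up for illustration (it is not Table 1), and the name `grouped_mean` is likewise an assumption.

```python
# Hypothetical frequency table (illustration only, not Table 1).
midpoints = [34.5, 44.5, 54.5, 64.5, 74.5]  # class midpoints
frequencies = [2, 5, 10, 6, 2]              # observations per class


def grouped_mean(midpoints, frequencies):
    """Weight each class midpoint by its frequency, then divide by n."""
    n = sum(frequencies)
    weighted_sum = sum(f * x for x, f in zip(midpoints, frequencies))
    return weighted_sum / n


print(round(grouped_mean(midpoints, frequencies), 2))  # 54.9
```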

Solution:

The median

Another measure of central location that may be chosen besides the mean is the median.  If the pear production data are arranged in ascending order, the middle value is 326 kg: five pear trees produced less than this and five produced more.  This middle value is called the median.  The median can be read off directly when the number of observations is odd.  When the number of observations is even there are two middle observations, and the median is the average of those two values.  The procedure is to order the data from the smallest value to the largest and then take the middle observation.  In other words, the median is the value that falls in the middle when the observations are arranged in order of magnitude.
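The procedure described here (sort, then take the middle observation or the average of the two middle ones) can be sketched in Python.  The function name `median` is illustrative; the standard library's `statistics.median` is used only as a cross-check.

```python
import statistics

# Pear production per tree in kg, from the earlier example.
weights = [330, 284, 306, 268, 326, 346, 236, 422, 374, 292, 380]


def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median(weights))             # 326 (odd number of observations)
print(median(weights[:10]))        # 316.0 (even: mean of 306 and 326)
print(statistics.median(weights))  # 326, same answer from the standard library
```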

The median for grouped data

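For data presented in a frequency table, the median is estimated by interpolation with the formula

$$\text{Md} = L + \frac{\frac{n}{2} - F}{f_m} \, c$$

where L is the lower boundary of the median class, n is the total number of observations, F is the cumulative frequency of the classes before the median class, f_m is the frequency of the median class, and c is the class width.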

The median class is the class that contains the median.  To find it, think of the observations as divided into two halves: 50 percent of the observations are smaller than the median and 50 percent are larger.  In Table 1 above, the median is the 25th observation (50 divided by 2).  The sum of the frequencies of the first three classes, f1 + f2 + f3, is 16.  Because the frequency of the fourth class is 14, the cumulative frequency reaches 30 there, so the 25th observation falls in the fourth class.  The fourth class is therefore the median class.

Example:  calculate the median of the grouped data in Table 1.

Solution:

A question that may arise is whether the median would still be 66.33 if we had the raw data.  The answer is maybe, maybe not, because the formula interpolates within the median class: once the data have been grouped, the raw values are no longer known.  Although interpolation may not give the exact result, it gives an estimate that is close to the true median.
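The interpolation idea can be sketched in Python as follows.  This is a minimal sketch that assumes equal class widths and the usual interpolation rule (lower boundary of the median class plus a share of the class width); the frequency table is hypothetical, not Table 1, and the name `grouped_median` and its parameters are illustrative.

```python
# Hypothetical frequency table (illustration only, not Table 1).
lower_boundaries = [29.5, 39.5, 49.5, 59.5, 69.5]
frequencies = [2, 5, 10, 6, 2]
class_width = 10


def grouped_median(lower_boundaries, frequencies, class_width):
    """Estimate the median by linear interpolation inside the median class."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the frequency table is empty")
    half = n / 2
    cumulative = 0  # observations in the classes before the current one
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= half:
            # Median class found: move into it in proportion to how many
            # of its observations are needed to reach n / 2.
            return lower + (half - cumulative) * class_width / f
        cumulative += f
    raise ValueError("could not locate the median class")


print(grouped_median(lower_boundaries, frequencies, class_width))  # 55.0
```

In this table the cumulative frequencies are 2, 7, and 17, so the third class is the median class and the estimate is 49.5 + (12.5 - 7) × 10 / 10 = 55.0.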

The mode

The mode of a set of observations is the value that occurs most frequently.  The concept of a mode is relevant when observation values occur more than once.  The mode is the least useful of the three measures of central location.  However, it can be useful when the data set is large and varied.  In the data for the 11 pear trees every value occurs exactly once, so that data set has no mode.

A distribution or a set of grouped data may have no mode or more than one mode.  A distribution with one mode is called unimodal, a distribution with two modes is called bimodal, and a distribution with more than two modes is called multimodal.

Example:  find the mode of each data set below.

a)  2, 3, 5, 7, 8.

b)  2, 5, 7, 9, 9, 9, 10, 10, 11, 12.

c)  2, 3, 4, 4, 4, 5, 5, 7, 7, 7, 9.

Solution:

Data a) have no mode because every value occurs with the same frequency.

Data b) have one mode, 9, because this value occurs most often.

Data c) have two modes, 4 and 7, because both occur three times, which is more often than any other value.
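The three cases can be reproduced with a short Python sketch.  The function name `modes` is illustrative, and the rule that a data set has no mode when every value occurs equally often follows the solution for data a).

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Rule from the solution for data a): when every distinct value
    # occurs equally often, the data set has no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, 3, 5, 7, 8]))                     # []
print(modes([2, 5, 7, 9, 9, 9, 10, 10, 11, 12]))  # [9]
print(modes([2, 3, 4, 4, 4, 5, 5, 7, 7, 7, 9]))   # [4, 7]
```

Note that `statistics.multimode` in the Python standard library treats data a) differently: when all values occur equally often it returns every value rather than an empty list.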

The mode for grouped data

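For data presented in a frequency table, the mode is usually estimated with the interpolation formula

$$\text{Mo} = L + \frac{d_1}{d_1 + d_2} \, c$$

where L is the lower boundary of the modal class, d_1 is the frequency of the modal class minus the frequency of the class before it, d_2 is the frequency of the modal class minus the frequency of the class after it, and c is the class width.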

The modal class is the class with the highest frequency.

Example:  find the mode of the data for the 50 students in Table 1 above.

Solution:  the modal class is the fourth class, because this class has the highest frequency.
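A minimal Python sketch of a mode estimate for grouped data, assuming the common interpolation rule Mo = L + d1 / (d1 + d2) × c, where L is the lower boundary of the modal class, d1 and d2 are the differences between the modal frequency and the frequencies of the neighbouring classes, and c is the class width.  The frequency table is hypothetical, not Table 1, and the name `grouped_mode` is illustrative.

```python
# Hypothetical frequency table (illustration only, not Table 1).
lower_boundaries = [29.5, 39.5, 49.5, 59.5, 69.5]
frequencies = [2, 5, 10, 6, 2]
class_width = 10


def grouped_mode(lower_boundaries, frequencies, class_width):
    """Estimate the mode by interpolating inside the modal class."""
    m = frequencies.index(max(frequencies))  # first class with the highest frequency
    before = frequencies[m - 1] if m > 0 else 0
    after = frequencies[m + 1] if m + 1 < len(frequencies) else 0
    d1 = frequencies[m] - before
    d2 = frequencies[m] - after
    if d1 + d2 == 0:
        # Both neighbours are as frequent as the modal class:
        # fall back to the midpoint of the modal class.
        return lower_boundaries[m] + class_width / 2
    return lower_boundaries[m] + d1 * class_width / (d1 + d2)


print(round(grouped_mode(lower_boundaries, frequencies, class_width), 2))  # 55.06
```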

Dispersion measures

 

Variance

To summarize the character of a data set we need another kind of measure besides central location.  Usually we are interested in how the observations are dispersed around their mean, that is, in the differences between the data values and the mean.  The mean of the squared differences is a measure of dispersion called the variance.  The symbol for the population variance is σ² (read: sigma squared) and the symbol for the sample variance is s².

Standard deviation

The square root of the variance is the standard deviation.  The standard deviation is the measure of dispersion most often used in analysis.  It describes how the values of a data set are spread around their mean, that is, the heterogeneity of the data set.  The formulas for the standard deviation are as follows:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

The sample variance of the 11 pear tree productions is 2,959.2 kg², so the standard deviation is 54.40 kg.
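The variance and standard deviation of the pear data can be recomputed with the Python sketch below.  It assumes the sample form, with n − 1 in the denominator; the function names are illustrative, and the standard library's `statistics.variance` and `statistics.stdev` serve only as a cross-check.

```python
import math
import statistics

# Pear production per tree in kg, from the earlier example.
weights = [330, 284, 306, 268, 326, 346, 236, 422, 374, 292, 380]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


variance = sample_variance(weights)
print(variance, sample_std(weights))

# Cross-check against the standard library, which also divides by n - 1.
assert math.isclose(variance, statistics.variance(weights))
assert math.isclose(sample_std(weights), statistics.stdev(weights))
```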

The variance for grouped data

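For data presented in a frequency table, the variance is estimated with the formula

$$s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}$$

where x_i is the midpoint of class i, f_i is the frequency of class i, x̄ is the mean of the grouped data, k is the number of classes, and n is the sum of the frequencies.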
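A minimal Python sketch of the variance for grouped data, assuming class midpoints stand in for the observations and the sample form with n − 1 in the denominator.  The frequency table is hypothetical, not Table 1, and the name `grouped_variance` is illustrative.

```python
# Hypothetical frequency table (illustration only, not Table 1).
midpoints = [34.5, 44.5, 54.5, 64.5, 74.5]  # class midpoints
frequencies = [2, 5, 10, 6, 2]              # observations per class


def grouped_variance(midpoints, frequencies):
    """Frequency-weighted squared deviations from the grouped mean, over n - 1."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    squared = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies))
    return squared / (n - 1)


variance = grouped_variance(midpoints, frequencies)
print(round(variance, 2))         # 112.33
print(round(variance ** 0.5, 2))  # 10.6 (the grouped standard deviation)
```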

Coefficient of variation

The standard deviation measures the heterogeneity, or variation, of a group of data.  However, when we want to compare two groups of data measured in different units, the standard deviation cannot be used directly: a larger standard deviation does not necessarily mean that a group is more heterogeneous.  To compare two groups regardless of their units of measurement we can use a measure of variation called the coefficient of variation (CV).  The formula for the CV can be written as

$$CV = \frac{s}{\bar{x}} \times 100\%$$

If CV1 > CV2, the first group of data has more variation, that is, it is more heterogeneous than the second group.
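The comparison can be illustrated with a short Python sketch, assuming the CV is the sample standard deviation expressed as a percentage of the mean.  Both data sets are hypothetical and the function name `coefficient_of_variation` is illustrative.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical data in different units (illustration only).
group_1 = [10, 12, 14, 16, 18]            # e.g. plant heights in cm
group_2 = [1000, 1010, 1020, 1030, 1040]  # e.g. fruit weights in g

# Group 2 has the larger standard deviation ...
print(statistics.stdev(group_1) < statistics.stdev(group_2))                  # True
# ... but group 1 varies more relative to its mean.
print(coefficient_of_variation(group_1) > coefficient_of_variation(group_2))  # True
```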

The range

The simplest measure of dispersion for numerical data is the difference between the maximum value and the minimum value.  This measure is known as the range.

Range = maximum value – minimum value.

The range of the 11 pear tree productions is 186 kg (422 – 236).
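The range of the pear data can be recomputed directly from the listed values with a short Python sketch.  The function name `value_range` is illustrative.

```python
# Pear production per tree in kg, from the earlier example.
weights = [330, 284, 306, 268, 326, 346, 236, 422, 374, 292, 380]


def value_range(values):
    """Difference between the largest and the smallest observation."""
    return max(values) - min(values)


# Prints the maximum, the minimum, and their difference.
print(max(weights), min(weights), value_range(weights))
```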

The range for grouped data

For grouped data, the range is calculated either as the difference between the midpoint of the last class and the midpoint of the first class, or as the difference between the upper boundary of the last class and the lower boundary of the first class.

range = the midpoint of the last class – the midpoint of the first class

or

range = the upper boundary of the last class – the lower boundary of the first class.

For the statistics scores of the 50 students in Table 1, the range is

range = 94.5 – 34.5 = 60  (this way may not include extreme values)

or

range = 99.5 – 29.5 = 70.
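Both versions of the grouped range can be sketched in Python.  The boundaries 29.5 and 99.5 and the midpoints 34.5 and 94.5 are quoted in the text; the other two boundaries (39.5 and 89.5) are inferred from them and should be read as an assumption, as should the helper name `midpoint`.

```python
# (lower boundary, upper boundary) of the first and last classes.
# 39.5 and 89.5 are inferred from the quoted midpoints, not stated in the text.
first_class = (29.5, 39.5)
last_class = (89.5, 99.5)


def midpoint(boundaries):
    """Halfway point between the lower and upper class boundary."""
    lower, upper = boundaries
    return (lower + upper) / 2


range_by_midpoints = midpoint(last_class) - midpoint(first_class)
range_by_boundaries = last_class[1] - first_class[0]

print(range_by_midpoints)   # 60.0
print(range_by_boundaries)  # 70.0
```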


 

See you in other writing sessions, and enjoy statistics.

Send any questions to the e-mail address:  yuliantoyorki@yahoo.com

*)  The writer is a lecturer at the Institute of Statistics, Jakarta, Indonesia.

 
