Sample Formulas vs Population Formulas
When we have the whole population, every data point is known, so we are 100% sure of the measures we calculate.
The Mean, Median and Mode
You may be wondering why there are separate sample and population formulas for the mean, median and mode. The sample mean is the average of the sample data points, while the population mean is the average of the population data points. As you can see in the picture below, there are two different formulas, but technically, they are computed in the same way.
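For reference, the two formulas for the mean can be written as follows. The notation is an assumption, since the text does not define symbols: N is the population size, n is the sample size, and x_i is the i-th observation.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

The computation is identical in both cases. Only the symbols and the set of points being averaged differ.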
Variance Formula: Sample Variance and Population Variance
Variance measures the dispersion of a set of data points around their mean value.
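In the usual textbook notation (an assumption here, as the symbols are not defined in the text), the population variance and the sample variance are:

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
```

Here N and n are the population and sample sizes, and \mu and \bar{x} are the population and sample means. The numerators have the same form, while the denominators differ: N for the population and n - 1 for the sample.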
A Closer Look at the Formula for Population Variance
When you are getting acquainted with statistics, it is hard to grasp everything right away. Therefore, let’s stop for a second to examine the formula for the population variance and try to clarify its meaning. The main part of the formula is its numerator, so that’s what we want to comprehend. It is the sum of the squared differences between the observations and the mean. This means that the closer a number is to the mean, the smaller its contribution to the result will be. And the further away from the mean it lies, the larger this difference.
Why do we Elevate to the Second Degree?
Squaring the differences has two main purposes.
- First, by squaring the differences, we always get non-negative values. Without going too deep into the mathematics of it, it is intuitive that dispersion cannot be negative. Dispersion is about distance and distance cannot be negative.
- Second, squaring amplifies the effect of large differences. For example, if the mean is 0 and you have an observation of 100, the squared difference is 10,000!
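Both purposes can be seen in a minimal Python sketch. The five observations (1 through 5) and all variable names are illustrative assumptions, not part of the formula itself.

```python
# Illustrative data (an assumption for this sketch): five observations
data = [1, 2, 3, 4, 5]
mean = sum(data) / len(data)

# Signed differences from the mean, and the same differences squared
deviations = [x - mean for x in data]
squared = [d ** 2 for d in deviations]

# Raw differences cancel each other out; squared differences cannot
sum_raw = sum(deviations)
sum_squared = sum(squared)

# One distant observation dominates once squared (mean 0, observation 100)
far_squared = (100 - 0) ** 2

print(sum_raw, sum_squared, far_squared)
```

Without squaring, the positive and negative differences offset each other, so their sum says nothing about how spread out the data is.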
Putting the Population Formula to Use
Alright, enough dry theory. It is time for a practical example. We have a population of five observations: 1, 2, 3, 4 and 5. Let’s find its variance. We start by calculating the mean: (1 + 2 + 3 + 4 + 5) / 5 = 3.
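The remaining steps can be sketched in Python as follows. This is a minimal illustration that applies the population formula directly (squared differences from the mean, divided by the number of observations); the variable names are assumptions.

```python
# The five observations, treated as the whole population
population = [1, 2, 3, 4, 5]
N = len(population)

# Step 1: the population mean
mu = sum(population) / N

# Step 2: the numerator, i.e. the sum of squared differences from the mean
numerator = sum((x - mu) ** 2 for x in population)

# Step 3: divide by the population size N
population_variance = numerator / N

print(mu, numerator, population_variance)
```

The squared differences are 4, 1, 0, 1 and 4, which sum to 10, so the population variance is 10 / 5 = 2.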
Calculating the Sample Variance
But what about the sample variance? This would only be suitable if we were told that these five observations were a sample drawn from a population. So, let’s imagine that’s the case. The sample mean is once again 3. The numerator is the same, but the denominator is going to be 4, instead of 5.
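As a hedged sketch, the same calculation in Python looks like this. The manual version divides by n - 1, and `statistics.variance` from the standard library, which also uses n - 1, serves as a cross-check. The variable names are assumptions.

```python
import statistics

# The same five observations, now treated as a sample
sample = [1, 2, 3, 4, 5]
n = len(sample)

# Sample mean and the sum of squared differences (same numerator as before)
x_bar = sum(sample) / n
numerator = sum((x - x_bar) ** 2 for x in sample)

# Divide by n - 1 = 4 instead of n = 5
sample_variance = numerator / (n - 1)

# Cross-check: statistics.variance also divides by n - 1
library_variance = statistics.variance(sample)

print(sample_variance, library_variance)
```

The result is 10 / 4 = 2.5, which is larger than the 2 obtained when the same numbers are treated as a full population.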
Why the Results are not the Same
To conclude the variance topic, we should interpret the result. Why is the sample variance bigger than the population variance? In the first case, we knew the population. That is, we had all the data and we calculated the variance. In the second case, we were told that 1, 2, 3, 4 and 5 were a sample, drawn from a bigger population.
The Population of the Sample
Imagine that the population the sample was drawn from consisted of the following 9 numbers: 1, 1, 1, 2, 3, 4, 5, 5 and 5.
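For this imagined population, a short Python sketch can compare the true population variance with the two candidate estimates from the sample 1, 2, 3, 4 and 5. It is one illustrative case rather than a proof, and the variable names are assumptions.

```python
import statistics

# The imagined population of 9 numbers and the sample drawn from it
population = [1, 1, 1, 2, 3, 4, 5, 5, 5]
sample = [1, 2, 3, 4, 5]

# True variance of the population (divides by N = 9)
true_variance = statistics.pvariance(population)

# Two estimates computed from the sample alone
divide_by_n = statistics.pvariance(sample)          # denominator 5
divide_by_n_minus_1 = statistics.variance(sample)   # denominator 4

print(true_variance, divide_by_n, divide_by_n_minus_1)
```

Here the true variance is 26 / 9, roughly 2.89. The estimate with n - 1 in the denominator (2.5) lands closer to it than the estimate with n (2), which is the motivation for the slightly larger sample formula.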
Standard Deviation Formula: Sample Standard Deviation and Population Standard Deviation
While variance is a common measure of data dispersion, the figure you obtain is often large. Moreover, it is hard to compare with the data itself, because it is expressed in squared units of measurement. The easy fix is to calculate its square root and obtain a statistic known as standard deviation. In most analyses, standard deviation is much more meaningful than variance.
The Formulas
As with the variance, there is a population standard deviation and a sample standard deviation. The formulas are the square root of the population variance and the square root of the sample variance, respectively. I believe there is no need for an example of the calculation. Anyone with a calculator in their hands will be able to do the job.
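Written out in the usual notation (assumed here, with N and n as the population and sample sizes and \mu and \bar{x} as the respective means), the two formulas are:

```latex
\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}
\qquad
s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

Taking the square root returns the measure to the original unit of the data.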
The Coefficient of Variation (CV)
The last measure we will introduce is the coefficient of variation. It is equal to the standard deviation divided by the mean.
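In symbols, with the notation assumed to be \sigma and \mu for the population standard deviation and mean, and s and \bar{x} for their sample counterparts:

```latex
c_v = \frac{\sigma}{\mu}
\qquad
\hat{c}_v = \frac{s}{\bar{x}}
```

Because the numerator and the denominator share the same unit, the ratio has no unit of measurement.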
Why We Need the Coefficient of Variation
So, standard deviation is the most common measure of variability for a single data set. But why do we need yet another measure such as the coefficient of variation? Well, comparing the standard deviations of two data sets measured in different units or on different scales is meaningless, but comparing their coefficients of variation is not.
Examples of Comparing Standard Deviations
To make sure you remember, here’s an example of a comparison between standard deviations. Let’s take the prices of pizza at 10 different places in New York. As you can see in the picture below, they range from 1 to 11 dollars.
Sample or Population Data
- First, we have to see if this is a sample or a population. Are there only 10 restaurants in New York? Of course not. This is obviously a sample drawn from all the restaurants in the city. So we have to use the formulas for sample measures of variability.
Finding the Mean
- Second, we have to find the mean. The mean price is 5.5 in dollars and 103.46 in pesos.
Calculating the Sample Variance and the Standard Deviation
- The third step of the process is finding the sample variance. Following the formula that we went over earlier, we obtain 10.72 dollars squared and 3793.69 pesos squared.
- The respective sample standard deviations are 3.27 dollars and 61.59 pesos, as shown in the picture below.
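The three steps above can be sketched in Python. Two things in this sketch are assumptions rather than facts from the text: the ten individual prices, which were chosen only because they reproduce the quoted mean and variance, and the exchange rate of 18.81 pesos per dollar, which is inferred from the ratio of the two quoted means.

```python
import statistics

# Hypothetical prices (assumption): ten values between 1 and 11 dollars that
# reproduce the quoted mean of 5.5 and sample variance of 10.72
prices_usd = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11]

# Assumed exchange rate, inferred from the quoted means (103.46 / 5.5)
PESOS_PER_DOLLAR = 18.81
prices_pesos = [p * PESOS_PER_DOLLAR for p in prices_usd]


def summarize(values):
    """Return sample mean, variance, standard deviation and CV."""
    mean = statistics.mean(values)
    variance = statistics.variance(values)  # divides by n - 1
    std_dev = statistics.stdev(values)      # square root of the sample variance
    return mean, variance, std_dev, std_dev / mean


mean_usd, var_usd, sd_usd, cv_usd = summarize(prices_usd)
mean_pesos, var_pesos, sd_pesos, cv_pesos = summarize(prices_pesos)

print(round(sd_usd, 2), round(sd_pesos, 2))
print(round(cv_usd, 2), round(cv_pesos, 2))
```

Under these assumptions, the two standard deviations differ by a factor equal to the exchange rate, while the coefficient of variation comes out the same in both currencies, which is why it is the more suitable measure for comparing the two data sets.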