Sample and Population Parameters

Statistics

7 Jan, 2015

A population of interest consists of every data point in a...well... "population". For example, say you want to figure out whether a certain product is popular (measured by a "rating" variable) among 20 to 30 year old males living in the United Kingdom. Your population in this case is all males aged 20 to 30 living in the United Kingdom. Assume (for simplicity) that everybody in the population has rated the product. The population mean is the mean of all of those ratings, and the population standard deviation is the standard deviation of all of those ratings. Every population-level figure (mean, standard deviation, variance, etc.) is computed over all ratings in the population; these figures are called population parameters. Population parameters are commonly written with Greek letters: μ (mu) for the mean, σ (sigma) for the standard deviation, and so on.

The most accurate way to get the population parameters is to do a census of the whole population. I'm sure you'll agree that's quite ridiculous: such a survey would be expensive, time consuming, and pretty much impossible. What we can do instead is get a representative sample of 20 to 30 year old males living in the UK, possibly 50, 100, maybe a few hundred, and collect their ratings. This smaller subset of the population is called a sample. The mean, standard deviation, variance, etc. of the ratings provided by the people in the sample are called sample statistics. Sample statistics are computed only on the sample; ratings provided by anyone outside the sample do not affect them. Sample statistics are usually written with English letters (xbar for the mean, s for the standard deviation, etc.).
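To make the distinction concrete, here is a rough Python sketch. The ratings are made up (random whole numbers from 1 to 5), and the population size of 100,000 and sample size of 200 are arbitrary choices of mine, not real survey data. It just shows which numbers are computed over everybody and which are computed over the sample only.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 made-up ratings from 1 to 5.
population = [rng.randint(1, 5) for _ in range(100_000)]

# Population parameters use every rating in the population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population formula, divides by N

# A simple random sample of 200 people, drawn without replacement.
sample = rng.sample(population, 200)

# Sample statistics use only the ratings in the sample.
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)  # sample formula, divides by n - 1

print(f"population: mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"sample:     xbar = {x_bar:.3f}, s = {s:.3f}")
```

Note that Python's statistics module keeps the two apart by name: pstdev is the population standard deviation, while stdev is the sample one.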

The act of inference is to estimate the population parameters from sample statistics. In our example above, we can infer mu from xbar. In other words, we use the sample statistics we know (because we have the sample's data) to estimate the population parameters we don't know. This is the goal of inference.
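Here is a small simulation of that idea, again as a sketch with made-up ratings rather than real data. The helper name average_miss, the sample sizes (25 and 400) and the number of repeats (200) are all my own illustrative choices. In real life mu is unknown; the simulation only knows it because we generated the whole population ourselves, which lets us check how far xbar tends to land from mu.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population of made-up ratings (1 to 5).
population = [rng.randint(1, 5) for _ in range(50_000)]
mu = statistics.fmean(population)  # normally unknown; known here by construction


def average_miss(sample_size, trials=200):
    """Average absolute gap between xbar and mu over repeated random samples."""
    misses = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)
        x_bar = statistics.fmean(sample)
        misses.append(abs(x_bar - mu))
    return statistics.fmean(misses)


miss_small = average_miss(25)
miss_large = average_miss(400)

print(f"average miss with n = 25:  {miss_small:.3f}")
print(f"average miss with n = 400: {miss_large:.3f}")
```

On a run like this, the larger samples should land noticeably closer to mu on average, which is why a bigger sample usually gives a better estimate, provided it is a random sample of the population.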

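Note that inference doesn't mean extrapolation. Extrapolation is drawing conclusions about cases outside what the sample covers, for example using our ratings from 20 to 30 year olds to say something about 50 year olds. It is often considered bad practice and can lead to some pretty poor results, whereas proper inference can be quite accurate given good samples and proper techniques.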

Ashic's Blog
