Statistics Activity

Product Weight - Normal Population


In this lesson, you should...

  • Develop a conceptual understanding of the distribution of a sample statistic
  • Learn R code to help you simulate and explore these ideas

Distribution of Product Weight

A company markets a product that should weigh exactly 65g. Weight varies slightly from product to product: the company reports that weights are normally distributed with a standard deviation of 1g.

An inspector visits a distribution warehouse. He takes a random sample of 40 products and checks their weights. If the specifications provided are correct, let's consider what this sample of 40 could look like.

Run the code presented below by hitting the "run code" button on the right side of the code chunk!

sample = rnorm(n = 40, mean = 65, sd = 1)

hist(sample,
     main = "Sample of 40",
     col = "pink",
     breaks = 15)


What sample means could we see?

This is the result from one sample of 40, but what sample means could the inspector have gotten? Would the inspector ever see a mean as low as 64g or as high as 66g?

Let's consider what sample means we could get when taking samples of 40 under these conditions.

means = replicate(n = 100000, 
                  expr = mean(rnorm(n = 40, mean = 65, sd = 1)))

hist(means,
     main = "Distribution of Sample Means for n = 40",
     col = "red",
     breaks = 30)
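
To put a number on the question about 64g and 66g, one option is to count how often the simulated means land that far from 65g. The sketch below is only an illustration and not part of the original activity: it uses 10,000 replications so it runs quickly, and `set.seed(1)` is an arbitrary choice that makes the run reproducible.

```r
set.seed(1)  # arbitrary seed, used only for reproducibility

# Simulate 10,000 sample means for samples of 40 products
means = replicate(n = 10000,
                  expr = mean(rnorm(n = 40, mean = 65, sd = 1)))

# Smallest and largest sample means observed
range(means)

# Proportion of sample means at or beyond 64g or 66g
prop_extreme = mean(means <= 64 | means >= 66)
prop_extreme

# Spread of the sample means (much smaller than the 1g spread of single products)
sd(means)
```

Compare the spread of the sample means with the distance from 65g to 64g or 66g when deciding how plausible such a sample mean would be.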

Does the sample size matter here? Would a larger sample size increase the likelihood of more extreme sample means? Adjust the code below to explore the distribution of sample means for different sample sizes.

means = replicate(n = 100000, 
                  expr = mean(rnorm(n = 40, mean = 65, sd = 1)))

hist(means,
     main = "Distribution of Sample Means for n = ??",
     col = "red",
     breaks = 30)

What do you notice about the distribution of sample means when you change the sample size?
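
One way to check your observation is to compare the spread of the simulated sample means across several sample sizes. In the sketch below, the helper `simulate_means`, the sample sizes 10, 40, and 160, and the 10,000 replications are all illustrative choices rather than part of the activity. The comparison column uses the standard result that the standard deviation of a sample mean is the population SD divided by the square root of the sample size.

```r
set.seed(2)  # arbitrary seed, used only for reproducibility

# Hypothetical helper: simulate sample means for a given sample size
simulate_means = function(size, reps = 10000) {
  replicate(n = reps, expr = mean(rnorm(n = size, mean = 65, sd = 1)))
}

sizes = c(10, 40, 160)

# Spread of the simulated sample means at each sample size
sd_sim = sapply(sizes, function(size) sd(simulate_means(size)))

# Theoretical spread: population SD / sqrt(n)
sd_theory = 1 / sqrt(sizes)

data.frame(n = sizes, simulated_sd = sd_sim, theoretical_sd = sd_theory)
```

If the simulated and theoretical columns agree, larger samples make extreme sample means less likely, not more.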

The Distribution of the Sample Standard Deviation

The inspector may not just be interested in the mean weight--he might also want to know if product weights truly vary with a standard deviation of about 1.

Use the code chunk below and the code template from above to create a distribution of sample standard deviations.
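
If you get stuck, here is one possible way to adapt the sample-means template: swap `mean()` for `sd()` inside `replicate()`. The variable name `sds`, the seed, and the 10,000 replications are choices made for this sketch.

```r
set.seed(3)  # arbitrary seed, used only for reproducibility

# Simulate 10,000 sample standard deviations for samples of 40 products
sds = replicate(n = 10000,
                expr = sd(rnorm(n = 40, mean = 65, sd = 1)))

hist(sds,
     main = "Distribution of Sample SDs for n = 40",
     col = "red",
     breaks = 30)

mean(sds)  # center of the sample SDs, to compare with the reported 1g
sd(sds)    # how much the sample SD varies from sample to sample
```

Change `n = 40` inside `rnorm()` to try other sample sizes.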

What do you notice about the distribution of sample SDs when you change the sample size?

The Distribution of the Sample Minimum

Something else this inspector checks is that no product is abnormally low in weight. For this reason, he checks the lowest weight in the sample.

If taking a sample of 40, what is the distribution of possible sample minimums the inspector might see?

Do you think sample size affects this distribution? If so, first predict how, then simulate to check your answer!
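
After you have made your prediction, a simulation along these lines could be used to check it. The helper `simulate_mins`, the sample sizes, and the replication count are assumptions made for this sketch; the only real change from the sample-means template is using `min()` as the statistic.

```r
set.seed(4)  # arbitrary seed, used only for reproducibility

# Hypothetical helper: simulate sample minimums for a given sample size
simulate_mins = function(size, reps = 10000) {
  replicate(n = reps, expr = min(rnorm(n = size, mean = 65, sd = 1)))
}

# Distribution of the sample minimum for samples of 40
mins_40 = simulate_mins(40)

hist(mins_40,
     main = "Distribution of Sample Minimums for n = 40",
     col = "red",
     breaks = 30)

# Average sample minimum at several sample sizes
sizes = c(10, 40, 160)
mean_min = sapply(sizes, function(size) mean(simulate_mins(size)))

data.frame(n = sizes, mean_minimum = mean_min)
```

Unlike the sample mean, the sample minimum does not settle around one fixed value as the sample grows: a larger sample gives more chances to draw an unusually light product, so the typical minimum moves lower.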

Diamond Prices - Skewed Population

Diamond Prices

In the previous activity, the population was normally distributed, but what if we were working with a population that has a different shape?

Let's consider an R dataset from the ggplot2 package called diamonds. This dataset contains information on nearly 54,000 diamonds, and one of its variables is price.

Run the code below.

library(ggplot2)  # provides the diamonds dataset

hist(diamonds$price,
     col = "seagreen",
     main = "Distribution of Diamond Prices",
     breaks = 30,
     xlab = "Diamond Prices")

How would you describe the shape of the distribution for price?
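
A numerical summary can back up what you see in the histogram. The sketch below is one simple check, assuming the ggplot2 package is installed: it compares the mean and the median of price. The variable names `price_mean` and `price_median` are just illustrative.

```r
library(ggplot2)  # provides the diamonds dataset

# Five-number summary plus the mean
summary(diamonds$price)

# In a right-skewed distribution, the long right tail pulls the mean above the median
price_mean = mean(diamonds$price)
price_median = median(diamonds$price)

price_mean > price_median
```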

Taking one sample

Let's first take one sample from this distribution to get a sense of what might happen.

sample = sample(x = diamonds$price, size = 40, replace = FALSE)

hist(sample,
     col = "pink", 
     breaks = 15,
     main = "Sample of 40")

Simulating sample means from skewed population

If we take samples of 40 from this skewed population, what will this distribution of sample means look like?

means = replicate(n = 100000, 
                  expr = mean(sample(diamonds$price, 
                                     size = 40, 
                                     replace = TRUE)))

hist(means,
     main = "Distribution of Sample Means for n = 40",
     col = "red",
     breaks = 30)

What if we take smaller sample sizes? How does that affect the distribution of sample means?
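
To compare sample sizes directly, one approach is to simulate sample means for a small and a moderate sample size and look at both shape and spread. In this sketch the helpers `simulate_price_means` and `skewness`, the sizes 5 and 40, and the 10,000 replications are illustrative assumptions. The skewness measure used here is a simple moment-based one that is 0 for a perfectly symmetric distribution.

```r
library(ggplot2)  # provides the diamonds dataset
set.seed(5)       # arbitrary seed, used only for reproducibility

# Hypothetical helper: simulate sample means of diamond prices
simulate_price_means = function(size, reps = 10000) {
  replicate(n = reps,
            expr = mean(sample(diamonds$price, size = size, replace = TRUE)))
}

means_5 = simulate_price_means(5)
means_40 = simulate_price_means(40)

# Side-by-side histograms of the two simulated distributions
par(mfrow = c(1, 2))
hist(means_5, main = "Sample Means for n = 5", col = "red", breaks = 30)
hist(means_40, main = "Sample Means for n = 40", col = "red", breaks = 30)
par(mfrow = c(1, 1))

# Hypothetical helper: moment-based skewness (0 for a symmetric shape)
skewness = function(x) mean((x - mean(x))^3) / sd(x)^3

skewness(means_5)
skewness(means_40)
```

Look at whether the distribution for the smaller sample size keeps more of the population's right skew and more spread than the distribution for n = 40.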

Simulating sample standard deviations

Now simulate the distribution of the sample SD for this variable.
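
One possible starting point, mirroring the sample-means code for this dataset, is sketched below. The name `price_sds`, the seed, and the 10,000 replications are choices made for this example, and it assumes the ggplot2 package is installed.

```r
library(ggplot2)  # provides the diamonds dataset
set.seed(6)       # arbitrary seed, used only for reproducibility

# Simulate 10,000 sample SDs for samples of 40 diamond prices
price_sds = replicate(n = 10000,
                      expr = sd(sample(diamonds$price,
                                       size = 40,
                                       replace = TRUE)))

hist(price_sds,
     main = "Distribution of Sample SDs for n = 40",
     col = "red",
     breaks = 30)

sd(diamonds$price)  # SD of all prices in the dataset, for reference
mean(price_sds)     # center of the simulated sample SDs
```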

Is this distribution different from the one you saw when simulating from the normal distribution earlier? How so?