In this lesson, you should...

- Develop a conceptual understanding of the distribution of a sample statistic
- Learn R code to help you simulate and explore these ideas

A company markets a product that should weigh exactly 65g. There is slight variation in weight from product to product: the company reports that weights are normally distributed with a standard deviation of 1g.

An inspector comes to inspect a distribution warehouse. He takes a random sample of 40 products and checks the weights. If the specifications provided are correct, let's consider what this sample of 40 could look like.

*Run the code presented below by hitting the "run code" button on the right side of the code chunk!*

```
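# Draw one random sample of 40 product weights
# (named product_weights so it does not mask the base function sample())
product_weights = rnorm(n = 40, mean = 65, sd = 1)
hist(product_weights,
     main = "Sample of 40",
     col = "pink",
     breaks = 15)
# Mean weight of this one sample
mean(product_weights)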
```

This is the result from one sample of 40, but what other sample means could the inspector have gotten? Would the inspector ever see a mean as low as 64g or as high as 66g?

Let's consider what sample means we could get when taking samples of 40 under these conditions.

```
means = replicate(n = 100000,
                  expr = mean(rnorm(n = 40, mean = 65, sd = 1))
)
hist(means,
     main = "Distribution of Sample Means for n = 40",
     col = "red",
     breaks = 30)
```
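To tie the histogram back to the inspector's question, one option is to count how often a simulated mean lands at or beyond 64g or 66g. The sketch below is only an illustration: the names `sim_means` and `prop_extreme` are ours, `set.seed()` is added so the run repeats, and fewer replications are used to keep it quick.

```r
set.seed(1)  # assumption: fixed seed so the run is repeatable

# Simulate 10,000 sample means for samples of 40 products
sim_means = replicate(n = 10000,
                      expr = mean(rnorm(n = 40, mean = 65, sd = 1)))

# Proportion of simulated means at or beyond 64g or 66g
prop_extreme = mean(sim_means <= 64 | sim_means >= 66)
prop_extreme

# Smallest and largest simulated sample means
range(sim_means)
```

Comparing `range(sim_means)` with 64 and 66 shows how far the simulated means actually stray from 65g.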

Does the sample size matter here? Would a larger sample size increase the likelihood of more extreme sample means? Adjust the code below to explore the distribution of sample means for different sample sizes.

```
means = replicate(n = 100000,
                  expr = mean(rnorm(n = 40, mean = 65, sd = 1))
)
hist(means,
     main = "Distribution of Sample Means for n = ??",
     col = "red",
     breaks = 30)
```
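Rather than editing the sample size by hand each time, several sizes can be compared in one run. This is a rough sketch, not part of the original exercise: `simulate_means()` is a hypothetical helper, the sizes 10, 40, and 160 are arbitrary choices, and the comparison column uses the standard result that the standard deviation of a sample mean is the population SD divided by the square root of the sample size.

```r
set.seed(2)  # assumption: fixed seed so the run is repeatable

# Hypothetical helper: simulate `reps` sample means for one sample size
simulate_means = function(size, reps = 10000) {
  replicate(n = reps,
            expr = mean(rnorm(n = size, mean = 65, sd = 1)))
}

sizes = c(10, 40, 160)  # arbitrary sample sizes to compare

# Standard deviation of the simulated means for each sample size
sim_sd = sapply(sizes, function(size) sd(simulate_means(size)))

# Theoretical standard deviation of the sample mean: sd / sqrt(n)
theory_sd = 1 / sqrt(sizes)

data.frame(n = sizes, simulated = sim_sd, theoretical = theory_sd)
```

The `simulated` column gives the spread of the sample means at each size, which can be read alongside the histograms.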

**What do you notice about the distribution of sample means when you change the sample size?**

The inspector may not just be interested in the mean weight--he might also want to know if product weights truly vary with a standard deviation of about 1.

Use the code chunk below and the code template from above to create a distribution of sample standard deviations.
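One possible starting point follows the same template with `sd()` swapped in for `mean()`. It is a sketch under our own choices: the name `sds`, the seed, and the reduced number of replications are not from the original lesson.

```r
set.seed(3)  # assumption: fixed seed so the run is repeatable

# Same template as for the means, with sd() in place of mean()
sds = replicate(n = 10000,
                expr = sd(rnorm(n = 40, mean = 65, sd = 1)))

hist(sds,
     main = "Distribution of Sample SDs for n = 40",
     col = "red",
     breaks = 30)

# Center of the simulated sample SDs
mean(sds)
```

Changing `n = 40` inside `rnorm()` lets you repeat the experiment for other sample sizes.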

**What do you notice about the distribution of sample SDs when you change the sample size?**

Something else this inspector checks is that no product is abnormally low in weight. For this reason, he checks the lowest weight in the sample.

If taking a sample of 40, what is the distribution of possible sample minimums the inspector might see?
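The same simulation pattern works for the minimum by replacing the summary function with `min()`. The sketch below is illustrative only: the name `mins`, the seed, and the number of replications are our assumptions.

```r
set.seed(4)  # assumption: fixed seed so the run is repeatable

# Record the lowest weight in each of 10,000 simulated samples of 40
mins = replicate(n = 10000,
                 expr = min(rnorm(n = 40, mean = 65, sd = 1)))

hist(mins,
     main = "Distribution of Sample Minimums for n = 40",
     col = "red",
     breaks = 30)

# Typical lowest weight the inspector might see
mean(mins)
```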

**Do you think sample size affects this distribution? If so, first predict how, then simulate to check your answer!**

In the previous section, the population was normally distributed, but what if we were working with a population that has a different shape?

Let's consider the `diamonds` dataset from the `ggplot2` package. This dataset contains information on nearly 54,000 diamonds, and one of its variables is `price`.

Run the code below to plot the distribution of diamond prices and compute their mean and standard deviation.

```
library(ggplot2)
hist(diamonds$price,
     col = "seagreen",
     main = "Distribution of Diamond Prices",
     breaks = 30,
     xlab = "Diamond Prices")
```

```
mean(diamonds$price)
sd(diamonds$price)
```

**How would you describe the shape of the distribution for price?**

Let's first take one sample from this distribution to get a sense of what might happen.

```
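# Draw one random sample of 40 prices
# (named price_sample so it does not mask the base function sample())
price_sample = sample(x = diamonds$price, size = 40, replace = FALSE)
hist(price_sample,
     col = "pink",
     breaks = 15,
     main = "Sample of 40")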
```

If we take samples of 40 from this skewed population, what will the distribution of sample means look like?

```
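# Simulate 100,000 sample means, each from a sample of 40 prices
means = replicate(n = 100000,
                  expr = mean(sample(diamonds$price,
                                     size = 40,
                                     replace = TRUE)))
# Plot the simulated sample means
hist(means,
     main = "Distribution of Sample Means for n = 40",
     col = "red",
     breaks = 30)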
```

**What if we take smaller sample sizes? How does that affect the distribution of sample means?**
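To explore this, two sample sizes can be plotted on a shared axis. The sketch below is one way to set that up, not the lesson's own solution: `simulate_price_means()` is a hypothetical helper, the sizes 5 and 40 are arbitrary, and fewer replications are used so it runs quickly. It assumes the `ggplot2` package is installed, since that is where `diamonds` lives.

```r
library(ggplot2)  # provides the diamonds dataset
set.seed(5)       # assumption: fixed seed so the run is repeatable

# Hypothetical helper: simulate sample means of price for one sample size
simulate_price_means = function(size, reps = 5000) {
  replicate(n = reps,
            expr = mean(sample(diamonds$price, size = size, replace = TRUE)))
}

means_5 = simulate_price_means(5)
means_40 = simulate_price_means(40)

# Shared x-axis limits so the two histograms are directly comparable
x_limits = range(c(means_5, means_40))

# Stack the two histograms vertically
par(mfrow = c(2, 1))
hist(means_5,
     main = "Sample Means of Price for n = 5",
     col = "red", breaks = 30, xlim = x_limits)
hist(means_40,
     main = "Sample Means of Price for n = 40",
     col = "red", breaks = 30, xlim = x_limits)
par(mfrow = c(1, 1))  # restore the default single-plot layout
```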

Now simulate the distribution of the sample SD for this variable.
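A possible starting point reuses the earlier pattern, with `sd()` as the summary function and `diamonds$price` as the population. Treat it as a sketch: the name `price_sds`, the seed, and the number of replications are our assumptions, and it requires the `ggplot2` package for the `diamonds` data.

```r
library(ggplot2)  # provides the diamonds dataset
set.seed(6)       # assumption: fixed seed so the run is repeatable

# Sample SD of price for each of 5,000 samples of 40 diamonds
price_sds = replicate(n = 5000,
                      expr = sd(sample(diamonds$price,
                                       size = 40,
                                       replace = TRUE)))

hist(price_sds,
     main = "Distribution of Sample SDs of Price for n = 40",
     col = "red",
     breaks = 30)

# Population SD of price, for reference against the histogram
sd(diamonds$price)
```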

**Is this distribution different than it was when simulating from the normal distribution earlier? How so?**