For once I was actually happy with the progress I made this week. I finally managed to get through the Random Variable unit test (THANK GOD) after 4 previous failed attempts. I then got through about two-thirds of the following unit, Sampling Distributions, although I am still struggling to understand some of what I’ve learned so far. Nonetheless, I’m incredibly relieved to finally be through Random Variables, especially considering I usually only made one or two mistakes on each of my failed attempts at the unit test. SO FRUSTRATING! Getting through all three statistics courses by the end of October could be tough, but the good thing is that I’m 100% through the next stats course, High School Statistics, and 50% through the third course, AP College Statistics. I still think it’s possible for me to get through all three courses by the start of November, and I’m planning to push hard to make that happen.
Going through the Random Variable unit test this week was easily the most stressed I’ve been throughout my entire time working on KA. As I mentioned above, it took me 5 attempts to get through it with a perfect score, which is the most attempts I’ve made at any KA test. On the fifth attempt, there were a number of questions where I spent 5-10 minutes double-checking my work before submitting my answer. One good thing that came out of doing the test 5 times was that I now feel fairly confident when it comes to probability-weighted questions. I also got a lot of practice with standard deviation and probability-weighted standard deviation, which I’m happy about in a begrudging kind of way.
A question came up in the unit test that helped me to finally fully wrap my head around the 10% Rule. The question was:
- Q. Max chooses 7 cards out of a deck without replacement at random. He wants to get as many face cards as possible. Is this considered a binomial variable?
- This question satisfies 3 of the 4 requirements to be considered binomial:
- 1) Same probability – Each card has an equally likely chance of being selected (apart from the fact that they’re not being replaced which is addressed below).
- 2) Specific number – There are exactly 7 cards being selected.
- 3) ‘Success’ or ‘Failure’ – In this question, success is defined as selecting a face card and failure as selecting all other cards.
- The requirement that is not met for this variable to be considered binomial is whether each trial is independent:
- 4) Independent – If the number of cards chosen (i.e. the sample) was less than 10% of the population, we could treat the trials as independent.
- 7/52 = ~13.46% > 10%
- Therefore, since 7/52 > 10%, we infer that the sample is not made up of independent trials.
- At most, Max could select 5 of the 52 cards (i.e. ~9.6%) and still consider it a binomial variable. Selecting any more than 5 cards would make the sample greater than 10% of the population and break the 10% Rule.
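The 10% Rule check from the card question can be sketched in a few lines of Python. This is only an illustration of the arithmetic in my notes: the function names are made up for this example, and the check uses "strictly less than 10%" as worded above.

```python
def satisfies_ten_percent_rule(sample_size: int, population_size: int) -> bool:
    """True when the sample is strictly less than 10% of the population."""
    # Integer arithmetic (sample * 10 < population) avoids float rounding issues.
    return sample_size * 10 < population_size


def largest_allowed_sample(population_size: int) -> int:
    """Largest sample size that is still strictly below 10% of the population."""
    return (population_size - 1) // 10


DECK_SIZE = 52  # standard deck, as in the question

print(satisfies_ten_percent_rule(7, DECK_SIZE))  # False: 7/52 is ~13.46%
print(satisfies_ten_percent_rule(5, DECK_SIZE))  # True: 5/52 is ~9.6%
print(largest_allowed_sample(DECK_SIZE))         # 5
```

Passing the check only means the trials can be treated as approximately independent. Drawing without replacement never makes them exactly independent.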
I began the next unit, Sampling Distributions, on Thursday. As I said earlier, I managed to get about two-thirds of the way through it. I understand most of what I’ve learned so far but am still struggling with a handful of concepts. As far as I understand it, if you repeatedly take samples from a population to estimate a certain parameter, you end up creating a sampling distribution of the sample proportion where the domain is between 0 and 1. If the mean of the distribution sits at 0.6, for example, it means that on average 60% of each sample had the characteristic being tested for, which is also the proportion of the population that has it. Here’s a page from my notes that defines a few key points about sampling distributions:
- Key variables and their denotations:
- n = sample size
- Ex. If a coin was flipped 10 times and each individual flip was counted as either a success or failure, the sample size would be 10.
- Ex. If a coin was flipped in groups of 5, with 8 groups being tested/looked at, the sample size would be 5 and there would be 8 samples.
- P = the proportion of the population that has the characteristic being tested for. (P stands for proportion.)
- Ex. Approximately 45% of Canadians voted conservative. P = 0.45.
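- p̂ = “P-hat” = sample proportion
- This measures the proportion of a single sample that has the characteristic. It is an estimate of P, and the mean of the p̂ values across many samples is equal to P.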
- Normal vs Right Skewed vs Left Skewed
- I don’t fully understand this part yet but:
- 1) If (n * P) and (n * (1 – P)) are both ≥ 10, the distribution resembles a normal distribution.
- 2) If (n * P) < 10, the distribution is skewed to the right (i.e. tapers to the right).
- 3) If (n * (1 – P)) < 10, the distribution is skewed (i.e. tapers) to the left.
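Here is a rough Python sketch of the shape rules from my notes. The function name and the returned labels are invented for illustration. The case where both n * P and n * (1 – P) fall below 10 isn't covered in my notes, so the label for it is my own assumption.

```python
def sampling_distribution_shape(n: int, p: float) -> str:
    """Classify the shape of the sampling distribution of p-hat."""
    expected_successes = n * p
    expected_failures = n * (1 - p)

    if expected_successes >= 10 and expected_failures >= 10:
        return "approximately normal"
    if expected_successes < 10 and expected_failures < 10:
        # Assumption: not covered in the notes; the sample is simply too small.
        return "not approximately normal (small sample)"
    if expected_successes < 10:
        return "skewed right"
    return "skewed left"


print(sampling_distribution_shape(200, 0.87))  # 174 and 26 -> approximately normal
print(sampling_distribution_shape(50, 0.1))    # 5 and 45   -> skewed right
print(sampling_distribution_shape(50, 0.9))    # 45 and 5   -> skewed left
```

One caveat: because of floating-point rounding, values that land exactly on 10 (for example n = 100 with P = 0.9) can be classified on the wrong side of the cutoff, so treat borderline cases with care.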
An example of a common type of question I was given would be:
- Q. Approximately 87% of Americans >25 years old have a high school diploma. We take a sample of 200 people in this age category. What is the probability that <85% of people in this sample have a diploma?
- Step 1 – Is this a normal distribution?
- n * P = 200 * 0.87
- = 174
- n * (1 – P) = 200 * 0.13
- = 26
- Since both are greater than 10, we can assume the distribution resembles a normal distribution.
- Step 2 – Does this question satisfy the 10% Rule?
- Because this is a Bernoulli type of question, it’s necessary that the sample being tested is less than 10% of the population. 200 people is certainly less than 10% of the U.S. population so this requirement is met.
- Step 3 – Find the mean and standard deviation of the sampling distribution:
- μ p̂ = P = 0.87
- σ p̂ = √(P(1 – P)/n)
- = √(0.87 * 0.13/200)
- = ~0.024
- (There were videos on how to derive the formula for the standard deviation of a sample but I don’t fully understand it yet.)
- Step 4 – Using the mean and S.D., find the Z-Score of 85% to determine what P(p̂ < 0.85) equals:
- Z-Score of 0.85 = (0.85 – 0.87)/0.024
- = ~-0.83
- Area below a Z-Score of ~-0.83 = 0.20237
- P(p̂ < 0.85) = ~0.20237
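The four steps above can be reproduced with Python's standard library. This is a sketch under my own assumptions: the function name and the optional `sd_digits` rounding parameter are hypothetical, and `statistics.NormalDist` stands in for the Z-table or calculator used on KA.

```python
from math import sqrt
from statistics import NormalDist


def prob_p_hat_below(cutoff: float, p: float, n: int, sd_digits=None) -> float:
    """Normal approximation to P(p-hat < cutoff) for samples of size n."""
    mean_p_hat = p                    # Step 3: mean of the sampling distribution
    sd_p_hat = sqrt(p * (1 - p) / n)  # Step 3: standard deviation
    if sd_digits is not None:
        # Optionally mimic the hand calculation, which rounds the S.D. first.
        sd_p_hat = round(sd_p_hat, sd_digits)
    z_score = (cutoff - mean_p_hat) / sd_p_hat  # Step 4: Z-Score of the cutoff
    return NormalDist().cdf(z_score)            # area to the left of the Z-Score


hand_style = prob_p_hat_below(0.85, p=0.87, n=200, sd_digits=3)
unrounded = prob_p_hat_below(0.85, p=0.87, n=200)
print(f"S.D. rounded to 0.024: {hand_style:.4f}")
print(f"S.D. left unrounded:   {unrounded:.4f}")
```

Rounding the standard deviation to 0.024 gives a result close to the 0.20237 above. Leaving it unrounded (about 0.0238) gives a slightly lower probability of roughly 0.200, so the answer is sensitive to where the rounding happens.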
When being shown a video on how to derive the standard deviation formula for a sampling distribution, I learned (or perhaps relearned) a random BEDMAS rule. Sal was going through part of an equation that read [√(n * P * (1 – P))]/n and he wanted to get the n in the denominator inside the radical. To do so, he squared the denominator n to make the equation √(n * P * (1 – P)/n^2) (you’ll notice the square brackets are now gone, meaning the n is inside the radical). Long story short, the thing I learned/relearned was that you can put any positive number inside a radical as long as you raise it to the radical’s corresponding power.
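A quick numeric check of that radical step, using the n = 200 and P = 0.87 values from the diploma question purely as sample numbers (the variable names are my own):

```python
from math import isclose, sqrt

n, p = 200, 0.87  # sample values borrowed from the diploma example

n_outside_radical = sqrt(n * p * (1 - p)) / n    # [√(n * P * (1 – P))]/n
n_inside_radical = sqrt(n * p * (1 - p) / n**2)  # √(n * P * (1 – P)/n^2)
simplified = sqrt(p * (1 - p) / n)               # one n cancels: √(P(1 – P)/n)

# All three forms should agree up to floating-point rounding.
print(isclose(n_outside_radical, n_inside_radical))  # True
print(isclose(n_inside_radical, simplified))         # True
```

Once the n is inside the radical, one n cancels from the top and bottom, which leaves the √(P(1 – P)/n) formula for the standard deviation of p̂.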
I’ve mentioned this before, but it seems like on average each unit is taking longer as I get further into KA. This makes sense as the concepts start to become more complex and seemingly abstract. It’s frustrating and fairly disappointing that this seems to be the case but the rational side of me understands that it’s to be expected. Although it’s a bit of a bummer, I’m trying to make peace with the fact that it may take longer than I’d like to get through KA. It helps to remind myself how far I’ve come in the relatively short amount of time I’ve been doing this and that, given another year, I should understand most/all of calculus, which is an exciting thought.
This week I looked up what the AP Statistics exam and AP Calculus exam (both AB and BC) are and realized they’re exams given in high school which cover approximately 1.5 to 2 years of college level classes. It seems like it would be a good goal for me to take these exams as a cap to KA when I finally get through it all (which likely won’t happen until 2022, but that’s ok). It would be incredibly satisfying to do those exams and get a good score on each of them as proof of my accomplishment. It feels like that may not happen for a very long time but the key, I think, is taking one small, incremental step at a time. This week’s step will be to finish Sampling Distributions (320/700 M.P.) and hopefully start the following unit, Confidence Intervals (0/800 M.P.).