Week 62 – Nov. 2nd to Nov. 8th

For the most part, I’m fairly pleased with what I was able to accomplish this week. I finished the unit Analysis of Variance (ANOVA) on Tuesday, got through the Statistics and Probability Course Challenge on Friday, and finished the High School Statistics Course Challenge on Saturday. I didn’t get a perfect score on either Course Challenge (I scored 28/30 on both), but I was happy enough with my knowledge of the material to feel fine about moving on. I was also impressed with what I was able to remember, considering that some of the questions were about things I learned 3.5 months ago. I was almost certain that I’d have to do each challenge more than once, so it was a huge relief to feel OK moving on after a single attempt at each. Wheeeww!

The entire unit Analysis of Variance (ANOVA) consisted of just three videos that covered three concepts: the Total Sum of Squares, the Sum of Squares Within, and the Sum of Squares Between. Here are two pages from my notes that give a bit of detail and an example for each of the three concepts:

  • Total Sum of Squares (S.S.T.)
    • The numerator in the variance equation.
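      • S.S.T. = Σ (x_i – x̿)^2, where x_i is each individual data point and x̿ is the mean of all the data points (the Grand Mean described below)
    • In the example in the bottom left corner of the first photo, you see a 3×3 example of 9 data points split up into three groups (the groups are denoted with m, i.e. m = 3, and the data points with n, i.e. n = 9 or n = 3/group). I was introduced to a new term known as the Grand Mean, which is the mean of the sample (read: group) means. When every group is the same size, as in this example, that’s the same as the mean of all 9 data points: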
      • Grand Mean = x̅ with a second bar above it, a.k.a. “x-double bar”.
  • Sum of Squares Within (S.S.W.)
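    • The same idea as the S.S.T. but measured within each group: every data point is compared to its own group’s mean rather than the grand mean. I.e. in the example, Groups 1, 2, and 3 would each have their own sum of squares calculated using that group’s three data points, and the S.S.W. is those three sums added together.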
  • Sum of Squares Between (S.S.B.)
    • The same concept as both the S.S.T. and S.S.W. but looking at the difference between each group’s mean and the grand mean, squared, multiplied by the number of data points (n) in that group, and then summed across the groups.
      • S.S.B. = Σ n_Gi (x̅_Gi – x̿)^2, where x̿ is “x-double bar”, the grand mean

I don’t fully understand why these concepts are useful to know, but what’s interesting is that if you find the S.S.W. and the S.S.B. and add them together, you get the same value as the S.S.T. The same thing works when finding the Degrees of Freedom for all three concepts, as well.
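
To check that relationship numerically, here is a minimal Python sketch. The nine data points are made up for illustration (they are not necessarily the numbers in my notes), and the variable names are just ones I picked. It assumes the usual degrees of freedom: n – 1 for the total, n – m within, and m – 1 between.

```python
# Hypothetical data: m = 3 groups with 3 data points each (n = 9 in total).
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]


def mean(values):
    return sum(values) / len(values)


all_points = [x for group in groups for x in group]
grand_mean = mean(all_points)  # "x-double bar"

# S.S.T.: squared distance of every data point from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_points)

# S.S.W.: squared distance of every data point from its own group's mean,
# added up across all of the groups.
ssw = sum((x - mean(group)) ** 2 for group in groups for x in group)

# S.S.B.: squared distance of each group mean from the grand mean,
# multiplied by the number of data points in that group.
ssb = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)

m = len(groups)
n = len(all_points)
df_total, df_within, df_between = n - 1, n - m, m - 1

print(sst, ssw, ssb)                    # 30.0 6.0 24.0
print(df_total, df_within, df_between)  # 8 6 2
```

With this data, 6 + 24 = 30 and 6 + 2 = 8, so both the sums of squares and the degrees of freedom add up the way the videos said they would.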

I felt a bit nervous starting the Statistics and Probability Course Challenge, especially considering I began the course almost 4 months ago. As I mentioned above, I ended up doing fairly well, getting 28/30 questions correct. However, there were three questions I got correct that I had to look up before answering so, in a way, I only got 25/30 correct from memory. Here is a list of the 5 questions I either got wrong or had to look up before answering:

  • Question 10 – Percentile
    • This question showed me a distribution of data and asked me to find which column contained the 35th percentile. The X-axis went from 0 to 35 and I calculated what 35% of 35 was, i.e. 12.25. I forgot that a percentile is about the share of the data points, not the position on the X-axis. The way the data was distributed, the 35th percentile fell a bit higher than the column that held 12.25 on the X-axis, so I got the question wrong.
  • Question 18 – Mean Absolute Deviation
    • This was a question I got right but had to look up. I learned about the Mean Absolute Deviation (M.A.D.) in Week 48 and had forgotten what it was. The M.A.D. is the mean of the absolute difference between each data point and the mean of the data.
      • M.A.D. = Σ |x_i – x̅|/n
  • Question 23 – S.D. with Probability
    • This question asked me to find the S.D. of a set of data with a weighted probability for each data point. I forgot that the formula for this includes the probability associated with each data point and ended up getting the question wrong.
      • S.D. with Prob = √(Σ((x_i – μ)^2 * P_i)), where μ = Σ(x_i * P_i). There is no division by n because the probabilities already add up to 1.
  • Question 27 – Bernoulli Mean and S.D.
    • Here I was asked to find the mean and S.D. of a Bernoulli distribution. I got the question right and correctly figured out the mean without issue but had to double check the formula for the S.D.
      • Bernoulli Mean and S.D. formulas (the n = 1 case of the binomial formulas μ = n * p and σ = √(n * p * (1 – p))):
        • μ = p
        • σ = √(p * (1 – p))
  • Question 30 – Slope of Regression Line
    • Lastly, on question 30 I was shown a scatter plot of data that had a regression line drawn through it. I was given the correlation coefficient, r, and the S.D.’s for the x and y values, S_x and S_y respectively. I was then asked to find the slope, m, and Y-intercept, b, in order to come up with the equation for the line (a.k.a. y = mx + b). I forgot how to calculate the slope using the correlation coefficient and S.D. of x and y and had to look it up. 
      • m = r(S_y/S_x)
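
The five ideas in the list above can each be sketched in a few lines of Python. The function names and every number below are hypothetical (they are not the actual Course Challenge questions). The percentile helper assumes one common convention (the first column where the running count reaches the target share of the data), and the intercept assumes the standard fact that a least-squares line passes through the point (x̅, y̅).

```python
import math


def percentile_column(counts, pct):
    """Index of the histogram column that contains the pct-th percentile."""
    target = pct / 100 * sum(counts)  # how many data points sit at or below it
    running = 0
    for index, count in enumerate(counts):
        running += count
        if running >= target:
            return index
    return len(counts) - 1


def mean_absolute_deviation(data):
    """Mean of the absolute distances between each data point and the mean."""
    center = sum(data) / len(data)
    return sum(abs(x - center) for x in data) / len(data)


def discrete_sd(values, probs):
    """S.D. of a discrete random variable: weight by probability, no / n."""
    mu = sum(x * p for x, p in zip(values, probs))
    variance = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return math.sqrt(variance)


def bernoulli_mean_sd(p):
    """Mean and S.D. of a Bernoulli distribution with success probability p."""
    return p, math.sqrt(p * (1 - p))


def regression_line(r, s_x, s_y, x_bar, y_bar):
    """Slope and intercept of the least-squares line from summary statistics."""
    m = r * s_y / s_x
    b = y_bar - m * x_bar  # the line passes through (x_bar, y_bar)
    return m, b


# Made-up histogram: 7 columns, each 5 units wide, covering 0 to 35.
counts = [1, 2, 3, 6, 8, 6, 4]
print(percentile_column(counts, 35))         # 3, i.e. the 15-20 column
print(mean_absolute_deviation([2, 4, 6, 8]))  # 2.0
print(discrete_sd([0, 1, 2], [0.25, 0.5, 0.25]))
print(bernoulli_mean_sd(0.5))                # (0.5, 0.5)
print(regression_line(r=0.5, s_x=2, s_y=4, x_bar=3, y_bar=10))  # (1.0, 7.0)
```

In the made-up histogram, 12.25 on the X-axis sits in the 10-15 column (index 2), but the 35th percentile of the data lands one column higher, which is the same kind of mistake I made on Question 10.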

As I said, although I struggled with those five questions in particular, I was quite pleased with how I managed throughout the rest of the challenge and felt good enough to move on. (I still have one final statistics course to go, so I’m sure I’ll get to practice everything I struggled with anyways.)

The following Course Challenge from High School Statistics wasn’t quite as difficult but I did make two mistakes on it, as well:

  • Question 9 – Study Design
    • This question was a word problem that gave me a description of a study and asked me to determine whether it was an observational study or an experiment, as well as choose one of four answers that described different possible conclusions from the results. I got the first part of the question right (it was an observational study) but didn’t choose the right description (I chose an answer that was too general when there was an answer that was more specific and more suitable).
  • Question 22 – Probability of Two Dice
    • On this question I was asked, “if you roll two fair 6-sided dice, what is the probability that at least one of the dice will land showing a ‘3’?”. I assumed I was supposed to add the probability of each die landing on ‘3’ (i.e. 1/6 + 1/6 = 1/3) but forgot that there’s a 1/36 chance of both dice landing on ‘3’, which that sum counts twice. The calculation I should have done was 6/36 + 6/36 – 1/36 = 11/36. After getting the question wrong, I clicked the ‘hint’ button and saw a 6×6 table that showed the 36 possible outcomes, which made it clear why the answer was 11/36.
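
The 6×6 table from the hint can be rebuilt with a short Python sketch that lists all 36 outcomes and counts the ones showing at least one ‘3’. The variable names are my own, and the complement calculation (1 minus the chance that neither die shows a ‘3’) is an extra cross-check that wasn’t part of the question.

```python
from fractions import Fraction
from itertools import product

# Every (first die, second die) pair: the 36 cells of the 6x6 table.
outcomes = list(product(range(1, 7), repeat=2))

# Keep the outcomes where at least one die shows a 3.
favorable = [roll for roll in outcomes if 3 in roll]

p_enumerated = Fraction(len(favorable), len(outcomes))

# Add the two single-die probabilities, then subtract the double-counted (3, 3).
p_inclusion_exclusion = Fraction(1, 6) + Fraction(1, 6) - Fraction(1, 36)

# Cross-check: 1 minus the probability that neither die shows a 3.
p_complement = 1 - Fraction(5, 6) ** 2

print(len(favorable), len(outcomes))                       # 11 36
print(p_enumerated, p_inclusion_exclusion, p_complement)   # 11/36 11/36 11/36
```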

After finishing the first Course Challenge, I went back and redid both units to get back to 100% for each individual unit and the entire course as a whole. I breezed through the first unit but only scored 20/21 on the second unit, Random Variables. When I went through Random Variables initially, I had to redo the unit test 5 times before getting 100%, so I decided to move on, knowing that I’ll have to redo it in the final statistics course anyways and might as well wait to do it then. I also realized that, because I got two questions wrong in the second Course Challenge, both courses have units that are <100%. Since the next course, AP College Statistics, contains all the units that are <100%, it seems to make the most sense to wait to redo any of them until I come across them in that course.

I’m elated to now be starting the final statistics course, AP College Statistics (10,620/14,100 M.P.). The course has 16 units in total, 9 of which have <100% M.P. The 9 units that aren’t finished are for the most part ~60-80% complete, so I’m feeling good about my chances of getting through them all by the end of December. I’m somewhat disappointed that it’s unlikely I’ll get through stats by the end of this month but, in the big picture, it’s not the end of the world. Soooo close to calculus!!!