I’ve reached a point where if I only do 5 hours of work in a week on KA I feel bad about it. That is exactly how I feel right now. I completed 3 units this week (Study Design, Probability, and Random Variables) and made some headway in the 4th unit, Sampling Distributions, but even so I feel like I could/should have got more done. I only did ~1 hour of work a day, which doesn’t seem like enough at this point. One bright spot, however, was that I had little difficulty getting through each of the unit tests and remembered ~95% of the material needed for each test without having to look anything up. 🙂
I began the week going through the unit Study Design, which turned out to be the hardest unit of the week. It was the most difficult because most of the questions that came up were word-based, which I always find tough to comprehend. The first thing I made a note of from this unit was the difference between a Simple random sample and a Systematic random sample:
- Simple Random Sample
- “Put all the names in a hat and draw at random.”
- Systematic Random Sample
- Begins by choosing a starting number at random within a given range (ex. a number between 1 and 20) and then selects every number ‘x’ away from it.
- Ex. If n_1 = 3 and x = 10 you would select 3, 13, 23, etc.
- The value of x should be decided before selecting n_1.
- Often uses a computer to choose n_1 in order to make sure it’s random.
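The two methods above can be sketched in a few lines of Python. This is only an illustration under assumptions: the population of the numbers 1 to 100 and the function names `simple_random_sample` and `systematic_sample` are made up for the example and are not from the KA exercises.

```python
import random


def simple_random_sample(population, k, rng):
    """'Names in a hat': draw k distinct items, every item equally likely."""
    return rng.sample(population, k)


def systematic_sample(population, step, rng):
    """Pick a random start within the first `step` items, then take every
    `step`-th item after it. `step` plays the role of x in the notes and is
    fixed before the random start (n_1) is chosen."""
    start = rng.randrange(step)  # index 0 .. step-1
    return population[start::step]


rng = random.Random(2020)          # seeded only so the run is repeatable
population = list(range(1, 101))   # hypothetical population: 1..100

print(simple_random_sample(population, 5, rng))
print(systematic_sample(population, 10, rng))  # values spaced exactly 10 apart
```

If the random start happens to land on 3, the systematic sample is 3, 13, 23, and so on, which matches the example in the notes.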
The next thing I made a note of in this unit was the different names that are used to label the X- and Y-axes on a scatterplot during an experiment versus any other situation you’d use a scatterplot. Here’s a photo of my notes that explains:
At the top of the page you can see that a standard scatterplot labels the X-axis as the “independent” axis and the Y-axis as the “dependent” axis, whereas in an experiment the X-axis is called the “explanatory” axis and the Y-axis the “response” axis. In the bottom half of the photo I also noted why observational studies cannot conclude causality, which is worth knowing. To reiterate what I wrote, observational studies cannot determine causality because they cannot state with certainty which variable should be plotted on which axis, i.e. which variable causes the other. Without random assignment, they also can’t rule out a third (confounding) variable driving both.
The last thing I made note of from the unit Study Design was the difference between Cluster sampling and Stratified sampling. Here’s a page from my notes that explains:
As you can see from the photo, cluster sampling is when you group datapoints into clusters and choose x clusters at random, whereas stratified sampling is when you group datapoints into strata and choose x datapoints from each stratum. Both ‘clusters’ and ‘strata’ are groups of datapoints, but they are labelled differently according to which sampling method is being used.
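Here is a rough Python sketch of the difference. The classroom data and the function names are hypothetical, and the cluster version assumes that every member of a chosen cluster is kept, which is the usual convention but is not spelled out in my notes.

```python
import random


def cluster_sample(groups, n_clusters, rng):
    """Choose whole groups at random and keep everyone in the chosen groups."""
    chosen = rng.sample(sorted(groups), n_clusters)
    return {name: list(groups[name]) for name in chosen}


def stratified_sample(groups, n_per_stratum, rng):
    """Choose n_per_stratum members at random from every group."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in groups.items()}


# Hypothetical data: four classrooms with four students each.
classrooms = {
    "A": ["a1", "a2", "a3", "a4"],
    "B": ["b1", "b2", "b3", "b4"],
    "C": ["c1", "c2", "c3", "c4"],
    "D": ["d1", "d2", "d3", "d4"],
}

rng = random.Random(1)  # seeded only so the run is repeatable
print(cluster_sample(classrooms, 2, rng))     # 2 whole classrooms
print(stratified_sample(classrooms, 2, rng))  # 2 students from each classroom
```

The cluster sample leaves some groups out entirely, while the stratified sample always has every group represented.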
When I got into the unit Random Variables, the only exercise I had to go through was one about finding the mean and S.D. of a geometric random variable, which I hadn’t learned before. Once again, here’s a page from my notes that goes through an example of the type of question I was asked:
- Mean
- The intuitive way to think about this is that, considering there are 6 equally likely outcomes with ‘1’ being one of them, it would on average take 6 attempts to roll a ‘1’, which is the correct answer. The formula to use to solve for the mean in this type of question is:
- μ_X = 1/P
- Standard Deviation
- The formula to find the S.D. of a geometric random variable is something I don’t fully understand. In layman’s terms, you take the square root of the probability of the non-successful outcome(s) and then divide that by the probability of the successful outcome(s). The formula is:
- σ_X = √(1 – P) / P (only the 1 – P is under the square root)
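As a quick check of both formulas, here is a minimal Python sketch using the die example, where P = 1/6 is the chance of rolling a ‘1’. The function names are mine, chosen just for illustration.

```python
import math


def geometric_mean(p):
    """Expected number of trials until the first success: 1 / p."""
    return 1 / p


def geometric_sd(p):
    """S.D. of the number of trials: sqrt(1 - p) / p.
    Only (1 - p) sits under the square root."""
    return math.sqrt(1 - p) / p


p = 1 / 6  # probability of rolling a '1' on a fair six-sided die
print(geometric_mean(p))  # about 6 rolls on average
print(geometric_sd(p))    # about 5.48 rolls
```

The S.D. is almost as large as the mean, which fits the idea that the number of rolls needed varies a lot from one attempt to the next.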
One other thing I was able to figure out when working through the Random Variables unit test was why the formula for a probability-weighted S.D., √(Σ(X_i – μ_X)^2 * P_i), doesn’t get divided by n or (n – 1). Here’s another page from my notes that shows an example of this and explains how it works:
From the example question, like any S.D. you must start by finding the mean ($28). To find the S.D., you would subtract the mean from each datapoint, square that value, and find the sum of all those values. Since there were 900 instances in this question of a datapoint equaling $20, you could simply find ($20 – $28)^2 and then multiply it by 900 since that’s how many times that datapoint occurred. You’d do the same thing with ($100 – $28)^2 * 100. You would then divide both terms by 1000 since that’s how many datapoints there were and, by doing that, 900/1000 and 100/1000 become the probabilities for each term, 0.9 and 0.1 respectively. In other words, the division by n is already built into the probabilities, which is why the formula doesn’t include /n or /(n – 1).
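The same reasoning can be checked numerically. This sketch computes the S.D. two ways for the example above: once with the probability-weighted formula and once the long way, over all 1000 individual datapoints divided by n. The function names are my own, and the long way assumes the population formula (dividing by n, not n – 1).

```python
import math


def weighted_mean_sd(values, probs):
    """Mean and S.D. of a discrete random variable from its outcomes and
    their probabilities. No division by n: it lives inside the probabilities."""
    mean = sum(x * p for x, p in zip(values, probs))
    variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, math.sqrt(variance)


def population_sd(data):
    """Ordinary population S.D.: sum of squared deviations divided by n."""
    mean = sum(data) / len(data)
    return math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))


mean, sd = weighted_mean_sd([20, 100], [0.9, 0.1])
print(mean, sd)  # mean is about 28, S.D. is about 24

# The long way: 900 datapoints of $20 and 100 datapoints of $100.
data = [20] * 900 + [100] * 100
print(population_sd(data))  # also about 24
```

Both routes give the same answer because (900 * 64 + 100 * 5184) / 1000 is the same sum as 64 * 0.9 + 5184 * 0.1.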
One annoying thing that happened at the end of the week was that, after scoring 100% on the Random Variables unit test, when I looked at the total M.P. complete for the unit it still said 1580/1600 M.P. (99%). I think what happened is that the unit test didn’t give me a question on geometric random variables, which is what I needed in order to get the remaining 20 M.P. for that section. I decided not to redo the test and to move on to the next unit, since I don’t have control over what questions are given to me in the unit tests. Hopefully I’ll come across a geometric random variable question in the course challenge, get that question right, and have the score switch to 100%.
Beginning this Monday, November 23, 2020, Toronto will be going back into lockdown due to COVID numbers going up in the city. It’s been a while since I’ve given a COVID update here, but the daily numbers seem to be going up ~50-75% each week. It means that I’ll be off work for the foreseeable future which, in a glass-half-full kind of way, means I’ll have more time to work on KA. Even so, I think it will be tough for me to get through all of stats in the next 8 days. In one way, I wouldn’t mind spending a bit more time on stats since I don’t feel confident with my understanding of quite a few formulas but, in another sense, I feel like stats is starting to wear on me and that switching to another subject would be a welcome relief. Either way, I’m going to do my best to take advantage of the extra time I have to work on KA and hopefully get a lot done. My goal next week is to be able to say I averaged 1.5 hours per day. Considering I’ll be at home the entire time, I’ll have a hard time coming up with an excuse if I don’t.