Sunday, October 11, 2015

The Case of the Suspicious Quiz

Mr. White, a teacher at Colorful Academy, gave his class a pop quiz comprised of 20 true or false questions.  In an attempt to thwart cheating, he gave two versions of the quiz in which the questions were the same but in a different order.  As he graded one particularly bad quiz, he noticed that 10 answers in a row matched the answer key of the wrong version.  He ran to the teacher lounge and asked his fellow colleagues what the probability of randomly matching 10 questions in a row on a 20 question true or false quiz was, and if they thought the student had cheated.


Like a typical faculty, no teacher could agree.  “I wouldn’t get too worked up,” blurted out Mr. Red.  “Since every true or false question has a 50% success rate, it is highly probable for a guessing student to get half of the questions to match the wrong version.” 

Mrs. Yellow disagreed.  “Since the questions are all in a row, the 50% chance should be compounded, so there is a (½)10 or 0.1% chance of this happening, which is highly unlikely.” 

“I think Mr. Red’s guess is too high,” Mr. Orange said, “since he did not consider that the 10 questions that matched were all in a row, and I think Mrs. Yellow’s guess is too low since she did not consider the other 10 questions that are not in a row.  So I’ll split the difference and say that there is a 50% chance that the student was cheating, which is of some concern but not outside the realm of likelihood.” 

Mrs. Green shook her head.  “Splitting the difference is not very mathematical,” she said.  “I agree with Mrs. Yellow that 10 questions in a row would be a (½)10 or 0.1% chance, but since this pattern could start at any number between #1 and #11, the probability should be multiplied by 11, which means that there is approximately an 11(½)10 or 1% chance that the student was not cheating.  In other words, there is a 99% chance that the student was cheating.”

“Isn’t there some formula with factorials that should be used for this?”  Mrs. Blue questioned as she pulled out her phone and started scrolling.  “Here it is: ‘For a binomial experiment consisting of n trials, the probability of exactly k successes is nCkpk(1 – p)n – k.’  So in this case, we want 10 successes out of 20 trials with a probability of 50%, which is,” she paused as she used her calculator app, “20C10 (0.5)10(1 – 0.5)20 – 10 which equals 1.76 x 10-19% or nearly 0%.  Either way you look at it, the student was cheating.”

“Let’s not jump to conclusions!” cried Mr. Purple.  “While you were all talking I wrote down possible combinations on this napkin, using an ‘O’ to note a match and an ‘X’ to note a miss.  Now, it became quickly apparent that all the combinations for a 20 question quiz would not fit on the napkin, so I wrote down all the combinations for a 4 question quiz instead, and then underlined all the instances of getting half or 2 matches in a row.  Here’s what I found.”  Mr. Purple held up a napkin with the following scribbled on it:  OOOO, OOOX, OOXO, OOXX, OXOO, OXOX, OXXO, OXXX, XOOO, XOOX, XOXO, XOXX, XXOO, XXOX, XXXO, XXXX.  “As you can see, there are 8 instances out of 16 which meet the criteria of 2 matches in a row, which is a 50% chance.  So Mr. Orange was right after all.

And so the dialogue continued, with every teacher choosing sides and every teacher expressing his or her own opinion.  Finally, the computer teacher, Mr. Black, volunteered to write a computer program using Mr. Purple’s method of counting combinations, but for all 20 questions instead of just 4 questions.  It was finally agreed that the outcome of such a program would yield the correct answer.

Being a computer teacher, Mr. Black changed Mr. Purple’s O’s to 0’s and X’s to 1’s so that the combinations could be expressed as binary numbers instead.  This gave the added bonus that he could just have the computer program count from 0 to 2^20 in binary and be assured of getting all the possible combinations.  He could then use a built-in substring function and count all the instances of 10 0’s in a row.  Here’s what his computer program looked like:

01
#The Suspicious Quiz Solution
02
#Python 2.7.3
03

04
#Number of questions on the quiz.
05
intQuestions = 20
06

07
#Number of matching questions in a row.
08
intInARow = 10
09

10
#Number of the instances which meets the criteria of having the
11
# correct number (or more) of matching questions in a row,
12
# starting at 0 and to be added to later.
13
intCount = 0
14

15
#Create a string of 0's the same length as the number of
16
# matching questions that are in a row.
17
strInARow = ""
18
for i in range(0, intInARow):
19
    strInARow = strInARow + "0"
20

21
#Create an array of all possible quiz combinations.
22
strEvents = []
23
for i in range(0, 2 ** intQuestions):
24
    #Change the number to a binary string with leading zeroes.
25
    strEvents.append(bin(i)[2:].zfill(intQuestions))
26

27
    #If the string of 0's can be found in the current array
28
    # item, add it to the total count.
29
    if (strInARow in strEvents[i]):
30
        intCount += 1
31

32
#Display the total count and percentage.
33
print "Getting at least " + str(intInARow) + " matches in " + (
34
    "a row on a ") + str(intQuestions) + " question quiz: "
35
print str(intCount) + "/" + str(2 ** intQuestions) + " (" + (
36
    str(100.0 * intCount / 2 ** intQuestions)) + "%)"

and here is the outcome after running the computer program:

>> 
Getting at least 10 matches in a row on a 20 question quiz:
>> 
6144/1048576 (0.5859375%)

With his computer program, Mr. Black discovered that there were 6,144 instances out of a total of 1,048,576 true or false combinations in which there are at least 10 matching questions in a row on a 20 question quiz, or about a 0.6% probability.  In other words, there was about a 99.4% chance that the student had cheated.  Mr. White decided to investigate the matter further.

The above probability of matching at least 10 questions in a row to the wrong answer key for 20 true or false questions is the same for a few other situations as well, such as the probability of correctly answering at least 10 questions in a row on 20 true or false questions by randomly guessing, or getting at least 10 heads in a row on 20 coin tosses. 


Unfortunately, the formula is limited and complex.  If r is the number of successes in a row, and k is the total number of trials, and c is the number of outcomes each trial has, then the probability p can be represented as:


Therefore, in our above scenario, r = 10, k = 20, and c  = 2, so p = (1 + ∑n=110(2∙2n–1∙1 + (n – 1)2n – 2∙12)) / 220, or 6144/1048576 ≈ 0.6%.  For Mr. Purple’s napkin calculation, r = 2, k = 4, and c = 2, so p = (1 + ∑n=12(2∙2n–1∙1 + (n – 1)2n – 2∙12)) / 24 = (1 + 2 + 5)/16 = 8/16 = 50%.

To derive the above formula, it is helpful to first look at the number of successful occurrences of exactly all trials, then the number of successful occurrences of exactly all but one trial, then exactly all but two, then exactly all but three, and so on, and generalize a pattern.  The number of successful occurrences of exactly all trials, or exactly 20 in a row out of 20, is, of course, 1:

OOOOOOOOOOOOOOOOOOOO

Exactly all but 1 trial, or exactly 19 in a row out of 20, looks like this:

OOOOOOOOOOOOOOOOOOOX
XOOOOOOOOOOOOOOOOOOO

where “X” represents all choices but “O” (otherwise it would be more than 19 in a row).  The two X’s can be represented as 2(c – 1).  When there are 2 choices, like on a true or false question, c = 2, and exactly 19 in a row out of 20 would be 2(2 – 1) = 2 successful occurrences.

Exactly all but 2 trials, or exactly 18 in a row out of 20, looks like this:

OOOOOOOOOOOOOOOOOOXX
OOOOOOOOOOOOOOOOOOXO
XOOOOOOOOOOOOOOOOOOX
OXOOOOOOOOOOOOOOOOOO
XXOOOOOOOOOOOOOOOOOO

To save space, we will introduce the “#” symbol which can represent any choice, so exactly 18 in a row out of 20 can be re-written as:

OOOOOOOOOOOOOOOOOOX#
XOOOOOOOOOOOOOOOOOOX
#XOOOOOOOOOOOOOOOOOO

The first and last lines have 1 X (c – 1) and 1 # (c) each, for a total of 2c(c – 1) choices, and the middle line has 2 X’s or (c – 1)2 choices, which is 2c(c – 1) + (c – 1)2.  When c = 2, there are 2(2)(2 – 1) + (2 – 1)2 = 5 successful occurrences.

Exactly all but 3 trials, or exactly 17 in a row out of 20, looks like this:

OOOOOOOOOOOOOOOOOX##
XOOOOOOOOOOOOOOOOOX#
#XOOOOOOOOOOOOOOOOOX
##XOOOOOOOOOOOOOOOOO

The first and last lines have 1 X (c – 1) and 2 #’s (c2) each, for a total of 2c2(c – 1) choices, and the middle 2 lines have 2 X’s ((c – 1)2) and 1 # (c) each, for a total of 2c(c – 1)2 + choices, which is 2c2(c – 1) + 2c(c – 1)2.  When c = 2, there are 2(22)(2 – 1) + 2(2)(2 – 1)2 = 12 successful occurrences.

Exactly all but 4 trials, or exactly 16 in a row out of 20, looks like this:

OOOOOOOOOOOOOOOOX###
XOOOOOOOOOOOOOOOOX##
#XOOOOOOOOOOOOOOOOX#
##XOOOOOOOOOOOOOOOOX
###XOOOOOOOOOOOOOOOO

The first and last lines have 1 X (c – 1) and 3 #’s (c3) each, for a total of 2c3(c – 1) choices, and the middle 3 lines have 2 X’s ((c – 1)2) and 2 #’s (c2) each, for a total of 3c2(c – 1)2 + choices, which is 2c3(c – 1) + 3c2(c – 1)2.  When c = 2, there are 2(23)(2 – 1) + 3(22)(2 – 1)2 = 28 successful occurrences.

Exactly all but 5 trials, or exactly 15 in a row out of 20, looks like this:

OOOOOOOOOOOOOOOX####
XOOOOOOOOOOOOOOOX###
#XOOOOOOOOOOOOOOOX##
##XOOOOOOOOOOOOOOOX#
###XOOOOOOOOOOOOOOOX

The first and last lines have 1 X (c – 1) and 4 #’s (c4) each, for a total of 2c4(c – 1) choices, and the middle 4 lines have 2 X’s ((c – 1)2) and 3 #’s (c3) each, for a total of 4c3(c – 1)2 + choices, which is 2c4(c – 1) + 4c3(c – 1)2.  When c = 2, there are 2(24)(2 – 1) + 4(23)(2 – 1)2 = 64 successful occurrences.

To summarize:
All But…
Formula
0
1
1
2(c – 1)
2
2c(c – 1) + (c – 1)2
3
2c2(c – 1) + 2c(c – 1)2
4
2c3(c – 1) + 3c2(c – 1)2
5
2c4(c – 1) + 4c3(c – 1)2

Other than the first one, the formula follows a pattern where exactly all but n results in 2cn – 1(c – 1) + (n – 1)cn – 2(c – 1)2 successful occurrences.  Considering “at least” r occurrences in a row means adding all successful occurrences for exactly r, r + 1, r + 2, … k occurrences in a row, which is why there is a sigma in the formula.

Unfortunately, the formula breaks down if r < k/2, because the technique we used counted two successful sequences in the same row twice.  For example, for exactly 9 in a row out of 20, the following combination

OOOOOOOOOXXOOOOOOOOO

is counted twice.  Once for

OOOOOOOOOX##########

and once for

##########XOOOOOOOOO

A more complex formula is therefore needed for r < k/2 (and an even more complex formula for r < k/3, r < k/4, and so on).

Just because Mr. Black mathematically showed that there was a 99.4% probability that Mr. White’s student was cheating, it did not actually prove that the student was cheating.  According to National Geographic, there is a 1 in 3000 chance of being struck by lightning in your lifetime, or a 99.97% chance that you won’t get struck by lightning in your lifetime.  Despite these odds, there are still people who do get struck by lightning, and we don’t discredit these reports.  The same reasoning should be used for the suspicious quiz.  Just like lightning strikes, it is very unlikely that a student would randomly match at least 10 true or false questions in a row to the wrong answer key, but we must allow for the possibility that it could happen.