Redesigning the Timed Exam directive
Recently the timed exam directive has had a lot more use in high-stakes exams. A number of problems have been discovered or reported that are long-standing design issues or simply bugs that had not been reported until now. These include, but are not limited to:
Current Problems
- If a student does not Run an ActiveCode, and the ActiveCode is not visible when the "Finish Exam" button is pressed, then the code is not saved and not graded! Update: this is patched as of 10/18.
- We are much more limited now than when we originally implemented the timed directive in what we can do with the `beforeunload` event. Basically, our hands are now tied to whatever message the browser chooses to display when someone tries to close their tab.
- More advanced students are now using Runestone in more advanced classes. Relying on student ignorance as a security measure is no longer viable.
- See #1083 -- Note that server-side testing is not a full solution to this, as it lets students see all of the questions before the timer starts! This is a much more important problem in the remote learning world we now inhabit.
- Relatedly, if you do not use a `selectquestion` directive, then the answers can be checked for many question types.
- There is a race condition between the code that "submits" all of the questions in an exam and when the autograder is run for a student. One solution to this is to submit each current question as the student navigates from question to question.
- There is a flaw in the `selectquestion` directive that can allow questions to be duplicated within an exam, because all of the requests for questions are done asynchronously. This could be solved by not pre-rendering all of the questions; that is, we would only get/render a question when the student clicks on the next/prev button or selects a question by number. It would create a small delay in navigation while the question is loaded.
- Support for feedback/no feedback on various question types varies widely.
Proposal to Redesign
These problems, and many others, lead me to believe that we need a serious redesign of this directive. As part of that process:
- We should gather other new requirements.
- We should do a good and thorough review of the design before we start hacking.
New Requirements
- Parts of an exam should be able to have a block of text along the lines of "The next X questions all refer to this diagram"; said diagram should be visible.
- An exam can be created either entirely in reStructuredText, OR the exam can be assembled through the assignment builder.
- The exam should be secure enough to use for high-stakes exams -- meaning that it should be very difficult or impossible for a student to find the answers to questions in the HTML. Similarly, if the exam is timed, then a student should not be able to preview all of the questions before pressing the start button.
...
I think one key question here is whether the current mechanism we use for displaying the questions is the best way forward.
I think another very reasonable option would be to show all of the questions and allow the student to scroll through them. This has some advantages for longer exams, as it lets the student scan the exam and plan their time a bit better. I don't particularly like our current approach, since the screen jumps around too much as you go from question to question -- though that could probably be fixed with some CSS, and maybe there are other pedagogical advantages to our current method.
We could still use some visual styling cues to make it easy for students to see questions they have not yet answered.
In this case the start button would not just reveal the first question but would likely go to a new page where the questions are displayed.
I use the second option (show all questions) for the high-stakes assessments in a class I teach on Microprocessors. I think it helps students to read through all the questions, then start by answering the questions they're most confident in. To see this, use asee as the username and Brad's last name as the password.
To do this, I run a parallel web app called exam_runestone that requires an authentication code in addition to a userid/password, then provide this auth code only to TAs or online testing software (Honorlock -- yuck -- what we use at MSU). This helps to restrict access to students who should be actually taking the exam.
I understand the motivation behind requirement 2 (exam can be assembled through the assignment builder), but I wonder if this is realistic. A good high-stakes exam isn't just a jumble of questions pulled from a bank, but a document with (as mentioned earlier) figures referenced by multiple questions, subsections grouping related content, etc. I wonder if instead we could offer a bank of pre-built exams, and/or use question randomization to increase the range of questions a single test can provide.
Another option would be to simplify authoring tests -- something I'd certainly like to work on with the rest of the team.
Currently, this is how instructors are creating exams on Runestone: using the assignment builder to choose questions (or write their own) for the exam.
Then they check the box that turns the assignment into a timed assessment.
It works, but it has the tradeoff of presenting one question at a time -- which (in email) I have heard that some others do prefer. More discussion is needed to arrive at a consensus, I think.
I'd be happy to hear options/ideas around how we could simplify the authoring of exams.
My biggest concern is feature creep: a feature intended for low-stakes assessment (timed exams) is now being adopted for high-stakes assessment. In fact, I'd argue that most of Runestone isn't really set up for high-stakes assessment. While I'm sure people want to do high-stakes assessment, this looks to be a significant redesign to me. Is this where we should be putting our (limited) development resources?
Well, like it or not, this is what people are using the current timed assessment for.
- Many instructors are using it for midterms and finals
- This is the mechanism that the competency exams for Mich were built upon.
A high stakes assessment tool can be used for low stakes quizzes but not vice versa.
I feel like I have committed to making this work to some degree and owe it to Michigan to have a system that works for them. Also, even for low-stakes use, it is clear that Runestone is getting used in more upper-level courses where many of our original assumptions do not hold. So I think that we need to adapt.
There are always competing priorities. If you think I'm ignoring something that should be more important we should talk about it.
Yikes! Scary -- a disaster waiting to happen. I didn't know it was adopted so widely. I think this goes way beyond the SOW you signed up for. One big cheating scandal would really damage adoption.
But, as you say, here we are.
I'm not sure it's as dire as you make it out to be. The timed exam feature still has a big bold Beta label on it. The vast majority of our classes are quite small. The larger classes that are trying this know the limitations but are still happy to test the limits.
Also, that is why I am trying to be very, very clear about what this redesign does and doesn't do. We all read a lot into, or make a lot of assumptions about, how things are supposed to work and what their limits are. So I want to make sure we are clear about it here.
The longer the pandemic lasts and students are in an online or hybrid mode the more important this becomes.
That's good to hear. Fewer disasters are better :). Possible discussion points:
- What does "secure" mean? Can it be achieved client-side, or only server side? For server-side, should we require an HTTPS connection?
- How should test access be restricted? Can any student access the test, or only from some type of secure browser/environment?
- What methods of delivery should we support? The timed exam, for example, provides only one question at a time. Should we allow backtracking? Have the ability to present all questions at once? Deliver questions in randomized order? Deliver questions only when the instructor allows it (an in-class quiz, where the instructor says "now let's move to the next problem")?
- What specific features must the test builder portion of the instructor interface support? (Shared diagrams? Shared text? More?)
- What type of feedback should students receive when taking an exam? Should this be a blanket switch (no feedback) or something more fine-grained? Could this be enabled/disabled to allow students to take/study practice exams?
The largest question: how can we take what exists and make it more secure, while adding a minimal set of features?
Question of the day
Your question for the day: Can you guess what this line is trying to detect?
```javascript
if (
    $("div#timed_Test form input[name='group1']").is(":checked")
) {
    // …
}
```
This is how the timed assessment figures out that a multiple choice question has been answered! 🤣
Wow -- looks like a great entry for obfuscated programming...
Issues and features for timed assignments that Shuyang and I are currently working on under Barbara:
- [x] On loading a timed assignment webpage, an HTML element containing the questions and answers becomes visible for a brief moment before being hidden. This has been identified as a `<ul data-component="timedAssessment">` element in RunestoneServer `doAssignment.html`, and the current plan is to add `style="display: none;"` to this tag so that it is never visible. It does not seem to me that this element ever needs to be visible to the end user, so I do not think this change should interfere with other functionality.
- [ ] When 'No Feedback' is checked for a timed assignment, the assignment questions initially become hidden upon clicking 'Finish Exam', but in RunestoneComponents `timed.js`:

  ```javascript
  handlePrevAssessment() {
      if (this.showResults) {
          $(this.timedDiv).show();
  ```

  causes the timed assignment questions to become visible again upon refreshing the page or otherwise revisiting the assignment. Barbara was thinking that if the assignment is set up to provide 'No Feedback', then perhaps it should not provide the questions either, in which case we would not want `handlePrevAssessment()` to re-show the questions -- but do others have thoughts on this? Relatedly, in `timed.js` there are two variables, `showFeedback` and `showResults`. `showFeedback` appears to be associated with the 'No Feedback' checkbox when setting up assignments, but it is unclear whether anything is currently associated with `showResults`.
- [ ] Timed assignments currently can be paused by default. We are thinking of adding an option for instructors to toggle whether students can pause while taking a timed assignment.
- [ ] Add line numbers to active code questions in timed assignments.
- [ ] Timed assignment question numbers currently change display style to indicate either no response or responded, and they only apply to multiple choice questions. We are thinking to add more question number display styles including providing students ability to mark questions, for example 'highlight this question to remind me to return to it later', 'mark this question as having been completed', and extend the functionality to all question types.
These are smaller-scale items than those being discussed above, but nevertheless thoughts are welcome.
I have already committed your suggestion for the first item. In addition, I updated most of the components so that the "origElement" that gets replaced when JavaScript renders the component is also initially displayed as hidden. This was a great idea, thanks.
The timed directive currently supports the :nopause: option, so this is just a matter of adding the UI.
I'm working on adding support to all of the components so that you can know whether they have been interacted with. The current way of determining that a multiple choice question has been answered works (as illustrated above), but is not a great solution. I'm going to add code to each component so that it adds a class, perhaps `attempted`, when a student answers. This will allow us to be consistent across all the components.
I like the extension of allowing students to highlight a question for later, and if we do this with a class then it could work regardless of where we end up on how we show students an exam.
From yesterday's group meeting, my understanding was that by using a one-way hash, correct answers that are sent to the client would be hard for an attacker to discover. However, this doesn't work well for multiple-choice questions, since there are so few answers that it is easy for an attacker to guess a correct one. Spot-checking some of the answers on the server helps discourage attackers from submitting answers labeled as correct that aren't actually correct.
This has several problems when I think about implementing it:
- For fill-in-the-blank questions, the client-side code uses a list of regexes to determine if a student's answer is correct. In other words, we can't one-way hash these "correct answers," since determining if an answer is correct isn't just comparing two hashes, but evaluating a regex on the student answer. I don't see any obvious ways of making this secure.
- The only way to spot-check answers is to rewrite the answer-checking code in Python. But if we have Python code to check answers, we could more securely do server-side checking.
- The idea that multiple-choice questions aren't secure bothers me. I don't see any obvious ways to make it secure.
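To make these points concrete, here is a minimal Python sketch (the function names are mine, not Runestone's, and the normalization rule is an assumption). It shows why hashing works for an exact-match answer, why it can't express a regex-graded fill-in-the-blank, and why a multiple-choice hash is trivially brute-forced over its tiny answer space:

```python
import hashlib
import re

def answer_hash(answer: str) -> str:
    """One-way hash of a normalized, exact-match answer."""
    return hashlib.sha256(answer.strip().lower().encode()).hexdigest()

def check_hashed_answer(student_answer: str, correct_hash: str) -> bool:
    """Client-side grading without shipping the plaintext answer."""
    return answer_hash(student_answer) == correct_hash

# Works for an exact-match answer: the page embeds only the hash.
stored = answer_hash("42")
assert check_hashed_answer(" 42 ", stored)
assert not check_hashed_answer("41", stored)

# Fails for a regex-graded fitb: there is no single string to hash;
# correctness means evaluating the regex on the student's input.
fitb_pattern = re.compile(r"^4[.]?2$")
assert fitb_pattern.match("42") and fitb_pattern.match("4.2")

# And an mchoice hash is brute-forced over the tiny answer space.
choices = ["a", "b", "c", "d"]
cracked = [c for c in choices if answer_hash(c) == answer_hash("c")]
assert cracked == ["c"]
```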
Thanks @bjones1 here is a summary of our discussion yesterday and some of my thoughts. Sorry for the length.
Secure Exams
Who are we trying to be secure for? Over 50% of the current registered students are high school students taking their first CS course. A large fraction of the remaining are college students taking their first CS course.
There is the possibility of the one super smart student who writes a browser extension and sells it to his classmates. But I am not sure that is a very big market.
Exams are graded by the autograder after a student submits. The number of correct / incorrect and skipped questions reported at the end of the exam is advisory only, and there is some thought of removing that.
What are likely ways students will cheat in a remote exam?
- View HTML source for the exam page.
- Use another tab, another browser, or another device to search for the answer
- Use social media to collaborate with friends
- Use the Network Tab of the dev tools to snoop on messages and then try to spoof messages with curl or other
- Use the developer tools to examine the Javascript source
- Some other more clever method that I haven't thought of.
- The most likely more sophisticated cheat for Runestone would be by spoofing the hsblog API call in which the actual answer as well as the correctness of the answer is reported. But note that successfully logging for a particular user requires a valid session cookie.
What does it mean for an exam to be secure?
- Difficult or impossible for a student to find the answers.
- Difficult or impossible for a student to falsely convince the server that their scores are correct.
- Difficult or impossible to see the questions before starting the exam.
- The system should implement some kind of deterrence mechanism that detects the obvious ways of cheating
What are some strategies to foil cheating?
In an unsupervised, remote world where students may have multiple devices at their disposal, there is only so much we can do.
- Use one-way hashes of the answers in the HTML so the student does not get the answer from the HTML -- this has problems with most question types.
- High stakes exams should have a time limit and be available during a specified time window. This prevents one student from figuring everything out and then sharing.
- For high stakes exams the exam should be written using the `selectquestion` directive. This keeps the answer out of the HTML and prevents casual cheating by looking at the source.
- When using `selectquestion` an author of a secure exam can provide multiple equivalent questions and the system will randomly assign one of them to a student. This makes casual sharing of answers through social media less useful.
- Always make sure that we are using a minified runestone.js file. This makes code inspection through the debugger a lot harder.
- Use server-side grading -- that is, the page returns the answer without any indication of correctness browser-side. This would eliminate any answers in the code on the client side at all. From a scalability perspective this is undesirable. We could make server-side grading an option for everything, but that makes all of the components more complicated, and server-side grading has only been implemented for mchoice (and fitb??). This fall we are logging about 250,000 mchoice answers daily, but much of that is in the peak hours of 9 AM - 4 PM CT. I prefer to scale by taking advantage of the compute power at the edge rather than spending more money on servers. Finally, once the answers are known and can be distributed, server-side grading doesn't buy you anything.
- We can do some deterrence of `hsblog` entries used by the autograder:
  - We can randomly validate answers that are marked as correct when doing auto grading.
  - We could force users to re-login prior to taking a high stakes exam and record the session cookie id. We could also record the session cookie id for each submitted answer and make sure they are the same.
  - We can compare timestamps on answers submitted for an exam and make sure that they don't all come at nearly the same time.
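The spot-check and timestamp deterrence ideas can be sketched in a few lines of Python. This is only an illustration under assumed data shapes -- `spot_check` and `suspicious_timing` are hypothetical names, not existing server code:

```python
import random
from datetime import datetime, timedelta

def spot_check(answers, regrade, rate=0.2, rng=None):
    """Re-grade a random fraction of answers the client marked correct.

    answers: list of dicts with 'answer' and 'correct' keys.
    regrade: the server's authoritative checker for this question.
    Returns indices whose client-reported correctness was wrong.
    """
    rng = rng or random.Random()
    return [
        i
        for i, a in enumerate(answers)
        if a["correct"] and rng.random() < rate and not regrade(a["answer"])
    ]

def suspicious_timing(timestamps, min_gap=timedelta(seconds=3)):
    """Flag an exam whose answers all arrived nearly simultaneously."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return bool(gaps) and all(g < min_gap for g in gaps)

# Checking everything (rate=1.0) catches the falsely-"correct" answer.
answers = [
    {"answer": "ok", "correct": True},
    {"answer": "wrong", "correct": True},
    {"answer": "skip", "correct": False},
]
assert spot_check(answers, lambda a: a == "ok", rate=1.0) == [1]

# Three answers one second apart look like a scripted submission.
base = datetime(2020, 10, 26, 9, 0, 0)
assert suspicious_timing([base, base + timedelta(seconds=1), base + timedelta(seconds=2)])
assert not suspicious_timing([base, base + timedelta(minutes=5), base + timedelta(minutes=11)])
```

In production the sampling rate would stay well below 1.0 so the server only pays for a fraction of the regrades.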
Good points and a good summary! A couple of thoughts:
- Another strategy to foil cheating: randomization within a question (mostly ready on fitb, someone else is working on mchoice and it should be basically the same code). This requires more effort from authors, since it means rewriting questions, while the other strategies are simpler to implement.
- We have server-side grading for fitb only at the present. In particular, there's no server-side grading for mchoice.
- I'm not convinced that server-side grading is that much more loading. It means an extra database access or so per question and returning a different response, but we're already paying the high expense of logging (meaning there's time spent creating a web2py request, response, dispatching to a controller, accessing the db to log data, etc.).
- I would personally strongly prefer to have all the code that decides if a student's answer is correct or not on the client (DRYer, IMHO) -- but only if this is "reasonably" secure. Since the client needs to know the correct answer (no one-way hash, sigh) to do this, I can't see how to make it work.
I agree, more randomization is better. We should revisit that PR and think about client side options for doing the randomization and checks for correctness.
Maybe we should spend a little more time together looking at the selectquestion implementation. I would argue that that is actually quite secure. That is to say that for a student to get to an answer they would need to find a reference to an object that is dynamically created after the page is loaded and is not referenced by any globals.
Another reason I am loathe to go server side grading only is that there are still many people using a Runestone book by building with dynamic_pages = False and putting the build on a static web server.
I'd be happy to think more about selectquestion. Some (perhaps uninformed) thoughts about it without having used the directive:
Pros:
- It's already done!
- It's a quick edit to include in any test.
- It dynamically loads questions, which prevents students from simply looking at the HTML or DOM to see answers.
- It runs client-side.
Cons:
- It relies on security through obscurity, which indicates it's only a part of the overall approach to securing high-stakes exams.
- I'm assuming that watching the network tab in the DOM inspector when changing questions would reveal the HTML for a question, which contains answers.
- Watching the network tab when submitting an answer shows the format for submitting a falsified answer. We should at least protect this using CSRF (I'm hoping web2py's CSRF can be used in Ajax?). Even with this, it's a fair-sized security hole.
If we choose to improve security by allowing server-side grading, I would only suggest a similar system to what we do for fitb questions: support both client-side and server-side options. I'd be interested in looking at client-side randomization, but I'd suggest we finish it server-side first just to have a complete unit of work.
I think the fact that the browser gives one access to network traffic, and the code and data through the debugger makes this a very difficult problem to completely solve on the client. Even if we encrypted the answers in transit between server and client a determined attacker could find the code that does the decryption and decrypt.
Of course they could get the format of the requests or any of the code from Github.
I don't think CSRF is really designed to solve this problem. CSRF inserts an encrypted hidden field into a form on the server side, using a private key that only the server has. Another site attempting to forge arbitrary requests would not be able to recreate that hidden field.
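For context, a CSRF token is essentially a server-side signature over session state. A minimal sketch with Python's standard `hmac` module (web2py's actual scheme differs in its details; the key and token format here are purely illustrative) also shows why CSRF doesn't stop a logged-in student: their own session always produces a valid token.

```python
import hashlib
import hmac

# Illustrative only: a real server key would come from configuration.
SERVER_KEY = b"secret-known-only-to-the-server"

def make_csrf_token(session_id: str) -> str:
    """Sign the session id; embedded in the form as a hidden field."""
    return hmac.new(SERVER_KEY, session_id.encode(), hashlib.sha256).hexdigest()

def verify_csrf_token(session_id: str, token: str) -> bool:
    """Another site forging a request cannot reproduce the signature,
    but the student's own browser always submits a valid token."""
    return hmac.compare_digest(make_csrf_token(session_id), token)

token = make_csrf_token("sess-123")
assert verify_csrf_token("sess-123", token)       # legitimate form post
assert not verify_csrf_token("sess-456", token)   # forged/other session
```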
I think our best strategy to keep things mostly client side is:
- Use the `selectquestion` directive.
- Only download a question the first time a student views it, not all at the beginning of the exam.
- Submit the answer to the current question whenever a student changes questions so that progress is more trackable.
- Introduce randomization into as many questions as we can.
- Randomize the order that the questions are presented as much as possible.
- Introduce some deterrence on the server side such as making sure that some reasonable amount of time passes between submitting answers.
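For the question-order randomization item, one simple approach (a sketch, not existing Runestone code; `student_question_order` is a hypothetical helper) is a deterministic per-student shuffle: the order is stable across reloads for one student but varies between students:

```python
import hashlib
import random

def student_question_order(question_ids, student_id, exam_id):
    """Shuffle question ids with a seed derived from the student and exam,
    so the order is stable across reloads but varies between students."""
    seed = hashlib.sha256(f"{student_id}:{exam_id}".encode()).hexdigest()
    rng = random.Random(seed)
    order = list(question_ids)
    rng.shuffle(order)
    return order

order = student_question_order([1, 2, 3, 4, 5], "alice", "midterm1")
# Same student + exam: identical order every time.
assert order == student_question_order([1, 2, 3, 4, 5], "alice", "midterm1")
# Still a permutation of the original questions.
assert sorted(order) == [1, 2, 3, 4, 5]
```

Because the seed is derived rather than stored, the server can recompute any student's ordering when grading without persisting extra state.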
I think incremental progress is still the way forward rather than trying to create the world's best high-stakes exam system. I can imagine a world where Runestone becomes so popular that it was worthwhile for someone to invest the time and effort to make and sell an exam cheating system, but I don't think we need to bite that off in the next version.
Like Bryan, I would encourage you to relax the client-only constraint. It doesn't seem like it buys you very much, and it makes the problem a lot harder.
In that case I think server side grading needs to be an option, not the only way to score components. If we can coordinate this with #1087 that will ease some of the pain.
But backing up a bit... I started this issue to address a number of problems (see above). I don't view having the world's most secure high-stakes exam system as the highest priority. My primary worry about spending a huge amount of time on exam security is that the most likely ways students will cheat are ways we have zero control over.
Where do you all see exam security on the priority list?
I agree that server-side grading should be an option, and that it fits nicely with #1087. As Brad mentioned, do we need a more secure system? Students can already cheat with phones, another browser window open, handwritten notes, etc. Is the added security of keeping answers/grades off the network worth it for instructors? I can't definitively answer that question for all faculty, but for me personally, yes. I currently write my tests to contain only fitb and programming questions for exactly that reason.
In terms of time required, I think we could integrate this with other efforts (#1087, etc.) to help reduce the expense. I'd suggest writing better tests for fitb questions that evaluate both server-side and client-side grading more thoroughly, and seeing what we learn from the way this was implemented to move forward with this approach for other problem types.
Has this issue been resolved? I see the different options suggested in the discussion were added to To do in Timed Exam Redesign.