The very short answer (VSAQ) and prescribing (PVSA) items share a novel approach to marking that combines automatic and human evaluation of candidate responses.
During authorship, the question writer lists a series of recognised responses (both correct and incorrect). Once candidate responses are submitted, the system groups together matching responses and checks them against the recognised list. Where a given response matches one in the list, it is marked accordingly (as either correct or incorrect).
If a response is not in the recognised list, it is flagged for human evaluation.
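The grouping and lookup step described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: it assumes "matching" means exact, case-sensitive string equality, and the function name `auto_mark` and its input format (one string per candidate, a dict of recognised response to mark) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

def auto_mark(
    responses: List[str], recognised: Dict[str, float]
) -> Tuple[Dict[str, Tuple[float, int]], Dict[str, int]]:
    """Group identical responses, then split them into auto-marked and flagged.

    responses:  one string per candidate (hypothetical input format)
    recognised: recognised response -> mark (0.0 for a recognised incorrect answer)
    """
    groups = Counter(responses)          # identical responses are evaluated once
    marked: Dict[str, Tuple[float, int]] = {}
    flagged: Dict[str, int] = {}
    for response, count in groups.items():
        if response in recognised:       # exact, case-sensitive match (assumption)
            marked[response] = (recognised[response], count)
        else:                            # not recognised: needs a human decision
            flagged[response] = count
    return marked, flagged

recognised = {"blue": 0.5, "green": 1.0, "purple": 0.0}
marked, flagged = auto_mark(["blue", "Magenta", "blue", "purplw"], recognised)
print(marked)   # {'blue': (0.5, 2)}
print(flagged)  # {'Magenta': 1, 'purplw': 1}
```

Grouping first means a response given by many candidates needs only one marking decision, which is what keeps the human workload proportional to the number of distinct unrecognised answers rather than the number of candidates.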
Marking interface
The marking interface is shown in the image below. The example shown is for a VSAQ item, but the same applies to the PVSA item type.
- This is the list of recognised answers created during authorship. In the example, blue is worth 0.5 marks, green 1.0 and purple 0.0.
- These are the candidate responses:
  - blue has been recognised, is coloured green and is awarded 0.5 marks.
  - Magenta has not been recognised, is coloured red and is awaiting human evaluation.
  - purplw has not been recognised but is within the Levenshtein threshold, so it is coloured amber and is awaiting human evaluation.
The response that has been matched to the recognised list (blue) has been automatically saved and confirmed.
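The green/amber/red colouring in this example can be approximated with the sketch below. It is a hedged illustration under stated assumptions, not the product's code: it assumes an exact match (after optional case folding) is checked first, and that a response is amber if its Levenshtein distance to any recognised response is at most the threshold. The names `levenshtein`, `classify`, `case_sensitive` and `threshold` are hypothetical; the two settings correspond to those described under "Case sensitivity and the Levenshtein threshold".

```python
from typing import Dict, Optional, Tuple

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (classic dynamic programming)."""
    prev = list(range(len(b) + 1))          # distances from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def classify(response: str, recognised: Dict[str, float],
             case_sensitive: bool = True,
             threshold: float = 1.0) -> Tuple[str, Optional[float]]:
    """Return (colour, mark); mark is None while awaiting human evaluation."""
    def norm(s: str) -> str:
        # Assumption: case-insensitive comparison is done by case folding.
        return s if case_sensitive else s.casefold()

    for known, mark in recognised.items():
        if norm(response) == norm(known):
            return "green", mark            # recognised: marked automatically
    if any(levenshtein(norm(response), norm(known)) <= threshold
           for known in recognised):
        return "amber", None                # close to a recognised answer
    return "red", None                      # not recognised at all

recognised = {"blue": 0.5, "green": 1.0, "purple": 0.0}
for r in ["blue", "Magenta", "purplw"]:
    print(r, classify(r, recognised))
# blue ('green', 0.5)
# Magenta ('red', None)
# purplw ('amber', None)
```

In this sketch the two settings interact: with `case_sensitive=True`, "Blue" is one edit away from "blue" and so comes out amber, whereas with `case_sensitive=False` it matches exactly and is marked green.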
Human marking
For the responses that require human evaluation, several options are available.
- Score: The marker selects the appropriate score from the list of available values (in this case, in 0.5 increments).
- Save: Saves the score selected in the Score dropdown, but does not yet commit it.
- Confirm: Commits the score selected in the Score dropdown.
- Add: Adds the candidate's response and the scoring decision to the list of recognised responses, so that the response will be recognised if it is given again when the item is next deployed in an exam. The intention is that, over time, as more responses are recognised, the human marking burden decreases.
When responses are added to the recognised list, this automatically creates a new version of the item with the new responses included in the list. The item will need to be re-submitted and re-approved before it can be used in the next exam.
All marks must be Confirmed before marking is considered complete.
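The Save, Confirm and Add options, together with the completion rule, can be modelled as a small state record. This is a sketch for illustration only: the `Mark` class and the helpers `score_options`, `add_to_recognised` and `marking_complete` are hypothetical names, and modelling Add as returning a new dictionary (rather than editing the existing list) is an assumption chosen to mirror the new item version described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Mark:
    """Hypothetical marking state for one grouped candidate response."""
    response: str
    score: Optional[float] = None
    saved: bool = False
    confirmed: bool = False

    def save(self, score: float) -> None:
        # Save: store the dropdown value without committing it.
        self.score, self.saved = score, True

    def confirm(self, score: Optional[float] = None) -> None:
        # Confirm: commit the dropdown value (a new value may be passed here).
        if score is not None:
            self.score = score
        if self.score is None:
            raise ValueError("select a score before confirming")
        self.saved = self.confirmed = True

def score_options(max_mark: float, step: float = 0.5) -> List[float]:
    """Values offered in the Score dropdown, e.g. 0.5 increments up to max_mark."""
    return [round(i * step, 2) for i in range(int(max_mark / step) + 1)]

def add_to_recognised(recognised: Dict[str, float], mark: Mark) -> Dict[str, float]:
    """Add: return a new version of the recognised list; the original is left
    untouched, like the new item version that must be re-approved before reuse."""
    if mark.score is None:
        raise ValueError("no scoring decision to add")
    return {**recognised, mark.response: mark.score}

def marking_complete(marks: List[Mark]) -> bool:
    """Marking is complete only when every mark has been confirmed."""
    return all(m.confirmed for m in marks)

recognised = {"blue": 0.5, "green": 1.0, "purple": 0.0}
marks = [Mark("blue", 0.5, saved=True, confirmed=True), Mark("purplw")]
marks[1].save(0.0)
print(marking_complete(marks))  # False: purplw is saved but not confirmed
marks[1].confirm()
print(marking_complete(marks))  # True
print(add_to_recognised(recognised, marks[1]))
# {'blue': 0.5, 'green': 1.0, 'purple': 0.0, 'purplw': 0.0}
```

Keeping `saved` and `confirmed` as separate flags reflects the distinction in the interface: a saved score can still be changed, and only confirmed scores count towards completing the marking.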
Case sensitivity and the Levenshtein threshold
Note in the example above how one of the responses (purplw) was highlighted in amber rather than red. This is due to a combination of two settings that can be configured during item authorship: case sensitivity and the Levenshtein threshold.
- Case sensitivity controls whether a response must match a recognised response exactly, including capitalisation, in order to be recognised.
- The Levenshtein threshold controls how much leniency is allowed beyond an exact match. The value set on the question (1.0 in the example above) is the maximum number of single-character edits (insertions, deletions or substitutions) needed to turn the given response into a recognised one. In the example, purplw can be transformed into the recognised answer purple by changing one letter, so it is within the 1.0 Levenshtein threshold and has been highlighted in amber.
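For reference, the textbook Levenshtein distance between strings a and b can be defined recursively as shown here, where tail(x) is x with its first character removed and a_1, b_1 are the first characters. The article does not specify which variant the system uses, so treat this as the standard definition rather than a description of the implementation. The second part works through the purplw example.

```latex
\operatorname{lev}(a,b) =
\begin{cases}
|a| & \text{if } |b| = 0,\\
|b| & \text{if } |a| = 0,\\
\operatorname{lev}(\operatorname{tail}(a),\operatorname{tail}(b)) & \text{if } a_1 = b_1,\\
1 + \min\bigl\{\operatorname{lev}(\operatorname{tail}(a), b),\;
               \operatorname{lev}(a, \operatorname{tail}(b)),\;
               \operatorname{lev}(\operatorname{tail}(a), \operatorname{tail}(b))\bigr\} & \text{otherwise.}
\end{cases}

% Worked example: the shared prefix "purpl" costs nothing, leaving "w" vs "e".
\operatorname{lev}(\texttt{purplw}, \texttt{purple})
  = \operatorname{lev}(\texttt{w}, \texttt{e})
  = 1 + \min\{\operatorname{lev}(\varepsilon, \texttt{e}),\,
              \operatorname{lev}(\texttt{w}, \varepsilon),\,
              \operatorname{lev}(\varepsilon, \varepsilon)\}
  = 1 + \min\{1, 1, 0\} = 1 \le 1.0
```

Under this definition, a comparison that treats uppercase and lowercase letters as different characters would also put Blue one edit away from blue.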
Responses that fall within the Levenshtein threshold always require human verification. The system highlights them in amber to indicate that the response is "close" to a recognised one.