Written item types available in risr/ assess

Michael Pollitt

risr/ assess offers a range of written item types which can be used to create robust and challenging assessment activities. This article outlines each of the different item types, how they work and how they might be useful.

Summary

The table below shows which marking and language variant functionality each item type supports.

Item type | Marking | Supports language variants
Assertion-reasoning question (ARQ) | Auto | No
Clinical prioritisation question (CPQ) | Auto | Yes
Extended matching question (EMQ) | Auto | Yes
Hotspot question | Auto | No
Multipart written question (MWQ) | Hybrid | Yes
Multiple acceptable answers (MAA) | Auto | Yes
Multiple true/false (MTF) | Auto | No
Prescribing question (PVSA) | Hybrid | No
Short answer question (SAQ) | Human | Yes
Single best answer (SBA) | Auto | Yes
Very short answer question (VSAQ) | Hybrid | Yes

Item features

Each entry in this list links to a dedicated article which explains the creation process for that item type as well as its individual nuances.

  • The assertion-reasoning (ARQ) item is a type of critical reasoning question where test-takers are presented with both an assertion statement and a justification for that statement. Their task is to select the response option that correctly reflects the validity of both statements (and if both are true, the relationship between the two). Answer options are presented as simple single-best answer options.

    An example format is shown below:

    Assertion statement

    Reason statement

    1. Assertion is true; reason is true; reason explains the assertion
    2. Assertion is true; reason is true; reason does not explain the assertion
    3. Assertion is true; reason is false
    4. Assertion is false; reason is true
    5. Assertion is false; reason is false

    In the example shown, test-takers must first determine whether both statements are correct. If they decide both statements are true, they must then carry out a third stage of the task to determine whether the reason correctly explains the assertion.

    Assertion-reasoning questions can be useful for starting to assess higher-order reasoning skills that simple multiple-choice questions might not otherwise reach.
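
    The three-stage decision described above can be sketched in code. This is only an illustration of the example format shown in this article, not of how risr/ assess stores or marks the item; the function name and parameters are hypothetical.

```python
def arq_correct_option(assertion_true: bool,
                       reason_true: bool,
                       reason_explains: bool = False) -> int:
    """Return the option number (1-5) from the example ARQ format.

    Hypothetical helper for illustration only. `reason_explains` is
    only consulted when both statements are true (the third stage).
    """
    if assertion_true and reason_true:
        # Third stage: does the reason actually explain the assertion?
        return 1 if reason_explains else 2
    if assertion_true:
        return 3  # assertion true, reason false
    if reason_true:
        return 4  # assertion false, reason true
    return 5      # both false


print(arq_correct_option(True, True, reason_explains=False))
```

    Only two of the five options require the third judgement, which is why the item rewards test-takers who can separate "true" from "relevant".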

  • The Clinical Prioritisation (CPQ) item type presents the test-taker with a scenario or prompt and several answer options which they must rearrange into what they consider to be the most appropriate order.

    There are several possible uses for this question type, including ranking responses in terms of suitability, situational judgement style responses or ordering steps taken in response to a problem. 

    An example situational judgement format is shown below:

    You are just finishing a busy shift on the Acute Admissions Unit (AAU). Your FY1 colleague who is due to replace you for the evening shift leaves a message with the nurse in charge that she will be 15 to 30 minutes late. There is only a 30-minute overlap between your timetables in which to hand over to your colleague. You need to leave on time as you have a social engagement to attend with your partner.

    Rank the following actions in order in response to this situation:

    1. Quickly go around each of the patients on the AAU, leaving an entry in the notes highlighting the major outstanding issues relating to each patient and then leave at the end of your shift
    2. Make a list of the patients under your care on the AAU, detailing their outstanding issues, leaving this in the doctors' office when your shift ends and then leave at the end of your shift
    3. Ask your specialty trainee if you can leave a list of your patients and their outstanding issues with him to give to your colleague when she arrives and then leave at the end of your shift
    4. Leave a message for your partner explaining that you will be 30 minutes late
    5. Make a list of patients and outstanding investigations to give to your colleague as soon as she arrives

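    Candidate responses are scored according to how far each option is placed from its correct position. An option placed in its correct position earns the full mark for that option (one less than the number of response options), and one mark is lost for every position it is displaced. This means the maximum score available for the question (unless normalised within an item set) is always the number of response options multiplied by one less than that number (e.g. 4 response options / max score = 12, 5 response options / max score = 20).

    An example scoring table is shown below. Each row is an option, listed in the correct order, and each column is the position in which the candidate ranks that option:

    Correct order | 1st | 2nd | 3rd | 4th | 5th
    D | 4 | 3 | 2 | 1 | 0
    E | 3 | 4 | 3 | 2 | 1
    C | 2 | 3 | 4 | 3 | 2
    A | 1 | 2 | 3 | 4 | 3
    B | 0 | 1 | 2 | 3 | 4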

    This methodology makes the item type work well for nuanced situations where no single answer is obviously correct.
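
    The distance-based scoring can be sketched as follows. This is a minimal illustration that assumes each option earns (number of options - 1) points minus the number of positions it is displaced, which is the pattern the example table follows; the function name is hypothetical and this is not the platform's own code.

```python
def cpq_score(correct_order, candidate_order):
    """Score a ranking by how far each option sits from its correct place.

    Assumed rule (inferred from the example table): an option earns
    (n - 1) points minus the number of positions it is displaced.
    """
    if sorted(correct_order) != sorted(candidate_order):
        raise ValueError("candidate must rank exactly the same options")
    n = len(correct_order)
    correct_pos = {option: i for i, option in enumerate(correct_order)}
    return sum(
        (n - 1) - abs(i - correct_pos[option])
        for i, option in enumerate(candidate_order)
    )


correct = ["D", "E", "C", "A", "B"]
print(cpq_score(correct, ["D", "E", "C", "A", "B"]))  # every option in place
print(cpq_score(correct, ["E", "D", "C", "A", "B"]))  # first two swapped
```

    Under this rule a candidate who swaps two adjacent options loses only two points, so near-correct rankings still earn most of the credit.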

  • The extended matching question (EMQ) presents test-takers with a common set of answer options and a series of clinical stems (usually under a single lead-in instruction).

    An example is shown below:

    For each case presentation, select the most likely diagnosis from the list of options. Each option may be selected once, more than once or not at all.

    Presentation 1: Skin has round ring-like patches

    Presentation 2: Skin has scales and red patches that are dry and sometimes painful

    Presentation 3: Skin is itchy, red and inflamed with cracks in areas

    Presentation 4: Skin has areas of burning and tingling sensation with a red blotchy rash

    Answer options:

    1. Eczema
    2. Heat rash
    3. Intertrigo
    4. Psoriasis
    5. Rosacea
    6. Shingles
    7. Stress rash

    This item type is closely related to the basic single best answer item type, and is scored as such (i.e. each stem is scored independently according to the answer selected). However, EMQs provide an opportunity to test greater application of knowledge, and in more complex settings.
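
    Independent per-stem scoring can be sketched as below. The stem and option identifiers are placeholders rather than the answer key for the example above, and the function name is hypothetical.

```python
def score_emq(answer_key, responses):
    """Score each stem on its own: 1 if the selected option matches, else 0.

    `answer_key` and `responses` map a stem identifier to an option
    identifier. Unanswered stems score 0. Illustrative sketch only.
    """
    return {
        stem: int(responses.get(stem) == correct)
        for stem, correct in answer_key.items()
    }


# Placeholder key: the same option may be correct for more than one stem.
key = {"stem_1": "option_2", "stem_2": "option_4", "stem_3": "option_2"}
answers = {"stem_1": "option_2", "stem_2": "option_1"}

per_stem = score_emq(key, answers)
print(per_stem, sum(per_stem.values()))
```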

  • The hotspot style question asks test-takers to identify a position on an image as their answer. For example, you might provide a radiograph and ask the candidate to click at the position of a clinical abnormality.

    The item author is able to draw the acceptable region(s) for the test-taker to click to register their answer. If the coordinate of the test-taker's click falls within that region, the response is evaluated as correct.

    It is also possible to add several recognition regions to any one image and in doing so require that the test-taker provides more than one answer.
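
    The region check can be pictured with a standard point-in-polygon test. This sketch assumes regions are stored as polygons of (x, y) image coordinates and that, with several regions, every region must receive a click; both are assumptions for illustration, as are all the names used.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is `point` inside `polygon` (a list of (x, y))?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the click?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def all_regions_hit(clicks, regions):
    """Assumed rule: each region must contain at least one click."""
    return all(
        any(point_in_polygon(click, region) for click in clicks)
        for region in regions
    )


abnormality = [(0, 0), (10, 0), (10, 10), (0, 10)]   # hypothetical region
second_site = [(20, 20), (30, 20), (25, 30)]         # hypothetical region

print(point_in_polygon((5, 5), abnormality))
print(all_regions_hit([(5, 5)], [abnormality, second_site]))
```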

  • The multipart written question (MWQ) is a compound item type that combines the functionality of two of our most commonly used item types: the single-best answer (SBA) and the short answer question (SAQ), which uses a free text box to collect the test-taker's response.

    This lets you create rich, in-depth, multi-part items that challenge test-takers beyond the simple respond-and-move-on approach.

    An example of this could be:

    Part a: Question stem

    1. Answer option 1
    2. Answer option 2
    3. Answer option 3
    4. Answer option 4
    5. Answer option 5

    Part b: Please explain why you selected the option above.

    In line with the functionality of the individual component question parts, this item type can support a mixture of auto and manual scoring.

  • The multiple acceptable answers (MAA) item type is a variant of the single-best answer (SBA) item type, except that more than one option can be selected.

    An example is shown below:

    Here is the question stem

    Select all that apply

    1. Answer option 1
    2. Answer option 2
    3. Answer option 3
    4. Answer option 4
    5. Answer option 5

    It is also possible to limit the number of options that can be selected, to apply negative scoring, or to combine the two, in order to prevent or discourage test-takers from selecting all of the options.
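
    To show why these two controls discourage "select everything" responses, here is a rough sketch. The scoring rule (one point per correct selection, a configurable deduction per incorrect selection, floored at zero) and every name below are assumptions made for illustration; the article does not specify the exact formula used.

```python
def score_maa(selected, correct, max_selections=None, penalty=0.0):
    """Hypothetical MAA scoring with a selection cap and negative marking."""
    selected = set(selected)
    correct = set(correct)
    if max_selections is not None and len(selected) > max_selections:
        # A cap would normally be enforced in the interface; here we reject.
        raise ValueError("too many options selected")
    hits = len(selected & correct)
    misses = len(selected - correct)
    # Assumed rule: deduct `penalty` per wrong pick, never below zero.
    return max(0.0, hits - penalty * misses)


correct_options = {1, 3}
print(score_maa({1, 3}, correct_options, penalty=1.0))
print(score_maa({1, 2, 3, 4, 5}, correct_options, penalty=1.0))
```

    With a penalty of 1.0, selecting all five options scores no better than leaving the question blank, while a cap of two selections would rule that response out altogether.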

  • The multiple true/false (MTF) question is another variant of the single-best answer (SBA) question, except that in this format stems are presented as a series of grouped questions with preset true or false response options. The item requires a single lead-in, after which the number of stems is unlimited.

    An example is shown below:

    Decide whether the following statements are true or false.

    Part a: Statement 1

    1. True
    2. False

    Part b: Statement 2

    1. True
    2. False

    Part c: Statement 3

    1. True
    2. False

    Each individual statement, although grouped into one item, is treated as a separately scored question and receives its own Angoff or Ebel values. Questions are also equally weighted within the parent item.

  • The prescribing question is a complex item type that is designed to imitate a prescription form for medications. The candidate is presented with a question prompt and asked to fill out several medication fields as they might in a real clinical environment.

    An example is shown below:

    A 59-year-old man has his blood pressure reviewed. He was diagnosed with hypertension 3 years previously, and currently takes amlodipine 10 mg once a day and ramipril 10 mg once a day. He is concordant with his medication. He feels well with no new concerns. His average BP on home recordings over the last two months is 156/94 mmHg. His BMI is 24.7 kg/m². Fundoscopy is normal.

    Investigations

    Sodium 136 mmol/L (135–146)
    Potassium 4.6 mmol/L (3.5–5.3)
    Urea 6.3 mmol/L (2.5–7.8)
    Creatinine 96 µmol/L (60–120)
    Total cholesterol 4.2 mmol/L (<5.0)
    Triglycerides 1.2 mmol/L (<2.3)
    TC:HDL ratio 5.3 (<4.5)
    Fasting glucose 5.5 mmol/L (3.0–6.0)

    Urinalysis: No abnormality

    Please prescribe the most appropriate additional medication.

    Test-takers are presented with fields for medication, dosage, unit, route and frequency, and can select valid values from a list that appears when they start typing. The drug name is not validated.

    The prescribing question requires human marking, which is conducted in the same way as for the VSAQ item described below (including the Levenshtein threshold).

  • The short answer question (SAQ) item type requires the test-taker to provide an open-ended text response to a prompt.

    SAQs always require human marking.

  • The single best answer (SBA) item type is the basic multiple-choice question format available in risr/ assess. As the name suggests, it requires the test-taker to select the single most appropriate response to the question from a list of possible options.

    An example is shown below:

    The genetic predisposition to schizophrenia is usually transmitted as which of the following?

    • A multifactorial trait
    • An autosomal dominant trait
    • An autosomal recessive trait
    • An X-linked dominant trait
    • An X-linked recessive trait

    SBA items are automatically marked in the system, and predetermined candidate feedback can be added as required.

  • The very short answer question (VSAQ) requires test-takers to respond to a prompt with very short answers. The response box is open-ended (i.e. there are no answer options from which to select), and respondents should usually be asked to keep their responses to no more than two or three words on a single line of text.

    An example is shown below:

    What is the most common cause of pancreatitis?

    Acceptable responses (unseen to the test-taker):

    • Alcoholism (0.5)
    • alcohol (1.0)
    • Alcohol abuse (1.0)
    • Gallstones (0.5)

    The VSAQ item supports a mixture of automatic and manual scoring. Matching responses are grouped together and can be evaluated in bulk.

    During authorship, the question writer specifies a list of responses they consider to be acceptable. If the respondent provides one of those exact responses, it is scored automatically and the respondent receives the credit assigned to it. Case sensitivity can be controlled per item and also as a site-level configuration.

    Authors are also able to set a Levenshtein threshold during authorship. The Levenshtein distance between two strings is the minimum number of character changes needed to make them identical. If the test-taker provides a response that is not one of the acceptable responses but is within the Levenshtein threshold of one of them, no credit is given automatically, but the response is marked in amber so that a human marker can evaluate it.

    Responses that are outside of the threshold are marked in red to indicate that they are incorrect. The human marker can adjust this if they consider the response to be acceptable, or they can confirm the automated incorrect outcome.

    When markers score a newly given response as acceptable, it is saved against the available responses in the next version of the item. This means that as the question is used more over time, the pool of acceptable responses increases and less time needs to be spent on the manual marking process.
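
    The triage described above can be sketched as follows, using the example acceptable responses for the pancreatitis question. This is an illustration under stated assumptions rather than the platform's implementation: the function names, the "auto" label for exact matches, the trimming of surrounding whitespace and the choice to treat "within the threshold" as "less than or equal to" are all assumptions.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete a character
                curr[j - 1] + 1,           # insert a character
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def triage(response, accepted, threshold, case_sensitive=False):
    """Return (status, credit) for a response.

    'auto'  -> exact match, credit awarded automatically
    'amber' -> near miss within the threshold, needs a human marker
    'red'   -> outside the threshold, provisionally incorrect
    """
    def norm(text):
        text = text.strip()  # assumption: surrounding spaces are ignored
        return text if case_sensitive else text.lower()

    candidate = norm(response)
    for answer, credit in accepted.items():
        if norm(answer) == candidate:
            return ("auto", credit)
    if any(levenshtein(candidate, norm(a)) <= threshold for a in accepted):
        return ("amber", 0.0)
    return ("red", 0.0)


accepted = {"Alcoholism": 0.5, "alcohol": 1.0,
            "Alcohol abuse": 1.0, "Gallstones": 0.5}

for given in ["Alcohol", "galstones", "smoking"]:
    print(given, triage(given, accepted, threshold=2))
```

    A marker who accepts an amber response such as "galstones" would, as described above, add it to the acceptable responses for the next version of the item, so the same misspelling is scored automatically in future.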
