CritAssess
An online proctored exam tool, built with Angular, that assesses the critical thinking abilities of technical candidates

01
OVERVIEW
Team:
- Ryan Emberling (Developer)
- Tanvi Domadia (Researcher)
- Juliet Pusateri (Product Manager)
- Qianhui Sun (Content Designer)
- Doris Zhang (Designer)
Client: Microsoft
Tools:
- Trello, Slack (Agile workflows)
- Mural (online co-design collaboration)
- Figma, Sketch (prototyping)
- InDesign (layout design)
Duration: Jan. 2019 - Aug. 2019
My Contributions:
- Conducted 50+ SME interviews to locate performance gaps and define training goals
- Facilitated co-design workshops with end users and stakeholders
- Used Agile workflows to communicate research findings and project progress to the client on a weekly basis
- Composed 8 Situational Judgement Test scenarios
- Used a mixture of qualitative and quantitative research methods for validity and reliability checks
- Supported the developer in creating components in Angular
Problem:
How might we assess non-technical skills of technical candidates in an authentic and engaging way?
Our Solution:
CritAssess is a proctored exam tool delivering an hour-long Situational Judgement Test (SJT) of critical thinking skills.
It provides learners with a report written in easy-to-understand language so they can understand their proficiency in critical thinking and improve. It also offers qualified candidates certificates that recruiters can use as a reference.

Scenario-based Exam

Learner Report

Certificate
Our Approach:

Our Impacts:
- Achieved high reliability (Cronbach's alpha = 0.891), showing strong consistency across all test items
- Can potentially be integrated into our client's eLearning platforms, which serve millions of users worldwide
02
FINAL DELIVERABLE

Instructional feature 1
SITUATIONAL JUDGEMENT TEST
Reason 1: Authenticity
“These are exactly the probing questions that the hiring manager wants to hear people talk about.”
—— CS recruiter
Reason 2: Transfer
Doing well in our assessment means that you are likely to perform equally well in similar job settings.
Instructional feature 2
TWO-PART SELECTED RESPONSE ASSESSMENT
Reason 1: Scalability
Our client faces a global audience.
Reason 2: Separate conclusion from reasons
They involve different thinking processes.
01 Research
Literature Review
Competitive Analysis
Stakeholder Interview
Affinity Diagramming
02 Design
Parallel Design
Storyboarding
Card Sorting
A/B Testing
03 Task Development
Stakeholder Interview
Cognitive Task Analysis
Cognitive Interview
04 Quality Assurance
Validity Check
Reliability Check
Usability Testing
Instructional feature 3
FORMATIVE AND SUMMATIVE ASSESSMENT
Reason 1: Balance two types of user needs
Learners want to know where their strengths and weaknesses are and improve with a growth mindset, while recruiters want candidates with good critical thinking skills.


Instructional feature 4
SEGMENTED TASKS
Reason 1: Avoid extraneous cognitive load
A long exam demands substantial essential cognitive processing; segmenting it into shorter tasks helps prevent overload.
Reason 2: Offer learner control
Learners can take the exam at their own pace.
Instructional feature 5
KNOWLEDGE COMPONENT MODELING
Reason 1: More accurate measurement of learning gains
Each question is mapped to several knowledge components (KCs). Credit awarded for each question is also awarded to its KCs.
Reason 2: Targeted feedback
Targeted feedback is given for knowledge components at high, middle, or low performance levels.


Design feature 1
TWO-PANEL DESIGN OF SCENARIO & QUESTIONS
Reason: Spatial contiguity reduces the need to scroll up and down, thus minimizing extraneous cognitive processing
Design feature 2
HOVER-OVER GLOSSARY
Reason 1: Domain-specific terms → authenticity
Reason 2: No prior knowledge required

03
RESEARCH
Research goals
With a broad problem space in mind, our research was guided by the following 4 questions:
- Among all the non-technical skills, which one should we focus on?
- What is the definition of critical thinking?
- How are critical thinking skills currently used in industry and academia, and what are the gaps?
- What are the best practices for assessing non-technical skills?
Stakeholders

Our target users are high school seniors and college freshmen who plan to pursue careers in STEM. This illustration of the ecosystem includes all our stakeholders.
Research methods & process

Secondary Research
- 60+ national / state-level standards on critical thinking / non-technical skills
- 20+ research papers on best practices for assessing critical thinking
- 20+ competitive analyses of existing vendor assessments
Field Research
- 18 contextual inquiries with secondary school students
- 5 interviews with school teachers and career counselors
- 8 interviews with CS employees and tech recruiters
Insights

Q1 Among all the non-technical skills, which one should we focus on?
We conducted an extensive literature review of employment white papers and found critical thinking to be the most essential and hardest-to-hire skill.
Q2 What is the definition of critical thinking?
We reviewed standards on non-technical skills and CS education to develop domain-specific and domain-general definitions of critical thinking skills.
Q3 How are critical thinking skills used in industry and academia, and what are the gaps?
After 32 stakeholder interviews, we compared and contrasted the ways critical thinking skills are currently used and assessed in computer science careers versus secondary school.
We summarized performance gaps using the gap model and decided to focus our assessment on open-mindedness, analysis, and reflection.
Insight 1:
Critical thinking is vital to employability in computer science careers.
Insight 2:
Students don’t receive explicit instruction defining critical thinking skills and consequently don’t know what it means.
Insight 3:
Students are impulsive and have difficulty generating & evaluating ideas with an open mind.
Insight 4:
Students rarely use reflection to iterate on their work or process.
Q4 What are the best practices to assess non-technical skills?

Trend 1: Stealth Assessment / Gamification
Use data gathered continuously through gameplay to assess learners' mastery levels
Trend 2: Computer-supported Collaborative Learning
A computer agent / chatbot fosters meaningful online discussion by providing scaffolded prompts
Trend 3: Automated Essay Grading
Use NLP to parse text and extract micro features that can be mapped to macro features
Trend 4: Situational Judgement Test
Embed assessment in an authentic, work-related context; most commonly used in medical training
Final Design Problems
1. HMW embed the assessment of open-mindedness in an engaging and interactive experience?
2. HMW help students to discover the critical thinking skills most integral to their employability roadmap?
3. HMW assess secondary school students' ability to reason, evaluate, and communicate with an open mind in collaborative decision making?
4. HMW assess secondary school students' ability to use feedback and self-examination to reflect on their work and iteratively improve themselves?
04
IDEATION
Brainstorm
Our team ran a series of ideation sprints using different brainstorming techniques. This helped us generate a sufficiently wide range of ideas while the insights from the field study were still fresh in our memory.

Define Success Metrics
We proposed a list of evaluation criteria and ran a card-sorting activity with our client to better understand their priorities.

Validate
With each team member creating 3-5 storyboards, 23 storyboards were created in total. The team voted and combined similar ideas, consolidating them into 6 storyboards. We did speed dating with 6 potential users (CMU STEM students) and 3 teachers, then iterated further into 4 final storyboards. I produced the storyboards digitally, and we evaluated each idea against the evaluation metrics.

Idea 1:

Problem it solves:
- The use of open-mindedness and fair-mindedness is limited in secondary school, yet integral to computer science careers
Competitive edge:
- Students love games
- Adaptive model
Implementation requirement:
- Educational game design
- Selection of target critical thinking sub-skills
- Playtesting to inform scenario design

Idea 2:

Problem it solves:
- Students don't know what critical thinking skills are valued in the workforce
- Current assessments of critical thinking skills are rarely embedded in an authentic context
Competitive edge:
- Scalability in a global online learning environment
- Authenticity in career preparation
Implementation requirement:
- A database of critical thinking sub-skills valued by recruiters for different jobs
- Situational judgement tasks designed for each type of job
- NLP that extracts micro features from students' open-ended responses
- Rubrics that map micro features to macro features (critical thinking sub-skills)
Idea 3:

Problem it solves:
- Weighing options is missing in secondary schools
- Students don't reason in an open-minded manner
Competitive edge:
- Teachers love it the most
Implementation requirement:
- Natural language processing for grading skills in the chat
- Access to Carolyn Rose's work with the chatbot

Idea 4:

Problem it solves:
- Students seldom have the chances for authentic
- Students don't reason in an open-minded manner
Competitive edge:
- Authentic video communication
Implementation requirement:
- Database & web server
- Algorithm for producing a weighted-average grade based on peer grades
05
VALIDATION
Our goals:
- Determine which idea our client and academic mentors liked most, to settle on a broad direction
- Determine which parts of a given idea our client liked or disliked, to combine detailed features
Our approach:
- Research presentation
- Metric evaluation
- Card sorting
Result:
After the co-design session and a series of follow-up debrief meetings, our client decided to proceed with a modified version of Idea 2. We decided to use multiple-choice questions as our delivery format.
What I contributed:

- I presented related EdTech research so our client could make a well-informed decision
- I proposed the idea (in an adapted version) that our client ultimately decided to move forward with
Process:
To facilitate an online co-design activity, we collaborated on a Mural board. We classified the features of all 4 proposed ideas into instructional features, technical features, and assessment goals.
The thumbs-up icons to the left of each storyboard indicate the number of participants who liked that idea.
We used color coding to distinguish opinions from different parties: pink for our team members, green for our client and professors.

After a series of debrief meetings that our client had internally, we decided that the final product would have the following 3 features:

Situational Judgement Test
Our client wants the assessment to specifically target CS employees; however, all existing vendor assessments are domain-general. A situational judgement test uses an authentic work context to improve learning transfer.
Scenario-based

Learner Report & Certification
Our client wants the assessment to be both formative and summative. Therefore, we provide a learner report so learners can improve with a growth mindset, as well as a certificate for those who score above a threshold.
Formative & Summative

Multiple Choice Question
Our client wants the assessment to be scalable. They prefer closed-form to open-ended questions, and they prioritize feasibility over novelty. Therefore, we ultimately went with multiple-choice questions.
Closed-form
06
PROTOTYPING
Wireframe
Onboarding


Exam


Formative Feedback - Learner-facing


Summative Feedback - Recruiter-facing


User Feedback
We first iterated on our prototype based on first-round user testing results. The main iterations from the low-fi to the mid-fi prototype were:
1. Provided three 20-minute exam sections
2. Allowed users to check exam progress and time
3. Moved to a left-right layout of test questions
When validating our design ideas, we used a testing structure similar to the low-fi prototype user testing. We recruited three users with experience in user experience design and gave each of them three main tasks:
- Start an exam
- Go through the exam
- Check and interpret exam results
We observed and recorded their performance and asked for specific feedback after they finished the tasks. Findings include:
- Users want to see general instructions at the beginning of the test
- Users hope the scenarios and questions can be shorter and easier to read on-screen
- Users do not understand technical terms and want an easy way to check a glossary
- Users want a confirm button during the exam to prevent user error
- Users want to see in a dashboard how many sections they have completed
Hi-fi Iterations
01. Add the word 'correct' to the learner report to remove ambiguity

Before

After
02. Increase the size of the trophy and test-history selection to give a clearer visual indication of completion

Before

After
03. Change the position of explanatory feedback to improve readability and visual hierarchy

After

Before
04. Redesign the document layout to improve readability and reduce extraneous cognitive load

Before

After
Final Design
Onboarding


Exam-taking


Exam Review


Dashboard & Report


07
TASK DEVELOPMENT
Process Overview:

Goal Analysis:
From our spring research, we identified our learning goals as reasoning, prioritization, and open-mindedness: the most essential yet most lacking critical thinking sub-skills.
We then divided those learning goals into knowledge components, and further categorized them as big ideas, important to know / do, and worth being familiar with.

Assessment Item Design:
To make our assessment authentic, we interviewed technical recruiters and CS employees to see how critical thinking is currently used in their daily work. We specifically probed for 3 things in our interviews:
- Open-mindedness: understanding colleagues' perspectives or different points of view
- Reasoning: planning and prioritizing work for the day, week, and month
- Analysis: finding the root cause of a problem
Regarding specific questions, we explicitly asked tech recruiters to list scenarios in which tech employees need to:
- Understand their colleagues' perspectives
- Weigh pros and cons to choose the best option
- Plan and prioritize their work for the day, the week, and the month
- Find the root cause of a problem
We adapted the scenarios that we gathered through interviews into 21 scenarios.
Goal-question Mapping:

We mapped knowledge components to each question. Three people performed this mapping independently to establish inter-rater reliability.
Scoring Principles:
1. Each question contributes equally to the overall score.
2. If the learner answers a question correctly, all the knowledge components assessed in that question are awarded corresponding credit.
3. Each item in a select-all-that-apply question is treated as a separate, independent true-or-false question.
4. In a select-all-that-apply question, there is no penalty for incorrect answers; the lowest score one can get for a question is zero.
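As a concrete illustration, the four principles above could be implemented roughly as follows. This is a minimal sketch, not the team's actual code; the question IDs, KC names, and data structures are hypothetical.

```python
def score_question(selected, correct_items, all_items):
    """Principles 3-4: each item in a select-all-that-apply question is
    an independent true/false judgement, so the floor is zero."""
    hits = sum(1 for item in all_items
               if (item in selected) == (item in correct_items))
    return hits / len(all_items)  # fraction of items judged correctly

def score_exam(responses, answer_key, kc_map):
    """Principles 1-2: every question carries equal weight, and a
    question's credit is also awarded to every KC it assesses."""
    per_question, kc_credits = [], {}
    for qid, (correct_items, all_items) in answer_key.items():
        credit = score_question(responses.get(qid, set()),
                                correct_items, all_items)
        per_question.append(credit)
        for kc in kc_map[qid]:  # propagate the question's credit to its KCs
            kc_credits.setdefault(kc, []).append(credit)
    overall = sum(per_question) / len(per_question)
    kc_scores = {kc: sum(c) / len(c) for kc, c in kc_credits.items()}
    return overall, kc_scores
```

Averaging a question's credit into every KC it maps to is what later lets the learner report rank KCs independently of individual questions.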
Interpretation:
After finishing all three parts of the assessment, learners receive a report of their three highest-scoring and three lowest-scoring knowledge components. Creating this learner-centered performance report takes 3 steps.
STEP 1. Map each question to different knowledge components.
STEP 2. Create a scoring rubric by referencing existing critical thinking rubrics.
STEP 3. Translate the rubric into a learner-interpretable report.
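The report-building step could be sketched like this; the KC names and feedback strings are invented for illustration, and the real report draws its language from the rubric referenced above.

```python
def build_learner_report(kc_scores, feedback_bank):
    """Rank KCs by score and report the three strongest and three
    weakest, each paired with plain-language feedback."""
    ranked = sorted(kc_scores, key=kc_scores.get, reverse=True)
    return {
        "strengths": [(kc, feedback_bank[kc]["high"]) for kc in ranked[:3]],
        "areas_for_improvement": [(kc, feedback_bank[kc]["low"]) for kc in ranked[-3:]],
    }
```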
08
QUALITY ASSURANCE

Validity Check

Reliability Check
Validity Check: Target Learners
We conducted empirical cognitive task analysis with students aged 13-19 at the Microsoft Office Specialist (MOS) U.S. National Championship in Orlando, Florida (June 17-19, 2019). As a result, we removed all questions related to coding / pseudo-code and added an explanation to the scenario prompt explicitly asking students not to judge based on prior knowledge of technical features or implementation.
We conducted cognitive interviews with CMU undergraduates in STEM majors to make sure students understood the tasks as intended.
Validity Check: Academic Experts
We asked academic experts whether our division into knowledge components makes sense, whether we are assessing what we intend to assess, whether our scoring rubric is sound, and so on.
Validity Check: Recruiter
We asked recruiters to go through the tasks and give feedback on whether they find them authentic and whether they would ask those questions in a job interview.
Reliability Check: Target Learners
Goal:
- Check the reliability of test items, and delete or rephrase those that impede overall reliability
- Award partial credit to items where this improves overall reliability
Process:
We conducted two rounds of quantitative analysis. The first round was run with computer science graduate students. The second round comprised two-thirds STEM undergraduates (computer science, software engineering, electrical and computer engineering, robotics, etc.) and one-third Amazon Mechanical Turk masters aged 18-25 who were US high school graduates.
Iterations:
In response to our reliability check results, we made three types of changes to our questions: deletion (of a scenario, question, or item), rephrasing, or rescoring. The improvement in reliability is visualized in the chart below: red indicates problematic items, while green indicates healthy results.
Result:
Reached high reliability, showing strong consistency across all test items (Cronbach's alpha = 0.891)
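For reference, Cronbach's alpha is computed from the per-item score variances and the variance of respondents' total scores. A self-contained sketch of the standard formula follows; the function and any sample data are illustrative, not our actual analysis code or responses.

```python
def cronbach_alpha(scores):
    """scores: one row per respondent, one column per test item.
    alpha = k/(k-1) * (1 - sum(item variances) / variance of totals)."""
    k = len(scores[0])                      # number of items
    def var(xs):                            # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

Values near 1 indicate that items covary strongly (respondents who do well on one item do well on the others), which is why 0.891 counts as high internal consistency.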

09
REFLECTION
What I learnt:
Workflow
- Multi-tasking: While instructional designers are still generating scenarios, designers can start creating multimedia assets. While others are proofreading, I can generate more assets.
- Agile: Check in with the client and teammates quickly, before heading in the wrong direction for too long.
- Version Control: Keep good version control of assessment designs. Before making changes to current work (e.g. after rounds of cognitive interviews), download or copy it. It is also good practice to keep a standalone changelog of dates and major changes.
- Documentation: Keep a quote bank of important or striking quotes from interviews.
- Field Guide: Write a field guide before conducting interviews so that different interviewers share the same focus.
Work with Client & SME
- Provide consultancy backed by research: help clients scope the problem / solution space, and push back on their requests when warranted.
- SME checklist: Organizational structure, needs, and constraints may change over time; an Agile workflow ensures timely check-ins.
What I could have done differently:
Plan recruitment ahead of time. Although we were able to find the best available proxies to test the design with, it would have been better to test with actual users.