
CritAssess

An online proctored exam tool, built with Angular, that assesses the critical thinking abilities of technical candidates

homepage.png

 

01

OVERVIEW

Team:

  • Ryan Emberling (Developer)

  • Tanvi Domadia (Researcher)

  • Juliet Pusateri (Product Manager)

  • Qianhui Sun (Content Designer)

  • Doris Zhang (Designer)

Client: Microsoft

Tools:

  • Trello, Slack (Agile workflows)

  • Mural board (Online Co-design Collaboration)

  • Figma, Sketch (Prototype)

  • InDesign (Layout Design)

 

Duration: Jan. 2019 - Aug. 2019

My Contributions:

  • Conducted 50+ SME interviews to locate performance gaps and decide training goals

  • Facilitated co-design workshops with end users and stakeholders

  • Used Agile workflows to communicate research findings and project progress to the client on a weekly basis

  • Composed 8 Situational Judgement Test scenarios

  • Used a mixture of qualitative and quantitative research methods for validity and reliability checks

  • Supported the developer in building components in Angular

 

Problem:

How might we assess non-technical skills of technical candidates in an authentic and engaging way?

Our Solution:

CritAssess is a proctored exam tool that delivers an hour-long Situational Judgement Test (SJT) of critical thinking skills.

It provides learners with a report, written in easy-to-understand language, that helps them understand their proficiency in critical thinking and improve. It also offers qualified candidates certificates that recruiters can use as a reference.

Scenario Assessment.png

Scenario-based Exam

report.png

Learner Report

Certificate.jpg

Certificate

Our Impacts:

  • Achieved high reliability (Cronbach's alpha = 0.891), showing high consistency across all test items.

  • Potential to be integrated into our client's eLearning platforms, which serve millions of users worldwide.

Final Deliverable

02

FINAL DELIVERABLE

SJT.png

Instructional feature 1

SITUATIONAL JUDGEMENT TEST

 

Reason 1: Authenticity

“These are exactly the probing questions that the hiring manager wants to hear people talk about.”

—— CS recruiter

 

Reason 2: Transfer

Doing well in our assessment means that you are very likely to do equally well in similar job settings.

Instructional feature 2

TWO-PART SELECTED RESPONSE ASSESSMENT

 

Reason 1: Scalability 

Our client faces a global audience.

Reason 2: Separating conclusions from reasons

Conclusions and reasons involve different thinking processes.

01 Research

Literature Review

Competitive Analysis

Stakeholder Interview

Affinity Diagramming

02 Design

Parallel Design

Storyboarding

Card Sorting

A/B Testing

03 Task Development

Stakeholder Interview

Cognitive Task Analysis

Cognitive Interview

04 Quality Assurance

Validity Check

Reliability Check

Usability Testing

Instructional feature 3

FORMATIVE AND SUMMATIVE ASSESSMENT

 

Reason 1: Balance two types of user needs

Learners want to know where their strengths and weaknesses are and to improve with a growth mindset, while recruiters want candidates with strong critical thinking skills.

Group 2.png
dashboard.png

Instructional feature 4

SEGMENTED TASKS

 

Reason 1: Avoid extraneous cognitive load

A long exam demands substantial essential cognitive processing; segmenting it into shorter tasks keeps extraneous load low.

Reason 2: Offer learner control 

Learners can take the exam at their own pace.

Instructional feature 5

KNOWLEDGE COMPONENT MODELING

Reason 1: More accurate measurement of learning gains

Each question is mapped to several knowledge components (KCs). Credit awarded for a question is also awarded to its KCs.

Reason 2: Targeted feedback

Targeted feedback is given for knowledge components with high, medium, or low performance.

KC.png
twoPanel.png

Design feature 1

TWO-PANEL DESIGN OF SCENARIO & QUESTIONS

 

Reason: Spatial contiguity reduces the need to scroll up and down, minimizing extraneous cognitive processing

Design feature 2

HOVER-OVER GLOSSARY

 

Reason 1: Domain-specific terms → authenticity

Reason 2: No prior knowledge required

glossary.png

03

RESEARCH

Research goals

With a broad problem space in mind, our research was guided by the following four questions:

  • Among all the non-technical skills, which one shall we focus on? 

  • What is the definition of critical thinking?

  • How are critical thinking skills used in industry and academia today, and what are the gaps?

  • What are the best practices to assess non-technical skills?

Stakeholders

ecosystem.png

Our target users are high school seniors and college freshmen who plan to pursue careers in STEM. This illustration of the ecosystem includes all our stakeholders.

Research methods & process 

Group.png

Secondary Research

  • 60+ national / state-level standards of critical thinking / non-technical skills

  • 20+ studies on best practices for assessing critical thinking

  • 20+ competitive analyses of existing vendor assessments

Field Research

  • 18 contextual inquiries with secondary school students

  • 5 interviews with school teachers and career counselors

  • 8 interviews with CS employees and tech recruiters

Insights

Word Cloud.png

Q3 How are critical thinking skills used in industry and academia, and what are the gaps?

After 32 stakeholder interviews, we compared and contrasted the ways critical thinking skills are currently used and assessed in the context of computer science careers and secondary school.

 

We summarized performance gaps using the gap model and decided to focus our assessment on open-mindedness, analysis, and reflection.

assets2.png

Q1 Among all the non-technical skills, which one shall we focus on? 

We conducted an extensive literature review of employment white papers and found critical thinking to be the most essential and hardest-to-hire skill.

Q2 What is the definition of critical thinking?

We reviewed standards on non-technical skills and CS education to produce domain-specific and domain-general definitions of critical thinking skills.

Insight 1:

Critical thinking is vital to employability in computer science careers.

Insight 2:

Students don’t receive explicit instruction defining critical thinking skills and consequently don’t know what it means.

Insight 3:

Students are impulsive and have difficulty generating & evaluating ideas with an open mind.

Insight 4:

Students rarely use reflection to iterate on their work or process.

Q4 What are the best practices to assess non-technical skills?

Word Cloud 2.png

Trend 1: Stealth Assessment / Gamification

Use data continuously gathered through game playing to assess learners' mastery levels 

Trend 2: Computer-supported Collaborative Learning

A computer agent / chatbot fosters meaningful online discussion by providing scaffolded prompts

Trend 3: Automated Essay Grading

Use NLP to parse text and extract micro-features that can be mapped to macro-features

Trend 4: Situational Judgement Test

Embed assessment in an authentic, work-related context, most commonly used in medical training.

Final Design Problems

1. HMW embed the assessment of open-mindedness in an engaging and interactive experience?

2. HMW help students to discover the critical thinking skills most integral to their employability roadmap?

3. HMW assess secondary school students' ability to reason, evaluate, and communicate with an open mind in collaborative decision making?

4. HMW assess secondary school students' ability to use feedback and self-examination to reflect on their work and iteratively improve themselves?

04

IDEATION

Brainstorm 

Our team ran a series of ideation sprints using different brainstorming techniques. This helped us generate a wide range of ideas while the insights from the field study were still fresh in our memory.

Gallery Crawl.jpg
Ideation & Validation

Define Success Metrics

We proposed a list of evaluation criteria and did a card sorting activity with our client to better understand their priorities. 

Evaluation.png

Validate

With each team member creating 3-5 storyboards, we produced 23 storyboards in total. The team voted and combined similar ideas, consolidating them into 6 storyboards. We ran speed-dating sessions with 6 potential users (CMU STEM students) and 3 teachers, then iterated the ideas into 4 final storyboards. I produced the storyboards digitally, and we evaluated each idea against our evaluation metrics.

Evaluation Metrics.png

Idea 1:

Screen Shot 2019-10-21 at 16.26.44.png

Idea 2:

Screen Shot 2019-10-21 at 16.27.14.png

Problem it solves:

  • The use of open-mindedness and fair-mindedness is limited in secondary school, yet it is integral to computer science careers

Competitive edge:

  • Students love games

  • Adaptive model

Implementation requirement:

  • Educational Game Design

  • Selection of target critical thinking sub-skills

  • Playtesting to inform scenario design

Problem it solves:

  • Students don’t know what critical thinking skills are valued in the workforce.

  • Current assessment of critical thinking skills is rarely embedded in an authentic context.

Competitive edge:

  • Scalability in a global online learning environment  

  • Authenticity in career preparation

Implementation requirement:

  • A database of critical thinking sub-skills valued by recruiters for different jobs 

  • Situational judgement tasks specially designed for each type of job

  • NLP that extracts micro features from students’ open-ended response 

  • Rubrics that map micro features to macro features (critical thinking sub-skills)

Idea 3:

Screen Shot 2019-12-15 at 08.44.13.png

Idea 4:

Screen Shot 2019-12-15 at 08.44.21.png

Problem it solves:

  • Weighing options is missing in secondary schools 

  • Students don’t reason in an open-minded manner

Competitive edge:

  • Teachers love it the most

Implementation requirement:

  • Natural Language Processing for grading skills in the chat

  • Access to Carolyn Rose’s work with the chatbot 

Problem it solves:

  • Students seldom have the chance to practice authentic communication

  • Students don’t reason in an open-minded manner

Competitive edge:

  • Authentic video communication

Implementation requirement:

  • Database & Web server

  • Algorithm for producing weighted average grade based on peer grades

05

VALIDATION

Our goals:

  • Determine which idea our client / academic mentors like the most, to settle on an overall direction

  • Identify which parts of a particular idea our client likes and which they don't, to combine detailed features

Our approach:

  • Research presentation 

  • Metric evaluation

  • Card Sorting

Result:

After the co-design session and the series of debrief meetings that followed, our client decided to proceed with a modified version of Idea 2. We decided to use multiple-choice questions as our delivery format.

What I contributed:

  • I presented related EdTech research so that our client could make a well-informed decision

  • I proposed the idea that our client ultimately decided to move forward with (in an adapted form)

Validation

Process:

To facilitate an online co-design activity, we used Mural Board to collaborate. We classified the features of all 4 proposed ideas into instructional features, technical features and assessment goals.

The thumbs-up to the left of each storyboard indicates the number of participants who liked that idea.

We used color coding to distinguish opinions from different parties: pink for our team members, green for our client / professors.

Co-design.png

After a series of debrief meetings that our client had internally, we decided that the final product would have the following 3 features:

Scenario Assessment.png

Situational Judgement Test

Our client wants the assessment to specifically target CS employees, but all existing vendor assessments are domain-general. A situational judgement test uses an authentic work context to improve learning transfer.

Scenario-based 

report.png

Learner Report & Certification

Our client wants the assessment to be both formative and summative. Therefore, we provide a learner report for learners to improve with a growth mindset, as well as a certificate for those whose scores pass a threshold.

Formative & Summative

MCQ.png

Multiple Choice Question

Our client wants the assessment to be scalable. They prefer closed-form to open-ended questions and prioritize feasibility over novelty. Therefore, we decided to go with multiple-choice questions.

Closed-form

06

PROTOTYPING

Wireframe

Prototyping

Onboarding

Homepage.png
learner-test.png

Exam

Test page V2 copy.png
Test page V2.png

Formative Feedback - Learner-facing

learner-statistics.png
Formative.png

Summative Feedback - Recruiter-facing

recruiter copy.png
recruiter.png

User Feedback

We first iterated on our prototype based on the first round of user testing results. The main iterations from the low-fi to the mid-fi prototype were:

1. Split the exam into three 20-minute sections
2. Allowed users to check exam progress and remaining time
3. Moved to a left-right layout of test questions


When validating our design ideas, we used a testing structure similar to the low-fi prototype user testing. We recruited three users with experience in user experience design and gave each of them three main tasks:

  1. Start an exam

  2. Go through the exam

  3. Check and interpret exam results

We observed and recorded their performance and asked for specific feedback after they finished the tasks. Findings included:

  1. Users want to see general instructions at the beginning of the test

  2. Users hope the scenarios and questions can be shorter and easier to read on-screen

  3. Users do not understand technical terms and want to have an easy way to check a glossary

  4. Users want to have a confirm button during the exam to prevent user error

  5. Users want to see how many sections they have completed in a dashboard

Hi-fi Iterations

01. Add the word 'correct' to the learner report to remove ambiguity

DashboardV1.png

Before

DashboardV2.png

After

02. Increase the size of trophy and test history selection to give clearer visual indication of completion

TestHistoryV1.png

Before

TestHistoryV2.png

After

03. Change the position of explanatory feedback to improve readability and visual hierarchy

Review V1.jpg

Before

ReviewV2.png

After

04. Applied document design principles to improve readability and reduce extraneous cognitive load

S1Q1.jpg

Before

S1Q1V2.png

After

Final Design

Onboarding

Homepage - simple version.png
Homepage - simple version (1).png
Instruction.png

Exam-taking

S1Q1.png
S1Q1 (1).png
S1Q2.png

Exam Review

S1Q1-review direction.png
S1Q2-review.png

Dashboard & Report

Dashboard-overview-finish all.png
Dashboard-certificates.png
Dashboard-certificates (1).png

07

TASK DEVELOPMENT

Task Development

Process Overview:

Process.png

Goal Analysis:

From spring research, we identified our learning goals to be reasoning, prioritization, and open-mindedness, which are the most essential yet lacking critical thinking sub-skills.

We then further divided those learning goals into knowledge components and categorized them as big ideas, important to know / do, or worth being familiar with.

Goals.png

Assessment Item Design:

To make our assessment authentic, we interviewed technical recruiters and CS employees to see how critical thinking is used in their daily work. We specifically probed for three things in our interviews:

  • Open-mindedness - Understanding colleagues’ perspectives or different points of views

  • Reasoning - Planning and prioritizing work for the day, week and month

  • Analysis - Finding the root cause of a problem

Regarding specific questions, we explicitly asked tech recruiters to list scenarios in which tech employees need to

  • Understand their colleagues’ perspectives

  • Weigh pros and cons to choose the best option

  • Plan and prioritize their work for the day, the week, and the month

  • Find the root cause of a problem

We adapted the scenarios that we gathered through interviews into 21 scenarios.

Goal-question Mapping:

KC Mapping.png

We mapped knowledge components to each question; three people performed the mapping to establish inter-rater reliability.

Scoring Principles:

1. Each question contributes equally to the overall score.

2. If the learner gets a question correct, all the knowledge components assessed in that question would be awarded corresponding credits.

3. Each item from a select-all-that-apply question is treated as a separate, independent true-or-false question.

4. In a select-all-that-apply question there is no penalty for incorrect answers; the lowest score one can get for a question is zero.
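The four scoring principles above can be sketched in code. The following is a minimal, illustrative Python sketch, assuming a simple dictionary representation of questions; the function and field names are hypothetical, not the production implementation:

```python
def score_select_all(selected, correct_set, all_items):
    """Grade each item in a select-all-that-apply question as an independent
    true/false judgement (principle 3); the floor is zero (principle 4)."""
    judged_right = sum(1 for item in all_items
                       if (item in selected) == (item in correct_set))
    return judged_right / len(all_items)  # fraction of items judged correctly


def score_exam(questions, responses):
    """Every question weighs equally in the overall score (principle 1);
    credit earned on a question also flows to each knowledge component (KC)
    it assesses (principle 2)."""
    total = 0.0
    kc_scores = {}  # KC name -> (earned credit, possible credit)
    for q, resp in zip(questions, responses):
        credit = score_select_all(resp, q["correct"], q["items"])
        total += credit
        for kc in q["kcs"]:
            earned, possible = kc_scores.get(kc, (0.0, 0.0))
            kc_scores[kc] = (earned + credit, possible + 1.0)
    overall = total / len(questions)
    kc_report = {kc: e / p for kc, (e, p) in kc_scores.items()}
    return overall, kc_report
```

The per-KC report produced this way is what drives the top-three / bottom-three knowledge components shown in the learner report.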

Interpretation:

After finishing all three parts of the assessment, learners receive a report of their top three and bottom three performing knowledge components, determined by the scores they received. There are three steps to creating a learner-centered performance report.

 

STEP 1. Map each question to different knowledge components.

STEP 2. Create a scoring rubric by referencing existing critical thinking rubrics.

STEP 3. Translate rubric into learner-interpretable learner report.

08

QUALITY ASSURANCE

Quality Assurance
Validity Check.jpg

Validity Check

Reliability Check.jpg

Reliability Check

Validity Check: Target Learners

We conducted empirical cognitive task analysis with students aged 13-19 during the Microsoft Office Specialist (MOS) U.S. National Championship in Orlando, Florida, June 17-19, 2019. Based on the results, we removed all questions related to coding / pseudo-code and added explanations to the scenario prompts explicitly asking students not to judge based on prior knowledge of technical features or implementation.

We conducted cognitive interviews with CMU undergraduates in STEM majors to make sure students understood the tasks as intended.

(Photos from the cognitive task analysis and interview sessions)

Validity Check: Academic Experts

We asked academic experts whether the way we divide knowledge components makes sense, whether we are assessing things that we intend to assess, whether our scoring rubric makes sense, etc.

Validity Check: Recruiter

We asked recruiters to go through the tasks and provide feedback on whether they find the tasks authentic / whether they are going to ask those questions during a job interview. 

Reliability Check: Target Learners

Goal:

  • Check the reliability of test items and delete or rephrase those that impede overall reliability

  • Award partial credit to items where doing so improves overall reliability

Process:

We conducted two rounds of quantitative analysis. The first round was with computer science graduate students. The second round comprised two-thirds STEM undergraduates (computer science, software engineering, electrical and computer engineering, robotics, etc.) and one-third Amazon Mechanical Turk masters, ages 18-25, who were US high school graduates.

Iterations:

In response to our reliability check results, we made three types of changes to our questions: deletion (of a scenario, question, or item), rephrasing, or rescoring. The improvement in reliability is visualized in the chart below: red indicates problematic items, while green indicates healthy results.

Result:

Reached high reliability, showing high consistency across all test items (Cronbach's alpha = 0.891)

Screen Shot 2019-08-06 at 17.14.02.png
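For reference, Cronbach's alpha is k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. A minimal Python sketch of the computation, assuming scores laid out as one list per test item (illustrative only, not our actual analysis script):

```python
def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def cronbach_alpha(item_scores):
    """item_scores: one list per test item, each holding one score per
    respondent. Returns alpha = k/(k-1) * (1 - sum(item var) / var(totals))."""
    k = len(item_scores)
    respondents = list(zip(*item_scores))       # rows = respondents
    totals = [sum(row) for row in respondents]  # each respondent's total score
    item_var_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))
```

Alpha near 1 means respondents who score high on one item tend to score high on the others, which is the internal-consistency claim behind the 0.891 result.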

09

REFLECTION

Reflection

What I learnt: 

Workflow

  1. Multi-tasking: While instructional designers were still generating scenarios, designers could start creating multimedia assets. While others were proofreading, I could generate more assets.

  2. Agile: Check in with the client / teammates quickly before heading in the wrong direction for too long.

  3. Version Control: Keep good version control of assessment designs. Before making changes to current work (e.g. after rounds of cognitive interviews), download / copy it. It is also good practice to keep a standalone document recording dates & major changes.

  4. Documentation: Keep a quote bank of important / fantastic quotes from interviews. 

  5. Field Guide: Write a field guide before interviews so that everyone goes in with the same focus in mind.

Work with Client & SME

  1. Provide consultancy backed by research: help clients scope the problem / solution space, and push back on their requests when warranted.

  2. SME checklist: Organizational structure, needs, and constraints change constantly over time. An Agile workflow ensures timely check-ins.

What I could have done differently:

 

Plan recruitment ahead of time. Although we found good proxies to test our design with, it would have been better to test with actual users.
