CritAssess
An online proctored exam tool, built with Angular, that assesses the critical thinking abilities of technical candidates

01
OVERVIEW
Team:
- Ryan Emberling (Developer)
- Tanvi Domadia (Researcher)
- Juliet Pusateri (Product Manager)
- Qianhui Sun (Content Designer)
- Doris Zhang (Designer)
Client: Microsoft
Tools:
- Trello, Slack (Agile workflows)
- Mural (online co-design collaboration)
- Figma, Sketch (prototyping)
- InDesign (layout design)
Duration: Jan. 2019 - Aug. 2019
My Contributions:
- Conducted 50+ SME interviews to locate performance gaps and define training goals
- Facilitated co-design workshops with end users and stakeholders
- Used Agile workflows to communicate research findings and project progress to the client on a weekly basis
- Composed 8 Situational Judgement Test scenarios
- Used a mixture of qualitative and quantitative research methods for validity and reliability checks
- Supported the developer in creating components in Angular
Problem:
How might we assess non-technical skills of technical candidates in an authentic and engaging way?
Our Solution:
CritAssess is a proctored exam tool delivering an hour-long Situational Judgement Test (SJT) of critical thinking skills.
It provides learners with a report written in easy-to-understand language so they can understand their proficiency in critical thinking and improve. It also offers qualified candidates certificates that recruiters can use as a reference.

Scenario-based Exam

Learner Report

Certificate
Our Approach:

Our Impacts:
- Achieved high reliability (Cronbach's alpha = 0.891), showing strong consistency across all test items
- Can potentially be integrated into our client's eLearning platforms, which serve millions of users worldwide
02
FINAL DELIVERABLE

Instructional feature 1
SITUATIONAL JUDGEMENT TEST
Reason 1: Authenticity
“These are exactly the probing questions that the hiring manager wants to hear people talk about.”
—— CS recruiter
Reason 2: Transfer
Doing well in our assessment means that you are likely to perform equally well in similar job settings.
Instructional feature 2
TWO-PART SELECTED RESPONSE ASSESSMENT
Reason 1: Scalability
Our client faces a global audience.
Reason 2: Separate conclusion from reasons
They involve different thinking processes.
01 Research
Literature Review
Competitive Analysis
Stakeholder Interview
Affinity Diagramming
02 Design
Parallel Design
Storyboarding
Card Sorting
A/B Testing
03 Task Development
Stakeholder Interview
Cognitive Task Analysis
Cognitive Interview
04 Quality Assurance
Validity Check
Reliability Check
Usability Testing
Instructional feature 3
FORMATIVE AND SUMMATIVE ASSESSMENT
Reason 1: Balance two types of user needs
Learners want to know where their strengths and weaknesses are and improve with a growth mindset, while recruiters want candidates with good critical thinking skills.


Instructional feature 4
SEGMENTED TASKS
Reason 1: Avoid extraneous cognitive load
A long exam demands substantial essential cognitive processing; segmenting it into shorter tasks helps prevent overload.
Reason 2: Offer learner control
Learners can take the exam at their own pace.
Instructional feature 5
KNOWLEDGE COMPONENT MODELING
Reason 1: More accurate measurement of learning gains
Each question is mapped to several knowledge components (KCs). Credit awarded for each question is also awarded to its KCs.
Reason 2: Targeted feedback
Targeted feedback is given for knowledge components at high, middle, or low performance levels.


Design feature 1
TWO-PANEL DESIGN OF SCENARIO & QUESTIONS
Reason: Spatial contiguity reduces the need to scroll up and down, thus minimizing extraneous cognitive processing
Design feature 2
HOVER-OVER GLOSSARY
Reason 1: Domain-specific terms → authenticity
Reason 2: No prior knowledge required

03
RESEARCH
Research goals
With a broad problem space in mind, our research was guided by the following 4 questions:
- Among all the non-technical skills, which one should we focus on?
- What is the definition of critical thinking?
- How are critical thinking skills currently used in industry and academia, and what are the gaps?
- What are the best practices for assessing non-technical skills?
Stakeholders

Our target users are high school seniors and college freshmen who plan to pursue careers in STEM. This illustration of the ecosystem includes all our stakeholders.
Research methods & process

Secondary Research
- 60+ national / state-level standards on critical thinking / non-technical skills
- 20+ research papers on best practices for assessing critical thinking
- 20+ competitive analyses of existing vendor assessments
Field Research
- 18 contextual inquiries with secondary school students
- 5 interviews with school teachers and career counselors
- 8 interviews with CS employees and tech recruiters
Insights

Q1 Among all the non-technical skills, which one should we focus on?
We conducted an extensive literature review of employment white papers and found critical thinking to be the most essential and hardest-to-hire skill.
Q2 What is the definition of critical thinking?
We reviewed standards on non-technical skills and CS education to develop domain-specific and domain-general definitions of critical thinking skills.
Q3 How are critical thinking skills used in industry and academia, and what are the gaps?
After 32 stakeholder interviews, we compared and contrasted the ways critical thinking skills are currently used and assessed in computer science careers versus secondary school.
We summarized performance gaps using the gap model and decided to focus our assessment on open-mindedness, analysis, and reflection.
Insight 1:
Critical thinking is vital to employability in computer science careers.
Insight 2:
Students don’t receive explicit instruction defining critical thinking skills and consequently don’t know what it means.
Insight 3:
Students are impulsive and have difficulty generating & evaluating ideas with an open mind.
Insight 4:
Students rarely use reflection to iterate on their work or process.
Q4 What are the best practices to assess non-technical skills?

Trend 1: Stealth Assessment / Gamification
Use data gathered continuously through gameplay to assess learners' mastery levels
Trend 2: Computer-supported Collaborative Learning
A computer agent / chatbot fosters meaningful online discussion by providing scaffolded prompts
Trend 3: Automated Essay Grading
Use NLP to parse text and extract micro features that can be mapped to macro features
Trend 4: Situational Judgement Test
Embed assessment in an authentic, work-related context; most commonly used in medical training
Final Design Problems
1. HMW embed the assessment of open-mindedness in an engaging and interactive experience?
2. HMW help students to discover the critical thinking skills most integral to their employability roadmap?
3. HMW assess secondary school students' ability to reason, evaluate, and communicate with an open mind in collaborative decision making?
4. HMW assess secondary school students' ability to use feedback and self-examination to reflect on their work and iteratively improve themselves?
04
IDEATION
Brainstorm
Our team ran a series of ideation sprints using different brainstorming techniques. This helped us generate a sufficiently wide range of ideas while the insights from the field study were still fresh in our memory.

Define Success Metrics
We proposed a list of evaluation criteria and ran a card-sorting activity with our client to better understand their priorities.

Validate
With each team member creating 3-5 storyboards, 23 storyboards were created in total. The team voted and combined similar ideas, consolidating them into 6 storyboards. We did speed dating with 6 potential users (CMU STEM students) and 3 teachers, then iterated further into 4 final storyboards. I produced the storyboards digitally, and we evaluated each idea against the evaluation metrics.

Idea 1:

Problem it solves:
- The use of open-mindedness and fair-mindedness is limited in secondary school, yet integral to computer science careers
Competitive edge:
- Students love games
- Adaptive model
Implementation requirement:
- Educational game design
- Selection of target critical thinking sub-skills
- Playtesting to inform scenario design

Idea 2:

Problem it solves:
- Students don't know what critical thinking skills are valued in the workforce
- Current assessments of critical thinking skills are rarely embedded in an authentic context
Competitive edge:
- Scalability in a global online learning environment
- Authenticity in career preparation
Implementation requirement:
- A database of critical thinking sub-skills valued by recruiters for different jobs
- Situational judgement tasks designed for each type of job
- NLP that extracts micro features from students' open-ended responses
- Rubrics that map micro features to macro features (critical thinking sub-skills)
Idea 3:

Problem it solves:
- Weighing options is missing in secondary schools
- Students don't reason in an open-minded manner
Competitive edge:
- Teachers love it the most
Implementation requirement:
- Natural language processing for grading skills in the chat
- Access to Carolyn Rose's work with the chatbot

Idea 4:

Problem it solves:
- Students seldom have the chances for authentic
- Students don't reason in an open-minded manner
Competitive edge:
- Authentic video communication
Implementation requirement:
- Database & web server
- Algorithm for producing a weighted-average grade based on peer grades
05
VALIDATION
Our goals:
- Determine which idea our client and academic mentors liked most, to settle on a broad direction
- Determine which parts of a given idea our client liked or disliked, to combine detailed features
Our approach:
- Research presentation
- Metric evaluation
- Card sorting
Result:
After the co-design session and a series of follow-up debrief meetings, our client decided to proceed with a modified version of Idea 2. We decided to use multiple-choice questions as our delivery format.
What I contributed:

- I presented related EdTech research so our client could make a well-informed decision
- I proposed the idea (in an adapted version) that our client ultimately decided to move forward with
Process:
To facilitate an online co-design activity, we collaborated on a Mural board. We classified the features of all 4 proposed ideas into instructional features, technical features, and assessment goals.
The thumbs-up icons to the left of each storyboard indicate the number of participants who liked that idea.
We used color coding to distinguish opinions from different parties: pink for our team members, green for our client and professors.

After a series of debrief meetings that our client had internally, we decided that the final product would have the following 3 features:

Situational Judgement Test
Our client wants the assessment to specifically target CS employees; however, all existing vendor assessments are domain-general. A situational judgement test uses an authentic work context to improve learning transfer.
Scenario-based

Learner Report & Certification
Our client wants the assessment to be both formative and summative. Therefore, we provide a learner report so learners can improve with a growth mindset, as well as a certificate for those who score above a threshold.
Formative & Summative

Multiple Choice Question
Our client wants the assessment to be scalable. They prefer closed-form to open-ended questions, and they prioritize feasibility over novelty. Therefore, we ultimately went with multiple-choice questions.
Closed-form
06
PROTOTYPING
Wireframe
Onboarding


Exam


Formative Feedback - Learner-facing


Summative Feedback - Recruiter-facing


User Feedback
We first iterated on our prototype based on first-round user testing results. The main iterations from the low-fi to the mid-fi prototype were:
1. Provided three 20-minute exam sections
2. Allowed users to check exam progress and time
3. Moved to a left-right layout of test questions
When validating our design ideas, we used a testing structure similar to the low-fi prototype user testing. We recruited three users with experience in user experience design and gave each of them three main tasks:
- Start an exam
- Go through the exam
- Check and interpret exam results
We observed and recorded their performance and asked for specific feedback after they finished the tasks. Findings include:
- Users want to see general instructions at the beginning of the test
- Users hope the scenarios and questions can be shorter and easier to read on-screen
- Users do not understand technical terms and want an easy way to check a glossary
- Users want a confirm button during the exam to prevent user error
- Users want to see in a dashboard how many sections they have completed
Hi-fi Iterations
01. Add the word 'correct' to the learner report to remove ambiguity

Before

After
02. Increase the size of the trophy and test-history selection to give a clearer visual indication of completion

Before

After
03. Change the position of explanatory feedback to improve readability and visual hierarchy

After

Before
04. Redesign the document layout to improve readability and reduce extraneous cognitive load

Before

After
Final Design
Onboarding


Exam-taking


Exam Review


Dashboard & Report


07
TASK DEVELOPMENT
Process Overview:

Goal Analysis:
From our spring research, we identified our learning goals as reasoning, prioritization, and open-mindedness: the most essential yet most lacking critical thinking sub-skills.
We then divided those learning goals into knowledge components, and further categorized them as big ideas, important to know / do, and worth being familiar with.

Assessment Item Design:
To make our assessment authentic, we interviewed technical recruiters and CS employees to see how critical thinking is currently used in their daily work. We specifically probed for 3 things in our interviews:
- Open-mindedness: understanding colleagues' perspectives or different points of view
- Reasoning: planning and prioritizing work for the day, week, and month
- Analysis: finding the root cause of a problem
Regarding specific questions, we explicitly asked tech recruiters to list scenarios in which tech employees need to:
- Understand their colleagues' perspectives
- Weigh pros and cons to choose the best option
- Plan and prioritize their work for the day, the week, and the month
- Find the root cause of a problem
We adapted the scenarios that we gathered through interviews into 21 scenarios.
Goal-question Mapping:

We mapped knowledge components to each question. Three people performed this mapping independently to establish inter-rater reliability.
Scoring Principles:
1. Each question contributes equally to the overall score.
2. If the learner answers a question correctly, all the knowledge components assessed in that question are awarded corresponding credit.
3. Each item in a select-all-that-apply question is treated as a separate, independent true-or-false question.
4. In a select-all-that-apply question, there is no penalty for incorrect answers; the lowest score one can get for a question is zero.
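As a concrete illustration, the four principles above could be implemented roughly as follows. This is a minimal sketch, not the team's actual code; the question IDs, KC names, and data structures are hypothetical.

```python
def score_question(selected, correct_items, all_items):
    """Principles 3-4: each item in a select-all-that-apply question is
    an independent true/false judgement, so the floor is zero."""
    hits = sum(1 for item in all_items
               if (item in selected) == (item in correct_items))
    return hits / len(all_items)  # fraction of items judged correctly

def score_exam(responses, answer_key, kc_map):
    """Principles 1-2: every question carries equal weight, and a
    question's credit is also awarded to every KC it assesses."""
    per_question, kc_credits = [], {}
    for qid, (correct_items, all_items) in answer_key.items():
        credit = score_question(responses.get(qid, set()),
                                correct_items, all_items)
        per_question.append(credit)
        for kc in kc_map[qid]:  # propagate the question's credit to its KCs
            kc_credits.setdefault(kc, []).append(credit)
    overall = sum(per_question) / len(per_question)
    kc_scores = {kc: sum(c) / len(c) for kc, c in kc_credits.items()}
    return overall, kc_scores
```

Averaging a question's credit into every KC it maps to is what later lets the learner report rank KCs independently of individual questions.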
Interpretation:
After finishing all three parts of the assessment, learners receive a report of their three highest-scoring and three lowest-scoring knowledge components. Creating this learner-centered performance report takes 3 steps.
STEP 1. Map each question to different knowledge components.
STEP 2. Create a scoring rubric by referencing existing critical thinking rubrics.
STEP 3. Translate the rubric into a learner-interpretable report.
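The report-building step could be sketched like this; the KC names and feedback strings are invented for illustration, and the real report draws its language from the rubric referenced above.

```python
def build_learner_report(kc_scores, feedback_bank):
    """Rank KCs by score and report the three strongest and three
    weakest, each paired with plain-language feedback."""
    ranked = sorted(kc_scores, key=kc_scores.get, reverse=True)
    return {
        "strengths": [(kc, feedback_bank[kc]["high"]) for kc in ranked[:3]],
        "areas_for_improvement": [(kc, feedback_bank[kc]["low"]) for kc in ranked[-3:]],
    }
```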
08
QUALITY ASSURANCE

Validity Check

Reliability Check
Validity Check: Target Learners
We conducted empirical cognitive task analysis with students aged 13-19 at the Microsoft Office Specialist (MOS) U.S. National Championship in Orlando, Florida (June 17-19, 2019). As a result, we removed all questions related to coding / pseudo-code and added an explanation to the scenario prompt explicitly asking students not to judge based on prior knowledge of technical features or implementation.
We conducted cognitive interviews with CMU undergraduates in STEM majors to make sure students understood the tasks as intended.
Validity Check: Academic Experts
We asked academic experts whether our division into knowledge components makes sense, whether we are assessing what we intend to assess, whether our scoring rubric is sound, and so on.
Validity Check: Recruiter
We asked recruiters to go through the tasks and give feedback on whether they find them authentic and whether they would ask those questions in a job interview.
Reliability Check: Target Learners
Goal:
- Check the reliability of test items, and delete or rephrase those that impede overall reliability
- Award partial credit to items where this improves overall reliability
Process:
We conducted two rounds of quantitative analysis. The first round was run with computer science graduate students. The second round comprised two-thirds STEM undergraduates (computer science, software engineering, electrical and computer engineering, robotics, etc.) and one-third Amazon Mechanical Turk masters aged 18-25 who were US high school graduates.
Iterations:
In response to our reliability check results, we made three types of changes to our questions: deletion (of a scenario, question, or item), rephrasing, or rescoring. The improvement in reliability is visualized in the chart below: red indicates problematic items, while green indicates healthy results.
Result:
Reached high reliability, showing strong consistency across all test items (Cronbach's alpha = 0.891)
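For reference, Cronbach's alpha is computed from the per-item score variances and the variance of respondents' total scores. A self-contained sketch of the standard formula follows; the function and any sample data are illustrative, not our actual analysis code or responses.

```python
def cronbach_alpha(scores):
    """scores: one row per respondent, one column per test item.
    alpha = k/(k-1) * (1 - sum(item variances) / variance of totals)."""
    k = len(scores[0])                      # number of items
    def var(xs):                            # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

Values near 1 indicate that items covary strongly (respondents who do well on one item do well on the others), which is why 0.891 counts as high internal consistency.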

09
REFLECTION
What I learnt:
Workflow
- Multi-tasking: While instructional designers are still generating scenarios, designers can start creating multimedia assets. While others are proofreading, I can generate more assets.
- Agile: Check in with the client and teammates quickly, before heading in the wrong direction for too long.
- Version Control: Keep good version control of assessment designs. Before making changes to current work (e.g. after rounds of cognitive interviews), download or copy it. It is also good practice to keep a standalone changelog of dates and major changes.
- Documentation: Keep a quote bank of important or striking quotes from interviews.
- Field Guide: Write a field guide before conducting interviews so that different interviewers share the same focus.
Work with Client & SME
- Provide consultancy backed by research: help clients scope the problem / solution space, and push back on their requests when warranted.
- SME checklist: Organizational structure, needs, and constraints may change over time; an Agile workflow ensures timely check-ins.
What I could have done differently:
Plan recruitment ahead of time. Although we were able to find the best available proxies to test the design with, it would have been better to test with actual users.