ITCS 6010 VUI Evaluation

Summative Evaluation Evaluation of the interface after it has been developed. Typically performed only once at the end of development. Rarely used in practice. Not very formal. Data is used in the next major release.

Formative Evaluation Evaluation of the interface as it is being developed. Begins as soon as possible in the development cycle. Typically, formative evaluation appears as part of prototyping. Extremely formal and well organized.

Formative Evaluation Performed several times. An average of 3 major cycles, each followed by iterative redesign, per version released. The first major cycle produces the most data. Following cycles should produce less data, if you did it right.

Formative Evaluation Data Objective Data: directly observed data. The facts! Subjective Data: opinions, generally of the user. Sometimes this is a hypothesis that leads to additional experiments.

Formative Evaluation Data Subjective data is critical for VUIs.

Formative Evaluation Data Quantitative Data: numeric. Performance metrics, opinion ratings (Likert scale). Statistical analysis. Tells you that something is wrong. Qualitative Data: non-numeric. User opinions, views, or lists of problems/observations. Tells you what is wrong.

Formative Evaluation Data Not all subjective data are qualitative. Not all objective data are quantitative. Quantitative Subjective Data: a Likert-scale rating of how a user feels about something. Qualitative Objective Data: benchmark task performance measurements where the outcome is the expert’s opinion on how users performed.
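
Quantitative subjective data such as Likert ratings can be summarized with a few lines of code. The following is a minimal sketch, assuming a 5-point scale; the function name `summarize_likert` and the sample ratings are hypothetical and are not part of the original slides. Because Likert ratings are ordinal, the sketch reports the median and the distribution rather than a mean.

```python
from collections import Counter
from statistics import median


def summarize_likert(ratings, scale_max=5):
    """Summarize ordinal Likert ratings on a 1..scale_max scale."""
    if not ratings:
        raise ValueError("no ratings to summarize")
    if any(r < 1 or r > scale_max for r in ratings):
        raise ValueError("rating outside the scale")
    counts = Counter(ratings)
    return {
        "n": len(ratings),
        "median": median(ratings),
        # How many participants chose each point on the scale.
        "distribution": {p: counts.get(p, 0) for p in range(1, scale_max + 1)},
        # Share of participants who chose one of the top two points.
        "top_two_share": sum(1 for r in ratings if r >= scale_max - 1) / len(ratings),
    }


# Hypothetical answers to "The voice prompts were easy to understand."
ratings = [4, 5, 3, 4, 2, 5, 4, 1]
summary = summarize_likert(ratings)
print(summary)
```

A summary like this tells you that something may be wrong (for example, a low median); the qualitative comments are still needed to learn what is wrong.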

Steps in Formative Evaluation State hypothesis and design the experiment. Conduct the experiment. Collect the data. Analyze the data. Draw your conclusions & establish hypotheses. Redesign and do it again.

Experiment Design Subject selection: Who are your participants? What are the characteristics of your participants? What skills must the participants possess? How many participants do you need (5, 8, 10, ...)? Do you need to pay them?

Experiment Design Task Development What tasks do you want the subjects to perform using your interface? What do you want to observe for each task? What do you think will happen? Benchmarks? What determines success or failure?

Experiment Design Protocol & Procedures What can you say to the user without contaminating the experiment? What are all the necessary steps needed to eliminate bias? You want every subject to undergo the same experiment. Do you need consent forms (IRB)?

Experiment Trials Follow protocol and procedures. Don’t say “say” in your experiment; telling participants what to say will bias or contaminate your experiment. Pilot Study: expect the unexpected. Calculate Method Effectiveness: Sears, A. (1997), “Heuristic Walkthroughs: Finding the Problems Without the Noise,” International Journal of Human-Computer Interaction, 9(3), 213-23.

Experiment Trials Pilot Study: an initial run of a study (e.g., an experiment, survey, or interview) for the purpose of verifying that the test itself is well formulated. For instance, a colleague or friend can be asked to participate in a user test to check whether the test script is clear, the tasks are not too simple or too hard, and the data collected can be meaningfully analyzed. (See http://www.usabilityfirst.com/)

Experiment Trials – Pilot Study Wizard of Oz: you play the “Wizard,” that is, the system. Users call the Wizard, and the Wizard pretends to be the system.

Data Collection Collect more than enough data. More is better! Back up your data. Secure your data.

Data Analysis Use more than one method. All data should lead to the same point: your different types of data should support each other. Remember: Quantitative data tells you something is wrong. Qualitative data tells you what is wrong. Experts tell you how to fix it.

Measuring Method Effectiveness
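
The slide above gives only a title, so the following is an illustrative sketch, not the formula from the lecture. It assumes one formulation often seen in the usability-evaluation literature: thoroughness (real problems found divided by real problems that exist), validity (real problems found divided by all issues reported), and effectiveness as their product. The function name and the sample counts are hypothetical.

```python
def method_effectiveness(real_found, real_existing, issues_reported):
    """Return (thoroughness, validity, effectiveness) for an evaluation method.

    Assumed definitions (not taken from the slides):
      thoroughness  = real problems found / real problems that exist
      validity      = real problems found / issues the method reported
      effectiveness = thoroughness * validity
    """
    if real_existing <= 0 or issues_reported <= 0:
        raise ValueError("counts must be positive")
    if real_found > real_existing or real_found > issues_reported:
        raise ValueError("real_found cannot exceed the other counts")
    thoroughness = real_found / real_existing
    validity = real_found / issues_reported
    return thoroughness, validity, thoroughness * validity


# Hypothetical counts: 20 real problems exist; the method reported 16 issues,
# of which 12 turned out to be real problems.
thoroughness, validity, effectiveness = method_effectiveness(12, 20, 16)
print(f"thoroughness={thoroughness:.2f} validity={validity:.2f} "
      f"effectiveness={effectiveness:.2f}")
```

In practice the number of real problems that exist is itself an estimate, usually the union of problems confirmed across all methods and user tests.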

Redesign Redesign should be supported by data findings. Setup next experiment. Sometimes it is best to keep the same experiment. Sometimes you have to change the experiment. Is there a flaw in the experiment or the interface?

Formative Evaluation Methods Usability Inspection Methods: usability experts are used to inspect your system during formative evaluation. Usability Testing Methods: usability tests are conducted with real users under observation by experts. Usability Inquiry Methods: usability evaluators collect information about the user’s likes, dislikes and understanding of the interface.

Usability Inspection Methods Usability experts “inspect” your interfaces during formative evaluation. Widely used in practice. Often abused by developers that consider themselves to be usability experts.

Usability Inspection Methods Heuristic Evaluation; Cognitive Walkthroughs; Pluralistic Walkthroughs; Feature, Consistency & Standards Inspection.

Heuristic Evaluation: What is it? Several evaluators independently evaluate the interface & come up with potential usability problems. It is important that there be several of these evaluators and that the evaluations be done independently. Nielsen's experience indicates that around 5 evaluators usually result in about 75% of the overall usability problems being discovered.
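
The "5 evaluators, about 75%" figure can be explored with a simple model. The sketch below assumes that each evaluator independently finds any given problem with the same probability, so the expected share found by n evaluators is 1 - (1 - rate)^n. This independence model and the function names are assumptions for illustration; the slides state only the 75% figure.

```python
def proportion_found(n_evaluators, detection_rate):
    """Expected share of problems found by n independent evaluators."""
    return 1.0 - (1.0 - detection_rate) ** n_evaluators


def implied_detection_rate(n_evaluators, proportion):
    """Per-evaluator detection rate implied by a combined proportion."""
    return 1.0 - (1.0 - proportion) ** (1.0 / n_evaluators)


# Back out the per-evaluator rate implied by "5 evaluators find about 75%".
rate = implied_detection_rate(5, 0.75)
print(f"implied per-evaluator rate: {rate:.3f}")

# Diminishing returns: each extra evaluator adds less than the one before.
for n in range(1, 11):
    print(n, f"{proportion_found(n, rate):.2f}")
```

Under these assumptions a single evaluator finds roughly a quarter of the problems, which is why the evaluations must be numerous and independent.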

Heuristic Evaluation: How can I do it? Obtain the service of 4, 5 or 6 usability experts. Each expert will perform an independent evaluation. Give experts a heuristics inspection guide. Collect the individual evaluations. Bring the experts together and do a group heuristic evaluation. (Optional)

Cognitive Walkthroughs: What is it? Cognitive walkthroughs involve one evaluator or a group of evaluators inspecting a user interface by going through a set of tasks and evaluating its understandability and ease of learning. The input to the walkthrough also includes the user profile (especially the users' knowledge of the task domain and of the interface) and the task cases. Based upon exploratory learning methods: exploration of the user interface.

Cognitive Walkthroughs: What is it? The evaluators may include human factors engineers, software developers, and people from marketing, documentation, etc. Best used in the design stage of development.

Cognitive Walkthroughs: How can I do it? During the walkthrough: illustrate the task and then ask a user to perform it. Accept input from all participants; do not interrupt the demo. After the walkthrough: make interface changes and plan the next evaluation.

Pluralistic Walkthroughs: What is it? During the design stage, a group of people (users, developers, and usability experts) meet to perform a walkthrough.

Pluralistic Walkthroughs: How can I do it? The group meets and 1 person acts as coordinator. A task is presented to the group. Paper prototypes, screen shots, etc. are presented. Each participant writes down comments on each interface. After the demo, a discussion follows.

Feature, Consistency & Standards Inspection: What is it? Feature, Consistency & Standards are inspected by an expert.

Feature, Consistency & Standards Inspection: How can I do it? Feature Inspection: the expert is given use cases/scenarios and asked to inspect the system. Consistency Inspection: the expert is asked to inspect consistency within your application. Standards Inspection: the expert is asked to inspect standards. Standards can be in-house, government, etc.

Usability Testing Methods Carrying out experiments to find out specific information about a design and/or product. Basis comes from experimental psychology. Uses statistical data methods, both quantitative and qualitative.

Usability Testing Methods During usability testing, users work on specific tasks using the interface/product and evaluators use the results to evaluate and modify the interface/product. Widely used in practice, but often not used appropriately. Often abused by developers who consider themselves to be usability experts. Can be very expensive and time consuming.

Usability Testing Methods Performance Measurement; Thinking-aloud Protocol; Question-asking Protocol; Coaching Method.

Usability Testing Methods Co-discovery Learning; Teaching Method; Retrospective Testing; Remote Testing.

Performance Measurement: What is it? Used to collect quantitative data. Typically, you will be looking for benchmark data. Objectives MUST be quantifiable, for example: 75% of users shall be able to complete the basic task in less than 30 minutes.
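
A quantifiable objective like the one above can be checked directly against the collected task times. This is a minimal sketch; the function name `meets_benchmark` and the sample times are hypothetical, and a value of `None` is assumed to mean that the participant did not complete the task.

```python
def meets_benchmark(completion_minutes, time_limit=30.0, required_share=0.75):
    """Check an objective of the form 'X% of users finish in under T minutes'.

    completion_minutes: one entry per participant; None means the task
    was not completed (an assumption of this sketch).
    Returns (share_of_users_who_passed, objective_met).
    """
    if not completion_minutes:
        raise ValueError("no participants")
    passed = sum(
        1 for t in completion_minutes if t is not None and t < time_limit
    )
    share = passed / len(completion_minutes)
    return share, share >= required_share


# Hypothetical completion times in minutes for eight participants.
times = [12.5, 28.0, None, 31.0, 19.0, 25.5, 29.9, 22.0]
share, ok = meets_benchmark(times)
print(f"{share:.0%} of users finished in time; objective met: {ok}")
```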

Performance Measurement: How can I do it? Define the goals that you expect users to achieve. Quantify the goals: The time users take to complete a specific task. The ratio between successful interactions and errors. The time spent recovering from errors. The number of user errors. The number of commands or other features that were never used by the user. The number of system features the user can remember during a debriefing after the test. The proportion of users who say that they would prefer using the system over some specified competitor.
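
Several of the measures listed above can be computed from simple per-session records. The sketch below is one possible way to do it; the `Session` record, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    """One participant's session on one task (hypothetical record layout)."""
    task_seconds: float      # time taken to complete the task
    successes: int           # interactions that succeeded
    errors: int              # user errors observed
    recovery_seconds: float  # total time spent recovering from errors


def summarize_sessions(sessions):
    """Aggregate a few of the performance measures across sessions."""
    total_successes = sum(s.successes for s in sessions)
    total_errors = sum(s.errors for s in sessions)
    return {
        "mean_task_seconds": mean([s.task_seconds for s in sessions]),
        # Ratio of successful interactions to errors; infinite if no errors.
        "success_to_error_ratio": (
            total_successes / total_errors if total_errors else float("inf")
        ),
        "mean_recovery_seconds": mean([s.recovery_seconds for s in sessions]),
        "total_errors": total_errors,
    }


sessions = [
    Session(task_seconds=120.0, successes=10, errors=2, recovery_seconds=15.0),
    Session(task_seconds=180.0, successes=8, errors=4, recovery_seconds=40.0),
    Session(task_seconds=90.0, successes=12, errors=0, recovery_seconds=0.0),
]
result = summarize_sessions(sessions)
print(result)
```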

Performance Measurement: How can I do it? Get participants for the experiments. Conduct very controlled experiments: all variables must remain consistent across users. Problem with performance measurement: no qualitative data.

Thinking-aloud Protocol: What is it? Technique where the participant is asked to vocalize his or her thoughts, feelings, and opinions while interacting with the product.

Thinking-aloud Protocol: How can I do it? Select the participants: who will be involved? Select the tasks and design scenarios. Ask the participant to perform a task using the software. During the task, ask the user to vocalize thoughts, opinions, feelings, etc.

Thinking-aloud Protocol Problem With Thinking-Aloud Protocol: cognitive overload. Can you walk & chew gum at the same time? You are asking the participants to do too much.

Question-asking Protocol: What is it? Similar to Thinking-aloud protocol. Instead of the participant saying what they are thinking, the evaluator prompts the participant with questions while the participant uses the system.

Question-asking Protocol: How can I do it? Select the participants: who will be involved? Select the tasks and design scenarios. Ask the participant to perform a task using the software.

Question-asking Protocol: How can I do it? During the task, ask the user questions about the product: thoughts, opinions, feelings, etc. Problem With Question-asking Protocol: cognitive overload. Can you walk, chew gum & talk at the same time? You are asking the participants to do too much. Added pressure when the evaluator asks questions. Can be frustrating for novice users.

Coaching Method: What is it? A system expert sits with the participant and acts as a coach. Expert answers the participant’s questions. The evaluator observes their interaction.

Coaching Method: How can I do it? Select the participants: who will be involved? Select the tasks and design scenarios. Ask the participant to perform a task using the software in the presence of a coach/expert.

Coaching Method: How can I do it? During the task, the user will ask the expert questions about the product. Problem With Coaching Method: in reality, there will not be a coach present. This is good for creating a coaching system, but not for evaluating an interface.

Co-Discovery Learning: What is it? Two test users attempt to perform tasks together while being observed. They are to help each other in the same manner as they would if they were working together to accomplish a common goal using the product. They are encouraged to explain what they are thinking about while working on the tasks. Thinking Aloud, but more natural because of partner.

Co-Discovery Learning: How can I do it? Select the participants: who will be involved? Select the tasks and design scenarios. Ask the participants to perform a task using the software.

Co-Discovery Learning: How can I do it? During the task, the users will help each other and voice their thoughts by talking to each other. Problem With Co-Discovery Learning: neither is an expert. The blind leading the blind.

Teaching Method: What is it? You have 1 participant use the system. Ask the participant to teach a novice participant how to use the system.

Teaching Method: How can I do it? Select the participants. Select the tasks and design scenarios. Ask the 1st participant to perform a task using the software. Ask the 1st participant to teach a new participant.

Teaching Method: How can I do it? Observe their interactions. Problem With Teaching Method: neither is an expert. The blind leading the blind. Advantage of Teaching Method: it is possible to discover some interesting things about the learnability of your interfaces.

Retrospective Testing: What is it? A videotape of the session is observed by the usability expert and the participants.

Retrospective Testing: How can I do it? Select the participants: who will be involved? Select the tasks and design scenarios. Use one of the usability testing methods that we have discussed. Videotape the session.

Retrospective Testing: How can I do it? Review the videotape with the users. Problem With Retrospective Testing: extremely time consuming!

Remote Testing: What is it? The participants are separated from the evaluators. No formal observation. No usability lab.

Remote Testing: How can I do it? Give the product/software to participants. Collect information about how they use your software/product. Methods: Same-Time Different-Place; Different-Time Different-Place.

Remote Testing: How can I do it? Lotus Video Cam, Look@Me, SnagIt. Usability Logger (http://www.usabletools.com/). Journaled sessions.

Remote Testing: How can I do it? Problem With Remote Testing: the evaluator is not there. Can’t observe facial expressions. Great for Web-based systems.

Usability Testing Methods Select the method that works best for you. Select the method that fits your implementation. Be thorough during your experiments. The more data, the better.

Usability Testing Methods Hawthorne Effect The tendency for people to change their behavior and thus performance when they know their performance is being studied.

Usability Inquiry Methods Usability experts learn about the users’ likes, dislikes, needs, etc. of the system through observation, verbal questioning, and written questioning. Widely used in practice. Different methods have different costs, but in general, this is relatively cheap.

Usability Inquiry Methods Contextual Inquiry; Field Observation; Questionnaires; Interviews; Focus Groups; Logging Actual Use.

Contextual Inquiry: What is it? Before designing the system, the expert(s) visit the users’ workplace and question them. This should occur before any design has been done.

Contextual Inquiry: How can I do it? Determine who your users are. Go visit them where they work. Talk to them about the system: How do they currently do their job? How would they like to do their job? What do they like about the current system/method? What don’t they like about the current system/method? http://jthom.best.vwh.net/usability/context.htm

Field Observation: What is it? Usability experts observe users in the field using the system/product.

Field Observation: How can I do it? Go to the users’ workplace and simply observe. Things to look for: What is the user’s mental model? Are the users using it the way you expect? You don’t want them to know you are evaluating them.

Questionnaires: What is it? Written lists of questions that you distribute to your users.

Questionnaires: How can I do it? Develop a list of questions on paper, web, email, etc. and give the questionnaire(s) to the users. The users will answer the questions and return the questionnaires to you. http://jthom.best.vwh.net/usability/question.htm http://www.acm.org/~perlman/question.html

Interviews: What is it? You interview users and ask them questions.

Interviews: How can I do it? Develop a list of questions for the users. Meet with the users individually. Ask them the questions and log the responses (written and/or taped).

Interviews: How can I do it? Interview Tips: Clearly define this is an interview. Ask open-ended questions to get the user talking. Yes-No questions are bad. Begin with less demanding topics and progress to more difficult topics. Don’t ask questions to support your belief or hypothesis. Do not answer your own questions. Do not agree or disagree; remain neutral.

Interviews: How can I do it? Probes: used to encourage the subjects to continue speaking, or to guide their response in a particular direction. Addition Probe: encourages more information or clarifies certain responses from the test users. Either verbally or nonverbally the message is, "Go on, tell me more," or "Don't stop."

Interviews: How can I do it? Reflecting Probe: uses a nondirective technique to encourage the test user to give more detailed information. The interviewer can reformulate the question or synthesize the previous response as a proposition. Directive Probe: specifies the direction in which a continuation of the reply should follow without suggesting any particular content. A directive probe may take the form of "Why is that the case?"

Interviews: How can I do it? Defining Probe: requires the subject to explain the meaning of a particular term or concept. http://jthom.best.vwh.net/usability/surveys.htm

Focus Groups: What is it? A group of users is gathered to talk about the system. The expert acts as the moderator. You should conduct more than 1 focus group.

Focus Groups: How can I do it? Bring a group of users together and begin. Collect data.

Logging Actual Use: What is it? The computer automatically collects usage data. You could ask the user to log their usage, but that’s not practical.

Logging Actual Use: How can I do it? Usability Logger (http://www.usabletools.com/). Automatic capture of keyboard, mouse, etc. VideoCam and other products.

Logging Actual Use Facts On Logging Actual Use: You know exactly what the user is doing. You don’t know why, but you do know what, when, where. You don’t know how the user feels.
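
Automatic usage logging can be as simple as appending timestamped events. The sketch below is an illustrative in-memory logger, not the API of any product named in the slides; the class `UsageLogger`, its methods, and the event names are all hypothetical. It records what happened, when, and where, which matches the point above: the log cannot tell you why or how the user felt.

```python
import json
import time
from collections import Counter


class UsageLogger:
    """Collects timestamped usage events in memory (hypothetical sketch)."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so tests can use a fixed clock
        self.events = []

    def log(self, user_id, action, target):
        """Record what the user did (action), when, and where (target)."""
        self.events.append({
            "timestamp": self._clock(),
            "user": user_id,
            "action": action,
            "target": target,
        })

    def count_by_action(self):
        """Count how often each kind of action occurred."""
        return Counter(event["action"] for event in self.events)

    def to_json_lines(self):
        """Serialize the log as one JSON object per line for later analysis."""
        return "\n".join(json.dumps(event) for event in self.events)


# Hypothetical events from one caller using a voice interface.
logger = UsageLogger()
logger.log("p01", "utterance", "main_menu")
logger.log("p01", "no_match", "main_menu")
logger.log("p01", "utterance", "account_balance")
print(logger.count_by_action())
```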

Conclusions The data should support your conclusions. Method Effectiveness Measure. Make design changes based upon the data. Establish new hypotheses based upon the data.
