Introduction to Assessment
As discussed in Chapter 7 of Dick and Carey, learner-centered assessment is linked very closely to the traditional notion of criterion-referenced tests. The name criterion-referenced is derived from the purpose of the test: to find out whether the criteria stated in an objective have been achieved. Criterion-referenced assessments are composed of items or performance tasks that directly measure skills described in one or more behavioral objectives. The importance of criterion-referenced assessment from an instructional design standpoint is that it is closely linked to instructional goals and a matched set of performance objectives, therefore giving the designers an opportunity to evaluate performance and revise instructional strategies if needed. In other words, criterion-referenced assessment allows instructors to decide how well the learners have met the objectives that were set forth. It also facilitates a reflective process in which learners are able to evaluate their own performance against the stated objectives and assessment items. Smith and Ragan (1999) note that criterion-referenced tests have also been referred to as objective-referenced or domain-referenced instruments. They believe that this testing strategy is effective for determining “competency,” especially as it relates to meeting instructional objectives.
In contrast to criterion-referenced tests, norm-referenced tests are designed to yield scores that compare each student’s performance with that of a group or with a norm established by group scores. They provide a spread of scores that generally allows decision makers to compare or rank learners. They are not based on each student achieving a certain level of mastery. In fact, in many cases items are selected to produce the largest possible variation in scores among students. As a result, items that all students are able to master are often removed in order to maintain a certain spread of scores. An example of a norm-referenced test would be the SAT. Scores from this test are used to perform comparisons of students for various purposes (such as college admission). Although this form of assessment can be learner-centered, it differs in the manner in which it defines the content that is to be assessed. In this course we will mainly concern ourselves with criterion-referenced assessment.
Types of Criterion-Referenced Tests
Dick, Carey and Carey discuss four different types of criterion-referenced tests that fit into the design process:
- Entry Behaviors Test
- Pretest
- Practice Tests
- Posttests
1. Entry Behaviors Test
An entry behaviors test is given to learners before instruction begins. It is designed to assess learners’ mastery of prerequisite skills. These are the skills that appear below the dotted line you drew on your instructional analysis flowchart. If you have no entry behaviors, then there is no need to develop such a test. However, if you have entry behaviors that you are unsure about, you should test your learners to help determine whether they are indeed entry behaviors after all.
2. Pretest
A pretest is used to determine whether learners have already mastered some of the skills in your instructional analysis. If they have, then they do not need as much instruction for those skills. If it becomes obvious that they lack certain skills then your instruction can be developed with enough depth to help them attain those skills. When using a pretest in this manner you are not trying to get a score that you can compare with a later posttest in order to document gains.
A pretest is often combined with an entry behaviors test. However, it is important to keep in mind the purpose of each test. The entry behaviors test determines whether or not students are ready to begin your instruction, while the pretest helps determine which skills in your main instructional analysis they may already be familiar with. However, if you already know that your learners have no clue about the topic you are teaching them, then they may not need a pretest.
3. Practice Tests
Practice tests solicit learner participation during the instruction by providing them with a chance to rehearse the new skills they are being taught. They also allow instructors to provide corrective feedback to keep learners on track.
4. Posttests
Posttests are given following instruction, and help you determine if learners have achieved the objectives you set out for them in the beginning. Each item on a posttest should match one of your objectives, and the test should assess all of the objectives, especially focusing on the terminal objective. If time is a factor, it may be necessary to create a shorter test that assesses only the terminal objective and any important related subskills.
Posttests are used by instructors to assess learner performance and hand out grades, but for the designer the primary purpose of a post-test is to help identify areas where the instruction is not working. If learners are not performing adequately on the terminal objective, then there is something wrong with your instruction, and you will have to identify the areas that are not working. Since each test item should correspond to one of your objectives, it should be relatively easy to figure this out.
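Because each posttest item maps to one objective, the search for weak spots in the instruction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item names, the item-to-objective mapping, and the 0.8 threshold are all hypothetical and are not taken from Dick and Carey.

```python
from collections import defaultdict

# Hypothetical mapping of posttest items to the objectives they assess.
ITEM_TO_OBJECTIVE = {
    "item1": "objective 1",
    "item2": "objective 1",
    "item3": "objective 2",
    "item4": "terminal objective",
}


def proportion_correct_by_objective(responses, item_to_objective):
    """Return the share of correct responses for each objective.

    `responses` is a list of dicts, one per learner, mapping an item
    name to True (correct) or False (incorrect).
    """
    correct = defaultdict(int)
    attempted = defaultdict(int)
    for learner in responses:
        for item, is_correct in learner.items():
            objective = item_to_objective[item]
            attempted[objective] += 1
            if is_correct:
                correct[objective] += 1
    return {obj: correct[obj] / attempted[obj] for obj in attempted}


def objectives_needing_revision(proportions, threshold=0.8):
    """List objectives whose proportion correct falls below the threshold.

    The default threshold of 0.8 is an arbitrary, assumed cut-off.
    """
    return sorted(obj for obj, p in proportions.items() if p < threshold)


# Made-up results for two learners, for illustration only.
responses = [
    {"item1": True, "item2": True, "item3": False, "item4": True},
    {"item1": True, "item2": False, "item3": False, "item4": True},
]
proportions = proportion_correct_by_objective(responses, ITEM_TO_OBJECTIVE)
print(objectives_needing_revision(proportions))
```

A low proportion for an objective does not say why the instruction failed, only where to start looking.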
Designing Tests & Writing Items
There are quite a few issues to consider when designing assessment instruments. Let’s spend a little time discussing some of the more important ones.
Types of Assessment Items
The first thing we want to look at is the various types of items you can use when creating assessment items. Earlier we discussed different types of tests (Entry Behaviors Test, Pretest, Practice Tests, and Posttests); now we are discussing individual test items. Possible test items include:
- Product checklist
- Live performance checklist
In the table on page 154, Dick and Carey give some guidelines for selecting item types according to the type of behavior specified in your objective. This table provides a good starting point for deciding on what item type to use for a particular objective. However, when it comes right down to it, the wording of your objective should guide the selection of item type. You should select the type of item that gives learners the best opportunity to demonstrate the performance specified in the objective. For example, if our objective was for students to state the capital of Virginia, it would be best to have them state it from memory (fill-in-the-blank) and not pick it from a list of choices (multiple-choice).
In addition to selecting the appropriate test item type, it is also important to consider the testing environment. If your test items require special equipment and facilities as specified in the “conditions” component of your objective, you will need to make sure that those things will be available to your learners. If not, you will need to create a realistic alternative to the ideal test item. Keep in mind that the farther removed the behavior in the assessment is from the behavior specified in the objective, the less reliably you will be able to predict whether learners can perform the objective.
Matching Learning Domain and Item Type
The next issue we want to look at is that of matching the learning domain with an appropriate item type. Organizing your objectives according to learning domain can also aid you in selecting the most appropriate type of assessment item. If you remember, Gagné defined four main learning domains (categories):
- Verbal Information – Verbal information objectives generally call for simple objective-style test items. These include short-answer, matching, and multiple-choice items.
- Intellectual Skills – Intellectual skills objectives require either objective-style test items, the creation of some product, or a performance of some sort. The product or performance would need to be judged by a checklist of criteria.
- Attitudes – Attitude objectives are more problematic since there is not usually a way to directly measure a person’s attitude. Assessment items generally involve observing learners in action and inferring their attitudes, or having learners state their preferences on a questionnaire.
- Psychomotor Skills – Psychomotor objectives are usually assessed by having the learner perform a set of tasks that lead to the achievement of the goal. This also requires a checklist or rating scale so that the instructor can determine if each step is performed properly.
Writing Test Items
You should write an assessment item for each objective whose accomplishment you want to measure. Mager provides these steps to follow when writing a criterion assessment item:
- Read the objective and determine what it wants someone to be able to do (i.e., identify the performance).
- Draft a test item that asks students to exhibit that performance.
- Read the objective again and note the conditions under which the performing should occur (i.e., tools and equipment provided, people present, key environmental conditions).
- Write those conditions into your item.
- For conditions you cannot provide, describe approximations that are as close to the objective as you can imagine.
- If you feel you must have more than one item to test an objective, it should be because (a) the range of possible conditions is so great that one performance won’t tell you that the student can perform under the entire range of conditions, or (b) the performance could be correct by chance. Be sure that each item calls for the performance stated in the objective, under the conditions called for.
If you follow these steps and still find yourself having trouble drafting an assessment item, it is almost always because the objective isn’t clear enough to provide the necessary guidance.
Criteria for Writing Test Items
Dick and Carey list several criteria that you should consider when writing test items:
- Goal-Centered Criteria
- Learner-Centered Criteria
- Context-Centered Criteria
- Assessment-Centered Criteria
Let’s take a brief look at each one.
Goal-Centered Criteria
As we have implied already, test items should be congruent with the terminal and performance objectives by matching the behavior involved. What this means is that each test item should measure the exact behavior and response stated in the objective. The language of the objective should guide the process of writing the assessment items. A well-written objective will prescribe the form of test item that is most appropriate for assessing achievement of the objective. For an appropriate assessment item, you should be able to answer “yes” to the following questions:
- Does the assessment item require the same performance of the student as specified in the instructional objective?
- Does the assessment item provide the same conditions (or “givens”) as those specified in the instructional objective?
For example, if the performance of an objective states that learners will be able to state or define a term, the assessment item should ask them to state or define the term, not to choose the definition from a list of answers.
Another common bad practice is teaching one thing and then testing for another. You should not use a test item that asks for a different performance than the one called for by your objectives. For example, if you have an objective that says students need to be able to make change, it would be deceitful to then have test items such as the following:
- Define money.
- Name the president on the fifty-dollar bill.
- Describe the risks of not being able to count.
None of these items asks the student to do what the objective asks, which is to make change. As a result you will not know if your students can perform as required. Kemp, Morrison, and Ross (1998) cite several more examples of “mismatches” between objectives and assessment. In one example a college professor whose objective asked students to analyze and synthesize developments of the Vietnam War simply asked students to list those developments in the final exam. Other examples include a corporate training course on group leadership skills that included objectives that were performance or skill-based, yet the sole assessment items were multiple-choice. As these examples illustrate, it is important to determine which learning domain your objective falls into in order to write the most appropriate type of assessment.
So why do we keep saying that the performance indicated in the assessment item has to match the performance in the objective? Well, the point of testing is to be able to predict whether your learners will be able to do what you want them to do when they leave you, and the best way to do that is to observe the actual performance that you are trying to develop. Mager (1988) provides a good story to illustrate this point. Suppose your surgeon were standing over you with gloved hands and the following conversation took place:
Surgeon: Just relax. I’ll have that appendix out in no time.
You: Have you done this operation before?
Surgeon: No, but I passed all the tests.
You: Oh? What kind of tests?
Surgeon: Mostly multiple-choice. But there were some essay items, too.
So, would you prefer your surgeon to have had some meaningful, practical types of assessments or strictly paper-and-pencil tests?
Learner-Centered Criteria
Test items should take into consideration the characteristics and needs of the learners. This includes issues such as learners’ vocabulary and language levels, motivational and interest levels, experiences and backgrounds, and special needs. To start with, test items should be written using language and grammar that are familiar to the learners. Another important aspect of learner-centered assessment is that the familiarity of experiences and contexts needs to be taken into consideration. Learners should not be asked to demonstrate a desired performance in an unfamiliar context or setting. The examples, question types, and response formats should also be familiar to learners, and your items should be free of any gender, racial, or cultural bias.
Context-Centered Criteria
Remember the context analysis you wrote? Well, when writing test items you should consider both the performance context and the learning context you wrote about. It is important to make your test items as realistic and close to the performance setting as possible. This will help ensure the transfer of skills from the learning environment to the eventual performance environment. According to Dick and Carey, “the more realistic the testing environment, the more valid the learners’ responses will be” (pg. 153). It is also important to make sure the learning environment contains all the necessary tools to adequately simulate the performance environment.
Assessment-Centered Criteria
Test items should be well written and free of spelling, grammar, and punctuation errors. Directions should be clearly written to avoid any confusion on the part of the learner. It’s also important to avoid writing “tricky” questions that feature double negatives, deliberately confusing directions, or compound questions. Your learners should miss questions because they do not have the necessary skill, not because your directions were unclear or because you wanted to throw them off with confusing wording.
Dick and Carey provide a checklist of these four criteria on page 165. Use this checklist as you create your own test items.
How Many Items?
The question inevitably arises as to how many items are necessary to determine mastery of an objective. For some skills only one item is necessary. Others may require more than one item. For example, a second-grade student may be asked to demonstrate mastery of an arithmetic rule by means of the item: 3M + 2M = 25; M = ? Obviously, the purpose of assessment would be to determine if the student could perform a class of arithmetic operations similar to this, not whether he or she is able to perform this single one. Generally, several items of the same type and class would be employed to ensure the reliability of the results.
Also, on any single item a student may make a correct response because he or she has seen the correct response before, or perhaps has just made a good guess. In this case several items may be warranted. With some assessment items, though, guessing is not something that could be rewarded, so you may only require a single performance. Another possibility is that a single item may be missed because a student has been misled by some confusing characteristic of the item, or has simply made a “stupid” mistake.
It is essential to keep in mind that, no matter how many items are created for an objective, the question to ask should not be, “How many did they get correct?” but rather, “Does the number correct indicate mastery of the objective?” Also keep in mind that while two items may be better than one, they may also yield a 50-50 result, with a student getting one right and one wrong. Would this indicate mastery? Gagné (1988) suggests having three items in this case instead of two, as two out of three provides a better means of making a reliable decision about mastery.
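The reasoning above can be made concrete with a small Python sketch. The function names are hypothetical, and the guessing calculation assumes independent multiple-choice items with equally likely options, which is a simplification rather than a claim about any real test.

```python
def shows_mastery(item_results, required_correct):
    """Decide mastery from a list of True/False item results.

    The question is not "how many were correct?" but "does the number
    correct meet the standard set for this objective?"
    """
    return sum(item_results) >= required_correct


def chance_of_guessing_all(options_per_item, number_of_items):
    """Probability of getting every item right by blind guessing.

    Assumes independent items, each with equally likely options.
    """
    return (1 / options_per_item) ** number_of_items


# One right and one wrong out of two items fails a two-out-of-two standard
# and says little either way.
print(shows_mastery([True, False], required_correct=2))        # False

# A third item allows a two-out-of-three decision.
print(shows_mastery([True, False, True], required_correct=2))  # True

# Guessing one four-option item correctly is far more likely than
# guessing three in a row.
print(chance_of_guessing_all(4, 1))  # 0.25
print(chance_of_guessing_all(4, 3))  # 0.015625
```

The second calculation shows why several items are warranted when a correct response could be a lucky guess.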
Assessment of Performances, Products, and Attitudes
Some intellectual skills, as well as psychomotor and attitudinal skills, cannot be assessed using common objective-type test items. They require either the creation of some type of product or a performance of some sort. These types of performances need to be assessed using an evaluation or observation instrument. Dick and Carey suggest that you provide guidance during the learning activities and construct a rubric to assist in the evaluation of the performance or product.
Attitudes are unique in that they are not directly measurable. Instead, the best way to assess attitudes is to observe the learner exhibiting or not exhibiting the desired attitude. During observation, it is important that the learners be given the choice to behave according to their attitudes. If you are observing a performance and the learners know they are being observed, their behavior may not reflect their true attitudes. If direct observation isn’t possible, you can have students respond to a questionnaire or open-ended essay question. Much care should be taken when constructing such tests, though. If you simply give them a test with leading questions and/or directions describing the nature of the test, they are likely to give you the answer they think you want to read. The results would not tell you how they would act when faced with a real-world situation involving that attitude.
Dick and Carey make the following suggestions regarding the development of these types of assessment instruments:
Directions for performance and products should clearly describe what is expected of the learners. You should include any special conditions and decide on the amount of guidance you will provide during the assessment. In some situations you may want to provide no guidance.
Developing the Instrument
When assessing performances, products, or attitudes you will need to create an assessment instrument to help you evaluate the performance, product, or attitude. Dick and Carey offer five steps to creating this instrument:
- Identify the elements to be evaluated – These elements should be taken directly from the behaviors and criteria included in your objectives. Make sure that the elements you select can be observed during the performance.
- Paraphrase each element – Elements should be paraphrased to cut down on the length of the instrument. Also, make sure that a “Yes” response on the instrument always corresponds with a positive performance, and a “No” response with a negative performance.
- Sequence the elements on the instrument – The order in which the elements are listed should match the natural order of the performance. For example, if you are creating an instrument to help assess the changing of a tire you would not put “Tightens lug nuts on new tire” at the top of the list.
- Select the type of judgment to be made by the evaluator – When evaluating a performance, product, or attitude, judgments can be made using checklists, rating scales, or frequency counts. Checklists provide a simple “yes” or “no” as to whether or not a learner meets a criterion or element. Rating scales take this a step further by allowing for in-between ratings instead of strictly “yes” or “no”. Frequency counts are used for indicating the number of times a learner meets or displays a criterion or element. This is good if the element can be observed more than once.
- Determine how the instrument will be scored – With checklists you can simply add up the “yes” answers to obtain a score for each objective and for the entire process or product. With rating scales you can add up the numbers assigned for each element. Frequency counts are a little more complicated as you have to determine how to create a score. You can add up the frequencies for an element, but you would still have to determine what constitutes a good score and whether a lack of an occurrence would be detrimental.
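The three scoring approaches in the last step can be sketched in Python as follows. This is a rough illustration, not a scoring method prescribed by Dick and Carey: apart from “Tightens lug nuts on new tire”, every element name, rating, count, and minimum below is made up for the example.

```python
def score_checklist(marks):
    """Count the "yes" marks on a checklist (True means yes)."""
    return sum(1 for mark in marks.values() if mark)


def score_rating_scale(ratings):
    """Add up the numeric ratings assigned to each element."""
    return sum(ratings.values())


def score_frequency_counts(counts, minimums):
    """Check each element's observed count against a minimum you choose.

    Returns a dict mapping each element to True when its count meets
    the minimum. An element that was never observed counts as 0.
    """
    return {
        element: counts.get(element, 0) >= minimum
        for element, minimum in minimums.items()
    }


# Hypothetical tire-changing checklist, listed in order of performance.
checklist = {
    "Loosens lug nuts": True,
    "Raises car with jack": True,
    "Tightens lug nuts on new tire": False,
}
print(score_checklist(checklist))  # 2

# The same elements rated on an assumed 1-to-3 scale.
ratings = {
    "Loosens lug nuts": 3,
    "Raises car with jack": 2,
    "Tightens lug nuts on new tire": 1,
}
print(score_rating_scale(ratings))  # 6

# Hypothetical frequency counts, with minimums chosen by the designer.
counts = {"Makes eye contact": 4}
minimums = {"Makes eye contact": 3, "Asks a question": 1}
print(score_frequency_counts(counts, minimums))
```

The frequency-count function makes the open question explicit: someone still has to decide what minimum counts as a good score and whether a missing occurrence matters.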
Dick and Carey provide good examples of assessment instruments for evaluating psychomotor skills and attitudes on pages 169 and 170 of their book.
Portfolio Assessment
Many of you are probably familiar with this type of assessment. Portfolios are collections of work that together represent learners’ achievements over an extended period of time. This could include tests, products, performances, essays, or anything else related to the goals of the portfolio. They allow you to assess learners’ work as well as their growth during the process. As with all other forms of assessment, whatever is included in the portfolio must be related to specific goals and objectives. The choice of what to include can be decided on entirely by the teacher, or in cooperation with students. Assessment of each portfolio component is done as it is completed, and the overall assessment of the portfolio is carried out at the end of the process using rubrics. In addition, learners are given the opportunity to assess their own work by reflecting on the strengths and weaknesses of various components. Portfolios can also be used as part of the evaluation process to determine what students did and did not learn, and then that information can be used to strengthen the instruction.
Evaluating Congruence in the Design Process
One of the most crucial aspects of the assessment phase of the design process is to be able to evaluate the congruence of the assessment against the objectives and analyses that have been performed. Remember that this is a systematic approach to instructional design, which means that every step in the process influences subsequent steps. As such, all of your skills, objectives, and assessment items should be parallel. One way to clearly represent this relationship is to create a three-column table that lists each of the skills from your instructional analysis, the accompanying objective, and the resulting assessment item. At the bottom of the table you would finish up with your main instructional goal, the terminal objective, and the test item for the terminal objective.
Design Evaluation Chart
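As one possible way to keep such a chart and check it for gaps, here is a short Python sketch. The class and function names are hypothetical, and the rows are illustrative only: the first reuses the analog clock objective and assessment from the examples at the end of this lesson, while its skill wording and the entire second row are invented.

```python
from dataclasses import dataclass


@dataclass
class ChartRow:
    skill: str
    objective: str
    assessment_item: str


def incomplete_rows(rows):
    """Return the skills whose row lacks an objective or an assessment item."""
    return [
        row.skill
        for row in rows
        if not row.objective.strip() or not row.assessment_item.strip()
    ]


def format_chart(rows):
    """Render the rows as a plain-text, three-column table."""
    lines = ["Skill | Objective | Assessment Item"]
    for row in rows:
        lines.append(" | ".join((row.skill, row.objective, row.assessment_item)))
    return "\n".join(lines)


rows = [
    ChartRow(
        skill="State the time on an analog clock",
        objective="The student will state the time shown on an analog "
                  "clock to the nearest 5 minutes.",
        assessment_item="Given pictures of analog clock faces, state the "
                        "time indicated on each clock.",
    ),
    # Invented row with a missing assessment item, to show the check.
    ChartRow(skill="Hypothetical skill", objective="Hypothetical objective",
             assessment_item=""),
]

print(format_chart(rows))
print(incomplete_rows(rows))  # ['Hypothetical skill']
```

A check like this only confirms that every skill has an objective and an item; judging whether the three are truly parallel still requires reading them side by side.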
It is important at this point to make sure that your design is adequate so that you will be able to move on to the next step in the instructional design process. The next step involves developing an instructional strategy based on all of the design work you have done up to this point. But before we move on, let’s close with one more note from Mager regarding objectives and assessment:
If you write your test items according to the above procedures, and if you find yourself saying, “But the test items look pretty much like the objectives,” you need to have a little chat with yourself. Remember that the objective of instruction is to bestow competence just as elegantly as you can manage to do it. The object is not to use trick questions just to make it harder, or to spread people on a curve, or to find out whether students “really” understand. The object is to find out whether they have achieved the objectives you derived for them to achieve. If your test items look similar to your objectives, rejoice. They’re supposed to look similar.
Here are some examples of good and bad assessment items:
Objective: The student will state the time shown on an analog clock to the nearest 5 minutes.
Bad assessment: Students are given a time and are asked to draw the corresponding minute and hour hands on a blank clock diagram.
Good assessment: Students are given pictures of analog clock faces and are asked to state the time indicated on each clock.
Objective: The student will set up an attractive merchandise display in the student store, with appropriate signs.
Bad assessment: Students are asked to write a paragraph describing the six elements of an attractive merchandise display.
Good assessment: Students are given a week to set up an attractive merchandise display in the student bookstore. Displays are evaluated using a criteria checklist.
Objective: Students will write a descriptive essay of at least 300 words.
Bad assessment: Have students read several examples of good essays.
Bad assessment: Write a descriptive essay in class by having each student contribute a sentence.
Bad assessment: Have each student orally describe an unknown object until the other students can guess what the object is.
Good assessment: Have students choose a topic and write an essay describing it.