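Repost: Test theory: discrimination, difficulty, and response option analysis
2018-11-11 10:11
Original link:
http://www.specialconnections.ku.edu/?q=assessment/quality_test_construction/teacher_tools/item_analysis
From the University of Kansas.
Annotations by 大猫咪. The terms used in the annotations may not be the standard technical terms, but the notes try to aid understanding.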
*****************************
Item Analysis (Note: in this article an "item" means one test question, i.e., one question stem with four answer options)
What is item analysis?
Item analysis is a process of examining class-wide performance on
individual test items. There are three common types of item
analysis which provide teachers with three different types of
information:
- Difficulty Index (Note: difficulty = number of students who answered the item correctly / number of students who took the test) -
Teachers produce a difficulty index for a test item by calculating
the proportion of students in class who got an item correct. (The
name of this index is counter-intuitive, as one actually gets a
measure of how easy the item is, not the
difficulty of the item.) The larger the proportion, the more
students have learned the content measured by the item.
- Discrimination Index (Note: the detailed formula is given further down) - The discrimination index is a basic measure of the
validity of an item. It is a measure of an item's ability to
discriminate between those who scored high on the total test and
those who scored low. Though there are several steps in its
calculation, once computed, this index can be interpreted as an
indication of the extent to which overall knowledge of the content
area or mastery of the skills is related to the response on an
item. Perhaps the most crucial validity standard for a test item is
that whether a student got an item correct or not is due to their
level of knowledge or ability and not due to something else such as
chance or test bias.
- Analysis of Response Options (Note: what proportion of students chose each of the four options A, B, C, and D) - In addition to examining the performance of an entire
test item, teachers are often interested in examining the
performance of individual distractors (incorrect answer options) on
multiple-choice items. By calculating the proportion of students
who chose each answer option, teachers can identify which
distractors are 'working' and appear attractive to students who do
not know the correct answer, and which distractors are simply
taking up space and not being chosen by many students. To eliminate
blind guessing which results in a correct answer purely by chance
(which hurts the validity of a test item), teachers want as many
plausible distractors as is feasible. Analyses of response options
allow teachers to fine tune and improve items they may wish to use
again with future classes.
Performing item analysis
Here are the procedures for the calculations involved in item
analysis with data for an example item. For our example, imagine a
classroom of 25 students who took a test which included the item
below. The asterisk indicates that B is the correct answer.
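(Note: In this example, 25 students took the test, and one of the items is: who wrote The Great Gatsby?)
Number of Students Choosing Each Answer Option
Who wrote The Great Gatsby?
A. Faulkner: 4
*B. Fitzgerald: 16
C. Hemingway: 5
D. Steinbeck: 0
(Note: these are the numbers of students who chose A, B, C, and D, respectively.)
Total Number of Students: 25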
Item Analysis Method | Procedures | Example
*********************
Difficulty Index - Proportion of students who got an item correct
Procedures: Count the number of students who got the correct answer. Divide by the total number of students who took the test. Difficulty Indices range from .00 to 1.0.
Example: 16 students answered correctly; 16/25 = .64
(Note: 16 of the 25 students chose the correct answer, so the difficulty index is 0.64. Clearly, the difficulty index ranges from 0% to 100%.)
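The procedure above is a single division, and a minimal Python sketch of it could look like the following. The function name `difficulty_index` and its argument checks are illustrative assumptions, not part of the original article.

```python
def difficulty_index(num_correct: int, num_students: int) -> float:
    """Proportion of test takers who answered the item correctly (0.0 to 1.0)."""
    if num_students <= 0:
        raise ValueError("num_students must be positive")
    if not 0 <= num_correct <= num_students:
        raise ValueError("num_correct must be between 0 and num_students")
    return num_correct / num_students


# Gatsby example from the article: 16 of 25 students chose B.
p = difficulty_index(16, 25)
print(f"{p:.2f}")  # 0.64
```

A higher value means an easier item, which matches the article's remark that the name of the index is counter-intuitive.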
*********************
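Discrimination Index - A comparison of
how overall high scorers on the whole test did on one particular
item compared to overall low scorers.
(Note: First, split the class into two groups by total score. The test contains more than this one item, so rank the whole class by total test score from high to low and split it in half: a higher-scoring group and a lower-scoring group.
In this example, the 13 students in the top half form one group and the 12 students in the bottom half form the other. Call them the "high group" and the "low group".
In the high group, 10 students got this item correct: 10/13 = 0.77.
In the low group, 6 students got this item correct: 6/12 = 0.50.
The discrimination index is therefore 0.77 - 0.50 = 0.27.
The discrimination index ranges from -1.0 to +1.0. +1.0 is the ideal case: every student in the top half got the item right and every student in the bottom half got it wrong, so the item perfectly "discriminates" between the two groups. -1.0 is the strangest case: every student in the top half got the item wrong and every student in the bottom half got it right. Such an item fails to "discriminate" high total scores from low ones in the intended way and was probably badly written.
In short, the discrimination index looks at one item in the context of the whole test taken by this class and asks whether the item separates students with high total scores from students with low total scores.)
Procedures: Sort your tests by total score and create two groupings of tests:
the high scores, made up of the top half of tests, and the low
scores, made up of the bottom half of tests.
For each group, calculate a difficulty index for the item.
Subtract the difficulty index for the low scores group from the
difficulty index for the high scores group.
Discrimination Indices range from -1.0 to 1.0.
Example: Imagine this information for our example: 10 out of 13 students (or
tests) in the high group and 6 out of 12 students in the low group
got the item correct.
High Group 10/13 = .77
Low Group 6/12 = .50
.77 - .50 = .27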
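The sort, split, and subtract steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `discrimination_index`, the `(total_score, item_correct)` record layout, and the total scores in the sample data are all made up here. The article only gives the group counts (10 of 13 and 6 of 12). With an odd class size, the sketch puts the extra student in the high group, which reproduces the 13/12 split of the example.

```python
from typing import Sequence, Tuple


def discrimination_index(records: Sequence[Tuple[float, bool]]) -> float:
    """High-group difficulty index minus low-group difficulty index.

    Each record is (total test score, whether this item was answered correctly).
    """
    # Rank students by total score, best first.
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    # Assumption: with an odd class size the extra student joins the high group.
    n_high = (len(ranked) + 1) // 2
    high, low = ranked[:n_high], ranked[n_high:]
    if not high or not low:
        raise ValueError("need at least two students")
    p_high = sum(1 for _, correct in high if correct) / len(high)
    p_low = sum(1 for _, correct in low if correct) / len(low)
    return p_high - p_low


# Hypothetical class of 25: total scores 100 down to 76 (invented for illustration).
# 10 of the top 13 and 6 of the bottom 12 answered the item correctly.
records = [(100 - i, i < 10) for i in range(13)] + [(87 - j, j < 6) for j in range(12)]
print(f"{discrimination_index(records):.2f}")  # 0.27
```

The sketch does not treat tied total scores at the group boundary specially. A real gradebook would need a rule for that case.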
Analysis of Response Options (Note: how many students chose A, B, C, and D) - A comparison of the
proportion of students choosing each response option.
Procedures: For each answer option, divide the number of students who chose
that answer option by the number of students taking the test.
Example: Who wrote The Great Gatsby?
A. Faulkner 4/25 = .16
*B. Fitzgerald 16/25 = .64
C. Hemingway 5/25 = .20
D. Steinbeck 0/25 = .00
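The same tally can be sketched with Python's `collections.Counter`. The function name `option_proportions` and the representation of responses as a list of option letters are assumptions made for this illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def option_proportions(responses: Iterable[str], options: str = "ABCD") -> Dict[str, float]:
    """Share of test takers who chose each answer option."""
    answers = list(responses)
    if not answers:
        raise ValueError("no responses given")
    counts = Counter(answers)
    # Counter returns 0 for options nobody chose, so D still appears in the result.
    return {opt: counts[opt] / len(answers) for opt in options}


# Gatsby example from the article: 4 chose A, 16 chose B, 5 chose C, nobody chose D.
responses = ["A"] * 4 + ["B"] * 16 + ["C"] * 5
props = option_proportions(responses)
for opt, share in props.items():
    print(f"{opt}: {share:.2f}")
# A: 0.16
# B: 0.64
# C: 0.20
# D: 0.00
```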
Interpreting the results of item analysis
(Note: how to interpret the difficulty index, the discrimination index, and the response option analysis)
In our example, the item had a difficulty index of .64. This
means that sixty-four percent of students knew the answer. If a
teacher believes that .64 is too low, he or she can change the way
they teach to better meet the objective represented by the item.
Another interpretation might be that the item was too difficult or
confusing or invalid, in which case the teacher can replace or
modify the item, perhaps using information from the item's
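discrimination index or analysis of response
options. (Note: A difficulty index of 0.64 means that 64% of the students got this item correct. Whether that is high or low is for the teacher to judge. For example, if it seems too low, the material can be taught again more thoroughly. The teacher may also feel that the 64% figure alone is inconclusive and that the item may have been poorly written. In that case the discrimination index and the response option analysis can help investigate.)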
The discrimination index for the item was .27. The formula
for the discrimination index is such that if more students in the
high scoring group chose the correct answer than did students in
the low scoring group, the number will be positive. At a minimum,
then, one would hope for a positive value, as that would indicate
that knowledge resulted in the correct answer. The greater the
positive value (the closer it is to 1.0), the stronger the
relationship is between overall test performance and performance on
that item. If the discrimination index is negative, that means that
for some reason students who scored low on the test were more
likely to get the answer correct. This is a strange situation which
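suggests poor validity for an item. (Note: The discrimination index is 0.27, a positive number. The larger the positive value, the better the item discriminates. A negative discrimination index means that students with low total scores were more likely to get this item right, which indicates a problem with the item's validity. Validity is also an important property of an item, and the discrimination index is one indicator of it. Validity itself would need a separate article to explain.)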
The analysis of response options shows that those who missed
the item were about equally likely to choose answer A and answer C.
No students chose answer D. Answer option D does not act as a
distractor. Students are not choosing between four answer options
on this item, they are really choosing between only three options,
as they are not even considering answer D. This makes guessing
correctly more likely, which hurts the validity of an
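item. (Note on the response option analysis: in this example nobody chose D, so D is not an effective distractor. A distractor that nobody chooses lowers the validity of the item.)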
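To make the guessing argument concrete, here is a rough Python sketch that flags distractors nobody chose and compares the nominal chance of a blind guess with the chance among the options students actually considered. The function names and the `min_share` threshold are assumptions for illustration. With the default of 0.0, only options chosen by no one are flagged, as with D in the example.

```python
from typing import Dict, List


def nonfunctional_distractors(counts: Dict[str, int], correct: str,
                              min_share: float = 0.0) -> List[str]:
    """Incorrect options whose share of responses is at or below min_share."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses given")
    return [opt for opt, n in counts.items()
            if opt != correct and n / total <= min_share]


def blind_guess_chance(counts: Dict[str, int], correct: str,
                       min_share: float = 0.0) -> float:
    """Chance of guessing right among the options that are actually in play."""
    dead = nonfunctional_distractors(counts, correct, min_share)
    return 1 / (len(counts) - len(dead))


counts = {"A": 4, "B": 16, "C": 5, "D": 0}  # Gatsby example, B is correct
print(nonfunctional_distractors(counts, "B"))    # ['D']
print(f"{1 / len(counts):.2f}")                   # 0.25 with four live options
print(f"{blind_guess_chance(counts, 'B'):.2f}")   # 0.33 once D is ruled out
```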
How can the use of item analysis benefit your students, including
those with special needs?
The fairest tests for all students are tests which are valid
and reliable (validity and reliability). To improve the quality of tests, item analysis can
identify items which are too difficult (or too easy if a teacher
has that concern), are not able to differentiate between those who
have learned the content and those who have not, or have
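distractors which are not plausible. (Note: Items that are too hard or too easy do not help distinguish students who have learned the content from those who have not. Distractors that are so obviously wrong that nobody chooses them also lower discrimination.)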
If items are too hard, teachers can adjust the way they
teach. Teachers can even decide that the material was not taught
and for the sake of fairness, remove the item from the current
test, and recompute scores. (Note: If an item is too hard, the teacher can even remove it from the test score and recompute the scores.)
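Removing an item and recomputing scores might look like the sketch below. The article does not say how scores are computed, so this assumes each item is worth one point and the score is the fraction of remaining items answered correctly. The function name, the student names, and the data are hypothetical.

```python
from typing import Dict, List, Set


def recompute_scores(item_scores: Dict[str, List[int]],
                     dropped: Set[int]) -> Dict[str, float]:
    """Fraction correct per student after dropping the given item positions."""
    result = {}
    for student, scores in item_scores.items():
        # Keep only the items that were not removed from the test.
        kept = [s for i, s in enumerate(scores) if i not in dropped]
        if not kept:
            raise ValueError("cannot drop every item")
        result[student] = sum(kept) / len(kept)
    return result


# Hypothetical 4-item test: 1 = correct, 0 = incorrect. Item at index 2 was never taught.
item_scores = {"Ana": [1, 1, 0, 1], "Ben": [1, 0, 0, 0]}
print(recompute_scores(item_scores, dropped=set()))  # {'Ana': 0.75, 'Ben': 0.25}
print(recompute_scores(item_scores, dropped={2}))    # Ana: 1.0, Ben: 1/3
```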
If items have low or negative discrimination values, teachers
can remove them from the current test and recompute scores, and
remove them from the pool of items for future tests. A teacher can
also examine the item, try to identify what was tricky about it,
and either change the item or modify instruction to correct a
confusing misunderstanding about the content. (Note: If the discrimination index is too low or negative, something is wrong with the item. Revise it, or simply stop using it.)
When distractors are identified as being non-functional,
teachers may tinker with the item and create a new distractor. One
goal for a valid and reliable classroom test is to decrease the
chance that random guessing could result in credit for a correct
answer. The greater the number of plausible distractors, the more
accurate, valid, and reliable the test typically
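becomes. (Note: If nobody chooses a distractor, write a new one to replace it. A test with high validity and reliability reduces the cases where students earn credit by guessing. The better the distractors, the more valid and reliable the test.)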