AIOU Solved Assignments 1 & 2, Code 8602 (Educational Assessment and Evaluation), Autumn & Spring 2020
Course: Educational Assessment and Evaluation (8602)
Level: B.Ed (1½ & 2½ Years)
Semester: Autumn & Spring 2020
ASSIGNMENT No. 2
Q.1. State different methods to enhance the reliability of a measurement tool, and
explain each method with examples.
What does the term reliability mean? Reliability means trustworthiness. A test score is
called reliable when we have reason to believe the score is stable and objective.
For example, if the same test is given to two classes and is marked by different teachers,
yet it produces similar results, it may be considered reliable. Stability and
trustworthiness depend upon the degree to which the score is free of chance error. We must
first build a conceptual bridge between the question asked by the individual (i.e. "are my
scores reliable?") and how reliability is measured scientifically. This bridge is not as simple
as it may first appear. When a person thinks of reliability, many things may come to
mind: my friend is very reliable, my car is very reliable, my internet bill-paying process is
very reliable, my client's performance is very reliable, and so on. The characteristics being
addressed are concepts such as consistency, dependability, predictability, and variability.
Implicit in such statements is the recognition that behaviour, machine performance, data
processes, and work performance are sometimes not reliable. The question is: how much do
the scores of tests vary over different observations?
Some Definitions of Reliability: According to Merriam-Webster:
“Reliability is the extent to which an experiment, test, or measuring procedure yields the
same results on repeated trials.”
According to Hopkins & Antes (2000):
“Reliability is the consistency of observations yielded over repeated recordings either for
one subject or a set of subjects.”
Joppe (2000) defines reliability as:
“…The extent to which results are consistent over time and an accurate representation of
the total population under study is referred to as reliability and if the results of a study can
be reproduced under a similar methodology, then the research instrument is considered to
be reliable.” (p. 1)
The more general definition of reliability is: the degree to which a score is stable and
consistent when measured at different times (test-retest reliability), in different ways
(parallel-forms and alternate-forms reliability), or with different items within the same scale
(internal consistency reliability).
Types of Reliability
Reliability is one of the most important elements of test quality. It has to do with the
consistency, or reproducibility, of an examinee's performance on the test. It's not possible
to calculate reliability exactly. Instead, we have to estimate reliability, and this is always
an imperfect attempt. Here, we introduce the major reliability estimators and talk about
their strengths and weaknesses.
There are six general classes of reliability estimates, each of which estimates reliability in a
different way. They are:
i. Inter-Rater or Inter-Observer Reliability:
To assess the degree to which different raters/observers give consistent estimates of the
same phenomenon. That is, if two teachers mark the same test and the results are similar, this
indicates inter-rater or inter-observer reliability.
ii. Test-Retest Reliability:
To assess the consistency of a measure from one time to another: when the same test is
administered twice and the results of both administrations are similar, this constitutes
test-retest reliability. Students may remember items, or may mature, between the two
administrations, which creates a problem for test-retest reliability.
iii. Parallel-Form Reliability:
To assess the consistency of the results of two tests constructed in the same way from the
same content domain. Here the test designer tries to develop two tests of a similar kind;
if, after administration, the results are similar, this indicates parallel-form reliability.
iv. Internal Consistency Reliability:
To assess the consistency of results across items within a test; it is the correlation of
individual item scores with the score on the entire test.
v. Split-Half Reliability:
To assess the consistency of results by comparing two halves of a single test; these halves
may be the even and odd items of the single test.
vi. Kuder-Richardson Reliability:
To assess the consistency of the results using all the possible split halves of a test. Let’s
discuss each of these in turn.
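As an illustration, two of these estimators can be sketched in a few lines of Python. The student scores and function names below are made-up assumptions for demonstration only: test-retest reliability is estimated as the Pearson correlation between two administrations, and split-half reliability as the odd/even-half correlation stepped up with the Spearman-Brown formula.

```python
# A minimal sketch of two reliability estimates on invented data.

def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Test-retest: the same test given to the same ten students twice.
first_administration = [12, 15, 11, 18, 14, 16, 10, 17, 13, 15]
second_administration = [13, 14, 12, 17, 15, 16, 11, 18, 12, 14]
test_retest = pearson_r(first_administration, second_administration)

# Split-half: correlate odd-item and even-item half-test scores, then
# apply the Spearman-Brown correction, because each half is only half
# as long as the full test.
odd_half = [6, 8, 5, 9, 7, 8, 5, 9, 6, 7]
even_half = [6, 7, 6, 9, 7, 8, 5, 8, 7, 8]
half_r = pearson_r(odd_half, even_half)
split_half = (2 * half_r) / (1 + half_r)  # Spearman-Brown formula

print(f"test-retest r = {test_retest:.2f}")
print(f"split-half reliability = {split_half:.2f}")
```

In practice such coefficients are usually computed with a statistics package; the hand-rolled version here only makes the arithmetic visible.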
Q.2. Explain the effects of curricular validity on the performance of the examinees. How can
you measure the curricular validity of tests? Elaborate in detail.
Curricular validity is the extent to which the content of a test matches the objectives of a
specific curriculum as it is formally described. It takes on particular importance in situations
where tests are used for high-stakes decisions, such as the Punjab Examination Commission exams
for fifth- and eighth-grade students and the Boards of Intermediate and Secondary Education
examinations. In these situations, curricular validity means that the content of a test that is
used to make a decision about whether a student should be promoted to the next level should
measure the curriculum that the student is taught in school.
Curricular validity is evaluated by groups of curriculum/content experts. The experts are
asked to judge whether the content of the test is parallel to the curriculum objectives and
whether the test and curricular emphases are in proper balance. A table of specifications
may help to improve the validity of the test.
Functions of Test Scores and Progress Reports
The task of grading and reporting students’ progress cannot be separated from the procedures
adopted in assessing students’ learning. If instructional objectives are well defined in terms of
behavioural or performance terms and relevant tests and other assessment procedures are
properly used, grading and reporting become a matter of summarizing the results and presenting
them in an understandable form. Reporting students' progress is difficult, especially when data
are represented in a single letter-grade system or numerical value (Linn & Gronlund, 2000).
Assigning grades and making referrals are decisions that require information about individual
students. In contrast, curricular and instructional decisions require information about groups of
students, quite often about entire classrooms or schools (Linn & Gronlund, 2000).
There are three primary purposes of grading students. First, grades are the primary currency for
exchange of many of the opportunities and rewards our society has to offer. Grades can be
exchanged for such diverse entities as adult approval, public recognition, college and university
admission etc. To deprive students of grades means to deprive them of rewards and
opportunities. Second, teachers are accustomed to assessing their students' learning in grades,
and if teachers don't award grades, the students might not know their learning
progress well. Third, grading students motivates them. Grades can serve as incentives, and for
many students incentives serve a motivating function.
The different functions of grading and reporting systems are given as under:
1. Instructional uses
The focus of grading and reporting should be student improvement in learning. This is most
likely to occur when the report: a) clarifies the instructional objectives; b) indicates the student's
strengths and weaknesses in learning; c) provides information concerning the student’s personal
and social development; and d) contributes to student’s motivation.
The improvement of student learning is probably best achieved by day-to-day assessments
of learning and the feedback from tests and other assessment procedures. A portfolio of work
developed during the academic year can be displayed to indicate the student's strengths and
weaknesses. Periodic progress reports can contribute to student motivation by providing
short-term goals and knowledge of results. Both are essential features of effective learning.
Well-designed progress reports can also help in evaluating instructional procedures by
identifying areas that need revision. When the reports of the majority of students indicate poor
progress, it may be inferred that there is a need to modify the instructional objectives.
2. Feedback to students
Grading and reporting test results to students has been an ongoing practice in educational
institutions throughout the world. The mechanism or strategy may differ from country to
country or institution to institution, but each institution observes the practice in some way. Reporting
test scores to students has a number of advantages for them. As the students move up through
the grades, the usefulness of the test scores for personal academic planning and self-
assessment increases. For most students, the scores provide feedback about how much they
know and how effective their efforts to learn have been. They can learn their strengths and the
areas that need special attention. Such feedback is essential if students are expected to be partners in
managing their own instructional time and effort. These results help them to make good decisions
for their future professional development.
Teachers use a variety of strategies to help students become independent learners who are able
to take an increasing responsibility for their own school progress. Self-assessment is a significant
aspect of self-guided learning, and the reporting of test results can be an integral part of the
procedures teachers use to promote self-assessment. Test results help students to identify areas
that need improvement, areas in which progress has been strong, and areas in which continued
strong effort will help maintain high levels of achievement. Test results can be used with
information from teacher’s assessments to help students set their own instructional goals, decide
how they will allocate their time, and determine priorities for improving skills such as reading,
writing, speaking, and problem solving. When students are given their own test results, they can
learn about self-assessment while doing actual self-assessment. (Iowa Testing Programs, 2011).
Grading and reporting results also provide students an opportunity for developing an awareness
of how they are growing in various skill areas. Self-assessment begins with self-monitoring, a skill
most children have begun developing well before coming to kindergarten.
3. Administrative and guidance uses
Grades and progress reports serve a number of administrative functions. For example, they are
used for determining promotion and graduation, awarding honours, determining sports eligibility
of students, and reporting to other institutions and employers. For most administrative purposes,
a single letter-grade is typically required, but of course, technically, a single letter-grade does not
truly convey a student's achievement.
Guidance and Counseling officers use grades and reports on student’s achievement,
along with other information, to help students make realistic educational and vocational
plans. Reports that include ratings on personal and social characteristics are also useful
in helping students with adjustment problems.
Without any doubt, it is more effective to talk to parents face-to-face about their children's scores
than to send a score report home for them to interpret on their own. For a variety of reasons, a parent-
teacher or parent-student-teacher conference offers an excellent occasion for teachers to provide
and interpret those results to the parents.
1. Teachers tend to be more knowledgeable than parents about tests and the types of scores reported.
2. Teachers can make numerous observations of their students' work and consequently
substantiate the results. Inconsistencies between test scores and classroom performance can
be noted and discussed.
3. Teachers possess work samples that can be used to illustrate the type of classroom work the
student has done. Portfolios can be used to illustrate strengths and to explain where improvements are needed.
4. Teachers may be aware of special circumstances that may have influenced the scores, either
positively or negatively, to misrepresent the students’ achievement level.
Reliability is a measure of the consistency of a metric or a method. Every metric or method we use,
including things like methods for uncovering usability problems in an interface and expert reviews,
should be assessed for reliability. In fact, before you can establish validity, you need to establish
reliability. These are the four most common ways of measuring reliability for any empirical method
or metric:
• inter-rater reliability
• test-retest reliability
• parallel forms reliability
• internal consistency reliability
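The first of these, inter-rater reliability, can be illustrated with a minimal sketch. The two teachers' marks below are invented, and simple percent agreement is used here only as a lightweight stand-in; chance-corrected statistics such as Cohen's kappa are more common in practice.

```python
# Inter-rater reliability as simple percent agreement: two hypothetical
# teachers mark the same ten answers as correct (1) or incorrect (0).

def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two raters gave the same mark."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

teacher_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
teacher_2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]

agreement = percent_agreement(teacher_1, teacher_2)
print(f"inter-rater agreement = {agreement:.0%}")  # prints: inter-rater agreement = 80%
```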
Q.3. Write down learning outcomes for any unit of Social Studies for 8th grade and develop
an essay-type test item with rubric, 5 multiple-choice questions, and 5 short questions for
the written learning outcomes.
Achievement tests are widely used throughout education as a method of assessing and
comparing student performance. Achievement tests may assess any or all of reading, math, and
written language as well as subject areas such as science and social studies. These tests are
available to assess all grade levels and through adulthood. The test procedures are highly
structured so that the testing process is the same for all students who take them.
An achievement test is developed to measure skills and knowledge learned in a given grade level,
usually through planned instruction, such as training or classroom instruction. Achievement tests are often
contrasted with tests that measure aptitude, a more general and stable cognitive trait.
Achievement test scores are often used in an educational system to determine what level of
instruction for which a student is prepared. High achievement scores usually indicate a mastery
of grade-level material, and the readiness for advanced instruction. Low achievement scores can
indicate the need for remediation or repeating a course grade.
Teachers evaluate students by observing them in the classroom, evaluating their day-to-day
class work, grading their homework assignments, and administering unit tests. These classroom
assessments show the teacher how well a student is mastering grade level learning goals and
provide information to the teacher that can be used to improve instruction. Overall, achievement
testing serves the following purposes:
· Assess level of competence
· Diagnose strength and weaknesses
· Assign Grades
· Achieve Certification or Promotion
· Advanced Placement/College Credit Exams
· Curriculum Evaluation
· Informational Purposes
(i) Types of Achievement Tests
(a) Summative Evaluation:
Testing is done at the end of the instructional unit. The test score is seen as the summation of all
knowledge learned during a particular subject unit.
(b) Formative Evaluation:
Testing occurs constantly with learning so that teachers can evaluate the effectiveness of
teaching methods along with the assessment of students’ abilities.
(ii) Advantages of Achievement Test:
· One of the main advantages of testing is that it is able to provide assessments that are
psychometrically valid and reliable, as well as results which are generalizable and replicable.
· Another advantage is aggregation. A well-designed test provides an assessment of an
individual's mastery of a domain of knowledge or skill which, at some level of aggregation, will
provide useful information. That is, while individual assessments may not be accurate enough for
practical purposes, the mean scores of classes, schools, branches of a company, or other groups
may well provide useful information because of the reduction of error accomplished by increasing
the sample size.
(iii) Designing the Test
Step 1: The first step in constructing an effective achievement test is to identify what you want
students to learn from a unit of instruction. Consider the relative importance of the objectives and
include more questions about the most important learning objectives.
Writing the questions:
Step 2: Once you have defined the important learning objectives and have, in the light of these
objectives, determined which types of questions and what form of test to use, you are ready to
begin the second step in constructing an effective achievement test. This step is writing the
questions themselves.
Step 3: Finally, review the test. Are the instructions straightforward? Are the selected learning
objectives represented in appropriate proportions? Are the questions carefully and clearly
worded? Special care must be taken not to provide clues to the test-wise student. Poorly
constructed questions may actually measure not knowledge, but test-taking ability.
(iv) General Principles:
While the different types of questions (multiple choice, fill-in-the-blank or short answer,
true-false, matching, and essay) are constructed differently, the following principles apply to
constructing questions and tests in general.
· Make the instructions for each type of question simple and brief.
· Use simple and clear language in the questions. If the language is difficult, students who
understand the material but who do not have strong language skills may find it difficult to
demonstrate their knowledge. If the language is ambiguous, even a student with strong language
skills may answer incorrectly if his or her interpretation of the question differs from the instructor’s
· Write items that require specific understanding or ability developed in that course, not just
general intelligence or test-wiseness.
Q.4. Describe the measures of central tendency. Also elaborate how these measures
can be utilized in the interpretation of test results. Provide examples where necessary.
Measures of Central Tendency
Suppose that a teacher gave the same test to two different classes and the following results
were obtained:
Class 1: 80%, 80%, 80%, 80%, 80%
Class 2: 60%, 70%, 80%, 90%, 100%
If you calculate the mean for both sets of scores, you get the same answer: 80%. But the data
from which this mean was obtained were very different in the two cases. It is also
possible that two different data sets may have the same mean, median, and mode. For example:
Class A: 72 73 76 76 78
Class B: 65 76 76 78 80
Therefore, class A and class B have the same mean (75), mode (76), and median (76).
The way that statisticians distinguish such cases as this is known as measuring the variability of
the sample. As with measures of central tendency, there are a number of ways of measuring the
variability of a sample.
Probably the simplest method is to find the range of the sample, that is, the difference between
the largest and smallest observations. The range of measurements in Class 1 is 0, and the range
in Class 2 is 40. Simply knowing that fact gives a much better understanding of the data
obtained from the two classes. In Class 1, the mean was 80% and the range was 0, but in Class
2, the mean was 80% and the range was 40.
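The comparison above can be checked with a short script (Python is used here purely for illustration): both classes share a mean of 80 while their ranges differ sharply.

```python
# Mean and range for the two classes from the text.

class_1 = [80, 80, 80, 80, 80]
class_2 = [60, 70, 80, 90, 100]

def mean(scores):
    """Arithmetic average: sum of the scores divided by their count."""
    return sum(scores) / len(scores)

def score_range(scores):
    """Difference between the largest and smallest observation."""
    return max(scores) - min(scores)

print(mean(class_1), score_range(class_1))  # prints: 80.0 0
print(mean(class_2), score_range(class_2))  # prints: 80.0 40
```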
Statisticians use summary measures to describe patterns of data. Measures of central
tendency are the summary measures used to describe the most "typical" value in a set of data.
Here, we are interested in the typical, most representative score. The three most common
measures of central tendency are the mean, the mode, and the median. A teacher should be
familiar with these common measures of central tendency.
The mean is simply the arithmetic average: the sum of the scores divided by the number of
scores. When statisticians talk about the mean of a population, they use the Greek letter μ to
refer to the mean score. When they talk about the mean of a sample, they use the symbol x̄
(read as "X-bar").
It is computed as: x̄ = ΣX / N
Computation example: find the mean of 2, 3, 5, and 10.
Solution: x̄ = (2 + 3 + 5 + 10) / 4 = 20 / 4 = 5
Since means are typically reported with one more digit of accuracy than is present in the data,
the mean is reported as 5.0 rather than just 5.
Example 1
The marks of seven students in a mathematics test are:
15 13 18 16 14 17 12
Find the mean of this set of data values.
Solution: x̄ = (15 + 13 + 18 + 16 + 14 + 17 + 12) / 7 = 105 / 7 = 15
So, the mean mark is 15.
When working with grouped frequency distributions, we can use the approximation
x̄ ≈ Σ(f × Mdpt.) / Σf, where Mdpt. is the midpoint of each group and f is its frequency.
Interval Midpoint f Mid*f
95-99 97 1 97
90-94 92 3 276
85-89 87 5 435
80-84 82 6 492
75-79 77 4 308
70-74 72 3 216
65-69 67 1 67
60-64 62 2 124
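The grouped-frequency approximation can be checked against the table with a short illustrative Python script: the approximate mean is the sum of the Mid*f column divided by the total frequency.

```python
# Grouped-frequency mean, using the (midpoint, frequency) rows
# from the table above.

groups = [
    (97, 1), (92, 3), (87, 5), (82, 6),
    (77, 4), (72, 3), (67, 1), (62, 2),
]

total_f = sum(f for _, f in groups)          # Σf = 25 students
total_mid_f = sum(m * f for m, f in groups)  # Σ(f × midpoint) = 2015
grouped_mean = total_mid_f / total_f

print(f"approximate mean = {grouped_mean:.1f}")  # prints: approximate mean = 80.6
```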
Q.5. Briefly describe the present trends and classroom techniques used by teachers for the
formative assessment of students' learning. Also enlist the components of a good assessment.
According to Carole Tomlinson “Assessment is today’s means of modifying tomorrow’s
instruction.” It is an integral part of teaching learning process. It is widely accepted that
effectiveness of teaching learning process is directly influenced by assessment. Hamidi (2010)
developed a framework to answer the why, what, how and when of assessment. This is helpful in
understanding the true nature of this concept.
Why to Assess: Teachers have clear goals for instruction, and they assess to ensure that these
goals have been or are being met. If objectives are the destination and instruction is the path
to it, then assessment is a tool to keep the effort on track and to ensure that the path is
right. At the completion of the journey, assessment is the indication that the destination has
been reached.
a) Assessment for Learning (Formative Assessment)
Assessment for learning is a continuous and an ongoing assessment that allows teachers to
monitor students on a day-to-day basis and modify their teaching based on what the students
need to be successful. This assessment provides students with the timely, specific feedback that
they need to enhance their learning. The essence of formative assessment is that the information
yielded by this type of assessment is used on one hand to make immediate decisions and on the
other hand based upon this information; timely feedback is provided to the students to enable
them to learn better. If the primary purpose of assessment is to support high-quality learning then
formative assessment ought to be understood as the most important assessment practice.
Garrison and Ehringhaus (2007) identified some of the instructional strategies that can be used
for formative assessment:
· Observations. Observing students' behaviour and tasks can help the teacher identify whether
students are on task or need clarification. Observations assist teachers in gathering evidence of
student learning to inform instructional planning.
· Questioning strategies. Asking better questions allows an opportunity for deeper thinking and
provides teachers with significant insight into the degree and depth of understanding. Questions
of this nature engage students in classroom dialogue that both uncovers and expands learning.
· Self and peer assessment. When students have been involved in criteria and goal setting,
self-evaluation is a logical step in the learning process. With peer evaluation, students see each
other as resources for understanding and for checking the quality of work against previously
established criteria.
· Student record keeping. This also helps teachers to assess beyond a "grade," to see where
the learner started and the progress they are making towards the learning goals.
b) Assessment of Learning (Summative Assessment)
Summative assessment or assessment of learning is used to evaluate students’ achievement at
some point in time, generally at the end of a course. The purpose of this assessment is to help
the teacher, students and parents know how well the student has completed the learning task. In
other words summative evaluation is used to assign a grade to a student which indicates his/her
level of achievement in the course or program.
Assessment of learning is basically designed to provide useful information about the performance
of the learners rather than providing immediate and direct feedback to teachers and learners,
therefore it usually has little effect on learning. Still, high-quality summative information can
help and guide teachers in organizing their courses and deciding their teaching strategies, and
the information generated by summative assessment can be used to modify educational programs.
Many experts believe that all forms of assessment have some formative element. The difference
only lies in the nature and the purpose for which assessment is being conducted.
Comparing Assessment for Learning and Assessment of Learning

Assessment for Learning | Assessment of Learning
Checks how students are learning and whether there is any problem in learning | Checks what has been learned, to determine what students know
Is designed to assist educators and students in improving learning | Is designed to provide information to those not directly involved in classroom learning and teaching (school administration, parents)
Is used continually | Is periodic
Usually uses detailed, specific and descriptive feedback, in a formal or informal way | Usually uses numbers, scores or marks as part of a formal report
Usually focuses on improvement, compared with the student's own previous performance | Usually compares the student's learning either with other students' learning or against a standard

Source: adapted from Ruth Sutton, unpublished document, 2001, in Alberta Assessment Consortium materials.
c) Assessment as Learning
Assessment as learning means to use assessment to develop and support students’
metacognitive skills. This form of assessment is crucial in helping students become lifelong
learners. As students engage in peer and self-assessment, they learn to make sense of
information, relate it to prior knowledge and use it for new learning. Students develop a
sense of efficacy and critical thinking when they use teacher, peer and self-assessment
feedback to make adjustments, improvements and changes to what they understand.