All sample problems for Assignment 2 are online and can be found under COURSE MATERIALS in Blackboard.
The Assignment 1 generic feedback site is in place and can be accessed under COURSE MATERIALS in Blackboard.
EDF 5481 READINGS AND ASSIGNMENTS |
ASSIGNMENT 1 FEEDBACK |
OVERVIEW |
EDF 5481 METHODS OF EDUCATIONAL RESEARCH
FALL 2001
At last! We now focus on different ways of conducting studies and gathering data. Each technique has its own set of strengths and weaknesses. That is why it is advisable over the long run to conduct a series of studies, all with the same independent and dependent variable(s) but using a mix of experiments, ethnographies, surveys, content analysis, focus groups, and so forth.
As we saw in Guide 3, a major strength of true experiments is causal control and strong internal validity. Various threats to internal validity are described in more detail below. In an experiment you can literally build your own independent variables by:
(1) Creating "factors" or levels of some kind of treatment, then
(2) Randomly assigning subjects or groups to different levels of the treatment.
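The two steps above can be sketched in a few lines of Python. This is only an illustration: the function name `randomly_assign`, the subject labels, and the treatment levels are hypothetical, and the fixed seed is there only so the example is repeatable.

```python
import random

def randomly_assign(subjects, levels, seed=None):
    """Shuffle subjects, then deal them out to treatment levels in turn.

    Human judgment plays no role: the shuffle alone decides who gets what.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {level: [] for level in levels}
    for i, subject in enumerate(shuffled):
        groups[levels[i % len(levels)]].append(subject)
    return groups

# Hypothetical example: 12 subjects, two levels of one treatment factor.
subjects = [f"S{n:02d}" for n in range(1, 13)]
groups = randomly_assign(subjects, ["new method", "usual method"], seed=2001)
for level, members in groups.items():
    print(level, sorted(members))
```

The same function works if the "subjects" are intact groups (for example, class sections) rather than individuals.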
However, a true experiment is simply not always possible, yet investigators still want to make causal statements. Finally, even if you have conducted a true experiment, not all experiments have equally strong causal control. Reactivity, poor measurement, and the nature of the control groups can all influence the degree of internal validity in an experimental design.
WHAT MAKES FOR A QUASI-EXPERIMENT
What makes a true experiment is random assignment of people or groups to treatments. Human judgment plays no role in who gets which experimental condition. The strength of randomization is that it creates two or more groups that are equivalent in the beginning on the average on just about any characteristic you can imagine.
Of course, we are speaking of the long run and of reasonably sized samples. If you have two groups of five people each, I wouldn't count on them necessarily being very similar. However, with even as few as 10 people per group, you will begin to see the beauty of randomization.
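A small simulation can make the point about sample size concrete. This is a sketch under stated assumptions: the "ability" scores are hypothetical draws from a normal distribution (mean 100, SD 15), and `average_gap` is an illustrative helper, not a standard routine.

```python
import random
import statistics

def average_gap(group_size, trials=2000, seed=1):
    """Average absolute difference in group means after random assignment.

    Each trial draws 2 * group_size hypothetical scores, splits them at
    random into two groups, and records how far apart the group means are.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        scores = [rng.gauss(100, 15) for _ in range(2 * group_size)]
        rng.shuffle(scores)
        first, second = scores[:group_size], scores[group_size:]
        gaps.append(abs(statistics.mean(first) - statistics.mean(second)))
    return statistics.mean(gaps)

for n in (5, 10, 50):
    print(f"{n:>3} per group: typical gap between group means = {average_gap(n):.1f}")
```

The typical gap between the two randomly formed groups shrinks as the groups get larger, which is the sense in which randomization makes groups equivalent "on the average."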
But randomization just isn't always possible. Some treatment groups are initially formed on the basis of performance (high, medium, low, for example), and some variables (e.g., bipolar depressive disorder) simply cannot be experimentally induced.
If your study has different levels of treatments, and people or groups are assigned to those treatments WITHOUT random assignment, you have a quasi-experiment.
It's not just having intact groups that creates a quasi-experiment. Individuals who are not in intact groups could enter treatment levels through self-selection, because they are in a particular performance category (that bottom quartile in Intramural sports, for example), or because a researcher has "paired" individuals that she or he believes are somehow similar.
However, self-selection or regression toward the mean effects then become alternative explanations, instead of the treatment, for the results you found.
How might you find out about just how similar or different groups were at the beginning of a study?
Background information. You might have access to grades, test scores, "personality" or other standardized test results collected before the study ever began.
Some kind of pretest measures. These vary from requesting background or "demographic" information such as own or parental education, occupation, or income, to various standardized tests. Be careful, however! Remember that pretests can sensitize people that their behavior is under study and lead to pretest-treatment interaction biases.
Supplemental information from other people. Interviews with teachers, parents, physicians, therapists, or others who know the subjects of study well may provide supplemental information.
Here's the basic problem: even if we assign groups to treatments based on their differences, such as a high ability and low ability group, the groups may differ in other respects, on variables that we never measured at all. For example, the high ability group may be more motivated or confident, on the average, than the low ability group. And it is those differences in motivation or confidence, instead of the difference in ability (that YOU thought was the true independent variable) that were the true causes of the treatment outcome differences that you observed.
THREAT! Even if you are able to obtain background, pretest, or supplemental information, you may have never measured the true differences between your groups on other variables. And those true differences that you never measured were the real causes responsible for the outcome effects that you found in your study.
Now you can see why quasi-experimental designs pose threats to internal validity.
Be patient for a little bit! We will return to the issue of intact groups shortly. Meanwhile, remember that if people were initially assigned to intact groups in a random fashion, you may have a true experiment after all. If you want a review, see Guide 3.
TYPES OF QUASI-EXPERIMENTAL DESIGNS
Many of the types of quasi-experimental designs are very similar to true experimental designs except that randomization never takes place.
Just as we have in experiments, one group may be assigned a treatment. Then, following the treatment, we measure some type of observation or dependent variable for both the group that received a treatment and the group that did not. Here is one comparison of a quasi-experimental design with the corresponding "true" experimental design:
| Type of Study | Group | Randomization? | Treatment | Outcomes |
| Experiment | Group 1 | Yes | X1 | O1 |
| Experiment | Group 2 | Yes | X2 | O2 |
| Quasi-Experiment | Group 1 | No | X1 | O1 |
| Quasi-Experiment | Group 2 | No | X2 | O2 |
In the time series design, you have several observations over time. While you may have some type of experimental intervention, often "nature" does the experimenting for you.
Case studies occur with some frequency in both medical and therapeutic fields. Practitioners who work one on one (such as counselors) or with very small groups (special education classes) are the most likely to use case studies. Subjects are not random, the case base is small, and there may be no control group. As you can guess, causal inference is much more difficult.
What's the best you can do under such circumstances? Impose a time series of observations if possible. If the intervention is under your control (dispensing a new medication, for example), impose the intervention, remove it, impose it again, remove it, and so forth. Try to use a double blind (see below) administration if possible and the most objective outcome measures that you can find.
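The impose-remove-impose-remove strategy can be summarized phase by phase. The sketch below uses made-up daily symptom counts for a single client; the phase labels, the numbers, and both helper functions are hypothetical, and the check assumes the phases strictly alternate between "off" and "on."

```python
from statistics import mean

# Hypothetical daily symptom counts for one client.
# "off" = intervention withdrawn, "on" = intervention in place.
phases = [
    ("off", [9, 8, 9, 10]),
    ("on",  [6, 5, 5, 4]),
    ("off", [8, 9, 8, 9]),
    ("on",  [5, 4, 5, 4]),
]

def phase_means(phases):
    """Return (label, mean) for each phase, in the order the phases occurred."""
    return [(label, mean(values)) for label, values in phases]

def tracks_intervention(phases):
    """True if every 'on' phase averages lower than each neighboring 'off' phase.

    Assumes adjacent phases alternate between 'off' and 'on'.
    """
    means = phase_means(phases)
    for (label_a, mean_a), (_, mean_b) in zip(means, means[1:]):
        on_mean, off_mean = (mean_a, mean_b) if label_a == "on" else (mean_b, mean_a)
        if on_mean >= off_mean:
            return False
    return True

for label, value in phase_means(phases):
    print(f"{label:>3}: mean symptoms = {value}")
print("outcome tracks the intervention:", tracks_intervention(phases))
```

An outcome that rises and falls with each switch is harder to attribute to history or maturation than a single before-and-after change, although with one case and no randomization it still falls short of proof.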
ISSUES WITH USING INTACT GROUPS
The Wiersma book virtually defines quasi-experiments as those using intact groups, i.e., groups that existed prior to any treatment or intervention. Normally (say, 75 to 80 percent of the time) this is true. What matters is: (1) HOW subjects entered the groups in the first place; (2) What happens in the group; and (3) The length of time the groups existed prior to the intervention.
If subjects are randomly assigned to groups in the first place (which often happens in schools and universities for classes where there are many equivalent sections), the tasks to be performed are virtually identical in each group, and the pre-intervention time is short (probably a few weeks at most), THEN if you randomly assign groups to conditions, you probably have a true experiment.
Consider some of the alternatives. Subjects may be assigned to groups using pre-existing knowledge about the subjects, and the groups consequently differ on variables related to the study. Sports teams grouped by ability, "tracking" systems in schools, and enlisted versus officers in the military are three examples. Even if random assignment originally places subjects in groups, their curricula and itineraries may be different, thus providing subjects in different groups with different experiences. Finally, bosses and teachers differ in their approaches, again providing subjects in different groups with different experiences, which diverge further as time goes on.
So if you use, for example, randomly selected sections of basic college math, randomly assign them to treatments, AND do most of your data collection at the very beginning of the academic year, you probably have a true experimental design. Does your study meet all the criteria in this example? If not, your design is probably quasi-experimental.
Threats to internal validity are threats to causal control. They mean that we do not know for sure what caused the effects that we observed. Naturally, we like to hope that our interventions (experimental treatments) or other known and measured independent variables caused the effects. Unfortunately this is often not the case. For example, because of their multidimensionality, confounded variables (which measure more than one entity) are a threat to internal validity.
BIAS VERSUS RANDOM ERROR
If you have tight control over your experimental treatments (and, of course, you used randomization), hopefully the only source of variance left in your dependent variables will be random error.
Random error is just that: It is the random variation that occurs on measurements across administrations, situations, or time periods. If random error is VERY large, it can pose a threat to the reliability (predictability, stability) of our measurements. Many political attitudes, for example, are highly unstable or volatile.
On the other hand, because it is random, random error does not usually pose a threat to internal validity.
Bias is systematic error, such as the scale that always weighs you in at five pounds too light. Bias introduces a constant source of error into measurements or results. Bias can occur when test items that favor a particular ethnic, age, or gender group are used. For example, a "culture exam" that asked respondents to identify songs from the 1950s and the 1960s would discriminate against younger people. Tests of "science knowledge" often favor younger people because they use the most recent definitions of science phenomena and thus favor those with a more recent education. Bias in testing instruments is a threat to internal validity because it poses an alternative explanation for the results that we found.
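The bathroom scale example can be simulated to contrast random error with bias. This is a minimal sketch: the true weight, the size of the random error, and the function `weigh` are all assumptions made for illustration.

```python
import random
from statistics import mean

TRUE_WEIGHT = 150.0   # hypothetical true weight in pounds

def weigh(times, bias=0.0, noise_sd=2.0, seed=0):
    """Simulate repeated weighings: true weight + constant bias + random error."""
    rng = random.Random(seed)
    return [TRUE_WEIGHT + bias + rng.gauss(0, noise_sd) for _ in range(times)]

noisy_scale = weigh(1000)              # random error only
biased_scale = weigh(1000, bias=-5.0)  # always about five pounds too light

print(f"noisy scale average:  {mean(noisy_scale):.1f}")
print(f"biased scale average: {mean(biased_scale):.1f}")
```

Averaging many readings washes out the random error, so the noisy scale lands close to the true weight. No amount of averaging removes the bias: the second scale stays about five pounds too light.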
If we could control bias experimentally, or measure the variables we suspect cause bias and thus control them statistically, we would at least maximize internal validity. Random assignment controls much bias by making experimental treatment groups roughly equivalent at the beginning of a study, thus controlling factors such as self-selection or regression toward the mean effects.
Unfortunately, bias is often hidden, either in the variables you didn't measure or in the variables you didn't consider at all. You then discover your mistake only after all your data are collected. Confounded variables are a major threat to internal validity.
Self-selection effects: When subjects can select their own treatments, we do not know whether the intervention or a pre-existing characteristic of the subject caused the outcomes we observed. Random assignment can cure this problem. The same problem can occur with differential selection, only in this case the investigator (rather than the subject) uses human judgment to assign groups or subjects to treatments. A common variation on this one is selecting extreme groups (see below).
Experimental mortality. When subjects discontinue the study, and this occurs more in certain conditions than in others, we do not know how to causally interpret the results because we don't know how subjects who discontinued participation differed from those who completed it. A pretest questionnaire given to all subjects may help clarify this, but watch out for pretesting effects (a Solomon four group design can help here).
History: Some kind of event occurred during the study period (such as the assaults on New York City), and it is reactions to this event that caused the outcomes we observed. Sometimes this is a medical event (such as a flu outbreak) and sometimes an actual political or historical event. Random assignment and a control group help with this problem.
Maturation effects are especially important with children and youth (such as college freshmen) but could happen at any age. For example, young children's speech will normally become more complex, no matter what reading method you use. Some studies have found that most college students pull out of a depression within six months, even if they receive no treatment whatsoever. A certain number of people will stop smoking, whether they receive treatment or not. Again, a randomized control group helps.
Regression toward the mean effects ("statistical regression") are especially likely when you study extreme groups. For example, students scoring at the bottom of a test typically improve their scores at least a little when they retake the test. Students with nearly perfect scores might miss an item the second time around. That is, people with extreme scores, or in extreme groups, will often fall back toward the average or "regress to the mean" on a second administration of the dependent variable.
Regression toward the mean effects are especially likely to trip up well-meaning investigators who want to give a treatment that they believe is very beneficial to the group that appears to need it the most (the top scoring group is usually left alone). When the scores of the worst group improve after the intervention (and the top group scores a little lower on the readministration), misguided investigators are even more convinced that they have found a good treatment (instead of a methodological artifact). How to avoid this threat to internal validity? Either avoid extreme groups or, if you do use them, randomly assign their members to treatment conditions, INCLUDING A CONTROL GROUP.
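Regression toward the mean is easy to reproduce with simulated test scores and NO treatment at all. The sketch below assumes a simple hypothetical model in which each observed score is a stable ability plus luck on the day of the test; all numbers are invented for illustration.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical model: each observed score = stable ability + luck on the day.
abilities = [rng.gauss(70, 10) for _ in range(1000)]
test1 = [a + rng.gauss(0, 8) for a in abilities]
test2 = [a + rng.gauss(0, 8) for a in abilities]   # retest, NO treatment given

# Pick the extreme groups using the first test only.
ranked = sorted(range(len(test1)), key=lambda i: test1[i])
bottom, top = ranked[:100], ranked[-100:]

def change(group):
    """Mean retest score minus mean first-test score for the given students."""
    return mean(test2[i] for i in group) - mean(test1[i] for i in group)

print(f"bottom 10% changed by {change(bottom):+.1f} points")
print(f"top 10% changed by {change(top):+.1f} points")
```

The bottom group "improves" and the top group "declines" even though nobody received an intervention. An investigator who treated only the bottom group, without a randomized control group drawn from that same bottom group, could easily mistake this artifact for a treatment effect.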
Testing. Just taking a pretest can sensitize people, and many people improve their performance with practice. Almost every classroom teacher knows that part of a student's performance on assessment tests depends on their familiarity with the format. Solution? A Solomon Four Group Design, wherein half the subjects do not receive a pretest, is a good way to control inferences in this case.
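The layout of a Solomon four group design can be sketched as a random assignment to four cells, crossing "pretest or not" with "treatment or not." The function name `solomon_assign` and the 40 subject IDs are hypothetical.

```python
import random

# The four cells of a Solomon four group design:
# (pretest given?, treatment given?)
CELLS = [
    (True, True),    # pretest + treatment + posttest
    (True, False),   # pretest + posttest only
    (False, True),   # treatment + posttest only
    (False, False),  # posttest only
]

def solomon_assign(subjects, seed=None):
    """Randomly deal subjects into the four cells, as evenly as possible."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    design = {cell: [] for cell in CELLS}
    for i, subject in enumerate(shuffled):
        design[CELLS[i % 4]].append(subject)
    return design

design = solomon_assign(range(1, 41), seed=7)   # 40 hypothetical subject IDs
for (pretest, treatment), members in design.items():
    print(f"pretest={pretest!s:<5} treatment={treatment!s:<5} n={len(members)}")
```

Comparing posttest results for the pretested and unpretested cells shows whether the pretest itself changed the outcome, or changed how people responded to the treatment.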
While a true experiment can be higher on internal validity, by no means do all experiments have high internal validity. To enhance internal validity, the investigator must use control groups effectively, control reactivity, and scrutinize experimental reality. Further, you need to know if people noticed and comprehended your treatment or intervention in the first place.
EFFECTIVE USE OF CONTROL GROUPS
When a new pharmaceutical drug is tested, typically all experimental subjects receive a pill.
Some receive the new active ingredient, such as a brand new antihistamine or antibiotic.
Some receive an older medicine, such as Tavist (clemastine) or penicillin.
Yet others receive a placebo: an inert "sugar pill" that has no active ingredients.
These control or comparison groups are an absolute necessity in any design, and especially in an experiment.
The group receiving the older medication lets us know if the new drug (or intervention) is less effective, as effective, or more effective than treatments currently available.
The group receiving the "sugar pill" alerts us to changes that occur with the two active ingredient medication groups above and beyond a placebo effect. In a placebo effect, changes that occur are due to other factors besides the active treatment. For example, a patient might feel "safe" and "treated" if their doctor gives them a pill, even a sugar pill. Because of these psychological changes, their immune system might actually function better. This is very interesting but not what you set out to assess. So, anything that smacks of a placebo effect is a threat to internal validity and must be controlled for.
Notice that the "control group" GETS A PILL. The "nothing at all" control group is generally a very poor design. For example, if you were studying the effects of watching a violent film on the imitation of aggression among school children, the very act of watching any movie can be physiologically arousing. If your control group watched no movie at all, then you could not control for these effects. So, instead, your control group watches a generally unaggressive movie such as "The Adventures of Milo and Otis" (a young dog and young cat who are friends).
THE MORAL: Design your control group carefully. See that your control group has some features in common with your treatment groups if those features could affect your outcomes (it takes a pill, sees a film, or fills out a pretest questionnaire).
REACTIVITY AND THREATS TO INTERNAL VALIDITY
Reactivity refers to changes in the subjects' behavior simply because they are being studied.
For example, some people get nervous when a doctor or nurse takes their blood pressure, and their blood pressure goes up.
Reactivity poses a distinct threat to internal validity because we don't know what caused the outcome: treatment effects or reactivity. The experimental laboratory is probably the most reactive because people have come for an experiment and they know their behavior is being watched. That is why so many experimenters use deception. They are trying to divert subject attention so that the "true behavior under study" is not altered.
Demand effects are one form of reactivity: subjects or respondents "follow orders" or cooperate in ways that they almost never would in their routine daily lives.
ON REACTIVITY AND INTERNAL VALIDITY. If demand effects are specific to a particular situation, reactivity problems may also influence generalizing, or external validity (this is how your Wiersma book treats the term).
However, I think reactivity introduces an alternative causal explanation for our results: they occurred, not because of the intervention or treatment, but because people were so self-conscious that they changed their behavior. This is internal validity. Reactivity may also statistically interact with the experimental manipulation. For example, if the treatment somehow impacts on self-esteem (say you are told that the stories you tell to the TAT pictures indicate your leadership ability), reactivity may be a greater internal validity problem.
MORE ON GENERALIZING: "EXPERIMENTAL" VERSUS "MUNDANE" REALITY
More of a threat to external validity is the issue of the reality of the study setting. In many cases, such as studies of classrooms or online environments, the setting of the study is identical to the "everyday reality" or mundane reality in which most subjects live their lives. High mundane reality makes it easier to generalize to people's typical settings and it facilitates external validity. Field studies of all kinds, and ethnographies, too, take place in typical, as opposed to unusual, settings.
However, laboratory experiments in particular may use unusual settings or tasks. For example, some sports experiments will have subjects on a treadmill for hours. In other studies, subjects may be injected with substances (such as adrenaline) or take pills. Subjects may see specially constructed movies that are nothing like what they see on TV. Or they may be called upon to perform tasks (watching a light "move" in a darkened room) that bear no resemblance to their normal environment.
While these settings or tasks may be engrossing or compelling, and thus high in experimental reality, they do not resemble the settings to which researchers may really want to generalize.
DID ANYBODY NOTICE? I HOPE YOU USED A MANIPULATION CHECK.
YOU are certain that your intervention will make life healthier or enhance learning. But what if no one pays attention to the treatment or comprehends its message? Then it will appear that you have no effects at all, whereas if you had simply used a stronger manipulation, your hypothesis might have been confirmed.
Anyone doing experimental work needs a manipulation check: a measure of whether subjects even paid attention to factors in the treatment and understood their messages. For example, if you show different movies to different groups and your topic is filmed aggression, have a short questionnaire that has subjects rate the violence of the movie. The group receiving the more aggressive film should rate it as more violent than those receiving an unaggressive movie. If you are trying a new reading technique, make sure that students understand the stories they are exposed to and remember something about them. If you try a new template in your online learning course, did students even pay attention?
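For the film example, a manipulation check can be as simple as comparing the two groups' violence ratings. The ratings below are invented, and both `manipulation_check` and its `min_gap` threshold are illustrative assumptions rather than a standard procedure.

```python
from statistics import mean

# Hypothetical ratings: "How violent was the film?" (1 = not at all, 7 = extremely)
ratings = {
    "aggressive film":   [6, 7, 5, 6, 6, 7, 5, 6],
    "unaggressive film": [2, 1, 2, 3, 1, 2, 2, 1],
}

def manipulation_check(treated, control, min_gap=1.0):
    """Report whether the treated group rated the film as clearly more violent.

    min_gap is an arbitrary illustrative threshold, not a standard cutoff.
    """
    gap = mean(treated) - mean(control)
    return gap, gap >= min_gap

gap, noticed = manipulation_check(
    ratings["aggressive film"], ratings["unaggressive film"]
)
print(f"difference in mean violence rating: {gap:.2f}")
print("manipulation noticed:", noticed)
```

In a real study you would normally back up the comparison with a significance test. The point here is only that the check is collected and examined before interpreting the main results.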
THE HUMAN FACTOR: USING DOUBLE BLIND
When the medical and pharmacy professions test a new medicine, they don't just use a "sugar pill" placebo.
Subjects in the study do not know if they are taking a new medication, an old medication, or a sugar pill.
The individuals who pass out the medication and assess the subjects' health and behavior also do not know whether the person is taking a new medication, an old medication, or a sugar pill.
Thus both those involved as subjects and those involved with collecting data are "blind": blind to the purposes of the study, the condition that subjects are in, and the results expected.
This means that
Susan Carol Losh
September 21, 2001
A last glimpse of the World Trade Center. September 11, 2001