BULLETIN: ASSIGNMENT TWO NOW DUE MONDAY SEPTEMBER 24 BY NOON. MY MAILBOX 307 STONE

EDF 5481 READINGS AND ASSIGNMENTS
ASSIGNMENT TWO

OVERVIEW
GUIDE 1: INTRODUCTION
GUIDE 2: VARIABLES AND HYPOTHESES
GUIDE 4: EXPERIMENTS & QUASI-EXPERIMENTS
GUIDE 5: A SURVEY RESEARCH PRIMER
GUIDE 6: FOCUS GROUP BASICS
GUIDE 7: LESS STRUCTURED METHODS INTRODUCTION
GUIDE 8: ARCHIVES AND DATABASES

EDF 5481 METHODS OF EDUCATIONAL RESEARCH
FALL 2001

GUIDE 3: RELIABILITY, VALIDITY, CAUSALITY, AND EXPERIMENTS I

 
RELIABILITY & VALIDITY
ISSUES IN CAUSALITY
RULES FOR CAUSE & EFFECT
EXPERIMENTS PRIMER

At this point, you are fairly itching to begin your design. But we still have important conceptual material to cover. After all, you want your measures to be reliable and valid, your statements about causality to be appropriate, and your findings to be generalizable.

RELIABILITY AND VALIDITY

RELIABILITY

In order to make causal assessments in your research situation, you must first have reliable measures, i.e., stable and/or repeatable measures. If the random error variation in your measurements is so large that there is almost no stability in your measures, you can't explain anything! Picture an intelligence test where an individual's scores ranged from moronic to genius level. No one would place any faith in the results of such a "test" because the person's scores were so unstable or unreliable.

Reliability is required to make statements about validity. However, reliable measures could be biased (and hence "untrue" measures of a phenomenon) or confounded with other factors such as acquiescence response set. Picture a scale that always weighs five pounds too light. The results are reliable, but inaccurate or biased. Or, picture an intelligence test on which women or people of color always score lower (even if this doesn't occur on other tests). Again, the measure may be reliable but biased.

Note that some estimates of reliability are based on the number of items in the test or scale (Cronbach's Alpha is one example). Thus, we might have a long measure, with a lot of items, that will appear "reliable," yet when we examine the measure closely, we discover that the correlations among items are low. This means that items in that measure don't seem to "hang together" or relate well to each other and your measure may be multidimensional. While this is a "judgement call," be advised that it is desirable for "reliable measures" to also be unidimensional measures, i.e., to measure one and only one construct. It is much easier to interpret unidimensional measures.
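To see how test length alone can inflate a reliability estimate, here is a minimal sketch. It assumes the standardized form of Cronbach's Alpha, computed from the number of items and the mean inter-item correlation. The function name and the example values are hypothetical illustrations, not part of the course materials.

```python
def standardized_alpha(n_items: int, mean_inter_item_r: float) -> float:
    """Standardized Cronbach's Alpha from item count and mean inter-item correlation."""
    k, r = n_items, mean_inter_item_r
    return (k * r) / (1 + (k - 1) * r)


# Hypothetical scales whose items correlate only weakly (mean r = 0.10).
for k in (5, 10, 20, 40):
    print(f"{k:>2} items: alpha = {standardized_alpha(k, 0.10):.2f}")
```

With the same weak inter-item correlation of 0.10, alpha climbs from about 0.36 for 5 items to about 0.82 for 40 items. The longer scale looks "reliable" even though its items still do not hang together well.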

INTERNAL VALIDITY

Internal validity addresses the "true" causes of the outcomes that you observed in your study. Strong internal validity means that you not only have reliable measures of your independent and dependent variables but also a strong justification that causally links your independent variables to your dependent variables. At the same time, you are able to rule out extraneous variables, or alternative, often unanticipated, causes for your dependent variables. Thus strong internal validity refers to the unambiguous assignment of causes to effects. Internal validity is about causal control.

Laboratory "true experiments" have the potential to make very strong causal control statements. Random assignment of subjects to treatment groups (see below) rules out many threats to internal validity. Further, the lab is a controlled setting, very often the experimenter's "stage." If the researcher is careful, nothing will be in the laboratory setting that the researcher did not place there. When we leave the lab to do studies in natural settings, we can still do random assignment of subjects to treatments, but we lose control over potential causal variables in the study setting (dogs bark, telephones ring, the experimental confederate just got run over walking against the "don't walk" sign on Woodward.)

EXTERNAL VALIDITY

External validity addresses the ability to generalize your study to other people and other situations. To have strong external validity (ideally), you need a probability sample of subjects or respondents drawn using "chance methods" from a clearly defined population (all registered students at Florida State University, for example). Ideally, you will have a good sample of groups (e.g., classes at all ability levels). You will have a sample of measurements  and situations (you study who follows a confederate who violates the "don't walk" sign on Woodward Avenue at different times of day, different days, and different locations on campus.)  When you have strong external validity, you can generalize to other people and situations with confidence. Public opinion surveys typically place considerable emphasis on defining the population of interest and drawing good samples from that population. On the other hand, laboratory experiments often employ "convenience samples," such as intact college classes taught by a friend. As a result, we may not know whom the subjects represent.

CONSTRUCT VALIDITY

Construct validity is about the correspondence between your concepts (constructs) and the actual measurements that you use. A measure with high construct validity is an accurate reflection of the abstract concept that you are trying to study. Since we can only know about our concepts through the concrete measures that we use, you can see that construct validity is extremely important. It also becomes clear why it is so important to have clear conceptual definitions of our variables. Only then can we begin to assess whether our measures, in fact, correspond to these concepts.

If we only use one measure of a concept, about the best we can do is "face validity," i.e., the measure appears "on the face of it" to reflect the concept. Therefore, it is wise to use multiple measures of a concept whenever possible. Further, ideally these will be different kinds of measures and designs.

For example, you might measure mathematical skill through a paper and pencil test, through having the student work with more geometric problems, such as a wood puzzle, and having the student make change at a cash register. Our faith that we have accurately measured her math ability is stronger if she performs well on all three sets of tasks.

Construct validity is often established through the use of a multi-trait, multi-method matrix. At least two constructs are measured. Each construct is measured at least two different ways, and the type of measures is repeated across constructs. For example, each construct first might be measured using a questionnaire, then each construct would be measured using a similar set of behavioral observation categories.

Typically, under conditions of high construct validity, correlations are high for the same construct (or "trait") across a host of different measures. Correlations are low across constructs that are different but measured using the same general technique. Sometimes, this is called "triangulating" measures.

Under low construct validity, the reverse holds. Correlations are high across traits using the same "method" (or type of technique or measurement) but low for the same trait measured in different ways. For example, if our estimate of a student's math ability was wildly divergent depending on whether we examined scores on the questionnaire, making change, or the wood puzzle, we would have low construct validity.
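The pattern expected under high construct validity can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two traits, the two methods, the weights (0.3), and the sample size are all hypothetical, and the data are randomly generated rather than taken from any study.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)
n = 2000

# Two hypothetical traits and two hypothetical method effects, all independent.
math_skill = [rng.gauss(0, 1) for _ in range(n)]
verbal_skill = [rng.gauss(0, 1) for _ in range(n)]
questionnaire_effect = [rng.gauss(0, 1) for _ in range(n)]
observation_effect = [rng.gauss(0, 1) for _ in range(n)]


def measure(trait, method):
    """Observed score = trait + a small method effect + a small random error."""
    return [t + 0.3 * m + 0.3 * rng.gauss(0, 1) for t, m in zip(trait, method)]


math_by_questionnaire = measure(math_skill, questionnaire_effect)
math_by_observation = measure(math_skill, observation_effect)
verbal_by_questionnaire = measure(verbal_skill, questionnaire_effect)

# Same trait, different methods: should be high under good construct validity.
same_trait = pearson_r(math_by_questionnaire, math_by_observation)
# Different traits, same method: should be low under good construct validity.
same_method = pearson_r(math_by_questionnaire, verbal_by_questionnaire)

print(f"same trait, different methods: r = {same_trait:.2f}")
print(f"different traits, same method: r = {same_method:.2f}")
```

Because the trait dominates each simulated score, the same-trait correlation comes out high and the same-method correlation comes out low. Raising the method weight well above the trait weight would reverse the pattern, which is the low construct validity case described above.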

One implication of all this material is that, of course, we NEVER, NEVER say "intelligence is what this intelligence test measures."
 
 
 

ON PROOF AND CAUSALITY

There are many ways of knowing, and different cultures and subcultures use different expectations and norms about proof and causality. Causality is critical: it tells us what is possible, what can be changed and what is difficult, if not impossible, to change. For example, if you are convinced that biological factors cannot be overcome, you probably will not work with visually impaired children because you would believe they could not compensate for their disabilities. Causality tells us what are the “prime movers” of the phenomena that we observe.

Consider some different perspectives on causality:

Of course, neither these perspectives nor the "means of proof" below are mutually inconsistent in the human cognitive process. Just as a physicist may secretly read his horoscope each morning, people may simultaneously invoke some, all, or none of these perspectives.

Here are some different ways and means of "proof":

Why bother with these different orientations? Because causality is critical to the research enterprise!

Much of the research process centers around what are the true causal or “independent variables.” What we initially may consider to be “true causal” variables may, instead, turn out to be artifacts of the research process (e.g., questionnaire format response set or experimental reactivity) or the particular group that we studied. Much of science consists of ruling out alternative causes or explanations. While science is one form of knowing and one generic way of gathering evidence that either disconfirms or is suggestive of causality, it is not the only way of doing so. The results of science may or may not be accurate, but without following "the rules" of science, most scientists do not believe one is "doing science." Considerable disagreement occurs between scientists and members of the general public because scientists don't make it clear how our methods of  "proof" differ from those commonly used among the general public (e.g., legal arguments).

According to science rules, definitive proof via empirical testing does not exist. Science uses the term "proof" (or, rather, "disproof") differently from the way attorneys or journalists do. Our measurements could be later shown to be contaminated by confounding factors. A correlation could have many causes, only some of which have been identified. Later work can show earlier causes to be spurious, that is, both cause and effect depend on some prior causal (often extraneous) variable. Statistics are NEVER EVER considered to "prove" anything although statistical results CAN disconfirm.

We use the rules of science in this course.
 

CAUSALITY AND METHODS: EXPERIMENTS AND CORRELATIONAL STUDIES
[Photograph: Cancerous Human Lung]
This dissection of human lung tissue shows light-colored cancerous tissue in the center of the photograph. While normal lung tissue is light pink in color, the tissue surrounding the cancer is black and airless, the result of a tarlike residue left by cigarette smoke. Lung cancer accounts for the largest percentage of cancer deaths in the United States, and cigarette smoking is directly responsible for the majority of these cases. 

"Cancerous Human Lung," Microsoft(R) Encarta(R) 96 Encyclopedia. (c) 1993-1995 Microsoft Corporation. All rights reserved.

Most people--and most scientists--accept that smoking cigarettes causes lung cancer although the evidence (for humans) is strictly correlational rather than experimental. There are many topics where it is neither possible--nor desirable--to use the experimental method. To judge when correlational evidence can support a causal conclusion, it will help to examine the rules below. (SCL)

Many scientists believe that the ONLY way to establish causality is through randomized experiments. That is why so many methods textbooks designate experiments--and only experiments--as “quantitative research.” However, a moment’s reflection will convince you that this cannot be so. Most people now accept that smoking cigarettes causes lung cancer (see the Encarta selection above)--yet no society has ever randomly assigned half its population to smoke cigarettes and the other half not. This causal conclusion about smoking and lung cancer is based on correlational evidence, i.e., observing the systematic covariation of two (or more) variables. Cigarette smoking and lung cancer are both "naturalistic" variables, i.e., we must accept the data as nature gave them to us (some authors call these "organismic" variables, for "organic").

There is no doubt that the results from careful, well-controlled experiments are typically easier to interpret in causal terms than results from other methods. However, as you can see, causal inferences are often drawn from correlational studies as well. Non-experimental methods must use a variety of ways to establish causality and ultimately must use statistical control, rather than experimental control.



SOME RULES TO HELP ESTABLISH CAUSAL ORDER

If one variable causes a second variable, they should correlate; thus, causation implies correlation. However, two variables can be associated without having a causal relationship, for example, because a third variable is the true cause of both the "original" independent and dependent variables. For example, there is a statistical correlation over months of the year between ice cream consumption and the number of assaults. Does this mean ice cream manufacturers are responsible for crime? No! The correlation occurs statistically because the hot temperatures of summer cause both ice cream consumption and assaults to increase. Thus, correlation does NOT imply causation. Other factors besides cause and effect can create an observed correlation.
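The ice cream example can be sketched as a simulation in which temperature drives both variables. The data here are randomly generated in arbitrary standardized units, and the variable names and sample size are assumptions for illustration only. The sketch also computes a first-order partial correlation, one common form of statistical control.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y after statistically controlling for z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(1)

# Hypothetical data: temperature causes both outcomes; neither causes the other.
temperature = [rng.gauss(0, 1) for _ in range(5000)]
ice_cream = [t + rng.gauss(0, 1) for t in temperature]
assaults = [t + rng.gauss(0, 1) for t in temperature]

raw = pearson_r(ice_cream, assaults)
controlled = partial_r(ice_cream, assaults, temperature)

print(f"ice cream vs. assaults:          r = {raw:.2f}")
print(f"...controlling for temperature:  r = {controlled:.2f}")
```

The raw correlation is clearly positive, yet it shrinks to roughly zero once temperature is held constant. That is the signature of a spurious relationship produced by a common prior cause.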

If one variable causes a second, the cause is the independent variable (also called an explanatory variable or predictor).
The effect is the dependent variable (also called a criterion variable).

If you can designate a distinct cause and effect, the relationship is called asymmetric.
Two variables may be associated but we may be unable to designate cause and effect. These are symmetric relationships.


WHICH VARIABLE IS THE INDEPENDENT VARIABLE IN CORRELATIONAL STUDIES?
RULES AND GUIDANCE

Since we cannot apply experimental treatments to naturalistic variables to determine cause and effect, yet scientists do draw causal conclusions in nonexperimental studies, here is a set of helpful rules for tentatively establishing causality in correlational data.

By the way, there are always alternative causal explanations in experiments too. The study control group may be flawed. Subjects' awareness of being studied may create conditions (e.g., anxiety) that mean we do not measure "true" behavior or performance. So even though it may be easier to establish cause in experiments, keep in mind that nothing is fool-proof.

(1) TIME ORDER. The independent variable came first in time, prior to the second variable.

EXAMPLE: Gender or race are fixed at birth.

(2) EASE OF CHANGE. The independent variable is harder to change. The dependent variable is easier to change.

EXAMPLE: One's gender is harder to change than scores on an assessment test or years of school.

(3) "MAJORITY RULE." The independent variable is the cause for most people.

EXAMPLE: Although some people become so fed up with their jobs that they return to school to train for a better job, most people complete their education prior to obtaining a regular year-round, full-time job.

(4) NECESSARY OR SUFFICIENT. If one variable is a necessary or sufficient condition for the other variable to occur, or a prerequisite for the second variable, then the first variable is the cause or independent variable.

EXAMPLES: A certain type of college degree is often required for certain jobs. At most universities, publications are a prerequisite for being awarded tenure.

(5)  GENERAL TO SPECIFIC. If two variables are on the same overall topic and one variable is quite general and the other is more specific, the general variable is usually the cause.

EXAMPLE: Overall ethnic intolerance influences attitudes toward Hispanics.

(6) THE "GIGGLE" OR "SANITY" FACTOR. If reversing the causal order of the two variables seems illogical and makes you laugh, reverse the causal order back.

EXAMPLES: We don't  believe choosing a specific college major or engaging in a particular sport determines one's gender.

A PRIMER ON EXPERIMENTS

Dedicated to health and fitness, you devised a new exercise plan that you believe will really help people. So you obtain a sample of Educational Psychology undergraduate students. With the flip of a coin, half the students receive a physical and mental health screening and those who are fit begin this new exercise program. The other half also receive a health screening but no exercise regimen. Six weeks later, you re-examine everyone who was physically fit in the screening and compare the two groups. The group receiving the exercise plan now scores as happier and healthier than the group that did not.

Jubilant over the results, you assert that your new exercise plan contributes to physical and mental fitness!

Or does it? Are your results internally valid?

Could be.

This study was a "true experiment." In a true experiment--whether laboratory, field, or simulation--subjects are randomly assigned to treatment groups using a coin flip or some other probability-based method that does not depend on human judgment. It is randomization that makes true experiments so strong in internal validity and typically allows us to make relatively strong inferences about causality. It is also random assignment to treatments that distinguishes a true experiment from other kinds of data collection.

Random assignment means that on the average at the beginning of a study, all your treatment groups are about the same.  In your physical fitness study, it meant about the same percent of each group "flunked" the screening test and about the same percent exercised on a regular basis, even before your intervention.
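A minimal sketch of random assignment follows, assuming a hypothetical pool of 10,000 subjects of whom 30 percent already exercise regularly. The pool size, the 30 percent figure, and all names are made up for illustration. Shuffling the pool and splitting it in half stands in for the coin flip.

```python
import random

rng = random.Random(42)
n_subjects = 10_000

# Hypothetical subject pool: 3 of every 10 subjects already exercise regularly.
subjects = [{"id": i, "exercises_regularly": i % 10 < 3} for i in range(n_subjects)]

# Random assignment: chance, not human judgment, decides who lands in which group.
shuffled = subjects[:]
rng.shuffle(shuffled)
half = n_subjects // 2
exercise_group, control_group = shuffled[:half], shuffled[half:]


def share_exercising(group):
    """Proportion of a group that exercised regularly before the study began."""
    return sum(s["exercises_regularly"] for s in group) / len(group)


print(f"exercise group: {share_exercising(exercise_group):.3f}")
print(f"control group:  {share_exercising(control_group):.3f}")
```

The two proportions should land close to each other and close to 0.30, even though nobody matched the groups on purpose. With much smaller samples the groups can differ more by chance, which is one reason a pretest is a useful double-check.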

This study had another important research design aspect: a control group who did not receive the special exercise program. Control or comparison groups are critical in all kinds of research. If we did not have a control or comparison group, the study would be open to the criticism--and alternative causal explanation--that improvement in health would have occurred in any event among young adults, even had the exercise program never been instituted. Not only did you have a control group, but, in an experiment, participants are randomly assigned to it.

Studies that lack a control group are sometimes called "one shot" studies or sometimes case studies. While the results may be interesting, we are limited in the causal implications we can make from the results of "one shot" research.

We will later examine facets of the "good" control group.

You are pretty sure that you know what improved the health of your experimental subjects: the new exercise program you initiated. And there is a good chance that you are right, because by using random assignment you controlled for several pre-existing conditions or threats to internal validity: subjects' general physical health, previous exercise patterns, incidence of depression or their general personal histories which, on the average, would be the same for each group. By using random assignment, you also controlled for any incidental historical conditions (such as an influenza outbreak which could influence health in both groups).

Your study has two other important features: a pretest and a posttest. In the pretest, you measured existing conditions on your dependent variables, i.e., mental and physical health among all your subjects, whether in the experimental or control group, prior to any intervention at all. This enables you to double-check that your subjects are pretty much alike across groups at the beginning of the study. Then, after your intervention, you reassessed scores on your dependent variables in a posttest. Because you have both pretest and posttest information, you can also assess the level of change. A posttest-only design allows neither the check on initial group equivalence nor the assessment of change.
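Both uses of the pretest can be sketched with a few made-up numbers. The health scores below are hypothetical (higher means healthier) and the group names are illustrative. The sketch checks initial equivalence and then compares average change across groups.

```python
from statistics import mean

# Hypothetical health scores (higher = healthier); not data from any real study.
exercise = {"pretest": [50, 52, 48, 51], "posttest": [58, 60, 55, 59]}
control = {"pretest": [49, 51, 50, 52], "posttest": [51, 52, 51, 54]}


def mean_gain(group):
    """Average posttest-minus-pretest change for one group."""
    return mean(post - pre for pre, post in zip(group["pretest"], group["posttest"]))


# Use 1 of the pretest: were the groups about the same before the intervention?
pretest_gap = mean(exercise["pretest"]) - mean(control["pretest"])

# Use 2 of the pretest: how much did each group change, and how do the changes compare?
difference_in_gains = mean_gain(exercise) - mean_gain(control)

print(f"pretest gap (exercise - control): {pretest_gap:+.2f}")
print(f"mean gain, exercise group:        {mean_gain(exercise):+.2f}")
print(f"mean gain, control group:         {mean_gain(control):+.2f}")
print(f"difference in gains:              {difference_in_gains:+.2f}")
```

In these made-up data the groups start only 0.25 points apart, the exercise group gains 7.75 points, and the control group gains 1.50 points, for a difference in gains of 6.25 points. The control group's gain estimates the change that would have occurred without the program.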

This is often called a "pretest-posttest" experimental design.

You should be advised, however, that the standard pretest-posttest design may pose some threats to internal validity, or the unambiguous assignment of cause and effect. Why? Because simply being measured or observed during the pretest may sensitize some subjects and they will behave differently as a result. (For example, being weighed might have sent all subjects to the exercise room for six weeks!) Further, a pretest may interact with an experimental treatment to heighten the effect of the experimental intervention more than it would have ordinarily.

How can you cope with this dilemma? One way is the Solomon Four Group Design, considered one of the strongest experimental designs with respect to internal validity. In the Solomon Four Group Design, there are four randomized groups of subjects. One group receives a pretest, the experimental treatment and a posttest. The second group is identical, except it does not receive a pretest. The third group receives a pretest and posttest but a different treatment (this could be a group that receives no treatment at all, for example). The final group receives only a posttest and the second treatment (such as no treatment). Below is a diagram of the Solomon Four Group Design:
 
 

GROUP          PRETEST    TREATMENT      POSTTEST
GROUP ONE      Yes        Treatment 1    Yes
GROUP TWO      No         Treatment 1    Yes
GROUP THREE    Yes        Treatment 2    Yes
GROUP FOUR     No         Treatment 2    Yes

Solomon Four Group Designs are more expensive because they require more subjects and conditions than other types of experimental treatments. But, many researchers believe the advantages are worth the expense.
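The four-group layout can be sketched as a random assignment routine. This is an illustrative sketch only: the function name, the seed, and the pool of 40 subject numbers are hypothetical, and "Treatment 2" may simply mean no treatment, as described above.

```python
import random

# The four Solomon conditions; every group receives the posttest.
SOLOMON_GROUPS = {
    "GROUP ONE": {"pretest": True, "treatment": "Treatment 1"},
    "GROUP TWO": {"pretest": False, "treatment": "Treatment 1"},
    "GROUP THREE": {"pretest": True, "treatment": "Treatment 2"},
    "GROUP FOUR": {"pretest": False, "treatment": "Treatment 2"},
}


def assign_solomon(subject_ids, seed=None):
    """Shuffle the subjects, then deal them into the four groups in turn."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    names = list(SOLOMON_GROUPS)
    assignment = {name: [] for name in names}
    for position, subject in enumerate(shuffled):
        assignment[names[position % len(names)]].append(subject)
    return assignment


assignment = assign_solomon(range(40), seed=7)
for name, members in assignment.items():
    plan = SOLOMON_GROUPS[name]
    pretest = "pretest, " if plan["pretest"] else ""
    print(f"{name}: {len(members)} subjects -> {pretest}{plan['treatment']}, posttest")
```

Comparing GROUP ONE with GROUP TWO, and GROUP THREE with GROUP FOUR, shows whether taking the pretest by itself changed posttest scores. This is also why the design needs roughly twice the subjects of a simple pretest-posttest experiment.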

We will revisit experiments, and compare them with "quasi experiments", in Guide 4.

ON EXPERIMENTAL DESIGNS WITH INTACT GROUPS

The Wiersma book seems to imply that "intact groups" cannot be part of a "true experiment." This is not necessarily true, so assess each situation carefully to see if a true experiment is possible. Suppose you are studying fourth grade classes. The major way the school divides its fourth grade students into classes is through a systematic alphabetical list. If there are five fourth grade classes, the first student on the list goes to Class 1, the second to Class 2, and so on, so that each class receives every fifth student. In other words, there is no reason at this particular school to believe any of the fourth grade classes is distinctive at the very beginning of the school year. If you randomly assign classes to different experimental treatments in this example, you will indeed have a "true experiment." The key is that the intact groups were pretty much assembled using random means in the first place.

On the other hand, suppose there was a systematic difference among groups before you applied any kind of intervention, such as Honors classes versus regular classes in school. In such a case, even random assignment of intact groups could not produce a true experimental design. The problem is particularly great if a difference between groups relates to a variable you want to study. For example, Honors math students may react differently to a new way of teaching algebra than students in regular classes.
 

Do extra-terrestrial aliens exist? Consider how you could construct a valid measure.

Susan Carol Losh September 11 2001
 
