Analysis of (ABCDE) Potential
Confounding Linguistic Variables in Achievement Testing Using a Venn diagram of
the Bio-Psycho-Social-Cultural-Spiritual Model
05/24/2012
PSY 470
Scott Gerhardt
Abstract
This paper applies the biopsychosocioculturalspiritual
model in an etiological investigation of the confound of
linguistic bias in achievement testing as a result of standardization. The paper aims to show that it is not
nature or nurture that achievement tests fail to account for; rather, it is the
interaction of the spiritual, cultural, sociological, and psychological that
cements traits, and possibly states, through nurture from the biology,
psychology, and society of each individual's biological endowment. The paper introduces all concepts and their
interactions before examining the WRAT series of achievement tests. The concepts and interactions are
discussed and related through two 5D Venn diagrams representing
possible perspectives on the same static biopsychosocioculturalspiritual
model. The basic premises include
Hebbian theory, the biopsychosocioculturalspiritual model, neuroscience,
linguistics, achievement testing, sources of error, semantic relativism, and
learning as sources of linguistic bias.
Introduction
The
etiological model of definite, linear causes for learning and mental health
diseases was inspired in the seventeenth century by Sydenham’s model of
infectious diseases. Such antiquated
models can no longer inspire our conceptions about the nature of the brain-mind-context
relationship. The ambition of psychology and psychiatry to be considered as
credible as a medical discipline, an ambition that began in the nineteenth century,
should not lead us to believe that the brain-mind-context relationship can be
studied with the conventional medical approach to diseases. Current trends in
psychiatry and neuroscience consider
wider, new, refreshing models such as Complex Systems Theory or the
biopsychosocioculturalspiritual model. Friedrich von Hayek (1899-1992), the
famous Austrian philosopher and economist, believed that the sciences of
complex phenomena, in general, could not be modeled after the sciences that
deal with essentially simple phenomena like physics. Hayek held that complex
phenomena (such as linguistic bias in achievement testing) allow only
pattern predictions (through modeling), in contrast to the precise predictions
that can be made about non-complex phenomena. From the ontological point of
view, matter is organized in different levels (physical, biological,
psychological, and socio-cultural). Those levels cannot be studied with the
same methodological approach. However, taking into account findings from
other fields would certainly enrich our comprehension of the complex brain-mind-context
interaction. Let’s use a metaphor. If we want to understand why a book
we like (e.g. a best-seller) is important for us, we cannot reduce our
comprehensive model to a statistical analysis of the number of chapters,
paragraphs, words, and letters the book contains. These data will tell us
something interesting about the book but little about why it is important, not
only for us but also for many other readers. Of course, if we eliminate the
“quantitative” composition of the book, we won’t feel and think what we do.
We need the physical support to read a book. However, we need to incorporate a
“qualitative” (psychological and socio-cultural) explanation to achieve a wider
comprehension of the “best-seller” phenomenon. We can clearly understand that
those different levels cannot be studied with the same methodology, nor can one level be
explained through another. Finally, models should be dynamic, as they change
over time. For instance, the synthesis of evolutionary and developmental
hypotheses for biological phenomena underlines that matter is always changing,
though some basic structural features (or organization patterns) may keep some
configurations with minor changes over time.
It is time to incorporate a new multi-level, non-linear, dynamic,
open-minded model in the academic psychiatric field. Psychiatric journals would
benefit from an “open minded” perspective, including approaches that certainly
tell us crucial things about the human being.
Testing
Achievement
tests measure how much someone knows or has learned. Achievement tests tend to be standardized as
a means of maintaining internal consistency among the teaching methods,
construct meanings, and instruments of assessment. By this methodology, the
measures and metrics of achievement tests rely on how much knowledge the person assessed
has in a particular subject area, be it mathematics, reading, history, aircraft
mechanics, microbiology, culinary science, art, or any other taxonomical system
that can be taught and learned. Of all
the tests administered, achievement tests are by far the most common.
For
the most part, achievement tests are used by ontological endeavors (educational
institutions, scientific research facilities, or anywhere training is
implemented) not only to shape further developments in the structure of the
language spoken, but also to ensure that communication is expressed with consistent form,
structure, and meaning over time. The
purposes of achievement tests overlap, but the overarching goal of topic
assessment is present in each of the following sub-categories:
- Achievement tests provide metrics that can be analyzed to determine whether an individual has accomplished (achieved) the previously defined level of proficiency and requisite knowledge to advance. Examples are finals in classes that are prerequisites for the next class in a series, e.g. Calculus 251, 252, 253, 254, 255, and 256. Testing out of a class is another example, offered by some universities directly and by CLEP as a private service to students and universities.
- Achievement tests are often used for the categorical grouping of individuals into certain skill areas to empower the instructor to better meet the needs of the group. If the tutor or professor has an accurate assessment of the students' skill, the teaching can be targeted more precisely at current levels of proficiency to facilitate further achievement and standardization. Overcoming linguistic bias at this level is necessary to reliably ensure that the instructor can address the primarily linguistic issue directly rather than have the students suffer the consequences of lower test scores later.
- Overlapping with the previous categorical grouping, achievement tests can be employed as a diagnostic to determine strengths and weaknesses. Once an individual student's areas of weakness have been identified and the deficiencies catalogued, instruction can be focused on the areas requiring remediation. This and the previous purpose are the two that can be used to close the linguistic bias gap, often measured as variance correlated with culture, ethnicity, or socioeconomic status.
- The most political issue, and the one that looms largest, is the use of achievement tests to assess the success of a program or a political initiative on a school, district, regional, or federal basis. This form of top-down testing has recently become the norm, as the achievement tests are applied not only to the students but also to the institutions. This is done primarily to ensure internal validity among the teachers, trainers, and administrators by having them adhere to the originally legislated policies and stay true to their intent.
- The metrics produced by achievement tests are used to inform everyone involved in addressing levels of progress about what should be dealt with for each individual. Ideally these tests should be diagnostic, because a diagnostic test lays out how the weaknesses should be addressed. Historically, variance has not been viewed from the perspective provided by the biopsychosocial model or as a purely linguistic function. Specifically, what fires together wires together, and what fires together is a function of the psychological and social levels of the biopsychosocial model, such that the physical biology of the brain is to some degree a function of environment. Variance in environment can be overcome by the use of different teaching methods.
- Achievement tests help define the particular areas that teachers believe are important to assess. They help pinpoint those topics that are important and those that are not. This paper aims to illuminate the linguistic bias that is the root cause of cultural variance across standardized tests. Achievement test tables of specifications rarely account for linguistic variance. Rather, they list categorical constructs and pinpoint those topics that are important.
Test specifications must
be developed in order for any test to reach any level of standardization. This is a complex process that allows the
test developer to understand the relationship between the level of items (along
the dimension to be measured), the construct presence to be identified, and the
content of the items. The tables
containing the test specifications are generally created based on curriculum
guides and other information that informs the test developers as to what
content is covered. By any measure, the
table of specifications is a valuable idea and a useful guideline when it
comes to the creation of any kind of achievement test. Most achievement tests are standardized.
A
standardized examination, test, or battery of tests is one that has undergone
extensive test development including the writing and rewriting of the items to
dial in the specific construct validity for the population that was used to
standardize the test. Variance is
inevitable when a great deal of norming has been done with what is sometimes
(and historically) a large group of test takers. By these means the test must be narrower than
the population it measures as defined by the development of consistent
directions, administrative procedures, and very clear scoring
instructions. Standardization
Sources of Error
Error
is the difference between one’s observed score and one’s theoretical true
score. Measurement errors affecting the
reliability of a test can be divided into two components: random error
and systemic (systematic) error.
Random sampling error is the
difference between the population and the sample that is due only to chance
factors (random errors), not to systemic sampling error. Random sampling error may or may not
result in an unrepresentative sample.
The magnitude of sampling error due to chance factors can be estimated
statistically. In some contexts a parallel
construct is referred to as chance sampling error. In the context of achievement tests, the
neglect of the influence of cultural and linguistic factors is pervasive and
long-standing. The very nature of
standardization is to narrow the linguistic constraints so far that variation
is rejected. Variance of language and
culture has not been identified as a root cause of test score variations
across populations; instead, it has been written off as random sampling error
or, worse, accepted as systemic.
The fault and the burden lie neither with the administrators nor
with the institutions.
Rather, the burden falls on the individuals labeled as having not passed
the achievement tests despite the presence of the constructs tested for. This lack of face validity, due to errors
resulting from unidentified confounding linguistic variables, is a result of
linguistic bias: the belief that language alone is a measure of cognition or, worse, that one language is correct and
another is incorrect despite both recapitulating the exact same construct
via different grammar, word choice, and sentence structure.
Some specialists argue that
standardized tests, particularly older standardized tests, are out of
calibration with the construct investigated.
They explain systemic error as the result of an over-investment
bias in a reliable but invalid test. Systemic errors are the result of biases in
measurement that lead to a situation in which the average of many separate
measurements differs significantly from the actual value of the attribute to be
measured. All measurements are prone to
systemic errors. The difficulty of
elucidating systemic errors lies in their interrelated
and cascading effects. Systemic errors can
be functions of culture, whether the organizational culture or the culture the
organization is immersed in. The
assumptions of a culture or of a test-administering organization's management
can affect not only the way in which test questions are written, but also
the way the answers are interpreted.
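The contrast between random and systemic error described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true score of 100, the error spread of 10 points, and the 5-point downward bias are hypothetical values chosen for illustration, not estimates from any real achievement test, and the function name is invented here.

```python
import random
import statistics


def simulate_observed_scores(true_score, bias, noise_sd, n, seed=0):
    """Return n observed scores: true score + systemic bias + random error."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [true_score + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


TRUE_SCORE = 100.0  # hypothetical theoretical true score

# Random error only: individual scores scatter, but the scatter is centered.
unbiased = simulate_observed_scores(TRUE_SCORE, bias=0.0, noise_sd=10.0, n=10_000)

# Random error plus a hypothetical systemic bias of -5 points.
biased = simulate_observed_scores(TRUE_SCORE, bias=-5.0, noise_sd=10.0, n=10_000)

mean_unbiased = statistics.mean(unbiased)
mean_biased = statistics.mean(biased)

print("mean error, random only:     ", round(mean_unbiased - TRUE_SCORE, 2))
print("mean error, random + systemic:", round(mean_biased - TRUE_SCORE, 2))
```

With many measurements the random component averages out toward zero, while the systemic component remains in the average no matter how many scores are collected. That is why a bias that is misread as random error cannot be removed simply by testing more people.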
According to research using the
linguistic category model, verb forms play a significant role in the
transmission and maintenance of intergroup stereotypes, in the form of the
linguistic intergroup bias (Maass et al., 1989). The linguistic intergroup bias
reflects a tendency for speakers to describe the negative behavior of out-group
members and positive behavior of in-group members using abstract verbs
(suggesting expected stability), whereas they characterize the positive
behavior of out-group members and negative behavior of in-group members using
concrete verbs (implying little expected stability). Two distinctive mechanisms
are assumed to underlie these effects. Research provides evidence of an in-group
protection motive (Maass, Ceccarelli, & Rudin, 1996). Consistent with
social identity theory, research shows that under conditions of threat to an
in-group’s identity or perceived competition, linguistic intergroup bias serves
to maintain a positive group image even in the presence of disconfirming
evidence (Maass et al., 1989; Maass et al., 1996). However, research has also
highlighted an expectancy motive (Maass et al., 1995). Specifically, the
findings of such research suggest that
expectancy-consistent behavior is described using abstract verbs, given
that such behavior is considered to be lasting and typical, while
expectancy-inconsistent behavior is described in concrete terms, given that such
behavior is considered to be transitory and atypical (Maass et al., 1996). There appear to be no
criterion-adjusted tests in which the criterion adjustment is based on the potential for linguistic
bias.
Bio-Psycho-Social-Cultural-Spiritual Model
The
biopsychosocial model is a general model and approach that posits that
biological, psychological, and social factors all play significant roles in
human functioning. The biopsychosocial
paradigm is also a technical term for the popular concept of the mind-body
connection, which addresses philosophical arguments more than purely
empirical exploration and clinical application
(Sarno, 1981).
The
Biopsychosocial model has been used in fields such as medicine, nursing, health
psychology, sociology, psychiatry, family therapy, and clinical psychology; but
not in psychometrics. And the BPS model
has not been used to define a metric of linguistic bias measured as use of
grammar, word choice, and sentence structure.
Neurons that fire together wire together. This occurs within a psychological,
social, and physical environment that can engender physical constructs
expressed as simply as regional accents or euphemisms. Such linguistic artifacts are hurdles for
individuals to overcome in scoring well on achievement tests used to measure
the standardization of language that has been deemed necessary.
Linguistic Bias
The
purported source of linguistic bias is the forward-thinking manner in which
achievement tests are designed. Working
from cause to effect by means of predictive reasoning, creators of achievement
tests have fallen victim to their means, to the language they aim to
standardize, as opposed to evaluating the strength, ability,
and persistence of constructs within the individual to be tested by working
backward from effect to cause by means of diagnostic reasoning. The discrepancy between the two cognitive
theories implemented results in the belief that variance is not the product of
transient minority effects of multiple confounding variables, one category of
which is linguistic. Linguistic
variation is a function of culture, experience, and learning. The range of differences could be quantified
using causal Bayes nets. Fernbach, Darlow, and Sloman (2011) developed a normative
formulation of how predictive and diagnostic probability judgments should vary
with the strength of alternative causes, causal power, and prior probability.
Their model was tested through two experiments that elicited predictive and
diagnostic judgments as well as judgments of the causal parameters for a
variety of scenarios that were designed to differ in strength of alternatives.
Model predictions fit the diagnostic judgments closely, but predictive
judgments displayed systematic neglect of alternative causes, yielding a
relatively poor fit. Three additional experiments provided more evidence of the
neglect of alternative causes in predictive reasoning and ruled out pragmatic
explanations. The authors conclude that people use causal structure to generate
probability judgments in a sophisticated but not entirely veridical way.
This methodology of exploratory data analysis, used to elucidate the potential
causal parameters and identify evidence for neglected alternatives, is
consistent with the uncomfortable science proposed by the statistician John
Tukey in response to an overreliance on confirmatory data analysis.
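The way predictive and diagnostic judgments should vary with the strength of alternative causes can be sketched numerically. The example below assumes a noisy-OR parameterization, a common choice in causal Bayes net models; it is not presented as the exact formulation used in the cited study. The names p_c (prior probability of the cause), w_c (causal power of the cause), and w_a (combined strength of alternative causes), and all numeric values, are labels and inputs chosen here for illustration.

```python
def predictive(w_c, w_a):
    """P(effect | cause) under noisy-OR: the focal cause or an alternative produces the effect."""
    return w_c + w_a - w_c * w_a


def diagnostic(p_c, w_c, w_a):
    """P(cause | effect) by Bayes' rule under the same noisy-OR assumption.

    Undefined (division by zero) if the effect can never occur,
    i.e. when w_a == 0 and p_c * w_c == 0.
    """
    p_effect_and_cause = p_c * predictive(w_c, w_a)
    p_effect_no_cause = (1.0 - p_c) * w_a
    return p_effect_and_cause / (p_effect_and_cause + p_effect_no_cause)


# Hypothetical scenario: fixed prior and causal power, varying alternatives.
P_C, W_C = 0.5, 0.6
for w_a in (0.1, 0.5, 0.9):
    print(
        "alternatives:", w_a,
        "predictive:", round(predictive(W_C, w_a), 3),
        "diagnostic:", round(diagnostic(P_C, W_C, w_a), 3),
    )
```

Under these assumptions, stronger alternative causes raise the predictive probability (the effect becomes more likely whatever the focal cause does) and lower the diagnostic probability (the effect becomes weaker evidence for the focal cause). Neglecting alternatives in predictive reasoning therefore means underestimating how often the effect occurs for reasons other than the cause being considered.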
Non-linguistic
methods of investigation are necessary to avoid the linguistic
thought that is itself the source of the bias in this instance. Even Bloom’s Taxonomy cannot account for
linguistic bias; in fact, contamination by linguistic bias should be measurable
across all six levels of Bloom’s taxonomy, as the error is unavoidable and is
a function of any standardized test.
The degree to which linguistic bias is present at each level of Bloom’s
taxonomy can vary by culture, language, and even the development of the
individual being assessed.
The
issue is the cost of investing to change the ever-plastic brain and the amount
of time spent on various activities that all re-wire the brain versus the time
spent learning the standardized language necessary for success on most
achievement tests. The investment
necessary in overcoming the naturally acquired language is proportional to the
dissonance between the naturally acquired language and the standardized
language necessary to score well on an achievement test. Tomasello (2003) points out that another
crucial prerequisite for language acquisition is the ability of children to
detect patterns in their environment. He reports research demonstrating
that infants can detect artificial nonsense words made up of three-syllable
sequences. Later the infants respond to
those words, but they do not respond to the syllables presented in a different
order (Saffran, Aslin, & Newport, 1996).
Marcus, Vijayan, Bandi Rao, & Vishton (1999) briefly trained
seven-month-olds on three-syllable sequences of the form
ABB. Later the infants responded to this
pattern even when the syllables were different (e.g. XYY). Tomasello notes that this ability to detect
abstract patterns in auditory and visual input is not unique to humans. Other primates such as tamarin monkeys also
have this skill. Therefore, pattern
finding is a cognitive capacity that has a deep evolutionary history and
certainly cannot be seen as a specific adaptation for language. Using only a canonical taxonomy of language
to measure cognitive ability is like trying to measure the circumference of the
earth in inches, one intermediate phalanx at a time.
Language can be a
powerful conveyor of bias, in both blatant and subtle forms. It is not the language itself; rather, it is
the implied frame resulting from the associated meaning of the words chosen
by the speaker and those heard by the listener. Linguistic bias can manipulate perception and
perspective regarding race/ethnicity, gender, accents, age, (dis)ability, and
sexual orientation.
- Native Americans described as "roaming," "wandering," or "roving" across the land. Such language implicitly justifies the seizure of Native lands by "more goal-directed" white Americans who "traveled" or "settled" their way westward.
- The implied frame of such words as forefathers, mankind, and businessman serves to deny the contributions (even the existence) of females.
- The bias against non-English speakers.
In the psychology and
communications literatures, research has shown that language is a subtle but
powerful way to examine cognition in intergroup contexts (Maass & Arcuri,
1992; Maass, Salvi, Arcuri, & Semin, 1989; Semin & Fiedler, 1988; Semin
& Smith, 1999). Because people have wide latitude in their use of
interpersonal verbs that can be used to describe self and others, and because
this word choice is related to their causal attributions and expectations
(Semin & Marsman, 1994), linguistic categories can be used to maintain and
transmit perceptions. According to
linguistic category models, interpersonal verbs can be arrayed on a continuum
from concrete to abstract forms (Semin & Fiedler, 1988), the evocation of
which is based on differences in causal attributions of, and expectancies for,
behavior (Semin & Marsman, 1994).
Given the findings of research demonstrating a link between
language and intergroup bias (Franco & Maass, 1999; von Hippel,
Sekaquaptewa, & Vargas, 1997), a linguistic category model has implications
for understanding diversity within and between groups.
Researchers
have shown that people have wide latitude in their use of verbs that can be
used to describe the self and others; the word choice, grammar, and language of
others is related to their attributions and expectations (Semin & Marsman,
1994). Language is a cultural artifact that emerges as a complex adaptive
system from the verbal interaction among humans. We see the ubiquity of
language acquisition among children generation after generation as the product
of an interactional instinct that, as Tomasello indicates, is based on an
innate drive to communicate with and become like conspecifics (Lee &
Schumann, 2005), that is, to become linguistically standardized. Semin
and Fiedler (1988) introduce a linguistic category model useful for interpreting
people’s attributions of causality and responsibility within a given context.
Specifically, they put forth a general taxonomy that distinguishes between verb
forms at different levels of abstraction. They use the following examples to
articulate their taxonomy: “(a) A is talking to B; (b) A is helping B; (c) A
likes B; and (d) B is an extraverted person”
(p. 558). The first example (i.e., talking) demonstrates the most
concrete level of language. At this level are descriptive action verbs, which
provide an objective description of action, offer no interpretation, and
typically have one or more physically invariant features. Interpretive action
verbs, in contrast, include greater depth of interpretation, pronounced
evaluative components, and do not have physically invariant features – that is,
many different actions could lead to use of the same interpretive action verbs.
As illustrated in Semin and Fiedler’s (1988)
second example, “helping” could be physical or psychological and is therefore
open to interpretation. Relative to ascriptions of causality and
responsibility, accounts using descriptive action verbs are typically
attributed to situational factors, while accounts using interpretive action
verbs are often attributed to the actor. Consequently, the use of
interpretative (rather than descriptive) action verbs suggests a greater
expectation of similar future behavior. State verbs, which presume knowledge of
the actor’s state of mind, do not maintain reference to any specific behavior
or incidents (e.g., “likes” as used in the third example). Although state verbs
typically evoke causal attributions to the sentence object rather than the
actor (Semin & Marsman, 1994), they convey expected similar future states
of mind from actors. Finally, adjectives, such as “extraverted person” in Semin
and Fiedler’s (1988) example, refer to enduring qualities of persons and
represent the highest level of abstraction.
Because adjectives are typically associated with dispositional
attributions, they are indicative of a continued expectation of similar future
behavior. Variations in taxonomy or
across taxonomies can explain the variance in scores due to linguistic bias. The taxonomy exists to standardize the
language and facilitate communication.
The achievement test is designed to measure conformity and to provide a
metric of its level. The construct
required is present in the individual to be assessed although the grammar and
language used to recapitulate the
Hebbian Theory
Hebbian theory concerns how
neurons might connect themselves to become engrams. Hebb's theories on the form and function of cell assemblies can
be understood from the following:
- "The general idea is an old one, that any two cells or systems of cells that are repeatedly active at the same time will tend to become 'associated', so that activity in one facilitates activity in the other." (Hebb 1949)
- "When one cell repeatedly assists in firing another, the axon of the first cell develops synaptic knobs (or enlarges them if they already exist) in contact with the soma of the second cell." (Hebb 1949)
Allport (1985) has posited additional ideas regarding cell
assembly theory and its role in forming engrams, along the lines of the concept
of auto-association, described as follows: "If the inputs to a system
cause the same pattern of activity to occur repeatedly, the set of active
elements constituting that pattern will become increasingly strongly
interassociated. That is, each element will tend to turn on every other element
and (with negative weights) to turn off the elements that do not form part of
the pattern. To put it another way, the pattern as a whole will become
'auto-associated'. We may call a learned (auto-associated) pattern an
engram." (Allport, 1985)
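The auto-association described in the quotation can be sketched with a very small network. This is a toy illustration, not a model of real neurons: the eight-element pattern, the +1/-1 coding of active and inactive elements, and the function names are all assumptions made here. Weights grow between elements that are active together and become negative between elements whose activity differs, and a corrupted cue is then presented to see whether the stored pattern is restored.

```python
def hebbian_train(patterns, n):
    """Build a weight matrix with the Hebbian rule: w[i][j] += x[i] * x[j]."""
    w = [[0.0] * n for _ in range(n)]
    for x in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += x[i] * x[j]
    return w


def recall(w, cue):
    """One synchronous update: each element takes the sign of its weighted input."""
    n = len(cue)
    out = []
    for i in range(n):
        total = sum(w[i][j] * cue[j] for j in range(n))
        out.append(1 if total >= 0 else -1)
    return out


# Hypothetical pattern of co-active (+1) and inactive (-1) elements.
pattern = [1, -1, 1, 1, -1, -1, 1, -1]
weights = hebbian_train([pattern], len(pattern))

# Corrupt one element of the cue, then let the network complete the pattern.
cue = list(pattern)
cue[0] = -cue[0]
restored = recall(weights, cue)

print("cue:     ", cue)
print("restored:", restored)
print("matches stored pattern:", restored == pattern)
```

In this sketch the elements that were repeatedly active together pull the corrupted element back into line, which is the sense in which a learned pattern "turns on" its own members and turns off the rest.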
Hebbian theory has been the primary basis for the conventional
view that, when analyzed at a holistic level, engrams are neuronal nets or
neural networks like those associated with Wernicke’s area or Broca’s area.
Work in the laboratory of Eric Kandel has
provided evidence for the involvement of Hebbian learning mechanisms at
synapses. Much of the work on
long-lasting synaptic changes between vertebrate neurons (such as long-term
potentiation, as in language acquisition) involves the use of non-physiological
experimental stimulation of brain cells. However, some of the physiologically
relevant synapse modification mechanisms that have been studied in vertebrate
brains do seem to be examples of Hebbian processes. One such study reviews
results from experiments that indicate that long-lasting changes in synaptic
strengths can be induced by physiologically relevant synaptic activity working
through both Hebbian and non-Hebbian mechanisms.
Semantic Relativism
Perception is the interface between
cognition and reality. Descriptive
perceptual relativism is the
empirical claim that certain groups (e.g., those with different cultures,
languages, biological makeup) perceive the world differently. Various
philosophers of science have argued that theory influences perception to such
an extent that partisans of substantially different theories might literally
see the world differently.
Descriptive claims about
perception are sometimes thought to bear on various versions of normative
relativism. For example, some writers have argued that people with different
concepts and beliefs will nevertheless perceive the same things in much the
same way and that these common perceptions can be used as a fixed point from
which to adjudicate the claims of rival frameworks. Most philosophers and
vision scientists now agree, though, that perception is theory-laden; our perceptual
experiences in a given situation are influenced by the concepts, beliefs,
expectations and, perhaps, even the hopes and desires that we bring to the
situation.
Normative perceptual relativism is the claim that there
is not just one correct, framework-independent way to perceive things;
rather, different ways are correct relative to different constellations of concepts and
beliefs. Given modern medical training and practice, a competent radiologist
should see that this spot on the X-ray is a stomach tumor, and anyone with any
sensitivity should see that Sam felt humiliated.
As with most versions of
normative relativism, the strongest versions of normative perceptual
relativism, ones on which “anything goes,” are implausible; there clearly are
constraints on perception. But weaker versions of the thesis may be
defensible. Perception is theory-laden to some degree; it involves what current
vision scientists call top-down processing.
In Quine's philosophy the idea of
stimulus meaning is not a special semantics, but rather is an attempt to
isolate the net empirical content of each of various single observation
sentences without regard to the theory that contains them yet without loss of
what the sentence owes to that containing theory. This attempt to isolate the
semantics of observation language is a move away from his earlier critique of
reductionism, where reductionism is understood as statements having a unique
range of possible sensory events, such that the statements can be criticized in
isolation. But at this stage Quine still retains his original thesis of
empirical underdetermination, which is
integral to his holistic thesis of semantical indeterminacy or vagueness. The underdetermination thesis, admitting
multiple and alternative observation sentences for the same stimulus situation,
presents a question: how can the same stimuli yield alternative stimulus
meanings? One of Quine's answers is that the alternative theories or belief
systems in which the stimulus situation is understood supply different
significant approximations. But there still remains the question of how
stimulus meanings are to be construed as approximations. Quine has a theory of
vagueness that he sets forth in the third and fourth chapters of Word and
Object, which resembles the later Wittgenstein's thesis of paradigms, except
that Quine explicitly invokes the behavioristic stimulus-response analysis of
learning. On this analysis Quine rejects the view that the stimulations eliciting the
verbal response "red" are a well-defined or neatly bounded class. He
maintains that the stimulations are distributed about a central norm, and that
when a language is initially being learned the distribution may be very wide.
The penumbral objects of a vague term are the objects whose similarity to those
for which the verbal response has been socially rewarded in the learning process
is relatively slight. The learning process is an implicit induction on the part
of the subject regarding society's usage, and the penumbral cases are
those words for which that induction is most inconclusive for want of evidence,
because the evidence is not there to be gathered. And society's members have
had to accept similarly fuzzy edges when they were learning. There is an
inevitability of vagueness on the part of terms learned by ostension, and it carries
over to other terms defined by context on the basis of these ostensively
learned terms.
Quine notes that Hanson maintains that
observations vary from observer to observer according to the amount
of knowledge that the observers bring with them. Thus one man's
observation is another man's closed book or flight of fancy, with the result
that observation as the impartial and objective source of evidence for science
is bankrupt. At this stage of Quine's thinking the semantical contribution
of theory to observation is still problematic for him, but he continued to
characterize observation language in terms of behavioristic theory
of learning. In the chapter titled "Observation" in his “The Web
of Belief” (1970) Quine says that an observation sentence is a sentence that
can be learned ostensively by the association of heard words with things
simultaneously observed, an association which is conditioned and reinforced by
social approval or successful communication, and which becomes habitual. And due
to the social character of its learning, the observation sentence must be understandable
by all competent speakers of the language who might be asked to assent to it.
Thus according to Quine the sentence "That is a condenser" is not an
observation sentence, even if experts agree to it. Quine maintains, contrary to
the Positivists, that what qualifies a sentence as observational is not the
lack of theoretical terms that may occur in theory formulations, but just that
the sentence taken as an individual whole commands assent consistently or
dissent consistently when the same global sensory stimulation is repeated. This
behavioristic characterization initially enabled Quine to evade reference to
semantics in his identification of observation language, and thereby to
separate his view from that of the Positivists, who defined observation
language in semantical terms.
Quine returned to the topic in "On Empirically Equivalent Systems of the World"
in Erkenntnis (1975), which as it happens had in 1930 been made the official
journal of the Vienna Circle. This development of the Duhem-Quine thesis
represents a further restriction on Quine's earlier version of his holistic
semantical thesis of observation. Previously he had viewed empirical underdetermination
as integral to semantical indeterminacy or vagueness in his semantical holism.
But in this paper he revises the concept of empirical underdetermination of
language, and separates it from the holistic view of the Duhem-Quine thesis.
The scientific hypotheses that purport to describe things beyond the reach of
observation are related to observation sentences by a kind of one-way
implication, such that many alternative hypotheses may imply the same set of
observation sentences, but not vice versa. Observation sentences do not
uniquely imply just one theory purporting to explain the observable events. It
is now in this sense that natural science is "empirically underdetermined"
by all possible events. Quine says that
underdetermination lurks where there are two irreconcilable theory
formulations each of which implies exactly the desired set of observation
conditionals plus extraneous theoretical matter, and where no formulation
affords a tighter fit. In Quine's vocabulary the phrase "observation
conditional" is an empirical generalization expressed in conditional form
and implying an observation sentence describing an individual event. And his
phrase "theory formulation" is a conjunction of the axioms of a
deductive theory, which implies observation conditionals. This is a different sense of "empirical
underdetermination" than what Quine meant in "Two Dogmas",
because it resurrects the idea of a semantically neutral observation language,
which philosophers such as Hanson, Kuhn and Feyerabend reject. These
philosophers find a phrase such as "same observation sentences" when
speaking of sentences implied by alternative theories to be very problematic;
they deny that different theories can have the same set of observations due to
the contribution of the semantics of theory to the semantics of
observation language. Having revised
"empirical underdetermination", Quine then distinguishes his revised
concept from the holistic doctrine of the Duhem-Quine thesis. He reiterates that
the holistic doctrine says that scientific statements are not separately
vulnerable to adverse observations, since it is only jointly as a theory that
they imply their observable consequences, with the result that any one of the
statements can be adhered to in the face of adverse observations by revising
others. Then he states that holism lends credence to the underdetermination
thesis, because in the face of adverse observations we are always free to choose
among various adequate modifications of our theory, and all possible
observations are insufficient to determine theory uniquely. A second reservation is the breadth of the
theory. A third reservation is that the semantical and ontological holism may imply
a culturally relativistic view of truth. Quine
finds a paradox in the thesis of cultural relativism: if truth were
culture-bound, then the advocate of cultural relativism ought to see his own
culture-bound truth as absolute, which is exactly what is represented by the idea of
construct validity. The cultural
relativist cannot proclaim cultural relativism without rising above it, and he
cannot rise above it without giving it up.
Quine then turns to the issue of the irrationality of theory choice, the
internal argument for cultural relativism. Quine argues that the choice between
empirically equivalent alternative systems need not be irrational when one
could settle for a frank dualism, even if it is a dualism due merely to
empirical equivalence.
In his "Empirical Content"
(1981) in Theories and Things, which he notes contains "echoes" from
"Empirically Equivalent Systems of the World", Quine explicitly uses
Hanson's terminology, saying that observation sentences are "theory-laden."
But Quine reconstrues the intended meaning of Hanson's phrase to mean that the
terms embedded in observation sentences may recur in theory formulations. Thus
while Quine here says that observation sentences are theory-laden, he denies to
the semantics of theory any participating role in the semantics of observation.
In fact in Quine's construing of "theory-laden" it is not observation
language that is theory-laden, but rather theory that is observation-laden.
Still later in "Truth" in
his Quiddities (1988) he explicitly refuses to admit
theory any resolving function in the semantics of observation. There he
says that we work out the neatest world system, and we tighten the squeeze by
multiplying the observations. Tightening the squeeze in observation sentences
is the progressive reduction of vagueness, but only by the addition of
information in additional observation sentences.
More recently a member of Quine’s
intellectual entourage, Donald Davidson, has attempted to evade semantical
relativism with a turn to instrumentalism. Davidson’s principal statement of
his thesis is set forth in his “The Very Idea of a Conceptual Scheme” (1974)
and “Belief and the Basis of Meaning” (1974), reprinted in his Inquiries into
Truth and Interpretation (1984), a book he dedicates to Quine with the inscription “without
whom not.” He rejects the representationalist view of the semantics of
language, which he considers a third dogma of empiricism after the first two
referenced by Quine in the latter’s 1951 “Two Dogmas” article. Just as Dewey rejected the dualism of
“experience” and “nature”, Davidson
rejects the dualism of “scheme” and “world”, of “conceptual scheme” associated
with language and “empirical content”, of “organizing system and something waiting
to be organized”, that he finds in the views of Whorf, Kuhn, and
Feyerabend. In this manner he remains more faithful to Quine’s original
behaviorism than Quine did. Given the mutual and reciprocal determination
between belief and semantics, the decision necessary for interpreting another’s
discourse is to maximize our shared beliefs, such that there can be no basis
for concluding that others have concepts or beliefs radically different from
one’s own. Davidson concludes that in giving up the dualism of scheme and
world, we do not give up the world, but rather re-establish “unmediated touch”
with the familiar objects that make our sentences and opinions true or false.
Thus Davidson argues that there is no conceptual relativism, because there are
no conceptual schemes to be relativistic about.
But Davidson’s conclusion is a non sequitur. Firstly he confuses two distinct
questions: one is the question of what is meaning, and the other is the
question of what is the meaning of a term, sentence, or theory and how is this
determination made. The existence of conceptual schemes is an answer to the
former question, and his behavioristic procedure is his answer to the latter
one. The answers are made interdependent only because Davidson is a behaviorist,
which is to accuse him of being a Positivist. And his Positivism makes him
inconsistent with Quine’s and his acceptance of ontological relativity, because
Positivism requires a prior ontological commitment. Davidson does not practice ontological
relativity in his own philosophical discourse. Secondly the word “unmediated”
in his phrase “unmediated touch”, which purportedly justifies his denying
language its representational semantics, is a weasel word. In fact the
interpreter’s charitable decision required for interpretation does not imply
any rejection of the representational nature of the semantics of language. This
interpretative decision is operative when someone uses a dictionary with the
charitable assumption that its lexical entries are true, so that he can assimilate
the meanings of the terms he is researching. It is also operative when a community
of scientists in a profession considers an experiment and agrees on the
validity of the test design statements, so that the scientists can describe the
phenomenon under examination and the experiment’s outcome. Neither the thesis
of the charitable decision required for communication nor the thesis
of the interdependence between truth and meaning implies any rejection of
the representational nature of the semantics of language; representationalism
is perfectly consistent with both theses. “Representation” may be a weasel word,
because there survives an atavistic belief residual from modern philosophy
including Positivism, that the knower is a spectator to his ideas. Of course the knower can be a spectator of
his ideas, but this inspection is a reflection after the fact, which presupposes his
already having the inspected knowledge of the real world. Apart from this
secondary reflective knowledge, the spectator thesis about knowledge of the real
world is readily rejected, when we realize that what we know firstly is not our
ideas, but the real world, and most notably that our knowledge is thus
constituted by our ideas rather than the ideas being an object of knowledge.
Contrary to Davidson, therefore, these ideas and their schemes are quite admissible,
and they very much involve semantical relativism.
Both Quine and Davidson are
motivated to evade semantical relativism, because both mistakenly believe that
a relativistic, context-determined, semantics implies a relativistic thesis of
truth. Regardless of how culture-bound and context-determined may be the
semantics of a language, it is not possible capriciously either to affirm or to
deny truthfully just anything expressed by sentences made with those
concepts. The empirical underdetermination of language implies that many
alternative sentences can be said which are consistent with the same
observations. Still, the empirical constraint imposed exogenously on sentences
by the recalcitrant real world - even when not yet interpreted - forbids just
any arbitrary distribution of truth-values over a set of logically related,
semantically interpreted grammatical sentences. When any subset of these
sentences is given definitional force to specify its semantics, then only some
of the remainder sentences containing the same descriptive terms can also be
true. Truth is always relative to what
is said, but the real world in which all language users live forbids
ingenuously asserting just any old thing in the semantically interpreted
language. Therefore, semantical relativity does not imply relativism of truth,
but just the opposite: with a meta-theory of semantical description
exhibiting the composite nature of meanings, semantical relativity explains the
partial equivocation that makes it impossible for the same sentences occurring
in two different belief systems, to be completely true in one belief system and
completely false in another. It explains
how the same sentence is not simply and completely the same statement in each
system, but is partially the same in each, and to that extent true in both
systems. And for the same reason it also explains why the semantics of
observation language need not be quarantined from the semantics of theory, in
order to assert the objectivity of truth. Observation statements, which
pragmatically defined are merely singular test design statements, may be common
to pragmatically defined contrary theories, such that belief in the test design
statements makes the test outcome contingent and not willfully or necessarily
verifying, and makes a falsifying test outcome of one of the theories an
objective truth. Each person acquires
the semantics of what Quine calls observation sentences from his own personal
experiences, and he acquires it publicly and ostensively in the circumstances
of his learning situation in his personal history. There is a wide variation
among people between what is learned ostensively and contextually, but even for
those simple statements learned ostensively by most people, intersubjectivity
is increased with successive approximation, as the web of belief grows and
imposes increasingly more shared truth conditions on the ostensively acquired
semantics. The entire web of beliefs may be viewed on analogy with an underdetermined
system of conditional equations, in which the addition of a new equation
further restricts the range of numeric values that the set of variables may
accept as solution sets. One difference between the mathematical system and the
language system is that with just a sufficient number of restrictions the equation
system may admit to only one solution set, whereas language is never restricted
to a unique interpretation. Another noteworthy departure from the mathematical
analogy is that the mathematical variables can take only one numeric value at a
time without becoming ambiguous, while each of the descriptive terms, including
those used as mathematical variables in applied mathematics in empirical
science, simultaneously take on the semantic values distinguishable in the
explicitly related universal statements in the system of beliefs, subject only
to the preservation of univocity. Thus all the terms explicitly related by the
sentences in the web of beliefs may participate in one another's univocal semantics,
and thereby resolve one another's vagueness in relation to each other.
Furthermore as implicit statements are made explicit by deduction, the
vagueness in the meanings of the terms of the system is even further
resolved. But Quine viewed meanings as
abstract or mental "entities", and then developed his behavioristic
theory of stimulus meanings, which he called "behavioral
dispositions" to evade the representative function of language. He could not be expected to have developed a
meta-theory of semantical description enabling him to describe how meanings
participate in one another. The closest Quine came to the idea of semantical
participation was the idea of the resolution of vagueness. His rejection of the
dichotomous analytic-synthetic distinction is a worthy start toward such a meta-theory,
but his rejection of the distinction was actually a rejection of analyticity as
such, except in the cases that he called "analytical hypotheses" used
for translations. As it happens, rejection of the analytic-synthetic dichotomy does
not imply the rejection of analyticity as such. Universally quantified statements
believed to be true for empirical reasons may also be used analytically to
exhibit the complexity in the meanings of their constituent terms by displaying
their component semantic values that constitute the discriminating capability
in the descriptive function of the language. In other words all universal
empirical statements in the web of beliefs are analytical hypotheses. And
theories are those that are viewed as relatively more hypothetical than other
empirical statements.
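The analogy drawn above between the web of beliefs and an underdetermined system of equations can be made concrete with a small sketch. The two equations, the integer grid from 0 to 10, and the function name `solutions` are illustrative assumptions and do not come from Quine; the sketch only shows how each added restriction narrows the set of admissible value assignments.

```python
from itertools import product

def solutions(constraints, values=range(0, 11)):
    """Return every (x, y) pair on the grid that satisfies all constraints."""
    return [
        (x, y)
        for x, y in product(values, repeat=2)
        if all(constraint(x, y) for constraint in constraints)
    ]

# One equation in two unknowns: the system is underdetermined.
one_equation = [lambda x, y: x + y == 10]

# Adding a second, independent equation restricts the admissible values further.
two_equations = one_equation + [lambda x, y: x - y == 2]

print(len(solutions(one_equation)))   # number of pairs still admissible
print(solutions(two_equations))       # pairs remaining after the added equation
```

In the mathematical case the second equation leaves exactly one admissible pair, whereas, as argued above, added beliefs reduce the vagueness of language without ever forcing a unique interpretation.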
There are many means by which to interpret a semantical system. The semantical
rules for interpreting a mechanically generated semantical system might be
viewed as analogous to Carnap's meaning postulates, in that all of them are
stated in the object language instead of the meta-language, and are not like
Carnap's rules of designation, which occur in the meta-language. Two
relevant types of semantical rules may be distinguished. One type
consists of those semantical rules that are the mechanically generated
statements and equations. These consist only of the statements
constituting a mechanically generated and empirically acceptable theory, the
outputted theory statements that are believed to be true. But not all the
semantical rules occurring in the object language are mechanically
generated. A second type consists of test design statements, which are
accepted independently of any statements of theory generated by the system, so
that the generated theory is not tautological and can be tested independently. But the semantical rules for mechanically
generated semantical systems are unlike Carnap's meaning postulates, because
they are not just analytical sentences. With Quine's rejection of any
distinctively analytic truth it is possible to view sentences as both analytic
and synthetic, and the semantical rules that describe the semantical
interpretation of the object-language statements must be viewed as both
analytic and synthetic sentences. They are more like Quine's analytical
hypotheses or discursive postulates. These semantical rules might also be
viewed as similar to Carnap's reduction sentences, which he says determine only
"part" of the meaning of theoretical terms. But Carnap has
never explained how it is possible for the meanings of terms to have
parts. Viewing the sentences as both analytic and synthetic enables the
empirical statements constituting the generated theory to exhibit the parts of
the meanings of their constituent terms, just as analytic statements always
have. Test design statements and generated theory statements, both of
which are believed to be true for empirical reasons and not due to the meanings
of their constituent terms, are object-language statements functioning as
semantical rules, each of which contributes parts to the meaning of each of
their common descriptive terms.
“anything that exists in any amount can
be measured” Rene Descartes
Including
bias
What is the name of the shape of the volume produced by the
intersection of two spheres when the volume’s largest dimension is lesser than
or equal to the radius of the smaller sphere?
WRAT
Since its introduction, the Wide
Range Achievement Test (WRAT-#) has undergone five revisions, has acquired multiple
subtest addenda, and has been built upon with a number of expansions. The WRAT has been applied in numerous
settings, along with the development of specialty applications such as the new
WRAT-Expanded. The proprietary test was
further developed by Sidney W. Bijou & Joseph Jastak, after having been
originally written by Gary J. Robertson.
It was first published in 1946 by Joseph F. Jastak (Wilkinson &
Robertson, 2006). The WRAT was written
as an achievement test battery designed to assess the core curricular domains
of reading, mathematics, and oral and written language (Jastak & Jastak, 1965).
The entire WRAT line of tests is now published and sold as intellectual
property by Psychological Assessment Resources, Inc. of Lutz, Florida. The three sections of the WRAT require
roughly an hour and fifty minutes to complete.
The reading comprehension portion requires 40 minutes to administer, the
mathematics section requires 40 minutes to administer, and the nonverbal
reasoning section requires a half hour to complete
(Wilkinson & Robertson, 2006). There
are different forms for each defined age range or specific application, with
prices for the tests ranging from $190.00 to in excess of $500.00 for expanded
packets or multiple tests, available online.
The Wide Range Achievement Test 3
is appropriate for individuals whose ages range from five to 94 years. The WRAT3 is a measure of the individual’s
ability to read words, comprehend sentences, spell, and solve computational mathematics
problems appropriate for the age range of the individual taking the test. The WRAT3 has been used in a variety of
settings as a measure of the basic academic skills necessary for effective
learning, communication, and thinking
(Wilkinson & Robertson, 1993).
The WRAT4 provides two equivalent forms named Blue and Green; these
forms allow retesting on an accelerated timeline without having
to correct for practice effects from repeating the same version of the
test. The Wide Range Achievement Test
has two normative levels (Wilkinson & Robertson, 2006).
1. Level I
a. Level I is normed for children
aged five years and zero months to eleven years and eleven months.
2. Level II
a. Level II is normed for individuals
aged twelve years and zero months through 64 years (the WRAT is used in
gerontology until age 94).
Only standard scores are used for
comparisons among scores. The norms
provided for the 1978 edition include standard scores (remarkably similar to
the Stanford-Binet IQ test) with a mean of 100 and a standard deviation of
15. The scores are reported as
percentile scores for each grade level.
The standard scores are scaled from the norm group baseline. The grade levels are arbitrarily assigned and
can be interpreted only as rough references to achievement level.
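The relation between a standard score and a percentile score can be illustrated with a short sketch. It assumes that standard scores are normally distributed with the mean of 100 and standard deviation of 15 stated above; the function name `percentile_rank` is hypothetical, and the published WRAT norm tables, not this formula, govern actual score conversion.

```python
from statistics import NormalDist

def percentile_rank(standard_score, mean=100.0, sd=15.0):
    """Percentage of the assumed normal norm group scoring at or below the score."""
    return 100.0 * NormalDist(mu=mean, sigma=sd).cdf(standard_score)

# A score at the mean, and scores one standard deviation above and below it.
for score in (85, 100, 115):
    print(score, round(percentile_rank(score), 1))
```

Under this assumption a standard score of 115 falls near the 84th percentile and a score of 85 near the 16th, which is why a difference of a few standard-score points near the mean corresponds to a larger percentile shift than the same difference in the tails.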
The test manual reports
split-half reliabilities of .98 for reading at both levels, .94
for arithmetic at both levels, .96 for spelling I, and .97 for spelling
II. During the norming study, both
levels of the WRAT4 were administered to children ages 9 through 14. Since
there is overlap in skills tested between the high end of level I and the low
end of level II, this provides another estimate of the reliability of both. The
split-half reliabilities for reading and spelling, for all age groups, had a
narrow range from .88 to .94. The split-half
reliability for arithmetic ranged from .79 to .89. These results indicate
that, overall, the reliability of the WRAT4 is excellent (Wilkinson & Robertson,
2006). For
comparative purposes, the validity can be assessed against the test most similar to
the WRAT4, the Peabody Individual Achievement Test (PIAT). The PIAT is another short, individually
administered achievement test which covers comparable material. In general the
WRAT4 correlates very highly with the PIAT. The WRAT4 correlates moderately
with various IQ tests, and with other achievement tests in the range of .40 to
.70 for most groups and most achievement and ability tests (Cordon &
Snyder, 1981).
The WRAT4 norms are based on 15,200
subjects from seven states. According to the manual, no attempt was made to
make the sample representative of national characteristics. The manual states
that minorities were represented, but gives no data on their representation.
The sample was stratified by age, sex, and approximate ability. Possible applications of the WRAT4 test
described in the manual include comparing achievement of one person to another,
determining learning ability or learning disability, comparing codes with
comprehension in order to prescribe remedial programs, and informally assessing
error patterns to plan instructional programs.
An added feature in the WRAT4 is a reading composite score (Wilkinson
& Robertson, 2006).
The Wide Range Achievement Test-Expanded
edition is an achievement test battery “designed to assess the core curricular
domains of reading, mathematics, and oral and written language.” Published by the same Psychological
Assessment Resources Incorporated that publishes the full line of WRAT
assessment instruments, the WRAT-Expanded was designed to complement the
aforementioned WRAT3 (WRAT3; T6:2713) and WRAT4 and was published in 2001-2002. The WRAT-Expanded individual assessment (Form
I) is used for ages 5-24 to score an individual’s performance in reading, mathematics,
oral expression, and written language (Bo Zhang, MMY16). The Form I assessment requires
30 minutes to administer and costs $195.00 per WRAT-Expanded Individual
assessment (Form I) reading/mathematics module package including reading and
math flipbook, reading and mathematics manual (2002, 185 pages), 25 response
forms, and technical supplements; $55.00 per manual; $120.00 per flip book;
$45.00 per 25 response forms; $45.00 per technical manual.
The WRAT-Expanded group assessment (Form
G) is used for grades 2-12 to score group areas of reading (basic reading &
reading comprehension), mathematics, and nonverbal reasoning. The Form G assessment requires
125-135 minutes to administer and costs $195.00 per WRAT-Expanded Group assessment (Form G)
comprehensive package including 10 booklets and scoring for each level (2001,
71-75 pages each), and technical supplement (2001, 80 pages); $95.00 per 25
booklets, 25 scoring forms, and manual (for the specified level); $45.00 per
technical supplement.
Both forms of the Wide
Range Achievement Test-Expanded edition are achievement test batteries designed
to assess the core curricular domains of reading, mathematics, oral and written
language, and nonverbal reasoning. The
WRAT-Expanded Form G, for group assessments, is designed to measure reading
(basic reading, reading comprehension), mathematics, and nonverbal reasoning
for students in Grades 2-12. Form I
is an individual assessment designed to measure reading, mathematics, listening
comprehension, oral expression, and written language for examinees from ages
5-24. The WRAT-Expanded is organized in
terms of five levels:
1. Level 1 (Grade 2, 120 Items)
2. Level 2 (Grades 3-4, 121 Items)
3. Level 3 (Grades 5-6, 120 Items)
4. Level 4 (Grades 7-9, 119 Items)
5. Level 5 (Grades 10-12, 120 Items)
The individualized component of the WRAT-Expanded
(Form I) consists of 70 items in reading and 75 items in mathematics. The WRAT-Expanded was designed to provide
users with a group administered achievement test closely linked to an
individually administered test with similar content and interpretative
systems. This integrated system is
intended to provide an initial screen based on a group assessment (Form G) with
an in-depth individual assessment (Form I) available for targeted
students. According to the author, the
WRAT-Expanded (Form G, group assessment) has six intended uses: preliminary
screening to identify students for individual assessments, placement of new or
transfer students, identification of special students, developing of individual
instructional goals and plans, periodic assessment of special groups of
students, and research uses. Intended
uses of Form I are to complement the WRAT3 assessment, follow-up assessments of
students across a wide range of achievement, placement of students, assessment
of ability-achievement discrepancies, and research use (Wilkinson & Robertson,
2006).
The WRAT-Expanded consists of reading,
mathematics, and nonverbal reasoning tests.
The content selected for the reading test is based on the assumption
that the ability to read and understand printed material is necessary for
success in school. Specifications for
the reading test were developed by the author of the Stanford Diagnostic
Reading Test (Dr. Bjorn Karlsen).
Synthetic reading passages are used rather than authentic text. The mathematics test is designed to measure
the ability to understand concepts and apply them to the solution of various
types of mathematical problems.
According to the author, the emphasis of the mathematics test is
consistent with current National Council of Teachers of Mathematics
standards. The nonverbal reasoning test
consists of nonverbal classification items that are used in other ability
tests, such as the Otis-Lennon School Ability Test and the Cognitive Abilities
Test (Wilkinson &
Robertson, 2006).
The author of the WRAT-Expanded provides a
variety of manuals describing the psychometric characteristics of the
instrument. General information about
the instrument is provided in several administration manuals organized by
levels (1 to 5), whereas more detailed psychometric information is provided in
a separate technical supplement (Wilkinson & Robertson, 2006).
The group and individual
forms of the WRAT-Expanded were normed concurrently in grades 2-12 with
common groups of students completing both Form G and Form I. The norms for Form G are based on 8,136
students from a variety of settings. It
should be noted that private schools were excluded from the norming
procedure, and that parochial schools were the only nonpublic
schools sampled. The authors have been very
careful to construct a normative sample that is demographically representative
of the school-age population in the United States based on census data. Out-of-level norms are also available. The normative data for the individually
administered test (Form I) is based on fewer students (N=635). Detailed information provided in the
technical supplement allows potential users to see the close match of the
normative sample to the school-age population.
Because the WRAT-Expanded covers a wide age range, the potential user
needs to be reminded and cautioned about how few examinees actually were
assessed within each grade or age group.
A detailed description of the norms is provided in the technical
supplement to the WRAT-Expanded and the manual for the individual assessment,
but not in the administrative manuals for the group assessments (Form G). Because users may not have access to the technical
supplement, and because the description of the norms is essential for proper
interpretation of the standardized scores, future versions of the group
administration manuals should include a description of the norms. The author chose to report the scores based
on the normative sample as follows:
1. Grade-based standard scores
2. Age-based standard scores
3. Percentile ranks
4. Stanine groups
5. Normal curve equivalents
6. Grade equivalents
7. Ability scale scores
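Several of the score types listed above are transformations of the same underlying position in the normative distribution. The following is a minimal sketch of those relationships, assuming a normal distribution and a standard-score metric with a mean of 100 and a standard deviation of 15. The function names are hypothetical, and the published norms are empirical lookup tables rather than formulas. Grade equivalents and ability scale scores require those tables and are not shown.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def derived_scores(z: float) -> dict:
    """Convert a z-score into several common norm-referenced scores.

    Assumes a normal distribution and a mean-100, SD-15 metric
    (an illustrative assumption, not the test's norm tables).
    """
    # Stanines are half-SD-wide bands centred on 5 and clamped to 1..9.
    stanine = min(9, max(1, math.floor(2.0 * z + 5.5)))
    return {
        "standard_score": 100.0 + 15.0 * z,
        "percentile_rank": 100.0 * normal_cdf(z),
        "stanine": stanine,
        # Normal curve equivalents: mean 50, SD 21.06.
        "nce": 50.0 + 21.06 * z,
    }


if __name__ == "__main__":
    for z in (-1.0, 0.0, 1.0):
        print(z, derived_scores(z))
```

Because every value here is derived from the same z-score, the score types carry the same information. They differ only in scale and granularity.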
In the technical supplement, evidence regarding
the precision of Form G is based on internal consistency reliability (KR-20) and
test-retest reliability. Across grades,
the KR-20 coefficients for the standardization sample (range across grades in
parentheses) are as follows:
1. Reading (.86-.90)
2. Mathematics (.80-.88)
3. Non-verbal reasoning (.80-.89)
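For readers unfamiliar with the KR-20 coefficient reported above, the sketch below shows how it is computed from right/wrong item scores. The response matrix is invented for illustration and is not WRAT-Expanded data. The function name is hypothetical, and the sketch uses the population variance of total scores.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomous (0/1) item scores.

    responses: list of examinee rows, each a list of 0/1 item scores.
    """
    n = len(responses)        # number of examinees
    k = len(responses[0])     # number of items
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    # Population variance of the total scores.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # Sum of item variances p * q, where p is the proportion correct.
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        pq_sum += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - pq_sum / var_total)


# Hypothetical data: five examinees, four items.
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    print(round(kr20(example), 3))
```

The coefficient rises as items covary more strongly with one another. It describes the consistency of the responses and says nothing about what the items measure.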
The test-retest reliability coefficients for the
standardization sample are as follows:
1. Reading (.75-.90)
2. Mathematics (.76-.90)
3. Non-verbal reasoning (.75-.79)
The WRAT-Expanded exhibits reliability
coefficients that are comparable to those obtained with similar instruments.
The author provides standard errors of
measurement (SEMs) with a clear description for potential users of how to
interpret them. Information regarding the
precision of content skill areas is not provided. No scorer reliability
information is provided
for the individually administered Form I test (Wilkinson & Robertson, 2006).
The statement, “these reliability
coefficients indicate that the WRAT-Expanded tests are measuring their
designated constructs with sufficient consistency, or homogeneity, to yield
dependable results” (Technical Supplement, p. 29), cannot be supported by those
relatively high reliability coefficients, because of semantic relativism.
Reliability coefficients show only that scores are consistent, not what they
measure. Whether any test is measuring those skills is
a construct validity issue within the realm of semantic relativism, as opposed
to a reliability concern.
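The standard error of measurement mentioned above follows directly from a reliability coefficient and the score standard deviation. The sketch below is a minimal illustration that applies the classical formula to the lowest and highest reliability values reported above, assuming a standard-score standard deviation of 15. That value and the function name are assumptions made for illustration and are not taken from the technical supplement.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


if __name__ == "__main__":
    # SD of 15 is an assumed standard-score metric.
    for r in (0.75, 0.90):
        sem = standard_error_of_measurement(15.0, r)
        print(f"reliability {r:.2f} -> SEM {sem:.2f}")
```

A reliability of .75 gives an SEM of 7.5 standard-score points on this metric, and a reliability of .90 gives roughly 4.7. This is why the lower end of the reported ranges matters for interpreting an individual score.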
One
shortcoming of the test-retest reliability study is the relatively small and
nonrepresentative sample used. All
students came from one parochial school.
Sample sizes at five grade levels ranged from 19 to 124. For example, at
level 5 only 19 students were
used to obtain the test-retest reliability estimate for the reading subtest of
Form G. Worse yet, no test-retest
coefficient was reported for the mathematics and nonverbal reasoning subtests at
that level. For the individual form, only
97 students aged 5-17 were included in the test-retest reliability
study. Confidence bands for test scores
are tabulated for convenient use. The
issue is that 85% and 90% confidence bands were provided for Form G, while 90%
and 95% confidence bands were provided for Form I, creating the potential for
unnecessary discrepancies in test score interpretation.
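To illustrate why mismatched confidence levels can lead to discrepant interpretations, the sketch below computes symmetric confidence bands around one observed score at the three levels mentioned above. The observed score of 100 and the SEM of 5 are hypothetical values chosen for illustration, and the function name is an assumption. The published bands may be constructed differently, for example around estimated true scores.

```python
from statistics import NormalDist


def confidence_band(score: float, sem: float, level: float):
    """Symmetric band: score +/- z * SEM, with z from the normal curve."""
    z = NormalDist().inv_cdf((1.0 + level) / 2.0)
    half_width = z * sem
    return score - half_width, score + half_width


if __name__ == "__main__":
    # Hypothetical observed score and SEM.
    for level in (0.85, 0.90, 0.95):
        low, high = confidence_band(100.0, 5.0, level)
        print(f"{level:.0%} band: {low:.1f} to {high:.1f}")
```

With the same score and the same SEM, the band widens as the level rises. A Form G result reported at 85% will therefore look more precise than a Form I result reported at 95%, even if the two forms were equally reliable.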
Validity information for Form G is
presented in terms of internal evidence (content, inter-correlations between
tests within the WRAT-Expanded, speed of responding, and differential item
functioning) and external evidence (correlations between Form G and a variety
of group and individually administered achievement and ability tests).
Evidence regarding clinical validity is
reported in terms of mean differences between two groups of students (learning
disabled and gifted), as identified by their teachers, and matched comparison
groups of students. The author is very
clear that the validity evidence represents “only a start of the validation
process for the WRAT-Expanded” (Wilkinson & Robertson, 2006).
The WRAT-Expanded test seems to show a
high level of content validity. An
informative description of the test development is provided for both the group
and individual forms in the technical manuals.
Content experts were hired to assist in creating specific domains in
order to ensure overall validity as well as construct validity. The authors
of the test controlled for
readability and the difficulty level of the vocabulary to improve the content
validity of the test. To address facial bias
(face validity), content experts screened the items for possible racial and
religious offense as well as gender stereotyping.
The WRAT-Expanded suffers from many of the
problems encountered with wide-range achievement tests designed to measure
educational performance over large age and grade ranges. There are very few items within each content
area, testing time is quite short, and relatively few students are included in
the norms for each grade and age group.
Of course, some of these weaknesses can also be viewed as strengths by
potential users who seek a quick and preliminary assessment of their students.
Given the intended purposes of the instrument,
the WRAT-Expanded has some nice features that potential users may find
useful. In particular, the simultaneous
norming of Forms G and I makes comparisons between achievement and ability
levels more meaningful (Wilkinson & Robertson, 1993).
One of the intended uses of the WRAT-Expanded
is the identification of relative strengths and weaknesses of students in
reading, mathematics, and non-verbal reasoning.
The most straightforward interpretation is to view the comparisons of
student performance in terms of normative data and the distribution of score
differences obtained within the normative sample. Rasch measurement theory
was used to
calibrate the WRAT-Expanded.
Unfortunately, the author does not provide the psychometric information
necessary to evaluate model-data fit. Traditional
psychometric criteria are provided but the additional information available
based on the use of Rasch measurement is not reported. The author should include information
regarding item fit, reliability of person separation and other Rasch-based psychometric
criteria. The WRAT-Expanded does not
take full advantage of Rasch measurement and its useful features, such as
variable maps and person fit displays, that can enhance the usefulness and
interpretability of the scores.
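The kind of Rasch-based item fit information the review finds missing can be sketched as follows. This is a minimal illustration of the dichotomous Rasch model and the two usual mean-square fit statistics (outfit and infit). It assumes person abilities and the item difficulty have already been estimated, and all names and values are hypothetical. It does not reproduce the publisher's calibration.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (outfit, infit) mean-square statistics for a single item.

    responses: 0/1 scores on the item, one per person.
    thetas: previously estimated person abilities (logits).
    b: previously estimated item difficulty (logits).
    """
    squared_std_residuals = []
    squared_residual_sum = 0.0
    variance_sum = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, b)
        variance = p * (1.0 - p)
        residual = x - p
        squared_std_residuals.append(residual ** 2 / variance)
        squared_residual_sum += residual ** 2
        variance_sum += variance
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(squared_std_residuals) / len(squared_std_residuals)
    # Infit: information-weighted mean square.
    infit = squared_residual_sum / variance_sum
    return outfit, infit


if __name__ == "__main__":
    # Hypothetical abilities and responses for one item of difficulty 0.
    thetas = [-1.5, -0.5, 0.0, 0.5, 1.5]
    responses = [0, 0, 1, 1, 1]
    print(item_fit(responses, thetas, 0.0))
```

Under the model, both statistics have an expected value of about 1.0. Values well above 1.0 indicate responses that are noisier than the model predicts, which is the diagnostic a reader would need to judge model-data fit.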
Potential users should carefully consider the alignment between the
content of the WRAT-Expanded and the school curriculum when using the test as
an achievement test. No validity
evidence is currently available to support the accuracy of decisions based on
the use of the WRAT-Expanded as a screen to identify students for individual
assessments or the identification of students with special needs (Englehard,
MMY16).
As with all other instruments that include
individual assessments, the utility of the WRAT-Expanded depends fundamentally
on the proper interpretation and use of the test by clinicians. Target users are ages 7-18 for the group form
and ages 5-24 for the individual form.
No explanation is provided about the discrepancy in content coverage and
age difference between these two forms.
Although the author did not specify situations in which one would choose
to administer the group form rather than the individual form, or vice versa, it
was recommended that the group form serve as a screening tool and the
individual form as a more in-depth assessment of learning difficulty; primarily
because the individual form classifies students into finer categories.
The modifications made to the WRAT3 in the
expanded version resulted in a tool that appears to be much better at
identifying possible learning deficiencies than its predecessors, if only for
elementary and secondary school children, primarily because the
WRAT-Expanded focuses on a narrower and younger age range than the WRAT3.
While the
tests were standardized concurrently, they are not classified into similar
tranches (a division or portion of a pool or whole, in this case ordinal
levels). Therefore, the normative scores
have the same meaning while the classifications do not. Another advantage of
the expanded version is
that the aptitude-achievement discrepancy can be investigated using the WRIT
(Wide Range Intelligence Test; 15:279). The
reading test may also lack validity due to a lack of authenticity: because the
reading passages were written specifically for this test, they do not
necessarily represent material that would be read in the “real” world.
References
Anastasi, A. (1982). Psychological testing (5th ed.). New York: Macmillan.
Cordon, B., & Snyder, M. (1981). A comparison of learning disabled college students' achievement from WRAT and PIAT grade, standard and subtest scores. Psychology in the Schools, 18, 28-34.
Hale, R. L. (1981). The utility of the Stanford-Binet in predicting WRAT performance. Psychology in the Schools, 16, 488-490.
Jastak, J. F., & Jastak, S. R. (1965). The wide range achievement test: Manual of instructions (Rev. ed.). Wilmington, DE: Guidance Associates.
Jastak, J. F., & Jastak, S. R. (1978). Manual of instructions: The wide range achievement test. Wilmington, DE: Jastak Associates.
Lubin, B., Wallis, R., & Paine, C. (1971). Patterns of psychological test usage in the United States 1935-1969. Professional Psychology.
Prasse, D. P., & Siewert, J. C. (1983). An analysis of performance on reading subtests from the 1978 wide range achievement test and Woodcock reading mastery test with the WISC-R for learning disabled and regular education students. Journal of Learning Disabilities.
Roberts, R., Santa-Barbara, J., & Woodward, C. A. (1975). Test-retest reliability of the wide range achievement test. Journal of Clinical Psychology, 70-74.
Schale, K. W., & Roberts, J. School achievement of children as measured by the reading and arithmetic subtests of the wide range achievement test. U.S. Department of Health, Education and Welfare, Public Health Service Publication No. 1000-Series 11. Rockville, MD.
Wide range achievement test-expanded edition. (2005). In R. Spies & B. Plake (Eds.), The sixteenth mental measurements yearbook (16th ed., pp. 1136-1141). Lincoln, NE: The University of Nebraska Press.
Wilkinson, G. S. (1993). Wide range achievement test (3rd ed.). Wilmington, DE: Wide Range, Inc.
Wilkinson, G. S., & Robertson, G. J. (2006). Wide range achievement test 4 professional manual. Lutz, FL: Psychological Assessment Resources.
Ortiz, S. O. (2004). Comprehensive assessment of culturally and linguistically diverse students: A systematic, practical approach for nondiscriminatory assessment.
Applied cross cultural psychology.
Mille, K. (2010). The achievement gap.
Koretz, D. (2004). Measuring up: What educational testing really tells us.
The language police: How pressure restricts what students learn. (2004).
Innis, H. A. (1951). The bias of communication.
Popham, W. J. (2003). Test better, teach better: The instructional role of assessment.
Braquehais, M. D. (2010). Complex system theories are necessary for a better understanding of our biopsychosociocultural constitution. Biologinė Psichiatrija ir Psichofarmakologija, T12, p. 102.
Sarno, J. E. (1998). The mindbody prescription: Healing the body, healing the pain.
Maass, A., Ceccarelli, R., & Rudin, S. (1996). Linguistic intergroup bias: Evidence for in-group-protective motivation. Journal of Personality and Social Psychology, 71, 512-526.
Maass, A., Milesi, A., Zabbini, S., & Stahlberg, D. (1995). Linguistic intergroup bias: Differential expectancies or in-group protection. Journal of Personality and Social Psychology, 68, 116-126.
Maass, A., Salvi, D., Arcuri, L., & Semin, G. (1989). Language use in intergroup contexts: The linguistic intergroup bias. Journal of Personality and Social Psychology, 57, 981-993.
Maass, A., & Arcuri, L. (1996). Language and stereotyping. In C. N. Macrae, C. Stangor, & M. Hewstone (Eds.), Stereotypes and stereotyping (pp. 193-226). New York: Guilford Press.
Franco, F. M., & Maass, A. (1999). Intentional control over prejudice: When the choice of the measure matters. European Journal of Social Psychology, 29, 469-477.
Maass, A., & Arcuri, L. (1992). The role of language in the persistence of stereotypes. In K. Fiedler & G. Semin (Eds.), Towards a social psychology of powerful linguistic devices (pp. 129-143). Newbury Park, CA: Sage.
Semin, G. R., & Marsman, J. G. (1994). Multiple inference-inviting properties of interpersonal verbs: Event instigation, dispositional inference, and implicit causality. Journal of Personality and Social Psychology, 67, 836-849.
Semin, G. R., & Fiedler, K. (1988). The cognitive functions of linguistic categories in describing persons: Social cognition and language. Journal of Personality and Social Psychology, 54, 558-568.
Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011). Asymmetries in predictive and diagnostic reasoning. Journal of Experimental Psychology, 140, 168-185. doi:10.1037/a0022100
Nell, V. (2000). Cross-cultural neuropsychological assessment: Theory and practice. Mahwah: Erlbaum.
Ripich, D. N., Carpenter, B., & Ziol, E. (1997). Comparison of African American and white persons with Alzheimer’s disease on language measures. Neurology, 48, 781–783.
Johnson-Selfridge, M. T., Zalewski, C., & Aboudarham, J. F. (1998). The relationship between ethnicity and word fluency. Archives of Clinical Neuropsychology, 13, 319–325.
Marcopulos, B., McLain, C. A., & Giuliano, A. J. (1997). Cognitive impairment or inadequate norms? A study of healthy, rural older adults with limited education. The Clinical Neuropsychologist, 11, 111–131.