Saturday, July 20, 2013

Analysis of (ABCDE) Potential Confounding Linguistic Variables in Achievement Testing Using a Venn diagram of the Bio-Psycho-Social-Cultural-Spiritual Model







05/24/2012
PSY 470
Scott Gerhardt
Abstract
This paper applies the biopsychosocioculturalspiritual model to an etiological investigation of the confounding linguistic bias that standardization introduces into achievement testing. It aims to argue that it is not nature or nurture that achievement tests fail to account for; rather, it is the interactions of the spiritual, cultural, sociological, and psychological that cement traits, and possibly states, through nurture from the biology, psychology, and society of each individual's biological endowment. The paper will introduce all concepts and their interactions prior to examining the WRAT series of achievement tests. The concepts and interactions will be discussed and related through the use of two 5D Venn diagrams representing possible perspectives on the same static biopsychosocioculturalspiritual model. The basic premises will include: Hebbian theory, the biopsychosocioculturalspiritual model, neuroscience, linguistics, achievement testing, sources of error, semantic relativism, and learning as sources of linguistic bias.






Introduction

The etiological model of definite, linear causes for learning and mental health diseases was inspired in the seventeenth century by Sydenham’s model of infectious diseases. These antiquated models can no longer inspire our conceptions about the nature of the brain-mind-context relationship. The ambition of psychology and psychiatry to become not only more credible but also as credible as a medical discipline, an ambition that began in the nineteenth century, should not lead us to believe that the brain-mind-context relationship can be studied with the conventional medical approach to diseases. Current trends in psychiatry and neuroscience consider broader, newer models such as Complex Systems Theory or the biopsychosocioculturalspiritual model. Friedrich von Hayek (1899-1992), the famous Austrian philosopher and economist, believed that the sciences of complex phenomena, in general, could not be modeled after the sciences that deal with essentially simple phenomena, like physics. Hayek held that complex phenomena (such as linguistic bias in achievement testing) allow only pattern predictions (through modeling), in contrast to the precise predictions that can be made about non-complex phenomena. From the ontological point of view, matter is organized in different levels (physical, biological, psychological, and socio-cultural). Those levels cannot be studied with the same methodological approach. However, taking into account findings from other fields would certainly enrich our comprehension of the complex brain-mind-context interaction. Let’s use a metaphor. If we want to understand why a book we like (e.g. a best-seller) is important for us, we cannot reduce our comprehensive model to a statistical analysis of the number of chapters, paragraphs, words, and letters the book contains.
These data will tell us something interesting about the book but little about why it is important not only for us but also for many other readers. Of course, if we eliminate the “quantitative” composition of the book, we won’t feel and think what we do; we need the physical support to read a book. However, we need to incorporate a “qualitative” (psychological and socio-cultural) explanation to achieve a wider comprehension of the “best-seller” phenomenon. We can clearly understand that those different levels cannot be studied with the same methodology, nor can one level be explained through another. Finally, models should be dynamic, as they change over time. For instance, the synthesis of evolutionary and developmental hypotheses for biological phenomena underlines that matter is always changing, though some basic structural features (or organization patterns) may keep some configurations with minor changes over time. It is time to incorporate a new multi-level, nonlinear, dynamic, open-minded model in the academic psychiatric field. Psychiatric journals would benefit from an “open-minded” perspective, including approaches that certainly tell us crucial things about the human being.

Testing

Achievement tests measure how much someone knows or has learned. Achievement tests tend to be standardized as a means of maintaining internal consistency among the teaching methods, construct meanings, and instruments of assessment. By this methodology, the measures and metrics of achievement tests rely on how much knowledge the person assessed has in a particular subject area, be it mathematics, reading, history, aircraft mechanics, microbiology, culinary science, art, or any other taxonomical system that can be taught and learned. Of all the tests administered, achievement tests are by far the most common.
For the most part, achievement tests are used by ontological endeavors (educational institutions, scientific research facilities, or anywhere training is implemented) not only to shape further developments in the structure of the language spoken, but also to ensure that communication is expressed with consistent form, structure, and meaning over time. The purposes of achievement tests overlap, but the overarching goal of topic assessment is present in each of the following sub-categories:
  • Achievement tests provide metrics that can be analyzed to determine whether an individual has accomplished (achieved) the (previously defined) level of proficiency and requisite knowledge to advance. Examples are finals in classes that are prerequisites for the next class in a series, e.g. Calculus 251, 252, 253, 254, 255, & 256. Testing out of a class is another example, a service that some universities provide themselves and that CLEP provides privately to students and universities.
  • Achievement tests are often used for the categorical grouping of individuals into certain skill areas to empower the instructor to better meet the needs of the group. If the tutor or professor has an accurate assessment of the students' skill, the teaching can be targeted more precisely at current levels of proficiency to facilitate further achievement and standardization. Overcoming linguistic bias at this level is necessary to reliably ensure that the instructor can address the primarily linguistic issue directly rather than have the students suffer the consequences of lower test scores later.
  • Overlapping with the previous categorical grouping, achievement tests can be employed as a diagnostic to determine strengths and weaknesses. Once an individual student's areas of weakness have been identified and the deficiencies catalogued, instruction can be focused on the areas requiring remediation. This and the previous purpose are the two that can be used to close the linguistic bias gap often measured as a variance correlated with culture, ethnicity, or socioeconomic status.
  • The most political issue, and the one that looms largest, is the use of achievement tests to assess the success of a program or a political initiative on a school, district, regional, or federal basis. This form of top-down testing has become the norm recently, as the achievement tests are applied not only to the students but to the institutions. This is done primarily to ensure internal validity among the teachers, trainers, and administrators by having them adhere to the originally legislated policies and stay true to their intent.
  • The metrics produced by achievement tests are used to inform everyone who is helping to address levels of progress and what should be dealt with for each individual involved. Ideally these tests should be diagnostic, because a diagnostic test lays out how these weaknesses should be addressed. Historically, variance has not been viewed from the perspective provided by the biopsychosocial model or as a purely linguistic function. Specifically, what fires together wires together, and what fires together is a function of the psycho and social levels of the biopsychosocial model, such that the physical biology of the brain is to some degree a function of environment. Variance in environment can be overcome by the use of different teaching methods.
  • Achievement tests help define the particular areas that teachers believe are important to assess. They help pinpoint those topics that are important and those that are not. This paper aims to illuminate the linguistic bias that is at the root of cultural variance across standardized tests. Achievement test tables of specifications rarely account for linguistic variance. Rather, they list categorical constructs and pinpoint those topics that are important.
Test specifications must be developed in order for any test to reach any level of standardization. This is a complex process that allows the test developer to understand the relationship between the level of items (along the dimension to be measured), the construct presence to be identified, and the content of the items. The tables containing the test specifications are generally created based on curriculum guides and other information that informs the test developers as to what content is covered. By any measure, the table of specifications is a fascinating idea and a terrific guideline when it comes to the creation of any kind of achievement test. Most achievement tests are standardized.
A standardized examination, test, or battery of tests is one that has undergone extensive test development including the writing and rewriting of the items to dial in the specific construct validity for the population that was used to standardize the test.  Variance is inevitable when a great deal of norming has been done with what is sometimes (and historically) a large group of test takers.  By these means the test must be narrower than the population it measures as defined by the development of consistent directions, administrative procedures, and very clear scoring instructions.  Standardization 

Sources of Error

Error is the difference between one’s observed score and one’s theoretical true score.  Measurement errors affecting the reliability of a test can be divided into components comprised of random error and systemic error.
Random sampling error is the difference between the population and the sample that is due only to chance factors (random errors), not to systemic sampling error. Random sampling error may or may not result in an unrepresentative sample. The magnitude of sampling error due to chance factors can be estimated statistically. In some contexts a parallel construct is referred to as chance sampling error. In the context of achievement tests, the neglect of the influence of cultural and linguistic factors is epic and historic. The very nature of standardization is to narrow the linguistic constraints so far that variation is rejected. Variance of language and culture has not been identified as a root cause of test score variations across populations; instead it has been written off as random sampling error, or worse, it has been accepted as systemic. The fault and the burden do not lie with the administrators, nor do they lie with the institutions. Rather, the burden falls on the individuals labeled as having not passed the achievement tests despite the presence of the constructs tested for. This lack of face validity, due to errors resulting from unidentified confounding linguistic variables, is a result of linguistic bias: the thought that language alone is a measure of cognition, or worse, that one language is correct and another is incorrect despite their both recapitulating the exact same construct via different grammar, word choice, and sentence structure.
Some specialists argue that standardized tests, particularly older standardized tests, are out of calibration with the construct investigated, and they explain systemic error as the result of an overinvestment bias in a reliable but invalid test. Systemic errors are the result of biases in measurement that lead to the situation where the average of many separate measurements differs significantly from the actual value of the attribute to be measured. All measurements are prone to systemic errors. The difficulty of elucidating systemic errors lies in the nature of their interrelated and cascading effects. Systemic errors can be functions of culture, whether organizational culture or the culture the organization is immersed in. The assumptions of a culture or of a test-administering organization's management can affect not only the way in which test questions are written, but also the way the answers are interpreted.
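The distinction between the two error components can be illustrated with a small simulation. This is only a sketch under stated assumptions: the additive form (observed score = true score + bias + random noise), the function name `observed_scores`, and every number in it (a true score of 100, a noise standard deviation of 5, a bias of -8 points) are hypothetical and are not taken from any achievement test discussed here.

```python
import random
import statistics


def observed_scores(true_score, n, random_sd, bias, seed=0):
    """Simulate n observed scores as true score + constant bias + random noise.

    Hypothetical model for illustration only: `bias` stands in for a
    systemic error and the Gaussian noise for a random error.
    """
    rng = random.Random(seed)
    return [true_score + bias + rng.gauss(0.0, random_sd) for _ in range(n)]


true_score = 100.0  # assumed theoretical true score

# Random error only: repeated measurements scatter around the true score.
unbiased = observed_scores(true_score, n=10_000, random_sd=5.0, bias=0.0)

# Random error plus an assumed systemic shift of -8 points.
biased = observed_scores(true_score, n=10_000, random_sd=5.0, bias=-8.0)

mean_error_unbiased = statistics.mean(unbiased) - true_score
mean_error_biased = statistics.mean(biased) - true_score

print(f"mean error, random only:     {mean_error_unbiased:.2f}")
print(f"mean error, random + biased: {mean_error_biased:.2f}")
```

Averaging many measurements shrinks the random component toward zero, while the constant shift survives the averaging. That is the sense in which a bias of this kind cannot be removed simply by testing more people.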
According to research using the linguistic category model, verb forms play a significant role in the transmission and maintenance of intergroup stereotypes, in the form of the linguistic intergroup bias (Maass et al., 1989). The linguistic intergroup bias reflects a tendency for speakers to describe the negative behavior of out-group members and positive behavior of in-group members using abstract verbs (suggesting expected stability), whereas they characterize the positive behavior of out-group members and negative behavior of in-group members using concrete verbs (implying little expected stability). Two distinctive mechanisms are assumed to underlie these effects. Research provides evidence of an in-group protection motive (Maass, Ceccarelli, & Rudin, 1996). Consistent with social identity theory, research shows that under conditions of threat to an in-group’s identity or perceived competition, linguistic intergroup bias serves to maintain a positive group image even in the presence of disconfirming evidence (Maass et al., 1989; Maass et al., 1996). However, research has also highlighted an expectancy motive (Maass et al., 1995). Specifically, the findings of such research suggest that expectancy-consistent behavior is described using abstract verbs, given that such behavior is considered to be lasting and typical, while expectancy-inconsistent behavior is described in concrete terms, given that such behavior is considered to be transitory and atypical (Maass et al., 1996). There are no criterion-adjusted tests in which the criterion adjustment is based on the potential for linguistic bias.

Bio-Psycho-Social-Cultural-Spiritual Model




The biopsychosocial model is a general model and approach that posits that biological, psychological, and social factors all play significant roles in human functioning. The biopsychosocial paradigm is also a technical term for the popular concept of the mind-body connection, which addresses more philosophical arguments rather than purely empirical exploration and clinical application (Sarno, 1981).
The biopsychosocial (BPS) model has been used in fields such as medicine, nursing, health psychology, sociology, psychiatry, family therapy, and clinical psychology, but not in psychometrics. Nor has the BPS model been used to define a metric of linguistic bias measured as use of grammar, word choice, and sentence structure. Neurons that fire together wire together. This occurs within a psychological, social, and physical environment that can engender physical constructs represented as simply as regional accents or euphemisms. Such linguistic artifacts are hurdles for individuals to overcome in scoring well on achievement tests used to measure standardization of language that has been deemed necessary.

Linguistic Bias

The purported source of linguistic bias is the forward-thinking manner in which achievement tests are designed. Working from cause to effect by means of predictive reasoning, creators of achievement tests have fallen victim to their own means, the language they aim to standardize. The alternative is to evaluate the strength, ability, and persistence of constructs within the individual to be tested by working backward from effect to cause, by means of diagnostic reasoning. The discrepancy between the two cognitive theories implemented results in the belief that variance is not the product of transient minority effects of multiple confounding variables, one category of which is linguistic. Linguistic variation is a function of culture, experience, and learning. The range of differences could be quantified by using causal Bayes nets. Fernbach, Darlow, and Sloman (2011) developed a normative formulation of how predictive and diagnostic probability judgments should vary with the strength of alternative causes, causal power, and prior probability. This model was tested through two experiments that elicited predictive and diagnostic judgments as well as judgments of the causal parameters for a variety of scenarios that were designed to differ in strength of alternatives. Model predictions fit the diagnostic judgments closely, but predictive judgments displayed systematic neglect of alternative causes, yielding a relatively poor fit. Three additional experiments provided more evidence of the neglect of alternative causes in predictive reasoning and ruled out pragmatic explanations. The authors conclude that people use causal structure to generate probability judgments in a sophisticated but not entirely veridical way. This methodology of exploratory data analysis, used to elucidate the potential causal parameters and identify evidence for neglected alternatives, is consistent with the uncomfortable science proposed by the statistician John Tukey in response to an overreliance on confirmatory data analysis.
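The contrast between predictive and diagnostic judgments can be made concrete with a minimal sketch. It assumes a standard noisy-OR parameterization of a two-cause Bayes net (a focal cause with prior `p_c` and causal power `w_c`, plus an always-present background of alternative causes with combined strength `w_a`). The function names and the numeric values are hypothetical illustrations, not data or code from the cited study.

```python
def predictive(w_c, w_a):
    """P(effect | cause present) under a noisy-OR of the focal cause
    (power w_c) and the alternative causes (combined strength w_a)."""
    return w_c + w_a - w_c * w_a


def diagnostic(p_c, w_c, w_a):
    """P(cause present | effect observed), obtained with Bayes' rule.

    Assumes the effect has nonzero probability, i.e. not both
    p_c * w_c == 0 and w_a == 0.
    """
    p_effect_and_cause = p_c * predictive(w_c, w_a)
    p_effect_no_cause = (1.0 - p_c) * w_a
    return p_effect_and_cause / (p_effect_and_cause + p_effect_no_cause)


# Hypothetical parameters: the focal cause is present 30% of the time
# and produces the effect 60% of the time when acting alone.
p_c, w_c = 0.3, 0.6

for w_a in (0.0, 0.2, 0.5):
    print(
        f"w_a={w_a:.1f}  "
        f"predictive={predictive(w_c, w_a):.3f}  "
        f"diagnostic={diagnostic(p_c, w_c, w_a):.3f}"
    )
```

As the strength of alternatives grows, the normative predictive probability rises and the diagnostic probability falls. A reasoner who neglects alternatives in the predictive direction in effect treats `w_a` as zero and reports `w_c` alone.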
Non-linguistic methods of investigation are necessary to avoid the nature of linguistic thought that is also the source of the bias in this instance. Even Bloom’s Taxonomy cannot account for linguistic bias. In fact, contamination by linguistic bias should be measurable across all six levels of Bloom’s taxonomy, as the error is unavoidable and is a function of any standardized test. The degree to which linguistic bias is present at each level of Bloom’s taxonomy can vary by culture, language, and even the development of the individual being assessed.

The issue is the cost of investing to change the ever-plastic brain and the amount of time spent on various activities that all re-wire the brain versus the time spent learning the standardized language necessary for success on most achievement tests. The investment necessary to overcome the naturally acquired language is proportional to the dissonance between the naturally acquired language and the standardized language necessary to score well on an achievement test. Tomasello (2003) points out that another crucial prerequisite for language acquisition is the ability of children to detect patterns in their environment. He reports research that has demonstrated that infants can detect artificial nonsense words made up of three-syllable sequences. Later the infants respond to those words, but they do not respond to the syllables presented in a different order (Saffran, Aslin, & Newport, 1996). Marcus, Vijayan, Bandi Rao, & Vishton (1999) briefly trained seven-month-olds on three-syllable sequences of the form ABB. Later the infants responded to this pattern even when the syllables were different (e.g. XYY). Tomasello notes that this ability to detect abstract patterns in auditory and visual input is not unique to humans. Other primates, such as tamarin monkeys, also have this skill. Therefore, pattern finding is a cognitive capacity that has a deep evolutionary history and certainly cannot be seen as a specific adaptation for language. Using only a canonical taxonomy of language to measure cognitive ability is like trying to measure the circumference of the earth in inches, one intermediate phalanx at a time.
Language can be a powerful conveyor of bias, in both blatant and subtle forms.  It is not the language itself, rather it is the implied frame resultant from the associated meaning of the words chosen to be used by the speaker, and those heard by you, the listener.  Linguistic bias can manipulate perception and perspective, impacting race/ethnicity, gender, accents, age, (dis)ability and sexual orientation.
  • Native Americans described as "roaming," "wandering," or "roving" across the land. Such language implicitly justifies the seizure of Native lands by "more goal-directed" white Americans who "traveled" or "settled" their way westward.
  • The implied frame of such words as forefathers, mankind, and businessman serves to deny the contributions (even the existence) of females.
  • The bias against non-English speakers.
In the psychology and communications literatures, research has shown that language is a subtle but powerful way to examine cognition in intergroup contexts (Maass & Arcuri, 1992; Maass, Salvi, Arcuri, & Semin, 1989; Semin & Fiedler, 1988; Semin & Smith, 1999). Because people have wide latitude in their use of interpersonal verbs that can be used to describe self and others, and because this word choice is related to their causal attributions and expectations (Semin & Marsman, 1994), linguistic categories can be used to maintain and transmit perceptions. According to linguistic category models, interpersonal verbs can be arrayed on a continuum of concrete to abstract forms (Semin & Fiedler, 1988), the evocation of which is based on differences in causal attributions of, and expectancies for, behavior (Semin & Marsman, 1994). Given the findings of research that demonstrates a link between language and intergroup bias (Franco & Maass, 1999; von Hippel, Sekaquaptewa, & Vargas, 1997), a linguistic category model has implications for understanding diversity within and between groups.
Researchers have shown that people have wide latitude in their use of verbs that can be used to describe the self and others; the word choice, grammar, and language of others are related to their attributions and expectations (Semin & Marsman, 1994). Language is a cultural artifact that emerges as a complex adaptive system from the verbal interaction among humans. We see the ubiquity of language acquisition among children generation after generation as the product of an interactional instinct that, as Tomasello indicates, is based on an innate drive to communicate with and become like conspecifics (Lee & Schumann, 2005), that is, to become linguistically standardized. Semin and Fiedler (1988) introduce a linguistic category model useful for interpreting people’s attributions of causality and responsibility within a given context. Specifically, they put forth a general taxonomy that distinguishes between verb forms at different levels of abstraction. They use the following examples to articulate their taxonomy: “(a) A is talking to B; (b) A is helping B; (c) A likes B; and (d) B is an extraverted person” (p. 558). The first example (i.e., talking) demonstrates the most concrete level of language. At this level are descriptive action verbs, which provide an objective description of action, offer no interpretation, and typically have one or more physically invariant features. Interpretive action verbs, in contrast, include greater depth of interpretation and pronounced evaluative components, and do not have physically invariant features; that is, many different actions could lead to use of the same interpretive action verb. As illustrated in Semin and Fiedler’s (1988) second example, “helping” could be physical or psychological and is therefore open to interpretation. Relative to ascriptions of causality and responsibility, accounts using descriptive action verbs are typically attributed to situational factors, while accounts using interpretive action verbs are often attributed to the actor. Consequently, the use of interpretive (rather than descriptive) action verbs suggests a greater expectation of similar future behavior. State verbs, which presume knowledge of the actor’s state of mind, do not maintain reference to any specific behavior or incidents (e.g., “likes” as used in the third example). Although state verbs typically evoke causal attributions to the sentence object rather than the actor (Semin & Marsman, 1994), they convey expected similar future states of mind from actors. Finally, adjectives, such as “extraverted person” in Semin and Fiedler’s (1988) example, refer to enduring qualities of persons and represent the highest level of abstraction. Because adjectives are typically associated with dispositional attributions, they are indicative of a continued expectation of similar future behavior. Variations in taxonomy or across taxonomies can explain the variance in scores due to linguistic bias. The taxonomy exists to standardize the language and facilitate communication. The achievement test is designed to measure conformity and to provide a metric for its level. The construct required is present in the individual to be assessed although the grammar and language used to recapitulate the

Hebbian Theory


Hebbian theory concerns how neurons might connect themselves to become engrams. Hebb's theories on the form and function of cell assemblies can be understood from the following:
  • "The general idea is an old one, that any two cells or systems of cells that are repeatedly active at the same time will tend to become 'associated', so that activity in one facilitates activity in the other." (Hebb 1949)
  • "When one cell repeatedly assists in firing another, the axon of the first cell develops synaptic knobs (or enlarges them if they already exist) in contact with the soma of the second cell." (Hebb 1949)
Allport (1985) has posited additional ideas regarding cell assembly theory and its role in forming engrams, along the lines of the concept of auto-association, described as follows: "If the inputs to a system cause the same pattern of activity to occur repeatedly, the set of active elements constituting that pattern will become increasingly strongly interassociated. That is, each element will tend to turn on every other element and (with negative weights) to turn off the elements that do not form part of the pattern. To put it another way, the pattern as a whole will become 'auto-associated'. We may call a learned (auto-associated) pattern an engram." (Allport, 1985)
Hebbian theory has been the primary basis for the conventional view that, when analyzed from a holistic level, engrams are neuronal nets or neural networks like those associated with Wernicke’s area or Broca’s area. Work in the laboratory of Eric Kandel has provided evidence for the involvement of Hebbian learning mechanisms at synapses. Much of the work on long-lasting synaptic changes between vertebrate neurons (such as the long-term potentiation involved in language acquisition) involves the use of non-physiological experimental stimulation of brain cells. However, some of the physiologically relevant synapse modification mechanisms that have been studied in vertebrate brains do seem to be examples of Hebbian processes. One such study reviews results from experiments that indicate that long-lasting changes in synaptic strengths can be induced by physiologically relevant synaptic activity working through both Hebbian and non-Hebbian mechanisms.
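The auto-association idea quoted above can be sketched as a small Hopfield-style network. This is an illustrative simplification under stated assumptions, not Hebb's or Allport's own formulation: units take the values +1 (active) or -1 (inactive), weights are built with the outer-product rule, and the function names and the eight-unit pattern are hypothetical.

```python
def train_hebbian(patterns):
    """Build a weight matrix with a Hebbian outer-product rule.

    Units that take the same value in a pattern get a positive
    connection; units that take opposite values get a negative one.
    """
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += pattern[i] * pattern[j] / n
    return weights


def recall(weights, cue, steps=5):
    """Repeatedly update every unit from the weighted input it receives."""
    n = len(cue)
    state = list(cue)
    for _ in range(steps):
        state = [
            1 if sum(weights[i][j] * state[j] for j in range(n)) >= 0 else -1
            for i in range(n)
        ]
    return state


# A hypothetical "engram": one stored activity pattern over eight units.
engram = [1, 1, 1, -1, -1, -1, 1, -1]
weights = train_hebbian([engram])

# Present a degraded cue in which the first unit has the wrong value.
cue = list(engram)
cue[0] = -cue[0]

restored = recall(weights, cue)
print("cue:     ", cue)
print("restored:", restored)
print("matches stored pattern:", restored == engram)
```

Because every unit in the stored pattern supports the others, a partial or degraded cue is enough to reinstate the whole pattern, which is the sense in which the pattern has become auto-associated.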

Semantic Relativism

Perception is the interface between cognition and reality. Descriptive perceptual relativism is the empirical claim that certain groups (e.g., those with different cultures, languages, biological makeup) perceive the world differently. Various philosophers of science have argued that theory influences perception to such an extent that partisans of substantially different theories might literally see the world differently.
Descriptive claims about perception are sometimes thought to bear on various versions of normative relativism. For example, some writers have argued that people with different concepts and beliefs will nevertheless perceive the same things in much the same way and that these common perceptions can be used as a fixed point from which to adjudicate the claims of rival frameworks. Most philosophers and vision scientists today agree, though, that perception is theory-laden; our perceptual experiences in a given situation are influenced by the concepts, beliefs, expectations and, perhaps, even the hopes and desires that we bring to the situation.
Normative perceptual relativism is the claim that there is not just one correct, framework-independent way to perceive things; rather, different ways are correct relative to different constellations of concepts and beliefs. Given modern medical training and practice, a competent radiologist should see that this spot on the X-ray is a stomach tumor, and anyone with any sensitivity should see that Sam felt humiliated.
As with most versions of normative relativism, the strongest versions of normative perceptual relativism, ones on which “anything goes,” are implausible; there clearly are constraints on perception. But weaker versions of the thesis may be defensible. Perception is theory-laden to some degree; it involves what current vision scientists call top-down processing.
In Quine's philosophy the idea of stimulus meaning is not a special semantics, but rather is an attempt to isolate the net empirical content of each of various single observation sentences without regard to the theory that contains them yet without loss of what the sentence owes to that containing theory. This attempt to isolate the semantics of observation language is a move away from his earlier critique of reductionism, where reductionism is understood as statements having a unique range of possible sensory events, such that the statements can be criticized in isolation. But at this stage Quine still retains his original thesis of empirical underdetermination, in which empirical underdetermination is integral to his holistic thesis of semantical indeterminacy or vagueness. The underdetermination thesis, admitting multiple and alternative observation sentences for the same stimulus situation, presents a question: how can the same stimuli yield alternative stimulus meanings? One of Quine's answers is that the alternative theories or belief systems in which the stimulus situation is understood supply different significant approximations. But there still remains the question of how stimulus meanings are to be construed as approximations. Quine has a theory of vagueness that he sets forth in the third and fourth chapters of Word and Object, which resembles the later Wittgenstein's thesis of paradigms, except that Quine explicitly invokes the behavioristic stimulus-response analysis of learning. On this analysis Quine rejects the view that stimulations eliciting a verbal response "red" are a well-defined or neatly bounded class. He maintains that the stimulations are distributed about a central norm, which, when a language is initially being learned, may be a very wide distribution. The penumbral objects of a vague term are the objects whose similarity to those for which verbal response has been socially rewarded in the learning process is relatively slight.
The learning process is an implicit induction on the part of the subject regarding society's usage, and the penumbral cases are those words for which that induction is most inconclusive for want of evidence, because the evidence is not there to be gathered. And society's members have had to accept similarly fuzzy edges when they were learning. There is an inevitability of vagueness on the part of terms learned by ostension, and it carries over to other terms defined by context on the basis of these ostensively learned terms.
Quine notes that Hanson maintains that observations vary from observer to observer according to the amount of knowledge that the observers bring with them. Thus one man's observation is another man's closed book or flight of fancy, with the result that observation as the impartial and objective source of evidence for science is bankrupt. At this stage of Quine's thinking the semantical contribution of theory to observation is still problematic for him, but he continued to characterize observation language in terms of a behavioristic theory of learning. In the chapter titled "Observation" in The Web of Belief (1970) Quine says that an observation sentence is a sentence that can be learned ostensively by the association of heard words with things simultaneously observed, an association which is conditioned and reinforced by social approval or successful communication, and which becomes habitual. And due to the social character of its learning, the observation sentence must be understandable by all competent speakers of the language who might be asked to assent to it. Thus according to Quine the sentence "That is a condenser" is not an observation sentence, even if experts agree to it. Quine maintains, contrary to the Positivists, that what qualifies a sentence as observational is not the lack of theoretical terms that may occur in theory formulations, but just that the sentence taken as an individual whole commands assent consistently or dissent consistently when the same global sensory stimulation is repeated. This behavioristic characterization initially enabled Quine to evade reference to semantics in his identification of observation language, and thereby to separate his view from that of the Positivists, who defined observation language in semantical terms.

Quine revised this position in "On Empirically Equivalent Systems of the World" in Erkenntnis (1975), which as it happens had in 1930 been made the official journal of the Vienna Circle. This development of the Duhem-Quine thesis represents a further restriction on Quine's earlier version of his holistic semantical thesis of observation. Previously he had viewed empirical underdetermination as integral to semantical indeterminacy or vagueness in his semantical holism. But in this paper he revises the concept of empirical underdetermination of language, and separates it from the holistic view of the Duhem-Quine thesis. The scientific hypotheses that purport to describe things beyond the reach of observation are related to observation sentences by a kind of one-way implication, such that many alternative hypotheses may imply the same set of observation sentences, but not vice versa. Observation sentences do not uniquely imply just one theory purporting to explain the observable events. It now is in this sense that natural science is "empirically underdetermined" by all possible events.  Quine says that underdetermination lurks where there are two irreconcilable theory formulations each of which implies exactly the desired set of observation conditionals plus extraneous theoretical matter, and where no formulation affords a tighter fit. In Quine's vocabulary the phrase "observation conditional" is an empirical generalization expressed in conditional form and implying an observation sentence describing an individual event. And his phrase "theory formulation" is a conjunction of the axioms of a deductive theory, which implies observation conditionals.  This is a different sense of "empirical underdetermination" than what Quine meant in "Two Dogmas", because it resurrects the idea of a semantically neutral observation language, which philosophers such as Hanson, Kuhn and Feyerabend reject. 
These philosophers find a phrase such as "same observation sentences", when speaking of sentences implied by alternative theories, to be very problematic; they deny that different theories can have the same set of observations, due to the contribution of the semantics of theory to the semantics of observation language.  Having revised "empirical underdetermination", Quine then distinguishes his revised concept from the holistic doctrine of the Duhem-Quine thesis. He reiterates that the holistic doctrine says that scientific statements are not separately vulnerable to adverse observations, since it is only jointly as a theory that they imply their observable consequences, with the result that any one of the statements can be adhered to in the face of adverse observations by revising others. Then he states that holism lends credence to the underdetermination thesis, because in the face of adverse observations we are free always to choose among various adequate modifications of our theory, and all possible observations are insufficient to determine theory uniquely.  A second reservation is the breadth of the theory.  A third reservation is that the semantical and ontological holism may imply a cultural relativistic view of truth.  Quine finds a paradox in the thesis of cultural relativism: if truth were culture bound, then the advocate of cultural relativism ought to see his own culture bound truth as absolute, which is exactly what is represented by the idea of construct validity.  The cultural relativist cannot proclaim cultural relativism without rising above it, and he cannot rise above it without giving it up.  Quine then turns to the issue of irrationality of theory choice, the argument for cultural relativism that is internal.  Quine argues that the choice between empirically equivalent alternative systems need not be irrational when one could settle for a frank dualism, even if it is a dualism merely due to empirical equivalence.  

In his "Empirical Content" (1981) in Theories and Things, which he notes contains "echoes" from "Empirically Equivalent Systems of the World", Quine explicitly uses Hanson's terminology, saying that observation sentences are "theory-laden." But Quine reconstrues the intended meaning of Hanson's phrase to mean that the terms embedded in observation sentences may recur in theory formulations. Thus while Quine here says that observation sentences are theory-laden, he denies to the semantics of theory any participating role in the semantics of observation. In fact, in Quine's construing of "theory-laden" it is not observation language that is theory-laden, but rather theory that is observation-laden.

Still later, in "Truth" in his Quiddities (1988), he explicitly refuses to admit theory any resolving function in the semantics of observation. There he says that we work out the neatest world system, and we tighten the squeeze by multiplying the observations. Tightening the squeeze in observation sentences is the progressive reduction of vagueness, but only by the addition of information in additional observation sentences.

More recently a member of Quine’s intellectual entourage, Donald Davidson, has attempted to evade semantical relativism with a turn to instrumentalism. Davidson’s principal statement of his thesis is set forth in his “The Very Idea of a Conceptual Scheme” (1974) and “Belief and the Basis of Meaning” (1974), reprinted in his Inquiries into Truth and Interpretation (1984), a book he dedicates to Quine with the inscription “without whom not.” He rejects the representationalist view of the semantics of language, which he considers a third dogma of empiricism after the first two referenced by Quine in the latter’s 1951 “Two Dogmas” article.  Like Dewey’s rejection of the dualism of “experience” and “nature”, Davidson rejects the dualism of “scheme” and “world”, of “conceptual scheme” associated with language and “empirical content”, of “organizing system and something waiting to be organized”, that he finds in the views of Whorf, Kuhn, and Feyerabend. In this manner he remains more faithful to Quine’s original behaviorism than Quine did. Given the mutual and reciprocal determination between belief and semantics, the decision necessary for interpreting another’s discourse is to maximize our shared beliefs, such that there can be no basis for concluding that others have concepts or beliefs radically different from one’s own. Davidson concludes that in giving up the dualism of scheme and world, we do not give up the world, but rather re-establish “unmediated touch” with the familiar objects that make our sentences and opinions true or false. Thus Davidson argues that there is no conceptual relativism, because there are no conceptual schemes to be relativistic.  But Davidson’s conclusion is a non sequitur. Firstly, he confuses two distinct questions: one is the question of what meaning is, and the other is the question of what the meaning of a particular term, sentence, or theory is and how that determination is made. 
The existence of conceptual schemes is an answer to the former question, and his behavioristic procedure is his answer to the latter one. The answers are made interdependent only because Davidson is a behaviorist, which is to accuse him of being a Positivist. And his Positivism makes him inconsistent with Quine’s and his own acceptance of ontological relativity, because Positivism requires a prior ontological commitment.  Davidson does not practice ontological relativity in his own philosophical discourse. Secondly, the word “unmediated” in his phrase “unmediated touch”, which purportedly justifies his denying language its representational semantics, is a weasel word. In fact the interpreter’s charitable decision required for interpretation does not imply any rejection of the representational nature of the semantics of language. This interpretative decision is operative when someone uses a dictionary with the charitable assumption that its lexical entries are true, so that he can assimilate the meanings of the terms he is researching. It is also operative when a community of scientists in a profession considers an experiment and agrees on the validity of the test design statements, so that the scientists can describe the phenomenon under examination and the experiment’s outcome. Neither the thesis of the charitable decision required for communication nor the thesis of the interdependence between truth and meaning implies any rejection of the representational nature of the semantics of language; representationalism is perfectly consistent with both theses. “Representation” may be a weasel word, because there survives an atavistic belief residual from modern philosophy including Positivism, that the knower is a spectator to his ideas.  Of course the knower can be a spectator of his ideas, but this inspection is a reflection ex post facto on knowledge of the real world that he already has. 
Apart from this secondary reflective knowledge, the spectator thesis about knowledge of the real world is readily rejected when we realize that what we know firstly is not our ideas but the real world, and most notably that our knowledge is constituted by our ideas rather than the ideas being an object of knowledge. Contrary to Davidson, therefore, ideas and their schemes are quite admissible, and they very much involve semantical relativism.
Both Quine and Davidson are motivated to evade semantical relativism, because both mistakenly believe that a relativistic, context-determined semantics implies a relativistic thesis of truth. Regardless of how culture-bound and context-determined the semantics of a language may be, it is not possible capriciously either to affirm or to deny truthfully just anything expressed by sentences made with those concepts. The empirical underdetermination of language implies that many alternative sentences can be said which are consistent with the same observations. Still, the empirical constraint imposed exogenously on sentences by the recalcitrant real world - even when not yet interpreted - forbids just any arbitrary distribution of truth-values over a set of logically related, semantically interpreted grammatical sentences. When any subset of these sentences is given definitional force to specify its semantics, then only some of the remaining sentences containing the same descriptive terms can also be true.  Truth is always relative to what is said, but the real world in which all language users live forbids ingenuously asserting just any old thing in the semantically interpreted language. Therefore, semantical relativity does not imply relativism of truth, but just the opposite: with a meta-theory of semantical description exhibiting the composite nature of meanings, semantical relativity explains the partial equivocation that makes it impossible for the same sentences occurring in two different belief systems to be completely true in one belief system and completely false in another.  It explains how the same sentence is not simply and completely the same statement in each system, but is partially the same in each, and to that extent true in both systems. And for the same reason it also explains why the semantics of observation language need not be quarantined from the semantics of theory in order to assert the objectivity of truth. 
Observation statements, which pragmatically defined are merely singular test design statements, may be common to pragmatically defined contrary theories, such that belief in the test design statements makes the test outcome contingent and not willfully or necessarily verifying, and makes a falsifying test outcome of one of the theories an objective truth.  Each person acquires the semantics of what Quine calls observation sentences from his own personal experiences, and he acquires it publicly and ostensively in the circumstances of his learning situation in his personal history. There is a wide variation among people between what is learned ostensively and contextually, but even for those simple statements learned ostensively by most people, intersubjectivity is increased with successive approximation, as the web of belief grows and imposes increasingly more shared truth conditions on the ostensively acquired semantics. The entire web of beliefs may be viewed on analogy with an underdetermined system of conditional equations, in which the addition of a new equation further restricts the range of numeric values that the set of variables may accept as solution sets. One difference between the mathematical system and the language system is that with just a sufficient number of restrictions the equation system may admit only one solution set, whereas language is never restricted to a unique interpretation. Another noteworthy departure from the mathematical analogy is that the mathematical variables can take only one numeric value at a time without becoming ambiguous, while each of the descriptive terms, including those used as mathematical variables in applied mathematics in empirical science, simultaneously takes on the semantic values distinguishable in the explicitly related universal statements in the system of beliefs, subject only to the preservation of univocity. 
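The equation analogy can be made concrete with a small sketch. The three equations and the value range 0 to 5 below are hypothetical, chosen only for illustration; the function name `solutions` is likewise an assumption of this sketch. It counts how many assignments of three variables survive as each further equation is added to the system.

```python
from itertools import product

def solutions(equations, values=range(0, 6)):
    """Return every (x, y, z) drawn from `values` that satisfies all equations.

    Each equation is a tuple (a, b, c, d) standing for a*x + b*y + c*z == d.
    The equations and the value range are illustrative assumptions.
    """
    return [
        p for p in product(values, repeat=3)
        if all(a * p[0] + b * p[1] + c * p[2] == d for a, b, c, d in equations)
    ]

# Hypothetical system: x + y + z = 6, then x = y, then z = 2.
equations = [(1, 1, 1, 6), (1, -1, 0, 0), (0, 0, 1, 2)]

for n in range(1, len(equations) + 1):
    # Each added equation narrows the set of admissible assignments.
    print(n, "equation(s):", len(solutions(equations[:n])), "solution(s)")
# 1 equation(s): 25 solution(s)
# 2 equation(s): 3 solution(s)
# 3 equation(s): 1 solution(s)
```

With three equations the toy system is restricted to a single solution, which is the point at which the analogy to language breaks down: on the account given here, a language is never narrowed to one interpretation.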
Thus all the terms explicitly related by the sentences in the web of beliefs may participate in one another's univocal semantics, and thereby resolve one another's vagueness in relation to each other. Furthermore as implicit statements are made explicit by deduction, the vagueness in the meanings of the terms of the system is even further resolved.  But Quine viewed meanings as abstract or mental "entities", and then developed his behavioristic theory of stimulus meanings, which he called "behavioral dispositions" to evade the representative function of language.  He could not be expected to have developed a meta-theory of semantical description enabling him to describe how meanings participate in one another. The closest Quine came to the idea of semantical participation was the idea of the resolution of vagueness. His rejection of the dichotomous analytic-synthetic distinction is a worthy start toward such a meta-theory, but his rejection of the distinction was actually a rejection of analyticity as such, except in the cases that he called "analytical hypotheses" used for translations. As it happens, rejection of the analytic-synthetic dichotomy does not imply the rejection of analyticity as such. Universally quantified statements believed to be true for empirical reasons may also be used analytically to exhibit the complexity in the meanings of their constituent terms by displaying their component semantic values that constitute the discriminating capability in the descriptive function of the language. In other words all universal empirical statements in the web of beliefs are analytical hypotheses. And theories are those that are viewed as relatively more hypothetical than other empirical statements.
There are many means by which to interpret a semantical system.  The semantical rules for interpreting a mechanically generated semantical system might be viewed as analogous to Carnap's meaning postulates, in that all of them are stated in the object language instead of the meta-language, and are not like Carnap's rules of designation, which occur in the meta-language.  Two relevant types of semantical rules may be distinguished.  One type consists of those semantical rules that are the mechanically generated statements and equations.  These consist only of the statements constituting a mechanically generated and empirically acceptable theory, the outputted theory statements that are believed to be true.  But not all the semantical rules occurring in the object language are mechanically generated.  A second type consists of test design statements, which are accepted independently of any statements of theory generated by the system, so that the generated theory is not tautological and can be tested independently.  But the semantical rules for mechanically generated semantical systems are unlike Carnap's meaning postulates, because they are not just analytical sentences.  With Quine's rejection of any distinctively analytic truth it is possible to view sentences as both analytic and synthetic, and the semantical rules that describe the semantical interpretation of the object-language statements must be viewed as both analytic and synthetic sentences.  They are more like Quine's analytical hypotheses or discursive postulates.  These semantical rules might also be viewed as similar to Carnap's reduction sentences, which he says determine only "part" of the meaning of theoretical terms.  But Carnap has never explained how it is possible for the meanings of terms to have parts.  
Viewing the sentences as both analytic and synthetic enables the empirical statements constituting the generated theory to exhibit the parts of the meanings of their constituent terms, just as analytic statements always have.  Test design statements and generated theory statements, both of which are believed to be true for empirical reasons and not due to the meanings of their constituent terms, are object-language statements functioning as semantical rules, each of which contributes parts to the meaning of each of their common descriptive terms.
“anything that exists in any amount can be measured” Rene Descartes
            Including bias

What is the name of the shape of the volume produced by the intersection of two spheres when the volume’s largest dimension is lesser than or equal to the radius of the smaller sphere?

WRAT


Since its introduction, the Wide Range Achievement Test (WRAT-#) has undergone five revisions, has multiple subtest addenda, and has been built upon with a number of expansions.  The WRAT has been applied in numerous settings, along with the development of specialty applications such as the new WRAT-Expanded.  The proprietary test was further developed by Sidney W. Bijou & Joseph Jastak, after having been originally written by Gary J. Robertson.  It was first published in 1946 by Joseph F. Jastak (Wilkinson & Robertson, 2006).  The WRAT was written as an achievement test battery designed to assess the core curricular domains of reading, mathematics, and oral and written language (Jastak & Jastak, 1965). The entire WRAT line of tests is now published and sold as intellectual property by Psychological Assessment Resources Inc. of Lutz, Florida.   The three sections of the WRAT require roughly an hour and fifty minutes to complete.  The reading comprehension portion requires 40 minutes to administer, the mathematics section requires 40 minutes to administer, and the nonverbal reasoning section requires a half hour to complete (Wilkinson & Robertson, 2006).  There are different forms for each defined age range or specific application, with prices for the tests ranging from $190.00 to in excess of $500.00 for expanded packets or multiple tests, available online.
The Wide Range Achievement Test 3 is appropriate for individuals whose ages range from five to 94 years.  The WRAT3 is a measure of the individual’s ability to read words, comprehend sentences, spell, and solve computational mathematics problems appropriate for the age range of the individual taking the test.  The WRAT3 has been used in a variety of settings as a measure of the basic academic skills necessary for effective learning, communication, and thinking (Wilkinson & Robertson, 1993).  The WRAT4 provides two equivalent forms named Blue and Green; these forms allow retesting on an accelerated timeline without having to correct for practice effects from repeating the same version of the test.  The Wide Range Achievement Test has two normative levels (Wilkinson & Robertson, 2006).
1.      Level I
a.       Level one is normed for children aged five years and zero months to eleven years and eleven months.
2.      Level II
a.       Level two is normed for individuals aged twelve years and zero months through 64 years (the WRAT is used in gerontology until age 94).
Only standard scores are used for comparisons among scores.  The norms provided for the 1978 edition include standard scores (remarkably similar to the Stanford-Binet IQ test) with a mean of 100 and a standard deviation of 15.  The scores are reported as percentile scores for each grade level.  The standard scores are scaled from the norm group baseline.  The grade levels are arbitrarily assigned and can be interpreted only as rough references to achievement level. 
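As an illustration of how a standard score on this scale relates to a percentile, the following sketch assumes the scores are normally distributed with the stated mean of 100 and standard deviation of 15. The normality assumption and the function name `standard_score_to_percentile` are mine, not the manual's; the published norm tables should be used for actual interpretation.

```python
from statistics import NormalDist

def standard_score_to_percentile(score, mean=100.0, sd=15.0):
    """Percentile rank of a standard score, assuming a normal distribution.

    The mean and standard deviation default to the values quoted in the text.
    """
    return 100.0 * NormalDist(mu=mean, sigma=sd).cdf(score)

for score in (70, 85, 100, 115, 130):
    print(score, round(standard_score_to_percentile(score), 1))
# 70 2.3
# 85 15.9
# 100 50.0
# 115 84.1
# 130 97.7
```

A score one standard deviation above the mean (115) therefore falls at roughly the 84th percentile under this assumption.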
The test manual reports split-half reliabilities of .98 for reading at both levels, .94 for arithmetic at both levels, .96 for spelling I, and .97 for spelling II.  During the norming study, both levels of the WRAT4 were administered to children ages 9 through 14. Since there is overlap in skills tested between the high end of level I and the low end of level II, this provides another estimate of the reliability of both. The split-half reliabilities for reading and spelling, for all age groups, had a narrow range from .88 to .94.  The split-half reliability for arithmetic ranged from .79 to .89. These results indicate that, overall, the reliability of the WRAT4 is excellent (Wilkinson & Robertson, 2006).  For comparative purposes, the validity can be compared to the test most similar to the WRAT4, the Peabody Individual Achievement Test (PIAT).  The PIAT is another short, individually administered achievement test which covers comparable material. In general the WRAT4 correlates very highly with the PIAT. The WRAT4 correlates moderately with various IQ tests and with other achievement tests, in the range of .40 to .70 for most groups and most achievement and ability tests (Cordon & Snyder, 1981). 
The WRAT4 norms are based on 15,200 subjects from seven states. According to the manual, no attempt was made to make the sample representative of national characteristics. The manual states that minorities were represented, but gives no data on their representation. The sample was stratified by age, sex, and approximate ability.  Possible applications of the WRAT4 described in the manual include comparing achievement of one person to another, determining learning ability or learning disability, comparing codes with comprehension in order to prescribe remedial programs, and informally assessing error patterns to plan instructional programs.  An added feature in the WRAT4 is a reading composite score (Wilkinson & Robertson, 2006).
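A split-half coefficient of the kind quoted above can be sketched as follows. This is a generic illustration, not the manual's procedure: the odd/even split, the Spearman-Brown step-up, the function names, and the tiny data set are all assumptions made for the example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_scores):
    """item_scores: one list of 0/1 item scores per examinee (odd/even split)."""
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson_r(odd_totals, even_totals))

# Hypothetical responses of five examinees to six items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(round(split_half_reliability(responses), 2))
```

Note that a high coefficient from such a calculation speaks only to the internal consistency of the items; it says nothing about whether the items measure the intended construct.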
The Wide Range Achievement Test-Expanded edition is an achievement test battery “designed to assess the core curricular domains of reading, mathematics, and oral and written language.”  Published by the same Psychological Assessment Resources Incorporated that publishes the full line of WRAT assessment instruments, the WRAT-Expanded was designed to complement the aforementioned WRAT3 (WRAT3; T6:2713) and WRAT4 and was published in 2001-2002.  The WRAT-Expanded individual assessment (Form I) is used for ages 5-24 to score an individual’s areas of reading, mathematics, oral expression, and written language (Bo Zhang, MMY16).  The Form I assessment requires 30 minutes to administer and costs $195.00 per WRAT-Expanded Individual assessment (Form I) reading/mathematics module package, including reading and math flipbook, reading and mathematics manual (2002, 185 pages), 25 response forms, and technical supplements; $55.00 per manual; $120.00 per flip book; $45.00 per 25 response forms; $45.00 per technical manual.
The WRAT-Expanded group assessment (Form G) is used for grades 2-12 to score group areas of reading (basic reading & reading comprehension), mathematics, and nonverbal reasoning.  The Form G assessment administration requires 125-135 minutes and costs $195.00 per WRAT-Expanded Group assessment (Form G) comprehensive package, including 10 booklets and scoring for each level (2001, 71-75 pages each) and technical supplement (2001, 80 pages); $95.00 per 25 booklets, 25 scoring forms, and manual (for the specified level); $45.00 per technical supplement. 
Both forms of the Wide Range Achievement Test-Expanded edition are achievement test batteries designed to assess the core curricular domains of reading, mathematics, oral and written language, and nonverbal reasoning.  The WRAT-Expanded Form G, for group assessments, is designed to measure reading (basic reading, reading comprehension), mathematics, and nonverbal reasoning for students in Grades 2-12.  Form I is an individual assessment designed to measure reading, mathematics, listening comprehension, oral expression, and written language for examinees from ages 5-24.  The WRAT-Expanded is organized in terms of five levels:
1.      Level 1 (Grade 2, 120 Items)
2.      Level 2 (Grades 3-4, 121 Items)
3.      Level 3 (Grades 5-6, 120 Items)
4.      Level 4 (Grades 7-9, 119 Items)
5.      Level 5 (Grades 10-12, 120 Items)
The individualized component of the WRAT-Expanded (Form I) consists of 70 items in reading and 75 items in mathematics.  The WRAT-Expanded was designed to provide users with a group administered achievement test closely linked to an individually administered test with similar content and interpretative systems.  This integrated system is intended to provide an initial screen based on a group assessment (Form G), with an in-depth individual assessment (Form I) available for targeted students.  According to the author, the WRAT-Expanded (Form G, group assessment) has six intended uses: preliminary screening to identify students for individual assessments, placement of new or transfer students, identification of special students, development of individual instructional goals and plans, periodic assessment of special groups of students, and research uses.  Intended uses of Form I are to complement the WRAT3 assessment, follow-up assessment of students across a wide range of achievement, placement of students, assessment of ability-achievement discrepancies, and research use (Wilkinson & Robertson, 2006). 
The WRAT-Expanded consists of reading, mathematics, and nonverbal reasoning tests.  The content selected for the reading test is based on the assumption that the ability to read and understand printed material is necessary for success in school.  Specifications for the reading test were developed by the author of the Stanford Diagnostic Reading Test (Dr. Bjorn Karlsen).  Synthetic reading passages are used rather than authentic text.  The mathematics test is designed to measure the ability to understand concepts and apply them to the solution of various types of mathematical problems.  According to the author, the emphasis of the mathematics test is consistent with current National Council of Teachers of Mathematics standards.  The nonverbal reasoning test consists of nonverbal classification items that are used in other ability tests, such as the Otis-Lennon School Ability Test and the Cognitive Abilities Test (Wilkinson & Robertson, 2006). 
The author of the WRAT-Expanded provides a variety of manuals describing the psychometric characteristics of the instrument.  General information about the instrument is provided in several administration manuals organized by levels (1 to 5), whereas more detailed psychometric information is provided in a separate technical supplement (Wilkinson & Robertson, 2006). 
The group and individual forms of the WRAT-Expanded were normed concurrently in grades 2-12, with common groups of students completing both Form G and Form I.  The norms for Form G are based on 8,136 students from a variety of settings.  It should be noted that nonpublic schools were represented only by parochial schools; other private schools were excluded from the norming procedure.  The authors have been very careful to construct a normative sample that is demographically representative of the school-age population in the United States based on census data.  Out-of-level norms are also available.  The normative data for the individually administered test (Form I) are based on fewer students (N=635).  Detailed information provided in the technical supplement allows potential users to see the close match of the normative sample to the school-age population.  Because the WRAT-Expanded covers a wide age range, the potential user needs to be reminded and cautioned about how few examinees actually were assessed within each grade or age group.  A detailed description of the norms is provided in the technical supplement to the WRAT-Expanded and the manual for the individual assessment, but not in the administrative manuals for the group assessments (Form G).  Because users may not have access to the technical supplement, and because the description of the norms is essential for proper interpretation of the standardized scores, future versions of the group administration manuals should include a description of the norms.  The author chose to report the scores based on the normative sample as follows:
1.      Grade based standard scores
2.      Age based standard scores
3.      Percentile ranks
4.      Stanine groups
5.      Normal curve equivalents
6.      Grade equivalents
7.      Ability scale scores
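Several of the score types listed above are fixed transformations of a z-score under a normal distribution.  The following is a minimal sketch of those conversions, not the publisher's scoring procedure.  It assumes a standard-score scale with a mean of 100 and a standard deviation of 15 (an assumption, since the scale is not stated here), and the function name derived_scores is hypothetical.  Grade equivalents and ability scale scores depend on the norm tables themselves and are therefore omitted.

```python
from math import floor
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def derived_scores(z: float) -> dict:
    """Convert a z-score into common norm-referenced score types.

    Assumes normally distributed scores; the mean-100, SD-15
    standard-score scale is an illustrative assumption.
    """
    # Percentile rank: percentage of the norm group scoring below z.
    percentile_rank = 100.0 * STD_NORMAL.cdf(z)
    # Standard score on the assumed mean-100, SD-15 scale.
    standard_score = 100.0 + 15.0 * z
    # Stanine: mean 5, SD 2, each band half an SD wide, limited to 1..9.
    stanine = min(9, max(1, floor(2.0 * z + 5.5)))
    # Normal curve equivalent: mean 50, SD 21.06.
    nce = 50.0 + 21.06 * z
    return {
        "percentile_rank": percentile_rank,
        "standard_score": standard_score,
        "stanine": stanine,
        "nce": nce,
    }


# Example: an examinee one standard deviation above the mean.
print(derived_scores(1.0))
```

Because all four values come from the same z-score, they carry the same information; they differ only in the scale on which that information is expressed.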
In the technical supplement, evidence regarding the precision of Form G is based on internal consistency reliability (KR-20) and test-retest reliability.  The KR-20 coefficients for the standardization sample are as follows (range across grades in parentheses):
1.      Reading (.86-.90)
2.      Mathematics (.80-.88)
3.      Non-verbal reasoning (.80-.89)
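For readers unfamiliar with KR-20, the coefficient is computed from dichotomously scored (right/wrong) items as k/(k-1) multiplied by one minus the ratio of summed item variances to total-score variance.  The sketch below illustrates the formula on a small, entirely hypothetical response matrix; it is not WRAT-Expanded data, and the function name kr20 is an assumption made for illustration.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for 0/1 item scores.

    responses: list of examinee rows, each a list of 0/1 item scores.
    Uses population variance of the total scores.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    # Sum of item variances p * (1 - p), where p is the proportion correct.
    pq_sum = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_people
        pq_sum += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - pq_sum / total_var)


# Hypothetical toy data: four examinees, three items of rising difficulty.
toy = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
]
print(round(kr20(toy), 2))  # 0.75 for this toy matrix
```

The coefficient rises when items covary strongly relative to their individual variances, which is why it is described as a measure of internal consistency rather than of what the items measure.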
The test-retest reliability coefficients for the standardization sample are as follows:
1.      Reading (.75-.90)
2.      Mathematics (.76-.90)
3.      Non-verbal reasoning (.75-.79)
The WRAT-Expanded exhibits reliability coefficients that are comparable to those obtained with similar instruments.  The author provides standard errors of measurement (SEMs) with a clear description for potential users of how to interpret them.  Information regarding the precision of content skill areas is not provided.  No scorer reliability information is provided for the individually administered Form I test (Wilkinson & Robertson, 2006).  The statement, “these reliability coefficients indicate that the WRAT-Expanded tests are measuring their designated constructs with sufficient consistency, or homogeneity, to yield dependable results” (Technical Supplement, p. 29), cannot be supported by those relatively high reliability coefficients, because of semantic relativism.  Whether any test is measuring those skills is a construct validity issue within the realm of semantic relativism, not a reliability concern.
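The relationship between a reliability coefficient and the standard error of measurement can be made concrete with the classical formula SEM = SD × sqrt(1 − reliability).  The sketch below is illustrative only: the standard deviation of 15 is an assumed standard-score scale, the reliability of .90 is simply the upper end of the reading KR-20 range reported above, and the function names are hypothetical.  The actual SEMs should be taken from the technical supplement.

```python
from math import sqrt
from statistics import NormalDist


def sem(sd: float, reliability: float) -> float:
    """Classical standard error of measurement."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def confidence_band(score: float, sem_value: float, level: float):
    """Symmetric band around an obtained score, assuming normal errors."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sem_value
    return (score - half_width, score + half_width)


# Illustrative values only: SD of 15 (assumed scale), reliability of .90.
example_sem = sem(15.0, 0.90)
for level in (0.85, 0.90, 0.95):
    low, high = confidence_band(100.0, example_sem, level)
    print(f"{int(level * 100)}% band: {low:.1f} to {high:.1f}")
```

Under these assumptions the SEM is about 4.7 points, and the band widens as the confidence level rises, so bands reported at different confidence levels are not directly comparable.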
One shortcoming of the test-retest reliability study is the relatively small and nonrepresentative sample used.  All students came from one parochial school.  Sample sizes at five grade levels ranged from 19 to 124.  For example, at level 5 only 19 students were used to obtain the test-retest reliability estimate for the reading subtest of Form G.  Worse yet, no test-retest coefficient was reported for the mathematics and nonverbal reasoning subtests at that level.  For the individual form, only 97 students from ages 5-17 were included in the test-retest reliability study.  Confidence bands of test scores are tabulated for convenient use.  The issue is that 85% and 90% confidence bands were provided for Form G, while 90% and 95% confidence bands were provided for Form I, which creates the potential for unnecessary discrepancies in test score interpretation.
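The practical effect of the small samples can be illustrated with a confidence interval for a correlation coefficient based on the Fisher z transformation.  In the sketch below, the sample sizes 19 and 124 are the extremes reported above, but the coefficient of .75 is only an illustrative value from the reported range; which coefficient belongs to which sample is not stated here, so the pairing is hypothetical, as is the function name.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def correlation_ci(r: float, n: int, level: float = 0.95):
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z transformation, whose standard error is
    1 / sqrt(n - 3) under bivariate normality.
    """
    if n <= 3:
        raise ValueError("need more than three paired observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = atanh(r)
    se = 1.0 / sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (tanh(z - crit * se), tanh(z + crit * se))


# Hypothetical pairing of r = .75 with the smallest and largest samples.
for n in (19, 124):
    low, high = correlation_ci(0.75, n)
    print(f"n = {n}: {low:.2f} to {high:.2f}")
```

With 19 examinees, the interval around a coefficient of .75 runs from roughly .45 to .90, which shows how little a single small-sample estimate says about the stability of scores.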
Validity information for Form G is presented in terms of internal evidence (content, inter-correlations between tests within the WRAT-Expanded, speed of responding, and differential item functioning) and external evidence (correlations between Form G and a variety of group and individually administered achievement and ability tests).  Evidence regarding clinical validity is reported in terms of mean differences between two groups of students (learning disabled and gifted), as identified by their teachers, and matched comparison groups of students.  The author is very clear that the validity evidence represents “only a start of the validation process for the WRAT-Expanded” (Wilkinson & Robertson, 2006).
The WRAT-Expanded seems to show a high level of content validity.  An informative description of the test development is provided for both the group and individual forms in the technical manuals.  Content experts were hired to assist in creating specific domains in order to ensure overall validity as well as construct validity.  The authors of the test controlled for readability and the difficulty level of the vocabulary to improve the content validity of the test.  For face validity, content experts screened the items for possible racial and religious offense as well as gender stereotyping.
The WRAT-Expanded suffers from many of the problems encountered with wide-range achievement tests designed to measure educational performance over large age and grade ranges.  There are very few items within each content area, testing time is quite short, and relatively few students are included in the norms for each grade and age group.  Of course, some of these weaknesses can also be viewed as strengths by potential users who seek a quick and preliminary assessment of their students.  Given the intended purposes of the instrument, the WRAT-Expanded has some features that potential users may find useful.  In particular, the simultaneous norming of Forms G and I makes comparisons between achievement and ability levels more meaningful (Wilkinson & Robertson, 1993).
One of the intended uses of the WRAT-Expanded is the identification of relative strengths and weaknesses of students in reading, mathematics, and nonverbal reasoning.  The most straightforward interpretation is to view the comparisons of student performance in terms of normative data and the distribution of score differences obtained within the normative sample.  Rasch measurement theory was used to calibrate the WRAT-Expanded.  Unfortunately, the author does not provide the psychometric information necessary to evaluate model-data fit.  Traditional psychometric criteria are provided, but the additional information available through the use of Rasch measurement is not reported.  The author should include information regarding item fit, reliability of person separation, and other Rasch-based psychometric criteria.  The WRAT-Expanded does not take full advantage of Rasch measurement and its useful features, such as variable maps and person fit displays, that can enhance the usefulness and interpretability of the scores.  Potential users should carefully consider the alignment between the content of the WRAT-Expanded and the school curriculum when using the test as an achievement test.  No validity evidence is currently available to support the accuracy of decisions based on the use of the WRAT-Expanded as a screen to identify students for individual assessments or to identify students with special needs (Englehard, MMY16).
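To show what the missing item-fit information would look like, the sketch below computes the dichotomous Rasch probability and the two usual mean-square fit statistics (infit and outfit) for a single item.  It is a minimal illustration under stated assumptions: the ability and difficulty values are hypothetical inputs rather than estimates from WRAT-Expanded data, and a real analysis would first estimate them with dedicated Rasch software.

```python
from math import exp


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + exp(-(theta - difficulty)))


def item_fit(responses, abilities, difficulty):
    """Return (infit, outfit) mean-square statistics for one item.

    responses: 0/1 scores on the item, one per examinee.
    abilities: person ability values (logits), in the same order.
    """
    squared_residuals = 0.0
    information = 0.0
    standardized_sq = 0.0
    for x, theta in zip(responses, abilities):
        p = rasch_probability(theta, difficulty)
        variance = p * (1.0 - p)
        squared_residuals += (x - p) ** 2
        information += variance
        standardized_sq += (x - p) ** 2 / variance
    # Infit: information-weighted, sensitive to misfit near the item's level.
    infit = squared_residuals / information
    # Outfit: unweighted mean of squared standardized residuals,
    # sensitive to unexpected responses far from the item's level.
    outfit = standardized_sq / len(responses)
    return infit, outfit


# Hypothetical values: five examinees and one item of difficulty 0 logits.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
responses = [0, 0, 1, 1, 1]
print(item_fit(responses, abilities, 0.0))
```

Mean-square values near 1.0 indicate responses about as variable as the model expects; values well above 1.0 signal noise the model does not explain, and values well below 1.0 signal responses that are more predictable than expected.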

As with all other instruments that include individual assessments, the utility of the WRAT-Expanded depends fundamentally on the proper interpretation and use of the test by clinicians.  Target users are ages 7-18 for the group form and ages 5-24 for the individual form.  No explanation is provided for the differences in content coverage and age range between these two forms.  Although the author did not specify situations in which one would choose to administer the group form rather than the individual form, or vice versa, it was recommended that the group form serve as a screening tool and the individual form as a more in-depth assessment of learning difficulty, primarily because the individual form classifies students into finer categories.
The modifications made to the WRAT3 in the expanded version resulted in a tool that appears to be much better at identifying possible learning deficiencies than its predecessors, if only for elementary and secondary school children, primarily because the WRAT-Expanded focuses on a narrower and younger age range than the WRAT3.  While the two forms were standardized concurrently, they are not classified into similar tranches (divisions or portions of a pool or whole; in this case, ordinal levels).  Therefore, the normative scores have the same meaning while the classifications do not.  Another advantage of the expanded version is that the aptitude-achievement discrepancy can be investigated in conjunction with the WRIT (Wide Range Intelligence Test; 15:279).  The reading test may also lack validity due to a lack of authenticity: because the reading passages were written specifically for this test, they do not necessarily represent material that would be read in the “real” world.
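One common way to examine the aptitude-achievement discrepancy mentioned above is the simple-difference approach, in which an observed difference between two scores on the same scale is compared with the standard error of the difference, SD × sqrt(2 − r1 − r2).  The sketch below illustrates that general approach only; it is not necessarily the procedure described in the WRAT-Expanded or WRIT manuals, and the standard deviation of 15, the reliabilities of .90, and the function names are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def se_difference(sd: float, rel_a: float, rel_b: float) -> float:
    """Standard error of the difference between two scores on one scale."""
    return sd * sqrt(2.0 - rel_a - rel_b)


def is_reliable_discrepancy(ability, achievement, sd,
                            rel_ability, rel_achievement, level=0.95):
    """True if the score gap exceeds what measurement error alone allows."""
    critical_z = NormalDist().inv_cdf(0.5 + level / 2.0)
    critical_gap = critical_z * se_difference(sd, rel_ability, rel_achievement)
    return abs(ability - achievement) > critical_gap


# Hypothetical examinee: ability score 110, reading score 95, both on an
# assumed mean-100, SD-15 scale with assumed reliabilities of .90.
print(is_reliable_discrepancy(110, 95, 15.0, 0.90, 0.90))
```

This approach treats the gap purely as a measurement question.  It does not address whether the two tests measure the intended constructs for a given examinee, which is the validity concern raised throughout this paper.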




References

Anastasi, A. (1982). Psychological testing (5th ed.). New York: Macmillan.
Cordon, B., & Snyder, M. (1981). A comparison of learning disabled college students' achievement from WRAT and PIAT grade, standard and subtest scores. Psychology in the Schools, 18, 28-34.
Hale, R. L. (1981). The utility of the Stanford-Binet in predicting WRAT performance. Psychology in the Schools, 16, 488-490.
Jastak, J. F., & Jastak, S. R. (1965). The Wide Range Achievement Test: Manual of instructions (Rev. ed.). Wilmington, DE: Guidance Associates.
Jastak, J. F., & Jastak, S. R. (1978). Manual of instructions: The Wide Range Achievement Test. Wilmington, DE: Jastak Associates.
Lubin, B., Wallis, R., & Paine, C. (1971). Patterns of psychological test usage in the United States: 1935-1969. Professional Psychology.
Prasse, D. P., & Siewert, J. C. (1983). An analysis of performance on reading subtests from the 1978 Wide Range Achievement Test and Woodcock Reading Mastery Test with the WISC-R for learning disabled and regular education students. Journal of Learning Disabilities.
Roberts, R., Santa-Barbara, J., & Woodward, C. A. (1975). Test-retest reliability of the Wide Range Achievement Test. Journal of Clinical Psychology, pp. 70-74.
Schale, K. W., & Roberts, J. School achievement of children as measured by the reading and arithmetic subtests of the Wide Range Achievement Test. U.S. Department of Health, Education and Welfare, Public Health Service Publication No. 1000, Series 11. Rockville, MD.
Wide range achievement test-expanded edition. (2005). In R. Spies & B. Plake (Eds.), The sixteenth mental measurements yearbook (16th ed., pp. 1136-1141). Lincoln, NE: The University of Nebraska Press.
Wilkinson, G. S. (1993). Wide range achievement test (3rd ed.). Wilmington, DE: Wide Range, Inc.
Wilkinson, G. S., & Robertson, G. J. (2006). Wide range achievement test 4 professional manual. Lutz, FL: Psychological Assessment Resources.

Ortiz, S. O. (2004). Comprehensive assessment of culturally and linguistically diverse students: A systematic, practical approach for nondiscriminatory assessment.

Applied cross cultural psychology.

Mille, K. (2010). The achievement gap.

Koretz, D. (2004). Measuring up: What educational testing really tells us.

The language police: How pressure restricts what students learn. (2004).

Innis, H. A. (1951). The bias of communication.

Popham, W. J. (2003). Test better, teach better: The instructional role of assessment.

Braquehais, M. D. (2010). Complex system theories are necessary for a better understanding of our biopsychosociocultural constitution. Biologinė Psichiatrija ir Psichofarmakologija, 12, p. 102.

Sarno, J. E. (1998). The mindbody prescription: Healing the body, healing the pain.

Maass, A., Ceccarelli, R., & Rudin, S. (1996). Linguistic intergroup bias: Evidence for in-group-protective motivation. Journal of Personality and Social Psychology, 71, 512-526.

Maass, A., Milesi, A., Zabbini, S., & Stahlberg, D. (1995). Linguistic intergroup bias:
Differential expectancies or in-group protection. Journal of Personality and Social
Psychology, 68, 116-126.

Maass, A., Salvi, D., Arcuri, L., & Semin, G. (1989). Language use in intergroup contexts: The
linguistic intergroup bias. Journal of Personality and Social Psychology, 57, 981-993.

Maass, A., & Arcuri, L. (1996). Language and stereotyping. In C. N. Macrae, C. Stangor, & M.
Hewstone (Eds.), Stereotypes and stereotyping (pp. 193-226). New York: Guilford Press.
Franco, F. M., & Maass, A. (1999). Intentional control over prejudice: When the choice of the measure matters. European Journal of Social Psychology, 29, 469-477.
Maass, A., & Arcuri, L. (1992). The role of language in the persistence of stereotypes. In K. Fiedler & G. Semin (Eds.), Towards a social psychology of powerful linguistic devices (pp. 129-143). Newbury Park, CA: Sage.
Semin, G. R., & Marsman, J. G. (1994). Multiple inference-inviting properties of interpersonal verbs: Event instigation, dispositional inference, and implicit causality. Journal of Personality and Social Psychology, 67, 836-849.
Semin, G. R., & Fiedler, K. (1988). The cognitive functions of linguistic categories in describing persons: Social cognition and language. Journal of Personality and Social Psychology, 54, 558-568.
Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011). Asymmetries in predictive and diagnostic reasoning. Journal of Experimental Psychology, 140, 168-185. doi:10.1037/a0022100

Nell, V. (2000). Cross-cultural neuropsychological assessment: Theory and practice. Mahwah:
Erlbaum.

Ripich, D. N., Carpenter, B., & Ziol, E. (1997). Comparison of African American and white
persons with Alzheimer’s disease on language measures. Neurology, 48, 781–783.

Johnson-Selfridge, M. T., Zalewski, C., & Aboudarham, J. F. (1998). The relationship
between ethnicity and word fluency. Archives of Clinical Neuropsychology, 13, 319–325.

Marcopulos, B. McLain, C. A., & Giuliano, A. J. (1997). Cognitive impairment or inadequate
norms? A study of healthy, rural older adults with limited education. The Clinical Neuropsychologist, 11, 111–131.
