(2003) Psychological Review.

A Conceptual and Psychometric Framework for Distinguishing Categories and Dimensions

Paul De Boeck
K. U. Leuven

Mark Wilson
University of California, Berkeley

G. Scott Acton
Rochester Institute of Technology


An important, sometimes controversial feature of all psychological phenomena is whether they are categorical or dimensional. A conceptual and psychometric framework is described for distinguishing whether the latent structure behind manifest categories (e.g., psychiatric diagnoses, attitude groups, or stages of development) is category-like or dimension-like. Being dimension-like requires (a) within-category heterogeneity and (b) between-category quantitative differences. Being category-like requires (a) within-category homogeneity and (b) between-category qualitative differences. The relation between this classification and abrupt versus smooth differences is discussed. Hybrid structures are possible. Empirical applications to personality disorders, attitudes toward capital punishment, and stages of cognitive development illustrate the approach.


This article describes a conceptual and psychometric scheme for distinguishing the categorical versus dimensional nature of psychological variables. By "psychological variables" we mean variables used to distinguish between entities in some psychological respect. These entities are commonly persons, but they can also be situations, tasks, test items, and so on. The scheme arose out of frustrations with instances of psychological research that had either assumed or "proved" that variables were of one kind or the other without examining the philosophical or empirical basis for doing so and without an overarching framework in which either was genuinely possible. In this article, we provide such an overarching framework, called the dimension/category framework (Dimcat), and provide empirical illustrations of its use.

A preliminary distinction in determining whether variables are category-like or dimension-like is the distinction between manifest variables and latent variables. Too often these two kinds of variables are confused, which can lead to inappropriate conclusions. Specifically, researchers may confuse manifest categories or dimensions, which are artefacts of the measurement approach, with latent categories or dimensions, which are typically the underlying psychological phenomena of interest.

The issue under consideration here is whether the latent nature of manifest variables is category-like or dimension-like. One assumption might be that the natures of the latent and manifest variables match. As discussed below, however, manifest dimensions can be turned into manifest categories (e.g., in segmentation into groups) and manifest categories into manifest dimensions (e.g., in sum scores on a test). Thus, the relations between different kinds of manifest variables and between different kinds of manifest and latent variables are not as simple as they might at first appear. Consequently, a conceptual and methodological framework that encompasses all of these possibilities is needed.

Manifest dimensions (or manifest continua) are common in psychological research, although their dimensional nature may be only a convenient fiction. For example, raw scores on a test (e.g., number of correct responses) are ordered manifest categories, yet they are commonly seen as approximating a manifest dimension. Items on a test are examples of indicators in the same way that symptoms in a diagnostic system are indicators, although these different kinds of indicators are typically put to very different uses. Whereas items are typically summed to produce a manifest dimension, symptoms are typically summed to produce a manifest category (a diagnosis). To complicate matters, a manifest dimension based on item sums may also be segmented (e.g., using a median split) to produce a manifest category, or the sum of symptoms may be used as an indicator of the extent to which patients show a syndrome. It should be apparent from this discussion that manifest categories and manifest dimensions can be functionally interchangeable and thus arbitrary.
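The interchangeability of manifest dimensions and manifest categories can be illustrated with a minimal sketch. The response data and the function names (`sum_scores`, `median_split`) are hypothetical and serve only as an illustration: binary indicators are summed into a manifest dimension, which is then segmented by a median split into a manifest category.

```python
from statistics import median

def sum_scores(responses):
    """Sum binary indicator values per person: a manifest dimension."""
    return [sum(person) for person in responses]

def median_split(scores):
    """Segment scores at the median: a manifest category ('high' or 'low')."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]

# Hypothetical data: 4 persons by 5 binary indicators.
responses = [
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
scores = sum_scores(responses)   # manifest dimension
groups = median_split(scores)    # manifest category derived from it
```

Nothing in this sketch says anything about the latent structure: the same responses yield either a dimension or a category, depending only on how they are processed.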

Latent dimensions are quantitative variables with values that depend on the person and that in one way or another contribute to the observations, either directly or indirectly via the effect the quantitative variable has on the probability of the responses. Latent dimensions are invoked as underlying quantities that determine data or functions thereof such as the sum score. For example, in classical test theory, a true score (latent dimension) is believed to be at the basis of the sum score of a test (manifest dimension), except for distortions due to the so-called error term. Latent dimensions are implicit whenever concepts like internal-consistency reliability are used--that is, in virtually all tests of psychological phenomena. The underlying variables in factor analysis (FA) models, structural equation modeling (SEM), and item response theory (IRT) are not manifest but latent dimensions.

A major difference between the true score of classical test theory and the dimensions from factor analysis, SEM, and IRT follows from the form of the model equations. The classical test theory model can be represented as $X_{p,\mathrm{sum}} = q_p + e_p$, with $X_{p,\mathrm{sum}}$ as the sum score of person $p$, $q_p$ as the true score of person $p$, and $e_p$ as the error term. We will not explain the assumptions in greater detail, because we want to concentrate on the form of the equation. In FA models (and SEM for factor analysis) with one latent dimension, the basic form of the equation is $X_{pi} = a_i q_p + b_i + e_{pi}$ (where $e_{pi}$ comprises both the unicity and the measurement error), whereas for IRT it is $h_{pi} = a_i q_p + b_i$. Here $X_{pi}$ is the observation for person $p$ on indicator $i$, $q_p$ is the latent dimension, $a_i$ is the weight of this dimension for indicator $i$, $b_i$ is the item-specific constant, and $e_{pi}$ is the error term for the observation regarding person $p$ and indicator $i$. The alternative equation with $h_{pi}$ has a similar form but no error term: $h_{pi}$ is a logistic transformation of the probability of $X_{pi} = 1$ (instead of 0), so that the stochastic element is comprised in the distribution of $X_{pi}$ given $h_{pi}$. The major difference from classical test theory is that in factor models and IRT, the dimensions are anchored to indicators. First, the dimensions have indicator-specific weights, the $a_i$'s, which are factor loadings (in factor models) or discrimination parameters (in IRT). We will use the term "discrimination" for $a_i$, because this parameter indicates how much the indicator discriminates between values of $q_p$. Second, the constant ($b_i$) may differ depending on the indicator.
It is common in factor models to use manifest variables with a mean of zero (after a transformation), so that no constant is needed (if $q_p$ also has a mean of zero), but models with indicator constants, the $b_i$'s, are also available (Muthén, 1984; Reise, Widaman, & Pugh, 1993; Sörbom, 1974) for cases in which the indicators are not transformed into deviations from the mean. In IRT these constants are often subtracted (given a negative sign: $-b_i$) and are called thresholds or difficulties. We will use the general term "location" here to denote the constant or threshold, because $b_i$ locates indicator $i$ independently of the value of $q_p$. The anchoring of a dimension to items lies in the discriminations and the locations--both help to interpret a dimension. Thus, one can expect from identical dimensions that they have equal indicator discriminations and equal indicator locations, as will be explained in Section 2.

Note that more than one dimension can be at work in an indicator. In that case, more than one weight $a$ is needed, one for each dimension. The dimension-specific locations cannot be differentiated, however, because they add up to one general constant ($b_{i1} + b_{i2} + \dots$), so that the link with the specific dimensions is lost. The only remaining anchoring is based on the indicator-specific weights ($a_{i1}, a_{i2}, \dots$).
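The roles of discrimination and location in the one-dimensional equations can be made concrete with a small sketch. The function names and all numeric values are hypothetical. The code assumes only what the equations above state: a linear form for the factor model (with the error term left out) and, for IRT, a linear predictor that is the logit of the probability of a positive response.

```python
import math

def fa_expected(a_i, q_p, b_i):
    """Systematic part of the one-dimensional factor model (error term omitted)."""
    return a_i * q_p + b_i

def irt_probability(a_i, q_p, b_i):
    """P(X_pi = 1), with h_pi = a_i * q_p + b_i as the logit of that probability."""
    h_pi = a_i * q_p + b_i
    return 1.0 / (1.0 + math.exp(-h_pi))

# Hypothetical indicators with the same location but different discriminations,
# evaluated for two persons with q_p = -1 and q_p = +1.
weak = [irt_probability(0.5, q, 0.0) for q in (-1.0, 1.0)]
strong = [irt_probability(2.0, q, 0.0) for q in (-1.0, 1.0)]
weak_spread = weak[1] - weak[0]
strong_spread = strong[1] - strong[0]   # larger: the indicator discriminates more
```

The indicator with the larger weight separates the two persons more sharply, which is why the term "discrimination" is used for the weight.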

Manifest categories are also common in psychological research, as independent or dependent variables. Regardless of whether the categorical variables are independent or dependent variables, they are often (but not always) rooted in, derived from, based on, or linked to some manifest or tacit indicators from the same domain. Indicators need to be either directly or indirectly observed in order to derive a manifest category from them.

A manifest category is commonly derived from indicators through either segmentation or expert judgment. Segmentation means that one indicator or a composite of indicators (e.g., a sum score on a test) is segmented into different manifest categories (Wilson, 1984). Some segments may be omitted, as in the method of extreme groups, in which the middle segment is omitted. Expert judgment means that an expert attributes manifest categories on the implicit or explicit basis of knowledge regarding the values of indicators. For example, a psychiatric diagnosis is based on knowledge of the symptoms. Diagnostic systems such as the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV; American Psychiatric Association, 1994) provide the expert with explicit rules based on the sum score obtained from a list of symptoms, but often the expert does not literally follow such rules but rather relies on tacit indicators. Another example of expert judgment is when people judge themselves on a trait (e.g., "I am shy") or on an attitude (e.g., "I am against capital punishment"). In this case, people are thought to be expert judges regarding more specific, possibly tacit, indicators about themselves that indicate a trait, attitude, or another underlying variable.

One may wonder whether a manifest category resulting from segmentation or expert judgment is in any sense more than an arbitrary segmentation of an underlying dimension. For example, it is a common practice to determine cut-off scores, such as those used to distinguish between depressed and non-depressed persons. Such manifest categories should properly be considered only means to an end--namely, finding out about the existence of qualitative differences or quantitative differences, which are the real psychological phenomena of interest (e.g., J. Ruscio & Ruscio, 2002). Although all of the information about the latent structure is derived from the manifest structure, the manifest structure is less important than the latent structure, because the latent structure explains the manifest structure. If one knows the latent structure, then the most sensible alternative is to use indicators that reflect that latent structure, whether they be categorical or dimensional. Only pragmatic reasons, such as preference for a certain type of data analysis, might favor the use of manifest variables when the latent structure is known.

Manifest categories (e.g., a diagnosis) can correspond to either qualitative differences or quantitative differences. The basic issue is whether the categories at the manifest level are category-like or dimension-like in the latent structure. The complementary issue, whether a manifest dimension (e.g., a sum score) is category-like or dimension-like in the latent structure, is a legitimate question but is not addressed directly here; its answer requires the use of latent class or latent profile models (e.g., see Wilson [1989] for a discussion). Thus, the present paper is asymmetric: Given manifest categories, we attempt to answer whether they are really category-like in the latent structure. If they are category-like at the latent level, then they have the properties of latent categories.

The issue we want to investigate parallels an issue in cognitive psychology and linguistics, particularly with respect to the nature and meaning of the categories and words we use in daily life. For example, is the concept behind the category trees really category-like, or does it correspond better to a dimension of tree-ness? When the category members are persons--for example, the category of psychiatric patients--and when human cognition intervenes in category assignment (as with expert judgment), the similarity is even more relevant. The commonalities and differences between our research questions and those of the domain of concepts and categories are illustrative of what we plan to do. Hereafter, in Section 1, we will summarize the research on categories and concepts and how it applies to our topic.

Section 1: Concepts and Categories in Cognitive Psychology and Linguistics

Categories are an important topic of research in cognitive psychology and linguistics. Based on empirical evidence, scientists in the domain of cognitive categories believe (a) that cognitive categories cannot be defined in terms of singly necessary and jointly sufficient features, (b) that the distinction between category members and non-members is not clear-cut, and (c) that category members differ as to the degree they fit the category, also called typicality (for a summary, see Murphy, 2002). These three conclusions are interrelated and can be summarized in the conjecture that category membership is gradual with no clear cut-off. A similar belief is held in cognitive linguistics (e.g., Lakoff, 1987; Taylor, 1995).

These conclusions contradict the so-called classical (Aristotelean) view. This classical view was described by Rosch (1978) and by Smith and Medin (1981) and defended by Sutcliffe (1993). Wittgenstein (1953) was the first prominent thinker to doubt the classical view. Rosch (1975, 1978) and Smith and Medin (1981) have clearly explained and demonstrated empirically why the classical view is invalid. Alternative theories have been developed in order to explain that people do not use definitions and that categories are gradual instead. These theories are also meant to explain a wide variety of phenomena such as category decisions, category learning, category-based induction, memory for exemplars, and so on (for overviews, see Komatsu, 1992; Medin & Coley, 1998; and Murphy, 2002). We will concentrate here on decisions about category membership, because we want to investigate the nature of what we call a manifest category based on the attribution of a category label--for example, a personality disorder diagnosis, a self-description as being against capital punishment, or assignment to a developmental stage.

The first theory states that category membership is derived from the similarity of an element to the prototype of the category. This theory is called the prototype theory. For a description, see Hampton (1993, 1995). The similarity is based on a weighted sum of features present in the element in question. The features and their weights are the content of the prototype. The prototype is of an abstract nature, unless it is instantiated in an extant exemplar. The weighted sum is a continuous variable to be dichotomized in order to decide on category membership or to be used as an input for a choice rule if the decision is between two or more categories.
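The decision rule of the prototype theory can be sketched as follows. The features, weights, and cut-off are hypothetical values chosen for illustration and are not taken from the cited studies.

```python
def prototype_similarity(features, weights):
    """Weighted sum of the prototype features present in an element."""
    return sum(w * f for f, w in zip(features, weights))

def is_member(features, weights, cutoff):
    """Dichotomize the continuous similarity to decide on category membership."""
    return prototype_similarity(features, weights) >= cutoff

# Hypothetical prototype with three features (e.g., has wings, flies, sings)
# and hypothetical weights and cut-off.
weights = [3, 2, 1]
cutoff = 4

element_a = [1, 1, 0]   # similarity 5
element_b = [0, 1, 1]   # similarity 3
decisions = [is_member(e, weights, cutoff) for e in (element_a, element_b)]
```

The similarity itself is graded. The category boundary arises only when the cut-off is applied.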

The second theory states that category membership is determined on the basis of similarity with earlier encountered exemplars from one's (possibly unconscious) memory. This theory is called the exemplar theory. Two well-known elaborations of this view are the Context Model (Medin & Schaffer, 1978) and the Generalized Context Model (Nosofsky, 1992; Nosofsky & Palmeri, 1997). It is assumed in these models that for a similarity to be high, it needs to be high on all features, and that high similarities have a larger weight than low similarities. The Generalized Context Model is formulated in a rather general way, using various kinds of free parameters, so that it can accommodate many phenomena while still having similarities to the category exemplars as its core. Empirical comparisons of the prototype theory and the exemplar theory for category decisions tend to favor the exemplar theory (e.g., Medin & Coley, 1998; Murphy, 2002), including when natural categories are studied (Storms, De Boeck, & Ruts, 2000; Smits, Storms, Rosseel, & De Boeck, 2002).
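The requirement that similarity be high on all features can be illustrated with a multiplicative similarity rule in the spirit of the Context Model. This is a simplified sketch and not the full model. The stored exemplars, the mismatch parameters, and the function names are hypothetical, and the ratio rule used to turn summed similarities into a choice probability is stated here as an assumption.

```python
def similarity(probe, exemplar, mismatch):
    """Multiply per-feature similarities: 1 for a match, a value in (0, 1) for a mismatch."""
    sim = 1.0
    for p, e, s in zip(probe, exemplar, mismatch):
        sim *= 1.0 if p == e else s
    return sim

def prob_category(probe, exemplars_by_category, mismatch, target):
    """Summed similarity to the target category relative to all stored exemplars."""
    totals = {
        cat: sum(similarity(probe, ex, mismatch) for ex in exemplars)
        for cat, exemplars in exemplars_by_category.items()
    }
    return totals[target] / sum(totals.values())

# Hypothetical memory with one stored exemplar per category, three binary features.
memory = {"A": [(1, 1, 1)], "B": [(0, 0, 0)]}
mismatch = (0.5, 0.5, 0.5)

p_a = prob_category((1, 1, 1), memory, mismatch, "A")
one_off = similarity((1, 1, 0), (1, 1, 1), mismatch)
```

Because the per-feature similarities are multiplied, a single mismatch lowers the overall similarity by a constant factor. An exemplar that mismatches on every feature therefore contributes very little.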

A third theory is not formulated in a formalized way as are the previous two but must be seen as providing an explanation for the shortcomings of these two. This theory is called the knowledge approach (Murphy, 2002; Murphy & Medin, 1985) or the explanation-based theory (Komatsu, 1992). In this theory it is stressed that categories are embedded in a broader knowledge about the world and that this knowledge plays an important role in how we deal with categories and understand them. Medin and Coley (1998) and Murphy (2002) noted that an important shortcoming of the prototype theory and the exemplar theory is their neglect of feature relations. That the internal structure of categories has been a neglected topic in the study of categories and concepts is not difficult to explain from the basic conjecture by Rosch, Mervis, Gray, Johnson, and Boyes-Braem (1976) that categories pick up correlations between features to maximize the informative value of categorization.

Categories are clusters of entities based on the correlations between features in a much larger, between-category space. The implication is that categories explain the correlations away (in a statistical sense, not in a causal sense), so that not much correlation is left within the categories. The conjecture of Rosch et al. (1976) and of others (Murphy & Lassaline, 1997) is primarily meant for so-called basic-level categories, not for so-called subordinate and superordinate categories. The association of categories with correlated features (in the between-category space) was empirically corroborated by Devlin et al. (1998) and Tyler, Moss, Durrant-Peatfield, and Levy (2000). Categories defined on the basis of correlated features were found to be more robust against cognitive and neuropsychological deficits--they seem to be stronger categories.

In contrast with the prototype theory and the exemplar theory, an interesting strength of the knowledge approach is that feature relations are recognized--they are part of the knowledge. For example, we know that wings help a lot to fly, so that a correlation between wings and flying is a quite natural cognition. This correlation is primarily based on between-category differences. Some categories of animals fly and have wings (various kinds of insects, bats, etc.), whereas other categories of animals do not fly and do not have wings either (elephants, spiders, snails, humans, etc.). But within-category correlation is also no problem for the knowledge approach. For example, for vegetables there is a correlation between being green and growing above the ground. The correlation is not perfect (for example, if one counts tomatoes as vegetables), but the exceptions are rare. Basic biological knowledge can explain the correlation between the green color and growing above the ground. The role that feature relations play in a knowledge approach is that they are quite natural and explained from knowledge one has about the world. No formal theory about feature correlations is developed within the knowledge approach, however, perhaps because there is no compelling evidence for within-category feature correlations to play a role in explaining typicality and category decisions (Murphy, 2002). The evidence in support of a feature-correlation effect is at best rather weak (Malt & Smith, 1984).

Perhaps within-category correlations are not so important when it comes to category membership, but they are an important issue as such because they are intrinsically related to the structure of a category and its heterogeneity. An interesting approach to this issue is Lakoff's (1987) idealized cognitive model (ICM) approach. According to Lakoff, ICMs, and therefore concepts, can have different types of structure. To give just two examples, categories can have a radial structure, spreading out from a central point, or they can have a chainlike structure, based on the similarities between the exemplars. In a chainlike structure, the two ends of the chain have nothing in common. This explains the title of Lakoff's book, Women, Fire, and Dangerous Things, all of these being exemplars of the same category of nouns in an aboriginal Australian language. The links between these highly diverse exemplars are domains of experience, not anything objective from the nature of the elements. (Other evidence for the structure of semantic categories can be found, for example, in Geeraerts & Grondelaers, 2002; and Taylor, 1995.)

From a more cognitive-psychological and formal perspective, Storms and De Boeck (1997) described some potentially interesting and rather simple within-category structures. The structures were described based on a binary matrix of elements by properties. The properties could be of any kind: objective characteristics of the elements as well as subjective appraisals and associations of various kinds. A first type of structure is the rectangular structure, with a block of 1's defined by row elements and column elements (a set of elements all being characterized by the same but limited set of properties). This structure is in conformity with the classical view. A second type of structure is the triangular structure, with nested properties, so that the properties at the top imply those at the lower parts of the triangle. At the basis of the triangle, one finds the properties that are most common to the category. This kind of structure is also called a Guttman scale. Finally, a third type of structure is a parallelogram structure, one with a moving property overlap between elements, as in a chainlike structure. The triangular structure and the parallelogram structure do not fit the classical view, and when a stochastic process is added, even the rectangular structure does not look to be in conformity with the classical view. It can be concluded from this short overview that categories are considered heterogeneous in two senses: exemplars differ as to how typical they are of the category, and categories can have an internal structure that strongly deviates from a homogeneous uncorrelated structure in the exemplar-by-feature matrix. The internal structure aspect has been somewhat neglected in the prototype theory and the exemplar theory, but it is stressed in the knowledge approach and in more linguistic approaches, such as that of Lakoff (1987).
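The three within-category structures can be written out as small element-by-property matrices. The matrices and the helper functions below are hypothetical illustrations and are not taken from Storms and De Boeck (1997). The checks are deterministic, so they apply only to the structures before any stochastic process is added.

```python
def property_sets(matrix):
    """For each element (row), the set of property indices it possesses."""
    return [{j for j, v in enumerate(row) if v} for row in matrix]

def is_rectangular(matrix):
    """All elements share exactly the same set of properties."""
    sets = property_sets(matrix)
    return all(s == sets[0] for s in sets)

def is_nested(matrix):
    """Triangular (Guttman) pattern: any two property sets are ordered by inclusion.
    A rectangular matrix passes this check trivially as a degenerate case."""
    sets = property_sets(matrix)
    return all(a <= b or b <= a for a in sets for b in sets)

rectangular = [[1, 1, 1, 0],
               [1, 1, 1, 0],
               [1, 1, 1, 0]]
triangular = [[1, 0, 0, 0],
              [1, 1, 0, 0],
              [1, 1, 1, 0],
              [1, 1, 1, 1]]
parallelogram = [[1, 1, 0, 0, 0],
                 [0, 1, 1, 0, 0],
                 [0, 0, 1, 1, 0],
                 [0, 0, 0, 1, 1]]

# In the chainlike structure the two ends of the chain have nothing in common.
chain = property_sets(parallelogram)
ends_overlap = chain[0] & chain[-1]
```

In the parallelogram, neighbouring elements always share a property, yet the first and last elements share none.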

Although the cognitive nature of categories and the linguistic meaning of lexicalized categories are not the topic of our investigation, the results we briefly discussed are nevertheless important because the ingredients are the same as for our topic of interest. In all of our studies, we have elements (persons) that are categorized (the manifest categories) on the basis of features (the indicators). The ingredients are the same, but our research question is different. We are not interested in the cognitive representation or the semantic structure of the categories but in their formal representation in a category-like or dimension-like structure. The two kinds of structure do not necessarily coincide. The issue we want to formulate more precisely in order to study it systematically is whether or not manifest categories (categories as assigned) can be represented as nothing more than following from cut-offs along a continuum. It is possible that this formal representation is not reflected in the cognitive representation. It has been speculated, for example, that humans tend to think in terms of internal essences (Medin, 1989), which would tend to predispose them toward category-like mental representations of concepts such as mental disorders, whereas the formal representation is an empirical question that may actually be dimension-like. When the feature assignment and the categorization of the elements are based on cognitive processes, however, as in expert judgments, the formal structure may be informative for how the categories are cognitively represented, although one may also want other (additional) kinds of evidence to make inferences on the cognitive representation of categories.

An interesting link between our research topic and the one from cognitive psychology is that, for both, two types of continua must be distinguished. To explain the first type, let us assume that all category exemplars are alike in that they show the category features with a probability of, say, .60 and that the features are uncorrelated. This is actually in line with the well-known latent class model (Goodman, 1972; Green, 1952; Lazarsfeld, 1950; McCutcheon, 1987). All category members are equal at the latent level in that they share common feature probabilities. The implications of the assumptions are independence of features and heterogeneity of the exemplars in terms of the features. The features are independent within the category, because they are realized through a mechanism that is independent from feature to feature, following the assumption we made. That the features are uncorrelated also means that the category has no internal structure.

Looking at the realized features, one will notice that the exemplars are heterogeneous, that they have quite different feature patterns, because of the stochastic nature of the feature realization. In fact, the probability for two exemplars to share a given feature is only .36. When a category decision is to be made, one can expect that the exemplars with more of the features (as a stochastic result) will be considered category members with more certainty, and that their typicality will be considered higher than that of exemplars with an accidentally lower number of features. The equivalent of this is the posterior probability of class membership given the feature realizations. This posterior probability continuum does not represent anything in the latent level--it merely picks up a characteristic of the realization of the latent structure. The realization of category features may perhaps not be stochastic for the categories that have been most studied (natural categories and artefact categories). The realization of features such as having wings, laying eggs, and the ability to fly are not the subject of a stochastic process behind the category of birds, but things may be different when it comes to psychiatric categories or attitude categories. The continuum that stems from the stochastic mechanism is not a latent continuum but a surface (manifest) phenomenon stemming from a random homogeneous process without any structural basis for differences in the underlying reality. Note that this stochastic mechanism does not prevent reliable typicality and membership degree ratings from being made, given that the feature patterns of the exemplars do not vary once they are generated. Category exemplars as studied in cognitive psychology do not normally change their appearance depending on the person who is doing the rating or from one moment to another, but category exemplars of psychological categories may. 
We will call the resulting kind of continuum a purely manifest continuum. It is the illusory effect of a homogeneous process, the same process that leads to independent features and lack of within-category structure.
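The homogeneous process described above can be simulated in a few lines. The feature probability of .60 comes from the text. The number of features, the number of exemplars, the random seed, and the helper name `pearson` are assumptions made only for this sketch.

```python
import random

def pearson(x, y):
    """Plain Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)                       # fixed seed, for reproducibility only
P, N_FEATURES, N_EXEMPLARS = 0.60, 10, 20000

# Every exemplar has the same latent feature probability P.
patterns = [[int(rng.random() < P) for _ in range(N_FEATURES)]
            for _ in range(N_EXEMPLARS)]

counts = [sum(row) for row in patterns]      # purely manifest continuum
r_12 = pearson([row[0] for row in patterns],
               [row[1] for row in patterns])  # near zero: no internal structure
share = P * P                                # two exemplars both show a given feature
```

The feature counts vary from exemplar to exemplar although nothing varies at the latent level, and the correlation between any two features stays near zero.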

Remarkably, the kind of categories just described is in line both with the classical view (all exemplars are alike at the latent level) and with the common belief that categories are gradual and have no clear cut-off (as they appear at the manifest level). Only one of the two types of heterogeneity mentioned earlier, however, is realized. The aspect that is neglected in prototype theory and exemplar theory is neglected here as well. Categories are heterogeneous in that not all exemplars are equally good exemplars, but not so far as the (latent) internal structure is concerned.

To explain the second kind of continuum, let us assume that the exemplars differ in the true probabilities of showing the category features. Suppose the probabilities are again high but that they depend on the exemplar (in the range from, say, .60 to .90), and that the feature realization mechanism is again independent from feature to feature. The exemplars are now heterogeneous at the latent level, because some have higher feature probabilities than others. The consequences of these assumptions are correlated features and heterogeneous feature patterns. The features are all positively correlated now because they all tend to occur more in some exemplars (because of their higher probability) and less in other exemplars (because of their lower probability). These correlations stem from differences in probability, notwithstanding the independence of the realization mechanism, which is called local or conditional independence in the statistical literature and is a basic assumption in most statistical models. The resulting categories now have an internal structure, a one-dimensional structure. When we make the more realistic assumption that the within-category feature probabilities also differ depending on the feature in the same way for all exemplars, then the stochastic version of the earlier described triangular structure would be obtained. For example, for psychiatric diagnoses it would mean that some symptoms have higher probabilities than other symptoms. Mild symptoms commonly have a higher probability than severe symptoms. When patients differ in a systematic way, some patients may have the more severe symptoms as well as the milder ones, whereas others may have only the milder symptoms. A one-dimensional internal structure can be a rather good approximation of reality, as in the case of borderline personality disorder: Sanislow et al. (2002) showed that three latent dimensions underlay the borderline symptoms from the DSM-IV, but also that the intercorrelations of these dimensions are higher than .90 and can reach even .99.
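How heterogeneous latent probabilities produce correlated features under local independence can be checked with a small simulation. The range from .60 to .90 comes from the text. The uniform distribution over that range, the number of features and exemplars, the seed, and the helper name `pearson` are assumptions of this sketch.

```python
import random

def pearson(x, y):
    """Plain Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)                 # fixed seed, for reproducibility only
N_EXEMPLARS, N_FEATURES = 20000, 10

patterns = []
for _ in range(N_EXEMPLARS):
    p = rng.uniform(0.60, 0.90)        # latent level of this exemplar (assumed uniform)
    # Given p, the features are realized independently (local independence).
    patterns.append([int(rng.random() < p) for _ in range(N_FEATURES)])

# Feature counts on two disjoint halves of the feature set.
first_half = [sum(row[:5]) for row in patterns]
second_half = [sum(row[5:]) for row in patterns]
r_halves = pearson(first_half, second_half)   # positive: a latent continuum is at work
```

The two halves share no features, and the features are realized independently within each exemplar. The positive correlation between the halves can therefore come only from the differences in latent probability.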

Looking at the realized feature patterns, three sources of differences now come into play. First, the exemplars differ randomly because of the stochastic nature of the feature realization. Second, the exemplars differ systematically because of the level of the generating probabilities. The number of category features an exemplar now shows reflects both the stochastic nature of the process and a systematic difference at the latent level. Third, the features can also have an effect. The second and the third source determine the probability a feature has for a given exemplar. This probability reflects something about the exemplar (how high its probabilities are overall) and something about the feature (how common it is). It is then possible to separate and estimate the contribution from the three sources: the stochastic source, systematic differences between exemplars, and systematic differences between features.

This idea of separating and estimating the three parts is exactly the idea behind a model from a quite different domain, called item response theory (IRT), as will be explained later. The newly derived continuum for the exemplars, their overall level of probability, is no longer a surface continuum or an illusory continuum--it is rooted in the underlying latent structure. The number of features is still a manifest continuum, but now it expresses more than a stochastic mechanism. It also reflects systematic underlying differences between the exemplars--the latent contributions of the exemplars to the feature probabilities. The continuum of the systematic underlying differences is not a manifest continuum but a latent continuum.
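One way to picture the separation of exemplar and feature contributions is to let the two add up on a logit scale. This is a deliberately simplified sketch under stated assumptions: equal discrimination for all features, hypothetical exemplar levels and feature locations, and hypothetical function names. It is not the specification developed later in the article.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def feature_probability(q_exemplar, b_feature):
    """Systematic part: logistic function of exemplar level plus feature location."""
    return 1.0 / (1.0 + math.exp(-(q_exemplar + b_feature)))

# Hypothetical latent levels of two exemplars and locations of two features.
exemplar_levels = {"low": 0.5, "high": 2.0}
feature_locations = {"mild": 1.0, "severe": -1.0}

table = {
    (e, f): feature_probability(q, b)
    for e, q in exemplar_levels.items()
    for f, b in feature_locations.items()
}

# On the logit scale the exemplar difference is the same for every feature.
gap_mild = logit(table[("high", "mild")]) - logit(table[("low", "mild")])
gap_severe = logit(table[("high", "severe")]) - logit(table[("low", "severe")])
```

The stochastic source enters only when each probability in `table` is turned into an observed feature by a random draw. The two systematic sources can then be read off as row and column effects.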

This second formal theory of categories, which implies a latent continuum, is no longer in agreement with the classical view, because the exemplars are no longer homogeneous at either the manifest level or the latent level. The theory is in clear agreement, however, with the now common belief that cognitive categories are gradual and have no clear cut-off. Furthermore, both types of heterogeneity described earlier are now realized. Categories are heterogeneous not only in that not all exemplars are equally good exemplars but also due to the internal structure. This kind of within-category structure can be linked to the notion of "fuzzy categories," as discussed by Haslam and Kim (2002) and as tested empirically with taxometric methods by Haslam and Cleland (2002). It should be clear that what we mean by a latent continuum is variation at the latent level and not just at the manifest level. From the way the fuzziness is created by Haslam and Cleland (2002), it can be concluded that this condition is fulfilled.

One can imagine more complex internal structures (e.g., Storms & De Boeck, 1997), but in comparison with no structure, the dimension-like structure is progress. The dimension-like structure is in conformity with the view of categories as typicality dimensions, as represented in the cognitive literature, in that the latent continuum is a generalization of the continuum idea for the manifest level (as implied when no within-category feature intercorrelations are assumed). It is also possible to expand this rather simple kind of internal structure when the data require us to do so, as will be explained later when extensions of Dimcat are discussed.

Thus, one way of framing the issue of whether or not a category is basically dimension-like is by asking the question whether categories have a latent continuum or just a manifest continuum. The results of the studies on categories and concepts cannot answer this question so far as the cognitive representation is concerned, because, as explained, differences in how good exemplars are as exemplars and other effects can either stem from the stochastic nature of a homogeneous latent process or from a genuinely heterogeneous latent process and a similar stochastic component as for the homogeneous process.

When categories are considered as stand-alone variables, then the distinction between category-like and dimension-like manifest categories is the distinction just described--with or without an underlying latent continuum. Although it is meaningful to consider categories on their own (e.g., how category-like is the category of trees?), a more extensive perspective is possible when categories are considered together with other categories while still considering the same features (one can always combine feature sets). It may turn out that the latent continuum for a second, third, etc. category is a different one. For example, it makes sense to describe trees as well as bushes with a common group of features. The category of trees may correspond to a continuum, so that one can conclude that the category is dimension-like. Then there are two possibilities for the category of bushes. Either bushes have a lower degree of tree-ness on the tree-continuum, or they have their own continuum, one that is different from the tree-continuum. In the former case, the category of trees is not qualitatively different from the category of bushes as far as the investigated features are concerned--the difference is only quantitative. In the latter case, bushes define a qualitatively different continuum even when the same group of features is concerned. Although both trees and bushes would correspond to a continuum, they differ qualitatively. In the former case, the categories are only quantitatively different, whereas in the latter they are also qualitatively different.

The comparison with other categories adds another feature to what it means to be category-like. The first feature was latent homogeneity versus heterogeneity. The second feature relates to quantitative versus qualitative differences. Applying this distinction, for example, to the attitude categories of being in favor of or against capital punishment, the choice is between homogeneous versus heterogeneous opinions within the two categories and between quantitative versus qualitative differences. Similarly, when borderline personality disorder is compared with histrionic personality disorder on, say, the borderline symptoms, then the categories may turn out to be homogeneous or heterogeneous, and they may differ only quantitatively (the histrionic personalities being less borderline) or also qualitatively. These two contrasts, homogeneity versus heterogeneity and quantitative versus qualitative differences, are the two basic dimensions of the framework to be presented. The comparison with alternative categories offers a methodological opportunity. We do not mean to say that the truth about a category depends on the methodology one follows.

Hereafter, in Section 2, a frame of reference is described for what it means for the latent structure behind a manifest category to be homogeneous or heterogeneous and for manifest categories to indicate qualitative differences or quantitative differences. Along with this frame of reference comes an approach for modeling data and deciding in what sense their structure is category-like or dimension-like. In Section 3, three empirical applications are described to illustrate the approach.

Section 2: A Frame of Reference for Category-Like Versus Dimension-Like Variables

Dimcat: The Dimension/Category Framework

1. Latent Heterogeneity Versus Homogeneity Within Manifest Categories

Within-category homogeneity means that all persons from the manifest category have the same location on the latent dimension. They are all equal at the latent level. Put another way, the "dimension" is collapsed to a single point (as far as individual differences are concerned). This latent homogeneity does not prevent heterogeneity at the manifest level of observed indicators, given that the realization of the indicators from the latent location is a stochastic process, in agreement with the assumption that indicators are random variables. Within-category heterogeneity means that different persons from the manifest category have different locations on the latent dimension. The distinction corresponds to the distinction between a purely manifest continuum and a latent continuum as discussed in the previous section. A purely manifest continuum corresponds to homogeneity at the latent level and a manifest category without internal structure, whereas a latent continuum implies heterogeneity at the latent level and implies that there is internal structure. In fact, three degrees of heterogeneity can be distinguished: manifest homogeneity (as in the classical view on categories), manifest heterogeneity with latent homogeneity (as in the case of categories without internal structure), and manifest heterogeneity with latent heterogeneity (as when the categories have an internal dimension-like structure). Homogeneity at the manifest level can be excluded as unrealistic, so that when we contrast heterogeneous and homogeneous manifest categories, we always refer to their homogeneity and heterogeneity at the latent level. We consider homogeneity more category-like than heterogeneity, and heterogeneity more dimension-like than homogeneity.
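The second of the three degrees, manifest heterogeneity with latent homogeneity, can be illustrated with a small numerical sketch. It assumes that every person in a category shares the same indicator probabilities and that indicators are realized independently; the probabilities and the function name are hypothetical and are not taken from the article.

```python
def sum_score_distribution(probs):
    """Exact distribution of the number of positive indicators for
    independent binary indicators with the given probabilities."""
    dist = [1.0]  # before any indicator is added, P(sum = 0) = 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for s, mass in enumerate(dist):
            new[s] += mass * (1.0 - p)  # indicator absent
            new[s + 1] += mass * p      # indicator present
        dist = new
    return dist

# Hypothetical probabilities shared by ALL persons in the category
# (latent homogeneity: one common latent location).
probs = [0.8, 0.6, 0.5, 0.3]
dist = sum_score_distribution(probs)
mean = sum(s * m for s, m in enumerate(dist))
variance = sum((s - mean) ** 2 * m for s, m in enumerate(dist))
print("P(sum = s):", [round(m, 4) for m in dist])
print("mean:", round(mean, 4), "variance:", round(variance, 4))
```

Although all persons are identical at the latent level in this sketch, the number of indicators shown still varies from person to person, which is the purely manifest continuum described above.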

2. Latent Quantitative Versus Qualitative Differences Between Manifest Categories

If manifest categories do not show between-category differences, then there is no reason to distinguish them. Let us therefore assume that they show manifest between-category differences. Regarding the within-category differences, we must differentiate between the case of heterogeneity and the case of homogeneity. (We now use these notions in their latent sense.) When the categories are heterogeneous and a latent dimension suffices to describe the heterogeneity within categories, then we would have qualitative differences when the latent dimension differs for members of different manifest categories. As explained earlier, dimensions are anchored in indicators, and they differ from one another if the discriminations or locations of the indicators are different. Taking personality disorders as an example, suppose that the difference is that the borderline symptoms define a dimension within the borderline category that is different from the dimension the same borderline symptoms define in the histrionic category. This would mean that for the same symptom, the dimension has another weight depending on the manifest category or that more of the dimension is needed in one category than in another in order to have the same probability to show the symptom (or to be assigned the symptom). Then one can reasonably claim that the borderline personality disorder is qualitatively different from the histrionic personality disorder, because the borderline dimension differs depending on the diagnostic category under consideration. The same would follow if the histrionic symptom dimension differed for histrionics and borderlines. This principle can be generalized to a joint set of symptoms and a two-dimensional structure.

When the manifest categories are homogeneous, the qualitative differences cannot concern the discrimination of the indicators, because there is nothing to discriminate within the category. Only the indicator locations remain as a potential source of qualitative differences. Given that the locations refer to the levels of the indicators, qualitative differences imply that the indicator level profiles differ from one manifest category to another in more than just the overall level. For example, the symptom profile of the histrionic personality disorder may differ from that of the borderline personality disorder in a qualitative way and not just with respect to its overall lower level of borderline symptoms.

In the case of within-category heterogeneity, quantitative differences mean that the latent dimension is the same (same discriminations and/or locations) when applied to members of different manifest categories, and that the distribution of one manifest category is located at a lower level than the distribution of the other category on the same dimension. In the case of homogeneity (no variance in person locations), quantitative differences mean that the common category level of the indicator profiles differs depending on the manifest category. For example, it would be reasonable to expect that the preponderance of borderline symptoms is higher in the borderline category than in the histrionic category. The difference is that the quantitative differences can be explained as one manifest category having more or less of the same thing as the other, whereas qualitative differences never can be explained in this way. Qualitative differences concern the anchoring of dimensions with indicators (with respect to discriminations and/or locations): Differently anchored dimensions are different. Considering the contrast between quantitative and qualitative between-category differences, qualitative differences may be considered more category-like than quantitative differences and quantitative differences more dimension-like than qualitative differences.

These two contrasts--heterogeneity versus homogeneity and qualitative versus quantitative differences--can be crossed, as in Table 1, to make a 2 x 2 classification. This classification is the framework that will later be used to explicate the relation between a category-like versus dimension-like latent structure for manifest categories.

Figure 1 gives a graphical representation of the same classification. In the upper left part, two different latent dimensions are shown, one for each of two different heterogeneous manifest categories. The heterogeneity is represented with a normal distribution for each category, although normality of the distributions is not required. In the upper right part, the heterogeneity is represented along one common latent dimension. The difference between the two manifest categories is either large (and abrupt) or small (and smooth), as will be explained hereafter. In the lower left part, two different latent dimensions are again shown, one for each manifest category, but now there are no individual differences within the manifest categories. The within-category homogeneity is represented with a narrow bar. Finally, in the lower right part, the two manifest categories are again located along one common latent dimension, but now the two manifest categories are homogeneous, as represented with two bars. Given that in the two lower parts the manifest categories are homogeneous, the between-category differences are abrupt, as will be explained hereafter.

3. Abrupt Versus Smooth Differences

Although we consider the previous two contrasts as the most important, a third contrast, abrupt differences versus smooth differences, can be defined. This contrast cannot be crossed with the other two, as is shown in Table 1. Abrupt differences are qualitative or quantitative discontinuities from one manifest category to another. Differences are necessarily abrupt when the differences are qualitative or when the manifest categories are homogeneous (and therefore do not overlap). Within-category homogeneity as well as between-category qualitative differences imply abrupt differences. Smooth differences are necessarily quantitative differences. Only within the combination of within-category heterogeneity and between-category quantitative differences can both smooth and abrupt differences occur between manifest categories. The within-category heterogeneity is indicated by a distribution of persons within the manifest category (e.g., a normal curve, as in Figure 1), and the between-category quantitative differences are indicated by the fact that the two manifest categories can be located at different points along the same dimension. In the example above, if there were no overlap between borderlines and histrionics when they were located on the borderline dimension, then this would be a clear example of an abrupt difference. One would consider this as evidence that the borderlines are a different latent category from the histrionics. A great deal of overlap between borderlines and histrionics on the borderline dimension and especially the absence of bimodality means that the difference between the two manifest categories is smooth. In the case of smooth differences, the two manifest "categories" do not seem very category-like with respect to the dimension under consideration. No overlap and bimodality would indicate abrupt differences. Abruptness implies discontinuity (e.g., Wilson, 1984, 1989). Abrupt differences are more category-like and less dimension-like than are smooth differences.

In the upper right part of Figure 1, two pairs of normal distributions are shown. The distributions on the left are rather far apart, far enough for the distributions to result in a bimodal distribution when they are added into a joint distribution. The distributions on the right are close enough to result in a unimodal joint distribution. The contrast between smooth versus abrupt differences cannot be considered a fundamental dichotomy compared to the heterogeneous/homogeneous and qualitative/quantitative dichotomies. The smooth/abrupt dichotomy is only relevant for quantitative between-category differences and is entirely based on the size of the difference between manifest categories and their within-category standard deviations. Even when there is a gap between the distribution of two manifest categories along the latent continuum, the distinction between the two is still purely quantitative and can be expressed in terms of more or less of the same thing.
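How the joint distribution of two manifest categories switches from unimodal to bimodal as the categories move apart can be sketched numerically. The example assumes two equally sized categories with normal distributions and a common standard deviation; the means, the grid-based mode count, and the function names are illustrative assumptions. For this special case, a standard result is that the joint distribution is bimodal only when the means are more than two standard deviations apart.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def count_modes(mean1, mean2, sd, steps=2000):
    """Count local maxima of the equal-weight joint density on a grid."""
    lo = min(mean1, mean2) - 4.0 * sd
    hi = max(mean1, mean2) + 4.0 * sd
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    dens = [0.5 * normal_pdf(x, mean1, sd) + 0.5 * normal_pdf(x, mean2, sd)
            for x in xs]
    # A grid point is a mode if it is strictly higher than both neighbors.
    return sum(1 for i in range(1, steps)
               if dens[i - 1] < dens[i] > dens[i + 1])

# Hypothetical category means on a common latent dimension, common sd = 1.
close_pair = count_modes(-0.5, 0.5, 1.0)  # means 1 sd apart
far_pair = count_modes(-2.0, 2.0, 1.0)    # means 4 sd apart
print("modes, close pair:", close_pair)
print("modes, far pair:", far_pair)
```

In both cases the difference between the categories is of the same kind (a shift along one dimension); only its size relative to the within-category spread changes.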

4. Simple Versus Complex Qualitative Differences

The qualitative differences between manifest categories can be either simple or complex. Simple differences are differences that can be captured with a few parameters. For example, perhaps the location of a few indicators has shifted relative to the location of the other indicators. Suppose that identity disturbance as a borderline symptom is relatively predominant among borderline patients in comparison with other borderline symptoms but that it drops down in the rank order of borderline symptoms when histrionics are considered. Such shifts or jumps in the indicator locations are called a saltus (Wilson, 1989). For example, suppose there are four indicators, with locations b1, b2, b3, b4 in the first manifest category, and with locations b1, b2, b3 + d, b4 + d in the second manifest category. The locations of indicators 3 and 4 have jumped by a value d. In a similar way, shifts can occur in the indicator discriminations. The saltus model provides a simple explanation for qualitative differences, given that only one or a few parameters are required to explain the qualitative differences. Alternatively, the qualitative differences could be complex, as when each single indicator has its own jump parameter. This would no longer look like a saltus but like an uninterpretable hodgepodge that we refer to as complex qualitative differences. Thus, qualitative differences that can be described using the saltus model are called simple qualitative differences, and those that cannot be described using the saltus model are called complex qualitative differences. If the differences are complex, then they are also large (when taken together), but simple differences can be large or small (depending on the size of d).
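The four-indicator example can be written out as a short sketch. The numerical locations, the value of d, and the helper name are hypothetical; only the pattern b1, b2, b3 + d, b4 + d comes from the text.

```python
def saltus_locations(base_locations, shifted_indicators, d):
    """Locations in the second category: selected indicators jump by d."""
    return [b + d if i in shifted_indicators else b
            for i, b in enumerate(base_locations)]

# Hypothetical locations b1..b4 in the first manifest category.
b_cat1 = [-1.0, 0.0, 1.0, 2.0]
d = 0.5  # hypothetical size of the jump

# Indicators 3 and 4 (zero-based positions 2 and 3) jump by d.
b_cat2 = saltus_locations(b_cat1, shifted_indicators={2, 3}, d=d)

# Between-category differences in location, indicator by indicator.
jumps = [b2 - b1 for b1, b2 in zip(b_cat1, b_cat2)]
print("category 2 locations:", b_cat2)
print("jumps:", jumps)

# A simple (saltus) difference needs one jump parameter: the jumps take
# at most two distinct values, 0 and d.
print("distinct jump values:", sorted(set(jumps)))
```

Under complex qualitative differences, by contrast, the list of jumps would contain as many distinct values as there are indicators.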

Formalization of Dimcat

For the formal representation of Dimcat, symptoms and diagnoses of personality disorders will be used for illustration. The data from which to start are the observations of indicators (e.g., ratings of borderline symptoms) and a manifest category (e.g., the diagnosis of a borderline personality disorder). Most often the indicators are also category-like, and often they are binary, as when symptoms are judged to be present or absent, when a response is correct or incorrect, or when a response is "agree" or "disagree." An extension to polytomous cases is also possible, as discussed below.

The notation for raw scores is as follows:
Xpik = 0, 1, with
p = 1, …, P, an index for the persons,
i = 1, …, I, an index for the indicators, and
k = 1, …, K, an index for the manifest category to which a person belongs.

When the indicators are symptoms, Xpik = 1 means that person p from category k is attributed indicator i. The notation for the manifest categories is Cp = k, meaning that person p is assigned to manifest category k.1 In this model, persons are nested within manifest categories.

The manifest category, C, can be a random variable, or it can have fixed values. In a similar way, the parameters from the model to be presented can be either random or fixed. By convention, in formulas we will not condition on C or on parameters, as the conditioning makes sense only for random variables. The fact that the formulas are not given in their conditional format does not imply, however, that C or one or more parameters cannot be random variables.

Building a Generic Formula

All models we will describe are models for the probability of a positive (= 1) response about person p from category k on dichotomous indicator i (based on self-description or other-description): P(Xpik = 1) = 1 - P(Xpik = 0). The models all share the characteristic that probabilities of this type are a function of indicator parameters such as locations: P(Xpik = 1) = f(bik), with bik being the parameter of indicator i for category k. A common type of function for binary variables is the logistic function, so that:

P(Xpik = 1) = exp(bik) / (1 + exp(bik)), (1)
or
hpik = bik , (2)

where hpik = log(P / Q), P = P(Xpik = 1), and Q = 1 - P.

It follows from Equation 2 that the b's are nothing more than logistic transformations of the probabilities of showing symptom i in category k: bik = log(P / Q). These logistically transformed probabilities can be represented as the locations of the indicators that can function as anchors on a possible latent dimension. Thus far, all persons have the same set of probabilities. The b's will also be called prevalences, as they indicate the occurrence of symptoms.
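The relation between Equations 1 and 2 can be checked with a minimal sketch: the logistic function turns a value b into a probability, and the log-odds transformation recovers b from that probability. The function names and the numerical values are illustrative assumptions.

```python
import math

def logistic(b):
    """Equation 1: probability of a positive response given b."""
    return math.exp(b) / (1.0 + math.exp(b))

def log_odds(p):
    """Equation 2 read backwards: b = log(P / Q) with Q = 1 - P."""
    return math.log(p / (1.0 - p))

# Hypothetical values of b for three symptoms in one manifest category.
b_values = [-1.0, 0.0, 2.0]
probabilities = [logistic(b) for b in b_values]
recovered = [log_odds(p) for p in probabilities]

for b, p, r in zip(b_values, probabilities, recovered):
    print(f"b = {b:5.2f}  P = {p:.4f}  log(P/Q) = {r:5.2f}")
```

A value of b = 0 corresponds to a probability of .5, and larger values of b correspond to more prevalent symptoms in this parameterization.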

Person differences can be introduced into Equation 1 by substituting qpk - bik for bik, with qpk denoting the parameter of person p from category k--this locates the persons on the same scale as the indicators; for example, this locates the patients on the same scale as the symptoms, so that the difference between the location of person p and indicator i determines the probability of a response 1 for a person in category k:

P(Xpik = 1) = exp(qpk - bik) / (1 + exp(qpk - bik)), (3)
or
hpik = qpk - bik . (4)

With the inclusion of a q parameter, the values of the b's are identified only up to an additive constant--one can add a constant to all b's on the condition that the same constant is added to all q's. Note that the minus sign in Equations 3 and 4 is in a way arbitrary--it could as easily be a plus sign, but the minus sign is the usual convention.
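The additive indeterminacy can be verified numerically. The sketch below uses hypothetical person and indicator locations and shows that adding the same constant c to all q's and all b's leaves every probability from Equation 3 unchanged.

```python
import math

def prob_eq3(q, b):
    """Equation 3: probability of a positive response for location q and b."""
    return math.exp(q - b) / (1.0 + math.exp(q - b))

# Hypothetical person locations (q) and indicator locations (b).
q_values = [-1.0, 0.0, 1.5]
b_values = [-0.5, 0.2, 1.0]
c = 3.0  # arbitrary constant added to all q's and all b's

original = [[prob_eq3(q, b) for b in b_values] for q in q_values]
shifted = [[prob_eq3(q + c, b + c) for b in b_values] for q in q_values]

# Largest absolute change in any probability after the shift.
max_change = max(abs(o - s)
                 for row_o, row_s in zip(original, shifted)
                 for o, s in zip(row_o, row_s))
print("largest change in probability:", max_change)
```

Because only the difference qpk - bik enters the model, some convention (for example, fixing the mean of the b's) is needed to pin down the scale origin.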

For the example of personality disorders, qpk reflects the location of person p on the latent dimension as it applies to borderline symptoms in diagnosis k; the higher qpk is, the higher the probability that person p shows those symptoms. The parameters bik reflect the location of the indicators (symptoms) on the same latent borderline dimension; the higher -bik (the lower bik) the higher the probability that indicator i has value 1 in manifest category k (i.e., that symptom i occurs in manifest category k). The prevalence of symptom i is now reflected by -bik.

One further complication is that the location of a person on a latent dimension is not equally important for all indicators. For indicator 1, a person's location along the dimension may be less important than for indicator 2. For example, symptom 1 may be more strongly related to the borderline dimension than symptom 2. To reflect this difference, the equation is adapted as follows:

P(Xpik = 1) = exp(aikqpk - bik) / (1 + exp(aikqpk - bik)), (5)
or
hpik = aikqpk - bik, (6)

where aik denotes the weight of qpk in determining the probability of the Xpik values.

Equations 5 and 6 represent the two-parameter logistic (2PL) model (Birnbaum, 1968) for a manifest category k and are the most general model that we will use to illustrate Dimcat. Note that in the formulation of the 2PL as in Equation 6, for each indicator a category-specific linear regression equation is obtained with the underlying within-category dimension as a predictor, with aik as its weight, and with -bik as an intercept (see our earlier presentation of the IRT model in comparison with the factor model).
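A minimal sketch of Equations 5 and 6 with category-specific parameters follows. The parameter values and category labels are hypothetical and serve only to show how the same person location can yield different probabilities for the same symptom in different manifest categories.

```python
import math

def prob_2pl(q, a, b):
    """Equation 5: P(X = 1) for person location q, discrimination a,
    and location b. 1 / (1 + exp(-eta)) equals exp(eta) / (1 + exp(eta))."""
    eta = a * q - b  # Equation 6
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical parameters (aik, bik) of ONE symptom i in two categories k.
params = {
    "borderline": {"a": 1.5, "b": 0.0},
    "histrionic": {"a": 0.7, "b": 1.0},
}

# Same person location in both categories, for comparison.
q = 1.0
probs = {k: prob_2pl(q, par["a"], par["b"]) for k, par in params.items()}
for category, p in probs.items():
    print(f"{category}: P(X = 1) = {p:.4f}")
```

With a = 1 the function reduces to Equation 3, and with a = 0 the person location no longer matters, which is the homogeneous case of Equation 1 up to the sign convention for b.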

All other models that we will consider follow from restrictions on Equation 6. This model is general in the sense that it can generate all cases in the framework of Table 1 by choosing the appropriate restrictions.

Descriptive Dimensions

We use the term descriptive dimension for the location of the indicators on a dimension. A descriptive dimension is defined by a set of indicator locations. As such it is neutral with respect to the latent category versus latent dimension contrast. It is only when we introduce restrictions on within-category heterogeneity and between-category qualitative differences, and on the equality of indicator locations (b's) and indicator discriminations (a's) depending on the manifest categories (Ck's), that differences in latent structure are obtained.

Before formulating these restrictions on Equation 6 in order to obtain distinct latent structures, we will make use of that equation to characterize two important but distinct features of a dimension: location equivalence and discrimination equivalence. Both types of equivalence are necessary for two dimensions to be identical (i.e., for dimension equivalence). A latent dimension is defined by the location of the indicators and, if individual differences appear, then also by the weights of the indicators.

1. Equivalent descriptive dimensions must have equal locations for the indicators. The latter will be called location equivalence. Because the location parameters are identified only up to an additive constant, location equivalence refers to equality of the location parameters only up to an additive constant, implying that the differences between the indicator locations on the latent dimension are crucial for location equivalence2. One wants the locations of the indicators to be the same independent of the manifest category. If not, then the marks on the scale do not correspond, so that the meaning of the dimensions also differs between the manifest categories. This first aspect of a latent dimension is independent of individual differences among persons.

2. When there are individual differences, a second kind of equivalence becomes relevant. The quality of an indicator depends on how well it differentiates between different positions on the dimension. Any indicator must therefore differentiate equally well on each of the dimensions in order for the dimensions to be equivalent. If the weight of a dimension for an indicator i depends on which group is considered, then the dimension in one group is not equivalent with the dimension in another group. Note that the discriminations are identified only up to a multiplicative constant: multiplying the discriminations by a constant is compensated by dividing the variance of the underlying dimension by the squared value of the same constant. One wants the dimensions to play an equal role in the indicators, independent of the group. If not, then the differentiation capacity of an indicator depends on the manifest category, which implies that the meaning of the dimensions differs between the manifest categories. The second aspect of a descriptive dimension, aik, is traditionally called the discrimination parameter or slope parameter. Equality of discrimination parameters is called discrimination equivalence. This second aspect of a descriptive dimension makes sense only if there are individual differences among persons, because the a's are the weights of latent individual differences (in terms of q).
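The two kinds of equivalence can be expressed as simple checks on parameter values. The sketch below is an illustration under stated assumptions, not a statistical test: it compares given parameter values and ignores sampling error, it treats locations as equivalent up to an additive constant, and, by analogy, it treats discriminations as equivalent up to a common positive multiplicative constant. The function names and the example values are hypothetical.

```python
def centered(values):
    """Remove the additive constant by subtracting the mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def location_equivalent(b_k, b_k2, tol=1e-9):
    """Locations are equal up to an additive constant."""
    return all(abs(x - y) <= tol
               for x, y in zip(centered(b_k), centered(b_k2)))

def discrimination_equivalent(a_k, a_k2, tol=1e-9):
    """Discriminations are equal up to a common multiplicative constant
    (assumes all discriminations are nonzero)."""
    ratios = [x / y for x, y in zip(a_k, a_k2)]
    return all(abs(r - ratios[0]) <= tol for r in ratios)

# Hypothetical indicator parameters in two manifest categories.
b_cat1, b_cat2 = [0.0, 1.0, 2.0], [3.0, 4.0, 5.0]  # same spacing, shifted
a_cat1, a_cat2 = [1.0, 2.0, 0.5], [2.0, 4.0, 1.0]  # proportional

loc_ok = location_equivalent(b_cat1, b_cat2)
disc_ok = discrimination_equivalent(a_cat1, a_cat2)
print("location equivalence:", loc_ok)
print("discrimination equivalence:", disc_ok)
print("dimension equivalence:", loc_ok and disc_ok)
```

With estimated parameters, the comparison would have to take estimation uncertainty into account, for example by comparing the fit of models with and without equality restrictions.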

The notions of location equivalence and discrimination equivalence are related to the notions of factorial equivalence, measurement invariance, and differential item functioning (DIF). Factorial equivalence is of relevance here, because Takane and de Leeuw (1987) showed that the factor model results when the normal-ogive function is used in place of the logistic function used above, and because the two functions are practically identical except for a different slope. Often in factor analysis one is not interested in the means, and the model is then formulated for within-category deviation values (with a mean of zero), so that factorial equivalence is limited to the factor loadings. We will refer to this notion of factorial equivalence as factorial equivalence in the limited sense. Both Reise et al. (1993) and Meredith (1993), however, pointed out that the full factor model includes an explanation for the means, so that factorial equivalence in this broader (and full) sense includes location equivalence as well. Reise et al. (1993) distinguished between full invariance and partial invariance (see Byrne, Shavelson, & Muthén, 1989). Both are related to the factor loadings, independently of the factor variances and covariances. Full invariance means category-invariant loadings for all variables, whereas partial invariance implies that a substantial number of the loadings are invariant so that a common metric can still be used. For binary indicators, the factor analytic or structural equation model (SEM) for binary items would be equivalent to an IRT model, but of the normal-ogive type instead of the logistic type (Muthén, 1984; Bock, Gibbons, & Muraki, 1988). A logistic variant is described by McKinley and Reckase (1983).

As to measurement invariance, Reise et al. (1993) referred to the same two aspects we have discerned in dimension equivalence. Meredith (1993) started from a definition by Mellenbergh (1989) stating that the cumulative distribution function of the measurement indicators may not depend on external factors beyond the underlying latent variables one assumes to explain the indicators. Simply stated, the measurement of intelligence, for example, may depend only on intelligence and not also on external factors, such as one's ethnicity. Invariance refers to all aspects of the cumulative distribution (expected value, variance, and higher moments), and implies both location and discrimination equivalence.

Lack of location equivalence is called uniform differential item functioning (DIF) in test theory, and lack of both location equivalence and discrimination equivalence is called non-uniform DIF (e.g., Holland & Wainer, 1993; Mellenbergh, 1982). Methods to detect DIF are described in the literature (Holland & Wainer, 1993; Millsap & Everson, 1993), but some of the DIF tests do not distinguish between unequal locations and discriminations.

In summary, discrimination equivalence refers to the indicator-specific slope of the equation (aik). Location equivalence refers to the indicator-specific intercept of the equation (-bik). Location equivalence is sometimes not investigated in empirical studies in the literature, because the factor model is used not in its full formulation but rather for deviation-transformed variables (Reise et al., 1993).

Types of Latent Structure

We will present the unrestricted latent structure followed by three constrained latent structures. Note that the structure that is presented here as the unrestricted structure (i.e., the generic) is still restricted, in that it corresponds to a 2PL model for each manifest category. For example, it is assumed that the structure is unidimensional within each manifest category (but not necessarily between manifest categories). Extensions to less-constrained, "unrestricted" cases (e.g., multidimensionality within manifest categories) are discussed below.

1. In the first type of latent structure (corresponding to the upper left cell in Table 1), the latent dimensions are qualitatively different depending on the manifest category, and the persons are heterogeneous within manifest categories. An example would be categories of athletes defined on the basis of the kind of sport, with performance levels as indicators. These categories would be between-sports categories with performance indicators. Within each category there are clear and systematic quantitative differences in their performances, and from one category to the other there certainly are qualitative differences in the kind of performances at which they are good.

As far as the modeling is concerned, no restrictions on Equation 6 are introduced, and it is therefore reflected in the general Equation 6, where for k ≠ k', the bik are allowed to differ from the bik', and the aik are allowed to differ from the aik'. This first type will serve as the reference type in the presentation of the other types, given that all others can be defined as restrictions on this one. In this first type, there is continuity within each qualitatively distinct category. Because the latent dimension differs depending on the manifest category k, the differences between manifest categories are qualitative. Both the indicator locations (bik) and the indicator discriminations (aik) are allowed to be category-specific. Because individual differences among persons, as expressed in qpk, are allowed, the manifest categories are heterogeneous. A special case is one with category-specific locations but common discriminations. Note that this type of difference would not be detected when factorial equivalence in the limited sense was the only criterion used to detect qualitative differences. In this case, the locations of the indicators are category-dependent but their discriminative power is not category-dependent.

2. In the second type of latent structure (corresponding to the upper right cell in Table 1), the latent dimensions are quantitatively different depending on the manifest category, and the persons are heterogeneous within manifest categories. An example would be the categorization into a professional and a nonprofessional category of athletes within the same sport. One can expect that both professionals and non-professionals differ in how well they perform at various contests, but the professionals would be clearly better overall than the non-professionals. These categories would be within-sport categories with performance indicators.

The second type of latent structure differs from the first in only one respect: For any pair of manifest categories, k ≠ k', the location of the manifest categories may differ only along a common underlying dimension. As a result of the absence of qualitative differences, all b's and all a's of each of the indicators are equal over manifest categories: bi1 = ... = bik = ... = biK = bi, and ai1 = ... = aik = ... = aiK = ai. The second type can be formulated as follows:

$$\eta_{pik} = \alpha_i \theta_{pk} - \beta_i \tag{7}$$

with $\beta_i$ denoting the common location parameters, with $\alpha_i$ denoting the common discrimination parameters, and with $\mu_{\theta k} \neq \mu_{\theta k'}$ for $k \neq k'$.

Depending on how the manifest categories are distributed along the dimension, the differences between the manifest categories may be abrupt or smooth. There is no clear-cut criterion to distinguish between smoothness and abruptness, but two criteria that are often associated with abrupt differences are lack of overlap and bimodality. As discussed above, these criteria, however, are less straightforward than one might think.

First, much depends on the kind of distribution one wants to assume for the two categories. For example, lack of overlap can also look perfectly smooth, as when persons within each category are distributed uniformly and the two distributions touch but do not overlap. Second, much depends on whether one looks at the manifest level or the latent level. For example, Grayson (1987) showed that, depending on the discriminations and on the locations of the indicators, it is possible for a bimodal distribution of sum scores to result from a unimodal distribution of person locations ($\theta$'s). Although Grayson did not demonstrate the opposite--how a unimodal distribution of sum scores can result from a bimodal distribution of person locations ($\theta$'s)--this is possible as well. Everything depends on the locations and discriminations.
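The first of these possibilities can be made concrete with a small numerical sketch. It assumes a logistic link, a standard normal (hence unimodal) distribution of person locations, and ten hypothetical indicators that share a high discrimination and a common location. These values and the function name are chosen for illustration only and are not taken from Grayson (1987).

```python
import math

def sum_score_distribution(alpha, beta, n_items, grid_size=2001):
    """Marginal sum-score distribution for n_items identical indicators
    when person locations follow a discretized standard normal."""
    step = 12.0 / (grid_size - 1)
    thetas = [-6.0 + j * step for j in range(grid_size)]
    weights = [math.exp(-0.5 * t * t) for t in thetas]
    total = sum(weights)
    weights = [w / total for w in weights]  # weights now sum to 1
    dist = [0.0] * (n_items + 1)
    for t, w in zip(thetas, weights):
        # Assumed logistic link applied to eta = alpha * theta - beta.
        p = 1.0 / (1.0 + math.exp(-(alpha * t - beta)))
        for s in range(n_items + 1):
            # Binomial probability of s positive responses given theta.
            dist[s] += w * math.comb(n_items, s) * p ** s * (1.0 - p) ** (n_items - s)
    return dist

# Hypothetical indicators: high discrimination, all located at zero.
dist = sum_score_distribution(alpha=8.0, beta=0.0, n_items=10)
for score, mass in enumerate(dist):
    print(score, round(mass, 3))
```

Under these assumptions the largest probabilities fall on the two extreme sum scores, although the person distribution has a single mode.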

3. In the third type of latent structure (corresponding to the lower left cell in Table 1), the latent dimensions are qualitatively different depending on the manifest category, and the persons are homogeneous within manifest categories. This type of latent structure differs from the first in only one respect: the manifest categories are homogeneous in their latent structure. An example would be the categories of athletes defined on the basis of their knowledge of the basic rules of the sport they practice. Within each category there is homogeneous knowledge of the basic rules (they all know the basic rules), although when questioned one may give a wrong answer now and then. The differences between the categories are qualitative in that the athletes differ in the kind of rules they know depending on the sport they practice. These categories are between-sport categories with rule knowledge indicators.

As a result of the homogeneity restriction, all $\theta$'s within the same category are equal: $\theta_{pk} = \theta_{p'k} = \theta_k$ for all pairs of persons $p$ and $p'$ and for all values of $k$. In this type, the manifest categories do not have any dimension-like character: they are qualitatively different between categories and perfectly homogeneous. There is still an ordering possible for the indicators, but this means nothing more than that the probability of a certain response for a given indicator is different than the probability for other indicators. Note that when there are no individual differences within a manifest category, there is no longer any basis for using a discrimination parameter. The third type can therefore be formulated as follows:

$$\eta_{pik} = \theta_k - \beta_{ik} \tag{8}$$

where for any pair of manifest categories, $k \neq k'$, the $\beta_{ik}$ may differ from the $\beta_{ik'}$, with $\theta_k$ denoting the location of all persons $p$ with $C_p = k$.

4. In the fourth type of latent structure (corresponding to the lower right cell in Table 1), the latent dimensions are quantitatively different depending on the manifest category, and the persons are homogeneous within manifest categories. This type of latent structure differs from the first in two respects: the manifest categories are homogeneous (like the third type), and the differences between the manifest categories are quantitative (like the second type). An example would be the categories of persons who do versus do not play basketball. Those who play basketball would know all the basic rules, and those who do not would also be rather homogeneous in their lack of knowledge. They may guess and be correct on some of the rules, but no major systematic differences would exist. So the difference between the two categories is quantitative. The former category simply has a much higher knowledge than the latter. These categories are within-sport categories with rule knowledge indicators.

As a result, all person locations ($\theta$'s) within the same manifest category are equal: $\theta_{pk} = \theta_{p'k} = \theta_k$ for all pairs of persons $p$ and $p'$, and for all values of $k$, as in the third type; all indicator locations ($\beta$'s) are also equal: $\beta_{i1} = \dots = \beta_{ik} = \dots = \beta_{iK} = \beta_i$. Again there is no basis for using a discrimination parameter. In this fourth type, homogeneous manifest categories are located within a latent dimension. The fourth type can be formulated as follows:

$$\eta_{pik} = \theta_k - \beta_i \tag{9}$$

with $\beta_i$ denoting the common location parameters.

Given that the manifest categories are homogeneous, the differences between the manifest categories are by definition abrupt.
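The four types can be read as restrictions on a single linear predictor. The following Python sketch shows which quantities carry the category index $k$ in each type. It assumes that Equation 6 has the form $\alpha_{ik}\theta_{pk} - \beta_{ik}$ and that a logistic link is used; all parameter values and function names are hypothetical.

```python
import math

def eta(theta, beta, alpha=1.0):
    """Linear predictor alpha * theta - beta for one person and one indicator."""
    return alpha * theta - beta

def prob(eta_value):
    """Assumed logistic link: probability of a positive response."""
    return 1.0 / (1.0 + math.exp(-eta_value))

# Hypothetical values for one indicator i in two manifest categories (k = 1, 2).
alpha_ik = {1: 1.2, 2: 0.7}   # category-specific discriminations
beta_ik = {1: -0.5, 2: 0.8}   # category-specific locations
alpha_i, beta_i = 1.0, 0.2    # common discrimination and common location
theta_pk = 0.6                # location of one person p (heterogeneous types)
theta_k = {1: 1.0, 2: -1.0}   # shared location of all persons in category k

k = 2
type1 = eta(theta_pk, beta_ik[k], alpha_ik[k])  # general form, no restrictions
type2 = eta(theta_pk, beta_i, alpha_i)          # Equation 7
type3 = eta(theta_k[k], beta_ik[k])             # Equation 8
type4 = eta(theta_k[k], beta_i)                 # Equation 9

for name, value in [("Type 1", type1), ("Type 2", type2),
                    ("Type 3", type3), ("Type 4", type4)]:
    print(name, round(value, 3), round(prob(value), 3))
```

In the homogeneous types the person index disappears from the predictor, which is why no discrimination parameter is used there.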

Degrees of Being Dimension-Like Versus Category-Like

Considering being category-like a matter of degree and believing hybrid structures to be common, Waller and Meehl (1998) stated, "Taxonicity does not preclude dimensionality… the convenient dichotomy taxonic-vs.-dimensional should, strictly speaking, read 'taxonic-dimensional vs. dimensional only'" (p. 9). Haslam and Kim (2002) also drew attention to the possibility that the distinction "between matters of kind and matters of degree, itself [might] be a matter of degree" (p. 311), pointing also to an early acknowledgement of this view by Meehl (1979). There are two reasons for thinking of degrees of being dimension-like versus category-like. The first reason is that the features that define the four types of latent structure are crossed, such that most of the latent structures are defined by some features that are category-like and some that are dimension-like. The second reason is that the features that define the four types of latent structure are often realized only imperfectly and are thus matters of degree.

The only type of latent structure that is thoroughly category-like is the homogeneous qualitative difference structure (Type 3). All other structures are at least partly dimension-like. The heterogeneous quantitative difference structures (Type 2) are thoroughly dimension-like if the differences between manifest categories are smooth. If the differences between manifest categories are abrupt, meaning that each manifest category has a distribution that is different enough along the single latent dimension, then heterogeneous quantitative differences are a hybrid structure. The second type of hybrid structure is the heterogeneous qualitative difference structure (Type 1), which is category-like because the differences between manifest categories are qualitative yet is also dimension-like because there is heterogeneity of persons within a descriptive dimension for each manifest category. The third type of hybrid structure is the homogeneous quantitative difference structure (Type 4), which is category-like because there is homogeneity of persons within the manifest categories yet is also dimension-like because the manifest categories differ as to their locations on a common descriptive dimension.

The features that define what it means to be category-like versus dimension-like can be realized to a stronger or weaker degree. That is, within-category heterogeneity, between-category qualitative differences, and abrupt between-category differences can be small or large. For within-category heterogeneity to be small or large means that the within-category variance is small or large, respectively. What it means for between-category differences to be small or large is simple where the differences are quantitative: Small versus large differences correspond, respectively, to small versus large Cohen's d values (a standardized effect size measure), given that the distributions are normal (or symmetrical). The extent of qualitative differences is more complex. Finally, whether abrupt differences are small or large depends on the size of the two previous types of heterogeneity and on the distributional properties (e.g., bimodality, degree of overlap). The gradualness we stress here implies that the differences between the four types of latent structures are gradual and not absolute. This is completely in line with the overall idea behind the framework that being categorical is not itself categorical.

Qualitative differences between manifest categories are complex when they are very general and difficult to explain--not restricted to a few indicators or to a few principles. For locations, qualitative differences can be thought of as jumps of indicators on the descriptive dimension when going from one manifest category to another. Either only a few indicators jump from one location to another, or many indicators jump. And when many make a jump, they may jump in one or a few groups over the same distance, or the size and direction of the jumps may depend on the indicator. When the jumps can be easily summarized as the jumps of only a few indicators or of one or a few groups, then the qualitative differences can be considered simple. When the jumps have a complex pattern, however, the qualitative differences can be considered complex. When indicators jump in groups, an index $s$ may be used to indicate the subset of indicators making identical jumps. For example, $\beta_{ik} = \beta_{ik'} + \delta_{kk's}$, with $\delta_{kk's}$ denoting the jump from category $k$ to category $k'$ of symptoms belonging to subset $s$. This means that indicator $i$ belongs to subset $s$, and that all indicators belonging to that subset $s$ make identical jumps of size $\delta_{kk's}$ when going from manifest category $k$ to manifest category $k'$. The idea of describing qualitative differences between dimensions with jump or saltus parameters stems from Wilson's (1989) saltus model. Qualitative differences can be considered simple if a few saltus parameters suffice to describe these qualitative differences. For example, Janssen, De Boeck, Viaene, and Vallaeys (1999) showed that the easy mental addition problems on a dimension for mentally retarded children (manifest category 1) jump downwards when located on the dimension of non-retarded children (manifest category 2), whereas the difficult problems keep their location. This means that the easy problems are easier for the non-retarded than for the retarded, whereas the difficult problems are equally difficult for both groups. The interpretation was that, in contrast to the non-retarded, the retarded had not yet automated the easier problems. In this case, the saltus parameter can be interpreted as the effect of automation.
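The saltus idea can be sketched in a few lines of Python: the locations in one manifest category are obtained from the locations in another by adding one shift per subset of indicators. The indicator names, subset labels, and numerical values below are hypothetical and are not the estimates reported by Janssen et al. (1999).

```python
def saltus_locations(beta_ref, subset_of, jump):
    """Shift every reference location by the jump of the subset s
    to which the indicator belongs."""
    return {i: b + jump[subset_of[i]] for i, b in beta_ref.items()}

# Hypothetical indicator locations in one manifest category.
beta_ref = {"easy1": -1.5, "easy2": -1.0, "hard1": 1.0, "hard2": 1.5}
# Assignment of indicators to subsets that make identical jumps.
subset_of = {"easy1": "easy", "easy2": "easy", "hard1": "hard", "hard2": "hard"}
# One saltus parameter per subset: easy indicators shift, hard ones stay.
jump = {"easy": 0.8, "hard": 0.0}

beta_other = saltus_locations(beta_ref, subset_of, jump)
for indicator, location in beta_other.items():
    print(indicator, round(location, 2))
```

Here four category-specific locations are summarized by two saltus parameters, one of which is zero, which is the sense in which such qualitative differences are simple.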

Saltus was originally a model for discovering latent classes that explained observed jumps in indicator locations ($\beta$'s) from one latent class to another (Wilson, 1989), but Wilson (1984) also described a model using manifest categories, the manifest saltus model. In our data, for example, the personality disorder diagnoses are manifest categories. Saltus parameters can be used to describe qualitative differences parsimoniously if the data conform to the model.

Empirical Methodology, Modeling, and Software

Based on Dimcat, empirical procedures can be used to test the category-like versus dimension-like nature of a concept. The observables are indicators and manifest categories. In the simplest case, only one manifest category is considered. This means that only one feature of the framework is relevant: the within-category homogeneity versus heterogeneity. In the more complex case, more than one manifest category is considered. This also allows for investigating qualitative versus quantitative differences and smooth versus abrupt differences as features of the latent structure. For example, borderlines could be contrasted with histrionics, {borderlines, histrionics}--or with normals, {borderlines, normals}--or with both, {borderlines, histrionics, normals}.

For the within-category aspects as well as for the between-category aspects, the methodology is relative. The results depend on the choice of indicators and of alternative manifest categories. In order to study within-category homogeneity versus heterogeneity, a set of indicators is needed. For personality disorder categories, symptoms are an evident choice, but a difficult issue is how one can make sure that all relevant symptoms are included. For other types of manifest categories, it is often less evident of what kind the indicators should be. The choice of indicators (features) is also a difficult issue in the cognitive study of concepts and categories (Murphy, 2002, pp. 45-46). In general, one can never be certain whether the crucial indicators are included in the study. On the other hand, an a fortiori type of reasoning applies. If for the indicators that are chosen within-category heterogeneity is found, then one can conclude against homogeneity. The a fortiori argument is that the manifest category will remain heterogeneous when other indicators are added. If, however, a manifest category turns out to be homogeneous, then the conclusion can change if other indicators are added, given that these new indicators may reveal the heterogeneous nature of the manifest category.

For the between-category aspects, the conclusions may also depend on the choices one makes. The latent structure may be category-like in contrast with one alternative manifest category (for example, borderlines in contrast with normals) but not in contrast with a second alternative manifest category (for example, borderlines in contrast with histrionics). When there is more than one manifest category, another complication is which indicators one should consider: indicators of one of the manifest categories (and of which one?) or indicators of all manifest categories. For example, one may investigate the manifest categories "borderlines versus histrionics" with a set of borderline symptoms as indicators, with a set of histrionic symptoms as indicators, or with a set comprised of both. The result may depend on which set of indicators is being used.

One cannot give an absolute answer to the general question whether the borderline personality disorder is category-like or dimension-like, because the answer may depend on the methodology: the indicators and the other groups one wants to consider (e.g., normals? which other personality disorders?). Thus, being category-like is not only a matter of degree, it is also relational in the way it is investigated. If only one manifest category is considered in isolation, then the relational character is less evident, because as soon as heterogeneity is found, the conclusion is that the manifest category is heterogeneous. If, however, a manifest category is studied in the context of other manifest categories, then the position on the horizontal axis depends on which other manifest categories are considered. It may turn out that a diagnosis A shows qualitative differences with diagnosis B, but only quantitative differences with diagnosis C. A general conclusion is not possible in that case, only a relative one: In relation to diagnosis B, diagnosis A shows qualitative differences, but not in relation to diagnosis C.

Modeling Strategy

In order to find out which type of latent structure applies, one should distinguish the horizontal and vertical axes of Table 1 and Figure 1. In Table 1, the structure is more category-like the further one goes to the left on the horizontal axis (toward qualitative differences) and the further one goes to the bottom on the vertical axis (toward homogeneity). A flow chart for assessing distinctions in the framework appears in Figure 2.

As a basis for the analysis we start from a general formulation of the model for all manifest categories to be considered, with the first category ($k = 1$) as the reference category:

$$\eta_{pik} = (\alpha_{i1} + \alpha'_{ik})\theta_{pk} - (\beta_{i1} + \beta'_{ik}) + \gamma_k \tag{10}$$

with $\theta_{pk} \sim N(0, \sigma^2_k)$, where

$\alpha_{i1}$ is the discrimination of indicator $i$ for the descriptive dimension of manifest category 1,
$\alpha'_{ik}$ is the deviation of $\alpha_{ik}$ from $\alpha_{i1}$ ($\alpha'_{ik} = \alpha_{ik} - \alpha_{i1}$), so that $\alpha'_{i1} = 0$,
$\beta_{i1}$ is the location of indicator $i$ for the descriptive dimension of manifest category 1,
$\beta'_{ik}$ is the deviation of $\beta_{ik}$ from $\beta_{i1}$ ($\beta'_{ik} = \beta_{ik} - \beta_{i1}$), so that $\beta'_{i1} = 0$,
$\gamma_k$ is the main effect of manifest category $k$, that is, the difference between the mean location of category $k$ and that of category 1.

Because the mean of the $\theta_{pk}$ is fixed to zero in all manifest categories $k$, $\gamma_k$ indicates the category main effect. If the within-group variance ($\sigma^2_k$) is free and possibly different depending on the manifest category, then for one item the discriminations should be restricted to be equal in all manifest categories in order for the model to be identified. In a similar way the location of one item must be restricted to be equal in all manifest categories if the means of $\theta$ are zero and a $\gamma_k$ is used for main effects of manifest categories. Equation 10 will be the basis for all the analyses.

The horizontal axis: Quantitative differences versus qualitative differences. Qualitative differences can be of two types: differences in discrimination and differences in location.

1. If there are no discrimination differences between the manifest categories, then $\alpha'_{ik} = 0$ for all values of $i$ and $k$. This implies discrimination equivalence and factorial equivalence in the limited sense.
2. If there are no location differences between the manifest categories, then $\beta'_{ik} = 0$ for all values of $i$ and $k$. This implies location equivalence.

Discrimination equivalence and location equivalence are restrictions on Equation 10.

The following order of analyses is proposed. First, the general model of Equation 10 will be estimated without any restrictions. This model is called QUAL1&2-HET, because it describes qualitative differences of both types (QUAL1&2), and because the manifest categories are allowed to be heterogeneous. Second, the discriminations are restricted to be equal over the manifest categories (discrimination equivalence), yielding model QUAL2-HET, because qualitative differences are allowed only for the locations. The QUAL1&2-HET and QUAL2-HET models are variants of a Type 1 structure. Third, the locations are restricted to be equal over manifest categories (location equivalence), yielding model QUAN-HET, because only quantitative differences remain (expressed in $\gamma_k$). The QUAN-HET model is a Type 2 model.

Heterogeneous quantitative differences (Type 2) are nested within heterogeneous qualitative differences (Type 1), and within Type 1, QUAL2-HET is nested within QUAL1&2-HET. We have chosen to estimate three models of decreasing complexity: QUAL1&2-HET, QUAL2-HET, and QUAN-HET, omitting the fourth possible model: QUAL1-HET, a model with location equivalence without discrimination equivalence. We believe it makes sense to restrict the discriminations first, because their estimation is less reliable than the estimation of the locations. Models QUAL1-HET (which will not be tested) and QUAL2-HET are not nested one into the other.
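The sequence of restrictions can be sketched as follows. The function evaluates the linear predictor of Equation 10 for one person and several indicators, under the assumption that the location term is subtracted, as in Equations 7 to 9; the three calls differ only in which deviations are set to zero. All parameter values and names are hypothetical.

```python
def eta_eq10(theta, alpha_ref, alpha_dev, beta_ref, beta_dev, gamma):
    """Linear predictors for all indicators of one person in category k.

    alpha_ref, beta_ref: parameters of the reference category (k = 1)
    alpha_dev, beta_dev: deviations of category k from the reference
    gamma: main effect of category k (zero for the reference category)
    Assumption: the location term is subtracted, as in Equations 7 to 9.
    """
    return [
        (a1 + a_dev) * theta - (b1 + b_dev) + gamma
        for a1, a_dev, b1, b_dev in zip(alpha_ref, alpha_dev, beta_ref, beta_dev)
    ]

alpha_ref = [1.0, 1.3, 0.8]
beta_ref = [-0.5, 0.0, 0.7]
zeros = [0.0, 0.0, 0.0]
theta = 0.4

# QUAL1&2-HET: discriminations and locations both deviate in category 2.
# The first indicator has zero deviations, mirroring the identification
# restriction on one item.
eta_qual12 = eta_eq10(theta, alpha_ref, [0.0, -0.4, 0.3],
                      beta_ref, [0.0, 0.6, -0.4], 1.5)
# QUAL2-HET: discrimination equivalence, locations still deviate.
eta_qual2 = eta_eq10(theta, alpha_ref, zeros, beta_ref, [0.0, 0.6, -0.4], 1.5)
# QUAN-HET: location equivalence as well; only the main effect remains.
eta_quan = eta_eq10(theta, alpha_ref, zeros, beta_ref, zeros, 1.5)
# Reference category: all deviations and the main effect are zero.
eta_ref = eta_eq10(theta, alpha_ref, zeros, beta_ref, zeros, 0.0)

print(eta_qual12, eta_qual2, eta_quan, eta_ref)
```

Under QUAN-HET every indicator differs from the reference category by the same constant, which is what makes the between-category difference purely quantitative.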

In case of within-category homogeneity, the same logic applies, but now for homogeneous models (HOM instead of HET). The homogeneous models parallel their heterogeneous equivalents with respect to which models are nested within one another. The QUAL1&2-HOM and QUAL2-HOM models are variants of the Type 3 structure, whereas QUAN-HOM corresponds to Type 4.

The vertical axis: Heterogeneity versus homogeneity. As for the investigation of heterogeneity versus homogeneity, when the restriction of $\theta_{pk}$ to have zero variance for all values of $k$ is imposed, models that parallel the heterogeneous ones are obtained: QUAL1&2-HOM, QUAL2-HOM, and QUAN-HOM. The homogeneous models are nested within their heterogeneous counterparts. Homogeneous qualitative differences (Type 3) are nested within heterogeneous qualitative differences (Type 1), and homogeneous quantitative differences (Type 4) are nested within heterogeneous quantitative differences (Type 2). Note that it is possible that one of the manifest categories is homogeneous and the other is not. This is not a serious complication, as it would mean for example that $\sigma^2_1 \neq 0$, whereas $\sigma^2_2 = 0$, which is a less severe restriction than when both variances are restricted to zero.

In a preliminary and exploratory investigation of heterogeneity, one can use an internal consistency index, Cronbach's $\alpha$, in each manifest category. High values of this coefficient are an indication of heterogeneity. Low values, however, can have two, possibly combined, causes: low heterogeneity and multidimensionality. Cronbach's $\alpha$ can be tested for statistical significance and can thus also be used in a hypothesis-testing approach.
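As an illustration of this exploratory step, the following Python sketch computes Cronbach's $\alpha$ from a persons-by-indicators table within a single manifest category, using the standard formula based on the indicator variances and the variance of the sum scores. The data are made up for the example.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of persons, each a list of indicator scores."""
    n_items = len(scores[0])
    # Sample variance of each indicator across persons.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of the persons' sum scores.
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1.0 - sum(item_variances) / total_variance)

# Hypothetical binary symptom data for six persons in one manifest category.
category_scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha_value = cronbach_alpha(category_scores)
print(round(alpha_value, 3))
```

The coefficient is undefined when all persons have the same sum score, because the total variance is then zero; that limiting case corresponds to perfect homogeneity on the chosen indicators.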

Smooth versus abrupt differences. To distinguish within the top right cell of Table 1 between smooth versus abrupt differences, we will simply plot the distributions of the $\theta$ in the different manifest categories in order to inspect the joint distribution for the presence of more than one mode. One can either derive the distribution from the estimated distribution parameters, or one can first estimate the individual $\theta_{pk}$ and then plot a histogram of these (taking $\gamma_k$ into account in both cases). We will use the latter method (with empirical Bayes estimates for the $\theta_{pk}$), because it reflects the data better than the theoretical distributions. Note that the differentiation between smooth and abrupt is limited to a Type 2 structure.

Simple versus complex qualitative differences. To distinguish within the top left cell of Table 1 between simple versus complex qualitative differences, we will investigate whether the lack of location equivalence can be reduced to a few saltus parameters. In principle this method could be followed for discriminations as well as for locations, although it was originally formulated for locations. In our applications, however, discrimination equivalence was found, so the method was applied to locations only.

Statistical approaches to testing. A first aspect of testing is whether a model fits the data in an absolute sense, independently of a comparison with other models. We will follow two approaches to deal with this problem. In one application, a bootstrap method is used, and in the other applications, a Pearson $\chi^2$-test is used for an equivalent conditional maximum-likelihood (CML) formulation of the selected model, because the CML framework has nicer statistical properties when it comes to testing absolute goodness of fit (Glas, 1988). Given that the issue here is to select the best-fitting model in order to identify the most appropriate latent structure (Type 1, 2, 3, 4), the absolute goodness of fit is less important than the relative goodness of fit.

Second, a broad range of methods is available to test relative goodness of fit. The first kind of test is the likelihood-ratio test. This test is based on $-2\log L$ ($L$ for likelihood), also called the deviance. The test compares the deviance value of two models, one of which is nested into the other. The difference of the two deviances is asymptotically $\chi^2$-distributed with a number of degrees of freedom equal to the reduction in the number of parameters of the nested model. Unfortunately, this test is no longer valid if one or more of the restrictions includes a boundary value, such as a variance that is fixed to zero. If the test is used nevertheless to test such zero-variance models, then the result is conservative, as explained by Verbeke and Molenberghs (2000). The authors present a correct alternative, but we will use the conservative test as it did not make a difference in the applications whether the correct or the conservative test was used. The regular likelihood-ratio test can be used for the horizontal axis, but for the vertical axis (to distinguish between heterogeneity and homogeneity), the conservative test will be used. The regular likelihood-ratio test can also be used to test simple versus complex qualitative differences, because the saltus models are a reduced form of qualitative differences.
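The likelihood-ratio test can be sketched in Python with only the standard library. The deviances and degrees of freedom below are hypothetical. The last function uses a 50:50 mixture of chi-square distributions with 0 and 1 degrees of freedom, a commonly used correction when a single variance is fixed to zero; treating it as the alternative meant by Verbeke and Molenberghs (2000) for the models discussed here is an assumption.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over j < df/2 of (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return total

def likelihood_ratio_test(deviance_restricted, deviance_general, df):
    """Deviance difference of two nested models and its chi-square p value."""
    stat = deviance_restricted - deviance_general
    return stat, chi2_sf(stat, df)

def boundary_p_value_one_variance(stat):
    """Assumed 50:50 mixture of chi-square(0) and chi-square(1)."""
    return 0.5 * chi2_sf(stat, 1) if stat > 0 else 1.0

# Hypothetical horizontal-axis comparison with 4 restricted parameters.
stat, p_value = likelihood_ratio_test(2500.0, 2490.0, 4)
# Hypothetical vertical-axis comparison: one variance fixed to zero.
p_regular = chi2_sf(3.0, 1)
p_mixture = boundary_p_value_one_variance(3.0)
print(round(stat, 1), round(p_value, 4), round(p_regular, 4), round(p_mixture, 4))
```

The mixture p value is half the regular one, which illustrates why the regular test is conservative when a variance is tested against zero.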

An important problem with model selection is that the more complex models by definition have a higher chance to fit the data, whereas the simpler models are more parsimonious. A good balance of the two qualities is desirable. This explains the popularity of so-called information criteria. The Akaike (1973) information criterion (AIC) or Schwarz's (1978) Bayesian information criterion (BIC) can be used to compare models, while taking their complexity into account. Both the AIC and the BIC penalize models for a higher number of parameters, the penalization being more severe in the BIC, because it increases with the log of the number of persons, so that the BIC tends to favor the simpler models more than the AIC, especially for a large sample size. For both the AIC and the BIC, a model is better if the value of the criterion is lower.
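The two criteria can be computed directly from the deviance, the number of parameters, and the number of persons, using the standard definitions (deviance plus 2 per parameter for the AIC, deviance plus the log of the number of persons per parameter for the BIC). The deviances, parameter counts, and sample size in this Python sketch are hypothetical.

```python
import math

def aic(deviance, n_params):
    """Akaike information criterion: deviance plus 2 per parameter."""
    return deviance + 2 * n_params

def bic(deviance, n_params, n_persons):
    """Bayesian information criterion: deviance plus log(N) per parameter."""
    return deviance + n_params * math.log(n_persons)

# Hypothetical (deviance, number of parameters) for three nested models.
models = {
    "QUAL1&2-HET": (2480.0, 30),
    "QUAL2-HET": (2490.0, 22),
    "QUAN-HET": (2520.0, 15),
}
n_persons = 500

for name, (deviance, n_params) in models.items():
    print(name, round(aic(deviance, n_params), 1),
          round(bic(deviance, n_params, n_persons), 1))

best_aic = min(models, key=lambda m: aic(*models[m]))
best_bic = min(models, key=lambda m: bic(*models[m], n_persons))
print(best_aic, best_bic)
```

With these made-up numbers the AIC selects the intermediate model and the BIC the simplest one, in line with the heavier penalty of the BIC.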

It is also possible to test individual parameter values against their null hypothesis value, using Wald tests--dividing a parameter estimate by its standard error. The resulting statistic follows a t-distribution, but for a high number of observations (as in our applications) it can be interpreted as a z-statistic (and asymptotically it is one). Like the likelihood-ratio test, however, this test is conservative for the null hypothesis of zero variance.
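A Wald test of a single parameter can be sketched as follows, using the large-sample normal approximation mentioned in the text. The estimate and standard error are hypothetical, for instance a location deviation for one indicator.

```python
import math

def wald_test(estimate, standard_error, null_value=0.0):
    """Wald z statistic and two-sided p value under the normal approximation."""
    z = (estimate - null_value) / standard_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided

# Hypothetical estimate of a location deviation and its standard error.
z, p = wald_test(0.60, 0.25)
print(round(z, 2), round(p, 4))
```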

The general question can be raised how well the four types of structure can be differentiated in an empirical study. This is a highly relevant question, but it cannot be answered in general, because all depends on how large the true qualitative between-category differences and the within-category variances are. We have stressed that the four types are not discrete categories but that the differences are a matter of degree--the differences can be small or large. If they are small, then by definition it will be difficult to differentiate. The differentiation issue concerns for example the power of IRT to detect differential item functioning and to reject a lack of heterogeneity. The models we will estimate (from Type 1 to Type 4) are instantiations of the nonlinear mixed models (Davidian & Giltinan, 1995; McCulloch & Searle, 2001), and as such they share the qualities and problems of this category of models. We will come back to the issue of differentiating capacity in the General Discussion in light of the results we obtained.

Software

Two kinds of software are available for testing distinctions in the Dimcat framework: general statistical software and IRT-specific software. In order to estimate indicator parameters while getting rid of person parameters, some programs assume a particular distribution of persons, usually a normal distribution (i.e., they use marginal maximum-likelihood [MML] estimation) (e.g., Adams, Wilson, & Wu, 1997; Mislevy, 1984), whereas other programs make no assumptions about the distribution of persons (i.e., they use conditional maximum-likelihood [CML] estimation) (e.g., Molenaar, 1995). A mid-point between these is the use of very flexible distributions such as a histogram distribution (Adams, Wilson, & Wu, 1997). The software available for model estimation with MML includes general statistical software for nonlinear mixed models--for example, SAS PROC NLMIXED (SAS Institute Inc., 1999)--and IRT-specific software such as BILOG (Mislevy & Bock, 1989), MULTILOG (Thissen, 1997), and CONQUEST (Wu, Adams, & Wilson, 1998). BILOG and MULTILOG require a normal distribution of persons and allow estimation of indicator discriminations, whereas CONQUEST allows a non-normal distribution of persons and assumes known but possibly unequal indicator discriminations.

An alternative is loglinear modeling, which uses CML estimation of indicator parameters. Using CML, the general program LOGIMO (Kelderman & Steen, 1993) can perform IRT loglinear analyses, and the IRT-specific program OPLM (Verhelst, Glas, & Verstralen, 1994) can perform IRT analyses. Both allow for a priori differences in indicator discriminations but not for the estimation of discrimination parameters. In the absence of a theory that specifies discrimination values a priori, such methods as pre-exploring the data (OPLM includes a subroutine for this purpose) could result in good approximate discrimination values.

Given that SAS is a widely used software package, we will use SAS PROC NLMIXED (SAS Institute, 1999). This procedure was developed for nonlinear mixed models (McCulloch & Searle, 2001). The item response models we described are of this type (Rijmen, Tuerlinckx, De Boeck, & Kuppens, 2003). Our models are nonlinear in two ways: because of a nonlinear link function (e.g., a logistic function or a normal-ogive function), and because they are not linear in the parameters, as when products of parameters appear in the model (as in $\alpha_{ik}\theta_{pk}$). The models are mixed because they contain fixed effect parameters as well as random effect parameters. The $\alpha$'s, $\beta$'s, and $\gamma$'s are fixed effect parameters in that they do not vary at random over individuals, but $\theta_{pk}$ is a random parameter. The nonlinear mixed models are generalizations of linear regression models. SAS PROC NLMIXED provides not just the logistic variants of the models but also the normal-ogive variants, so that the factor-analytic versions of the models can also be estimated. It is shown in the Appendix how the estimation of models based on Equation 10 can be set up in SAS PROC NLMIXED. One can also consult Appendixes A and B, which are available in the online version of Rijmen et al. (2003) in PsycARTICLES.

Extensions

The Dimcat framework can be extended in at least three ways.

1. The first extension is to allow for multidimensionality within manifest categories. This requires that $\theta_{pk}$ be given a dimension index: $\theta_{pkr}$, $r = 1, \dots, R$. Note that as presented the framework already allows for multidimensionality between manifest categories (such a structure would fall on the left side of Table 1). In order to deal with multidimensionality within manifest categories, one either assigns indicators to specific dimensions, or one estimates the discriminations of indicators on each dimension (using dimension-specific weights, $\alpha_{ikr}$, with $r$ indicating the dimension: $r = 1, \dots, R$). In the latter case, the problem of unreliable estimates of discriminations becomes serious, because there are now $R$ sets of discriminations per manifest category, and possibly $K \times R$ sets for the total.

2. The second extension is to allow for polytomous indicators (instead of only binary indicators). Although several models for polytomous variables can be incorporated into the framework, robustness of estimation is improved when the structure of the indicators is specified in advance. For example, in the rating scale model (Andrich, 1978), the steps from one category to another do not depend on the indicator, but in the partial credit model (Masters, 1982), a different location is specified for each response option within each indicator.

3. The third extension is to allow for latent categories (instead of only manifest categories). Latent categories cannot simply be identified on the basis of manifest variables. This extension implies a reformulation of the models in terms of latent classes (Mislevy & Wilson, 1996; Rost, 1990, 1991; Wilson, 1989). The latent classes do not necessarily correspond to the manifest categories--that is, the latent classes approach does not guarantee that the categorical variable of interest will emerge. Consequently, issues regarding the manifest categories cannot be dealt with directly. Furthermore, because latent classes are not defined a priori, they require interpretation before they can be labeled. A generalized approach to formulating such problems was described by Pirolli and Wilson (1998).

Except for the latent class extension, the extended models can in principle be estimated with SAS PROC NLMIXED, but in practice a model with a high dimensionality will prove difficult to estimate. Other IRT software is also available, but it would lead us too far afield to give an overview, and high dimensionality is also a problem for those programs.

Classical Methods to Distinguish Between Qualitative Differences and Quantitative Differences

Instead of using an IRT approach, as we presented, one can concentrate on other methods to distinguish between category-like and dimension-like latent structures. An early and popular method for distinguishing qualitative differences from quantitative differences was checking for multimodality at the manifest level. If two or more manifest categories are investigated, and the joint distribution of the sum scores has multiple modes corresponding to the different manifest categories, then this is considered a clear sign that the manifest categories are qualitatively different. This method has often been applied to investigate the category-like nature of personality disorders (e.g., Kass, Skodol, Charles, Spitzer, & Williams, 1985; Livesley, Jackson, & Schroeder, 1992; Nestadt et al., 1991; Zimmerman & Coryell, 1990). In none of these studies was any evidence found for multimodality. As explained above, this criterion is equivocal. Multimodality shows only that there are large between-category differences at the manifest level, but the difference at the latent level can be either quantitative or qualitative--and, if quantitative, multimodality does not necessarily apply to the latent level. Conversely, a lack of multimodality can occur even when the differences between manifest categories are qualitative. One reason for the popularity of multimodality may be the implicit assumption that multimodality at the manifest level is induced by multimodality at the latent level. As discussed above (see Grayson, 1987), this assumption may be mistaken.

A second method for distinguishing qualitative differences from quantitative differences is checking factorial equivalence in its limited sense across manifest categories. If the same factor loadings are found in different manifest categories, then it is concluded that the latent structure is dimensional. This method has been applied quite often in the study of personality disorders, with the result that a dimensional structure seems appropriate (e.g., Livesley, Schroeder, Jackson, & Jang, 1994; Livesley & Schroeder, 1990; Tyrer & Alexander, 1979). From the approach we have developed, however, it is clear that factorial equivalence in its limited sense is important but is only half of the story. Factorial equivalence in its limited sense is not sufficient to conclude that a latent structure is thoroughly dimension-like; even when factorial equivalence in its limited sense is fulfilled, the manifest categories can still be qualitatively different, because location equivalence is also necessary. In other words, strict factorial equivalence as defined by Meredith (1993) is required.

A third method for distinguishing qualitative differences from quantitative differences is the taxometric approach developed by Meehl (1973, 1995). Taxometric methods have been applied to many psychological variables, including borderline personality disorder (e.g., Trull, Widiger, & Guthrie, 1990), dissociation (e.g., Waller, Putnam, & Carlson, 1996; Waller & Ross, 1997), worry (e.g., A. M. Ruscio, Borkovec, & Ruscio, 2001), depression (e.g., Haslam & Beck, 1994; A. M. Ruscio & Ruscio, 2002; J. Ruscio & Ruscio, 2000), sexual orientation (e.g., Gangestad, Bailey, & Martin, 2000; Haslam, 1997), and personality (e.g., Gangestad & Snyder, 1985, 1991; Strube, 1989). The main findings were summarized by Haslam and Kim (2002). They conclude that several psychopathological variables are "taxonic" (the term used in taxometrics for category-like), such as schizotypy and the antisocial personality disorder, whereas other variables are "nontaxonic" (dimension-like), such as depression. As for personality variables, Type A personality seems taxonic, whereas the five-factor model traits and the Jungian traits seem nontaxonic.

The taxometric method called MAXCOV-HITMAX (Waller & Meehl, 1998) is based on two assumptions: (a) between categories, the indicators are correlated, and (b) within categories, the indicators are not correlated. As a consequence of these two assumptions, the sum of the indicators can be a good indicator of category membership. Persons with high sum scores will mostly belong to one category, and persons with low sum scores will mostly belong to the other category. On the other hand, persons with moderate sum scores can come from both categories. Therefore, it is expected that the covariance between pairs of indicators will show a curvilinear relation with the sum score of the remaining indicators. In practice, the sum score is divided into intervals, and the covariances are determined for pairs of indicators within each interval. The interval with the maximum covariance (MAXCOV) is called the HITMAX interval. If the curve is flat, then the conclusion is that the latent structure is not category-like but dimension-like.
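The MAXCOV procedure described above can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the implementation of Waller and Meehl (1998): the function name maxcov_curve, the use of five equal-width intervals, the continuous indicators, and the simulated two-category data (base rate about .50, separation of 2 standard deviations, independence within categories) are all hypothetical choices made for this example.

```python
import numpy as np

def maxcov_curve(X, i, j, n_intervals=5):
    """Covariance of indicators i and j within intervals of the sum of the other indicators."""
    X = np.asarray(X, dtype=float)
    rest = np.delete(X, [i, j], axis=1).sum(axis=1)       # sum score of the remaining indicators
    edges = np.linspace(rest.min(), rest.max(), n_intervals + 1)
    interval = np.digitize(rest, edges[1:-1])             # interval index 0 .. n_intervals - 1
    covs = np.full(n_intervals, np.nan)
    for k in range(n_intervals):
        members = interval == k
        if members.sum() > 2:                             # skip nearly empty intervals
            covs[k] = np.cov(X[members, i], X[members, j])[0, 1]
    return covs

# Hypothetical data: two latent categories, indicators independent within each category.
rng = np.random.default_rng(0)
n, n_indicators, separation = 4000, 6, 2.0
category = rng.integers(0, 2, size=n)                     # 0 or 1, base rate about .50
X = separation * category[:, None] + rng.normal(size=(n, n_indicators))

covs = maxcov_curve(X, 0, 1)
hitmax = int(np.nanargmax(covs))                          # interval with the maximum covariance
print(np.round(covs, 2), "HITMAX interval:", hitmax)
```

With category-like data of this kind, the covariance should peak in the interval where the two categories are mixed and stay near zero in the extreme intervals; a flat curve would point toward a dimension-like structure.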

Taxometric methods were later extended from a pairwise approach to a multivariate approach. Either the first eigenvalue in a principal components analysis is used as a criterion instead of the covariance between pairs of indicators (the MAXEIG-HITMAX method) (Waller & Meehl, 1998), or the distribution of factor scores on the first factor is checked for multimodality (the L-Mode method) (Waller & Meehl, 1998). Waller and Meehl (1998) showed that the HITMAX and L-Mode methods are formally equivalent for the case of homogeneous taxa.

As a way to detect which indicators to select, the MAMBAC method was developed (Meehl & Yonce, 1994). For each pair of potential indicators, one is chosen as the cut indicator and the other as the criterion indicator. A moving cut-off point on the cut indicator is used to investigate the differentiation on the criterion variable. The cut-off splits a group in two. The difference in the mean of the criterion in these two groups measures the differentiation capacity of the cut. The difference is called MAMBAC (Mean Above Minus Below a Cut), and its maximum indicates the optimal cut to differentiate. It can be shown that for discrete categories, the MAMBAC is inverse U-shaped for valid category indicators.
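The moving-cut logic of MAMBAC can be illustrated with a short sketch. It is only a sketch under stated assumptions: the function name mambac_curve, the number of cuts, the minimum group size, and the simulated two-category data are hypothetical and are not taken from Meehl and Yonce (1994).

```python
import numpy as np

def mambac_curve(cut_indicator, criterion, n_cuts=50, min_group=500):
    """Mean of the criterion above a cut minus the mean below it, for a series of moving cuts."""
    order = np.argsort(cut_indicator)                     # sort persons on the cut indicator
    crit = np.asarray(criterion, dtype=float)[order]
    n = crit.size
    # Cut positions, leaving at least min_group persons on either side of every cut.
    positions = np.linspace(min_group, n - min_group, n_cuts).astype(int)
    return np.array([crit[p:].mean() - crit[:p].mean() for p in positions])

# Hypothetical data: two discrete categories and two valid indicators of category membership.
rng = np.random.default_rng(1)
n, separation = 10000, 2.0
category = rng.integers(0, 2, size=n)
x = separation * category + rng.normal(size=n)            # cut indicator
y = separation * category + rng.normal(size=n)            # criterion indicator

curve = mambac_curve(x, y)
peak = int(np.argmax(curve))                              # position of the optimal cut
print("peak at cut", peak, "of", curve.size, "with difference", round(float(curve[peak]), 2))
```

For these simulated discrete categories the curve rises toward the middle cuts and falls off toward both ends, which is the inverse U-shape mentioned above.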

Beauchaine and Beauchaine (2002) have investigated the MAXCOV method on its qualities for cluster analysis. The comparison with k-means clustering was positive for MAXCOV in that it outperformed k-means clustering in the more difficult circumstances (small number of indicators, small sample size, small difference between categories, high nuisance correlations, and small base rates). When the effect size (difference between the two categories) was relatively small (in terms of Cohen's d around .80), however, MAXCOV failed as well. Beauchaine and Beauchaine (2002) noted that successful applications with an effect size smaller than 1.25 (Cohen's d) have not been reported in the literature. It is noteworthy that in their simulation study they generated independent indicators and then added noise to the correlations. Independence is the truth (the ideal), but it can be distorted somewhat in reality, and this does not seem to be a great detriment to MAXCOV. Remember that independence of indicators implies absence of within-category structure and therefore within-category homogeneity at the latent level.

The taxometric approach focuses on whether taxa are homogeneous, which corresponds to the lower portion of Table 1 (i.e., the types that show within-category homogeneity). Within-category homogeneity is called an auxiliary assumption (Waller & Meehl, 1998, p. 17), because it is an ideal situation; nevertheless, simulation studies have shown that violations of this assumption can occur without detrimental effects for the approach (Waller & Meehl, 1998). Moderate correlations within categories (i.e., moderate within-category heterogeneity) do not hamper the application and power of the taxometric approach (Meehl & Golden, 1982). Large correlations within categories (i.e., large within-category heterogeneity) can be handled using an extension of the MAXCOV approach (Meehl, 1995). Still, the basic idea is that manifest categories are relatively homogeneous by comparison with the between-category differences. Concentrating on relative homogeneity as the concept of category-likeness is reasonable; homogeneity is also a basic assumption in the latent class model. The extension of the taxometric approach to categories that are heterogeneous at the latent level would be similar to the vertical axis of Dimcat.

Another feature of the taxometric approach is that it is limited to applications in which only two categories are investigated (Beauchaine & Beauchaine, 2002). Because two homogeneous categories can always be explained by one bipolar factor, it makes sense that the distinction between qualitative differences and quantitative differences between categories has not been drawn in this literature. All differences are considered quantitative, like differences between factor scores. Our approach makes more stringent requirements on quantitative differences, because all indicators are assumed to have positive discriminations (a's). By explicitly considering the possibility of non-quantitative differences, our approach complements the taxometric approach.

Given that Dimcat applies to binary indicators, it is perhaps specialized in a type of indicators that is sometimes considered problematic for the use of taxometric methods (Miller, 1996; Ruscio, 2000). According to Haslam and Kim (2002), about half of the studies to date make use of dichotomous indicators, and they concluded that taxometric methods are valid for dichotomous indicators as well, but they cautioned that large sample sizes are required, and they recommended that "researchers should use continuous indicators whenever possible, but not shrink from using dichotomous indicators when there is no alternative" (p. 306). This recommendation contrasts with the fact that Dimcat applies equally well to dichotomous and polytomous indicators (but not yet to continuous indicators in the way it is elaborated here).

In sum, various methods relate to our approach, and each stresses one aspect of category-like structure. They are either based on an underlying concept of category-likeness as showing abrupt between-category differences (multimodality), discrimination equivalence (factorial equivalence in its limited sense), or relative homogeneity along a latent dimension (MAXCOV). Implicit in all of these approaches is the assumption of a mainly monothetic definition of category-likeness (but see the earlier quotation from Waller and Meehl [1998, p. 9]). The difference with our approach is that we explicitly include all of these aspects of category-likeness within a broader framework. A category can be category-like in different ways, and a dimension can also be dimension-like in different ways. In this polythetic definition of category-likeness, being category-like is both complex and a matter of degree.

We realize that Dimcat is not exhaustive. Extensions are possible, as indicated above. The aspects we have stressed are not to be seen as absolute, and from a different perspective one can stress other aspects. We will now illustrate Dimcat as described here with three applications.

Section 3: Three Applications

In this section, we describe applications to (a) personality disorders, (b) attitudes toward capital punishment, and (c) stages of cognitive development. In all three applications, manifest categories were defined, either on the basis of expert judgment (by clinicians for personality disorders, by respondents for attitudes) or on the basis of segmentation (for developmental stages).

Application 1: Dramatic, Erratic Personality Disorders

In psychiatry, one used to think of disorders as categories of persons with a typical pattern of symptoms, called a syndrome. The categorical view, however, came under attack, especially with regard to personality disorders (e.g., Livesley et al., 1994; Widiger, 1992). First, patients within a category showed heterogeneous symptoms. Second, disorders seemed to come in degrees, both within the category and in comparison with the absence of the disorder. A twofold reaction to these findings has included (a) revision of the diagnostic system and (b) research on the category/dimension issue.

Psychiatric diagnosis has come to rely primarily on matching of features on a list provided by the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV; American Psychiatric Association, 1994). The syndromes defined by such features are supposed to be atheoretical and purely descriptive. The categories of the DSM-IV are not categories in the classical sense defined by singly necessary and jointly sufficient criteria; rather, they are more akin to prototypes, because they are defined by showing a certain number of features from a list, with each feature typically being equally weighted.

Researchers have shown how a prototype approach can be applied directly to the classification of psychopathology. For example, the prototype view has been contrasted with the classical view of psychiatric diagnosis (Cantor, Smith, French, & Mezzich, 1980). A method for deriving prototypes of psychopathology and for clarifying disagreement in psychiatric judgments has been developed (Horowitz, Post, French, Wallis, & Siegelman, 1981; Horowitz, Wright, Lowenstein, & Parad, 1981). A method for using prototypes to construct scales with superior predictive validity has been developed (Broughton, 1984). A prototype approach has been applied to the classification of borderline personality disorder (Clarkin, Widiger, Frances, Hurt, & Gilmore, 1983). Indeed, the concept of mental disorder itself has been speculated to comprise a prototype (Lilienfeld & Marino, 1995).

Overall, the prototype approach has become quite popular as a conceptual model for the domain of psychological disorders. The prototype has to do with the concept of the disorder held by, for example, clinicians or laypersons. A different approach has been taken to answering the question of the structure of mental disorders in reality--namely, the taxometric approach (e.g., Meehl, 1995). The taxometric approach focuses on identifying whether a disorder is category-like or dimension-like at the latent level.

The DSM-IV (APA, 1994) reflects a revision such that diagnosis is based on showing a critical number of symptoms from a list, independently of the specific symptoms shown. This approach allows for heterogeneous symptom patterns, on the condition that they come from the list of symptoms associated with the disorder. The DSM-IV authors did not go so far as to reject the idea of categories altogether. The disorders are still categories, but the categories show within-category heterogeneity. One may wonder what is the basis for resistance against giving up the notion of personality disorder categories altogether. The resistance may be inspired by a cognitive bias toward thinking in categories, which may lead some to feel that categories of personality disorders tally with their experience of reality. Social psychology has a tradition of theories based on the assumption that people tend to categorize other people (Tajfel, 1981; Wilder, 1981), and this is also the view in cognitive psychology (Malt, 1993; Smith, 1995). This argument has been invoked by Beauchaine and Waters (2003) to cast doubt on methods that are based on ratings.

The issue of whether disorders are category-like or dimension-like has become a topic of research and debate. A large majority of studies reject the categorical view in favor of the dimensional view. The essence of the dimensional view is that persons with the disorder simply have a more extreme location on a dimension as compared with persons without the disorder. Three main empirical arguments have been presented for the dimensional view of personality disorders. First, personality disorders do not show bimodality (e.g., Kass et al., 1985; Nestadt et al., 1991; Zimmerman & Coryell, 1990). Second, personality disorders show factorial equivalence in its limited sense (e.g., Livesley et al., 1992). Third, personality disorders do not show relative homogeneity as derived from the MAXCOV methodology (e.g., Trull et al., 1990). The issue we are raising is deeper than the formal issue of whether one should treat personality disorders as category-like, dimension-like, or some combination. If substantial qualitative differences exist, then the meaning of a symptom differs depending on the group to which a person belongs. Thus, the issue has consequences for both the theory and assessment of symptoms and syndromes of psychopathology. A consequence for diagnostic purposes is that a simple score based on symptoms, such as a sum score, can no longer be compared from one group to another.

In the present study (based on Maesschalck, 1998), we focused on borderline personality disorder (BPD) as compared with two other personality disorders of Cluster B (the dramatic, erratic cluster): histrionic personality disorder (HPD) and antisocial personality disorder (APD). These three disorders were compared with respect to the DSM-IV symptoms for BPD. In this connection, we noted earlier that one aspect of the dimension/category issue is relativity to the groups compared.

Some words of caution are needed to see the study in the correct light. First, we used a particular selection of indicators, and the results may depend on the indicators considered. This is a basic feature of our approach and of all other approaches. This is what we meant by deeming the approach "relational."

Second, we used ratings by clinicians. Ratings do not necessarily reflect the truth, and especially not when it is assumed, as Beauchaine and Waters (2003) did, that people tend to view others in terms of categories. These authors point to the possible effects of implicit typologies. They were able to create a mindset in the participants that had effects on how category-like the ratings were (assessed with a taxometric technique), which illustrates the possibility that results can be affected by prior beliefs about the structure of a domain (category-like or dimension-like). This brings us to the approach we took in the introduction when we described the cognitive approach to categories. Because we relied on expert ratings, we cannot claim more than cognitive relevance of the results.

Third, the manifest categories are not mutually exclusive. In psychopathology, overlap is called comorbidity. In our study we excluded overlap among the three personality disorder categories, so the conclusions refer to the pure categories. It seems reasonable that if the pure diagnoses are category-like or dimension-like, then there is a category-like or dimension-like core, respectively. Adding cases with multiple diagnoses could modify the picture in two ways. First, the overlap creates new manifest categories, so-called conjunctive categories, and their relation with the indicators cannot simply be derived from what is true for the component categories (Storms, De Boeck, Hampton, & Van Mechelen, 1999). Treating a multiple diagnosis as a separate manifest category is difficult, however, because for three diagnostic categories, there are four types of multiple diagnosis, and it becomes difficult to obtain broad enough samples for a reasonably strong conclusion. Second, mixed categories may better reflect reality, but it would be hard to tell whether the conclusions are not solely based on the cases with overlap.

Method

Participants. The sample comprised 370 Dutch-speaking Belgians from 30 inpatient, outpatient, and prison facilities: 122 diagnosed with BPD, 123 with HPD, and 125 with APD. The BPD group included 74% females, the HPD group 77% females, and the APD group 14% females. With regard to marital status, the BPD group included 65% singles, the HPD group 43% singles, and the APD group 54% singles.

Manifest categories. Axis I and II diagnoses were made in the three weeks after first admission or consultation, by one or more diagnosticians, usually including a senior psychiatrist. These diagnoses, which defined the manifest categories, were instances of expert judgment.

Indicators. Each patient was also rated by a clinician other than those on the initial diagnostic team on a list of nine DSM-IV symptoms of BPD. The rating clinicians were unaware of the original diagnosis. Note that this methodological feature of the study favors its objectivity, but at the same time makes it less relevant from a cognitive perspective. In order to draw conclusions of a cognitive kind, it is to be preferred that the same persons rate the indicators and do the categorization. Symptom ratings were based on information from charts, staff meetings, and contacts between the clinician and the patient. The symptoms were presented in a random order to be judged on a 4-point scale from 0 (least severe) to 3 (most severe). In the instructions, scale points 0 and 1 were defined as non-pathological, whereas scale points 2 and 3 were defined as pathological. Responses were later dichotomized, such that 0 and 1 were recoded as 0 (less severe, non-pathological), and 2 and 3 were recoded as 1 (more severe, pathological).

Analyses. As preliminary analyses we investigated within-category homogeneity using the internal consistency of the BPD symptoms within each diagnostic group (BPD, HPD, and APD). A Cronbach's α significantly larger than zero would indicate that the symptom attribution at the latent level was different among patients, making the manifest category heterogeneous. Second, we also applied a group-wise principal-components analysis (PCA) (Kiers, 1990) to explore the dimensionality of the symptoms in the three diagnostic groups. If the percentage of explained variance for a common solution across the entire sample were about equal to the percentage of explained variance for the separate solutions within each manifest category, then we can assume that the structure is equivalent in the three diagnostic groups. Furthermore, if a borderline component appears with no extremely low loadings for some of the symptoms, then all symptoms will be used for the Dimcat analyses, but if some symptoms have extremely low loadings, then they will be excluded from further analyses.
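The link between internal consistency and within-category heterogeneity can be made concrete with a small sketch. The function cronbach_alpha implements the standard coefficient; the two simulated groups, their sizes, and the logistic response probabilities are hypothetical and serve only to illustrate the reasoning, not to reproduce the analyses reported here.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a persons x items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()        # sum of the item variances
    total_var = items.sum(axis=1).var(ddof=1)             # variance of the sum score
    return k / (k - 1) * (1.0 - sum_item_var / total_var)

rng = np.random.default_rng(2)
n_persons, n_items = 5000, 9

# Homogeneous group (hypothetical): every person has the same symptom probabilities.
homogeneous = (rng.random((n_persons, n_items)) < 0.4).astype(int)

# Heterogeneous group (hypothetical): a latent person variable shifts all symptom probabilities.
theta = rng.normal(size=(n_persons, 1))
prob = 1.0 / (1.0 + np.exp(-(theta - 0.4)))
heterogeneous = (rng.random((n_persons, n_items)) < prob).astype(int)

alpha_hom = cronbach_alpha(homogeneous)
alpha_het = cronbach_alpha(heterogeneous)
print(f"alpha, homogeneous group: {alpha_hom:.2f}; heterogeneous group: {alpha_het:.2f}")
```

In the homogeneous group the coefficient should be close to zero, whereas systematic latent person differences produce a clearly positive value.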

The full modeling approach was followed next, as explained in the section on Modeling Strategy, making use of SAS PROC NLMIXED for the dichotomized data. In order to test absolute goodness of fit, we used a bootstrap approach (Efron & Tibshirani, 1993). One of the aspects investigated is how well the correlations between indicators within each group could be explained from the model. Because we used unidimensional models within each diagnostic group, this bootstrap of correlations is also a test on the unidimensionality of the heterogeneous within-category structure. As mentioned earlier, Sanislow et al. (2002) presented a multidimensional model (but with extremely high correlations among the dimensions), so that we want to make sure we do not have to expand our model to be multidimensional as well (within each of the manifest categories). Note that it is possible to find unidimensionality within manifest categories, although the single dimension is different depending on the manifest category, implying that for the total group the model is multidimensional. When the persons belong to different manifest categories and a joint analysis is performed, one can conclude that the structure is multidimensional, whereas in fact it is unidimensional within each manifest category. Such a result would be perfectly in agreement with a Type 1 structure.

Results

First, in all three diagnostic groups, a statistically significant Cronbach's α was found for the BPD symptoms. The values were .49 for BPD, .61 for HPD, and .67 for APD (all p's < .01). The Cronbach's α of .65 (p < .01) for the total sample is not really relevant, because it is also based on between-category variance. Because the person differences with respect to the BPD symptoms within each diagnostic group were systematic, the diagnostic groups were heterogeneous as to how borderline-like they were.

Second, the percentage of variance explained by a PCA within each group separately (based on the 4-point scale ratings) was about as high as when a common solution was imposed on all three diagnostic groups (about 40%). Two of the nine BPD symptoms, "inappropriate anger" and "impulsivity in two areas," did not reach a loading of .30 on the BPD component and were therefore removed from further analyses, so that seven symptoms remained. The symptoms were not removed due to group differences in discrimination but due to overall low discrimination. The two eliminated symptoms did not belong to a common factor in the study by Sanislow et al. (2002), so the kind of multidimensionality found in that study cannot explain the poor results for the two symptoms. Other symptoms belonging to the same factor had rather large factor loadings in our study.

Following the strategy presented in Figure 2 and explained in the section on Modeling Strategy, we began by investigating the nature of the between-category differences, based on three models. Using a likelihood-ratio test, it was found that the goodness of fit of the QUAL2-HET model was not statistically significantly worse than that of the QUAL1&2-HET model (χ²[12] = 14.3, p > .10). This means we can assume discrimination equivalence. When the QUANT-HET model was compared with the QUAL2-HET model, however, it turned out that its goodness of fit was worse (χ²[12] = 56.5, p < .001), which also implies that its goodness of fit was worse than that of the QUAL1&2-HET model. Therefore, we cannot conclude that we have location equivalence--there seemed to be qualitative differences in terms of locations between the manifest categories.
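The two likelihood-ratio comparisons can be checked directly from the reported statistics. The sketch below assumes the usual chi-square reference distribution for the difference in deviance between nested models; the helper name lr_test_pvalue is hypothetical, and scipy.stats.chi2.sf is used for the upper-tail probability.

```python
from scipy.stats import chi2

def lr_test_pvalue(lr_statistic, df):
    """Upper-tail chi-square probability for a likelihood-ratio statistic."""
    return chi2.sf(lr_statistic, df)

# Statistics as reported in the text (12 degrees of freedom each).
p_discrimination = lr_test_pvalue(14.3, 12)   # QUAL2-HET versus QUAL1&2-HET
p_location = lr_test_pvalue(56.5, 12)         # QUANT-HET versus QUAL2-HET

print(f"discrimination equivalence: p = {p_discrimination:.3f}")
print(f"location equivalence:       p = {p_location:.2e}")
```

The first comparison is nonsignificant at the .10 level, consistent with discrimination equivalence, whereas the second is significant well beyond the .001 level, consistent with the rejection of location equivalence.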

In order to identify the location differences, we inspected the b'ik estimates, because they indicate the deviations of the locations in the HPD and APD groups from those in the BPD group. Two of these parameter estimates were statistically significant. In the HPD group, the location deviation parameter of the symptom "avoidance of abandonment" was –1.261 (t[369] = 2.79, p < .01), meaning that this symptom was "easier" (relatively more prevalent, more typical) in the HPD group than in the BPD group. In the APD group, the location deviation parameter of the symptom "identity disturbance" was +1.604 (t[369] = -2.11, p < .05), meaning that this symptom was "more difficult" (relatively less prevalent, less typical) in the APD group than in the BPD group. When for each of these two location differences a saltus parameter was used and all other locations were assumed to be equal over the three groups, the resulting saltus model had a statistically significantly worse fit than the QUAL2-HET model (χ²[10] = 42.3, p < .001), implying that it did not suffice to limit the location differences to these two. The best fit we obtained exploring various saltus models was with a model containing only one saltus parameter, for an increased prevalence of "avoidance of abandonment" in both the HPD and APD groups, and for an increased prevalence of "affective instability" in the HPD group. This model performed still worse than the QUAL2-HET model (χ²[11] = 24.4, p < .05), but the difference was not very large. The saltus parameter picked up three location deviations, two of which were rather large but not statistically significant, and in the model these three were treated as equal.

We also compared the models on the AIC and BIC criteria: the lower the values, the better the model. The AIC value of the best saltus model was only slightly larger than that of the QUAL2-HET model (2,930.1 versus 2,928.4), and its BIC value was lower (3,004.5 versus 3,045.1), so that it can be considered a reasonable but, strictly speaking, not sufficient approximation. For the sake of completeness, it should be mentioned that the AIC and BIC values of the QUAL1&2-HET model were 2,937.4 and 3,101.8, respectively. Both values were higher than the corresponding values of the QUAL2-HET and QUANT-HET models.
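The comparison on the information criteria amounts to picking the model with the lowest value on each criterion. The sketch below uses only the AIC and BIC values reported in the text; the dictionary layout and the helper best_model are hypothetical, and the per-parameter BIC penalty assumes that the sample size entering the criterion is the 370 patients, which the text does not state.

```python
import math

# AIC and BIC values as reported in the text.
reported = {
    "QUAL1&2-HET": {"AIC": 2937.4, "BIC": 3101.8},
    "QUAL2-HET": {"AIC": 2928.4, "BIC": 3045.1},
    "best saltus": {"AIC": 2930.1, "BIC": 3004.5},
}

def best_model(criterion):
    """Name of the model with the lowest value on the given criterion."""
    return min(reported, key=lambda name: reported[name][criterion])

best_aic = best_model("AIC")
best_bic = best_model("BIC")

# Penalty per parameter: 2 for AIC, ln(n) for BIC (n = 370 is an assumption).
bic_penalty = math.log(370)

print("lowest AIC:", best_aic)
print("lowest BIC:", best_bic)
print(f"BIC penalty per parameter: {bic_penalty:.2f} (AIC: 2)")
```

Because the BIC penalizes each parameter more heavily than the AIC at this sample size, it favors the more parsimonious saltus model, whereas the AIC slightly favors the QUAL2-HET model.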

When the model was further restricted to have zero variance within the three groups, the goodness of fit was dramatically lower according to the likelihood-ratio test, which was conservative given the boundary value of the null hypothesis (χ²[2] = 180, p < .001). Each of the variance estimates was highly significant in the QUAL2-HET model using a Wald test (which is also conservative in this case). Therefore we must conclude that the diagnostic groups were heterogeneous. Taking together the conclusions regarding the vertical and the horizontal axes, we end up with a Type 1 structure: between-category qualitative differences and within-category heterogeneity. A reasonably good saltus model was found, so that the qualitative differences can be considered rather simple.

We will now further explore the model that came out as the best, the QUAL2-HET model, a model with discrimination equivalence but not with location equivalence. This model implies a 2PL model within each diagnosis with equal discriminations between diagnoses. To test this model, we applied a bootstrap methodology. Starting from the parameter estimates, we generated 2,000 new data sets, and in each of these data sets the following statistics were derived: Pearson correlations (phi's) between the indicators within each diagnostic group (yielding 21 x 3 correlations), and differences in assigned symptom proportions for the HPD and APD groups in comparison with the BPD group as the reference group (7 x 2 differences). Of the 63 correlations only two fell outside the bootstrap-based .01 confidence interval, and three more fell outside the corresponding .05 confidence interval. This is a remarkably good result, from which it can be concluded that the model and also its unidimensionality within groups should not be rejected. The result was even better where the proportion differences were concerned. All 14 differences fell right in the middle of the confidence interval, implying that the model captures the location differences very well. Based on this bootstrap result, we can accept the QUAL2-HET model as a valid model for our data.
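The logic of this bootstrap check on the within-group correlations can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis reported here: the item parameters a and b are hypothetical, the logit is assumed to take the form a(theta - b), the "observed" correlations are themselves simulated as a stand-in for real data, and 500 rather than 2,000 bootstrap data sets are generated to keep the example fast.

```python
import numpy as np

def simulate_2pl(rng, n_persons, a, b, sd=1.0):
    """Binary responses under a unidimensional 2PL model with logit a * (theta - b)."""
    theta = rng.normal(0.0, sd, size=(n_persons, 1))
    prob = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return (rng.random(prob.shape) < prob).astype(int)

def pairwise_phi(X):
    """Pearson (phi) correlations for all pairs of binary indicators."""
    r = np.corrcoef(X, rowvar=False)
    return r[np.triu_indices(X.shape[1], k=1)]

def bootstrap_interval(rng, n_persons, a, b, n_boot=500, level=0.99):
    """Percentile interval for each pairwise correlation under the fitted model."""
    draws = np.array([pairwise_phi(simulate_2pl(rng, n_persons, a, b))
                      for _ in range(n_boot)])
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(draws, [tail, 100.0 - tail], axis=0)
    return lower, upper

rng = np.random.default_rng(3)
a = np.ones(7)                                  # hypothetical discriminations, 7 symptoms
b = np.linspace(-1.0, 1.0, 7)                   # hypothetical locations
observed = pairwise_phi(simulate_2pl(rng, 122, a, b))   # stand-in for observed correlations
lower, upper = bootstrap_interval(rng, 122, a, b)
outside = int(np.sum((observed < lower) | (observed > upper)))
print(f"{outside} of {observed.size} correlations fall outside the interval")
```

If the model holds, only a small number of the 21 correlations per group should fall outside the interval; the same scheme could be applied to the differences in symptom proportions between groups.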

Apart from the crucial aspects of this model to decide on the type of latent structure (in this case Type 1), some other aspects of the model are of interest. First, the variances in the three groups differed. The variance in the BPD group was fixed to 1.00 as an identification restriction, and the estimates in the other two groups were 1.292 (HPD) and 2.288 (APD). These differences were in agreement with the size order of the internal-consistency coefficients that were reported earlier. Larger variance typically means larger consistency. Second, HPD and APD were less borderline than BPD. The difference of HPD from BPD was –1.311 (gHPD) and the difference of APD from BPD was –2.305 (gAPD), and they were both statistically significantly different from zero (p < .001), meaning that overall group effects were statistically significant. The most borderline group was BPD, as expected, followed by HPD and APD.

Similar studies were conducted on the diagnoses of HPD and APD, using histrionic and antisocial symptom lists from the DSM-IV, respectively (Maesschalck, 1998). For HPD, the result was similar, in that only simple qualitative differences in location were found. For APD, however, the qualitative differences could not be reduced to a few saltus parameters; the pattern of APD indicator values was quite different among the diagnostic groups.

Discussion

Strictly speaking, with respect to BPD symptoms, there is evidence for qualitative differences between the three groups. These differences can be attributed to a few symptoms. Only two symptoms showed a statistically significant location deviation, and a saltus with a common jump of two symptoms yielded a good approximation. Taking together the differences we have discussed earlier, there seems to be evidence for the following: (a) "Affective instability" is relatively more common in the HPD group and less common in the APD group than in the BPD group. (b) "Avoidance of abandonment" is relatively more common in both the HPD and APD groups than in the BPD group. (c) "Identity disturbance" is relatively less common in the APD group than in the BPD group. This summary is based on the two statistically significant deviations and on the saltus model that provides a good approximation. Note that the result for "avoidance of abandonment" could be attributed to deficiencies in the indicator rather than to qualitative differences in the diagnostic groups. Specifically, most of the APD patients were prisoners, so they were likely to show this symptom by reason of their isolation in prison rather than their personality disorder. This result illustrates the general point that there are two kinds of qualitative differences: those that indicate true qualitative differences due to the manifest categories and those that indicate differences due to irrelevant reasons. With regard to patients in prison, the abandonment symptom was a theoretically poor indicator; thus, it is not surprising that it also showed a location difference. The best alternative in such a case is probably to remove the indicator from consideration. For the same symptom, however, a location difference was found with the HPD group.
In sum, the two diagnoses show simple and rather weak qualitative differences of a kind that would not have been detected with a simple test for factorial equivalence in its limited sense.

This conclusion cannot be taken as an absolute, because of the restrictions we mentioned earlier. The data concern only a limited number of indicators, although very important ones, and they are based on ratings by clinicians. Because of the latter, our conclusion must primarily relate to the dimension-like versus category-like nature of judgments made by clinicians. As such the results can also be looked upon from the cognitive perspective on categories. The clinicians' category of borderline personality disorder (independently of whether it reflects the true state of affairs) is a manifest category with a latent continuum, with some BPD members being better members of the category than others. The result may have been cognitively induced, although the experts who rated the indicators were different from those who made the diagnosis. The structure within the manifest category is unidimensional, the stochastic variant of what Storms and De Boeck (1997) called a triangular structure. The HPD and APD patients not only are less borderline but also show some slight qualitative differences, enough to conclude that BPD is category-like in at least one respect: that of qualitative between-category differences.

Our findings regarding BPD may not generalize to other categories of personality disorders. For example, numerous studies have found taxometric evidence for the taxonic nature of schizotypy (Golden & Meehl, 1979; Korfine & Lenzenweger, 1995; Lenzenweger, 1999; Lenzenweger & Korfine, 1992; Meehl, 1993) and of APD (Skilling, Quinsey, & Craig, 2001), whereas the evidence is more equivocal for BPD, as concluded by Haslam and Kim (2002). This shows that being category-like may depend on the personality disorder, which was also the case for the data we used (Maesschalck, 1998), showing that APD was clearly more category-like than BPD and HPD.

The phenomena we identified at the latent level can be considered endophenotypes. These refer to the phenotype but go deeper than the manifest indicators. Endophenotypes, when category-like, comprise natural kinds, non-arbitrary discontinuities; when dimension-like, they comprise equally non-arbitrary continuities. Haslam (2002) noted,

Of course, a discrete psychopathological kind might arise out of an essence-like cause such as a genetic abnormality (e.g., Down's syndrome) or germ (e.g., general paresis). However, other non-essentialist models are also possible, for example developmental polarization, non-linear interactions of vulnerability factors (e.g., emergenesis), and threshold effects.

A continuous endophenotype, by contrast, is likely to result from divergent causes, such as polygenic influences, idiosyncratic environments, and "bad luck" (cf. Meehl, 1978). When an essence-like cause becomes known, an endophenotype becomes a closed concept, but, contrary to the essentialist beliefs of most laypersons (Haslam & Ernst, 2002), most endophenotypes in psychopathology (including category-like ones) have no essence-like cause and thus remain open concepts.

It is somewhat surprising that the identification of endophenotypes has not always been the primary concern in the classification of psychopathology. Rather, a great deal of concern has been with operationally defined diagnoses, which are closed concepts defined not by specific etiologies but merely by indicators at the manifest level. That is, a psychiatric diagnosis in the DSM-IV is completely defined in terms of the operations or measurements (symptoms) used to recognize it. These operational definitions have come to be polythetic in the DSM-IV, based on a prototype understanding of mental disorders, because of the pronounced failure to find singly necessary and jointly sufficient conditions for disorders in earlier DSMs (e.g., Carson, 1991). The operational approach was espoused in order to increase interjudge reliability. The consequent increase in reliability was purchased at the price of a decreased theoretical basis (e.g., Morey, 1991) and, more formally, a lack of interest in the latent structure. The reason is that observed symptoms were preferred to unobserved symptoms, because they were easier to recognize and thus to assess reliably. This contrasts with the explanatory approach in cognitive psychology, discussed earlier (e.g., Murphy & Medin, 1985), in which the glue that ties concepts together is a theory-based understanding of the world (e.g., Kim & Ahn, 2002). Although we did not investigate the theoretical basis of the diagnostic categories, we assessed the validity of several latent structure models of BPD. As far as the endophenotypes are concerned, we were able to find out what the BPD endophenotype is--not all aspects of it, but those related to the DSM-IV borderline indicators.

Beyond the question of the category-like versus dimension-like latent structure of psychiatric diagnoses, at least two controversial issues within psychopathology research could be addressed using Dimcat. The first issue is whether putatively distinct disorders are not really identical. Consider several examples on the border between Axis I and Axis II: avoidant personality disorder and social phobia, schizotypal personality disorder and schizophrenia, borderline personality disorder and mood disorders, antisocial personality disorder and substance use disorders, and depressive personality disorder and dysthymia (Endler & Kocovski, 2002; Widiger & Shea, 1991). Frances, Widiger, and Fyer (1990) noted,

it is rarely clear, when a given symptom serves as a defining feature of two different categories, whether the resulting overlap between them reflects the true state of the relationship or is an unnecessary artifact based on the choice of the identical definitional items in both sets. (p. 47)

From the perspective of Dimcat, this question can be answered rather straightforwardly. First, the combined symptom lists could be taken as indicators. Whether the symptoms overlap or not does not matter. Second, the diagnoses could be taken as manifest categories. Third, Dimcat could be applied. If the disorders were qualitatively distinct, then they would obviously not be identical. If the disorders were only quantitatively distinct, then they would be identical if the difference between the distributions was of a magnitude considered pragmatically negligible.

The second issue is whether a psychiatric diagnosis can be adequately assessed by a self-report inventory. This issue has been debated with respect to using students who score highly on the Beck Depression Inventory as "analogs" of patients diagnosed with major depression (e.g., Coyne, 1994; Flett, Vredenburg, & Krames, 1997; Vredenburg, Flett, & Krames, 1993; A. M. Ruscio & Ruscio, 2002). From the perspective of Dimcat, this question can also be answered rather straightforwardly. First, the items in the inventory could be taken as indicators. Second, the diagnosis versus the absence of the diagnosis could be taken as manifest categories. Third, Dimcat could be applied. If the diagnosis was qualitatively distinct from its absence, then the self-report inventory would not be an adequate representation of the diagnosis--it would mean that the inventory was measuring qualitatively distinct phenomena for persons with and without the diagnosis. If the diagnosis was only quantitatively distinct from its absence, however, then the latent dimension defined by the self-report inventory would be an adequate representation of the diagnosis.

Application 2: Attitudes Toward Capital Punishment

Capital punishment is a controversial issue. In Belgium, capital punishment was legal until 1996, but it had not been practiced since 1950. In 1996, Belgians voted to ban capital punishment at a time when it was no longer a real issue. Later that year a man was accused of kidnapping, raping, and murdering several girls between the ages of 8 and 17. This case was in the news for over a year and, not surprisingly, affected many people's opinions of capital punishment. The legalization of capital punishment again became a topic of heavy discussion and a source of controversy. At the time we conducted the present study, there seemed to be two clear-cut public opinions: one in favor of capital punishment, one opposed.

We studied attitudes toward different types of crimes varying in the following characteristics: murder or other crimes, sexual or non-sexual crimes, and child or adult victim. A group of respondents was interviewed and asked whether, in their opinions, persons who committed the kind of crime in question should be considered for capital punishment if it were legal. Our first interest was whether the attitudes were qualitatively distinct. This kind of question is not uncommon for attitude research. Eagly and Chaiken (1993), for example, asked whether the relation between liberalism and conservatism, which might seem opposite poles of a single dimension, was actually more complex. One explanation for the latter structure would be that the two groups differ in the values considered relevant to an issue. In the present context, these may concern the unconditional value of human life, the acceptability of revenge, and the seriousness of a crime. The criteria for seriousness of a crime may include taking someone else's life, sexual abuse, and vulnerability of the victim. Differences in these criteria should result in a qualitatively different scale for seriousness of a crime between groups in favor of and opposed to capital punishment.

Our second interest was whether the attitudes were heterogeneous. Only if the attitude groups were heterogeneous could within-category person differences be observed.

Our third interest was related to the study of cognitive categories. Because the data in this application were self-rating data, and because the rating of the indicators and the classification were both made by the same respondents, a cognitive approach to the categories seemed relevant. This offered us an opportunity to test the Generalized Context Model, because it focuses on classification into two categories, and because the respondents both classified themselves in two categories and made the indicator ratings. Let us assume that the respondents decided on whether they were in favor of or against the legalization of capital punishment from what they heard from others. For example, they heard what other people said about various crimes and how the criminals should be treated. These other people can be considered the exemplars of the learning set, before the respondents decided on the classification of their own opinion. The alternative to the exemplar theory is that the self-classification in the two legalization opinions is based on two prototypes.

Method

Participants. In several small towns along the Belgian coast, 300 adults (50% women) were interviewed in 1998 about various types of crime. At that time, the above-mentioned case of child abuse was still very alive in the minds of Belgian people, as indicated by the attention the case received in the media. Based on the response to a single question at the end of the interview, 202 respondents were in favor of legalizing capital punishment, 98 were against.

Indicators. The interview consisted of 10 questions, 9 of which referred to the following crimes, in this order: (a) serial murder, (b) murder of one's whole family, (c) murder of a family member, (d) sex murder of an adult, (e) sex murder of a child, (f) robbery with murder of an adult, (g) robbery with murder of a child, (h) rape of an adult, and (i) rape of a child. For each crime, the question was whether the respondent would consider capital punishment appropriate if it were legal ("yes" or "no"). The tenth question was whether the respondent was for or against the legalization of capital punishment.

Manifest categories. Two manifest categories of attitudes were distinguished on the basis of the tenth question: one in favor of legalization, one against legalization. These categories were based on expert judgment, with respondents considered experts on their own attitudes.

Analyses. The main part of the analyses was again based on Dimcat. Because we experienced estimation problems with the more complex models, however, most likely due to the manifest distribution of the data, we based part of the analysis on a conditional maximum-likelihood (CML) approach using the OPLM program (Verhelst et al., 1994).

In order to analyze the data following the Generalized Context Model (Nosofsky, 1992; Nosofsky & Palmeri, 1997), we made the (arbitrary) choice to select the response patterns of 40 randomly sampled respondents of each group as the learning stimuli and the remaining response patterns as the test stimuli. This is as if the respondents first had been informed about 40 people's opinions (through daily life discussions) before they decided on their own attitude category (in favor of or against legalization) based on what they thought about how the criminals should be treated. The procedure was repeated five times, each time with a randomly sampled learning subset from each group, and with the remaining respondents as the subset of test stimuli. To compare the prototype model to the exemplar-based model, the same five sets of learning stimuli and test stimuli were used for the two models. The prototypes for the two categories were defined on an a priori basis. As the prototype for the pro-legalization category, we took the overall 1-pattern for all indicators, and as the prototype for the anti-legalization category, we took the overall 0-pattern for all indicators (the complement of the first prototype).

For both models, the nine binary indicators were used as nine binary features or dimensions. The maximum-likelihood-based analysis was performed with two different similarity functions, one with an exponential decay (q = 1) and another with a Gaussian decay (q = 2), and with a city-block metric (because of the binary features and a better goodness of fit than the Euclidean metric). Eleven parameters were estimated for both models: c (an overall scaling parameter--the higher its value, the larger the weight of close similarities), b (response bias toward the category in favor of legalization), and nine indicator weights (eight of which were free parameters, given that their sum is one).
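The classification rule described above can be sketched in a few lines. The sketch below assumes the commonly stated form of the Generalized Context Model: a weighted city-block distance, a similarity of exp(-c * distance^q), and a choice probability in which the summed similarity to each category is weighted by the response bias b and 1 - b. The function names, the uniform weights, and the values c = 2 and b = .5 are illustrative assumptions, not the estimates or the parameterization used in the study. The prototype model is obtained by passing a single reference pattern per category.

```python
import math

def city_block(x, y, weights):
    """Weighted city-block distance between two binary response patterns."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))

def similarity(x, y, weights, c, q):
    """Similarity decays with distance: exponential for q = 1, Gaussian for q = 2."""
    return math.exp(-c * city_block(x, y, weights) ** q)

def prob_pro(x, refs_pro, refs_anti, weights, c, b, q=1):
    """Probability of classifying pattern x in the pro-legalization category.

    refs_pro / refs_anti hold the learning exemplars (exemplar model)
    or one prototype each (prototype model). b is the bias toward 'pro'.
    """
    s_pro = sum(similarity(x, r, weights, c, q) for r in refs_pro)
    s_anti = sum(similarity(x, r, weights, c, q) for r in refs_anti)
    return b * s_pro / (b * s_pro + (1 - b) * s_anti)

# Prototype version with hypothetical parameter values (not the reported estimates).
weights = [1 / 9] * 9          # nine indicator weights summing to one
proto_pro = (1,) * 9           # "yes" to capital punishment for all nine crimes
proto_anti = (0,) * 9          # "no" for all nine crimes
p_all_yes = prob_pro((1,) * 9, [proto_pro], [proto_anti], weights, c=2.0, b=0.5)
p_mixed = prob_pro((1, 1, 0, 0, 1, 0, 1, 0, 0), [proto_pro], [proto_anti],
                   weights, c=2.0, b=0.5)
```

In this form, a larger c sharpens the contrast between near and far reference patterns, which matches the description of c as giving more weight to close similarities.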

Results

We again used the sequential modeling strategy explained earlier. The first model to be tested, however, the QUAL1&2-HET model, yielded convergence problems and extreme parameter estimates. This was also true for simpler models with estimated discriminations and different variances depending on the group. A possible reason for the problems was the distribution of the persons. The frequencies of the 10 possible sum scores were as follows: 63, 6, 6, 6, 16, 15, 27, 32, 39, 90. Of 300 total respondents, 63 would not consider capital punishment for any of the nine crimes, and 90 would consider it for all nine crimes. The proportions of "yes" responses for the nine crimes are given in Table 2. Because of this unusual distribution, we decided to shift to a CML approach, because it is free of distribution assumptions. It has been shown that asymptotically CML estimation corresponds to allowing a histogram distribution with as many nodes as needed (de Leeuw & Verhelst, 1986). The problem with a CML approach is that the discrimination parameters need to be fixed. We made use of the possibility offered by OPLM (Verhelst & Glas, 1995) to work with different degrees of discrimination, using a module of the program. This module suggested a set of indicator discrimination values to be imputed before the estimation step (see Table 2). Non-murder crimes were weighted somewhat lower, which is not surprising, because the preponderance of murder crimes among the indicators made the descriptive dimension into a "murder" dimension. The two kinds of murder with a child as a victim had the highest discriminations, showing that they were among the most relevant for the underlying attitude that was expressed by the respondents.

The one-parameter logistic model (OPLM) (Verhelst & Glas, 1995) is based on conditioning on sufficient person statistics, and it is therefore saturated with respect to the person distribution (de Leeuw & Verhelst, 1986). Using the discrimination values suggested by the OPLM module for both attitude groups (the same for both), a model with discrimination equivalence and location equivalence was fitted to the data with the OPLM program (Verhelst et al., 1994). The model fit the data quite well when tested with a Pearson-χ²-based test statistic, the R1c described by Glas (1988). The R1c value was 15.75 (df = 17, p = .54). Thus, we can conclude that a Type 2 structure had a reasonable goodness of fit.

Next we estimated a QUAN-HET model with SAS PROC NLMIXED with the same fixed discrimination values (see Table 2) and also with location equivalence. The resulting deviance was 1,630.9, and the corresponding AIC and BIC values were 1,654.9 and 1,699.4, respectively. The deviance of this model was only slightly higher than that of the corresponding CML model (1,630.9 versus 1,625.9), so that the difference in distribution between the two approaches did not seem to play an important role in the goodness of fit. When a QUAN-HET model with equal degrees of discrimination was estimated with SAS PROC NLMIXED, however, the result was a deviance of 1,582.0, with corresponding AIC and BIC values of 1,606.0 and 1,650.4, respectively. From this result it seemed that equal discriminations were a good option when a normal distribution was assumed. Assuming equal discriminations for all indicators, we estimated a QUAL-HET2 model in the next step, which is actually a step back in the order of testing. The resulting deviance was 1,576.7. Based on a likelihood-ratio test, this is not statistically significantly lower than the deviance of the QUAN-HET model with equal discriminations (χ²[8] = 5.30, p > .10). Accepting location differences between the two groups did not seem to pay off. Therefore, we continued with the QUAN-HET model with equal discriminations for all indicators as the reference model.
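The information criteria and the likelihood-ratio test reported above can be reproduced from the deviances. The sketch below is only an arithmetic check: the parameter count of 12 is inferred from the reported difference between AIC and deviance (24 = 2 x 12), the sample size of 300 is the number of respondents, and the chi-square tail probability uses the closed form that holds for even degrees of freedom. The function names are illustrative.

```python
import math

def aic(deviance, n_params):
    """Akaike information criterion from a deviance (-2 log likelihood)."""
    return deviance + 2 * n_params

def bic(deviance, n_params, n_obs):
    """Bayesian information criterion from a deviance."""
    return deviance + n_params * math.log(n_obs)

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

N_PARAMS = 12   # assumption: inferred from AIC - deviance = 24
N_OBS = 300     # respondents

# QUAN-HET with OPLM-suggested discriminations vs. equal discriminations
aic_fixed, bic_fixed = aic(1630.9, N_PARAMS), bic(1630.9, N_PARAMS, N_OBS)
aic_equal, bic_equal = aic(1582.0, N_PARAMS), bic(1582.0, N_PARAMS, N_OBS)

# Likelihood-ratio test: QUAN-HET (1,582.0) against QUAL-HET2 (1,576.7), df = 8
lr_stat = 1582.0 - 1576.7
lr_p = chi2_sf_even_df(lr_stat, 8)
```

Under these assumptions the computed values agree with the reported ones to within rounding, and the tail probability for the location-difference test is well above .10.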

We tested this model against the QUAN-HOM model in order to make a choice along the vertical axis in Table 1. The resulting deviance was 2,133.1, and the corresponding conservative likelihood-ratio test was statistically significant (χ²[2] = 551.1, p < .001). The conclusion must be that the QUAN-HET model was the better one and that the groups were heterogeneous. From an inspection of the QUAN-HET parameter estimates, the two attitude groups seemed to differ in attitude level as well as in heterogeneity. Standard errors are reported in parentheses after each estimate. The estimate of the group effect on the latent continuum (the estimate of γ_anti) was -8.847 (.847), which was statistically significant. The group that was against capital punishment was located much lower on the attitude continuum than was the group that was for capital punishment. The variance of the two attitude groups was quite different: σ²_pro = 4.227 (.842), and σ²_anti = 17.391 (3.404). Both estimates were statistically significantly different from zero using the conservative Wald test for variances. This confirmed the earlier conclusion that the groups were heterogeneous. The difference between the two variances was also estimated (in a separate run). The result was 13.164 (3.444), which was statistically significant using a Wald test. The latent structure for the two groups seemed to be one with a relatively homogeneous group in favor that is rather far above a much more heterogeneous group against.
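The Wald tests mentioned above amount to dividing each estimate by its standard error. The sketch below computes these ratios from the reported estimates and converts them to two-sided normal tail probabilities. It is an illustration of the arithmetic only: the normal reference distribution is an assumption, and for variances, which lie on the boundary of the parameter space, such a test is conservative, as noted in the text.

```python
import math

def wald_z(estimate, std_error):
    """Wald statistic: estimate divided by its standard error."""
    return estimate / std_error

def two_sided_p(z):
    """Two-sided tail probability under a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Reported estimates with standard errors (from the QUAN-HET model)
estimates = {
    "group effect (anti)": (-8.847, 0.847),
    "variance pro": (4.227, 0.842),
    "variance anti": (17.391, 3.404),
    "variance difference": (13.164, 3.444),
}

z_values = {name: wald_z(est, se) for name, (est, se) in estimates.items()}
p_values = {name: two_sided_p(z) for name, z in z_values.items()}

# The separately estimated variance difference equals the difference of the estimates.
variance_gap = 17.391 - 4.227
```

All four ratios exceed 1.96 in absolute value, in line with the significance statements in the text.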

In order to have a better view on how the respondents in both groups were distributed along the latent continuum, empirical Bayes estimates were derived, again using SAS PROC NLMIXED. A histogram of the resulting latent distribution is shown in Figure 3. The distribution was clearly bimodal. The anti-legalization group (left on the continuum) might seem rather homogeneous in Figure 3, but in fact it overlaps with the pro-legalization group, which explains its larger variance. Given the pronounced bimodality of the distribution, the quantitative difference between the two attitude groups must be considered to be abrupt.

As for testing the exemplar model and the prototype model with q = 1, the means of the log likelihoods were 78.6 and 78.9, respectively, and for q = 2 the corresponding values were 76.5 and 78.9, respectively. (The value of q does not make a difference for the prototype model because of the way the prototypes were defined.) This means that the two models performed about equally well. For the prototype model, the c estimate varied between 1.98 and 9.54, whereas the corresponding values for the exemplar model were more extreme--from 6.43 to 15.97 for q = 2, and even more extreme for q = 1. High values of c mean that close similarities weighed much more heavily in determining the classification decision. The b estimates were found to be in line with the fact that the group in favor was larger than the group against. Finally, the weights were more stable (over the five runs) for the prototype model than for the exemplar model. The highest average weights in the prototype model were found for serial murder (.349), murder of one's whole family (.239), and rape of a child (.132) (the same for the two values of q).

Discussion

A latent structure with heterogeneous quantitative and abrupt differences between attitude groups appears to describe Belgian attitudes toward capital punishment. No evidence was found for location differences, and a CML model with location equivalence also seemed to fit the data in an absolute sense. Thus, the two attitude groups can be considered to be located along the same latent dimension, although at different points on that dimension, with a gap in between, and with a different variance.

According to criteria from Dimcat, the structure does not look very category-like: no qualitative differences and heterogeneous manifest categories. One feature that makes the latent structure look category-like, however, is the latent distribution of respondents. The bimodal distribution could be a reason for calling the two attitude groups "categories." If one prefers to do so, however, then one should realize that one is relying on a relative criterion, namely the size of the main effect of the group factor. All other aspects of the latent structure are dimension-like. Because there is no definitive way to tell how large the absolute difference should be, nor how large Cohen's d should be, and because the bimodality follows from the size of Cohen's d, the bimodality is at best a relative criterion.
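The dependence of bimodality on the size of the group difference can be illustrated numerically. The sketch below treats the latent distribution as a mixture of two normal distributions, counts the local maxima of the mixture density on a grid, and computes Cohen's d with a pooled variance. Several choices are assumptions made for illustration: the pro group is placed at zero as the reference, the group sizes (202 and 98) serve as mixture weights, and the reported variance estimates are plugged in as if they were known. The result describes the model-implied mixture, not the histogram of empirical Bayes estimates in Figure 3.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, components):
    """components: list of (weight, mean, sd) triples."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)

def count_modes(components, lo, hi, steps=4000):
    """Count local maxima of the mixture density on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [mixture_pdf(x, components) for x in xs]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])

def cohens_d(mean1, mean2, var1, var2, n1, n2):
    """Standardized mean difference using the pooled variance."""
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled)

# Model-implied mixture under the stated assumptions
pro = (202 / 300, 0.0, math.sqrt(4.227))
anti = (98 / 300, -8.847, math.sqrt(17.391))
modes_reported = count_modes([pro, anti], lo=-25.0, hi=10.0)
d_reported = cohens_d(0.0, -8.847, 4.227, 17.391, 202, 98)

# Contrast: two equal groups with unit variance, close together vs. far apart
modes_close = count_modes([(0.5, 0.0, 1.0), (0.5, 1.5, 1.0)], lo=-6.0, hi=8.0)
modes_far = count_modes([(0.5, 0.0, 1.0), (0.5, 3.0, 1.0)], lo=-6.0, hi=8.0)
```

With equal group sizes and equal variances, two modes appear only when the means are more than two standard deviations apart, so whether the distribution looks like two "categories" depends on a threshold on the standardized difference rather than on a separate structural property.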

The conclusion that the structure is dimension-like (apart from the abrupt difference) needs a word of caution. First, one can imagine that indicators could be used other than the nine we studied. For the personality disorder categories, the selection of indicators (the symptoms) had a strong basis in the DSM-IV. For the attitudes toward capital punishment, the choice was less evident. Second, ratings were again used, but now they were self-ratings instead of ratings by experts. Given that the indicator ratings and the classifications were made by the same persons, the conclusions may reflect the cognitive construction of attitudes by the respondents.

As to the relevance of the cognitive models for our data, there is no way to compare the goodness of fit of the exemplar-based and prototype models with the nonlinear mixed models that we estimated. The purpose and the structure of the models are totally different. In the Dimcat models, the purpose is to explain the indicator data on the basis of the classification (the manifest category); the classification is used as a predictor (with weights γ_k). In the cognitive models, the purpose is to explain the classification (the manifest category) on the basis of the indicator data. The structure of the cognitive models is also quite different--for example, because of the crucial role of similarities between exemplars or of exemplars with the prototype. There is no counterpart of this in the nonlinear mixed model family. Another important difference concerns the set-up of the study. Our respondents each classified only one stimulus--their own opinion; in the cognitive studies on category learning respondents from the same condition commonly classify the same set of test stimuli.

Furthermore, in our study two identical learning stimuli (two different respondents with the same response pattern) can be classified in two different manifest categories. Everything depends on what the respondent answers to the classification question. By contrast, a set-up with overlapping manifest categories has never been considered in the cognitive literature, although it is rather common in real life that now and then identical exemplars are labeled with different category labels.

The fact that the performance of the prototype model is about as good as that of the exemplar-based model is in line with a dimension-like model and with a linear separability between the manifest categories. It might be of interest to set up cognitive studies with manifest categories and features that are based on a latent continuum (with a dimension-like structure), the same for the two manifest categories under study, in order to investigate whether the superiority of the exemplar-based model generalizes to such structures. As discussed earlier, within-category structure has been neglected thus far in the cognitive literature. Our results could inspire studies to investigate the effect of the within-category structure on the validity of the exemplar model and the prototype model.

Application 3: Stages of Cognitive Development

Several stage models of cognitive development have been formulated. The saltus model (Wilson, 1989) was developed to overcome the limitations of some other stage models, described below.

First, the scalogram model (Guttman, 1944) has been applied to stage-like development (see Kofsky, 1966, for a critique). This model is deterministic, meaning that performance on different cognitive problems is perfectly determined by the stage reached. The model implies that the stages are homogeneous and linearly ordered.

Second, the multitask approach (K. W. Fischer, Pipp, & Bullock, 1984) was developed to relax the limitation that stages need to be homogeneous, in order to capture micro-sequences within the stages. K. W. Fischer et al. (1984) made an interesting distinction between first-order and second-order discontinuity, a distinction similar to our distinction between simple versus complex qualitative differences. A first-order discontinuity is a sudden leap in performance (corresponding to quantitative differences, and reflected in the γ parameter in Equation 10) on all relevant problems, one that is equal for all problems, whereas a second-order discontinuity is a discordant leap (corresponding to qualitative differences), one that is large for some problems but not for others (as reflected in the δ parameter for the leap). K. W. Fischer et al. (1984) accepted the probabilistic link between stages and solving problems, but did not use the idea for formal modeling.

Third, the ordered latent class model (Croon, 1990) can be used to relax the deterministic nature of the model (and of the stages). It provides an explicit probabilistic link between stages and performance on problems. Within-stage homogeneity is still assumed, as in the scalogram model, albeit homogeneity of a stochastic kind. Although the classes (stages) are ordered, they can show qualitative differences, because problem locations can differ across classes. Indeed, the problem locations must meet certain inequality restrictions for the classes to be ordered (see also Hoijtink & Molenaar, 1997). The ordered latent class model is situated between Type 3 and Type 4 from Table 1, but for latent categories.

In contrast with these three models, the saltus model combines a probabilistic view of stages, the assumption of within-stage heterogeneity, and the possibility of modeling certain between-stage qualitative differences. The saltus model has a special type of parameter to distinguish between first-order and second-order discontinuities, the δ-parameters. A δ_kk's ≠ 0 implies that, for stage k in comparison with stage k', performance on a subset s of problems differs from performance on the complementary subset of problems. Differences of this kind are qualitative, because differences between problem locations are not equivalent across stages. When no saltus parameters are required (the saltus parameters are zero) and the stage main effects (the γ_k's) suffice, the discontinuities are of the first-order type and quantitative. For a first-order discontinuity to occur, the distance between groups of persons on the latent dimension (which is also the proficiency scale) must be large--for example, without overlap. In sum, the saltus model lacks the limitations of the previous models, and it allows for the distinction between two kinds of discontinuities. Furthermore, the saltus model is a particular specification of a Type 1 model from Table 1.

Saltus parameters can capture how some problems become much easier relative to others as persons add to or reconceptualize their knowledge. Saltus parameters can also capture how some problems actually become harder as persons progress from an earlier stage to a more advanced stage, because they previously gave the correct answer but for the wrong reasons. Mislevy and Wilson (1996) showed how to use a mixture model approach to estimate the parameters of the saltus model for latent categories.

There are two ways to apply the saltus model. One way (in which it was originally developed) is to assume that class membership is a latent variable estimated from the data--we will call this the latent saltus model (Wilson, 1989). A second way is to assume that class membership is an observed variable that is given by, for example, segmentation or expert judgment--we will call this the manifest saltus model (G. Fischer, 1992; Wilson, 1993). The assumption of manifest class membership makes estimation of the model simpler, and it may make interpretation more straightforward, but it also involves certain limitations (Wilson, 1993).

A Rule Assessment Hierarchy Approach

Siegler (1981) developed modified Piagetian problems to test the cognitive developmental theory called rule assessment. The most important characteristic of the rule assessment approach is the specification of a series of increasingly powerful rules for solving problems. Following this theory, the behavior of a learner is dominated by the rule he or she is using at a particular level of development (a particular stage). The sequence of development through the rules is assumed to be fixed. The theory differs from a Piagetian approach in that (a) the rules do not need to be the same across concepts, and (b) the indicators are non-verbal choices to concrete problem-solving tasks.

Siegler (1981) investigated the rule assessment theory with three experimental problems involving proportionality: a balance-scale problem, a projection-of-shadows problem, and a probability problem. We will concentrate on the balance-scale problem. Using problem analysis and by reference to previous empirical and theoretical work, Siegler posited a series of rules that children might use in tackling the problem. He then developed a group of concrete problem types that were replicable, that had a well-defined set of variations, and for which there were a small number of possible solutions, so that a person could indicate a choice with minimal verbal interaction. These problems have the following properties: (a) The alternative solutions presented are exhaustive--no other answer makes sense; and (b) the rules predict not only which problems should be answered correctly but also which problems will provoke guesses and which will be answered incorrectly (for the latter, the rules also specify which alternative will be chosen).

For the balance scale, Siegler (1981) called weight the dominant dimension, because, in cases of conflict, young children were found to use the weight on each side of the fulcrum more frequently than the distance of the weights from the fulcrum. He called distance the subordinate dimension. A child using Rule I will not consider the distances of the weights from the fulcrum; to such a child, only the amounts of the weights matter. A child using Rule II will consider the distances of the weights from the fulcrum only when the weights are the same; otherwise the child will consider only the amounts of the weights. A child using Rule III is aware of his or her lack of understanding of the behavior of the balance scale when both weights and distances vary, and will use a cognitive strategy such as guessing or taking cues from the experimenter. A child using Rule IV will compute torques on either side of the balance beam and choose accordingly; this computation can be executed either by actual calculation or "by eye."
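The four rules can be read as a small decision procedure. The following is a minimal sketch of that reading, not Siegler's own formalization: the names `Problem`, `compare`, and `predict` are hypothetical, the example weights and distances are invented for illustration, and Rule III is coded exactly as worded above (a guess whenever both weights and distances differ). The function returns the outcome of the comparison the rule relies on; how a child responds to a tie is not modeled.

```python
from typing import NamedTuple


class Problem(NamedTuple):
    """One balance-scale item: weights and their distances from the fulcrum."""
    left_weight: int
    left_distance: int
    right_weight: int
    right_distance: int


def compare(left: float, right: float) -> str:
    """Return the side with the larger value, or 'balance' on a tie."""
    if left > right:
        return "left"
    if right > left:
        return "right"
    return "balance"


def predict(rule: int, p: Problem) -> str:
    """Outcome indicated by Rule I-IV (coded 1-4) for problem p."""
    by_weight = compare(p.left_weight, p.right_weight)
    by_distance = compare(p.left_distance, p.right_distance)
    if rule == 1:
        # Rule I: only the dominant dimension (weight) is considered.
        return by_weight
    if rule == 2:
        # Rule II: distance is considered only when the weights are equal.
        return by_distance if by_weight == "balance" else by_weight
    if rule == 3:
        # Rule III: both dimensions are considered, but when both differ
        # the child has no principled answer (assumption: coded as a guess).
        if by_weight == "balance":
            return by_distance
        if by_distance == "balance":
            return by_weight
        return "guess"
    if rule == 4:
        # Rule IV: compare torques (weight times distance) on each side.
        return compare(p.left_weight * p.left_distance,
                       p.right_weight * p.right_distance)
    raise ValueError("rule must be 1, 2, 3, or 4")


# Hypothetical items: unequal weights only, and unequal distances only.
D_PROBLEM = Problem(3, 2, 2, 2)
S_PROBLEM = Problem(2, 3, 2, 2)
```

With these invented items, `predict(1, S_PROBLEM)` returns `"balance"` whereas `predict(2, S_PROBLEM)` returns `"left"`, which is the kind of contrast the problem types described next are designed to exploit.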

In order to distinguish between persons at these four rule levels, Siegler (1981) designed six problem types, of which we will present three: dominant problems (D), with unequal values on the dominant dimension (weight) and equal values on the subordinate dimension (distance); subordinate problems (S), with equal values on the dominant dimension (weight) and unequal values on the subordinate dimension (distance); and conflict-equal problems (CE), with unequal values on both dimensions but with the two sides balanced. The problems are illustrated in Figure 4.

The six problem types yield different profiles for the four rules, and this difference was the basis for Siegler's classification. For the three kinds of problems we described, the differentiation is as follows. Rule I differentiates between D problems and S problems, because D problems can be solved when exclusively the dominant dimension is used, but S problems cannot. Rule II differentiates between D or S problems and CE problems, because taking the subordinate dimension into account in the case of equality on the first dimension helps a person solve S problems but not CE problems. Rule III differentiates in a similar way, except that a person will guess on CE problems. Finally, Rule IV also will lead a person to guess on CE problems, because the combination of distance and weight on both sides yields a tie. The three problem types considered here permit assessment of three stages: Rule I children, Rule II children, and Rule III or IV children. Consecutive pairs of problem types permit the distinction between adjacent rule levels: D versus S (Rule I versus higher), and S versus CE (Rule II versus higher). The three stages are differentiated on the basis of the hypothesized distances in difficulty between D, S, and CE problems. Rule I children should show a large distance between D on the one hand, and S and CE on the other hand (D----S-CE), Rule II children should show a large distance between D and S on the one hand, and CE on the other hand (D-S----CE), and finally, Rule III and Rule IV children should show a smaller distance between D and S on the one hand, and CE on the other hand (D-S-CE).

Method

Participants. The data (generously shared and described more fully by van Maanen, Been, & Sijtsma, 1989) consisted of responses to Siegler-type balance beam problems by 484 students in grade seven or eight.

Indicators. The presentation of analyses will be restricted to a comparison between two kinds of problems: D and S. Five D problems and five S problems were considered. Results were similar for comparisons between the other pair of consecutive problems (S and CE) and among all three problems (D, S, and CE).

Manifest categories. Students who scored 0 to 5 were assigned to the first stage (Rule I level), and those who scored 6 to 10 were assigned to the second stage (Rule II level). This method of defining manifest categories is an example of segmentation. More sophisticated methods of defining categories (e.g., using latent saltus class probabilities) can also be applied (Wilson, 1989).
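The segmentation rule can be written out as follows. This is a minimal sketch under the assumption that a student's score is the number of correct responses across the ten D and S problems; the function name `assign_stage` and the 0/1 coding of responses are illustrative choices, not part of the original analysis.

```python
from typing import Sequence


def assign_stage(item_scores: Sequence[int], cutoff: int = 5) -> str:
    """Assign a manifest stage from 0/1 item scores by segmentation.

    Assumption: the sum score is the number of problems answered correctly.
    Scores of 0 to `cutoff` map to the Rule I level, higher scores to Rule II.
    """
    if any(score not in (0, 1) for score in item_scores):
        raise ValueError("item scores must be coded 0 (incorrect) or 1 (correct)")
    total = sum(item_scores)
    return "Rule I" if total <= cutoff else "Rule II"
```

For example, a student who solves all five D problems and none of the S problems has a sum score of 5 and is assigned to the Rule I level.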

Results

For the analyses we again estimated the Dimcat models, relying on the sequential strategy that was explained earlier. To begin, we estimated the QUANT1&2-HET model, the QUAL2-HET model, and the QUAN-HET model. In all three cases we were confronted with estimation problems: extreme parameter estimates with extreme standard errors, and negative variances. All estimated models with indicator-dependent discriminations, different variances for the two groups, or both gave results that looked degenerate in one way or another. Therefore, we restricted the models to have equal indicator discriminations and a variance of one. The overall discrimination, instead of the variance, became a parameter. The crucial aspect to test was whether there were differences between the two groups with respect to the location of the problems, and if so, whether these differences could be explained with a saltus parameter for one type of task (in our case, increasing the distance between the S and the D problems).
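To make the restricted manifest saltus model concrete, the response probability can be sketched as a logistic function with a single overall discrimination and a group-specific shift for one task type. The parameterization below is an assumption made for illustration (the function name, the argument names, and the sign convention for the shift are not taken from the estimation software), and all numerical values used with it are hypothetical.

```python
import math


def saltus_probability(theta: float, beta: float, discrimination: float,
                       is_s_problem: bool, in_rule2_group: bool,
                       d_s: float) -> float:
    """Probability of a correct response under a restricted saltus sketch.

    theta          person location (unit variance assumed in both groups)
    beta           problem location shared by both groups
    discrimination overall, task-independent discrimination
    d_s            saltus shift added to the location of S problems,
                   applied only in the Rule II group (assumed convention)
    """
    shift = d_s if (is_s_problem and in_rule2_group) else 0.0
    logit = discrimination * (theta - (beta + shift))
    return 1.0 / (1.0 + math.exp(-logit))
```

Under this convention a negative `d_s` lowers the location of the S problems for the Rule II group only, so those problems become easier for that group while the D problems keep the same location in both groups.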

Therefore, we tested three models: (a) a QUAL2-HET model with one task-independent overall degree of discrimination and with a person variance of one in both groups, (b) a QUAN-HET model with the same restrictions, and (c) a saltus model with a dS for the expected jump for the problems requiring that the subordinate dimension be used. The corresponding deviance values were 2,441.9, 2,729.9, and 2,448.5, respectively. The corresponding AIC and BIC values were 2,483.9 and 2,571.7 (QUAL2-HET), 2,753.9 and 2,804.1 (QUAN-HET), and 2,474.5 and 2,528.9 (saltus model), respectively. The likelihood-ratio test comparing the restricted QUAL2-HET with the restricted QUAN-HET was statistically significant (χ²[9] = 288.0, p < .001), but when the saltus model was compared with the QUAL2-HET model, the difference in goodness of fit was not statistically significant (χ²[8] = 6.6, p > .10). The saltus model seemed to capture all qualitative differences between the two groups. It was also the best model with respect to the AIC and BIC. The dS indicated the size of the jump of the S items for the Rule II group. The estimated jump from the Rule I to the Rule II level was -4.856 (.337), which was statistically significant given its standard error. The S problems were drastically easier at the Rule II level than at the Rule I level. No other differences were needed to approach the restricted QUAL2-HET model, so we concluded that the D tasks were equally easy for both groups.
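The arithmetic behind these comparisons can be retraced in a few lines. In the sketch below, the deviances are the reported values and N = 484 is the sample size given under Participants; the parameter counts (21, 12, and 13) are not stated in the text but are inferred here from the difference between each reported AIC and its deviance, so they should be read as an assumption. The dictionary and function names are illustrative.

```python
import math

N_STUDENTS = 484  # sample size reported under Participants

# name: (deviance, number of parameters); the counts are inferred, not reported
MODELS = {
    "QUAL2-HET": (2441.9, 21),
    "QUAN-HET": (2729.9, 12),
    "saltus": (2448.5, 13),
}


def aic(deviance: float, n_params: int) -> float:
    """Akaike information criterion: deviance plus twice the parameter count."""
    return deviance + 2 * n_params


def bic(deviance: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: deviance plus k times ln(N)."""
    return deviance + n_params * math.log(n_obs)


def likelihood_ratio(restricted: str, general: str) -> tuple[float, int]:
    """Deviance difference and degrees of freedom for two nested models."""
    dev_r, k_r = MODELS[restricted]
    dev_g, k_g = MODELS[general]
    return dev_r - dev_g, k_g - k_r


for name, (deviance, k) in MODELS.items():
    print(f"{name}: AIC = {aic(deviance, k):.1f}, "
          f"BIC = {bic(deviance, k, N_STUDENTS):.1f}")
```

Under these assumed parameter counts, `likelihood_ratio("QUAN-HET", "QUAL2-HET")` gives a deviance difference of 288.0 on 9 degrees of freedom, and `likelihood_ratio("saltus", "QUAL2-HET")` gives 6.6 on 8 degrees of freedom, matching the two tests reported above.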

After the assessment of between-category differences, we tested for within-category differences, in line with the vertical axis of Dimcat. The saltus model with homogeneity yielded a deviance of 2,507.5, with AIC and BIC values of 2,531.5 and 2,609.3, respectively, and a statistically significant (conservative) likelihood-ratio test when compared with the corresponding model with heterogeneity (χ²[1] = 59, p < .001). The goodness of fit could be improved considerably, however, when the discrimination for the Rule I level was fixed to zero (implying homogeneity in one group). The resulting deviance of 2,428.6 was also better than that of the corresponding full heterogeneity model. The heterogeneous model was in fact the best model of all those that could be estimated with good results. The AIC and BIC values were 2,454.6 and 2,509.0, respectively. Because the overall discrimination for the Rule II group was statistically significant, 1.551 (.116), we concluded that there was homogeneity at the Rule I level and heterogeneity at the Rule II level. This finding was interesting, because it was the first time among our three applications that a manifest category turned out to be homogeneous.

We replicated the comparisons above for the S and CE problems, and also for the D, S, and CE problems (in the latter case, using a segmentation that yielded three manifest categories when all three kinds of problems were analyzed). The results for D and S problems replicated the above results, meaning that the difference was again qualitative, and that again the manifest saltus model could explain this qualitative difference. For S and CE, one saltus parameter was again needed. To fit the data from the D, S, and CE problems, two saltus parameters were needed, one for the difference between D and S, and one for the difference between D and CE.

Discussion

The findings show that development cannot be fully described by quantitative differences--there is a strong effect of student group (i.e., stage) on problem locations. This makes a Type 1 model with a saltus restriction the best model for the kind of development studied in this application. The latter result is not trivial, given that nothing in the formal way we performed the segmentation favored latent qualitative differences.

Another interesting finding is that the stages (or rule assessment classes) as defined by our segmentation rule are heterogeneous at the manifest level but not necessarily also at the latent level. The Rule II stage seems to exhibit the micro-sequence phenomenon noted by K. W. Fischer et al. (1984), but the Rule I stage does not. A speculation to explain this result is that each stage shows the so-called micro-sequence phenomenon, implying within-stage quantitative development until a homogeneous end-state within the stage is reached, followed by a qualitative jump to the next stage, where again within-stage quantitative development occurs. The results can be explained by assuming that the Rule I students have reached the end-state of the Rule I level and that the other students are at different points of their quantitative development with respect to Rule II.

General Discussion

The first important result of the three applications is that all but one of the manifest categories that were defined on the basis of expert judgment or segmentation are heterogeneous, not just at the manifest level but also at the latent level. In principle, heterogeneity at the manifest level can originate from stochastic processes based on a homogeneous latent structure, with all persons in a manifest category being concentrated at one point in the latent structure. This is the common case in the cognitive studies on categories and concepts: no latent continuum but only a manifest continuum (no internal category structure, no correlated features). In the present applications, by contrast, heterogeneity also occurred on the latent level, as indicated by differences in person locations. The only exception is the Rule I group in the developmental application. Thus, what some would consider categories on the basis of expert judgment or segmentation would seem to be rather heterogeneous entities. This result feeds back into the cognitive study of categories and concepts, and it is consistent with a need for giving more attention to within-category structure, as expressed by Murphy (2002).

The second important result is that heterogeneity, even when captured by a descriptive dimension, does not necessarily imply that the manifest categories are only quantitatively different. In the first and third applications, there was clear evidence for qualitative differences. Thinking of manifest categories as being dimension-like while still reflecting qualitative differences may seem contradictory, but, as we have shown, qualitative differences and heterogeneity relate to different features of what it means to be dimension-like. In this situation, the saltus parameters give us a way to describe qualitative differences for a dimension-like structure.

The third important result is that, when the differences are quantitative, the abruptness of the difference can be investigated at the latent level, so that one need not rely on the distribution of manifest variables, such as sum scores. In particular, in Application 2, where quantitative differences were found, the manifest distribution and the latent distribution were both clearly bimodal, but this correspondence is not guaranteed, as shown by Grayson (1987).

The fourth important result is that qualitative differences between manifest categories can sometimes be captured in a simple way. This is either because the qualitative differences are only minor (as in Application 1) or because a simple principle applies (as in Application 3). The latter is of special interest, because it allows one to test a theory of qualitative differences. In Application 3, the theory is Piaget's theory of cognitive development.

It is remarkable that in all three applications discrimination equivalence was realized, whereas in two of the three applications location equivalence was not realized. Checking discrimination equivalence is a common practice when one compares factor loadings (to check factorial equivalence in the limited sense), whereas checking location equivalence is much less common, although more recently developed factor models and SEM methods can be utilized for this purpose (Meredith, 1993; Reise et al., 1993), as can IRT. Given that these methods are not always used, however, one can expect that, in some cases, the lack of location equivalence goes unnoticed.

It is of interest to note that in our applications a large variety of latent structures were found, often with strong evidence against alternative structures. In all cases, we started from a rather simple manifest categorical variable, either based on expert judgment or on segmentation. The implication of our findings is that manifest categories can differ a lot in their underlying structure. Without an investigation such as we conducted, one would perhaps not be aware of the quite different underlying status of the categorical variable one is using.

The differences between the different types of structure we found often turned out to be quite drastic, in all cases when within-category homogeneity versus heterogeneity was considered, and in the third application also with respect to qualitative differences. Looked upon from this practical viewpoint, differentiating between the different types of structures was often not a problem. The issue of differentiating power, however, remains an important one.

Our approach hinges on the indicators that are selected, on the method of observation (e.g., ratings), and on the alternative manifest categories. For the study of personality disorders, the selection of the indicators was rather self-evident, given that both the indicators (symptoms) and the manifest categories (diagnoses) were based on the DSM-IV. For the study of attitudes, several alternatives were available. We could have referred to the circumstances of the crimes and to characteristics of the criminals, and one cannot tell whether these would have yielded the same results. For the study of cognitive development, the indicators certainly make sense, given that they are well-known tasks from this domain of study, but alternative tasks have been used. Perhaps the most severe limitation is that in the first two applications ratings were used, so that a cognitive bias may have affected the results. The conclusions must therefore be stated in terms of the manifest categories as used by raters. The situation is different for the developmental application, in which objective data were used. The choice of alternative categories for a reference category is also an important issue. In some cases the choice is evident, as for the application on attitudes toward capital punishment and for the developmental application. However, for the personality disorder study, the category of people without any personality disorder would be a meaningful alternative category. As noted earlier, the true nature of a category does not depend on the alternative categories it is compared with, but the alternative categories are an important methodological feature that restricts what one can or cannot find. For example, we believe that before one can come to a well-founded conclusion on personality disorders, it is worthwhile to compare a given disorder with alternative disorders and with normality.

The link we made with the cognitive study of concepts and categories can be considered as a mutually inspiring one. Our applications point to the need to include within-category heterogeneity and structure in studies on the cognitive representation of categories. In principle, one can analyze an element-by-feature matrix with elements from different categories, in the same way we did. On the other hand, the cognitive models are a good basis to investigate the way raters (experts and lay persons) come to a category-like decision on other persons or themselves. The cognitive models should be tried out more for heterogeneous manifest categories, given that our results differ from those obtained with stimuli from categories without an internal structure (without correlated features).

We believe the approach we have formulated and applied is rather general and workable. It complements several other approaches, which can be deemed more specialized in one or another aspect of the concept of category-likeness. For example, the taxometric approach is specialized in detecting discreteness between categories along a dimension, and it concentrates on pairs of categories. Another example is the set of methods for investigating factorial equivalence in its limited sense (checking only the factor loadings), which concentrate on discrimination equivalence, one aspect of qualitative versus quantitative differences. We do not claim that our framework is all-encompassing, but we believe that there is not just one feature that is distinctive for category-likeness, and that the meta-category of category-likeness is itself polythetic, as most categories are. It was our aim to leave room for such a polythetic view of category-likeness, and that room was needed to explain our data.


References

Adams, R. J., Wilson, M., & Wu, M. (1997). Multilevel item response models: An approach to errors in variables regression. Journal of Educational and Behavioral Statistics, 22, 47-76.

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), Second international symposium on information theory (pp. 267-281). Budapest, Hungary: Akademiai Kiado.

American Psychiatric Association. (1994). Diagnostic and Statistical Manual of Mental Disorders (4th ed.). Washington, DC: Author.

Andrich, D. (1978). A rating scale formulation for ordered response categories. Psychometrika, 43, 567-573.

Beauchaine, T. P., & Beauchaine, R. J. III (2002). A comparison of maximum covariance and k-means cluster analysis in classifying cases into known taxon groups. Psychological Methods, 7, 245-261.

Beauchaine, T. P., & Waters, E. (2003). Pseudotaxonicity in MAMBAC and MAXCOV analyses of rating-scale data: Turning continua into classes by manipulating observer's expectations. Psychological Methods, 8, 3-15.

Birnbaum, A. (1968). Some latent trait models. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-424). Reading, MA: Addison-Wesley.

Bock, R. D., Gibbons, R., & Muraki, E. (1988). Full-information item factor analysis. Applied Psychological Measurement, 12, 261-280.

Broughton, R. (1984). A prototype strategy for construction of personality scales. Journal of Personality and Social Psychology, 47, 1334-1346.

Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105, 456-466.

Cantor, N., Smith, E. E., French, R. D., & Mezzich, J. (1980). Psychiatric diagnosis as prototype categorization. Journal of Abnormal Psychology, 89, 181-193.

Carson, R. C. (1991). Dilemmas in the pathway of the DSM-IV. Journal of Abnormal Psychology, 100, 302-307.

Clarkin, J. F., Widiger, T. A., Frances, A., Hurt, S. W., & Gilmore, M. (1983). Prototypic typology and the borderline personality disorder. Journal of Abnormal Psychology, 92, 263-275.

Coyne, J. C. (1994). Self-reported distress: Analog or ersatz depression? Psychological Bulletin, 116, 29-45.

Croon, M. (1990). Latent class analysis with ordered latent classes. British Journal of Mathematical and Statistical Psychology, 43, 171-192.

de Leeuw, J., & Verhelst, N. (1986). Maximum likelihood estimation in generalized Rasch models. Journal of Educational Statistics, 11, 183-196.

Devlin, J. T., Gonnerman, L. M., Andersen, E. S., & Seidenberg, M. S. (1998). Category-specific semantic deficits in focal and widespread brain damage: A computational account. Journal of Cognitive Neuroscience, 10, 77-94.

Eagly, A. H., & Chaiken, S. (1993). The psychology of attitudes. Orlando, FL: Harcourt Brace.

Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.

Endler, N. S., & Kocovski, N. L. (2002). Personality disorders at the crossroads. Journal of Personality Disorders, 16, 487-502.

Fischer, G. (1992). The 'saltus model' revisited. Methodika, 6, 87-98.

Fischer, K. W., Pipp, S. L., & Bullock, D. (1984). Detecting discontinuities in development: Methods and measurement. In R. N. Emde & R. Harmon (Eds.), Continuities and discontinuities in development. Norwood, NJ: Ablex.

Flett, G. L., Vredenburg, K., & Krames, L. (1997). The continuity of depression in clinical and nonclinical samples. Psychological Bulletin, 121, 395-416.

Frances, A., Widiger, T., & Fyer, M. R. (1990). The influence of classification methods on comorbidity. In J. D. Maser & C. R. Cloninger (Eds.), Comorbidity of mood and anxiety disorders (pp. 41-59). Washington, DC: American Psychiatric Press.

Gangestad, S. W., Bailey, J. M., & Martin, N. G. (2000). Taxometric analyses of sexual orientation and gender identity. Journal of Personality and Social Psychology, 78, 1109-1121.

Gangestad, S., & Snyder, M. (1985). "To carve nature at its joints": On the existence of discrete classes in personality. Psychological Review, 92, 317-349.

Gangestad, S. W., & Snyder, M. (1991). Taxonomic analysis redux: Some statistical considerations for testing a latent class model. Journal of Personality and Social Psychology, 61, 141-146.

Geeraerts, D., & Grondelaers, S. (1994). Structuring of word meaning: An overview. In D. A. Cruse, F. Hundsnurscher, M. Job, & P. R. Lutzeier (Eds.), Lexicology: An international handbook on the nature and structure of words and vocabularies (Vol. 1, pp. 304-318). Berlin, Germany: De Gruyter.

Glas, C. A. W. (1988). The derivation of some tests for the Rasch model from the multinomial distribution. Psychometrika, 53, 525-546.

Golden, R. R., & Meehl, P. E. (1979). Detection of the schizoid taxon with MMPI indicators. Journal of Abnormal Psychology, 88, 217-233.

Goodman, L. A. (1972). A general model for the analysis of surveys. American Journal of Sociology, 77, 1035-1086.

Grayson, D. A. (1987). Can categorical and continuous views of psychiatric illness be distinguished? British Journal of Psychiatry, 151, 355-361.

Green, B. F. (1952). Latent structure analysis and its relation to factor analysis. Journal of the American Statistical Association, 47, 71-76.

Guttman, L. A. (1944). A basis for scaling qualitative data. American Sociological Review, 9, 139-150.

Hampton, J. A. (1993). Prototype models of concept representation. In I. Van Mechelen, J. Hampton, R. Michalski, & P. Theuns (Eds.), Categories and concepts: Theoretical views and inductive data analysis (pp. 67-95). New York: Academic.

Hampton, J. A. (1995). Testing the prototype theory of concepts. Journal of Memory and Language, 34, 686-708.

Haslam, N. (1997). Evidence that male sexual orientation is a matter of degree. Journal of Personality and Social Psychology, 73, 862-870.

Haslam, N. (2002). Natural kinds, practical kinds, and psychiatric categories. Psycoloquy, 13(001).

Haslam, N., & Beck, A. T. (1994). Subtyping major depression: A taxometric analysis. Journal of Abnormal Psychology, 103, 686-692.

Haslam, N., & Cleland, C. (2002). Taxometric analysis of fuzzy categories: A Monte Carlo study. Psychological Reports, 90, 401-404.

Haslam, N., & Ernst, D. (2002). Essentialist beliefs about mental disorders. Journal of Social and Clinical Psychology, 21, 628-644.

Haslam, N., & Kim, H. C. (2002). Categories and continua: A review of taxometric research. Genetic, Social, and General Psychology Monographs, 128, 271-320.

Hoijtink, H., & Molenaar, I. W. (1997). A multidimensional item response model: Constrained latent class analysis using the Gibbs sampler and posterior predictive checks. Psychometrika, 62, 171-189.

Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning. Hillsdale, NJ: Erlbaum.

Horowitz, L. M., Post, D. L., French, R. D., Wallis, K. D., & Siegelman, E. Y. (1981). The prototype as a construct in abnormal psychology: 2. Clarifying disagreement in psychiatric judgments. Journal of Abnormal Psychology, 90, 575-585.

Horowitz, L. M., Wright, J. C., Lowenstein, E., & Parad, H. W. (1981). The prototype as a construct in abnormal psychology: 1. A method for deriving prototypes. Journal of Abnormal Psychology, 90, 568-574.

Janssen, R., De Boeck, P., Viaene, M., & Vallaeys, L. (1999). Simple mental addition in children with and without mild mental retardation. Journal of Experimental Child Psychology, 74, 261-281.

Kass, F., Skodol, A. E., Charles, E., Spitzer, R., & Williams, J. B. W. (1985). Scaled ratings of DSM-III personality disorders. American Journal of Psychiatry, 142, 627-630.

Kelderman, H., & Steen, R. (1993). LOGIMO [computer software]. Groningen, The Netherlands: ProGAMMA.

Kiers, H. A. L. (1990). SCA: A program for simultaneous analysis of variables measured in two or more populations [Computer software and manual]. Groningen, The Netherlands: ProGAMMA.

Kim, N. S., & Ahn, W. K. (2002). Clinical psychologists' theory-based representations of mental disorders predict their diagnostic reasoning and memory. Journal of Experimental Psychology: General, 131, 451-476.

Kofsky, E. (1966). A scalogram study of classificatory development. Child Development, 37, 191-204.

Komatsu, L. U. (1992). Recent views of conceptual structure. Psychological Bulletin, 112, 500-526.

Korfine, L., & Lenzenweger, M. F. (1995). The taxonicity of schizotypy: A replication. Journal of Abnormal Psychology, 104, 26-31.

Lakoff, G. (1987). Women, fire, and dangerous things: What categories reveal about the mind. Chicago: University of Chicago Press.

Lazarsfeld, P. F. (1950). The interpretation and computation of some latent structures. In S. A. Stouffer, L. A. Guttman, E. A. Suchman, & P. F. Lazarsfeld (Eds.), Measurement and prediction (pp. 362-412). Princeton, NJ: Princeton University Press.

Lenzenweger, M. F. (1999). Deeper into the schizotypy taxon: On the robust nature of maximum covariance analysis. Journal of Abnormal Psychology, 108, 182-187.

Lenzenweger, M. F., & Korfine, L. (1992). Confirming the latent structure and base rate of schizotypy: A taxometric analysis. Journal of Abnormal Psychology, 101, 567-571.

Lilienfeld, S. O., & Marino, L. (1995). Mental disorder as a Roschian concept: A critique of Wakefield's harmful dysfunction analysis. Journal of Abnormal Psychology, 104, 411-420.

Livesley, W. J., Jackson, D. N., & Schroeder, M. L. (1992). Factorial structure of traits delineating personality disorders in clinical and general population samples. Journal of Abnormal Psychology, 101, 432-440.

Livesley, W. J., & Schroeder, M. L. (1990). Continua of personality disorder: The DSM-III-R Cluster A diagnoses. The Journal of Nervous and Mental Disease, 178, 627-635.

Livesley, W. J., Schroeder, M. L., Jackson, D. N., & Jang, K. L. (1994). Categorical distinctions in the study of personality disorders: Implications for classification. Journal of Abnormal Psychology, 103, 6-17.

Maesschalck, C. (1998). A psychometric modelling framework for testing categorical and/or continuous aspects of the borderline, histrionic, and antisocial personality disorders. Unpublished doctoral dissertation, University of Leuven, Belgium.

Malt, B. C. (1993). Concept structure and category boundaries. In G. V. Nakamura, D. L. Medin, & R. Taraban (Eds.), Categorization by humans and machines. The psychology of learning and motivation: Advances in research and theory (Vol. 29, pp. 363-390). San Diego, CA: Academic.

Malt, B. C., & Smith, E. E. (1984). Correlated properties in natural categories. Journal of Verbal Learning & Verbal Behavior, 23, 250-269.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.

McKinley, R. L., & Reckase, M. D. (1983). MAXLOG: A computer program for the estimation of the parameters of a multidimensional logistic model. Behavior Research Methods and Instrumentation, 15, 389-390.

McCulloch, C. E., & Searle, S. R. (2001). Generalized, linear, and mixed models. New York: Wiley.

McCutcheon, A. L. (1987). Latent class analysis. Newbury Park, CA: Sage.

Medin, D. L. (1989). Concepts and conceptual structure. American Psychologist, 44, 1469-1481.

Medin, D. L., & Coley, J. D. (1998). Concepts and categorization. In J. Hochberg & J. E. Cutting (Eds.), Perception and cognition at century's end: Handbook of perception and cognition (pp. 403-439). San Diego, CA: Academic.

Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.

Meehl, P. E. (1973). MAXCOV-HITMAX: A taxonomic search method for loose genetic syndromes. In Psychodiagnosis: Selected papers (pp. 200-224). Minneapolis, MN: University of Minnesota Press.

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806-834.

Meehl, P. E. (1979). A funny thing happened to us on the way to the latent entities. Journal of Personality Assessment, 43, 563-581.

Meehl, P. E. (1995). Bootstraps taxometrics: Solving the classification problem in psychopathology. American Psychologist, 50, 266-275.

Meehl, P. E. (1999). Clarifications about the taxometric method. Journal of Applied and Preventive Psychology, 8, 165-174.

Meehl, P. E., & Golden, R. R. (1982). Taxometric methods. In P. Kendall & J. Butcher (Eds.), Handbook of research methods in clinical psychology (pp. 127-181). New York: Wiley.

Meehl, P. E., & Yonce, L. J. (1994). Taxometric analysis: I. Detecting taxonicity with two quantitative indicators using means above and below a sliding cut (MAMBAC procedure). Psychological Reports, 74, 1059-1274.

Mellenbergh, G. J. (1982). Contingency-table models for assessing item bias. Journal of Educational Statistics, 7, 105-108.

Mellenbergh, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127-143.

Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58, 525-543.

Millsap, R. E., & Everson, M. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17, 297-334.

Miller, M. B. (1996). Limitations of Meehl's MAXCOV-HITMAX procedure. American Psychologist, 51, 554-556.

Mislevy, R. J. (1984). Estimating latent distributions. Psychometrika, 49, 359-381.

Mislevy, R. J., & Bock, R. D. (1989). PC-BILOG 3: Item analysis and test scoring with binary logistic models [Computer software]. Mooresville, IN: Scientific Software.

Mislevy, R. J., & Wilson, M. (1996). Marginal maximum likelihood estimation for a psychometric model of discontinuous development. Psychometrika, 61, 41-71.

Molenaar, I. (1995). Estimation of item parameters. In G. Fischer & I. Molenaar (Eds.), Rasch models: Foundations, recent developments and applications (pp. 39-51). New York: Springer.

Morey, L. C. (1991). Classification of mental disorder as a collection of hypothetical constructs. Journal of Abnormal Psychology, 100, 289-293.

Murphy, G. L. (2002). The big book of concepts. Cambridge, MA: MIT Press.

Murphy, G. L., & Lassaline, M. E. (1997). Hierarchical structure in concepts and the basic level of categorization. In K. Lamberts & D. Shanks (Eds.), Knowledge, concepts, and categories (pp. 93-131). London: UCL Press.

Murphy, G. L., & Medin, D. L. (1985). The role of theories in conceptual coherence. Psychological Review, 92, 289-316.

Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49, 115-132.

Nestadt, G., Romanoski, A. J., Brown, C. H., Chahal, R., Merchant, A., Folstein, M. F., Gruenberg, E. M., & McHugh, P. R. (1991). DSM-III compulsive personality disorder: An epidemiological survey. Psychological Medicine, 21, 461-471.

Nosofsky, R. M. (1992). Exemplars, prototypes and similarity rules. In A. Healy, S. Kosslyn, & R. Shiffrin (Eds.), From learning theory to connectionist theory: Essays in honor of W. K. Estes (Vol. 1, pp. 149-168). Hillsdale, NJ: Erlbaum.

Nosofsky, R. M., & Palmeri, T. J. (1997). An exemplar-based random walk model of speeded classification. Psychological Review, 104, 266-300.

Pirolli, P., & Wilson, M. (1998). A theory of the measurement of knowledge content, access, and learning. Psychological Review, 105, 58-82.

Reise, S. P., Widaman, K. F., & Pugh, R. H. (1993). Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance. Psychological Bulletin, 114, 552-566.

Rijmen, F., Tuerlinckx, F., De Boeck, P., & Kuppens, P. (in press). A nonlinear mixed model framework for item response theory. Psychological Methods.

Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, 104, 192-233.

Rosch, E. (1978). Principles of categorization. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization (pp. 27-48). Hillsdale, NJ: Erlbaum.

Rosch, E., Mervis, C. B., Gray, W., Johnson, D., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382-439.

Rost, J. (1990). Rasch models in latent classes: An integration of two approaches to item analysis. Applied Psychological Measurement, 14, 271-282.

Rost, J. (1991). A logistic mixture distribution model for polychotomous item responses. British Journal of Mathematical and Statistical Psychology, 44, 75-92.

Ruscio, A. M., Borkovec, T. D., & Ruscio, J. (2001). A taxometric investigation of the latent structure of worry. Journal of Abnormal Psychology, 110, 413-422.

Ruscio, A. M., & Ruscio, J. (2002). The latent structure of analogue depression: Should the Beck Depression Inventory be used to classify groups? Psychological Assessment, 14, 135-145.

Ruscio, J. (2000). Taxometric analysis with dichotomous indicators: The modified MAXCOV procedure and a case removal consistency test. Psychological Reports, 87, 929-939.

Ruscio, J., & Ruscio, A. M. (2000). Informing the continuity controversy: A taxometric analysis of depression. Journal of Abnormal Psychology, 109, 473-487.

Ruscio, J., & Ruscio, A. M. (2002). A structure-based approach to psychological assessment: Matching measurement models to latent structure. Assessment, 9, 4-16.

Sanislow, C. A., Grilo, C. M., Morey, L. C., Bender, D. S., Skodol, A. E., Gunderson, J. G., Shea, M. T., Stout, R. D., Zanarini, M. C., & McGlashan, T. H. (2002). Confirmatory factor analysis of DSM-IV criteria for borderline personality disorder: Findings from the Collaborative Longitudinal Personality Disorder Study. American Journal of Psychiatry, 159, 284-290.

SAS Institute, Inc. (1999). SAS online doc (Version 8) [software manual on CD-ROM]. Cary, NC: SAS Institute, Inc.

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.

Siegler, R. S. (1981). Developmental sequences within and between concepts. Monographs of the Society for Research in Child Development, 46, 1-4.

Skilling, T. A., Quinsey, V. L., & Craig, W. M. (2001). Evidence of a taxon underlying serious antisocial behavior in boys. Criminal Justice and Behavior, 28, 450-470.

Smith, E. E. (1995). Concepts and categorization. In E. Smith & D. Osherson (Eds.), Thinking: An invitation to cognitive science (2nd ed., Vol. 3, pp. 3-33). Cambridge, MA: MIT Press.

Smith, E. E., & Medin, D. L. (1981). Categories and concepts. Cambridge, MA: Harvard University Press.

Smits, T., Storms, G., Rosseel, Y., & De Boeck, P. (2002). Fruits and vegetables categorized: An application of the generalized context model. Psychonomic Bulletin & Review, 9, 836-844.

Sörbom, D. (1974). A general method for studying differences in factor means and factor structures between groups. British Journal of Mathematical and Statistical Psychology, 28, 229-239.

Storms, G., & De Boeck, P. (1997). Formal models for intra-categorical structure that can be used for data analysis. In K. Lamberts & D. Shanks (Eds.), Knowledge, concepts, and categories (pp. 439-459). London: UCL Press.

Storms, G., De Boeck, P., Hampton, J., & Van Mechelen, I. (1999). Predicting conjunction typicalities by component typicalities. Psychonomic Bulletin & Review, 6, 677-684.

Storms, G., De Boeck, P., & Ruts, W. (2000). Prototype and exemplar-based information in natural language categories. Journal of Memory & Language, 42, 51-73.

Strube, M. J. (1989). Evidence for the type in Type A behavior: A taxometric analysis. Journal of Personality and Social Psychology, 56, 972-987.

Sutcliffe, J. P. (1993). Concept, class and category in the tradition of Aristotle. In I. Van Mechelen, J. Hampton, R. S. Michalski, & P. Theuns (Eds.), Categories and concepts: Theoretical views and inductive data analysis (pp. 35-65). London: Academic.

Tajfel, H. (1981). Human groups and social categories: Studies in social psychology. Cambridge, MA: Harvard University Press.

Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393-408.

Taylor, J. R. (1995). Linguistic categorization: Prototypes in linguistic theory (2nd ed.). Oxford, England: Oxford University Press.

Thissen, D. (1997). MULTILOG [Computer software]. Mooresville, IN: Scientific Software.

Trull, T. J., Widiger, T. A., & Guthrie, P. (1990). Categorical versus dimensional status of borderline personality disorder. Journal of Abnormal Psychology, 99, 40-48.

Tyler, L. K., Moss, H. E., Durrant-Peatfield, M. R., & Levy, J. P. (2000). Conceptual structure and the structure of concepts: A distributed account of category-specific deficits. Brain and Language, 75, 195-231.

Tyrer, P., & Alexander, J. (1979). Classification of personality disorders. British Journal of Psychiatry, 135, 163-167.

van Maanen, L., Been, P., & Sijtsma, K. (1989). The linear logistic test model and heterogeneity of cognitive strategies. In E. E. Roskam (Ed.), Mathematical psychology in progress (pp. 267-287). New York: Springer-Verlag.

Verbeke, G., & Molenberghs, G. (2000). Linear mixed models for longitudinal data. New York: Springer.

Verhelst, N., & Glas, C. A. W. (1995). The one parameter logistic model. In G. Fischer & I. Molenaar (Eds.), Rasch models: Foundations, recent developments and applications (pp. 215-237). New York: Springer-Verlag.

Verhelst, N. D., Glas, C. A. W., & Verstralen, H. H. F. M. (1994). OPLM: Computer program and manual [Computer software]. Arnhem, The Netherlands: CITO.

Vredenburg, K., Flett, G. L., & Krames, L. (1993). Analogue versus clinical depression: A critical reappraisal. Psychological Bulletin, 113, 327-344.

Waller, N. G., & Meehl, P. E. (1998). Multivariate taxometric procedures: Distinguishing types from continua. London: Sage.

Waller, N. G., Putnam, F. W., & Carlson, E. B. (1996). Types of dissociation and dissociative types: A taxometric analysis of dissociative experiences. Psychological Methods, 1, 300-321.

Waller, N. G., & Ross, C. A. (1997). The prevalence and biometric structure of pathological dissociation in the general population: Taxometric and behavior genetic findings. Journal of Abnormal Psychology, 106, 499-510.

Widiger, T. A. (1992). Categorical versus dimensional classification: Implications from and for research. Journal of Personality Disorders, 6, 287-300.

Widiger, T. A., & Shea, T. (1991). Differentiation of Axis I and Axis II disorders. Journal of Abnormal Psychology, 100, 399-406.

Wilder, D. A. (1981). Perceiving persons as a group: Categorization and intergroup relations. In D. L. Hamilton (Ed.), Cognitive processes and intergroup behavior (pp. 213-257). Hillsdale, NJ: Erlbaum.

Wilson, M. (1984). A psychometric model of hierarchical development. Unpublished doctoral dissertation, University of Chicago, Chicago, IL.

Wilson, M. (1989). Saltus: A psychometric model of discontinuity in cognitive development. Psychological Bulletin, 105, 276-289.

Wilson, M. (1993). The "saltus model" misunderstood. Methodika, 7, 1-4.

Wittgenstein, L. (1953). Philosophical investigations. Oxford, England: Blackwell.

Wu, M. L., Adams, R. J., & Wilson, M. (1998). ACER Conquest: Generalized item response modelling software [Computer software]. Melbourne, Australia: Australian Council for Educational Research.

Zimmerman, M., & Coryell, W. H. (1990). DSM-III personality disorder dimensions. The Journal of Nervous and Mental Disease, 178, 686-692.