The Divergent Autoencoder (DIVA) Account of Human Category Learning
Kenneth J. Kurtz (kkurtz@binghamton.edu)
Department of Psychology, PO Box 6000
Binghamton University (State University of New York)
Binghamton, NY 13902 USA

Abstract

The DIVA network model is introduced based on the novel computational principle of divergent autoencoding. DIVA produces excellent fits to classic data sets from Shepard, Hovland & Jenkins (1961) and Medin & Schaffer (1978). DIVA is also resistant to catastrophic interference. Such results have not previously been demonstrated by a model that is not committed to both localist coding of exemplars (or exceptions) and the use of an explicit selective attention mechanism.

Introduction

The problem of supervised classification learning is of fundamental importance in both cognitive psychology and machine learning. Models of many kinds have been put forward offering powerful solutions. This paper presents a novel approach to supervised learning that shows considerable promise as an account of human category learning and as a technology for applied problems. The DIVergent Autoencoder (DIVA) network model takes as a starting point the back-propagation learning algorithm (Rumelhart, Hinton, & Williams, 1986) and the reconstructive autoencoder architecture (McClelland & Rumelhart, 1986). Autoassociative systems are powerful learning devices that have been shown to implement principal component analysis and avoid local minima (Baldi & Hornik, 1989); to be extensible to non-linear function approximation (Japkowicz, Hanson, & Gluck, 2000); and to perform compression (e.g., DeMers & Cottrell, 1993). DIVA also draws on a design principle of multi-task learning mediated through a common hidden layer that has been articulated in the ORACL model of concept formation (Kurtz, 1997; Kurtz & Smith, in preparation) as well as in the literature on neural computation (Caruana, 1995; Intrator & Edelman, 1997; Gluck & Myers, 1993).

Japkowicz (2001) developed an approach for applying unsupervised learning to binary classification that is close in spirit to the present proposal. An autoencoder is trained only on the positive instances of a category. Subsequently, inputs can be tested for membership in the category by evaluating the reconstructive success of the autoencoder. A new example that is consistent with the set of learned category examples will show minimal error, while an example that is inconsistent will show a higher level of output error, suggesting an inability to construe the input as a category member. Japkowicz (2001) demonstrated good results on binary classification problems by training a model to recognize examples of one class. Successful reconstructions are classified as members and rejections are assumed to belong to the other category. The approach is not extensible to n-way classification tasks or to cases where an A/B/neither classification response is required.

The innovation unique to DIVA is a method for converting any supervised learning problem into a form addressable by autoassociative learning. Traditionally, an autoassociative system is only capable of categorization to the extent that it picks out the statistical structure of a training set in a manner like clustering. This process suggests category formation in the sense that if a training environment is naturally organized in terms of sets of self-similar cases, the autoassociative learning system will extract that structure. Similar inputs are similarly represented and subsequent generalization behavior reflects these attractors. However, such a system has no capacity to acquire a classification scheme based on supervision that crosscuts the correlational structure of the training set.

The computational principle of divergent autoencoding offers an elegant solution to this problem using an autoassociative learning channel for each output class in an n-way classification problem. For a standard A/B classification learning task, one output channel is designated for reconstructing patterns labeled A by the teaching signal and the other is assigned to patterns labeled B. No output units are explicitly assigned to code for the categories themselves. The correct classification choice is used to select the channel on which to apply the targets (which are the same as the input). The architecture consists of an input layer, a shared hidden layer, and a set of autoassociative output banks. The pattern of connectivity is full and feedforward; all weight update is by back-propagation.

Figure 1: Architecture of the DIVA network. Input Features feed the Shared Hidden Units, which project to two output banks, Autoencoder A and Autoencoder B.
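The architecture described above (an input layer, a shared hidden layer, and one autoassociative output bank per class, with targets applied only on the channel selected by the correct label) can be illustrated with a minimal sketch. This is not the author's implementation: the names `DivaSketch`, `train_step`, and `shj_type_one_demo` are hypothetical, and the logistic units, per-unit bias weights, and pattern-by-pattern (online) updates are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class DivaSketch:
    """Hypothetical sketch: shared hidden layer, one output bank per class."""

    def __init__(self, n_in, n_hidden, n_classes, weight_range=0.5, seed=0):
        rng = random.Random(seed)
        r = lambda: rng.uniform(-weight_range, weight_range)
        # w_ih[j][i]: input i -> hidden j (last entry is an assumed bias weight).
        self.w_ih = [[r() for _ in range(n_in + 1)] for _ in range(n_hidden)]
        # w_ho[c][o][j]: hidden j -> output o on channel c (last entry is bias).
        self.w_ho = [[[r() for _ in range(n_hidden + 1)] for _ in range(n_in)]
                     for _ in range(n_classes)]

    def hidden(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                for row in self.w_ih]

    def reconstruct(self, x):
        """Reconstruction of x along every channel, processed in parallel."""
        h = self.hidden(x) + [1.0]
        return [[sigmoid(sum(w * v for w, v in zip(row, h))) for row in bank]
                for bank in self.w_ho]

    def sse(self, x):
        """Sum-squared reconstruction error on each channel."""
        return [sum((t - o) ** 2 for t, o in zip(x, out))
                for out in self.reconstruct(x)]

    def train_step(self, x, label, lr=0.25):
        """One back-propagation update; targets (= input) on channel `label` only."""
        h = self.hidden(x)
        hb = h + [1.0]
        bank = self.w_ho[label]
        out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in bank]
        # Output deltas for sum-squared error with logistic units.
        d_out = [(t - o) * o * (1.0 - o) for t, o in zip(x, out)]
        # Hidden deltas, computed before the output weights are changed.
        d_hid = [h[j] * (1.0 - h[j])
                 * sum(d_out[o] * bank[o][j] for o in range(len(out)))
                 for j in range(len(h))]
        for o, row in enumerate(bank):
            for j in range(len(hb)):
                row[j] += lr * d_out[o] * hb[j]
        xb = x + [1.0]
        for j, row in enumerate(self.w_ih):
            for i in range(len(xb)):
                row[i] += lr * d_hid[j] * xb[i]


def shj_type_one_demo(epochs=300, seed=0):
    """Train on a unidimensional rule; return correct-channel SSE before and after."""
    patterns = [[float(b) for b in format(n, "03b")] for n in range(8)]
    labels = [int(p[0]) for p in patterns]  # category given by the first feature
    net = DivaSketch(n_in=3, n_hidden=2, n_classes=2, seed=seed)
    correct_sse = lambda: sum(net.sse(x)[y] for x, y in zip(patterns, labels))
    before = correct_sse()
    for _ in range(epochs):
        for x, y in zip(patterns, labels):
            net.train_step(x, y)
    return before, correct_sse()
```

Calling `shj_type_one_demo()` returns the summed error on the correct channel before and after training. Only the weights into the labeled channel and the shared input-to-hidden weights change on a given trial, which is the sense in which the channels diverge from a common recoding.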
The recoding of input information at the hidden layer of DIVA is shared by the set of channels, each of which is dedicated to learning to reconstruct the members of one class. This is different from forming compressed representations in traditional autoencoding and also differs from learning a recoding to achieve a linearly separable boundary between classes in a standard multi-layer perceptron architecture. DIVA will tend to produce different internal representations of an item depending upon the other same-category members included in the training set and also depending upon the contrasting categories being learned at the same time.

The DIVA network is tested by presenting an input which is processed along each channel in parallel. A classification response can be based on the amount of reconstructive error along a particular channel (i.e., testing the hypothesis that the example is a member of a particular category) as in Japkowicz (2001). In standard n-way classification tasks, the response is determined by selecting the class corresponding to the channel with the best reconstruction, i.e., the lowest sum-squared error. A version of Luce's (1963) choice rule is used to generate response probabilities for each choice K based on the inverse of the sum-squared error at the output layer of the N channels. This is an extension of the common application of the choice rule to response generation based on output unit activations (e.g., Kruschke, 1992):

$$\Pr(K) = \frac{1/\mathrm{SSE}(K)}{\sum_{k=1}^{N} 1/\mathrm{SSE}(k)} \qquad (1)$$

The logic of this paper is to demonstrate the power and promise of the DIVA network for cognitive simulation. To address the topic of human category learning, the primary goal is to evaluate the model on the two most widely studied datasets in the literature: the Shepard, Hovland, & Jenkins (1961) dataset on ease of learning and the 5-4 learning problem introduced by Medin & Schaffer (1978). In addition to fitting benchmark data, a number of appreciable properties of DIVA in comparison with competing models will be outlined.

Experiment 1. The Relative Ease of Learning Across Category Structures

Shepard, Hovland, & Jenkins (1961) produced a groundbreaking analysis of the rate of acquisition of the six general types of category structures that are possible within a training set of binary-valued, overtly analyzable, three-dimensional stimuli. The most interpretable of these structures are: Type I, a unidimensional rule (UNI); Type II, the exclusive-or problem plus an irrelevant dimension (XOR); and Type IV, a family resemblance structure (FR). The result that has generated considerable challenges to model-builders is a qualitative ordering of the relative ease of learning: UNI fastest; followed by XOR; followed by roughly equivalent performance of FR, Type III, and Type V; followed by Type VI (though see Kurtz, under review).

The relative ease of learning the six types was tested across six random initializations of a (3-2-3x2) DIVA network (note: this refers to a DIVA network with three input units, two hidden units, and two autoassociative output channels with three units each). The number of epochs to criterion was determined based on total sum-squared error across the eight training patterns. Error was recorded only on the target-active (correct) channel. Two stopping points (SSE = .2; SSE = .1) were applied in accord with the strict criteria used in the behavioral study.

For the data reported in Table 1, a learning rate of 0.25 and an initial weight range of zero +/- 0.5 were used. However, qualitative performance was found to be consistent across variations in learning rate and the range of initial weight randomization. The only critical parameter is the number of hidden units. A simple systematic basis is used to determine the number of hidden units for a task. The smallest number of hidden units that can successfully reach asymptotic minimization of error across the manipulated learning conditions is the number that is used. This approach is in sharp contrast to the usual technique of exhaustive search through parameter space to find the best fit for each phenomenon of interest. In this case, two hidden units were required to consistently reduce error on the six SHJ types.

Table 1: Relative ease of category learning by DIVA

SHJ Type    Mean number of epochs     Mean number of epochs
            to criterion (0.2)        to criterion (0.1)
I           566                       840
II          847                       1295
III         1195                      1953
IV          1232                      2087
V           1144                      1750
VI          5719                      9416

As can be seen in Table 1, the data are well fit (Type I < Type II < Types III, IV, V < Type VI). Consistent findings were observed across the time course of training, as was found in the SHJ replication by Nosofsky, Gluck, Palmeri, McKinley, & Glauthier (1994). By way of comparison, a standard feedforward (4-2-1) back-propagation network was tested under matching conditions. As also reported by Kruschke (1992), the network was far too quick to learn FR (comparable speed to UNI) and too slow to learn XOR. With two hidden units, some initializations became stuck in local minima (especially on Type V) and the system showed no progress on Type VI (a version of the parity problem) without more hidden units.

Another way to test the performance of DIVA is to compute the classification response to each pattern using the choice rule over sum-squared error as outlined above. A single simulation was conducted in this fashion using the
identical set of initial weights for each of the SHJ types. The learning results after 500 training epochs appear in Table 2.

Table 2: Classification accuracy for a DIVA network.

SHJ Problem Type    Category Structure    Classification Accuracy
I                   UNI                   .97
II                  XOR                   .94
III                                       .84
IV                  FR                    .83
V                                         .93
VI                                        .56

The fit is excellent except for the overly good performance of Type V. There is a degree of variation across initializations of DIVA networks and, in the case presented above, Type V showed performance at the upper end of its usual range. Such variation results primarily from the degree of consistency between the initial random configuration of weights and the form of the solution that is required. When lower learning rates and smaller initial weight variation are selected, the degree of variation lessens considerably.

In order to make clear how the learning occurs, DIVA solutions to the most interesting of the SHJ problem types (UNI, XOR, FR) are described as follows. A representative DIVA network solved the UNI problem (on F1) by assigning one hidden unit to code for the presence of F1 and F2 respectively. Each hidden unit strongly activated the F1 output units: via excitation on one channel and inhibition on the other. Each hidden unit also activated the appropriate non-diagnostic feature on each channel. F2 and F3 were always correct on the 'incorrect' category channel, while the output there for F1 was always exactly opposite to the input activation.

To solve XOR (on F1 and F2), a representative DIVA network largely ignored F2, but used signals from F1 and F3 to generate hidden layer recodings as shown in Table 3.

Table 3: Recodings formed by DIVA network on XOR.

Input    Hidden1 Activation    Hidden2 Activation    Target Category
101      0.2                   0                     0
001      0                     0.8                   1
000      0.8                   1                     0
011      0                     0.9                   0
111      0.2                   0                     1
100      1                     0                     0
110      1                     0                     1
010      0.8                   1                     0

The DIVA network used four areas of the activation space on H1 to code for the pairwise combinations of the diagnostic F1 and the non-diagnostic F3, while H2 primarily coded for F1. It is interesting to note that neither hidden unit explicitly represented the critical correlation between the diagnostic features (a standard back-propagation network would search for a recoding of the input specifically targeted to allow for linearly separable classification from the hidden layer to the output). F1 and F3 were always correct on the 'incorrect' category channel, and the output there for F2 was always exactly opposite to the input activation.

On the FR problem, a representative DIVA network reached the following solution. H1 received an excitatory signal from F3 and an inhibitory signal from F2. H2 was sensitive to all three input features, with a strong inhibitory signal from F1 and lesser excitation from F2 and F3, yielding the recodings shown in Table 4.

Table 4: Recodings formed by a DIVA network on FR.

Input    Hidden1 Activation    Hidden2 Activation    Target Category
101      1                     0                     1
001      1                     0.9                   0
000      0.4                   0.6                   0
011      0.5                   1                     1
111      0.5                   0.4                   1
100      0.5                   0                     0
110      0                     0                     1
010      0                     1                     0

The network assigned each input item to a unique location in the two-dimensional representational space of the hidden layer. The two channels showed equivalent connectivity projecting from the hidden layer and used strong bias weights to differentiate their performance. It is interesting to note that this solution parallels the behavior of an ordinary autoencoder operating on this training set. Once again, while operating entirely on the basis of the back-propagation algorithm, the hidden units do not act to transform the input for linearly separable classification. The 'incorrect' channel attempts to interpret each input as a member of its category and therefore produces markedly increased or reduced activation on one or more of the features.

The XOR problem holds a high place in the contemporary study of both human and machine learning. For decades, the connectionist tradition was halted by the lack of an algorithm to handle cases of hard learning, i.e., non-linearly separable functions. Rumelhart, Hinton, & Williams' (1986) paper on back-propagation of errors was a breakthrough that elicited tremendous productivity. The XOR problem remains a benchmark for evaluation of learning systems. A standard (hetero-associative) back-propagation network reaches asymptote on Type II learning (the XOR problem with an added irrelevant dimension) after approximately 3000 epochs of training. The DIVA network reached asymptote on average in 847 epochs. This nearly fourfold increase in speed of learning suggests that DIVA can perform non-linear function approximation with
considerable ease. The SHJ Type VI is the parity problem with three dimensions. The standard back-propagation network did not make any headway with two hidden units, but the DIVA network coasted smoothly down the error gradient. These findings suggest the power of the DIVA network as a general learning device.

In sum, the Shepard, Hovland, & Jenkins (1961) dataset is something of a litmus test for models of classification learning. Despite some question about the generality of the finding (see Kurtz, under review), it is a seminal result in the literature. The design features of those models which have successfully fit this data have come to represent the state of the art in the field. Localist encoding and selective attention are core components of the three successful models: ALCOVE (Kruschke, 1992), SUSTAIN (Love, Medin, & Gureckis, 2004) and RULEX (Nosofsky, et al., 1994). These models all depend upon multiple free parameters (not including learning rate) that are selected according to the same data that is to be fit. RULEX uses three best-fitting parameters in addition to best-fitting attentional weights. ALCOVE and SUSTAIN each use three best-fitting parameters. DIVA offers a successful fit with a single parameter which is set a priori, rather than post-hoc, and offers a strong challenge to the widespread view that selective attention and localist representation are the correct explanatory constructs.

Experiment 2. Learning the 5-4 Categorization Problem

The case for the superiority of exemplar models has rested in no small part on extensive behavioral and computational tests of the 5-4 problem introduced by Medin & Schaffer (1978). A challenge has been raised recently (e.g., Smith & Minda, 2000) based on successful fits by a 'souped-up' version of a prototype model and questioning of the satisfactory nature of the exemplar account presented by Nosofsky, Kruschke, & McKinley (1992).

The 5-4 category problem consists of nine training items with four binary-valued features plus a set of transfer items. The design feature of the problem is that it is linearly separable (and therefore fair game for testing prototype models), but includes three very weak category members (for which only two out of the four features are consistent with the underlying prototype). Category B consists only of its prototype, one strong example, and two weak examples. Model testing has focused not only on overall quantitative fit, but also on two qualitative aspects of the data. The first is that in non-elaborated experimental versions of the task, learners are more accurate on Stimulus A2 (which has two features in common with the A prototype) than they are on Stimulus A1 (which has three prototypical features). The prototype model predicts the opposite, while exemplar models capture the result (Nosofsky, et al., 1992). In addition, behavioral results typically show that a transfer test on the Category A prototype produces highly accurate responding, though not more so than the observed performance on training items that are somewhat distant from the prototype. Once again, the advantage goes to the exemplar view.

A (4-2-4x2) DIVA network was applied to the 5-4 problem using a learning rate of 0.1 and initial weights randomized in a range of zero +/- .05. The model was allowed to run for 1000 epochs. Performance on each training instance and the transfer items was determined by applying the choice rule to the sum-squared error along each channel. In terms of quantitative fit, a correlation of .96 was found between the probabilistic responses of the DIVA network and a summarization of thirty different behavioral tests of the 5-4 problem published by Smith & Minda (2000). The DIVA network produced a probability of A of Pr(A) = .96 for Stimulus A2 and Pr(A) = .85 for Stimulus A1, thereby fitting the critical qualitative result that was previously captured only by pure exemplar models and RULEX (Nosofsky, Palmeri, & McKinley, 1994). In addition, the transfer item T3, which is the prototype of Category A, produced Pr(A) = .86, which was the strongest response to any transfer item but a lesser response than that shown for the training items A2 and A3. DIVA offers the first successful fit to these results by a model that does not implement the theoretical framework of localist encoding and selective attention.

Experiment 3. Avoiding Catastrophic Interference

Among some researchers, the phenomenon of catastrophic interference has been considered a fatal flaw for back-propagation as an account of human learning and memory (e.g., McCloskey & Cohen, 1989). In point of fact, a number of intriguing solutions and more nuanced treatments (McClelland, McNaughton, & O'Reilly, 1995; Mirman & Spivey, 2001) have appeared. Nonetheless, a minimal solution (one that does not graft an additional component, integrate additional mechanisms, or make modifications to the training set, etc.) has not been found. Is it possible to preserve the computational power and psychological validity of learning distributed internal representations via back-propagation without catastrophic interference?

The definitive demonstration of catastrophic interference for neural network models trained by back-propagation is Ratcliff's (1990) simulation result using the 4-4 encoder problem. The problem involves two learning phases. Training is performed to a certain level on the Phase I examples and then the training set is swapped. Phase II consists of training on only the second training set. The observed phenomenon is that the network performs well on the first training set at the end of Phase I, but the process of learning in Phase II "catastrophically" disrupts performance on Phase I examples. Phase I consists of three four-dimensional patterns to be autoassociatively reconstructed through an intermediate hidden layer. The patterns are: 1000, 0100, and 0010. Phase II consists of a single pattern: 0001.

Using DIVA, it is straightforward to assign a separate output channel to each sequential phase of learning. The
divergent autoencoding principle is applied in this case to separate phases of learning rather than to separate classification labels (as above). The same input and hidden units are used; however, a separate bank of outputs is used for each phase. Both channels are present in the architecture at all times, but targets are applied only to adjust the weights along the active channel. The critical assumption is that the shift between phases of learning must somehow be demarcated and psychologically encoded. The task context must make clear that "now you are to learn something else." In point of fact, traditional paradigms for studying interference usually make a very clear distinction between List 1 and List 2. An intriguing prediction is that an unannounced or non-obvious shift from Phase I to Phase II ought to elicit catastrophic interference (CI) unless the switch is made manifest. As a final point of emphasis, no known model has been able to exploit the phase variable to prevent CI by devoting input or output units to code for the phase of each presented pattern.

A (4-3-4x2) DIVA network was tested with three hidden units and a learning rate of 0.2 in accord with Ratcliff (1990) and Kruschke (1992). Weights were randomly initialized in a tighter range around zero. The network required 550 epochs to reach the 70% training criterion for Phase I learning used by previous investigators. As explained above, Phase I training applied targets only on the P1 channel. The same amount of training was conducted for Phase II on just the 0001 pattern using only the targets on the P2 channel.

Table 5: Output Activations of DIVA network on Sequential Learning Task.

Input    Channel for P1       Channel for P2
After Phase I
1000     .74   .19   .17      .04
0100     .18   .68   .23      .03
0010     .16   .24   .70      .04
0001     .49   .49   .49      .50
After Phase II
1000     .73   .21   .15      .03
0100     .16   .66   .24      .03
0010     .17   .22   .68      .03
0001     .04   .04   .04      .96

As shown in Table 5, catastrophic forgetting was fully avoided. Similar performance was observed across differently initialized runs and variations in learning rate and initial weight range. Two follow-up tests were conducted. The DIVA network was tested using negative-valued (-1) input activations rather than zero-valued 'off' units. This also yielded successful results. In addition, an alternate version of Phase II learning was conducted using the pattern . This extends the problem beyond the case in which positive activation of the features is segregated between the two training phases. Once again, performance on Phase I examples remained intact.

The success of the DIVA network can be explained very simply. The weights from Features 1-3 to the hidden layer are hardly affected by Phase II training, and the weights from the hidden layer to the P1 channel are not affected at all. However, this is not at all equivalent to using entirely different networks for the two phases of learning. The same input units, hidden units, and connecting weights are used. The two learning phases are equivalent for DIVA to learning a two-way classification problem with massed practice. One can interpret the DIVA solution to the problem of catastrophic interference as the establishment of a contextually-driven classification of inputs as members of either Phase I or Phase II. With this one very plausible assumption, divergent autoencoding preserves the back-propagation machinery for error-driven learning without the catastrophic interference.

General Discussion

Given the demonstrated promise of DIVA, a number of further explorations are underway. DIVA shows a tendency to shift during learning from more general to more specific category representations (e.g., Smith & Minda, 1998). DIVA is naturally extensible to the recently vigorous investigation of category learning beyond traditional classification, i.e., inference learning, category use, unsupervised learning, and cross-classification. Since autoassociative processing naturally generates a feature-based representation as its output, applications to recognition memory, memory distortions, and feature prediction are forthcoming.

An intriguing aspect of the DIVA architecture is that it offers a straightforward mechanism for producing a convolved representation of any input in terms of any category known to the network. Imagine that a pattern representing a cat is presented to a DIVA network trained on various animal concepts. Regardless of which animal is the actual classification response, every channel produces an interpretation or construal of the input in terms of its category. The psychological nature of such construals is of great interest. For example, the similarity of concepts A and B can be computed as the degree of reconstructive success a DIVA network achieves in processing a prototypical example of A along a channel trained on concept B. Typicality or graded structure of category members can be understood as the degree of reconstructive success in processing a member of category A through the channel for that category. Argument strength for category-based induction can be understood as the degree of reconstructive success in processing a representation of the conclusion category along the channel(s) of premise categories. The internal representation generated by inputting a representation or representative example of one concept to the channel of another concept is likely to produce a conceptual combination or metaphoric interpretation. If a parsimonious means can be found to represent structural information in a form submittable to a neural network, the potential deepens.
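The construal idea above, together with the choice rule of Equation 1, can be made concrete with a small sketch. It assumes each channel is available as a function from an input pattern to its reconstruction. The helpers `make_toy_channel`, `construals`, and `choice_probabilities` are hypothetical names, and the toy channels (which simply pull an input toward a stored prototype) are stand-ins for trained DIVA channels, not a model of them. The small `eps` guard against a zero error is likewise an added assumption.

```python
def sum_squared_error(target, output):
    return sum((t - o) ** 2 for t, o in zip(target, output))


def choice_probabilities(sse_per_channel, eps=1e-9):
    """Equation 1: Pr(K) = (1/SSE(K)) / sum over k of (1/SSE(k))."""
    inverse = [1.0 / max(s, eps) for s in sse_per_channel]  # eps avoids 1/0
    total = sum(inverse)
    return [v / total for v in inverse]


def construals(x, channels):
    """SSE of x along every channel; lower error means a better construal."""
    return {name: sum_squared_error(x, channel(x))
            for name, channel in channels.items()}


def make_toy_channel(prototype, pull=0.8):
    """Hypothetical stand-in for a trained channel: output moves toward prototype."""
    return lambda x: [xi + pull * (pi - xi) for xi, pi in zip(x, prototype)]


# Illustrative feature vectors only; they are not taken from the paper.
channels = {
    "cat": make_toy_channel([1.0, 1.0, 0.0, 1.0]),
    "dog": make_toy_channel([1.0, 0.0, 1.0, 1.0]),
}
errors = construals([1.0, 1.0, 0.0, 1.0], channels)
probabilities = choice_probabilities([errors["cat"], errors["dog"]])
```

Under this reading, typicality corresponds to a low error on an item's own channel, and the similarity of concept A to concept B corresponds to a low error when A's prototype is passed along B's channel. How reconstructive success should be scaled from the error is left open here, as it is in the text.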
In sum, DIVA provides an uncompromisingly good fit to the two most influential data sets on human category learning and does so with the following characteristics:
1. Distributed representation rather than localist nodes for individual instances
2. No selective attention mechanism
3. No performance-optimized free parameters
Therefore, the success of this model calls into question widely held theoretical assumptions. The DIVA network offers the brain-style computational power of back-propagation and overcomes its shortcomings in simulating human learning. The computational design principle of divergent autoencoding deserves consideration as an explanatory construct underlying broad aspects of cognition.

Acknowledgments

With thanks to David E. Rumelhart. This project was partially supported by NIH award 1R03MH68412-1.

References

Baldi, P. & Hornik, K. (1989). Neural networks and principal components analysis: Learning from examples without local minima. Neural Networks, 2, 53-58.
Caruana, R. (1995). Learning many related tasks at the same time with backpropagation. Advances in Neural Information Processing Systems Vol. 7, pp. 657-664, San Mateo, CA: Morgan Kaufmann.
DeMers, D. & Cottrell, G. (1993). Nonlinear dimensionality reduction. Advances in Neural Information Processing Systems Vol. 5, pp. 580-587, San Mateo, CA: Morgan Kaufmann.
Gluck, M. A. & Myers, C. E. (1993). Hippocampal mediation of stimulus representation: A computational theory. Hippocampus, 3, 491-516.
Intrator, N. & Edelman, S. (1997). Learning low-dimensional representations via the usage of multiple-class labels. Network, 8, 259-281.
Japkowicz, N. (2001). Supervised versus unsupervised binary-learning by feedforward neural networks. Machine Learning, 42, 97-122.
Japkowicz, N., Hanson, S. J., & Gluck, M. A. (2000). Nonlinear Autoassociation is not Equivalent to PCA. Neural Computation, 12, 531-545.
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22-44.
Kurtz, K. J. (1997). The influence of category learning on similarity. Unpublished doctoral dissertation.
Kurtz, K. J. (under review). Abstraction versus selective attention in classification learning.
Kurtz, K. J. & Smith, G. (in preparation). The ORACL model of concept formation and representation.
Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). SUSTAIN: A Network Model of Category Learning. Psychological Review, 111, 309-332.
Luce, R. D. (1963). Detection and recognition. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (pp. 103-189). New York: Wiley.
McClelland, J. L., McNaughton, B. L. & O'Reilly, R. C. (1995). Why There are Complementary Learning Systems in the Hippocampus and Neocortex: Insights from the Successes and Failures of Connectionist Models of Learning and Memory. Psychological Review, 102, 419-457.
McClelland, J. L. & Rumelhart, D. E. (1986). A Distributed Model of Memory. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. II. Applications (pp. 170-215). Cambridge, MA: MIT Press.
McCloskey, M., & Cohen, N. J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 24, pp. 109-165). New York: Academic Press.
Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.
Mirman, D. & Spivey, M. (2001). Retroactive interference in neural networks and in humans: the effect of pattern-based learning. Connection Science, 13(3), 257-275.
Nosofsky, R. M., Gluck, M., Palmeri, T. J., McKinley, S. C., & Glauthier, P. (1994). Comparing models of rule-based classification learning: A replication and extension of Shepard, Hovland, and Jenkins (1961). Memory & Cognition, 22, 352-369.
Nosofsky, R., Kruschke, J., & McKinley, S. (1992). Combining exemplar-based category representations and connectionist learning rules. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 211-233.
Nosofsky, R. M., Palmeri, T. J., & McKinley, S. C. (1994). Rule-plus-exception model of classification learning. Psychological Review, 101, 55-79.
Ratcliff, R. (1990). Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review, 97, 285-308.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 1. Foundations (pp. 318-362). Cambridge, MA: Bradford Books/MIT Press.
Shepard, R. N., Hovland, C. I., & Jenkins, H. M. (1961). Learning and memorization of classifications. Psychological Monographs, 75 (13, Whole No. 517).
Smith, J. D., & Minda, J. P. (1998). Prototypes in the mist: The early epochs of category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 1411-1430.
Smith, J. D. & Minda, J. P. (2000). Thirty categorization results in search of a model. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 3-27.