Professor of Social Psychology
Department of Psychology
University of Southern California
Los Angeles, CA 90089-1061
Phone: (213) 740-2291



Connectionist Models of Social Reasoning and Social Behavior

Stephen J. Read and Lynn C. Miller


University of Southern California




Table of Contents


1. Making Sense of People: Coherence Mechanisms
            Paul Thagard and Ziva Kunda

2. On the Dynamic Construction of Meaning: An Interactive Activation and Competition Model of Social Perception
            Stephen J. Read and Lynn C. Miller

3. The Dynamics of Group Impression Formation: The Tensor Product Model of Exemplar-Based Social Category Learning
            Yoshihisa Kashima, Jodie Woolcock, and Deborah King

4. Person Perception and Stereotyping: Simulation Using Distributed Representations in a Recurrent Connectionist Network
            Eliot R. Smith and Jamie DeCoster

5. A Connectionist Approach to Causal Attribution
            Frank Van Overwalle and Dirk Van Rooy

6. Personality as a Stable Cognitive-Affective Activation Network: Characteristic Patterns of Behavior Variation Emerge from a Stable Personality Structure
            Yuichi Shoda and Walter Mischel

7. The Consonance Model of Dissonance Reduction
            Thomas R. Shultz and Mark R. Lepper

8. Toward an Integration of the Social and the Scientific: Observing, Modeling, and Promoting the Explanatory Coherence of Reasoning
            Michael Ranney and Patricia Schank

9. Toward Computational Social Psychology: Cellular Automata and Neural Network Models of Interpersonal Dynamics
            Andrzej Nowak and Robin R. Vallacher

10. Attitudes, Beliefs and Other Minds: Shared Representations in Self-Organizing Systems
            Richard Eiser, Mark J. A. Claessen, and Jonathan J. Loose








Stephen J. Read and Lynn Carol Miller


        Neural network models, also called connectionist or parallel distributed processing models,
seem to represent a major paradigm shift in cognitive psychology, cognitive science and artificial
intelligence. Such models move us away from the idea of mind as computer, and instead offer the
prospect of brain-style models of the mind, suggesting that models of high-level cognitive
processing can be built from simple neuron-like units. That is, we can build
computational models of the mind composed of units functionally similar to the physical units that
compose a real brain. This approach has led to some fundamental new insights about the way the
mind might work and the way it might interact with the environment.
        Surprisingly, given the importance of these models, until recently social psychologists had
paid little attention to them. Yet, these models directly address several fundamental characteristics
of social perception and social interaction: the simultaneous integration of multiple pieces of
information and the quite short time frame within which such integration occurs. Any mundane act
of social perception (and any resulting behavior) results from the simultaneous integration of
multiple pieces of information, such that the meaning of each piece of information mutually
influences and constrains the meaning of every other piece. Thus, social perception can be viewed
as the satisfaction of multiple, simultaneous, mutually interacting constraints. Moreover, this integration
typically takes place in a very short time frame, much shorter than would be possible for any kind
of reasonable serial integration process. Thus, much of social perception must occur in parallel.
Both of these are central characteristics of neural network models (Rumelhart & McClelland,
        Social psychologists' lack of involvement with these models is surprising for another
reason. As Read, Vanman, and Miller (1997) have recently shown, there are a number of important
parallels between characteristics of these models and the Gestalt principles that formed the
theoretical foundation of much of modern social psychology (Asch, 1946; Festinger, 1950, 1957;
Heider, 1958; Lewin, 1935, 1947a, 1947b).
        However, there has been a recent surge of interest in the application of these kinds of
models to social phenomena. This book brings most of this work together in one place. Doing so
allows the reader to appreciate the breadth of these approaches, as well as the theoretical
commonality of many of these models. Each of the chapters provides an explicit connectionist
model of a central problem in social psychology. Because most of the authors either use a standard
architecture, can provide a computer program for their model, or use a publicly available system
for modeling, the interested reader, with a little work, should be able to implement their own
variation of a model.
        The authors in this volume address central issues in social psychology and show how these
kinds of models shed new light on a number of classic problems. Moreover, many of the chapters
hint that this approach contains the seeds of a theoretical integration that the field has long
lacked.
        Smith and DeCoster, and Kashima, Woolcock, and King outline models of the learning
and application of social categories and stereotypes. Thagard and Kunda, Read and Miller, and
Van Overwalle and Van Rooy describe models of causal reasoning, social explanation and person
perception. Shoda and Mischel present a model of personality and social behavior. Shultz and
Lepper show how a neural network model can capture many of the classic dissonance phenomena,
while Ranney and Schank grapple with belief change and the coherence of large-scale belief
systems. Finally, Nowak and Vallacher, and Eiser, Claessen, and Loose show that these are not
just models of individual cognition, but that they can also capture important aspects of social
influence and group interaction.
                                      Connectionist models
      In the following we present a very brief overview of connectionist models. We briefly
considered including a more extensive tutorial. However, there are a number of good introductions,
some aimed at cognitive psychologists and two recent ones aimed specifically at social psychologists.
Thus, it seemed pointless to repeat what has already been said, in much more detail, elsewhere.
      There is probably still no better introduction to neural network models and their
psychological implications than the two edited volumes by Rumelhart and McClelland and the PDP
research group (1986; McClelland & Rumelhart, 1986). Other good resources for the social
psychologist are Anderson's (1995) recent textbook and Bechtel and Abrahamsen's (1991) book,
which was written as a companion to the PDP volumes. And recently Smith (1996) and Read,
Vanman, and Miller (1997) have specifically focused on the implications of these kinds of models
for the kinds of problems with which social psychologists are concerned. Moreover, Read,
Vanman, and Miller (1997) also extensively discuss the numerous parallels between key aspects of
neural network models and the Gestalt psychological principles that formed the theoretical
foundations of much of modern social psychology.
      Connectionist modeling (e.g., Hertz, Krogh, & Palmer, 1991; McClelland & Rumelhart,
1986; Rumelhart & McClelland, 1986) treats the processing involved in perceptual and cognitive
tasks in terms of the passage of activation, in parallel, among simple, neuron-like units. The most
important components of these models are: (1) simple processing units or nodes, which sum the
incoming activation, following a specified equation, and then send the resulting activation to the
nodes to which they are connected, (2) equations that determine the activation of each node at each
point in time, based on the incoming activation from other nodes, previous activation, and the
decay rate, (3) weighted connections between the nodes, where the weights affect how activation is
spread, and (4) a learning rule which specifies how the weights change in response to experience
(Bechtel & Abrahamsen, 1991). Processing in a connectionist model proceeds solely by the
spread of activation among nodes, where the pattern of connections affects how activation spreads.
There is no higher order executive or control process. Moreover, knowledge in a connectionist
model is represented entirely in the pattern of weights among nodes.
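      To make the four components concrete, here is a minimal sketch in Python with NumPy. It is not taken from any chapter in this volume: the particular activation equation (decayed previous activation plus summed weighted input, clipped to a fixed range), the Hebbian learning rule, and the function names `update_activation` and `hebbian_update` are illustrative assumptions.

```python
import numpy as np

def update_activation(a, W, external, decay=0.1, a_min=-1.0, a_max=1.0):
    """One time step for all nodes (illustrative equation, not a specific model's).

    Each node (component 1) sums the activation arriving over weighted
    connections (component 3) plus any external input; the activation
    equation (component 2) combines this with the node's decayed previous
    activation and clips the result to [a_min, a_max].
    """
    net = W @ a + external             # summed incoming activation
    new_a = (1.0 - decay) * a + net    # previous activation, decay, and input
    return np.clip(new_a, a_min, a_max)


def hebbian_update(W, a, lr=0.05):
    """Component 4 (assumed rule): strengthen connections between co-active nodes."""
    dW = lr * np.outer(a, a)
    np.fill_diagonal(dW, 0.0)          # no self-connections in this sketch
    return W + dW


# Two nodes and no learned weights: activating node 0 does not reach node 1.
W = np.zeros((2, 2))
before = update_activation(np.array([0.5, 0.0]), W, external=np.zeros(2))

# "Experience": both nodes are active together, so the weight linking them grows.
W = hebbian_update(W, np.array([1.0, 1.0]))
after = update_activation(np.array([0.5, 0.0]), W, external=np.zeros(2))
print(before, after)                   # node 1 now receives some activation
```

The only thing that differs between the two calls is the weight matrix, which is the sense in which a connectionist model's knowledge resides entirely in its weights.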
      Although there are a number of differences among potential neural network models, here we
focus on two important differences. One is whether there are feedback relations among the nodes.
In feed forward networks, units have unidirectional connections, with no feedback relations. The
network is organized in layers, with inputs fed into the input layer and outputs generated at the top
layer as a result of a single forward sweep of activation. The simplest such network has two
layers, an input and an output layer, although more complicated networks may have intervening or
"hidden" layers (so called because they have no direct connections to the environment).
Networks with hidden layers, such as the well-known backpropagation network, have greater
computational power. A prototypical example of a feed forward network is the pattern associator,
in which the system learns an arbitrary association between an input represented as a pattern of
activation on the input layer and a pattern represented on the output layer. Such networks can learn
to categorize objects or assign names to objects.
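      A pattern associator can be sketched as a two-layer network whose output is produced by a single forward sweep. In this minimal sketch, the linear output units, the two hypothetical "object" input patterns, the learning rate, and the use of delta rule training (Widrow & Hoff, 1960) are illustrative assumptions rather than a specific published model.

```python
import numpy as np

def forward(W, x):
    """Single forward sweep from the input layer to the output layer (no feedback)."""
    return W @ x


def train(inputs, targets, lr=0.2, epochs=50):
    """Delta rule: change each weight in proportion to the output error."""
    W = np.zeros((targets.shape[1], inputs.shape[1]))
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            error = t - forward(W, x)
            W += lr * np.outer(error, x)
    return W


# Hypothetical feature patterns for two objects, and one output unit per name.
inputs = np.array([[1.0, 0.0, 1.0, 0.0],    # object A
                   [0.0, 1.0, 0.0, 1.0]])   # object B
targets = np.array([[1.0, 0.0],             # name "A"
                    [0.0, 1.0]])            # name "B"

W = train(inputs, targets)
print(np.round(forward(W, inputs[0]), 3))   # close to [1, 0]
```

Because these two input patterns share no active units, learning one association does not disturb the other; with overlapping inputs, learning one association partly interferes with the other during training.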
      By contrast, in interactive or feedback networks, at least some connections are bi-directional,
resulting in feedback relations, and processing occurs dynamically across a large number of cycles.
Nodes in these networks have a minimum and maximum possible activation (typically ranging
from 0 to 1, or from -1 to 1). The activation of the nodes is updated many times as the activation of
the units moves towards asymptote, and as the system works toward settling into a solution to a
particular input. In contrast, in feed forward networks, activation is updated only once.
      Because of the feedback relations, interactive or feedback networks are dynamic systems
whose behavior evolves over time. As a result they have interesting and useful properties that are
not characteristic of feed forward networks. One of the most useful properties of such networks is
that they function as parallel constraint satisfaction systems, acting to satisfy multiple simultaneous
constraints among elements in a network. Most of the networks in this book are feedback
networks and the constraint satisfaction abilities of the networks are central aspects of the models.
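      Settling in a feedback network can be illustrated with the update rule of interactive activation and competition (IAC) networks (McClelland & Rumelhart, 1981): positive net input pushes a node toward its maximum, negative net input pushes it toward its minimum, and decay pulls it back toward a resting level. The sketch below is a minimal example under stated assumptions: two nodes stand for rival interpretations joined by mutually inhibitory links, one receives slightly more external support, and all parameter values are illustrative.

```python
import numpy as np

A_MAX, A_MIN, REST, DECAY = 1.0, -0.2, -0.1, 0.1   # illustrative parameter values


def iac_cycle(a, W, ext):
    """One synchronous update of every node in an interactive network."""
    out = np.maximum(a, 0.0)                # only positively active nodes send activation
    net = W @ out + ext
    delta = np.where(net > 0,
                     net * (A_MAX - a),     # excitation moves a toward A_MAX
                     net * (a - A_MIN))     # inhibition moves a toward A_MIN
    return a + delta - DECAY * (a - REST)   # decay pulls a back toward REST


def settle(W, ext, cycles=200):
    """Start every node at rest and run many update cycles."""
    a = np.full(len(ext), REST)
    for _ in range(cycles):
        a = iac_cycle(a, W, ext)
    return a


# Two rival interpretations that inhibit each other; the first has more support.
W = np.array([[0.0, -0.6],
              [-0.6, 0.0]])
ext = np.array([0.3, 0.2])
a = settle(W, ext)
print(np.round(a, 3))                       # winner near 0.725, rival below zero
```

After settling, the better-supported interpretation is strongly active while its rival is pushed below zero, and swapping the external inputs swaps the outcome. This is parallel constraint satisfaction in its simplest form: the final state is the one that best satisfies both the external evidence and the inhibitory constraint between the rivals.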
      A second important difference among models is whether concepts have a distributed or a
localist representation. In a localist representation, a concept or perhaps an entire proposition is
represented by a single node. In contrast, in a distributed representation a concept is represented by
a pattern of activation over a number of nodes. Although some researchers see distributed
representations as a defining characteristic of connectionism, we share the view of many researchers
that the choice of representation should depend on the question being asked.
      Each of these types of representation has its strengths and weaknesses. We see three
major advantages to a distributed representation. First, such a representation does seem more in
line with the attempt to model the mind using neuron-like units and does seem to fit our intuition
that the representation of a concept should be in terms of the action of large clusters of neurons,
rather than an individual neuron. Second, a distributed representation has the property of graceful
degradation. That is, loss of a small number of neurons has little if any impact on the
representational ability of the model. In contrast, in a localist model loss of a single neuron leads
to the loss of the corresponding concept. Third, during learning a distributed representation
implicitly calculates the degree of similarity among inputs. That is, if the activation vectors
representing different inputs are sufficiently similar, they will tend to receive a common
representation in the network. This underlies the ability of such models to learn prototypes from
related exemplars. In contrast, a localist model has no such ability.
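      Two of these advantages can be checked with a few lines of arithmetic. In this sketch a concept is an arbitrary pattern of +1 and -1 values over 20 units (the random seed and unit counts are illustrative assumptions), and cosine similarity serves as a simple measure of how well a pattern is preserved or how close two patterns are.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


rng = np.random.default_rng(0)
concept = rng.choice([-1.0, 1.0], size=20)     # distributed: a pattern over 20 units

# Graceful degradation: silence 2 of the 20 units.
damaged = concept.copy()
damaged[:2] = 0.0
print(round(cosine(concept, damaged), 3))      # 0.949

# Similarity: a related exemplar that differs on 2 of the 20 units stays close.
related = concept.copy()
related[-2:] *= -1
print(round(cosine(concept, related), 3))      # 0.8

# Localist: a single unit stands for the concept, so losing it loses the concept.
localist = np.zeros(20)
localist[3] = 1.0
localist_damaged = localist.copy()
localist_damaged[3] = 0.0
print(localist_damaged.any())                  # False
```

Silencing 2 of 20 units leaves a cosine similarity of sqrt(18/20), about 0.949, whichever pattern is drawn, whereas in the localist case the same kind of damage leaves nothing of the concept.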
      However, localist models have their own strengths, which are the flip side of some of the
weaknesses of distributed models. First, localist models are often much more interpretable, as
each concept corresponds to a single node. In contrast, in a distributed representation, because
each concept is represented by a pattern of activation over a large number of nodes, it can often be
quite difficult to interpret the behavior of such models. Second, localist models are often much
more computationally tractable. Consider a simple model with 20 concepts. In a localist model,
this will take only 20 units and a 20 x 20 weight matrix (400 weights). In contrast, assume we had a
distributed representation in which each concept was represented by 20 elements. In this model we
need 400 units and a 400 x 400 weight matrix (160,000 weights). The distributed model has 400 times
as many weights. And
the problem only gets more serious as the model gets bigger.
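      The weight counts in this example can be reproduced directly. The sketch assumes a fully connected network with a full N x N weight matrix, as in the text; the helper name `weight_count` is hypothetical.

```python
def weight_count(n_concepts, units_per_concept):
    """Units and weights in a fully connected network with an N x N weight matrix."""
    n_units = n_concepts * units_per_concept
    return n_units, n_units ** 2


print(weight_count(20, 1))    # (20, 400): localist
print(weight_count(20, 20))   # (400, 160000): distributed
print(weight_count(100, 1))   # (100, 10000)
print(weight_count(100, 20))  # (2000, 4000000)
```

With 20 units per concept the distributed model always needs 20 x 20 = 400 times as many weights, but the absolute gap grows quickly: 10,000 versus 4,000,000 weights for 100 concepts.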
      One further consideration bears on whether one should use a localist or a distributed
representation. Assume that one is developing a model of high level cognition, such as a model of
analogical reasoning, explanation, or cognitive consistency. In these kinds of models, one is
typically interested in relationships among concepts, such as causal or implicational relationships.
And frequently the key theoretical mechanism is the parallel satisfaction of mutual constraints
among concepts. The relations among concepts, rather than the representation of individual concepts,
are what matter. In such cases, it seems likely that the pattern of activation of an ensemble of neurons can
be treated as if it were a single node, with little or no loss of theoretical power. In that case a
distributed representation would have no advantages and many costs.
      Thus, one's choice of representation, we argue, should be a function of one's question. If
graceful degradation is important or if one is looking at questions of concept learning or
categorization, where sensitivity to similarity is central, then a distributed representation would
seem essential. However, in cases where the special strengths of distributed representations are
unnecessary, then the relative conceptual and computational simplicity of localist models would
seem more desirable.
      The various chapters in this book represent some of the conditions under which each kind of
representation would seem most appropriate. For instance, several authors such as Smith and
DeCoster, and Kashima, Woolcock, and King are explicitly interested in models of category
learning, and Read and Miller are interested in how the components of trait concepts are learned.
Here distributed representations would seem critical. However, other chapters, such as Shultz and
Lepper's chapter on dissonance, Shoda and Mischel's model of personality-behavior relationships,
or Thagard and Kunda's chapter on the role of coherence, are primarily interested in the
implications of processing in recurrent networks, specifically the fact that such networks function
as systems for the parallel satisfaction of multiple simultaneous constraints. In these chapters,
distributed representations would have provided no additional insights and would have
tremendously complicated the models.
                                        Overview of the book
      We considered two possible ways to conceptually group the current chapters: (1) in terms of
the underlying neural network architecture that is used, or (2) in terms of the specific topic being
investigated. Our ultimate choice was the latter, on the assumption that most readers would be
primarily interested in the specific topic and how the different investigators approached it.
However, in the following descriptions of the chapters we have briefly noted the kind of model
that was used. It is an interesting side note that 8 of the 10 chapters use a recurrent or feedback
architecture, while only three use a feedforward architecture (Eiser, Claessen, and Loose explore
both kinds of architectures). An overview of each of the 10 chapters follows.
      Thagard and Kunda argue that coherence mechanisms play a central role in three different
processes by which people make sense of other people's behavior, namely how we (1) integrate a number
of concepts, such as traits, to form an impression of another, (2) arrive at an attribution or
explanation of someone's behavior, and (3) use analogies to familiar others to make sense of
someone's behavior. Not surprisingly to anyone familiar with their work, Thagard and Kunda
argue that coherence mechanisms can be treated as constraint satisfaction problems that can be
captured by recurrent or feedback connectionist networks.
      They then review their work in each of these three areas. First, they describe their recent
model of impression formation (Kunda & Thagard, 1996) and how it can capture such phenomena
as shifts in meaning of concepts during impression formation and the development of new or
emergent concepts from combinations of other concepts. Second, they discuss Thagard's (1989,
1992) model of explanatory coherence and its implications for the understanding of social
explanation (see also Miller & Read, 1991; Read & Marcus-Newhall, 1993; Read & Miller,
1993). Third, they describe Holyoak and Thagard's (1989, 1995) work on constraint satisfaction
models of analogical reasoning and analog retrieval and they discuss its possible application to a
number of phenomena in social perception, such as social comparison, using the self as a model to
understand others, and using parents and friends to understand new acquaintances. As part of
their discussion they demonstrate how each of these somewhat different phenomena can be treated
in terms of the same underlying principle, as a coherence mechanism, operationalized as a
constraint satisfaction process. In line with this conclusion, they also discuss the likelihood that
these three types of coherence mechanisms are integrated when we actually try to make sense of
behavior in social interaction. Finally, following a major focus in social cognition, they also
examine the extent to which each of these different processes is automatic or controlled.
      Read and Miller present an interactive activation and competition (IAC) model of social
perception, based on work by McClelland and Rumelhart (1981; McClelland & Elman, 1986;
Rumelhart & McClelland, 1982) on word recognition and speech perception. This model is a
feedback or recurrent network, with the nodes organized into multiple layers, where each layer
does a different kind of processing and sends the results to higher levels. One interesting aspect of
this kind of model is that not only do lower levels, such as feature analysis, send activation to
higher levels, but higher levels can also affect lower levels. For instance, a highly activated trait
node can send activation back to the feature nodes that compose the original behavior,
disambiguating unclear or ambiguous inputs.
      Read and Miller propose a four-level network, with each level sending activation to the level
above, and in turn receiving activation from the higher level. The nodes in such a model can be
treated as hypotheses about the presence or absence of the corresponding concept, with alternative
construals or hypotheses having inhibitory links and consistent or supportive hypotheses having
excitatory links. The first level in their model is the Feature level, composed of nodes sensitive to
the features of human beings, objects, and behavior. Activation from this level then goes to an
Identification level, where the individual features are used to identify social actors, objects, and
behaviors. Actors, objects, and behaviors identified at this level are then assembled into a coherent
representation of the social action at the third level, the Story or Scenario level.
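      The top-down influence described above can be illustrated with a toy interactive network. This is not Read and Miller's model: the four hypothetical nodes, their weights, and the IAC-style update rule with illustrative parameters are assumptions chosen only to show how a higher-level node can disambiguate a lower-level input. Two rival readings of an ambiguous feature receive identical bottom-up input, and each reading is linked bidirectionally to a trait hypothesis.

```python
import numpy as np

A_MAX, A_MIN, REST, DECAY = 1.0, -0.2, -0.1, 0.1   # illustrative parameter values


def iac_cycle(a, W, ext):
    out = np.maximum(a, 0.0)                # only positive activation is transmitted
    net = W @ out + ext
    delta = np.where(net > 0, net * (A_MAX - a), net * (a - A_MIN))
    return a + delta - DECAY * (a - REST)


def settle(W, ext, cycles=200):
    a = np.full(len(ext), REST)
    for _ in range(cycles):
        a = iac_cycle(a, W, ext)
    return a


# Hypothetical nodes: two rival readings of an ambiguous feature (lower level)
# and two trait hypotheses (higher level), each consistent with one reading.
FA, FB, TA, TB = range(4)
W = np.zeros((4, 4))
W[FA, FB] = W[FB, FA] = -0.6     # rival readings inhibit each other
W[TA, TB] = W[TB, TA] = -0.6     # rival trait hypotheses inhibit each other
W[FA, TA] = W[TA, FA] = 0.5      # bottom-up and top-down excitatory links
W[FB, TB] = W[TB, FB] = 0.5

ambiguous = np.array([0.2, 0.2, 0.0, 0.0])   # equal bottom-up support for both readings
support_a = np.array([0.0, 0.0, 0.1, 0.0])   # slight prior support for trait A
support_b = np.array([0.0, 0.0, 0.0, 0.1])   # slight prior support for trait B

ctx_a = settle(W, ambiguous + support_a)
ctx_b = settle(W, ambiguous + support_b)
print(np.round(ctx_a, 3), np.round(ctx_b, 3))
```

The bottom-up input is identical in both runs; only the support given to a higher-level trait node differs, and activation fed back down from that node decides which reading of the feature wins.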
      A central aspect of Read and Miller's model is the proposal that social concepts at this level
are represented in terms of plot units or frame-based structures, with a case-role structure, where
each action centers around a verb or action unit that identifies the various roles, such as actor,
patient, and instrument that participate in that action. For instance, they argue that many traits are
composed of underlying story structures.
      Finally, information from the Story level is used to arrive at the meaning of the interaction at
the Conceptual or Meaning level. For example, the instantiated story structure may be used to
access various trait characterizations for a social actor.
      This model naturally implements various principles of Explanatory Coherence (Thagard,
1989, 1992) that have been shown to play a central role in social reasoning (Ranney & Schank,
this volume; Read & Marcus-Newhall, 1993; Read & Miller, 1993), as well as capturing the
impact of a limited capacity working memory. Read and Miller also discuss some of the
implications of such feedback or attractor models for both learning of social concepts and the
combination of old concepts to form novel ones. They note that during learning such models
perform a componential analysis of concepts. For example, readers can learn subcomponents of
words or social perceivers can learn subcomponents of traits, such as goals, plans, and beliefs. As
a result, such a model can capture the acquisition of primitive concepts during learning. Moreover,
they discuss how such models can take advantage of such a componential analysis to combine
previously learned concepts to form novel concepts. This focus on conceptual combination is
also shared by Thagard and Kunda, and by Smith and DeCoster.
      Finally, Read and Miller apply their model to two major topics in social perception. First,
they discuss how it provides an explicit process model of spontaneous trait inferences, capturing
the inferential processing involved in going from the features of the social interaction to the final
trait inference. Second, they show how their model can provide an account of Trope's (1986)
two-stage model of dispositional inference, and in particular how it can capture the impact of higher
level concepts on the identification of social actions.
      Kashima, Woolcock, and King use an architecture that is fairly novel in this literature,
the tensor product model. However, the central issues they address, the representation of social
categories and stereotypes, overlap with those of Smith and DeCoster.
      Kashima et al. note that since little work has been done specifying the details of the
representation of social groups, they intend their model as a step toward addressing that issue.
Further, they note that the little work that has been done has taken two divergent paths: one looking
at how impressions of groups are formed, and the other at how individuals are classified into social
groupings, that is, how social categories are represented. The aim of their chapter is to present a
model that can explain the findings in both of these areas.
      They first present a mechanism for how memories are initially encoded and then examine
how those memories can be used for judgment and memory retrieval. Their model uses a
distributed representation in which a given feature is represented as a pattern of activation over a set
of nodes. One unique characteristic of the model is that it provides a mechanism for the
representation of attribute-value pairs (or what Kashima et al. call aspects and features), such as
skin-color: black, or eye-color: blue. For example, assume we have an individual, John, with an
attribute, skin-color, that has the value black. Representing this notion of an attribute that applies
to an object and has a particular value is difficult in standard connectionist models that use
distributed representations. A typical connectionist model with a distributed representation
would simply associate the individual John directly with black skin, because the standard
representation is a two-dimensional weight matrix that gives the association between two vectors.
There is no easy way to represent the idea of an attribute that can take on multiple values. Thus,
one could not easily ask the model, "What is John's skin color?"
      Let us see how this works. Assume that we have two features, each represented by a vector,
a and b. Multiplying the two vectors together (taking the outer product) gives a matrix, where the
elements in the matrix represent the degree of association between each element in a and each
element in b. The tensor formulation is a generalization of this to the association among n vectors.
Thus, if we had a third vector c, we would multiply a, b, and c and end up with a three-dimensional
array that represents all the associations among all the elements in each of the three vectors. In
this representation, one vector can represent John, a second vector can represent the attribute skin
color, and the third vector can represent the value, black. The resulting three-dimensional array
represents the association among John, skin color, and black. Once one has this array, one could
do the equivalent of asking for John's skin color by taking the two-dimensional matrix representing
the association between John and skin color and applying it to the three-dimensional array with
appropriate mathematical manipulations to retrieve the third vector, representing black.
      In this model different memory traces are superimposed on each other by simply adding
together the tensor products for different memories. Thus, one ends up with one array in which
are superimposed a large number of memories.
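      Binding, superposition, and retrieval in a tensor product representation can be sketched with NumPy. The vectors for John, Mary, the attributes, and the values are hypothetical distributed codes (scaled rows of a Hadamard matrix, so they are orthonormal), and the helper names `bind` and `retrieve` are illustrative; Kashima et al.'s own encoding and retrieval procedure may differ in detail.

```python
import numpy as np

def bind(*vectors):
    """Tensor (generalized outer) product of any number of vectors."""
    result = vectors[0]
    for v in vectors[1:]:
        result = np.multiply.outer(result, v)
    return result


def retrieve(memory, person, attribute):
    """Ask 'what is this person's value on this attribute?' by applying the
    2-D person x attribute matrix to the 3-D memory array."""
    probe = np.outer(person, attribute)
    return np.tensordot(memory, probe, axes=([0, 1], [0, 1]))


# Hypothetical orthonormal codes. Each role (person, attribute, value) has its
# own tensor dimension, so the same code can be reused across roles.
H = np.array([[1, 1, 1, 1],
              [1, -1, 1, -1],
              [1, 1, -1, -1],
              [1, -1, -1, 1]]) / 2.0
john, mary = H[0], H[1]       # individuals
skin, eyes = H[2], H[3]       # attributes ("aspects")
black, blue = H[0], H[1]      # values ("features")

# Two memory traces superimposed by simple addition into one 4 x 4 x 4 array.
memory = bind(john, skin, black) + bind(mary, eyes, blue)

print(np.round(retrieve(memory, john, skin), 3))   # recovers the code for black
print(np.round(retrieve(memory, mary, eyes), 3))   # recovers the code for blue
```

Retrieval here is exact because the codes are orthonormal. With correlated codes the superimposed traces interfere, and the retrieved vector becomes a blend of stored values weighted by how similar the probe is to each stored trace.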
      Kashima et al. then apply their model to several phenomena. First, they demonstrate how
characteristics of the group can be used to retrieve a category or group label. As is true of other
models, such as Smith and DeCoster's, provision of a partial pattern of cues enables the retrieval
of the entire pattern, although the mechanism by which this happens is somewhat different from
that in Smith and DeCoster's model.
      Second, they show how this model can simulate the use of both exemplars and prototypes in
classification. As part of this demonstration, they show analytically how the Tensor Product
model is consistent with various Context Model theories of classification, first proposed by Medin
and Schaffer (1978) and extended by Nosofsky (1984, 1986). These are exemplar based models
which argue that classification of items into a category is based on similarity to exemplars that
make up the category. Further, they demonstrate that their model can simulate results of
experiments by Smith and Zarate (1990) supporting a mixture model of classification that seem to
show that subjects can use both prototypes and exemplars to classify new items, depending upon
the experimental conditions.
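      The exemplar-based idea behind the Context Model can be sketched as follows: the probability of assigning a probe to a category is proportional to its summed similarity to that category's stored exemplars, with similarity falling off exponentially with distance, as in Nosofsky's generalized version of the model. This minimal sketch assumes Euclidean distance and omits the attention weights and response biases of the full model; the exemplars, the probe, the sensitivity parameter `c`, and the function name `exemplar_probs` are illustrative.

```python
import numpy as np

def exemplar_probs(probe, categories, c=2.0):
    """Choice probability for each category, proportional to the probe's summed
    similarity to that category's stored exemplars (similarity = exp(-c * distance))."""
    probe = np.asarray(probe, dtype=float)
    summed = np.array([
        np.exp(-c * np.linalg.norm(np.asarray(exemplars, dtype=float) - probe, axis=1)).sum()
        for exemplars in categories
    ])
    return summed / summed.sum()


# Hypothetical exemplars of two groups, each described on two attribute dimensions.
group_a = [[0, 0], [0, 1]]
group_b = [[3, 3], [3, 4]]

p = exemplar_probs([0, 0.5], [group_a, group_b])
print(np.round(p, 3))          # the probe is assigned almost surely to group A
```

No prototype is stored anywhere in this sketch; category membership is computed at judgment time from the individual exemplars.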
      Third, they show how this model can simulate judgments or impressions of a group.
Essentially, they provide a vector representation of the high and low end points of a judgment scale
and then calculate the similarity of that vector to the representation of the group. In doing this, they
note that judgments of groups seem to fit a weighted averaging model and they show how their
model can successfully simulate this. Fourth, they show how the Tensor Product Model can
handle Hamilton and Gifford's (1976) work on the distinctiveness-based illusory correlation.
      In concluding, they argue that their model has the advantage of capturing both classification
and judgment in the same model, and that it is consistent both with major models of classification,
such as Nosofsky's Generalized Context Model (GCM), and with major findings in judgment, such as
weighted averaging.
      Smith and DeCoster apply a recurrent connectionist network, specifically an
autoassociative network developed by McClelland and Rumelhart (1986), to key findings in person
perception and stereotyping. In an autoassociative model, each unit is linked to every other unit
and receives activation from all other units, as well as receiving external input. They use a
distributed representation in which a pattern of activation across a set of units represents a concept,
rather than having a single node correspond to a single concept. Such a model can perform pattern
learning, completion of incomplete patterns, and memory reconstruction or schematic processing.
      Learning in their model is implemented by the delta rule (Widrow & Hoff, 1960), which uses
the difference between the activation of nodes due to internal inputs from the network and the
activation due to external inputs to adjust the weights. The aim of this procedure is to modify the
weights so that the activation of each node from all its internal connections approximates the
activation of each node from external or stimulus input. Essentially, the network is learning the
pattern of external inputs. One result of this is that the network will learn to reinstantiate the
complete pattern from partial input.
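      The learning procedure just described can be sketched as a linear autoassociator. This is a simplified illustration, not Smith and DeCoster's simulation: it assumes linear units, a single pass of activation at recall, self-connections left in place, and a hypothetical eight-feature exemplar; the function name `train_autoassociator` is illustrative.

```python
import numpy as np

def train_autoassociator(patterns, lr=0.1, epochs=50):
    """Delta rule: adjust weights until each node's internal input (arriving over
    the network's own connections) matches its external input."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for _ in range(epochs):
        for p in patterns:
            internal = W @ p                  # activation from internal connections
            error = p - internal              # external minus internal input
            W += lr * np.outer(error, p)
    return W


# A hypothetical exemplar described by eight +1/-1 features.
pattern = np.array([[1, -1, 1, 1, -1, -1, 1, -1]], dtype=float)
W = train_autoassociator(pattern)

cue = pattern[0].copy()
cue[4:] = 0.0                                 # partial cue: last four features unknown
completed = np.sign(W @ cue)                  # one pass through the learned weights
print(completed)                              # matches the full stored pattern
```

Training drives the internal input for the stored pattern toward the pattern itself, so given only the first half of the features, one pass through the learned weights restores the signs of all eight.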
      Smith and DeCoster show how their model handles four phenomena. First, it can learn
characteristics of individual exemplars or cases and then retrieve those characteristics from a partial
cue. Second, it can learn a group stereotype or category from multiple exemplars and then given
partial cues it can retrieve or reconstruct the prototype or stereotype. As Smith and DeCoster note,
this demonstrates that a single mechanism and a single representational format can account for
these two seemingly different phenomena. This is in contrast to most models in social cognition
that assume very different representational forms for exemplars and prototypes. Third, the model
can learn multiple knowledge structures in the same network and then create novel or emergent
structures by combining the existing structures to form a new structure. This provides a
mechanism for the development of novel or emergent concepts. Classic schema models seem to
lack a mechanism for combining old concepts to create novel ones. (Also see Read & Miller, this
volume; Thagard & Kunda, this volume). Finally, they show that several aspects of construct
accessibility can be captured by such a model, specifically demonstrating that both recency and
frequency of activation of a concept increase its impact on future inferences. In addition they
show that spaced patterns will have a greater impact than patterns that are massed. They do this by
demonstrating that a partial pattern does a better job of reinstantiating a complete pattern when the
original pattern has been recently and/or frequently presented, or presented in a spaced fashion.
      Smith and DeCoster note that they are able to handle each of these with the same mechanism,
although typical work in social cognition proposes a separate model for each. Following work by
Rumelhart, Smolensky, et al. (1986) they also observe that such a model can produce what looks
like schemas and schematic processing despite the lack of any schematic structures. (Also see Read
and Miller, this volume.)
      Van Overwalle and Van Rooy investigate how a simple two-layer feedforward network
using delta rule learning, a pattern associator, can simulate several interesting findings from the
literature on causal learning. Their work extends earlier work by others, such as Gluck and Bower
(1988a, 1988b) and Shanks (1991, 1993), which has demonstrated that the classic
Rescorla-Wagner model of animal learning is formally identical to a two-layer (lacking hidden units)
feedforward network that uses delta rule learning to learn new associations.
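The claimed equivalence can be seen by writing the two update rules side by side. The notation is a standard textbook rendering chosen here for illustration, not taken from the chapter: V_X is the associative strength of cue X, α_X and β are salience and learning-rate parameters, λ is the outcome; in the network, w_i is the weight from input i, x_i ∈ {0, 1} codes whether cue i is present, t is the target output, and η is the learning rate of the Widrow and Hoff (1960) delta rule.

```latex
% Rescorla-Wagner: every cue X present on a trial changes by
\Delta V_X = \alpha_X \beta \Bigl( \lambda - \sum_{k \in \mathrm{present}} V_k \Bigr)

% Delta rule for a two-layer network with binary cue inputs
\Delta w_i = \eta \Bigl( t - \sum_j w_j x_j \Bigr) x_i

% Setting w_i = V_i, t = \lambda, \eta = \alpha \beta (equal salience assumed):
% absent cues (x_i = 0) do not change, and for present cues (x_i = 1)
% \sum_j w_j x_j = \sum_{k \in \mathrm{present}} V_k, so the rules coincide.
```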
      They also compare this kind of model with statistical models, such as Cheng and Novick's
(1990) probabilistic contrast model, and show that the connectionist model is sensitive to factors
that the probabilistic contrast model is not. The basic difference between the two is that the
probabilistic contrast model is sensitive only to the relative frequencies with which different kinds
of events are paired, whereas the connectionist model is also sensitive to the absolute frequency of
presentation. For instance, according to the probabilistic contrast model, a case with one instance
of the effect given the cause and no instance of the effect given the absence of the cause should be
equivalent to a case with five instances of the effect given the cause and no instances of the effect
given the absence of the cause, because in both cases the difference between the two conditional
probabilities is 1.0. In contrast, the connectionist model is sensitive to the absolute frequency with
which cause and effect are paired, and the authors provide evidence that humans show the same
sensitivity.
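A minimal sketch of this contrast follows. It assumes each cause-present trial (effect occurs) is matched by one cause-absent trial (no effect), and that a context cue is present on every trial, as is common in associative treatments; the learning rate of 0.2 and the function names are illustrative assumptions, not the authors' simulation.

```python
def delta_p(effect_with_cause, n_cause, effect_without_cause, n_no_cause):
    """Probabilistic contrast: P(effect | cause) - P(effect | no cause)."""
    return effect_with_cause / n_cause - effect_without_cause / n_no_cause

def delta_rule_strength(n_pairs, rate=0.2):
    """Weight from the cause after n_pairs of (cause+effect, no cause/no effect)
    trials, learned with the delta rule; a context cue is always on."""
    w_cause, w_context = 0.0, 0.0
    for _ in range(n_pairs):
        # Cause present with context: the effect occurs (target 1).
        error = 1.0 - (w_cause + w_context)
        w_cause += rate * error
        w_context += rate * error
        # Context alone: the effect does not occur (target 0).
        error = 0.0 - w_context
        w_context += rate * error
    return w_cause

print(delta_p(1, 1, 0, 1), delta_p(5, 5, 0, 5))          # both 1.0
print(delta_rule_strength(1), delta_rule_strength(5))    # grows with pairings
```

Both cases yield the same contrast of 1.0, while the learned causal weight is larger after five pairings than after one.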
      In addition, following other work (e.g., Vallée-Tourangeau, Baker, & Mercier, 1994), they
investigate the parallels between effects in the associative learning literature known as blocking and
conditioned inhibition and the well-known phenomena of discounting and augmenting in the
attribution literature. As part of this work they show that in human beings the strength of
discounting and augmenting is sensitive to the frequency of instances, which is consistent with the
predictions of the associative model but not with the original version of the probabilistic contrast
model.
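The two associative effects named above can both be reproduced with the same delta rule. In this sketch, which uses hypothetical cues "A" and "B", an arbitrary learning rate of 0.3, and arbitrary trial counts, blocking means that a cue added to an already-predictive cue gains little strength (loosely parallel to discounting), and conditioned inhibition means that a cue paired with the absence of an expected effect acquires a negative weight.

```python
def train(trials, weights=None, rate=0.3):
    """Delta-rule learning over trials of (set of present cues, outcome).
    Every present cue is updated by the same shared prediction error."""
    weights = dict(weights or {})
    for cues, outcome in trials:
        error = outcome - sum(weights.get(c, 0.0) for c in cues)
        for c in cues:
            weights[c] = weights.get(c, 0.0) + rate * error
    return weights

# Blocking: A alone predicts the effect first; then A and B occur together.
pretrained = train([({"A"}, 1.0)] * 10)
blocked = train([({"A", "B"}, 1.0)] * 10, pretrained)
# Control: A and B together from the start.
control = train([({"A", "B"}, 1.0)] * 10)

# Conditioned inhibition: A alone -> effect, A with B -> no effect.
inhibition = train([({"A"}, 1.0), ({"A", "B"}, 0.0)] * 50)

print(round(blocked["B"], 3), round(control["B"], 3), round(inhibition["B"], 3))
```

B ends with little strength after blocking, about half the outcome in the control, and a negative weight in the inhibition schedule.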
      Finally, they examine the learning of multiple causes and test the ability of various
connectionist models to simulate human responses. They compare the two-layer feedforward
network with Pearce's (1994) configural cue model and with a standard three-layer
backpropagation network with hidden units. Pearce's model was explicitly developed to handle
configurations of cues by assigning a single node to each configuration, whereas the
backpropagation network should be able, at least in theory, to learn hidden units that represent a
configuration of cues. The authors find that Pearce's configural cue model does the best job of
simulating results from human subjects.
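Why a node for the configuration matters can be seen in a negative patterning problem (A produces the effect, B produces the effect, A and B together do not). This sketch simply adds a hypothetical "AB" input unit that is on only when both cues are present and trains it with the delta rule. That is an assumption made for brevity: it is not Pearce's (1994) similarity-based learning rule, and the epoch count and learning rate are arbitrary.

```python
def train(trials, epochs=2000, rate=0.1):
    """Online delta-rule training of a single output unit."""
    w = [0.0] * len(trials[0][0])
    for _ in range(epochs):
        for x, target in trials:
            error = target - sum(wi * xi for wi, xi in zip(w, x))
            w = [wi + rate * error * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Elemental coding: inputs are just [A, B].
ELEMENTAL = [([1, 0], 1.0), ([0, 1], 1.0), ([1, 1], 0.0)]
# Same trials plus a hypothetical configural unit [A, B, AB].
CONFIGURAL = [([1, 0, 0], 1.0), ([0, 1, 0], 1.0), ([1, 1, 1], 0.0)]

w_el = train(ELEMENTAL)
w_cf = train(CONFIGURAL)
print(predict(w_el, [1, 0]), predict(w_el, [1, 1]))        # compound wrongly higher
print(predict(w_cf, [1, 0, 0]), predict(w_cf, [1, 1, 1]))  # near 1 and near 0
```

The elemental network cannot solve the problem and predicts a stronger effect for the compound than for A alone, whereas the added configural unit learns a negative weight that cancels the compound.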
      Shoda and Mischel use an autoassociative, recurrent network to tackle a recent
controversy in personality: the apparent paradox between expectations of stable individual
differences in patterns of personality and the actually obtained, relatively low, cross-situational
consistency in behavior. Their answer to this apparent paradox has been twofold. First, in an
extensive body of research they and their colleagues have demonstrated that individuals are
characterized by stable situation-behavior, if-then relationships. That is, while people may not show
general cross-situational consistency in behavior, they do show characteristic responses to different
situations. For example, two people may be highly aggressive, but in response to different
situations. One may be aggressive when dealing with those who try to dominate them and the
other when someone is weaker than they are. Thus, we cannot ignore situations in conceptualizing
personality, but must deal with the individual's characteristic response to situations.
      Second, they have used an autoassociative, recurrent network to investigate whether stable
patterns of relationships among the "cognitive-affective" units they postulate can produce variable
patterns of behavior in response to differing situations. The different kinds of units they use are:
encodings (categories), expectancies and beliefs, affective responses, goals and values, and
competencies and self-regulatory plans. In their typical implementation, a set of feature detectors is
activated by a situation, and activation from these feature detectors then flows to the
cognitive-affective units. The pattern of activation of the cognitive-affective units then activates the
behavior node. In their simulations each individual has a stable pattern of relationships among the
various cognitive-affective units, although the pattern differs across individuals. Thus, one can
view each individual as having a stable "personality."
      They demonstrate that each individual model shows a consistent pattern of relationships
between the situations and the behavior, although the nature of the pattern differs for different
individuals. Thus, each individual has a characteristic set of stable, if-then situation-behavior
relationships. Interestingly, however, the situation-behavior relationships are not completely
stable: the impact of the same situation may differ depending upon the recent activation history of
the network, or what one may think of as the immediately preceding mental state of the individual.
Finally, they provide a real-world example of the application of the model to health protective
behavior, specifically breast self-examination.
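The architecture and both findings can be illustrated with a toy sketch. Everything specific here is a hypothetical assumption rather than Shoda and Mischel's implementation: two situation features ("being dominated", "facing someone weaker") feed three cognitive-affective units (feels threatened, feels superior, anger), behavior is read from the anger unit, each "person" is a fixed set of weights, and units move only halfway toward their new value on each update so that the prior state lingers.

```python
import math

DOMINATED = [1.0, 0.0]  # hypothetical feature: someone tries to dominate
WEAKER = [0.0, 1.0]     # hypothetical feature: the other person is weaker

def make_person(threat_to_anger, superiority_to_anger):
    """A stable 'personality': fixed weights among three hypothetical units
    (0 = feels threatened, 1 = feels superior, 2 = anger)."""
    w_in = [[1.0, 0.0],   # situation features -> threatened
            [0.0, 1.0],   # situation features -> superior
            [0.0, 0.0]]   # features do not reach anger directly
    w_rec = [[0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0],
             [threat_to_anger, superiority_to_anger, 0.0]]
    return w_in, w_rec

PERSON_A = make_person(1.0, 0.0)  # angered by threat
PERSON_B = make_person(0.0, 1.0)  # angered when feeling superior

def step(person, situation, state, rate=0.5):
    """Leaky update: each unit moves part-way toward tanh(net input)."""
    w_in, w_rec = person
    new_state = []
    for i in range(len(state)):
        net = sum(w * s for w, s in zip(w_in[i], situation))
        net += sum(w * a for w, a in zip(w_rec[i], state))
        new_state.append((1 - rate) * state[i] + rate * math.tanh(net))
    return new_state

def respond(person, situation, state=None, steps=4):
    """Run a few updates; return (aggression, final state)."""
    state = [0.0, 0.0, 0.0] if state is None else list(state)
    for _ in range(steps):
        state = step(person, situation, state)
    return state[2], state

a_dom, after_dom = respond(PERSON_A, DOMINATED)
print("A:", a_dom, respond(PERSON_A, WEAKER)[0])
print("B:", respond(PERSON_B, DOMINATED)[0], respond(PERSON_B, WEAKER)[0])
# Same situation, different recent history:
print("A, weaker, fresh vs after being dominated:",
      respond(PERSON_A, WEAKER)[0], respond(PERSON_A, WEAKER, after_dom)[0])
```

Each person shows a stable if-then profile that mirrors the other's, and person A's response to the "weaker" situation depends on whether the network was just in the "dominated" state.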
      Shultz and Lepper follow up on some of their earlier work published in Psychological Review
and use a variant of a Hopfield-type network (one type of single-layer autoassociative or
recurrent network) to successfully simulate the results of a number of different paradigms in the
dissonance literature (e.g., insufficient justification via initiation, insufficient justification via
prohibition, and free choice among alternatives). In some cases their simulation fits the data better
than does the original dissonance formulation, and in one case their simulation leads to a novel
prediction which they have experimentally verified. Unfortunately, consistent with the fragility of
the work on selective exposure, they were much less successful in capturing the results in this
paradigm. As they note, their ability to simulate the results of most of the major paradigms argues
that such parallel constraint satisfaction models may provide the basis for theoretical unification
within this field.
      There are several particularly interesting aspects of their model. First, they are able to use
ideas derived from Hopfield's (1982, 1984) notion of the energy of a system to provide a
quantitative measure of the overall consonance of the system of beliefs, as well as a measure of the
contribution of each belief to the consonance of the system. This was not possible in previous
conceptualizations of dissonance. Second, they include the importance of each cognition as a
parameter in their model. This allows one to explicitly simulate how dissonance reduction is
affected by the degree of importance and amount of support of individual cognitions. Third, they
represent each cognition by two negatively linked nodes, where each node can be treated as
representing one pole of the cognition. Thus, an attitude toward an activity is represented by the
summed activation of both a positive and a negative node. Although the negative link will tend to
ensure that only one of the two nodes is activated, in some cases both could be simultaneously
activated, indicating ambivalence.
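The consonance measure and the two-pole representation can be sketched as follows. The three-unit network, its weights, the clamping of the behavior unit, and the number of updates are hypothetical. The update rule (move toward a ceiling of 1 when net input is positive and toward a floor of 0 when it is negative) is an assumption in the spirit of Shultz and Lepper's model; the sketch omits their importance parameter.

```python
# Unit 0: "I wrote a counterattitudinal essay" (clamped as a fact).
# Units 1 and 2: positive and negative poles of the attitude toward the
# essay's position, linked negatively to each other (hypothetical weights).
W = [[0.0, 0.5, -0.5],
     [0.5, 0.0, -0.5],
     [-0.5, -0.5, 0.0]]

def consonance(weights, acts):
    """Sum of w_ij * a_i * a_j over all unit pairs (each pair once)."""
    n = len(acts)
    return sum(weights[i][j] * acts[i] * acts[j]
               for i in range(n) for j in range(i + 1, n))

def contribution(weights, acts, i):
    """Consonance contributed by the links that touch unit i."""
    return sum(weights[i][j] * acts[i] * acts[j]
               for j in range(len(acts)) if j != i)

def update(weights, acts, clamped):
    new = list(acts)
    for i in range(len(acts)):
        if i in clamped:
            continue
        net = sum(weights[i][j] * acts[j] for j in range(len(acts)) if j != i)
        if net > 0:
            new[i] = acts[i] + net * (1.0 - acts[i])  # toward ceiling 1
        else:
            new[i] = acts[i] + net * acts[i]          # toward floor 0
    return new

start = [1.0, 0.2, 0.8]  # wrote the essay, but mostly opposes its position
acts = start
for _ in range(20):
    acts = update(W, acts, clamped={0})

print(consonance(W, start), contribution(W, start, 2))
print(acts, consonance(W, acts))
```

Settling shifts activation from the negative to the positive pole, which raises consonance; the negative pole is initially the largest source of dissonance.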
      In addition to simulating the results of the major paradigms, they also examine how their
model fares with other recent research. For instance, researchers such as Cooper, Zanna, and
Taves (1978) have directly looked at the impact of arousal on attitude change. They have shown
that when students write a counterattitudinal essay under high choice, they show the greatest
dissonance effect when given a stimulant and the smallest effect when given a tranquilizer. Shultz
and Lepper show that their model can simulate the impact of arousal and they include an interesting
speculation about the relationship between the role of activation in their model and the impact of
stimulants and tranquilizers on cortical arousal. They also successfully address the role of the
self-concept in dissonance, including Steele's (1988) work on self-affirmation.
      As do several authors in this volume, they conclude by making a case for the theoretical
unification that can be provided by constraint satisfaction models. Not only can these kinds of
models handle the dissonance literature, they can also be applied in a variety of other domains. As
they and others have noted, constraint satisfaction models have been employed in a wide variety of
domains: belief revision, explanation, comprehension, schema completion, analogical retrieval and
mapping, content addressable memory storage and retrieval, attitude change, impression
formation, and cognitive balance.
      Ranney and Schank try something a little different. Rather than focus on using a
particular kind of neural network to address a specific problem, they decide to tackle some broad
questions, using their work on the importance of explanatory coherence in thinking. For example,
they take the distinction that is often made between scientific and social thinking and ask
how real it is. Their answer is: not very. Based on their work in both social
reasoning and reasoning about physical systems, they argue that fundamentally, scientific and
social thinking rely on the same mechanisms; in particular, principles of explanatory coherence
play a central role in both domains.
      They also describe some of their work using their program Convince Me. This is a program,
partially based on Thagard's model of Explanatory Coherence (1989, 1992), that can be used to
uncover people's reasoning about a variety of domains. It can uncover the individual beliefs, the
explanatory relations among them, and the coherence or consistency of the set of beliefs.
Moreover, by giving subjects feedback on how consistent their beliefs are, it can also be used to
encourage people to develop more coherent sets of beliefs. Relevant to the earlier point, in their
work with this program there seems to be little difference in how people use it to address scientific
and social problems.
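Thagard's theory of explanatory coherence is usually implemented as a constraint satisfaction network (the program ECHO), and a toy version shows the kind of computation involved. In this sketch propositions are units, explanations create excitatory links, contradictions create inhibitory links, and evidence receives support from a clamped evidence unit; the parameter values and the example propositions (hypothesis H1 explains E1 and E2, rival H2 explains only E1) are assumptions for illustration, not the settings used in Convince Me.

```python
from collections import defaultdict

# Assumed parameter values, for illustration only.
EXCITATION = 0.04   # hypothesis <-> datum it explains
INHIBITION = -0.06  # contradictory propositions
DATA_LINK = 0.05    # special evidence unit <-> each datum
DECAY = 0.05

def settle(explains, contradicts, evidence, steps=300):
    weights = defaultdict(float)

    def link(u, v, value):
        weights[(u, v)] += value  # links are symmetric
        weights[(v, u)] += value

    for hypothesis, datum in explains:
        link(hypothesis, datum, EXCITATION)
    for p, q in contradicts:
        link(p, q, INHIBITION)
    for datum in evidence:
        link("EVIDENCE", datum, DATA_LINK)

    units = {u for pair in weights for u in pair}
    acts = {u: 0.01 for u in units}
    acts["EVIDENCE"] = 1.0  # clamped
    for _ in range(steps):
        new = {"EVIDENCE": 1.0}
        for u in units - {"EVIDENCE"}:
            net = sum(weights[(u, v)] * acts[v]
                      for v in units if (u, v) in weights)
            if net > 0:   # move toward the maximum, +1
                a = acts[u] * (1 - DECAY) + net * (1.0 - acts[u])
            else:         # move toward the minimum, -1
                a = acts[u] * (1 - DECAY) + net * (acts[u] + 1.0)
            new[u] = max(-1.0, min(1.0, a))
        acts = new
    return acts

acts = settle(explains=[("H1", "E1"), ("H1", "E2"), ("H2", "E1")],
              contradicts=[("H1", "H2")],
              evidence=["E1", "E2"])
print({u: round(a, 2) for u, a in sorted(acts.items())})
```

The hypothesis that explains more of the evidence ends up accepted (positive activation) and its rival rejected (negative activation).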
      Finally, they address a very big question: How do we decide which social issues are the most
significant or important? They use their work with Convince Me and ideas about coherence to
explore the role of explanatory coherence in identifying which problems and issues are most
socially significant.
      Nowak and Vallacher examine how complex social dynamics involving interactions
among people in groups can be modeled by neural networks. They argue that such models can
provide insights into social dynamics and how such dynamics depend on the connections among
people. As part of their discussion, they first introduce another class of models, cellular automata,
that have been used to model social dynamics in such social phenomena as social influence and
attitude change. They then discuss the limitations of these kinds of models, in particular the rigid
nature of social ties, and then note the advantages of neural network models. For example, neural
networks can capture negative social relationships, which cellular automata have trouble
representing. Another attraction is their ability to capture equilibrium states: such networks may
evolve toward certain states but not others.
      As an example, they analyze the implications of one type of attractor network, a
Hopfield-type network in which each individual is represented by a node and the connections among
nodes represent the relations among individuals. They also take advantage of the energy function
discussed by Hopfield to capture the notion that such systems can have a number of potential
equilibria, which differ in their energy (how well they satisfy the relations among individuals) and
represent different distributions of beliefs.
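A minimal sketch of such a network follows, assuming binary opinions (+1 or -1), a hypothetical four-person group made of two pairs of friends who dislike the other pair, and the standard Hopfield energy E = -1/2 Σ w_ij s_i s_j. Each person in turn adopts the sign of the weighted opinions of the others, which can only lower the energy.

```python
# Hypothetical ties: persons 0-1 and 2-3 like each other (+1) and
# dislike the other pair (-1). Diagonal entries are unused.
TIES = [[0, 1, -1, -1],
        [1, 0, -1, -1],
        [-1, -1, 0, 1],
        [-1, -1, 1, 0]]

def energy(weights, opinions):
    """Hopfield energy: -1/2 * sum over i != j of w_ij * s_i * s_j.
    Lower energy means more social ties are 'satisfied'."""
    n = len(opinions)
    return -0.5 * sum(weights[i][j] * opinions[i] * opinions[j]
                      for i in range(n) for j in range(n) if i != j)

def settle(weights, opinions, max_sweeps=100):
    """Asynchronous updates until no one changes opinion (an equilibrium)."""
    s = list(opinions)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(weights[i][j] * s[j] for j in range(n) if j != i)
            new = 1 if field > 0 else -1 if field < 0 else s[i]
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

start = [1, -1, 1, -1]
final = settle(TIES, start)
print(start, energy(TIES, start))   # [1, -1, 1, -1] 2.0
print(final, energy(TIES, final))   # [-1, -1, 1, 1] -6.0
```

This network has two equilibria, with each friendly pair sharing an opinion opposed to the other pair's; which one is reached depends on the starting opinions.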
      They note that one can investigate two kinds of dynamics in these models. First, one can
investigate how relations between individuals, such as liking or influence, affect the development
of attitudes or similar constructs in a social network. This is equivalent to examining how the links
among nodes influence the change of activation of the nodes over time. Second, one can
investigate how the opinions of individuals influence the relationships between them, by using
what we know about learning in such networks. For example, the Hebbian learning rule states that
if two nodes are positively activated at the same time, then the weight between them should
increase, whereas if one node is positive and the other negative, then the weight should decrease.
This is akin to how similarity in opinion between two individuals can affect their degree of liking.
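The Hebbian rule just described can be sketched in a few lines, assuming opinions coded as +1 or -1 and an arbitrary learning rate. The common product form Δw_ij = η s_i s_j is used here; note that, as an assumption of this form, it also strengthens the tie between two people who both hold the negative opinion.

```python
def hebbian_update(weights, opinions, rate=0.1):
    """Return new ties: agreement (same sign) strengthens liking,
    disagreement (opposite signs) weakens it. Self-ties are left at zero."""
    n = len(opinions)
    new = [row[:] for row in weights]
    for i in range(n):
        for j in range(n):
            if i != j:
                new[i][j] += rate * opinions[i] * opinions[j]
    return new

ties = [[0.0] * 3 for _ in range(3)]
opinions = [1, 1, -1]  # persons 0 and 1 agree; person 2 disagrees
for _ in range(5):
    ties = hebbian_update(ties, opinions)
print(ties)  # ties between 0 and 1 grow positive; ties to 2 grow negative
```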
      They also make an interesting set of observations about how the impact of wider societal
factors, beyond individual relationships, can be captured in such models. They note that social
influence is rarely the only source of opinion change. Typically, in society any individual receives
input from a number of other sources, such as media and personal memory. For any particular
individual these can be treated as essentially random influences. In neural network terms this can
be viewed as noise. They note that as noise in such a network increases, up to a certain point,
the number of equilibria, or stable points, decreases. This is akin to shaking the state of the
system out of the shallower valleys of the energy landscape, so that it is more likely to settle into
the deeper ones. Thus, the larger the random noise, the greater the likelihood of a small number of
ideological positions. This would seem to suggest that at times of great ferment or activity in
society, societal opinion is likely to crystallize into a small number of ideological positions.
However, they point out that if the amount of noise becomes too high, then all equilibria disappear;
in essence, everyone has their own separate, independent opinion.
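One common way to add such noise is the heat-bath (Glauber) rule, in which noise plays the role of temperature. This sketch assumes that rule and reuses a hypothetical four-person network; it only illustrates the mechanism (near-deterministic settling at low noise, coin-flip opinions at very high noise) and does not reproduce Nowak and Vallacher's analysis of how the number of equilibria changes.

```python
import math
import random

# Hypothetical ties: two pairs of friends who dislike the other pair.
TIES = [[0, 1, -1, -1],
        [1, 0, -1, -1],
        [-1, -1, 0, 1],
        [-1, -1, 1, 0]]

def prob_positive(field, noise):
    """Heat-bath rule: probability of adopting +1 given the summed social
    influence `field`; `noise` (> 0) acts as temperature."""
    x = max(-500.0, min(500.0, -2.0 * field / noise))  # avoid overflow
    return 1.0 / (1.0 + math.exp(x))

def noisy_settle(weights, opinions, noise, sweeps=200, seed=0):
    """Repeatedly let each person resample an opinion under noise."""
    rng = random.Random(seed)
    s = list(opinions)
    n = len(s)
    for _ in range(sweeps):
        for i in range(n):
            field = sum(weights[i][j] * s[j] for j in range(n) if j != i)
            s[i] = 1 if rng.random() < prob_positive(field, noise) else -1
    return s

low = noisy_settle(TIES, [1, -1, 1, -1], noise=0.05)   # lands in an equilibrium
high = noisy_settle(TIES, [1, -1, 1, -1], noise=50.0)  # keeps fluctuating
print(low, high)
```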
      Eiser, Claessen, and Loose are interested in investigating processes of self-organization
in social systems. Like Nowak and Vallacher, they propose using connectionist models to
investigate processes occurring in groups of individuals, rather than just looking at intra-individual
processes.
      Eiser et al. look at two different issues and use two different kinds of architectures. First,
they attempt to simulate the development of Cognitive Balance (Heider, 1946) among a group of
people (rather than within a single individual). They use a fully recurrent, feedback network in
which each individual's feeling about an impersonal object is represented by the activation of a
node and the relationship (or amount of liking) between two individuals is represented by the
weight between the two corresponding nodes. Thus, like Read and Miller (1994), they treat
Cognitive Balance as a constraint satisfaction process. Eiser et al. then use this model to study the
extent to which the development of balance is due to changes in relationships among individuals
versus changes in how individuals feel about impersonal objects. They find that, at least in their
particular implementation, changing relationships among individuals is far more important than is
changing feelings about objects. As they note, this kind of simulation can be used to extend our
analysis of theories such as balance theory.
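The balance criterion in a network of this kind can be made concrete. Heider's criterion for a person-other-object triad is usually stated as a sign rule: the triad is balanced when the product of its three signs is positive. In the sketch below, which uses hypothetical attitudes and ties, the three signs are the liking between persons i and j and each person's attitude toward the object, and one route to balance (changing relationships) is applied. It illustrates the criterion only and says nothing about which route dominates in Eiser et al.'s simulations.

```python
def unbalanced_pairs(liking, attitudes):
    """Pairs (i, j) whose person-other-object triad is unbalanced:
    liking[i][j] * attitude_i * attitude_j < 0."""
    n = len(attitudes)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if liking[i][j] * attitudes[i] * attitudes[j] < 0]

def rebalance_by_relationships(liking, attitudes):
    """Reverse each tie in an unbalanced triad: come to like those who
    agree about the object and dislike those who disagree."""
    new = [row[:] for row in liking]
    for i, j in unbalanced_pairs(liking, attitudes):
        new[i][j] = new[j][i] = -liking[i][j]
    return new

attitudes = [1, 1, -1]  # persons 0 and 1 favor the object; person 2 opposes it
liking = [[0, 1, 1],
          [1, 0, 1],
          [1, 1, 0]]    # everyone likes everyone
print(unbalanced_pairs(liking, attitudes))          # [(0, 2), (1, 2)]
fixed = rebalance_by_relationships(liking, attitudes)
print(unbalanced_pairs(fixed, attitudes))           # []
```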
      In a second set of simulations, they present a hybrid architecture that combines cellular
automata with feedforward, backpropagation networks. In this model, each individual is
represented by a cell and the internal state of the cell or individual is represented by the feedforward
network. Rules applied to the cells determine how they "talk" to one another. Eiser et al. use this
model to study how a group of individuals may come to an agreement about naming an object in
their environment; that is, it attempts to model communication among individuals in a social
network. As part of their simulation they study various kinds of communication rules that
determine who talks to whom and how much. Although the model is interesting and innovative, it
has one flaw. When it must name two or more different objects, it exhibits what the authors call
"Smurfing behavior": all the objects come to receive exactly the same name. So in its
current state the model is unable to capture how a group might come to give different names to
different objects.
      Although social psychologists are just beginning to study the applications of neural network
models to social phenomena, it is clear from the chapters in this book that they have great potential
for addressing fundamental issues in social psychology. In fact, the present authors have already
made significant contributions to our understanding of these issues. We thank the authors for the
strength of their contributions.
      Anderson, J. A. (1995). An Introduction to Neural Networks. Cambridge, MA:
Bradford/MIT Press.
      Asch, S. E. (1946). Forming impressions of personality. Journal of Abnormal and
Social Psychology, 41, 258-290.
      Bechtel, W., & Abrahamsen, A. (1991). Connectionism and the mind: An introduction to
parallel processing in networks. Cambridge, MA: Basil Blackwell.
      Cheng, P. W., & Novick, L. R. (1990). A probabilistic contrast model of causal
induction. Journal of Personality and Social Psychology, 58, 545-567.
      Cooper, J., Zanna, M. P. & Taves, P. A. (1978). Arousal as a necessary condition for
attitude change following forced compliance. Journal of Personality and Social Psychology,
36, 1101-1106.
      Festinger, L. (1950). Informal social communication. Psychological Review, 57,
      Festinger, L. (1957). A theory of cognitive dissonance. Evanston, IL: Row, Peterson.
      Gluck, M. A., & Bower, G. H. (1988a). From conditioning to category learning: An
adaptive network model. Journal of Experimental Psychology: General, 117, 227-247.
      Gluck, M. A., & Bower, G. H. (1988b). Evaluating an adaptive network model of
human learning. Journal of Memory and Language, 27, 166-195.
      Hamilton, D. L., & Gifford, R. K. (1976). Illusory correlation in interpersonal
perception: A cognitive basis of stereotypic judgments. Journal of Experimental Social
Psychology, 12, 392-407.
      Heider, F. (1946). Attitudes and cognitive organization. Journal of Psychology, 21,
      Heider, F. (1958). The psychology of interpersonal relations. New York: Wiley.
      Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the theory of neural
computation. Redwood City, CA: Addison Wesley.
      Holyoak, K. J., & Thagard, P. (1989). Analogical mapping by constraint satisfaction.
Cognitive Science, 13, 295-355.
      Holyoak, K. J., & Thagard, P. (1995). Mental leaps: Analogy in creative thought.
Cambridge, MA: MIT Press/Bradford Books.
      Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective
computational abilities. Proceedings of the National Academy of Sciences, USA, 79, 2554-
      Hopfield, J. J. (1984). Neurons with graded responses have collective computational
properties like those of two-state neurons. Proceedings of the National Academy of Sciences,
USA, 81, 3088-3092.
      Kunda, Z., & Thagard, P. (1996). Forming impressions from stereotypes, traits, and
behaviors: A parallel constraint satisfaction theory. Psychological Review, 103, 284-308.
      Lewin, K. (1935). A dynamic theory of personality. New York: McGraw-Hill.
      Lewin, K. (1947a). Frontiers in group dynamics: I. Human Relations, 1, 2-38.
      Lewin, K. (1947b). Frontiers in group dynamics: II. Human Relations, 1, 143-153.
      McClelland, J. L., & Elman, J. L. (1986). Interactive processes in speech perception:
The TRACE model. In McClelland, J. L., & Rumelhart, D. E. (Eds.) Parallel Distributed
Processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and
Biological Models (pp. 58-121). Cambridge, MA: MIT Press/Bradford Books.
      McClelland, J. L., & Rumelhart, D.E. (1981). An interactive activation model of context
effects in letter perception: part 1. An account of basic findings. Psychological Review, 88,
      McClelland, J. L., & Rumelhart, D. E. (Eds.). (1986). Parallel Distributed Processing:
Explorations in the microstructure of cognition. Vol. 2: Psychological and Biological Models.
Cambridge, MA: MIT Press/Bradford Books.
      Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning.
Psychological Review, 85, 207-238.
      Miller, L. C., & Read, S. J. (1991). On the coherence of mental models of persons and
relationships: A knowledge structure approach. In F. Fincham & G. J. O. Fletcher (Eds.),
Cognition in close relationships (pp. 69-99). Hillsdale, NJ: Lawrence Erlbaum Associates.
      Nosofsky, R. M. (1984). Choice, similarity, and the context theory of classification.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 104-114.
      Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization
relationship. Journal of Experimental Psychology: General, 115, 39-57.
      Pearce, J. M. (1994). Similarity and discrimination: A selective review and a
connectionist model. Psychological Review, 101, 587-607.
      Read, S. J., & Marcus-Newhall, A. (1993). Explanatory coherence in social
explanations: A parallel distributed processing account. Journal of Personality and Social
Psychology, 65, 429-447.
      Read, S. J., & Miller, L. C. (1993). Rapist or "regular guy": Explanatory coherence in
the construction of mental models of others. Personality and Social Psychology Bulletin, 19,
      Read, S. J., & Miller, L. C. (1994). Dissonance and balance in belief systems: The
promise of parallel constraint satisfaction processes and connectionist modeling approaches. In
R. C. Schank & E. J. Langer (Eds.), Beliefs, reasoning, and decision making: Psycho-logic in
honor of Bob Abelson (pp. 209-235). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
      Read, S. J., & Miller, L. C. (1995). Stories are fundamental to meaning and memory: For
social creatures, could it be otherwise? In R. S. Wyer, Jr. (Ed.), Knowledge and Memory: The
Real Story, Advances in Social Cognition, Vol. VIII (Lead article by R. C. Schank & R. P.
Abelson, pp. 139-152). Hillsdale, NJ: Lawrence Erlbaum Associates.
      Read, S. J., Vanman, E. J., & Miller, L. C. (1997). Connectionism, parallel constraint
satisfaction processes, and Gestalt principles: (Re)Introducing cognitive dynamics to social
psychology. Personality and Social Psychology Review, 1, 26-53.
      Rumelhart, D. E., & McClelland, J. L. (1982). An interactive activation model of context
effects in letter perception: Part 2. The contextual enhancement effect and some tests and
extensions of the model. Psychological Review, 89, 60-94.
      Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing:
Explorations in the microstructure of cognition: Vol. 1. Foundations. Cambridge, MA: MIT Press.
      Rumelhart, D. E., Smolensky, P., McClelland, J. L., & Hinton, G. E. (1986). Schemata
and sequential thought processes in PDP models. In McClelland, J. L., & Rumelhart, D. E.
(Eds.) Parallel Distributed Processing: Explorations in the microstructure of cognition. Vol. 2:
Psychological and Biological Models (pp. 7-57). Cambridge, MA: MIT Press/Bradford Books.
      Shanks, D. R. (1991). Categorization by a connectionist network. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 17, 433-443.
      Shanks, D. R. (1993). Human instrumental learning: A critical review of data and theory.
British Journal of Psychology, 84, 319-354.
      Smith, E. R. (1996). What do connectionism and social psychology offer each other?
Journal of Personality and Social Psychology, 70, 893-912.
      Smith, E. R., & Zárate, M. A. (1990). Exemplar and prototype use in social
categorisation. Social Cognition, 8, 243-262.
      Steele, C. M. (1988). The psychology of self-affirmation: Sustaining the integrity of the
self. In L. Berkowitz (Ed.), Advances in Experimental Social Psychology (Vol. 21, pp. 261-302).
New York: Academic Press.
      Thagard, P. (1989). Explanatory coherence. Behavioral and Brain Sciences, 12, 435-
      Thagard, P. (1992). Conceptual revolutions. Princeton: Princeton University Press.
      Trope, Y. (1986). Identification and inferential processes in dispositional attribution.
Psychological Review, 93, 239-257.
      Vallée-Tourangeau, F., Baker, A. G., & Mercier, P. (1994). Discounting in causality
and covariation judgments. The Quarterly Journal of Experimental Psychology, 47B, 151-
      Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. Institute of Radio
Engineers, Western Electronic Show and Convention, Convention Record, Part 4, 96-104.
