Integrated information theory of consciousness: an updated account
Archives Italiennes de Biologie, 150: 290-326, 2012.

G. Tononi
Department of Psychiatry, University of Wisconsin, Madison, WI, USA

Abstract

This article presents an updated account of integrated information theory of consciousness (IIT) and some of its implications. IIT stems from thought experiments that lead to phenomenological axioms (existence, compositionality, information, integration, exclusion) and corresponding ontological postulates. The information axiom asserts that every experience is specific – it is what it is by differing in its particular way from a large repertoire of alternatives. The integration axiom asserts that each experience is unified – it cannot be reduced to independent components. The exclusion axiom asserts that every experience is definite – it is limited to particular things and not others and flows at a particular speed and resolution. IIT formalizes these intuitions with postulates. The information postulate states that only “differences that make a difference” from the intrinsic perspective of a system matter: a mechanism generates cause-effect information if its present state has selective past causes and selective future effects within a system. The integration postulate states that only information that is irreducible matters: mechanisms generate integrated information only to the extent that the information they generate cannot be partitioned into that generated within independent components. The exclusion postulate states that only maxima of integrated information matter: a mechanism specifies only one maximally irreducible set of past causes and future effects – a concept. A complex is a set of elements specifying a maximally irreducible constellation of concepts, where the maximum is evaluated over elements and at the optimal spatio-temporal scale.
Its concepts specify a maximally integrated conceptual information structure or quale, which is identical with an experience. Finally, changes in information integration upon exposure to the environment reflect a system’s ability to match the causal structure of the world. After introducing an updated definition of information integration and related quantities, the article presents some theoretical considerations about the relationship between information and causation and about the relational structure of concepts within a quale. It also explores the relationship between the temporal grain size of information integration and the dynamic of metastable states in the corticothalamic complex. Finally, it summarizes how IIT accounts for empirical findings about the neural substrate of consciousness, and how various aspects of phenomenology may in principle be addressed in terms of the geometry of information integration.

Key words: Brain • Experience • Awareness • Causation • Emergence

Corresponding Author: Giulio Tononi, 6001 Research Park Boulevard, Madison, WI, 53719, USA - E-mail: gtononi@wisc.edu

Phenomenology: Consciousness as integrated information

Everybody knows what consciousness is: it is what vanishes every night when we fall into dreamless sleep and reappears when we wake up or when we dream. Thus, consciousness is synonymous with experience – any experience – of shapes or sounds, thoughts or emotions, about the world or about the self. It is also common knowledge that our consciousness depends on certain parts of the brain. For example, the widespread destruction of the cerebral cortex leaves people permanently unconscious (vegetative), whereas the complete removal of the cerebellum, even richer in neurons, hardly affects consciousness. Furthermore, it matters how the cerebral cortex is functioning. For example, cortical neurons remain active throughout sleep, although their firing patterns may change. Correspondingly, at certain times during sleep consciousness fades, while at other times we dream. It is also well established that different parts of the cortex influence qualitative aspects of consciousness: damage to certain parts of the cortex impairs the experience of color, whereas other lesions impair that of visual shapes.

Neuroscientific findings are making progress in identifying the neural correlates of consciousness (Koch, 2004). However, to explain why experience is generated in the cortex and not in the cerebellum, why it fades in certain stages of sleep, why some cortical areas contribute color and others sound, and to address difficult issues such as the presence and quality of consciousness in newborn babies, in animals, or in pathological conditions, empirical studies are usefully complemented by a theoretical approach. Integrated information theory (IIT) constitutes such an approach. What follows is an outline of IIT, streamlined and updated with respect to previous expositions (Tononi, 2004, 2008).

Three thought experiments

Three thought experiments lie at the heart of IIT – the photodiode thought experiment, the camera thought experiment, and the internet thought experiment.

The photodiode thought experiment

Consider a human and a simple photodiode facing a blank screen that is alternately on and off. The photodiode can tell ‘light’ from ‘dark’ just as well as a human. However, a human also has an experience of light or dark, whereas the photodiode presumably does not. What is the critical property that humans have and photodiodes lack?

According to IIT, the critical property has to do with how much information is generated when the distinction between light and dark is made. From the intrinsic perspective of a system – photodiode or human – information can best be defined as a “difference that makes a difference”¹: the more alternatives (differences) can be distinguished, to the extent they lead to distinguishable consequences (make a difference), the greater the information. When the blank screen turns on, the photodiode’s mechanism, which can distinguish between a low and a high current, detects a high current and, say, triggers the output ‘light’ rather than the output ‘dark’. Since the distinction is between two alternatives, the photodiode generates 1 bit of information. We take that bit of information to specify ‘light’ as opposed to ‘dark’, but it is important to realize that, from the photodiode’s perspective, the only specification it can make is whether its inputs were in one of two ways and whether therefore its outputs should be in one of two ways – this way or not this way. Any further specification is impossible because it does not have mechanisms for it. Therefore, when the photodiode detects and reports ‘light’, such light cannot possibly mean what it means for us – it does not even mean that it is a visual attribute.

When a human reports pure light, by contrast, mechanisms in his brain distinguish, in a specific way, among a much larger number of alternatives, and are primed accordingly for a large number of different outcomes, thus generating many bits of information. This is because ‘light’ is distinguished not only from ‘dark’, but from a multitude of other possibilities, for example a red screen, a green screen, this movie frame, that movie frame, a sound, a different sound, a thought, another thought, and so on. In other words, each alternative can be distinguished from the others in its own specific way, and can lead to different consequences, including different verbal reports, actions, thoughts, memories etc. To us, then, ‘light’ is much more meaningful precisely because we have mechanisms that can specifically distinguish this particular state of affairs we call ‘light’ against each and every one of a large number of alternatives, and lead to appropriately different consequences. Indeed, as a human, no matter how hard I try, I cannot empty an experience of meaning: I cannot reduce the experience of ‘light’ to ‘this and not this’. More generally, if I am not blind from birth, I cannot reduce myself to lacking visual experiences; if I am not color-blind, I cannot reduce myself to seeing the world in black-and-white; if I know English, I cannot see the word “English” and not understand it; if I am an experienced musician, I cannot reduce myself to listening to a sonata as if I were a novice, and so on.

This central point may be appreciated either by addition or by subtraction. By addition, I realize that I
can only see ‘light’ the way I see it, as progressively more and more meaning is added by mechanisms that specify how ‘light’ differs from each of countless alternatives: from various colors, shapes, and countless other visual and non-visual experiences. By subtraction, I can realize that, if I were to lose one neural mechanism after the other, my being conscious of ‘light’ would degrade – it would lose its non-coloredness, its non-shapedness, it would even lose its visualness – while its meaning is progressively stripped down to just ‘one of two ways’, as with the photodiode. Either way, the theory says that, the more my mechanisms specify how ‘light’ differs from its many alternatives, and thereby lead to different consequences – the more they specify what light means – the more I am conscious of it.

The camera thought experiment

Information – the ability to discriminate among a large number of alternatives – is thus an essential ingredient for consciousness. However, another thought experiment, this time involving a digital camera, shows the need for a second ingredient. Assume the sensor chip of the camera is a collection of a million binary photodiodes. Taken together, then, the camera’s photodiodes can distinguish among $2^{1,000,000}$ alternative states, an immense number, corresponding to 1 million bits of information. Indeed, the camera would respond differently to every possible image. Yet few would argue that the camera is conscious. What is the critical difference between a human being and a camera?

According to IIT, the difference has to do with information integration. From the point of view of an external observer, the camera may be considered as a single system with a repertoire of $2^{1,000,000}$ states. However, the chip is not an integrated entity: since its 1 million photodiodes have no way to interact, each photodiode performs its own local discrimination between a low and a high current, completely independent of what every other photodiode might be doing. In reality, the chip is just a collection of 1 million independent photodiodes, each with a repertoire of 2 inputs and outputs – there is no intrinsic point of view associated with the camera chip as a whole. This is easy to see: if the sensor chip were cut into 1 million pieces each holding its individual photodiode, the performance of the camera would not change at all.

By contrast, a human distinguishes among a vast repertoire of alternatives as a single, integrated system, one that cannot be broken down into independent components each with their own separate repertoire. Phenomenologically, every experience is an integrated whole, one that means what it means by virtue of being one, and which is experienced from a single point of view. For example, no matter how hard I try, experiencing the full visual field cannot be reduced into experiencing separately the left half and the right half. No matter how hard I try, I cannot reduce the experience of a red apple into the separate experience of its color and its shape. Indeed, the only way to split an experience into independent experiences seems to be splitting the brain in two, as in patients who underwent the section of the corpus callosum to treat severe epilepsy (Gazzaniga, 2005). Such patients do indeed experience the left half of the visual field independently of the right side, but then the surgery has created two separate consciousnesses instead of one. Therefore, underlying the unity of experience must be causal interactions among certain elements within the brain. This means that these elements work together as an integrated system, which is why, unlike the camera, their performance breaks down if they are disconnected.

The internet thought experiment

Unlike the camera chip, the internet is obviously integrated – in fact, its main purpose is to permit exchanges of messages between any point of the net and any other point. It can also be used to disseminate or ‘broadcast’ messages from any one node to many others. The integration is achieved by routers that act as dynamic switches connecting any address in the network with any other address. And yet it seems unlikely that, at least in its current form, the internet is giving rise to some kind of globally integrated consciousness. What could be the critical difference between the network of neurons inside the brain that gives rise to human consciousness, and the network of internet routers connecting devices throughout the world?

According to IIT, the difference has to do with the fact that the neural substrate of consciousness is wired to achieve maxima of integrated information, whereas the internet is not. Consider the internet first. The internet is not designed to achieve a maximum of integrated information, but to ensure point to point
communication. Indeed, interactions within the internet can typically be reduced to independent components, and they had better be independent, otherwise there would be a chaotic cross-talk and point-to-point communication would not be possible. In other words, the ability to obtain independent, point-to-point signaling excludes the ability to perform global computations, and vice versa. Thus, the internet, while integrated enough to permit point-to-point signaling, is certainly not maximally integrated – not from the intrinsic perspective of the internet itself. On the other hand, from the perspective of an external user, this has great advantages. For example, from a particular node, say the terminal of an information technologist, one can access without any cross-talk a connected hand-held device to diagnose exactly what the speech recognition module is doing or why it may be malfunctioning; or how the power regulating circuits are performing; or one can access a connected peripheral, say a printer, to diagnose if it is running properly; or access anybody else’s computer and check any aspect of its functioning; and so on for any other connected device. Moreover, one can check the computations of any connected node at a range of spatial and temporal scales, from the operations performed by individual transistors at microsecond resolution to daily averages of traffic over a hub. However, the price of such complete access is that the internet is not well suited, at least in its current form, to achieve what one may call ‘global’, autonomous computations.

By contrast, within consciousness information is maximally integrated: every experience is whole, and the entire set of concepts that make up any particular experience – what makes the experience what it is and what it is not – are maximally interrelated. This integration is excellent for a context-dependent understanding of a particular state of affairs, but the flip side of maximal information integration is exclusion. No matter how hard I try, I cannot become conscious of what is going on within the modules in my brain that perform language parsing: I hear and understand an English sentence, but I have no conscious access to how the relevant parts of my brain are achieving this computation, although of course they must be connected to those other parts that give rise to my present consciousness. Similarly, I have no conscious access to those other parts of my brain that are in charge of blood pressure regulation; or to the complex computations in the cerebellum that help maintain my posture. And I certainly do not have access to whatever is going on in peripheral organs in my body, such as the liver, the kidneys and so on. Furthermore, while I can interact with other people, I have no access to their internal workings. Exclusion applies also within consciousness: at any given time, there is only one consciousness – one maximally integrated subject – me – having one full experience, not a multitude of partial consciousnesses, each experiencing a subset of the contents of my experience. Instead, each experience is compositional, i.e. structured – it is constituted of different aspects in various combinations: I see the shape of the apple, I see its red color, I see a position in space, and I also see that the apple is red and occupies that position. Exclusion also occurs in spatio-temporal terms: what I experience, I experience at a particular spatial and temporal resolution: I have no way to experience directly processes within my brain – even within the parts that are involved in generating experience – that happen at a much finer spatial grain, such as the workings of molecules and atoms within neural cells, or at a much finer temporal grain, such as the millisecond-by-millisecond traffic of spikes among neurons. Similarly, I cannot experience events at a coarser spatial or temporal scale: for example, no matter how hard I try, I cannot lump together into a single experience an entire movie, a waking day, or a lifetime: there is a “right” time scale at which consciousness flows – at other time scales, consciousness simply does not exist.

Phenomenological axioms, ontological postulates, and identities

Based on the intuitions provided by these thought experiments, the main tenets of IIT can be presented as a set of phenomenological axioms, ontological postulates, and identities. The central axioms, which are taken to be immediately evident, are as follows:

An initial axiom is simply that consciousness exists. Paraphrasing Descartes, “I experience therefore I am”².

Another axiom concerns compositionality: experience is structured, consisting of multiple aspects in various combinations. Thus, even an experience of
pure darkness and silence contains visual and auditory aspects, spatial aspects such as left, center, and right, and so on.

A central axiom concerns information: experience is informative or specific – in that it differs in its particular way from other possible experiences. Thus, an experience of pure darkness and silence is what it is by differing, in its particular way, from an immense number of other possible experiences – including the experiences triggered by any frame of any possible movie.

Another axiom concerns integration: experience is integrated – in that it cannot be reduced to independent components. Thus, experiencing the word “SONO” written in the middle of a blank page cannot be reduced to an experience of the word “SO” at the right border of a half-page, plus an experience of the word “NO” on the left border of another half-page – the experience is whole.

Yet another axiom is exclusion: experience is exclusive – in that it has definite borders and a definite temporal and spatial grain. Thus, an experience encompasses what it does, and nothing more; at any given time there is only one of it, having its full content, it flows at a particular speed, and it has a certain resolution such that certain distinctions are possible and finer or coarser distinctions are not.

To parallel the phenomenological axioms, IIT posits some ontological postulates:

An initial postulate is simply that mechanisms in a state exist. That is, there are operators that, given an input, produce an output, and at a given time such operators are in a particular state.

Another postulate concerns compositionality: mechanisms can be structured, forming higher order mechanisms in various combinations.

A central postulate concerns information: from the intrinsic perspective of a system, a mechanism in a state generates information only if it has both selective causes and selective effects within the system – that is, the mechanism must constitute “a difference that makes a difference within the system”. This intrinsic, causal notion of information can be assessed by examining the cause-effect repertoire (CER) specified by a mechanism in a state – the set of past system states that could have been the causes of its present state and the set of future system states that could have been its effects. If a mechanism in a state does not specify either selective causes or selective effects (for example by lacking inputs or outputs), then the mechanism does not generate any cause-effect information (CEI) within the system. Ontologically, the information postulate claims that, from the intrinsic perspective of a system, only differences that make a difference within the system exist.

Another postulate concerns integration: a mechanism in a state generates integrated information only if it cannot be partitioned into independent submechanisms. That is, the information generated within a system should be irreducible to the information generated within independent sub-systems or independent interactions. Integrated information (ϕ) can be captured by measuring to what extent the information generated by the whole differs from the information generated by its components (minimum information partition MIP). Ontologically, the integration postulate claims that only irreducible interactions exist intrinsically, i.e. in and of themselves.

Yet another postulate concerns exclusion: a mechanism in a state generates integrated information about only one set of causes and effects – the one that is maximally irreducible. That is, the mechanism can specify only one pair of causes and effects. By a principle of causal parsimony, this is the pair of causes and effects whose partition would produce the greatest loss of information. This maximally irreducible set of causes and effects is called a concept. Exclusion can be captured by measuring the maximum of integrated information ${}^{\max}\phi^{\mathrm{MIP}}$ over all possible cause-effect repertoires of the mechanism over the system. Ontologically, the exclusion postulate claims that only maximally irreducible entities exist intrinsically³.

As will be discussed below, the postulates can be applied to subsets of elements within a system (mechanisms) as well as to systems (sets of concepts). A system of elements that generates cause-effect information (it has concepts), is irreducible (it cannot be
split into mutually independent subsystems), and is a local maximum of irreducibility (in terms of the concepts it generates) over a set of elements and over an optimal spatio-temporal grain of interactions, constitutes a complex – a maximally irreducible entity. In this view, only complexes are entities that exist intrinsically, i.e. in and of themselves.

Finally, IIT posits identities between phenomenological aspects and informational/causal aspects of systems. The central identity is the following: an experience is a maximally integrated conceptual information structure. Said otherwise, an experience is a “shape” or maximally irreducible constellation of concepts in qualia space (a quale), where qualia space is a space spanned by all possible past and future states of a complex. In this space, concepts are points whose coordinates are the probabilities of past and future states corresponding to maximally irreducible cause-effect repertoires specified by various subsets of elements.

In what follows, the postulates of IIT are briefly illustrated by considering a set of mechanisms (a candidate system of elements). Within the system, the postulates are first applied to mechanisms in a state, alone or in combination (all subsets), to identify concepts; then the postulates are applied to different systems of elements and the collection of concepts they generate, in order to identify complexes⁴.

Information

The information postulate says that information is a difference that makes a difference from the intrinsic perspective of a system. This intrinsic, causal⁵ notion of information is assessed by considering if the present state of a mechanism can specify both past causes and future effects within the system.

Within a system X, consider a subset of elements S in its present state s⁶. The information s generates about some subset of elements of X in the past (P) is the effective information (EI) between P and s:

$$\mathrm{EI}(P \mid s) = D\left[(P \mid s),\; P^{\mathrm{Hmax}}\right]$$

where D indicates the difference between two distributions, in this case between the distribution of P states that could have caused s given its present mechanism and state (the cause repertoire CR), and the maximum uncertainty (entropy) distribution $P^{\mathrm{Hmax}}$, in which all P outputs are equally likely a priori⁷. Thus, EI(P|s) represents the differences in the past states of P that can be detected by mechanism S in its present state s. Similarly, D between the distribution of F states that would be the effect of ‘fixing’ mechanism S in its present state s (the effect repertoire ER) and the distribution of states of F in which all F inputs are equally likely ($F^{\mathrm{Hmax}}$), is the effective information s generates about future states of F:

$$\mathrm{EI}(F \mid s) = D\left[(F \mid s),\; F^{\mathrm{Hmax}}\right]$$

Thus, EI(F|s) represents the differences to the future states of F made by mechanism S being in its present state s. Clearly, EI(P|s) > 0 only if past states of P make a difference to s, and EI(F|s) > 0 only if s makes a difference to F.

Based on the information postulate, a mechanism in a state (s) generates information from the intrinsic perspective of a system only if it both detects differences in the past states of the system and makes a difference to its future states. That is, s generates information only if it has both selective causes (EI(P|s) > 0) and selective effects (EI(F|s) > 0). The minimum of the two, which represents the ‘bottleneck’ in the channel between past causes over P and future effects over F as mediated by the mechanism S in its present state s, is called cause-effect information (CEI):

$$\mathrm{CEI}(P, F \mid s) = \min\left[\mathrm{EI}(P \mid s),\; \mathrm{EI}(F \mid s)\right]$$

Clearly, CEI > 0 only if the system’s states make a difference to the mechanism, and the state of the mechanism makes a difference to the system. Thus an element that monitors the state of the system (say a parity detector), but has no effects on the system, may be relevant from the extrinsic perspective of an observer, but is irrelevant from the intrinsic perspective of the system, as it makes no difference to it. If CEI > 0, the cause and effect repertoires together can be said to specify a cause-effect repertoire (CER).

As an example, consider a mechanism A within an isolated system ABC (Fig. 1). The wiring diagram is unfolded into a directed acyclic graph over past, present, and future. A’s mechanism is a logical AND
gate of elements B and C, turning ON if both B and C are ON; moreover, if A is ON, it turns OFF B. Thus, A specifies that, starting from the eight possible past states of elements ABC (maximum entropy distribution), only two past outputs of ABC can lead to A’s present state (ON) – those in which B and C are both ON (cause repertoire CR), thereby ‘detecting differences’ and generating EI. Moreover, A specifies that, starting from maximum entropy over the inputs to ABC, A’s present state (ON) can only lead to four future states of ABC – those in which B is OFF (effect repertoire ER), thereby ‘making a difference’. Together, CR and ER specify the cause-effect repertoire $\mathrm{CER} = (ABC)_{\mathrm{pa}} \mid A_{\mathrm{pr}},\ (ABC)_{\mathrm{fu}} \mid A_{\mathrm{pr}}$, where the subscripts refer to present, past, and future. The cause-effect information (CEI) generated by a mechanism over its cause-effect repertoire (CER) is the minimum between $\mathrm{EI}[(ABC)_{\mathrm{pa}} \mid A_{\mathrm{pr}}]$ and $\mathrm{EI}[(ABC)_{\mathrm{fu}} \mid A_{\mathrm{pr}}]$.

Fig. 1. - A cause-effect repertoire (CER) and the cause-effect information it generates (“differences that make a difference”). See text for explanation.

Integration

The integration postulate says that information is integrated if it cannot be partitioned into independent components. That is, a mechanism in a state generates integrated information only if it cannot be partitioned into submechanisms with independent causes and effects. This integrated (irreducible) information is quantified by ϕ (small phi), a measure of the difference D between the repertoire specified by a whole and the product of the repertoires specified by its partition into causally independent components. The difference is taken over the partition that yields the least difference from the whole (the minimum information partition (MIP)), i.e. $\phi^{\mathrm{MIP}}$⁸.

Consider a partition / that splits the interactions between P and S into independent interactions between parts of P and parts of S⁹, which can be done by ‘injecting’ noise (Hmax) in the connections among them. One can then measure the difference D between the unpartitioned cause repertoire CR and the partitioned CR. For the partition that minimizes D, known as minimum information partition (MIP), the difference D is called ϕ (small phi). The same holds for the difference D between the unpartitioned and partitioned effect repertoire ER:

$$\phi^{\mathrm{MIP}}(P \mid s) = D\left[(P \mid s),\; \prod (P \mid s \,/\, \mathrm{MIP})\right]$$

$$\phi^{\mathrm{MIP}}(F \mid s) = D\left[(F \mid s),\; \prod (F \mid s \,/\, \mathrm{MIP})\right]$$

Thus, $\phi^{\mathrm{MIP}}(P \mid s)$ is the ‘past’ integrated (irreducible) information, and $\phi^{\mathrm{MIP}}(F \mid s)$ is the ‘future’ integrated
(irreducible) information. Clearly, $\phi^{\mathrm{MIP}}(P \mid s) > 0$ only if the past states of P make a difference to s that cannot be reduced to differences made by parts of P on parts of s, and likewise for $\phi^{\mathrm{MIP}}(F \mid s) > 0$.

Based again on the information postulate, a mechanism in a state (s) generates integrated information from the intrinsic perspective of a system only if this information is irreducible both in the past and in the future. That is, s generates integrated information only if it has both irreducible causes ($\phi^{\mathrm{MIP}}(P \mid s) > 0$) and irreducible effects ($\phi^{\mathrm{MIP}}(F \mid s) > 0$). The minimum of the two, which represents the ‘bottleneck’ in the channel between the past P and the future F as mediated by the mechanism S in its present state s, is called ‘cause-effect’ integrated information:

$$\phi^{\mathrm{MIP}}(P, F \mid s) = \min\left[\phi^{\mathrm{MIP}}(P \mid s),\; \phi^{\mathrm{MIP}}(F \mid s)\right]$$

As an example, Fig. 2a shows a set of 4 elements ABCD, where A is reciprocally connected to B and C is reciprocally connected to D. The wiring diagram is again unfolded into a directed acyclic graph over past, present, and future. Consider now the cause repertoire $(ABCD)_{\mathrm{pa}} \mid (ABCD)_{\mathrm{pr}}$ and a partition between subsets of elements AB on one side and CD on the other side:

$$\phi^{\mathrm{MIP}}(P \mid s) = (ABCD)_{\mathrm{pa}} \mid (ABCD)_{\mathrm{pr}} \;\|\; (AB)_{\mathrm{pa}} \mid (AB)_{\mathrm{pr}} \times (CD)_{\mathrm{pa}} \mid (CD)_{\mathrm{pr}} = 0$$

Similarly for the effect repertoire:

$$\phi^{\mathrm{MIP}}(F \mid s) = (ABCD)_{\mathrm{fu}} \mid (ABCD)_{\mathrm{pr}} \;\|\; (AB)_{\mathrm{fu}} \mid (AB)_{\mathrm{pr}} \times (CD)_{\mathrm{fu}} \mid (CD)_{\mathrm{pr}} = 0$$

Thus, as expected, for this partition $\phi^{\mathrm{MIP}} = \min\left[\phi^{\mathrm{MIP}}(P \mid s),\; \phi^{\mathrm{MIP}}(F \mid s)\right] = 0$. That is, considering the ‘whole’ CER specified by $(ABCD)_{\mathrm{pa}} \mid (ABCD)_{\mathrm{pr}}$ and $(ABCD)_{\mathrm{fu}} \mid (ABCD)_{\mathrm{pr}}$ adds nothing compared to considering the independent ‘partial’ CER specified by $(AB)_{\mathrm{pa}} \mid (AB)_{\mathrm{pr}}$, $(AB)_{\mathrm{fu}} \mid (AB)_{\mathrm{pr}}$ and by $(CD)_{\mathrm{pa}} \mid (CD)_{\mathrm{pr}}$, $(CD)_{\mathrm{fu}} \mid (CD)_{\mathrm{pr}}$. In other words, there is no reason to maintain that the ‘whole’ CER ABCD exists in and of itself, as it makes no difference above and beyond the two partial CER AB and CD. Thus, searching for partitions among sets of elements yielding $\phi^{\mathrm{MIP}} = 0$ enforces a principle of causal parsimony.

As another example, consider a partition between interactions. The system depicted in Fig. 2b is such that A copies B and B copies A. For the cause repertoire CR of AB and its partition into independent interactions of A with B and B with A one has that

$$\phi^{\mathrm{MIP}}(P \mid s) = (AB)_{\mathrm{pa}} \mid (AB)_{\mathrm{pr}} \;\|\; (B)_{\mathrm{pa}} \mid (A)_{\mathrm{pr}} \times (A)_{\mathrm{pa}} \mid (B)_{\mathrm{pr}} = 0$$

and similarly for the effect repertoire ER. That is, the CER of AB over AB (written AB/AB) reduces without loss to the independent CER of A/B

Fig. 2. - Integrated information generated by an irreducible CER, as established by performing partitions. See text for explanation.
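As a minimal numerical sketch of the quantities defined above, the AND-gate example of Fig. 1 and the AB | CD partition of Fig. 2a can be worked through in Python. This assumes that the difference D is the Kullback-Leibler divergence in bits; the text above only calls D "the difference between two distributions", so this choice is an assumption. The helper names (`uniform_over`, `kl_bits`, `cause_repertoire`, `swap`, `step_abcd`) and the present state (1, 0, 1, 1) are illustrative, and the repertoires for A are entered directly from the verbal description rather than derived from a full wiring diagram.

```python
from itertools import product
from math import log2


def uniform_over(states):
    """Return a uniform distribution (state -> probability) over the given states."""
    states = list(states)
    return {s: 1.0 / len(states) for s in states}


def kl_bits(p, q):
    """D(p || q) in bits. Assumption: D is taken to be the KL divergence."""
    return sum(pv * log2(pv / q[s]) for s, pv in p.items() if pv > 0)


# Fig. 1 example: A is an AND gate of B and C and is currently ON;
# an ON state of A turns B OFF. States are tuples ordered (A, B, C).
abc_states = list(product((0, 1), repeat=3))
h_max = uniform_over(abc_states)  # maximum entropy distribution over ABC

# Cause repertoire: the two past states in which B and C are both ON.
cause_rep = uniform_over(s for s in abc_states if s[1] == 1 and s[2] == 1)
# Effect repertoire: the four future states in which B is OFF.
effect_rep = uniform_over(s for s in abc_states if s[1] == 0)

ei_past = kl_bits(cause_rep, h_max)
ei_future = kl_bits(effect_rep, h_max)
cei = min(ei_past, ei_future)  # the 'bottleneck' between causes and effects


# Fig. 2a example: A and B copy each other, C and D copy each other.
def cause_repertoire(update, present, n):
    """Past states of n binary elements that `update` maps onto `present`."""
    return uniform_over(s for s in product((0, 1), repeat=n) if update(s) == present)


def swap(state):
    """Two elements that copy each other: (x, y) -> (y, x)."""
    return (state[1], state[0])


def step_abcd(state):
    """Whole-system update: AB and CD are two disconnected copy pairs."""
    return swap(state[:2]) + swap(state[2:])


present = (1, 0, 1, 1)  # hypothetical present state of (A, B, C, D)
whole = cause_repertoire(step_abcd, present, 4)
part_ab = cause_repertoire(swap, present[:2], 2)
part_cd = cause_repertoire(swap, present[2:], 2)
# Product of the two partial repertoires (partition AB | CD).
partitioned = {ab + cd: p * q for ab, p in part_ab.items() for cd, q in part_cd.items()}
phi_past = kl_bits(whole, partitioned)

print(f"EI(P|s) = {ei_past} bits, EI(F|s) = {ei_future} bits, CEI = {cei} bits")
print(f"phi for partition AB | CD (cause side) = {phi_past}")
```

Under the KL assumption, the sketch gives EI(P|s) = 2 bits and EI(F|s) = 1 bit for mechanism A, so CEI = 1 bit. For the two disconnected copy pairs, the whole cause repertoire coincides with the product of the partial ones, so the difference across the AB | CD partition is 0.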
and B/A both in the past and in the future. Thus, there is no reason to maintain that the CER AB/AB exists in and of itself, as it makes no difference above and beyond the independent CER of A/B and B/A. Again, searching for partitions among interactions yielding ϕMIP = 0 enforces a principle of causal parsimony.

By contrast, consider a system in which A is a linear threshold unit that receives strong inputs from B and C, which if both ON are sufficient to turn A ON, and a weak input from D; and in which A has strong outputs to B and C (it turns both ON), and a weak output to D (Fig. 2c). Considering the CR of A/BCD, one has that its partition A/BC × D/[] ([] indicates the empty set) yields ϕMIP > 0, and the same holds for the ER. Thus, this CER is irreducible, since there is no way to partition it without losing some information – in this case some information about element D.

Exclusion

The exclusion postulate says that integrated information is about one set of causes and effects only – those that are maximally irreducible – other causes and effects are excluded. That is, a mechanism in a state can specify only one pair of causes and effects, which, by a principle of causal parsimony, is the one whose partition would produce the greatest loss of information. This maximally irreducible set of causes and effects (MICE) is called a concept or, for emphasis, a “core concept”.

For a given subset of elements S in a present state s, there are potentially many cause repertoires CR depending on the particular subset P one considers (within system X). Exclusion states that, at a given time, s can have only one CR – which is the one having the maximum value of ϕMIP (maxϕMIP), where the maximum is taken over all possible subsets P within the system¹⁰. The corresponding CR is called the core cause of s within X. Similarly, the effect repertoire ER having maxϕMIP over all possible subsets F within the system is called the core effect of s within X.

Based again on the information postulate, a mechanism in a state (s) generates integrated information from the intrinsic perspective of a system only if this information is maximally irreducible both in the past and in the future. That is, s generates maximally integrated information only if it has both maximally irreducible causes (maxϕMIP(P|s) > 0) and maximally irreducible effects (maxϕMIP(F|s) > 0). The minimum of the two, which represents the ‘bottleneck’ in the channel between the past P and the future F as mediated by the mechanism S in its present state s, is called ‘cause-effect’ maximally integrated information:

maxϕMIP (P, F | s) = min [maxϕMIP (P | s), maxϕMIP (F | s)]

The cause-effect repertoire of s that has maxϕMIP (P,F|s) within a system X is called a concept. Thus, from the intrinsic perspective of a system, a concept is a maximally irreducible set of causes and effects (MICE) specified by a mechanism in a state.

For example, in Fig. 3 the powerset of CER (or ‘purviews’) of subset A within system ABCD includes, for the cause repertoires, A/A; A/B; A/C; A/D; A/AB; A/AC; A/AD; A/BC; A/BD; A/CD; A/ABC; A/ABD; A/ACD; A/BCD; A/ABCD. Of these, the partition A/BC || A/B × []/C = maxϕMIP turns out to be maximal (Fig. 3b), higher for example than the partition in Fig. 3a (A/BCD || A/BC × []/D). This is because partitioning away element B (or A) loses much more integrated information than any other partition. A similar result is obtained for the powerset of partitions of A/ABCD for the effect repertoires. By the exclusion postulate, only one CER exists – the one made of the maximally irreducible CR and ER – excluding any other CER¹¹.

The reason to consider exclusively the CER with maxϕMIP is as before a principle of causal parsimony – more precisely, a principle of least reducible reason. Consider A being ON in the previous example: it specifies a cause repertoire, but cannot distinguish which particular cause was actually responsible for its being ON; and with respect to its effects, it makes no difference which cause turned A ON. Since the particular cause does not matter, the exclusion postulate enforces causal parsimony, defaulting to the maximally irreducible set of causes for A being ON. These least ‘dispensable’ and thus most likely ‘responsible’ causes can be called the ‘core’ causes for A being ON, in the sense that their elimination would have made the most difference¹² ¹³. In turn, the fact that A is ON also specifies a forward repertoire of possible effects, but once again A should be held most responsible only for its
maximally irreducible or ‘core’ effects: the effects for which A being ON is least dispensable, meaning that eliminating A’s output would have made the most difference¹⁴.

Fig. 3. - Maximally integrated information generated by a maximally irreducible CER over all possible CER specified by a subset of elements within a system. See text for explanation.

Concepts

A concept or ‘core’ concept thus specifies a maximally irreducible cause-effect repertoire (CER) implemented by a mechanism in a state. Within a concept, one can distinguish a core cause – the set of past input states (cause repertoire CR) constituting maximally irreducible causes of the present state of the mechanism; and a core effect – the set of future output states (effect repertoire ER) constituting maximally irreducible effects of its present state. For example, an element (or set of elements) implementing the concept “table”, when ON, specifies ‘backward’ the maximally irreducible set of inputs that could have caused its turning ON (e.g. seeing, touching, imagining a table); ‘forward’, it specifies the set of outputs that would be the effects of its turning ON (e.g. thinking of sitting at, writing over, pounding on a table)¹⁵.

As an example, consider the system in Fig. 4, whose wiring diagram is on the left. The middle panel shows the four concepts generated by the system, with their maximally irreducible cause-effect repertoires and the corresponding maxϕMIP. For the concept generated by all three elements (ABC, top row) the figure also shows the product repertoires generated by the minimum information partitions of its maximal cause and effect repertoires.

For a given set of elements, it is useful to consider concepts as points within a space (concept space) that has as many axes as the number of possible past and future states of the set (Fig. 4, right panel; the axes are depicted along a circle but should be imagined in a high-dimensional space; the points are indicated as stars). Each concept specifies a maximally irreducible CER, which is a set of probabilities over all possible past and future states, and these probabilities specify a particular point in concept space (more precisely, since probabilities must sum to 1, in the subspace given by the corresponding concept simplex). The concept ‘exists’ with an ‘intensity’ given by maxϕMIP, that is, its degree of irreducibility (shown by the size of the star).
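
The selection step described under Exclusion and Concepts can be sketched in a few lines of Python. This is only a minimal illustration, assuming that the ϕMIP value of each candidate purview has already been computed by some other means: the purview labels follow the A/BC notation used in the text, but the numerical values, the dictionaries `cause_phi` and `effect_phi`, and the helper names `core` and `concept_intensity` are hypothetical and are not taken from Fig. 3 or Fig. 4.

```python
from typing import Dict, Tuple

# Hypothetical ϕMIP values for mechanism A over a few candidate purviews.
# The numbers are invented for illustration only.
cause_phi: Dict[str, float] = {"A/B": 0.10, "A/BC": 0.40, "A/BCD": 0.25}
effect_phi: Dict[str, float] = {"A/B": 0.15, "A/BC": 0.30, "A/BCD": 0.20}


def core(phi_by_purview: Dict[str, float]) -> Tuple[str, float]:
    """Exclusion: keep only the purview whose ϕMIP is maximal."""
    purview = max(phi_by_purview, key=phi_by_purview.get)
    return purview, phi_by_purview[purview]


def concept_intensity(cause: Dict[str, float], effect: Dict[str, float]) -> float:
    """maxϕMIP(P, F | s) = min[maxϕMIP(P | s), maxϕMIP(F | s)]."""
    _, max_cause = core(cause)
    _, max_effect = core(effect)
    return min(max_cause, max_effect)


core_cause = core(cause_phi)      # ('A/BC', 0.4)
core_effect = core(effect_phi)    # ('A/BC', 0.3)
intensity = concept_intensity(cause_phi, effect_phi)  # 0.3
```

With these invented values the core cause and core effect both fall on the purview A/BC, and the concept's intensity is set by the weaker of the two sides, here the effect side. The expensive part of a real calculation, finding the minimum information partition for every purview, is deliberately left out of the sketch.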
Fig. 4. - An integrated conceptual information structure. See text for explanation.

It is thus possible to evaluate the overall constellation of concepts generated by the set of elements in a single concept space, which can be called a conceptual information structure C. The relevant features one can consider include: i) the intensity, i.e. irreducibility maxϕMIP of existing concepts; ii) the “shape” of the constellation of concepts in concept space; iii) the dimensionality of the sub-space spanned by all the concepts; iv) the scope of the subspace covered by the concepts; v) the scope of the subspace covered by the concepts weighted by their intensity¹⁶ ¹⁷.

Complexes

By considering the conceptual information structure C (“constellation” C) specified in concept space by all the concepts generated by a system (Fig. 4), the postulates of IIT can be applied not only to find the maximally irreducible CER of a subset of elements (concepts), but also to find sets of elements, called complexes, which generate maximally integrated conceptual information structures. As with concepts, so with complexes, this can be done by: i) making sure, by partitioning the elements of a system, that the constellation of concepts generated by a set of elements cannot be reduced to the product of the constellations generated by the parts (integration postulate); ii) ensuring that the constellation of concepts generated by one part of the system has both selective causes and selective effects in the other part (information postulate); iii) choosing the set of elements that generates the most irreducible constellation of concepts (exclusion postulate).

As before, the irreducibility mandated by the integration postulate can be determined by measuring the difference D between the constellation of concepts generated by the whole, unpartitioned set of elements s, and that generated after its partition P into parts:

ΦP→ (C | s) = D (C | s, C | s/P→);
ΦP← (C | s) = D (C | s, C | s/P←)

where the arrow next to P indicates a unidirectional partition, i.e. one that separates causes from effects across the parts by injecting noise in the connections going from one part to the other. Applying as before the information postulate, one has:
ΦP (C | s) = min [ΦP→ (C | s), ΦP← (C | s)]

That is, one first partitions across the inputs (causes) to one side of the partition (i.e. the outputs or effects from the other side), then the other way around, and one takes the minimum across the partition. Finally, as before, one finds the partition for which ΦP (C | s) reaches its minimum value, ΦMIP (C | s), where MIP is the minimum information partition, and ΦMIP stands for integrated conceptual information. Thus, if ΦMIP (C | s) > 0, no partition can divide the system into non-interacting, mutually independent parts. Moreover, the greater the value of ΦMIP, the more irreducible the constellation of concepts generated by a particular set of elements¹⁸. Finally, according to the exclusion postulate, out of many possible constellations of concepts generated by overlapping sets of elements only one exists: the one that is maximally irreducible. Thus, one needs to evaluate ΦMIP for all sets of elements s, i.e. s = A, B, C, AB, AC, BC, ABC¹⁹. The set of elements generating the constellation with the maximum value of ΦMIP (maxΦMIP, or maximally integrated conceptual information) constitutes the main complex within the overall system; the corresponding concept space (simplex) is called qualia space, and the constellation of concepts it generates – the maximally integrated conceptual (information) structure – is called a quale Q²⁰.

For example, an exhaustive analysis of the system in Fig. 4 shows that the full set ABC constitutes a complex, as no other set of elements yields integrated conceptual structures having a higher value of ΦMIP. In larger systems, one would first identify the main complex and then, recursively, identify other complexes among the remaining elements. Therefore, a complex can be defined as a set of elements generating a maximally irreducible constellation of concepts (a maximally integrated conceptual structure). In essence, then, just like a concept specifies a particular, maximally integrated distribution of system states out of possible distributions (a point in concept space), a complex specifies a particular, maximally integrated conceptual structure (constellation of points) out of possible conceptual structures in concept space. As indicated by the information axiom, that constellation differs in its particular way from other possible constellations.

A schematic representation of a reduction of a system into complexes plus the residual interactions among them is illustrated in Fig. 5a. Note, for example, that due to the exclusion postulate, although complexes can interact, they cannot overlap. Thus, when two complexes of high maxΦMIP interact weakly, their union does not constitute a third complex, even though its ΦMIP value may be > 0: once again, there is no need to postulate additional entities, because they would make no further difference beyond what is accounted for by the two complexes of high maxΦMIP plus their weak interactions²¹. This is a direct application of Occam’s razor: “entities should not be multiplied beyond necessity”²². We recognize this principle intuitively when we talk to each other: most people would assume that there are just two consciousnesses (complexes of maxΦMIP) that interact a little, and not also a third consciousness (complex of lower ΦMIP) that includes both speakers. In summary, a complex is an individual, informationally integrated entity that is maximally irreducible: i) it cannot be partitioned into more integrated parts; ii) it is not part of a more integrated system; iii) it is separated through a boundary from everything external to it (it excludes it). In this view, any system of elements ‘condenses’ into distinct, non-overlapping complexes that constitute local maxima of integrated conceptual information.

Optimal spatio-temporal grain

The exclusion postulate should be applied not only over sets of elements, but over different spatial and temporal scales. For any given system, one can group and average the states of several micro-elements into states of a smaller number of macro-elements. Similarly, one can group and average states over several micro-intervals into longer macro-intervals. For each spatio-temporal grain, one calculates CER, concepts (maximally irreducible CER), and complexes (sets of elements generating maximally integrated conceptual structures). By the exclusion postulate, a particular set of elements, over a particular spatio-temporal grain, will yield the max value of ΦMIP, thereby excluding any overlapping subsets and spatio-temporal grains.

As an example, consider the brain: over which elements should one consider perturbations and the repertoire of possible states? A natural choice would be neurons, but other choices, such as neuronal groups at a coarser scale, or synapses at a finer scale, might also be considered, not to mention molecules and
atoms. Importantly, under certain circumstances, a coarser spatial scale (‘macro’-level) may produce a complex with higher values of ΦMIP than a finer scale (‘micro’-level), despite the smaller number of macro- compared to micro-elements. In principle, then, it should be possible to establish if in the brain consciousness is generated by neurons or groups of neurons. In this case the exclusion postulate would also mandate that the spatial scale at which ΦMIP is maximal, be it neurons or neuronal groups, excludes finer or coarser groupings: there cannot be any superposition of (conscious) entities at different spatio-temporal scales if they share informational/causal interactions (Fig. 5b)²³.

Fig. 5. - Complexes: maxima of integrated conceptual information over elements, space, and time. In the left panel, the blue ovals represent several separate complexes, i.e. local maxima of maxΦMIP, each containing a schematic constellation, i.e. an integrated information structure comprising different concepts (stars). Each large blue oval – a main complex corresponding to an individual consciousness generated by a subset of neurons within the brain – is contained within a larger white oval that stands e.g. for the body, a system that does not constitute a complex and is thus not conscious. Inside the body, besides the main complex, are smaller complexes having very low maxΦMIP (only one shown) and presumably many smaller ones that are not represented. The curved lines represent interactions among parts of the body that remain outside individual complexes and thus outside consciousness. The large oval that encompasses both bodies indicates that the two consciousnesses interact within a larger system that is again not a complex and is thus not conscious. The outer dashed oval stands for the immediate environment. The right panels indicate that, within a system such as the brain, maxΦMIP will reach a maximum not only within a particular subset of elements but also at a particular spatio-temporal scale. See text for further explanation.

Similar considerations apply to time. Integrated information can be measured at many temporal scales. Neurons can choose to spike or not at a scale of just a few milliseconds. However, consciousness appears to flow at a longer time scale, from tens of milliseconds to 2-3 seconds, usually reaching maximum vividness and distinctness at a few hundred milliseconds (Fig. 5c). IIT predicts that, despite the larger number of neural ‘micro’-states (spikes/no spikes, every few milliseconds), ΦMIP will be higher at the level of neural ‘macro’-states (burst of spikes/no bursts, averaged over hundreds of milliseconds). This is likely the case because a set of neurons widely distributed over the cerebral cortex can interact cooperatively only if there is enough time to set up transiently stable firing patterns (attractors, see below) by allowing spikes to percolate forward and backward. Again, the exclusion postulate would mandate that, whatever the temporal scale that maximizes ΦMIP, be it spikes or bursts, there cannot be
any superposition of (conscious) entities evolving at different temporal scales if they share informational/causal interactions²⁴ ²⁵.

Identity between maximally integrated conceptual structures (qualia) and experiences

In summary, a particular set of elements at a particular spatio-temporal scale yielding a maximum of integrated conceptual information (maxΦMIP) constitutes a complex, a ‘locus’ of consciousness. The set of its concepts – maximally irreducible cause-effect repertoires (maxϕMIP > 0) specified by various subsets of elements within the complex – constitutes a maximally integrated conceptual information structure or quale (Fig. 4) – a shape or constellation of points in qualia (concept) space²⁶.

Having defined complexes and qualia, IIT posits identities between phenomenological and informational/causal aspects of systems. The central identity is the following: an experience is a maximally integrated conceptual (information) structure or quale – that is, a maximally irreducible constellation of points in qualia space. Tentative corollaries of this identity include the following: i) the particular ‘content’ or quality of the experience is the shape of the maximally integrated conceptual structure in qualia space (the constellation of concepts); ii) a phenomenological distinction is a maximally irreducible cause-effect distinction (a concept). In other words, unless there is a mechanism that can generate a maximally irreducible cause-effect repertoire (concept) – a distinct point in the quale – there is no corresponding distinction in the experience the subject is having; iii) the intensity of each concept is its maxϕMIP value; iv) the ‘richness’ of an experience is the number of dimensions of the shape; v) the scope of the experience is the portion of qualia space spanned by its concepts; vi) the level of consciousness is the value of maximally integrated conceptual information maxΦMIP; vii) the similarity between concepts is their distance in qualia space, given the appropriate metric; viii) clusters of nearby concepts form modalities and submodalities of experience; ix) the similarity between experiences would be given by the similarity between the corresponding shapes (see also the final section and Tononi, 2008, 2010), and so on.

In principle, then, given the “wiring diagram” and present state of a given system, IIT offers a way of specifying the maximally integrated conceptual structure it generates (if any)²⁷. According to IIT, that structure completely specifies “what it is like to be” that particular mechanism in that particular state, whether that is a set of three interconnected logical gates in an OFF state; a complex of neurons within the brain of a bat spotting a fly through its sonar; or a complex of neurons within the brain of a human wondering about free will. In the latter examples, the full integrated conceptual structure is going to be extraordinarily complex and practically out of reach: we are not remotely close to having the full wiring diagram of the relevant portions of a rodent or human brain; even if we did, obtaining the precise quale would be computationally unfeasible²⁸. Nevertheless, by comparing some overall features of the shapes of qualia generated by different systems or by the same system in different states, it should be possible to evaluate broad similarities and differences between experiences. IIT also implies that, if a collection of mechanisms does not give rise to a single maximally integrated conceptual structure, but to separate qualia each reaching a maximum of integrated conceptual information, then there is nothing it is like to be that collection, whether it is an array of electronic circuits, a heap of sand, a swarm of bats, or a crowd of humans.

Matching

So far, the maximally integrated conceptual structures generated by a system of elements have been considered in isolation from the environment – as is the case for the brain when it dreams. But of course it is also essential to consider how integrated conceptual structures are affected by the external world, especially since the mechanisms generating them become what they are through a long evolutionary history, developmental changes, and plastic changes due to interactions with the environment.

In any situation, a complex of high maxΦMIP has at its disposal a large number of concepts – maximally irreducible cause-effect repertoires specified within a single conceptual structure. These concepts allow the complex to understand the situation and act in it in a context-dependent, valuable fashion. It would be helpful to have a measure that assesses how well the integrated conceptual structure generated by an
adapted complex fits the causal structure of the environment. One way to do so is to define cause-effect matching (M) between a system and its environment as the difference between two terms, called Capture and Mismatch:

Matching = Capture – Mismatch

Capture is the minimum average difference between the constellations C when a complex interacts with its environment (C World), compared to when it is exposed to an uncorrelated, structureless environment (C Noise).

Capture = min < D [C | s World, C | s Noise] >

As before, D specifies a distance metric. Capture is an indication of how well the system samples the statistical structure of its environment (deviations from independence). Thus, high capture means that the system is highly sensitive to the correlations in the environment. The system can do so in two ways: on the input side, by sampling as many correlations as possible from the environment through a large sensory bandwidth and distributing these correlations efficiently within the brain through a specialized connectivity (thereby reflecting to what extent World deviates from Noise, Tononi et al., 1996). On the output side, an organism can extract more information by actively exploring its environment or modifying it to better pick up correlations, aided by a rich behavioral repertoire (Tononi et al., 1999). Note that the minimum is taken because to match system constellations generated with World and with Noise one should pair them in such a way as to minimize the overall difference.

Mismatch is the minimum average difference between the constellations C when a complex interacts with its environment (C World), compared to when it is dreaming (C Dream), that is, when it is disconnected from the environment both on the input and the output sides.

Mismatch = min < D [C | s World, C | s Dream] >

Mismatch is an indication of how well the system models the statistics of its environment. Thus, low mismatch means that the system’s causal information structure generates a good intrinsic model of its input. Again, the system can do so in two ways: by modifying its own connections so they generate a correlation structure similar to that induced by the environment (the system’s Dream becomes a model of World). In this way ‘memories’ formed over a long time can help to disambiguate / fill in current inputs and, more generally, to predict many aspects of the environment (Tononi and Edelman, 1997). Another way is to change the environment by exploring it or modifying it to make inputs match its own values and expectations (World is made to conform to the system’s ‘Dream’). In general, the interactions with the environment would have to match specific cause repertoires with specific effect repertoires in a way that yields perception-action cycles of high adaptive value: in short, the ‘right’ cause should lead to the ‘right’ effect.

Note that the balance between the two terms in the expression for matching has two useful consequences: maximizing Capture ensures that the system does not minimize Mismatch simply by disconnecting from World. Conversely, minimizing Mismatch ensures that the system does not maximize Capture simply by becoming sensitive to the correlations in its input from World without developing a good generative model.

Importantly, since within a given system it is likely that similar states yield similar constellations, a simpler expression for matching can be obtained by considering differences between the probability distribution of system states S, rather than differences between sets of constellations C:

M = D [S World, S Noise] – D [S World, S Dream]

(note that, while the above expression is based on the distribution of system states, in principle the notion of matching can also be applied to the distribution of sequences of system states).

In the course of evolution, development, and learning, one would expect that the mechanisms of a system change in such a way as to increase matching. Capture should increase because, everything else being equal, an organism that obtains more information about the structure of the environment