Spectral properties of the Google matrix of the World Wide Web and other directed networks
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
PHYSICAL REVIEW E 81, 056109 共2010兲 Spectral properties of the Google matrix of the World Wide Web and other directed networks Bertrand Georgeot, Olivier Giraud,* and Dima L. Shepelyansky Laboratoire de Physique Théorique (IRSAMC), Université de Toulouse–UPS, F-31062 Toulouse, France and LPT (IRSAMC), CNRS, F-31062 Toulouse, France 共Received 17 February 2010; published 25 May 2010兲 We study numerically the spectrum and eigenstate properties of the Google matrix of various examples of directed networks such as vocabulary networks of dictionaries and university World Wide Web networks. The spectra have gapless structure in the vicinity of the maximal eigenvalue for Google damping parameter ␣ equal to unity. The vocabulary networks have relatively homogeneous spectral density, while university networks have pronounced spectral structures which change from one university to another, reflecting specific properties of the networks. We also determine specific properties of eigenstates of the Google matrix, including the PageRank. The fidelity of the PageRank is proposed as a characterization of its stability. DOI: 10.1103/PhysRevE.81.056109 PACS number共s兲: 89.75.Hc, 89.20.Hh, 05.40.Fb, 72.15.Rn I. INTRODUCTION matrix A and repeated applications of G on a random vector converges quickly to the PageRank vector, after 50–100 it- The rapid growth of the World Wide Web 共WWW兲 brings erations for ␣ = 0.85 which is the most commonly used value the challenge of information retrieval from this enormous 关2兴. The PageRank vector is real non-negative and can be database which at present contains about 1011 webpages. An ordered by decreasing values p j, giving the relative impor- efficient algorithm for ranking webpages was proposed in 关1兴 tance of the node j. It is known that when ␣ varies, all and is now known as the PageRank algorithm 共PRA兲. This eigenvalues evolve as ␣i where i are the eigenvalues for PRA formed the basis of the Google search engine, which is ␣ = 1 and i = 2 , . . . , N, while the largest eigenvalue 1 = 1, as- used by many internet users in everyday life. The PRA al- sociated with the PageRank, remains unchanged 关2兴. lows us to determine efficiently a vector ranking the nodes of The properties of the PageRank vector for WWW have a network by order of importance. This PageRank vector is been extensively studied by the computer science community obtained as an eigenvector of the Google matrix G built on and many important properties have been established 关4–8兴. the basis of the directed links between WWW nodes 共see, For example, it was shown that p j decreases approximately e.g., 关2兴兲: in an algebraic way p j ⬃ 1 / j with the exponent  ⬇ 0.9 关4兴. It is also known that typically for the Google matrix of WWW at ␣ = 1 there are many eigenvalues very close or G = ␣S + 共1 − ␣兲E/N. 共1兲 equal to = 1 and that even at finite ␣ ⬍ 1 there are degen- eracies of eigenvalues with = ␣ 共see, e.g., 关9兴兲. In spite of the important progress obtained during these Here S is the matrix constructed from the adjacency matrix investigations of PageRank vectors, the spectrum of the Aij of the directed links of the network of size N, with Aij Google matrix G was rarely studied as a whole. Neverthe- = 1 if there is a link from node j to node i and Aij = 0 other- less, it is clear that the structure of the network is directly wise. Namely, Sij = Aij / 兺kAkj if 兺kAkj ⬎ 0, and Sij = 1 / N if all linked to this spectrum. Eigenvectors other than the PageR- elements in the column j of A are zero. The last term in Eq. ank describe the relaxation processes toward the steady state 共1兲 with uniform matrix Eij = 1 describes the probability 1 and also characterize various communities or subsets of the − ␣ of a random surfer propagating along the network to network. Even if models of directed networks of small-world jump randomly to any other node. The matrix G belongs to type 关10兴 have been analyzed, constructed, and investigated, the class of Perron-Frobenius operators which describe the the spectral properties of matrices corresponding to such net- evolution of classical dynamical systems and Markov chains works were not so much studied. Generally for a directed 共the description of general properties of such operators can network the matrix G is nonsymmetric and thus the spectrum be find, e.g., in 关3兴兲. For 0 ⬍ ␣ ⬍ 1 it has a unique maximal of eigenvalues is complex. Recently the spectral study of the eigenvalue at = 1, separated from the others by a gap of size Google matrix for the Albert-Barabasi 共AB兲 model 关11兴 and at least 1 − ␣ 共see, e.g., 关2兴兲. The eigenvector associated to randomized university WWW networks was performed in this maximal eigenvalue is the PageRank vector, which can 关12兴. For the AB model the distribution of links is typical of be viewed as the steady-state distribution for the random scale-free networks 关10兴. The distribution of links for the surfer. Usual WWW networks correspond to very sparse university network is approximately the same and is not af- fected by the randomization procedure. Indeed, the random- ization procedure corresponds to the one proposed in 关13兴 *Present address: Laboratoire de Physique Théorique et Modèles and is performed by taking pairs of links and inverting the Statistiques, UMR 8626 du CNRS, Université Paris-Sud, Orsay, initial vertices, keeping unchanged the number of ingoing France. and outgoing links for each vertex. It was established that the 1539-3755/2010/81共5兲/056109共8兲 056109-1 ©2010 The American Physical Society
GEORGEOT, GIRAUD, AND SHEPELYANSKY PHYSICAL REVIEW E 81, 056109 共2010兲 spectra of the AB model and the randomized university net- A much larger sample of university networks from the works were quite similar. Both have a large gap between the same database was actually used, including universities from largest eigenvalue 1 = 1 and the next one with 兩2兩 ⬇ 0.5 at the US, Australia, and New Zealand, in order to ensure that ␣ = 1. This is in contrast with the known property of WWW the results presented were representative. where 2 is usually very close or equal to unity 关2,9兴. Thus it As opposed to the full spectrum of the Google matrix, the appears that the AB model and the randomized scale-free PageRank can be computed and studied for much larger ma- networks have a very different spectral structure compared to trix sizes. In the studies of Sec. IV, we therefore included real WWW networks. Therefore it is important to study the additional data from the university networks of Oxford in spectral properties of examples of real networks 共without 2006 共www.oxford.ac.uk兲 with 173 733 sites and 2 917 014 randomization兲. links taken from 关14兴, and the network of Notre Dame Uni- In this paper, we thus study the spectra of Google matri- versity from the US taken from the database 关15兴 with ces for the WWW networks of several universities and show 325 729 sites and 1 497 135 links 共without removing any that indeed they display very different properties compared node兲. to random scale-free networks considered in 关12兴. We also In addition, we also investigated several vocabulary net- explore the spectra of a completely different type of real works constructed from dictionaries; the network data were network, built from the vocabulary links in various dictionar- taken from 关16兴. ies. In addition, we analyze the properties of eigenvectors of 共i兲 Roget dictionary 共1022 vertices and 5075 links兲 关17兴. the Google matrix for these networks. A special attention is The 1022 vertices correspond to the categories in the 1879 paid to the PageRank vector and in particular we characterize edition of Roget’s Thesaurus of English Words and Phrases. its sensitivity to ␣ through a new quantity, the PageRank There is a link from category X to category Y if Roget gave fidelity. a reference to Y among the words and phrases of X or if the The paper is organized as follows. In Sec. II we give the two categories are related by their positions in the book. description of the university and vocabulary networks whose 共ii兲 Online Dictionary of Library and Information Science Google matrices we consider. The properties of spectra and 共ODLIS兲, version December 2000 关18兴 共2909 vertices and eigenstates are investigated in Sec. III. The fidelity of Pag- 18419 links兲. A link 共X , Y兲 from term X to term Y is created eRank and its other properties are analyzed in Sec. IV. Sec- if the term Y is used in the definition of term X. tion V explores various models of random networks for 共iii兲 Free On-Line Dictionary of Computing 共FOLDOC兲 which the spectrum can be closer to the one of real networks. 关19兴 共13 356 vertices and 120 238 links兲. A link 共X , Y兲 from The conclusion is given in Sec. VI. term X to term Y is created if the term Y is used in the definition of term X. Distribution of ingoing and outgoing links for the univer- II. DESCRIPTION OF NETWORKS OF UNIVERSITY sity WWW networks is similar to those of much larger WWW AND DICTIONARIES WWW networks discussed in 关4,8,10兴. An example is shown in the Appendix for the network of Liverpool John Moores In order to study the spectra and eigenvectors of Google University, together with data from AB models discussed in matrices of real networks, we numerically explored several 关12兴 共see Fig. 12兲. systems. Our first example consists in the WWW networks of UK universities, taken from the database 关14兴. The vertices are III. PROPERTIES OF SPECTRUM AND EIGENSTATES the HTML pages of the university websites in 2002. The links correspond to hyperlinks in the pages directing to an- To study the spectrum of the networks described in the other webpage. To reduce the size of the matrices in order to previous section, we construct the Google matrix G associ- perform exact diagonalization, only webpages with at least ated to them at ␣ = 1. After that the spectrum i and right one outlink were considered. There are still dangling nodes, eigenstates i of G 共satisfying the relation Gi = ii兲 are despite of this selection, since some sites have outlinks only computed by direct diagonalization using standard linear al- to sites with no outlink. We checked on several examples that gebra 共LAPACK兲 routines. Since G is generally a nonsym- the general properties of the spectra were not affected by this metric matrix for our networks, the eigenvalues i are dis- reduction in size. We present data on the spectra from five tributed in the complex plane, inside the unit disk 兩i兩 ⱕ 1. universities: The spectrum for our eight networks is shown in Fig. 1. 共i兲 University of Wales at Cardiff 共www.uwic.ac.uk兲, with An important property of these spectra is the presence of 2778 sites and 29 281 links. eigenvalues very close to = 1 and moreover we find that 共ii兲 Birmingham City University 共www.uce.ac.uk兲; 10 631 = 1 eigenvalue has significant degeneracy. It is known that sites and 82 574 links. such an exact degeneracy is typical for WWW networks 共see, 共iii兲 Keele University 共Staffordshire兲 共www.keele.ac.uk兲; e.g., 关8,9兴兲, and the absence of spectral gap has been seen in 11 437 sites and 67 761 links. other real networks 关20兴. In addition to this exact degeneracy, 共iv兲 Nottingham Trent University 共www.ntu.ac.uk兲; there are quasidegenerate eigenvalues very close to = 1. It 12 660 sites and 85 826 links. is important to note that these features are absent in the spec- 共v兲 Liverpool John Moores University 共www.livjm.ac.uk兲; tra of random networks studied in 关12兴 based on the AB 13 578 sites and 11 1648 links. model and on the randomization of WWW university 056109-2
SPECTRAL PROPERTIES OF THE GOOGLE MATRIX OF… PHYSICAL REVIEW E 81, 056109 共2010兲 1 A B 1 0.5 0.75 W(γ) 0.5 0 0.25 -0.5 0 -1 300 1 200 ξ C D 0.5 100 0 00 2.5 5 7.5 10 (a) γ -0.5 -1 1 1 E F 0.75 W(γ) 0.5 0.5 0 0.25 0 -0.5 300 -1 200 ξ 1 G H 100 0.5 00 2.5 5 7.5 10 0 (b) γ -0.5 FIG. 2. 共Color online兲 Top: Roget dictionary, ␣ = 1. Top panel: -1 normalized density of states W 共red/gray兲 obtained as a derivative of -1 0 1 -1 0 1 a smoothed version of the integrated density 共smoothed over a small interval ⌬␥ varying with matrix size兲, integrated density is shown in FIG. 1. Distribution of eigenvalues i of Google matrices in the black. Bottom panel: PAR of eigenvectors as a function of ␥; de- complex plane at ␣ = 1 for dictionary networks: Roget generacy of = 1 is 18 共note that the value W共0兲 corresponds to 共A , N = 1022兲, ODLIS 共B , N = 2909兲, and FOLDOC 共C , N = 13 356兲; eigenvalues with 兩兩 = 1兲. Bottom: ODLIS dictionary, same as top; university WWW networks: University of Wales 共Cardiff兲 degeneracy of = 1 is 4. 共D , N = 2778兲, Birmingham City University 共E , N = 10 631兲, Keele University 共Staffordshire兲 共F , N = 11 437兲, Nottingham Trent Uni- versity 共G , N = 12 660兲, Liverpool John Moores University 共H , N ization of eigenvectors i共j兲, we use the participation ratio = 13 578兲共data for universities are for 2002兲. 共PAR兲 defined by = 共兺 j兩i共j兲兩2兲2 / 兺 j兩i共j兲兩4. This quantity gives the effective number of vertices of the network con- networks, where the spectrum is characterized by a large tributing to a given eigenstate i; it is often used in solid- gap between the first eigenvalue 1 = 1 and the second one state systems with disorder to characterize localization prop- with 兩2兩 ⬇ 0.5. For example, the spectrum shown in erties 共see, e.g., 关21兴兲. The dependence of the density of Fig. 1 panel H corresponds to the same university whose randomized spectrum was displayed in Fig. 1 共bottom panel兲 states W共␥兲 in ␥, which gives the number of eigenstates in in 关12兴. Clearly the structure of the spectrum becomes very the interval 关␥ , ␥ + d␥兴, is shown in Figs. 2–5 共top panels兲. different after randomization of links. Another property of The normalization is chosen such that 兰⬁0 W共␥兲d␥ = 1, the spectra displayed in Fig. 1 that we want to stress is the corresponding to the total number of eigenvalues N presence of clearly pronounced structures which are different 共equal to the matrix size兲. We also show the integrated ver- from one network to another. The structure is less pro- sion of this quantity in the same panels. In the same figures nounced in the case of the three spectra obtained from dic- we show the PAR of the eigenstates as a function of ␥ tionary networks. In this case, the spectrum is flattened, be- 共bottom panels兲. ing closer to the real axis. In contrast, for the WWW It is clear that for the dictionary networks the density of university networks, the spectrum is spread out over the unit states W depends on ␥ in a relatively smooth way, with a disk. However, there is still a significant fraction of broad maximum at ␥ ⬇ 1 – 2. The distribution of PAR has eigenvalues close to the real axis. We understand this feature also a maximum at approximately the same values. The case by the existence of a significant number of symmetric ingo- of the dictionary FOLDOC is a bit special, showing a bimo- ing and outgoing links 共48% in the case of the Liverpool dal distribution which is also clearly seen in the dependence John Moores University network兲, which is larger compared of on ␥. This comes from the fact that the distribution of to the case of randomized university networks considered in eigenvalues in Fig. 1 共panel C兲 is highly asymmetric with 关12兴. respect to the imaginary axis. The latter case has also no To characterize the spectrum, we introduce the relaxation degeneracy at = 1. In these three networks the density of rate ␥ defined by the relation 兩兩 = exp共−␥ / 2兲. For character- states decreases for ␥ approaching 0. We note that the 056109-3
GEORGEOT, GIRAUD, AND SHEPELYANSKY PHYSICAL REVIEW E 81, 056109 共2010兲 1 1 0.75 0.75 W(γ) W(γ) 0.5 0.5 0.25 0.25 0 0 4000 80 ξ ξ 2000 40 00 2.5 5 7.5 10 00 γ 5 10 (a) (a) γ 1 1 0.75 0.75 W(γ) W(γ) 0.5 0.5 0.25 0.25 0 0 50 300 ξ 200 ξ 25 100 00 2.5 5 7.5 10 00 γ 2.5 5 7.5 10 (b) (b) γ FIG. 3. 共Color online兲 Top: FOLDOC dictionary, same as Fig. 2; FIG. 5. 共Color online兲 Top: Nottingham Trent University, same degeneracy of = 1 is 1; Bottom: University of Wales 共Cardiff兲, as Fig. 2; degeneracy of = 1 is 229. Bottom: Liverpool John same as top; degeneracy of = 1 is 69. Moores University, same as top; degeneracy of = 1 is 109; other degeneracy peaks correspond to = 1 / 2 共16兲, = 1 / 3 共8兲; = 1 / 4 integrated version of the density of states reaches a plateau 共947兲, = 1 / 5 共97兲, being located at ␥ = −2 ln ; other degeneracies for ␥ ⱖ 6 – 7. This saturation value is less than 1, meaning are also present, e.g., = 1 / 冑2 共41兲. that a certain nonzero fraction of eigenvalues are extremely close to = 0. For the WWW university networks, the density of states is 1 much more inhomogeneous in ␥. Even if a broad maximum 0.75 is visible, there are sharp peaks at certain values of ␥. The W(γ) sharpest peaks correspond to exact degeneracies at certain 0.5 complex values of . The degeneracies are especially visible 0.25 at the real values = 1 / 2, = 1 / 3, and other 1 / n with integer 0 values of n. We attribute this phenomenon to the fact that the 300 small number of links gives only a small number of different 200 ξ values for the matrix elements of the matrix G. For the uni- 100 versity networks, the degeneracy at = 1 is much larger than 00 in the case of dictionaries. The integrated densities of states 5 10 (a) γ show visible vertical jumps which correspond to the degen- eracies; their growth saturates at ␥ ⬇ 7 showing that about 1 30– 50 % of the eigenvalues are located in the vicinity 0.75 of = 0. W(γ) 0.5 The PAR distribution for the university networks fluctu- ates strongly even if a broad maximum is visible. Typical 0.25 values have ⬇ 100, which is small compared to the matrix 0 size N ⬃ 104. This indicates that the majority of eigenstates 150 are localized on certain zones of the network. This does not 100 exclude that certain eigenstates with a larger will be delo- ξ 50 calized on a large fraction of the network in the limit of very 00 large N. 2.5 5 7.5 10 (b) γ The exact G matrix diagonalization requires significant computer memory and is practically restricted to matrix size FIG. 4. 共Color online兲 Top: Birmingham City University, same N of about N ⬍ 30 000. However, real networks such as as Fig. 2; degeneracy of = 1 is 71; Bottom: Keele University WWW networks can be much larger. It is therefore important 共Staffordshire兲, same as top; degeneracy of = 1 is 205. to find numerical approaches in order to obtain the spectrum 056109-4
SPECTRAL PROPERTIES OF THE GOOGLE MATRIX OF… PHYSICAL REVIEW E 81, 056109 共2010兲 0.8 0.6 -1 0.4 0.2 -2 log10(pi) 0 -3 -0.2 -0.4 -4 -0.6 -0.8 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 -5 FIG. 6. 共Color online兲 Cloud of eigenvalues for Liverpool John -6 Moores University, ␣ = 1. Circles: full matrix N = 13 578. Stars: 0 1 2 3 4 5 truncated matrix of size 8192 共left兲 and 4096 共right兲. log10(i) (a) of large networks using approximate methods. A natural pos- -1 sibility is to order the sites through the PageRank method -2 and to consider the spectrum of the 共properly renormalized兲 -3 log10(pi) truncated matrix restricted to the sites with PageRank larger -4 than a certain value. In this way, the truncation takes into account the most important sites of the network. The effect -5 of such a truncation is shown in Fig. 6 for the largest net- -6 work of our sample. The numerical data show that the global -7 features of the spectrum are preserved by moderate trunca- 0 1 2 3 4 5 tion, but individual eigenvalues deviate from their exact val- log10(i) (b) ues when more than 50% of sites are truncated. Probably future developments of this approach are needed in order to FIG. 8. 共Color online兲 Some PageRank vectors p j for Notre- be able to truncate a larger fraction of sites. Dame University 共top panel兲 and Oxford 共bottom panel兲. From top to bottom at log10共i兲 = 5: ␣ = 0.49 共black兲, 0.59 共red兲, 0.69 共green兲, 0.79 共blue兲, 0.89 共violet兲, and 0.99 共orange兲. Dashed line indicates IV. FIDELITY OF PAGERANK AND ITS OTHER the slope −1. PROPERTIES In the previous section we studied the properties of the know how sensitive the PageRank is with respect to the full spectrum and all eigenstates of the G matrix for several Google parameter 共damping parameter兲 ␣ in Eq. 共1兲. The real networks. The PageRank is especially important since it localization property of the PageRank can be quantified allows to obtain an efficient classification of the sites of the through the PAR defined above. The dependence of on ␣ network 关1,2兴. Since the networks usually have small number is shown in Fig. 7 for four university WWW networks, in- of links, it is possible to obtain the PageRank by vector it- cluding two from Fig. 1 共panels D and H兲 and two of much eration for enormously large size of networks as described in larger sizes 共Notre Dame and Oxford兲. For ␣ → 0 the PAR 关1,2兴. goes to the matrix size since the G matrix is dominated by Due to this significance of the PageRank, it is important the second part of Eq. 共1兲. However, in the interval 0.4⬍ ␣ to characterize its properties. In addition, it is important to ⬍ 0.9 the dependence on ␣ is rather weak, indicating stability of the PageRank. For 0.9⬍ ␣ ⬍ 1 the PAR value has a local maximum where its value can be increased by a factor of 40 2–3. We attribute this effect to the existence of an exact 40 30 20 degeneracy of the eigenvalue = 1 at ␣ = 1, discussed in the ξ 30 10 0 previous section. In spite of this interesting behavior of in 0.95 0.975 α 1 the vicinity of ␣ = 1, the value of , which gives the effective 20 number of populated sites, remains much smaller than the ξ network size. In other models considered in 关12,22兴, a delo- 10 calization of the PageRank was observed for some ␣ values so that was growing with system size N. For the WWW university networks considered here, delocalization is clearly 0 absent 共network sizes in Fig. 7 vary over two order of mag- 0.2 0.4 0.6 0.8 1 α nitudes兲. This is in agreement with the value of the exponent FIG. 7. 共Color online兲 PAR of PageRank as a function of ␣ for  ⬇ 0.9 for the PageRank decay, which was found for large University of Wales 共Cardiff兲 共black/dashed兲, Notre-Dame 共blue, samples of WWW data in 关4,8兴. Indeed, for that value of , dotted兲, Liverpool John Moores University 共red/long dashed兲, and PAR should be independent of system size. Oxford 共green/solid兲 Universities 共curves from top to bottom at ␣ Our data for PageRank distribution also show its stability = 0.6兲. Network sizes vary from N = 2778 to N = 325 729. Inset is a as a whole for variation in ␣ in the interval 0.4⬍ ␣ ⬍ 0.9, as zoom for data from Oxford University close to ␣ = 1. it is shown in Fig. 8. 056109-5
GEORGEOT, GIRAUD, AND SHEPELYANSKY PHYSICAL REVIEW E 81, 056109 共2010兲 1 0.4 2 0.9 || 0.8 0.2 0.7 0 0.6 0.5 -0.2 0.4 -0.4 0.30 0.2 0.4 0.8 1 0.6 -0.2 0 0.2 0.4 0.6 0.8 1 (a) α 1.25 FIG. 10. Spectrum of eigenvalues in the complex plane for the 1 Avrachenkov-Lebedev model 关6兴, with N = 211 共network size兲, 1 ␣ = 0.85, m = 5 outgoing links per node. Multiplicity of links is taken 0.75 into account in the construction of G. 0.75 α' 0.5 0.5 0.25 of  close to the one of real networks. For example, the 0.25 model studied by Avrachenkov and Lebedev 关6兴 allows us to 0.25 0.5 0.75 1 obtain analytical expressions for  with a value close to the (b) α 0 one obtained for WWW. It is interesting to see what are the spectral properties of this model. In Fig. 10 we show the FIG. 9. 共Color online兲 PageRank fidelity f共␣ , ␣⬘兲 for Notre- Dame University 共N = 325 729兲; top panel: f共␣ , ␣⬘ = 0.85兲 spectrum for this model for ␣ = 0.85. Our data show that this = 兩具共␣兲 兩 共0.85兲典兩2 关see Eq. 共2兲兴; bottom panel: color density plot of model has an enormous gap, thus being very different from f共␣ , ␣⬘兲. spectra of real networks shown in Fig. 1. The above results, together with those of 关12兴, show that many commonly used network models are characterized by a The sensitivity of the PageRank with respect to ␣ can be large gap between = 1 and the second eigenvalue, in con- more precisely characterized through the PageRank fidelity trast with real networks. In order to build a network model defined as where this gap is absent, we introduce here what we call the color model. It is an extension of the AB model that allows f共␣, ␣⬘兲 = 冏兺 j 冏 1共j, ␣兲1共j, ␣⬘兲 , 2 共2兲 to obtain results for the spectral distribution that are closer to real networks. We divide the nodes into n sets 共“colors”兲, allowing n to grow with network size. Each node is labeled by an integer between 0 and n − 1. At each step, links and where 1共j , ␣兲 is the eigenstate at = 1 of the Google matrix nodes are added as in the AB model but also with probability G with parameter ␣ in Eq. 共1兲; here the sum over j runs over the new node is introduced with a new color. The only the network sites 共without PageRank reordering兲. We remind links authorized between nodes are links within each set. that the eigenvector 1共j , ␣兲 is normalized by 兺 j1共j , ␣兲2 Such a structure implies that the second eigenvalue of matrix = 1. Fidelity is often used in the context of quantum chaos and quantum computing to characterize the sensitivity of wave functions with respect to a perturbation 关23,24兴. The variation in this quantity with ␣ and ␣⬘ is shown in Fig. 9. 0.4 The fidelity reaches its maximum value f = 1 for ␣ = ␣⬘. Ac- cording to Fig. 9 共bottom panel兲, the stability plateau where 0.2 fidelity remains close to 1, indicating stability of PageRank, 0 is broadest for ␣ = 0.5. This is in agreement with previous results presented in 关25兴, where the same optimal value of ␣ -0.2 was found based on different arguments. -0.4 V. SPECTRUM OF MODEL SYSTEMS -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 FIG. 11. Spectrum of eigenvalues in the complex plane for the The results obtained in 关12兴 compared to those presented color model, N = 213, p = 0.2, q = 0.1, and ␣ = 0.85. Nodes are divided in the previous section show that while the spectrum of the into n color sets labeled from i = 0 to n − 1; nodes and links are network has a large gap between = 1 and the other eigen- created according to AB model; only authorized links are links values, still certain properties of the PageRank can be similar within a color set i. This rule is broken with probability ⑀ = 10−3. We in both cases 共e.g., the exponent 兲. In fact the studies per- start with three color sets; with probability a new color is intro- formed in the computer science community were often based duced 共we take = 10−2兲. In the example displayed, when the num- on simplified models, which can nevertheless give the value ber of nodes reaches N, n = 83 colors. 056109-6
SPECTRAL PROPERTIES OF THE GOOGLE MATRIX OF… PHYSICAL REVIEW E 81, 056109 共2010兲 G is real and exactly equal to ␣ 关26兴. The colors correspond presented here is a first step in this direction. We note that to communities in the network, in the sense that each color Ulam networks built from dynamical maps can capture cer- represents a set of nodes with links only within the set. tain properties of real networks in a relatively good manner In order to have a more realistic model, we allow for the 关22,28兴. In these latter networks, it is possible to have a rule for links to be broken with some probability . That is, delocalization of the PageRank when ␣ or map parameters at each time step an link between two nodes is chosen at vary; we did not observe such feature here. random according to the rules of the AB model. Then if it Indeed, our data show that the PageRank remains local- agrees with the color rule above it is used; if it does not then ized for all values of ␣ ⬎ 0.3. We also showed that the use of with probability 1 − it is just omitted, and with probability the fidelity as a quantity to characterize the stability of Pag- it is nevertheless added. eRank enables to identify a stability plateau located around The spectrum of this color model is shown in Fig. 11 for ␣ = 0.5. ␣ = 0.85. The second eigenvalue is now exactly at = 0.85, We think that future investigation of the spectral proper- demonstrating the absence of a gap. There is also a set of ties of the Google matrix will open new access to identifica- eigenvalues which is located on the real line, but the majority tion of important communities and their properties which can of states remains inside a circle 兩兩 ⬍ 0.3 as in the AB model. be hidden in the tail of the PageRank and hardly accessible Thus the color model allows to eliminate the gap but still to classification by the PageRank algorithm. Furthermore, the distribution of eigenvalues in the complex plane re- the degeneracies at various values of and the characteristic mains different from the spectra of real networks shown in patterns directly visible in the spectra of the Google matrix should allow to identify other hidden properties of real net- Fig. 1: the structures prominent in real networks are not vis- works. ible, and eigenvalues in the gap are concentrated only on the real axis or close to it. ACKNOWLEDGMENTS VI. CONCLUSION We thank Leonardo Ermann and Klaus Frahm for discus- In this work we performed numerical analysis of the spec- sions and CALcul en MIdi-Pyrénées 共CALMIP兲 for the use tra and eigenstates of the Google matrix G for several real of their supercomputers. networks. The spectra of the analyzed networks have no gap between first and second eigenvalues, in contrast with com- monly used scale-free network models 共e.g., AB model兲. The spectra of university WWW networks are characterized by APPENDIX complex structures which are different from one university to another. At the same time, PageRank of these university net- Here we show in Fig. 12 the distributions of links for the works look rather similar. In contrast, the Google matrices of AB model discussed in 关12兴 and for the university WWW vocabulary networks of dictionaries have spectra with much network 共panel H of Fig. 1兲. less structure. These studies show that usual models of random scale- free networks miss many important features of real networks. 0 In particular, they are characterized by a large spectral gap, -1 log Pc (k) which is generally absent in real networks. The absence of -2 in this gap has been linked to the presence of community struc- -3 -4 tures in the network 关8,9,20,26兴. In the language of dynami- -5 cal systems, the physical origin of this gap can be related to 0 the known property of small-world and scale-free networks log Pc (k) -1 that only logarithmic time 共in system size兲 is needed to go -2 out -3 from any node to any other node 共see, e.g., 关10兴兲. Due to that, -4 the relaxation process in such networks is fast and the gap, -5 -6 being inversely proportional to this time, is accordingly very 0 1 2 3 log k large. In contrast, the presence of weakly coupled communi- ties in real networks makes the relaxation time very large, at FIG. 12. 共Color online兲 Cumulative distribution of ingoing links least for certain configurations. An interesting future investi- 共k兲 共top panel兲 and of outgoing links Pout Pin c c 共k兲 共bottom panel兲 for gation would be to use algorithms for community detection the AB model with vector size N = 214, for q = 0.1 共black/solid兲 and 共such as the ones reviewed in 关27兴兲, in order to assess the q = 0.7 共red/dashed兲, data are averaged over 80 realizations of AB precise link between the distributions of eigenvalues of the model, and for the network of Liverpool John Moores University Google matrix and the community structures for the net- with N = 13578, 共panel H in Fig. 1兲 共blue/dotted兲. Average number works considered. It is also desirable to construct new ran- of ingoing or outgoing links is 具k典 = 6.43 for q = 0.1, 具k典 = 14.98 for dom scale-free models which could capture in a better way q = 0.7, 具k典 = 8.2227 for LJMU. Dashed straight line indicates the the actual properties of real networks. The color model slope −1. Logarithms are decimal. 056109-7
GEORGEOT, GIRAUD, AND SHEPELYANSKY PHYSICAL REVIEW E 81, 056109 共2010兲 关1兴 S. Brin and L. Page, Comput. Netw. ISDN Syst. 30, 107 关11兴 R. Albert and A.-L. Barabási, Phys. Rev. Lett. 85, 5234 共1998兲. 共2000兲. 关2兴 A. M. Langville and C. D. Meyer, Google’s PageRank and 关12兴 O. Giraud, B. Georgeot, and D. L. Shepelyansky, Phys. Rev. E Beyond: The Science of Search Engine Rankings 共Princeton 80, 026107 共2009兲. University Press, Princeton, 2006兲; D. Austin, AMS Feature 关13兴 S. Maslov and K. Sneppen, Science 296, 910 共2002兲. Columns 共2008兲 available at http://www.ams.org/ 关14兴 Academic Web Link Database Project http:// featurecolumn/archive/pagerank.html cybermetrics.wlv.ac.uk/database/ 关3兴 M. Brin and G. Stuck, Introduction to Dynamical Systems 关15兴 A.-L. Barabási webpage at the University of Notre Dame 共Cambridge University Press, Cambridge, England, 2002兲. http://www.nd.edu/networks/resources.htm 关4兴 D. Donato, L. Laura, S. Leonardi, and S. Millozzi, Eur. Phys. 关16兴 V. Batagelj and A. Mrvar, 共2006兲: Pajek datasets. URL: http:// J. B 38, 239 共2004兲; G. Pandurangan, P. Raghavan, and E. vlado.fmf.uni-lj.si/pub/networks/data/ Upfal, Internet Math. 3, 1 共2006兲. 关5兴 P. Boldi, M. Santini, and S. Vigna, in Proceedings of the 14th 关17兴 Peter Mark Roget: Roget’s Thesaurus of English Words and International Conference on World Wide Web, edited by A. Phrases, http://www.gutenberg.org/etext/22 Ellis and T. Hagino 共ACM Press, New York, 2005兲, p. 557; S. 关18兴 J. M. Reitz 共2002兲: ODLIS: Online Dictionary of Library and Vigna, in Proceedings of the 14th International Conference on Information Science, http://vax.wcsu.edu/library/odlis.html World Wide Web, edited by A. Ellis and T. Hagino 共ACM 关19兴 FOLDOC 共2002兲: Free on-line dictionary of computing, edited Press, New York, 2005兲 p. 976. by D. Howe, http://foldoc.org/ 关6兴 K. Avrachenkov and D. Lebedev, Internet Math. 3, 207 关20兴 E. Estrada, Europhys. Lett. 73, 649 共2006兲; Phys. Rev. E 75, 共2007兲. 016103 共2007兲. 关7兴 K. Avrachenkov, N. Litvak, and K. S. Pham, in Algorithms and 关21兴 F. Evers and A. D. Mirlin, Rev. Mod. Phys. 80, 1355 共2008兲. Models for the Web-Graph: Fifth International Workshop, 关22兴 D. L. Shepelyansky and O. V. Zhirov, Phys. Rev. E 81, WAW 2007 San Diego, CA, Proceedings, Lecture Notes in 036213 共2010兲. Computer Science Vol. 4863, edited by A. Bonato and F. R. K. 关23兴 T. Gorin, T. Prosen, T. H. Seligman, and M. Znidaric, Phys. Chung 共Springer-Verlag, Berlin, 2007兲, p. 16. Rep. 435, 33 共2006兲. 关8兴 Algorithms and Models for the Web-Graph: Sixth International 关24兴 K. M. Frahm, R. Fleckinger, and D. L. Shepelyansky, Eur. Workshop, WAW 2009 Barcelona, Proceedings, Springer- Phys. J. D 29, 139 共2004兲. Verlag, Berlin, Lecture Notes in Computer Science Vol. 5427, 关25兴 K. Avrachenkov, N. Litvak, and K. S. Pham, Internet Math. 5, edited by K. Avrachenkov, D. Donato, and N. Litvak 47 共2008兲. 共Springer, Berlin, 2009兲. 关26兴 T. Haveliwala and S. Kamvar, Stanford University Technical 关9兴 S. Serra-Capizzano, SIAM J. Matrix Anal. Appl. 27, 305 Report No. 582 共2003兲. 共2005兲. 关27兴 S. Fortunato, Phys. Rep. 486, 75 共2010兲. 关10兴 S. N. Dorogovtsev and J. F. F. Mendes, Evolution of Networks 关28兴 L. Ermann and D. L. Shepelyansky, Phys. Rev. E 81, 036221 共Oxford University Press, Oxford, 2003兲. 共2010兲. 056109-8
You can also read