Network map of Knowledge and Art

Finally, after weeks, I have the time and the energy to post something more complex. I wrote the essay below for an online course on network analysis. The overall experience was great, as I discovered a world of possibilities for the discipline: any type of connection, whether a power, conspiracy, social or knowledge network, can be analyzed with the same underlying theory.

Abstract

I propose a network model to map the knowledge and ideas of the people contained in Wikipedia. The methodology used to create the dataset is generic and can be re-applied to any category of Wikipedia. The algorithms used were successful in identifying the clusters and in providing some insights into the dynamics of knowledge. The analysis uses different metrics such as modularity, weighted degree and eccentricity. A small-world test according to the Watts and Strogatz model is performed as well. You can find a printable and zoomable version of the full map here or the high res image here.

Dataset and methodology

I obtained the network information by running a set of queries on DBpedia, a structured repository of the Wikipedia project. The database allows anyone to run complex queries written in SPARQL. The queries are reported below.

# People with a non-empty "influenced" field
SELECT *
WHERE {
  ?p a <http://dbpedia.org/ontology/Person> .
  ?p <http://dbpedia.org/ontology/influenced> ?influenced .
}

# People with a non-empty "influencedBy" field
SELECT *
WHERE {
  ?p a <http://dbpedia.org/ontology/Person> .
  ?p <http://dbpedia.org/ontology/influencedBy> ?influencedBy .
}

The queries list all the people contained in DBpedia with non-null values in the field “influenced” or “influencedBy”. After some manipulation in Excel to concatenate the results of the two queries and to fix some Unicode issues, the data looks like the table below, which lists the name of the influencer (source) and of the influenced (target). The rest of the analysis is performed in Gephi.
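The merge can also be scripted instead of done by hand in Excel. The following is only a minimal sketch, not the procedure I actually followed: it assumes each query result is available as (person, other) pairs of DBpedia resource URIs, and the helper names `clean_name` and `build_edges` are made up for illustration. Note that the direction must be flipped for the “influencedBy” rows.

```python
from urllib.parse import unquote

# Assumed prefix of DBpedia resource URIs in the query results.
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def clean_name(uri):
    """Turn a resource URI into a readable name (hypothetical helper)."""
    name = uri[len(RESOURCE_PREFIX):] if uri.startswith(RESOURCE_PREFIX) else uri
    # Decode percent-encoded UTF-8 and replace underscores with spaces.
    return unquote(name).replace("_", " ")


def build_edges(influenced_rows, influenced_by_rows):
    """Merge both query results into sorted, de-duplicated (source, target) pairs."""
    edges = set()
    # "?p influenced ?x": p is the influencer, so p is the source.
    for person, other in influenced_rows:
        edges.add((clean_name(person), clean_name(other)))
    # "?p influencedBy ?x": x is the influencer, so the pair is reversed.
    for person, other in influenced_by_rows:
        edges.add((clean_name(other), clean_name(person)))
    return sorted(edges)


influenced = [
    (RESOURCE_PREFIX + "Jean-Jacques_Rousseau", RESOURCE_PREFIX + "Immanuel_Kant"),
]
influenced_by = [
    (RESOURCE_PREFIX + "Jean-Jacques_Rousseau", RESOURCE_PREFIX + "Cicero"),
    (RESOURCE_PREFIX + "Immanuel_Kant", RESOURCE_PREFIX + "Jean-Jacques_Rousseau"),
]
edges = build_edges(influenced, influenced_by)
```

Using a set removes the duplicates that appear when the same relationship is recorded in both infoboxes.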

Influencer and influenced in Wikipedia People category

Nodes represent people (writers, artists, actors, etc.); an edge is created whenever the Wikipedia infobox of one person contains the name of another. For example, Jean-Jacques Rousseau influenced Kant and was influenced by Cicero.

Example of how relationships are constructed in my simple model.

The query is designed to be as broad as possible: it spans time, space and domain, and is limited only by the number of people in the database recorded as having exercised or experienced some influence over others. The analysis is limited by the quality of the data in Wikipedia, which is strongly biased towards Western culture (ref. ATTACHMENT 3 – Geotagged articles in Wikipedia). I discuss the other limitations later in the text.

Gallery of selected clusters

Analysis

The network is a directed graph counting 13 814 nodes and 23 487 edges. Due to limited computational power, I filtered out the nodes with out-degree lower than 2. This yields a significantly more agile graph consisting of 2986 nodes (21%) and 7643 edges (32%).
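For readers who want to reproduce the filter outside Gephi, here is a rough sketch of the idea. It assumes the graph is a plain list of (source, target) pairs and that the filter is applied once, dropping every edge that touches a removed node; the function name `filter_by_out_degree` is hypothetical and this is not Gephi's own code.

```python
from collections import Counter


def filter_by_out_degree(edges, min_out_degree=2):
    """Keep only edges whose two endpoints both have out-degree >= min_out_degree."""
    # Out-degree = number of edges leaving a node (people it influenced).
    out_degree = Counter(source for source, _ in edges)
    keep = {node for node, degree in out_degree.items() if degree >= min_out_degree}
    # An edge survives only if neither endpoint was filtered out.
    return [(s, t) for s, t in edges if s in keep and t in keep]


toy_edges = [("A", "B"), ("A", "C"), ("B", "A"), ("B", "C"), ("C", "A")]
filtered = filter_by_out_degree(toy_edges)
```

In the toy graph, C influences only one node, so C and every edge touching it disappear.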

I used the Force Atlas 2 layout algorithm, followed by a few adjustments made with the Noverlap layout. For the first part of the analysis, the size of the nodes is proportional to the out-degree and the colors follow the Modularity class.

Communities identification – Modularity

Modularity measures are important to identify communities. The choice of Modularity class as the partitioning criterion seems adequate: writers are mainly represented in light green, philosophers in red, artists in pink and modern writers in blue. The Force Atlas 2 algorithm did a satisfactory job as well. The nodes are in fact arranged in a meaningful manner, with the philosophers at the bottom, the most celebrated writers in the middle and the modern ones at the top, suggesting that ideas moved along a specific path.
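To make the notion of modularity more concrete, the sketch below computes the modularity score Q of a given partition, using the standard formula for an undirected, unweighted graph: for each community, the fraction of edges inside it minus the squared fraction of edge endpoints attached to it. This only scores a partition; it does not search for the best one, which is what Gephi's community detection (Blondel et al., see References) does. The function name and the toy graph are assumptions for illustration.

```python
from collections import Counter


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, no self-loops or duplicates assumed.
    community: dict mapping each node to its community label.
    """
    m = len(edges)
    degree = Counter()
    internal = Counter()  # number of edges inside each community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    degree_sum = Counter()  # total degree of each community
    for node, d in degree.items():
        degree_sum[community[node]] += d
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single bridge edge.
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("x", "y"), ("y", "z"), ("x", "z"),
    ("c", "x"),
]
partition = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}
q = modularity(toy_edges, partition)
```

Putting every node in a single community gives Q = 0, while the natural two-triangle split scores 5/14, which is why the split is preferred.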

The interface area between philosophers and writers, reported below, is of particular interest: it captures the transition between the two domains, and well-known philosophical authors are correctly placed there.

Interfaces between Philosophers and Writers

Who is the most influential? – Weighted degree

At first, the choice of out-degree as the sizing factor for nodes appears to be justified: the major names are very visible and their influence is well represented. However, some details of the model are not right. For example, the contributions of Confucius, one of the most influential Chinese philosophers, are clearly under-estimated; similar considerations apply to Homer, Thales and John Milton. In some cases this might be due to the bias towards Western culture or to issues in the data format (ref. ATTACHMENT 1 – Full Graph), as the field mixes names and categories. For Plato, for example, the Wikipedia infobox lists “Most of subsequent western philosophy, including…”. Another reason is that the model adopted is simple and does not contemplate more than one level of influence; therefore the founders of movements tend to be under-represented. For example, Arthur Schopenhauer influenced Friedrich Nietzsche, but Schopenhauer himself was influenced by Giordano Bruno, and the chain can continue.

Averroes => Giordano Bruno  => Arthur Schopenhauer => Friedrich Nietzsche

An interesting continuation would be to map the relationship chains more deeply, which would generate an exponentially larger number of edges and include more complex influence patterns in the model. The steeply decreasing curve of the out-degree distribution highlights this limitation.
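As an illustration of what a multi-level model could look like, the sketch below follows the influence chains with a breadth-first search and records how many steps separate a person from everyone they influenced, directly or indirectly. It is only one possible approach, not part of the analysis above, and `influence_reach` is a hypothetical name.

```python
from collections import defaultdict, deque


def influence_reach(edges, start):
    """Map every person reachable from `start` to the length of the shortest chain."""
    successors = defaultdict(list)
    for source, target in edges:
        successors[source].append(target)
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # .get avoids creating empty entries for people who influenced nobody.
        for nxt in successors.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    del depth[start]  # a person does not count as influencing themselves
    return depth


chain = [
    ("Averroes", "Giordano Bruno"),
    ("Giordano Bruno", "Arthur Schopenhauer"),
    ("Arthur Schopenhauer", "Friedrich Nietzsche"),
]
reach = influence_reach(chain, "Averroes")
```

With out-degree alone, Averroes scores 1 in this chain; counting indirect influence, he reaches three people.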

The influence of some nodes is under-estimated

More on influence – Eccentricity

Another way of looking at the influence is to use the eccentricity instead of the out-degree. To my knowledge, this measure was not discussed in the lessons.

From Wikipedia: “The eccentricity of a vertex is the greatest geodesic distance between it and any other vertex. It can be thought of as how far a node is from the node most distant from it in the graph.”

This measure of centrality can provide a more accurate interpretation of the influence exerted by a thinker. We can assume that a low-eccentricity node is connected to only a few peers and has therefore influenced a smaller group. Larger eccentricity might be associated with a longer chain of influence (greater distance from other nodes).

This transformation yields more balanced results regarding the contribution of individuals. However, the information regarding the directionality of the graph is lost, so influencer and influenced are weighted alike. A function combining different measures might provide a viable alternative; for example, we can multiply the out-degree by the eccentricity to capture a certain directionality. The development of this is, however, outside the boundaries of this essay.
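The combined measure suggested above could be sketched as follows. This is an untested idea rather than something used in the analysis: it assumes eccentricity is computed on the undirected version of the graph, within each node's connected component, and the function names are hypothetical.

```python
from collections import Counter, defaultdict, deque


def eccentricity(neighbors, start):
    """Greatest shortest-path distance from `start` to any node it can reach."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return max(distance.values())


def influence_scores(edges):
    """Score each node as out-degree times (undirected) eccentricity."""
    out_degree = Counter(source for source, _ in edges)
    neighbors = defaultdict(set)
    for source, target in edges:
        # Direction is ignored for distances, so edges are stored both ways.
        neighbors[source].add(target)
        neighbors[target].add(source)
    return {n: out_degree[n] * eccentricity(neighbors, n) for n in neighbors}


chain = [
    ("Averroes", "Giordano Bruno"),
    ("Giordano Bruno", "Arthur Schopenhauer"),
    ("Arthur Schopenhauer", "Friedrich Nietzsche"),
]
scores = influence_scores(chain)
```

Nodes that influenced nobody get a score of zero regardless of their eccentricity, which is how the out-degree factor brings the direction back in.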

Node sizes according to eccentricity.

Are we talking (and listening) to each other? Small World hypothesis

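In order to test the small-world hypothesis according to the Watts and Strogatz model, the network needs to satisfy two conditions: a high clustering coefficient, and an average shortest path length close to ln(N), where N is the number of nodes.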

The network fails the test, as the average shortest path of 5.53 differs from ln(13814) = 9.53. The implication is that, according to this model, ideas and knowledge are segregated between clusters. This outcome is to some extent confirmed by the historical constraints on the movement of ideas and people (e.g. few German philosophers studied Confucius or other Chinese thinkers); this is most likely destined to change in the near future.
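The comparison above is easy to reproduce. The sketch below recomputes ln(N) for the full graph and shows, on a toy graph, one way to obtain an average shortest path length. It treats edges as undirected and averages over reachable pairs only, which is an assumption and may differ from what Gephi reports for a directed graph.

```python
import math
from collections import defaultdict, deque


def average_shortest_path(edges):
    """Mean shortest-path length over all reachable ordered pairs (undirected)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    total, pairs = 0, 0
    for start in neighbors:
        # Breadth-first search from every node in turn.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in distance:
                    distance[nxt] = distance[node] + 1
                    queue.append(nxt)
        total += sum(distance.values())
        pairs += len(distance) - 1  # exclude the start node itself
    return total / pairs


N = 13814                 # nodes in the full graph
observed = 5.53           # average shortest path reported above
expected = math.log(N)    # natural logarithm of N

toy_average = average_shortest_path([("A", "B"), ("B", "C")])
```

For the three-node path A-B-C the average is 4/3, since two pairs are one step apart and one pair is two steps apart.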

Interpretation, conclusions and further developments

The analysis showed some interesting insights into knowledge and how ideas propagate; however, it is negatively affected by some current limitations of Wikipedia, specifically the Western bias and the other issues discussed in the essay.
Further development would need to address the poor fit of the influence measure through three improvements: first, a more realistic chain of connections, not limited to only one level; second, the definition of an improved ranking function; third, the enrichment of the dataset with less biased information (maybe complementing the current one with other data on citations).

References

Blog article on Drunks&Lampposts by Simon Raper, available at: http://drunks-and-lampposts.com/2012/06/13/graphing-the-history-of-philosophy/

Easley & Kleinberg, Networks, Crowds and Markets

Lectures and slides from the Coursera course Social Network Analysis by Lada Adamic.

Ulrik Brandes, A Faster Algorithm for Betweenness Centrality, in Journal of Mathematical Sociology 25(2):163-177 (2001)

Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre, Fast unfolding of communities in large networks, in Journal of Statistical Mechanics: Theory and Experiment 2008 (10), P10008

Wikipedia article on Eccentricity, available at http://en.wikipedia.org/wiki/Eccentricity_(graph_theory)

13 thoughts on “Network map of Knowledge and Art”

  1. Bot's Blog

    Very nice article, Paolo. I’m wondering after you retrieved the database, what software did you use to visualize the graph in large scale? I really wish MATLAB has such visualization tools so that I can visualize my functional brain network. Thank you very much, Paolo. Keep up your good work! –Bot

    1. Paolo Negrini Post author

      Thanks Bot, I used a software package called Gephi (https://gephi.org/); it’s a great, and free, network visualization tool. I am pretty sure we can import your network from MATLAB into Gephi and do some network analysis. Drop me a line if interested.
      Ciao,
      Paolo

      1. Bot's Blog

        Thanks for your quick response, Paolo! I’m still working on it, and once I figure out a good brain network to visualize I will let you know.

  2. Pingback: Trawling Wikipedia to produce network graphics « Martin House Consulting

  3. Pingback: Data in Social Networks (Part 1) | Technifying!

  4. Pingback: Network Map of Knowledge and Art V2.0: Preview | Paolo's blog

  5. Jodi

    Just desire to say your article is as surprising. The clarity in your submit is simply great and i could think you are knowledgeable on this subject. Fine along with your permission let me to grab your RSS feed to stay updated with coming near post. Thank you one million and please continue the enjoyable work.

    1. Paolo Negrini Post author

      Hi Jodi, Thanks for the kind words. I like to dig on some issues and post the results here (when I have time). You’re most welcome to subscribe to the RSS feed.

  6. Ali Gajani

    Hey Paolo, how did you clean the Excel files to include names, and how did you merge them into one? I would love to know the process. Thank you
