Network Map of Knowledge and Art V2.0: Preview

I have been planning to improve this project since last year, and I have finally found some time to dedicate to this hobby. The idea is to overcome the major limitation of my previous work and create a more realistic map of influence that accounts for nested influence: if A influences B and B influences C, then A should influence C as well. My first model did not consider these longer chains, and they have deep implications for how an author's influence is measured.
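To make the nested-influence idea concrete, here is a minimal Python sketch that computes the full transitive closure of a 0/1 influence matrix with Warshall's algorithm. This is a textbook technique, not the VBA approach used in this project. The three-person matrix and the name `transitive_closure` are illustrative assumptions.

```python
# Minimal sketch: transitive closure of a direct-influence matrix using
# Warshall's algorithm. m[a][b] == 1 means "a directly influences b".
def transitive_closure(m):
    n = len(m)
    reach = [row[:] for row in m]  # copy so the input matrix is untouched
    for k in range(n):             # allow person k as an intermediate step
        for a in range(n):
            if reach[a][k]:
                for b in range(n):
                    if reach[k][b]:
                        reach[a][b] = 1
    return reach


direct = [
    [0, 1, 0],  # A influences B
    [0, 0, 1],  # B influences C
    [0, 0, 0],  # C influences no one
]
closed = transitive_closure(direct)
print(closed[0][2])  # → 1: A now reaches C through B
```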

Until recently I was stuck on two major roadblocks. The first was the size of the data (my Wikipedia query returns over 13,000 people), so I had to experiment with different solutions and implementations before settling on a matrix representation, which simplifies both the computation and the Excel VBA code.
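A matrix representation simply assigns each person a row and column index and marks "row influences column" with a 1. The `InfluenceMatrix(r)(i)` indexing in the VBA below suggests an array of row arrays; the Python sketch here uses a list of lists in the same spirit. The function name `build_matrix` and the sample pairs are hypothetical.

```python
# Hypothetical sketch: map each person to a row/column index and store
# "row influences column" as 1 in a dense square matrix.
def build_matrix(pairs):
    people = sorted({name for pair in pairs for name in pair})
    index = {name: k for k, name in enumerate(people)}
    n = len(people)
    matrix = [[0] * n for _ in range(n)]  # n independent rows of zeros
    for influencer, influenced in pairs:
        matrix[index[influencer]][index[influenced]] = 1
    return people, matrix


people, m = build_matrix([("A", "B"), ("B", "C")])
print(people)  # → ['A', 'B', 'C']
print(m)       # → [[0, 1, 0], [0, 0, 1], [0, 0, 0]]

# A dense matrix needs n * n cells, which grows quickly with n:
print(13_000 ** 2)  # → 169000000
```

With the full 13,000-person query that is 169 million cells, versus about 3.3 million (1,816 squared) for the trial run described further down.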

Sketch of the network representation and pseudo-algorithm


The second roadblock (really two issues in one) was my limited confidence in the source data and a lack of motivation to carry on the research. I was lucky enough to find a fellow blogger, Sands (http://www.sandsfish.com, @sandsfish), with whom to team up and pursue the research. Together we are brainstorming on what the solution should look like and what kind of information would be interesting to include in the analysis; plus, he actually understands the Wikipedia ontology and Big Data sources (whereas I am just pressing query buttons to get some data…).

So, after some time spent on the VBA code, the Excel application is up and running; it also automatically generates a “.dl” file of the network that can be imported directly into Gephi. Below is the core code that does the magic. Essentially, it scans the influence matrix and, whenever person r influences person i, adds i's influences to r's row.

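' Inherited influence: whenever person r influences person i,
' r also inherits every influence of i
For r = 1 To UBound(PeopleID)
    For i = 1 To UBound(PeopleID)
        ' Follow each link of r whose current value is 1
        If InfluenceMatrix(r)(i) = 1 Then
            ' Add row i (the influences of i) onto row r
            For p = 1 To UBound(PeopleID)
                InfluenceMatrix(r)(p) = InfluenceMatrix(r)(p) + InfluenceMatrix(i)(p)
            Next p
        End If
    Next i
Next r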
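For readers without Excel, here is a Python port of the loop above (0-based indices instead of VBA's 1-based ones), plus a sketch of writing the result as a “.dl” file. The export layout is an assumption: the UCINET DL "edgelist1" format with embedded labels and the accumulated cell value as a weight column; the post does not say which DL variant the spreadsheet writes, so check the header against Gephi's importer. The names `inherit_influence` and `to_dl` are hypothetical.

```python
# Sketch: Python port of the VBA "inherited influence" loop.
# m[r][i] == 1 means person r directly influences person i.
def inherit_influence(m):
    n = len(m)
    for r in range(n):
        for i in range(n):
            # Like the VBA, only follow cells whose current value is exactly 1
            if m[r][i] == 1:
                # r inherits i's influences; row r is updated in place, so
                # later values of i already see the newly added cells
                for p in range(n):
                    m[r][p] += m[i][p]
    return m


# Assumed layout: UCINET DL "edgelist1" with embedded labels and a third
# (weight) column. Labels must not contain spaces in this simple sketch.
def to_dl(labels, m):
    lines = ["dl", f"n = {len(labels)}", "format = edgelist1",
             "labels embedded:", "data:"]
    for r, row in enumerate(m):
        for c, weight in enumerate(row):
            if weight:  # skip empty cells
                lines.append(f"{labels[r]} {labels[c]} {weight}")
    return "\n".join(lines) + "\n"


m = inherit_influence([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
print(m)  # → [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
print(to_dl(["A", "B", "C"], m))
```

Because only cells equal to exactly 1 are followed and the matrix is scanned once in place, the result can differ from a full transitive closure on longer or converging chains; the accumulated values above 1 act as weights instead.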

I made a successful trial run considering only people who have influenced at least three others. This significantly reduces the dataset, to 1,816 nodes and about 180K edges. The attached pictures give a peek at what is brewing; I will spend the next few weeks reviewing and analyzing the data.
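The trial-run filter can be sketched the same way. Two details are assumptions, since the post does not specify them: the threshold counts direct influences (nonzero cells), and only edges between people who pass the filter are kept. The names `filter_by_out_degree` and `count_edges` and the five-person matrix are hypothetical.

```python
# Hypothetical sketch of the trial-run filter: keep people with at least
# `min_out` direct influences, then keep only edges among those people.
def filter_by_out_degree(labels, m, min_out=3):
    keep = [k for k, row in enumerate(m)
            if sum(1 for v in row if v) >= min_out]
    sub_labels = [labels[k] for k in keep]
    sub = [[m[a][b] for b in keep] for a in keep]  # induced submatrix
    return sub_labels, sub


def count_edges(m):
    # Every nonzero cell is one directed edge
    return sum(1 for row in m for v in row if v)


labels = ["A", "B", "C", "D", "E"]
m = [
    [0, 1, 1, 1, 0],  # A influences B, C, D
    [0, 0, 1, 1, 1],  # B influences C, D, E
    [0, 0, 0, 1, 0],  # C influences D only
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
kept, sub = filter_by_out_degree(labels, m)
print(kept)              # → ['A', 'B']
print(count_edges(sub))  # → 1
```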

Screenshots of the trial network (2013-03-08)

Interested? Let's discuss!