Understanding the structure of knowledge and mapping how the different knowledge branches or academic disciplines are related is an old ambition. One recurrent theme is the collaboration network among researchers, publications and knowledge disciplines. Much like early world charts, maps of science provide an overall visual perspective of science as well as a reference system that stimulates further exploration.
Maps of science have mostly been derived from citation data. Citation analysis has inspired both academic studies and practical applications (notably the PageRank algorithm behind Google’s search engine). However, since scientific publications are predominantly accessed online, and scholarly web portals routinely log the interactions of users with their collections, the resulting log datasets are worth exploring as an alternative to citation datasets.
Over the course of 2007 and 2008, a group of researchers from Los Alamos National Laboratory collected nearly 1 billion user interactions recorded by the scholarly web portals of some of the most significant publishers, aggregators and institutional consortia. The resulting reference data set covered a significant part of worldwide use of scholarly web portals in 2006, and provided balanced coverage of the humanities, social sciences, and natural sciences. A journal clickstream model, i.e. a first-order Markov chain, was extracted from the sequences of user interactions in the logs. The resulting model was visualized as a journal network that outlines the relationships between various scientific domains and clarifies the connection of the social sciences and humanities to the natural sciences.
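To make the idea of a first-order Markov chain over journals more concrete, here is a minimal Python sketch. It is not the researchers' actual pipeline: the function name `transition_probabilities`, the journal names, and the toy sessions are all hypothetical, and it assumes each log session has already been reduced to an ordered list of the journals a user visited.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities.

    Each session is an ordered list of journal names visited by one user.
    The probability of moving from journal X to journal Y is the number of
    observed X -> Y steps divided by all observed steps leaving X.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        # Pair every visit with the visit that immediately follows it.
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1

    probs = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        probs[src] = {dst: n / total for dst, n in dsts.items()}
    return probs


# Hypothetical toy sessions, for illustration only.
sessions = [
    ["Journal A", "Journal B", "Journal C"],
    ["Journal A", "Journal B"],
    ["Journal A", "Journal C"],
    ["Journal B", "Journal C"],
]

model = transition_probabilities(sessions)
for src, dsts in model.items():
    for dst, p in dsts.items():
        print(f"{src} -> {dst}: {p:.2f}")
```

Read as a weighted directed graph, with journals as nodes and transition probabilities as edge weights, a model of this kind is what a journal network visualization would be drawn from.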
The researchers conclude that log datasets have attractive characteristics when compared to citation datasets: they can be aggregated to cover all scholarly disciplines, and they reflect the activities of a broader scholarly community. The immediacy of log datasets offers the possibility of studying the dynamics of scholarship in real time, not with a multi-year delay, as is currently the case with citation data. Most interestingly, in their conclusions, they point out:
There can exist stark differences between what people claim they do and what they actually do. This also applies to the distinction between citing behavior and online information seeking behavior. The first is a public and explicit expression of influence by scholarly authors, whereas the latter results from the private navigation behavior of scholarly users of web portals. This distinction leads to different insights regarding scholarly activity. (Johan Bollen et al., “Clickstream Data Yields High-Resolution Maps of Science”)
How inscrutable are Knowledge’s Ways!
Feature Image: Map of science derived from clickstream data