Himalaya of Data

I have previously written about WikiLeaks, including a review of Daniel Domscheit-Berg’s account of his time inside this ‘organization’. Recently I also submitted a first draft of an article for an upcoming book on WikiLeaks, edited by Christian Christensen (and hopefully published next year by Peter Lang), which tries to situate WikiLeaks within a broader archival discourse on data distribution. What type of ‘archive’ (or database) is WikiLeaks, and how does the site challenge traditional archives and libraries through new forms of massive information and data retrieval, as well as user-oriented exploration? If (more or less) public data can be found online by anyone at all times, what are the implications for, and the contemporary role of, archives and libraries (understood in a broad sense)? Naturally, the controversial information leaked through WikiLeaks is truly ‘hot data’, which is hardly the case with the holdings of most heritage institutions. Still, the way the site’s massive amounts of freely distributed documents have entered the cultural circulation of the digital domain in general, and more media-specific and web 2.0 areas in particular, hints at various emerging archival models, where free access to hitherto locked material can generate innumerable forms of new knowledge (of the past, and sometimes even the future), which, after all, is the purpose of most memory institutions. Hence the importance of WikiLeaks as a new archival modality of sorts. The article takes off by using the Wayback Machine:

The Wayback Machine is truly an incredible piece of crawler software. Through its three-dimensional index, basically anything that has appeared online since the mid-1990s can be made visible again. This particular search engine, in fact, serves as a correction to the general newness and ‘flatness’ of digital culture, even if some would indeed argue that the web means the end of forgetting. In all likelihood, we are only beginning to grasp what it means that so much of what we say, think and write in print and pixel is in the end transformed into permanent (and publicly distributed) digital files, whether leaked or not. Then again, all code is deep, and the Wayback Machine is arguably one of the more sophisticated digital methods to extract and visualize the specific historicity of the web medium. Essentially, the Wayback Machine (run by the Internet Archive) stores crawled copies of web pages, snapshots of their graphical interfaces at particular moments in time. This means that the web cannot be surfed freely through its interface; specific URLs are always needed to retrieve a page’s archived versions. Still, some 150 billion web pages have been crawled since 1996. In fact, archived versions of web pages across time and space appear through the Wayback Machine’s digital time capsule almost as if by magic.
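
Since the archive can only be approached through specific URLs, a researcher who wants to trace a site’s stored captures has to query the Internet Archive directly. As a minimal sketch, using the Wayback Machine’s public CDX API (the query parameters below are merely illustrative, not part of the article), the stored 2007 captures of wikileaks.org could be listed like this:

```python
import json
import urllib.parse
import urllib.request

# Query the Wayback Machine's CDX API for captures of wikileaks.org.
# 'from'/'to' restrict the search to 2007; 'limit' keeps the list short.
params = urllib.parse.urlencode({
    "url": "wikileaks.org",
    "output": "json",
    "from": "2007",
    "to": "2007",
    "limit": "5",
})
cdx_url = f"https://web.archive.org/cdx/search/cdx?{params}"

with urllib.request.urlopen(cdx_url) as response:
    rows = json.load(response)

# The first row is a header (urlkey, timestamp, original, mimetype, ...);
# every following row describes one stored capture.
if rows:
    header, captures = rows[0], rows[1:]
    for capture in captures:
        record = dict(zip(header, capture))
        # Each capture is replayable at web.archive.org/web/<timestamp>/<url>
        print(record["timestamp"], record["original"])
```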

On January 17, 2007, the Wayback Machine’s software crawler captured wikileaks.org for the first time. In harvesting and documenting the web, the crawler thus meta-stored a developing site for “untraceable mass document leaking”, all in the form of an “anonymous global avenue for disseminating documents”, to quote the archived image of the site. The initial WikiLeaks captures from the beginning of 2007, along with additional sweeps stored during the following months, vividly illustrate how WikiLeaks gradually developed into a site of almost unprecedented global media attention. The WikiLeaks logo, with its blue-green hourglass, was, for example, graphically present right from the start, with headings to the right such as ‘news’, ‘FAQ’, ‘support’, ‘press’ and ‘links’, the latter directing users to various network services for anonymous data publication such as i2P.net or Tor. Interestingly, links to the initial press coverage are kept (and can still be accessed). Apparently, one of the first online articles to mention what the site was all about stated: “a new internet initiative called WikiLeaks seeks to promote good government and democratization by enabling anonymous disclosure and publication of confidential government records.”
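
For a single dated capture like this one, the Internet Archive also exposes a simpler availability endpoint. The following sketch (again an illustration of mine, not part of the article) asks for the snapshot of wikileaks.org closest to January 17, 2007, and prints the URL at which it can be replayed:

```python
import json
import urllib.parse
import urllib.request

# Ask the availability API for the capture of wikileaks.org closest to
# the given timestamp (YYYYMMDD).
params = urllib.parse.urlencode({
    "url": "wikileaks.org",
    "timestamp": "20070117",
})
api_url = f"https://archive.org/wayback/available?{params}"

with urllib.request.urlopen(api_url) as response:
    data = json.load(response)

closest = data.get("archived_snapshots", {}).get("closest")
if closest:
    # 'url' points at the replayable snapshot on web.archive.org
    print(closest["timestamp"], closest["url"])
```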

Looking and clicking at, reading and thinking about the first stored captures of wikileaks.org through the Wayback Machine, one cannot help but notice how the site initially wanted to become a new Wikipedia. In short, WikiLeaks strove to ‘wikify’ leaking by incorporating advanced cryptographic technologies for anonymity and untraceability, all in the form of a wiki. Massive amounts of documents were to be combined with “the transparency and simplicity of a wiki interface”, at least according to initial FAQs. To users, WikiLeaks would “look very much like Wikipedia. Anybody can post to it, anybody can edit it. No technical knowledge is required. Leakers can post documents anonymously and untraceably.” Furthermore, it was argued that all users could “publicly discuss documents and analyze their credibility and veracity.” As a consequence, users of the site would have the ability to openly “discuss interpretations and context and collaboratively formulate collective publications.”

As is well known, WikiLeaks did not become what it promised back in January 2007. Rather, to quote the site it wanted to resemble, WikiLeaks was “originally launched as a user-editable wiki (hence its name), but has progressively moved towards a more traditional publication model and no longer accepts either user comments or edits.” What did not change, however, is the fact that WikiLeaks was (and is) a distinct archival phenomenon, more or less aptly described as a database of scanned documents forming a giant information repository. It comes as no surprise, then, that web captures of the site in February 2008, a little more than a year after WikiLeaks was launched, claimed a database of more than 1.2 million documents.