Towards a first draft of my report on the library lab – datalab.kb.se

My work on a report for the National Library of Sweden on the establishment of a library lab is progressing well. The text should be finished in a few weeks' time, and at present the first pages (in a first draft) read as follows:


Introduction
The Annual Report from the British Library usually offers insights into the many domains and activities of national libraries, not least in terms of future directions. The latest report (from 2017/18), for example, states that the British Library “Digital Scholarship team” continues to “undertake innovative research with digital collections and open up new datasets for use by researchers.” One way to facilitate digital scholarship is to start a lab, and the British Library set up a library lab environment as early as 2013. Ever since, British Library Labs has been inviting researchers, developers and artists “from around the world” to undertake “creative endeavours” using the library’s digitally curated collections, content and data. According to the latest Annual Report, the Library Labs team has now “facilitated the use of over 180 terabytes of data including 97 freely available datasets at data.bl.uk. One example is the In the Spotlight project which makes digitised entertainment playbills from the 1730s to the 1950s available as a single dataset.”

Digital scholarship, curated data, single datasets, invited developers and programmers: these are all current buzzwords and novel categories within the library domain. Computational expertise used to be required primarily for internal workflows within IT departments; now such skills and competencies are increasingly becoming a prerequisite for doing actual research in a library infrastructure that is gradually turning digital. This infrastructural and scholarly transformation can appear swift and sudden. Yet digitisation activities within the ALM-sector (archives, libraries and museums) have been harbingers of novel times to come, both in terms of scholarly perspectives and library practices.

National libraries have been digitising their collections for decades; in Sweden, digitisation work started as early as the late 1990s. For a number of years, collections were digitised primarily for preservation purposes, but after the millennium, with the rise of the Web and initiatives such as Google Books, digital access to library collections steadily became more important. Use of library collections was, however, often restricted by copyright legislation, and digital access was mostly given to older (textual) collections from before the 20th century.

Digitisation work at the National Library of Sweden has in general been similar to that in other European countries. The library has digitised a large part of its audiovisual collections, various selected works from the print collections, and a large number of newspapers. The latter have been a prioritised category since newspapers are important research material for many users. Born-digital collections have also grown through web archiving activities (Kulturarw3) and audiovisual deposits, and even more so since 2015, when (some) electronic materials became subject to legal deposit. As for the digital trajectory the National Library has followed during the last 15 years, preservation was most important at first, after which digitisation for access was increasingly advocated. There are, however, also good reasons to question the distinction between digitising for access and digitising for preservation. Some scholars have even argued that the split “is artificial and misleading”, since access to collections is usually “a given” and an outcome of all digital transformation, even if usage is fully realised only through functioning electronic networks and the legal frameworks that manage permissions.

Nevertheless, in recent years digital scholarship within the ALM-sector has expanded the focus of digitisation activities towards different forms of investigation and exploration. There has thus been a scholarly driven progression within the institutional heritage domain from preservation to access, and on to analysis. Today all forms of digital heritage are computable; hence the question of how to enhance and increase the research potential of this material. Whereas humanities and social science scholars were traditionally interested in the collections that archives and libraries had to offer deep down in their stacks and vaults, such archive-driven research has, owing to the digitisation of heritage, turned into data-driven research. And more data is better data (as Google would have it).

The long-term magnitude of this ongoing transformation is striking, both for scholars and for libraries. Within the library sector, the gradual change affects the very foundations and principles of what libraries are, and should be, at a time when ‘the digital’ is slowly becoming the default. Today, governmental decrees for national libraries (and similar statutes for university libraries) usually stipulate that libraries are to provide a beneficial infrastructure for research. For centuries, great book and manuscript collections at university libraries and national libraries played a pivotal role for the humanities and social sciences. They were envisioned as key infrastructures for scholarship. National libraries and deposit laws are, in fact, illustrative examples of how traditional knowledge structures were enacted through concrete and primarily humanistic infrastructures. They have essentially remained the same over centuries, but during the last decade, owing to repeated digitisation efforts, they have begun to change.

As digital copies of heritage become a preservation focus for the ALM-sector, novel ways of giving access and sustaining digital scholarship are the other side of the same coin. In short, mass digitisation combined with new media, technology and distribution networks has transformed the possibilities for libraries and their users. Emerging scholarly disciplines, from data science and data journalism to the digital humanities, all take advantage of new computing tools and infrastructure, and provide different models for creating new forms of access to and analyses of library collections. Within digital humanities scholarship especially, the systematic intertwining of research questions, digital materials and tools has underscored the need to reformulate what an apt library and research infrastructure for the humanities (and social sciences) should comprise. Digitisation has in essence begun to transform the epistemic foundation of the library. The knowledge that can be deduced from collections in digital form is different, and the difference is foremost one of scale. So-called distant reading of large textual corpora has even been envisioned as a new “condition of knowledge”.


About the Report
About a year ago I was asked whether I was interested in examining, surveying and evaluating how a lab might, or could, be established at the National Library of Sweden. As a media studies professor at Umeå University (a chair directed towards the digital humanities), I have for a number of years worked and done research at the digital humanities center Humlab. I accepted the offer and applied for the position; a PM for a “pilot study” on a data lab at the National Library was drafted by library staff members Lars Björk and Peter Krantz, and additional funding was made available by Riksbankens jubileumsfond.

At the same time (during autumn 2017), I had organised, together with Professor Patrik Svensson (Umeå University / UCLA), a conference on data-driven humanities research at KTH, partly aimed at guiding (and hopefully influencing) preparatory work at the Swedish Research Council for its future funding of research project grants on “digitisation and accessibility of cultural heritage collections” (a call that went public in May 2018). Together with Svensson (and a distinguished group of Swedish humanists) I have also been active in raising awareness of (or lobbying for) the need to strengthen and develop new forms of humanistic infrastructure. The idea of investigating how a lab at the National Library could be initiated was thus consistent with a number of similar research activities and ideas, including work at funding agencies; a call devoted to quantitative and qualitative methods has also been in preparation at Riksbankens jubileumsfond.

Starting in January 2018, I have worked (part time) on this report for nine months, including on ways to prepare the ground for making my recommendations a reality. Lars Björk has been my co-worker. During the winter, spring and summer of 2018 we visited a number of scholarly environments, university libraries and research groups in Sweden with an interest in using a lab at the National Library. We have thus talked to many Swedish scholars and librarians with an interest in the matter; we established both a reference group and a steering committee for our work; we made a study trip to the British Library Labs and the Dutch KB Lab; we sent out a survey regarding available digital collections within the ALM-sector in Sweden (Appendix A), and we presented and discussed our work within the “Group for digitisation and digital access” (with me as chair and Björk as secretary), a group that is part of the “Forum for national library collaboration and development”. I have also made a number of presentations of our lab ideas at Swedish universities, at the management board of the National Archives, at the Research board of the National Library, and at national and international conferences. Furthermore, I organised a workshop on digital scholarship at the National Library (in April 2018) with some 25 scholars and librarians (funded by Riksbankens jubileumsfond). Our preparatory work, conversations and scholarly visits have been thorough.

This report is entitled datalab.kb.se, a name Björk and I suggest for a possible lab, where the digital and Swedish connotations are obvious (including a necessary distinction from the Danish and Dutch KB Labs). The report is divided into three sections: “Library Labs” and “Digital Scholarship” (with some subsections), followed by a final part on “Recommendations”. The first part sketches and maps the international terrain of current library labs, with a focus on different lab environments at national libraries. The second section puts novel forms of computational scholarship at the center of attention, with a particular emphasis on methods and the (necessary) curation of datasets. In the final section on recommendations I suggest how a lab at the National Library could be organised, focusing both on actual tasks and workflows and on job descriptions and required skill sets.


Library Labs
Digitally inclined research within the humanities and social sciences has during the last decade started to encourage both national and university libraries to take advantage of the scholarly possibilities that arise when documents, as data, are shareable and networked, linkable and traceable, reusable and processable. The development and setup of library labs is one concrete result. The primary function of library labs is to deliver digital collections as data (or datasets) to researchers and other interested users. Following the literal meaning of the term laboratory, “a room or building equipped for scientific experiments”, library labs are usually devoted to experimentation with the provided datasets. “British Library Labs – experiment with our collections”, as the slogan goes. Library labs can hence be envisioned as a playground for scholars, artists or the creative industries. British Library Labs is an endeavour that supports and “inspires the public use of the British Library’s digital collections and data in exciting and innovative ways.” Similarly, the Dutch KB Lab aims to be experimental: “we try out new techniques and tinker with tools to make our content as accessible as we can. Warning, that means stuff can be broken.”

However, since library labs are becoming more and more common, the focus on experimentation can also be misleading. Providing datasets and working with them in different ways is hardly cutting edge today. Library labs can hence also increasingly be perceived as a core service that national libraries provide, with the lab (or its services) becoming an integrated part of a developed digital infrastructure. Such perspectives were advocated at a recent conference at the British Library, Building Library Labs, in mid-September 2018. It brought together some 40 libraries and partner institutions from North America, Europe, Asia and Africa, with no fewer than ten national libraries present. “Around the world, leading national, state, university and public libraries are creating ‘digital lab type environments’”, the conference program stated. The aim is to develop novel forms of library usage, where library labs ensure that “digitised and born digital collections / data can be opened up and reused for creative, innovative and inspiring projects by everyone such as digital researchers, artists, entrepreneurs and educators.”

The issue of library labs is hence timely. Presentations and discussions in London revolved around issues such as lab services and spaces, technical infrastructures, the values of a library lab, planning and establishing a lab, as well as various funding models for labs. Usage, research and presentations of various ongoing projects were also on the agenda. One result of the conference was a supportive network, another a forthcoming global report on library labs. Most libraries and institutions present also took part in a library lab survey. The results are in no way conclusive, but rather give a tentative impression of how major libraries presently deal with lab issues (Appendix B). One thing to note was that library labs started to emerge between 2013 and 2015, and that this first wave of initiatives is now being reinforced by a more general trend (according to the survey, some 20 libraries are about to launch a lab in 2019 or 2020). Most of the existing library labs, furthermore, are primarily aimed at serving academic research, followed by internal staff, the general public or creative industries. The most common tasks according to the survey were “facilitating access to data & digital collections at scale” and “creating new datasets & digital collections”, followed by “providing training in digital methods & tools” and “public engagement”. Half of the library labs provided access to restricted digital collections (through various contracts), and (only) half of them offered a physical space in the library; for many library labs, the focus is thus mostly on a web-based presence. The Austrian National Library lab, ÖNB-LAB (to be launched in November 2018), for example, will primarily devote its activities to a website with datasets and tools, including code and tutorials provided through GitLab.

At present there is, in short, considerable international interest in library lab issues (which is the prime reason why this report is written in English). Even if library labs are usually established to enhance and amplify the digital use of digitised (or born-digital) collections and datasets, they differ in approach, scope and orientation. The library lab at the Yale University Library, for example, has a distinct digital humanities agenda, aiming to help “scholars in their own engagement with digital tools and methods in the pursuit of humanistic questions.” A brief description of some different types of library labs can therefore serve as a smorgasbord of how labs can be designed, organised and customised.