Tobias Steinke is a computer scientist working at Die Deutsche Bibliothek, the German National Library. He specialises in long-term archiving and preservation and is a partner project manager of the German project kopal. Contact: steinke@dbf.ddb.de.

INDICARE DDB, Die Deutsche Bibliothek (the German National Library), recently made the news with headlines like "German Library Allowed To Crack Copy Protection" (cf. EDRI-gram 2005). What exactly is the agreement between DDB, the German Federation of the Phonographic Industry (Bundesverband der phonographischen Wirtschaft) and the German Booksellers and Publishers Association (Börsenverein des Deutschen Buchhandels) about?

T. Steinke: In principle it's about our need to bypass copy protection in order to fulfil our legal obligations. The use of programs able to do so is normally forbidden in Germany due to the legal anti-circumvention rules. The urgent need behind this agreement was the fact that the German Music Archive (Deutsches Musikarchiv), which is part of DDB, has already collected numerous copy-protected audio CDs. To ensure the preservation of these CDs it is necessary to make legal copies. In principle DDB has the right to make copies, but without the agreement we wouldn't be allowed to use the computer programs that actually enable us to do so. So far we have no experience with copy protection beyond audio CDs. You can find all official information available about the agreement on our website (DDB 2005); an English translation of the agreement is available from the "Frankfurt Group" (2005).

INDICARE How can you ensure that the staff of DDB is skilled enough to hack and crack whatever protected content comes along? Think of a situation where circumvention-tools are not available legally…

T. Steinke: We will think about this when we get this kind of material. As a basic principle, we want deliveries without any copy protection.

INDICARE You probably know about the agreement between KB, Koninklijke Bibliotheek (National Library of the Netherlands), and Elsevier (and other publishers) about the preservation of scientific electronic journals. In this agreement KB is clearly specified as the institution responsible for long-term archiving. What are the differences and the similarities between the task and the approach of KB and DDB?

T. Steinke: First, DDB in Germany and KB in the Netherlands are the institutions responsible for long-term archiving of electronic journals, among other materials. While it is still voluntary to deposit an electronic copy at DDB (according to the present legal deposit law, i.e. Gesetz über Die Deutsche Bibliothek, DBiblG), this will change with the forthcoming new law making the legal deposit of electronic copies mandatory. The proposed bill passed cabinet this month. Many publishers have already signed delivery contracts with DDB (e.g., Springer, Wiley-VCH), thereby anticipating the future legal situation.

Second, DDB has accumulated considerable experience with, for example, online theses and dissertations, while KB has gathered more experience with other materials. As both institutions have to fulfil roughly the same tasks, they are well advised to share their experiences with specific publication types to their mutual benefit. There is already an ongoing co-operation with the KB at several levels, especially regarding long-term archiving.

INDICARE It appears as if DDB as well as KB prefer agreements on a private basis between publishers and libraries instead of a legal regulation on exemptions for libraries. I heard some library experts advocate for a legal regulation to ensure that libraries can fulfil their tasks without being dependent on bargaining power or the good will of publishers. What is your view?

T. Steinke: Your assumption is not entirely true. If legal regulations could be found that represent the interests of all institutions involved equally, no further agreements would be necessary. Indeed this would be the ideal case: legal regulations providing sufficiently clear structures. If, however, the legal regulations are not sufficient to guarantee the fulfilment of our tasks (e.g., because technical protection measures must not be broken), then it is of course useful to conclude individual contracts with publishers or publishers' interest groups (e.g., allowing DDB to crack TPM). Realistically, in the future there'll be no way to avoid a combination of both strategies, because the variety of publications in the electronic sector is too great for any law to capture. Individual agreements can help to simplify the co-operation (e.g., a publisher agrees with DDB not to apply the TPM to the copies delivered to DDB). In this respect, we understand the legal codification of our rights as a clarification that helps to avoid uncertainties on both sides. That doesn’t alter the need to actively seek and to intensify our contacts with publishers.

INDICARE Let me turn to some more technical questions. I would assume that different publication types go together with rather different technical requirements for preservation. A database of online journals is one thing, while an item like an e-book is quite a different animal.

T. Steinke: We accept all file formats for publications we are obliged to collect. Currently the most common formats for electronic publications are PDF, XML, and HTML. But numerous other formats are in use, and some of them are indeed very exotic. These formats, of course, complicate long-term preservation. Because electronic journals are mostly delivered to end-users in PDF or HTML, we get them in these formats as well. Therefore, from a technical point of view, e-journals are also single objects. We don’t collect the complete presentation as it is on the publisher’s site (webpage with database and shopping system).

INDICARE As the field of scientific publishing is as international as science itself, a network of journal archives would seem more appropriate than a huge effort of one central library…

T. Steinke: Yes, definitely, and that's true from a national perspective too. There's no way for DDB to collect all available electronic publications on its own in one huge effort. We are thinking of building up a network of reliable partners (such as regional libraries, university libraries etc.), each of which collects part of the publication output (not only journals but also websites etc.) in a well-defined geographical area. The collections of all these partners will then be archived at DDB without further (bibliographical) processing. In this way DDB will at the same time function as a backup for the partner institutions. At present we are in the planning stage for this network at the national level. At the international level, discussions about co-operation and the way to choose are ongoing. With respect to web-harvesting, a co-operation of national libraries and the Internet Archive (cf. sources) is already in place; however, DDB has not yet joined in.

INDICARE Well, I would have expected that international co-operation in the field of scientific publications would be most advanced. What is the state in this segment?

T. Steinke: The collecting duties and activities of a national library are normally defined by national law and target the national production of publications. Although the American Library of Congress also collects German books, this does not exempt us from our duty to collect them. Therefore co-operation among national libraries is primarily related to technical issues. We are trying to establish common technical standards and to share our different experiences.

INDICARE Building archives for digital objects will need standards at different levels. I have heard e.g. of OAIS (Open Archival Information System) and SAN (Storage Area Network).

T. Steinke: The OAIS model is very important in the long-term preservation community. It is a theoretical model defining functional entities. It was originally developed by NASA and enhanced within the European project NEDLIB (cf. sources). This model defines a terminology to ease comparison of archival systems at the conceptual level and in the planning phase. However, the OAIS model doesn't say anything about the implementation of these systems.

SAN is a technical term of network technology meaning a specific technical realisation of storage techniques. From the viewpoint of long-term preservation, concepts should be independent of particular technical realisations, because these are constantly changing. But it's necessary to have agreements about the degree of reliability and about suitable service concepts (backup, refreshment).

INDICARE I mentioned SAN, because Manfred Osten (2004, pp. 88-90) presented it in his book as a key technology to solve problems of long-term archiving by a distributed system architecture. Independent of SAN, the idea of distributed long-term archives exchanging information remains intriguing – especially when you envisage them to be used remotely by end-users all over the world.

T. Steinke: The idea of creating a shared archival system based on shared storage is realised, e.g., in the project LOCKSS (Lots of Copies Keep Stuff Safe) at Stanford University (cf. sources). However, long-term preservation (LTP) is not primarily about sharing documents, and sharing is not one of the main problems of long-term preservation for which we try to find solutions. A high degree of technical skill and continuous development is needed for long-term preservation, and therefore central organisations should take care of the preservation and availability of the material committed to them. These specific organisations could be understood as a kind of bank, in which you have a safe deposit box accessible to you only. A goal of our project kopal (cf. sources) is to create this kind of basis. Based on a stable technical solution of this kind we aim to develop a co-operatively usable archival system for long-term preservation. The system itself will then be hosted by a technical service provider, who is responsible for providing the requested technical competencies.

INDICARE Digital technology blurs the border between archives and digital libraries and both may strive to offer their users permanent access. How should the borderline between digital archives and digital libraries be defined today?

T. Steinke: First some words of clarification on why long-term preservation of electronic documents is needed and what the essential problems are. There are two problems in the field of long-term preservation. The first is the preservation of the binary bit stream, as storage technologies only guarantee durability for a limited time. Therefore service guidelines are needed to guarantee timely migration to new storage technologies. The second problem is more complex. Every file format is only usable within a given context (software, operating system, hardware). As a consequence, relatively soon it will no longer be possible to access the content of the preserved binary bit stream. There are two concepts to address this problem. Migration is the process of converting a file from one format to another while it is still possible to interpret the source file. Of course the target file should have the same content afterwards. Emulation is the simulation, on a current system, of the old system environment needed for a given file. Both strategies require continuing high effort and there is always the risk of losing some information. But they are the only chance to access any of the content in the future. A digital archive for long-term preservation should deal with these problems. A digital library on the other hand emphasises sharing and organisation of digital objects and can rely on current technologies.

There will be lots of digital libraries; nearly every institution has set up one already. Not every institution, however, has the task and/or resources to set up a digital archive for long-term preservation. True digital archives will only exist on well-defined foundations, e.g., connected to the legally defined deposit task of regional and national libraries. Most other libraries will be digital libraries which may guarantee to provide all e-publications for a limited time (~5 years). After that, digital archives at the well-defined (higher) level will step in to serve as a backup (as said above) and as institutions making these publications available after a defined timeframe.
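
As a rough illustration of the migration strategy described in this answer, the following Python sketch treats a change of text encoding as a toy "format migration": the source is interpreted while it is still readable, written out in the new format, and checked for identical content. The function name `migrate_text` and the choice of Latin-1 and UTF-8 are assumptions made for this example only; they do not describe the tools or formats used at DDB or in kopal, and real format migrations (e.g. between document formats) are far more involved.

```python
def migrate_text(data: bytes,
                 source_encoding: str = "latin-1",
                 target_encoding: str = "utf-8") -> bytes:
    """Re-encode a text bit stream and check that the content is unchanged.

    Hypothetical helper for illustration; not an actual archive API.
    """
    # Interpret the source while it can still be read.
    text = data.decode(source_encoding)
    # Write the same content in the target format.
    migrated = text.encode(target_encoding)
    # The target should carry the same content as the source.
    if migrated.decode(target_encoding) != text:
        raise ValueError("content changed during migration")
    return migrated


old_file = "B\u00f6rsenverein".encode("latin-1")
new_file = migrate_text(old_file)
# The bit streams differ (12 vs. 13 bytes), but the text is the same.
print(len(old_file), len(new_file))
```

The point of the check is that the bit stream is allowed to change while the content must not; deciding what counts as "the same content" is the hard part for richer formats.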

INDICARE What happens when copyright of archived digital publications expires?

T. Steinke: Access to our whole collection is possible via the OPAC (Online Public Access Catalogue). You can use the OPAC on our webpage (http://opac.ddb.de/) or at PCs in our library. If a catalogue entry refers to an electronic resource you will get a link to the corresponding file. Depending on permissions, some links are displayed on PCs in the library only. In other words we are able to grant or cede access at any time when required.

INDICARE Recently I heard library experts saying that libraries and archives would be willing to accept and employ DRM systems if on the other hand publishers are willing to let the libraries do their preservation job. Would you say that this kind of bargain will be typical in the future? Are there already archives with DRMS in place?

T. Steinke: As said before the challenges of long-term preservation require continuous processes of migration and/or emulation. But the goal of DRM is to prevent exactly this. Therefore a digital archive for long-term preservation is not able to preserve DRM protected material. DRM is suitable within access components for end-users.

For example, at present links to some of the objects are not shown within the web-accessible OPAC. It would be conceivable to have an agreement with the right holders to show these links but to put some kind of DRM on the objects, on-the-fly during access. Note however, this process would not be connected to the archival system itself in any way. It is like fetching goods from a warehouse and sticking your label on them before selling them to the customer.

INDICARE Is there a role of TPM and DRM in safeguarding integrity and authenticity of electronic documents stored in digital libraries and archives?

T. Steinke: Digital archives for long-term preservation should be as trustworthy as banks. Of course, within the archives techniques like checksums are used to ensure authenticity. In the end, customers of those archives have to trust in getting the "right" objects and the right content. It is the same as with books, which could be manipulated. Either you trust a library to not tear out pages or you don’t. But we expect that we will have to use digital signatures for end-user access in the future.
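
The checksum technique mentioned in this answer can be sketched in a few lines of Python. This is a minimal illustration, assuming SHA-256 as the hash function and using hypothetical helper names (`checksum`, `is_intact`); the interview does not say which algorithm or software the archive actually uses. A checksum recorded at ingest reveals any later change to the stored bit stream, but by itself it does not prove who produced the object, which is where the digital signatures mentioned above would come in.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 checksum of a bit stream as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, recorded_checksum: str) -> bool:
    """Compare the current checksum with the one recorded at ingest."""
    return checksum(data) == recorded_checksum


# At ingest: store the object together with its checksum.
stored_object = b"archived publication"
recorded = checksum(stored_object)

# Later: an unchanged object passes, a modified one does not.
print(is_intact(stored_object, recorded))
print(is_intact(b"archived publicatiom", recorded))
```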

INDICARE A final question: more and more information that also forms part of our cultural heritage is being made available by parties other than professional publishers. Will this development change the task of national libraries, and are they aware of the challenge?

T. Steinke: Yes, and it’s a very difficult issue. Are all web pages worth being collected? What are German web pages at all? These questions are being discussed, but there are no clear answers yet. We only know for sure that we have to start collecting online publications (which we have already done); otherwise a lot of today’s publications will be lost.

INDICARE Thank you very much for this interview.

Sources

Status: first posted 30/05/05; included in INDICARE Monitor Vol. 2, No. 3, 30 May 2005; licensed under Creative Commons
URL: http://www.indicare.org/tiki-read_article.php?articleId=107