June 29, 2005 11:28 am

British Library faces digital avalanche

An avalanche of digital information is produced in the UK every year, catering to every possible taste and keeping the nation informed, but presenting the British Library with a terabyte-sized headache as it attempts to fulfil its mission to preserve the nation’s knowledge in the digital age.

The internet has brought vast new audiences within range of the library’s collections, and its precious ancient treasures, such as Leonardo’s notebook and Jane Austen manuscripts, can be read in award-winning multimedia reproductions on its website.

The library’s collections of scientific journals, historic newspapers and recordings of birdsong and oral history are also being made available electronically.

But the digital era has led to a massive change in the library’s role as a safe storehouse of knowledge.

It is legally entitled to receive a copy of everything published in the UK and Ireland, and every year another 3m items arrive, taking up 12km of shelf space.

But in January 2004, a new Act of Parliament added to this titanic task of preservation, extending the library’s role to cover new digital forms of information, from CD-roms, e-journals and internet sites to new forms yet to be created.

“A big part of our strategy going forward is how to handle what we call ‘born digital’ material – items that have never been in print,” says Lynne Brindley, chief executive of the British Library.

One of the key areas the library has decided to focus on is scientific journals, many of which are published exclusively in electronic form. “Unless we do this now, then our grandchildren will never know about our science,” says Ms Brindley.

This is not just a case of collecting the information and copying it. The library has to negotiate copyright terms with all the publishers of the journals – a complex task.

They are also looking to start archiving the data sets and simulations behind the scientific research.

Finding space to store digital information is not too hard, compared to the challenge of finding new shelf space for books every year.

Currently, the British Library’s data store only contains five terabytes (thousand billion bytes). Given that a terabyte could fit on to just 17 iPod music players, it is still relatively small. The store is projected to grow to 300 terabytes, which is still not particularly huge.
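
The arithmetic behind these figures can be checked with a short sketch. It assumes the decimal definition of a terabyte given above (a thousand billion bytes) and takes the figure of 17 players per terabyte at face value; the function and constant names are illustrative only.

```python
# Back-of-envelope check of the storage figures quoted above.
# Assumption: decimal units, 1 terabyte = 10**12 bytes, 1 gigabyte = 10**9 bytes.
TERABYTE = 10**12
GIGABYTE = 10**9
PLAYERS_PER_TERABYTE = 17  # figure quoted in the article


def implied_player_capacity_gb() -> float:
    """Capacity each player must have for 17 of them to hold one terabyte."""
    return TERABYTE / PLAYERS_PER_TERABYTE / GIGABYTE


def players_needed(terabytes: int) -> int:
    """Number of players needed to hold the given number of terabytes."""
    return terabytes * PLAYERS_PER_TERABYTE


current_tb = 5      # size of the data store today, per the article
projected_tb = 300  # projected size, per the article

print(f"Implied capacity per player: {implied_player_capacity_gb():.1f} GB")
print(f"Players for current store: {players_needed(current_tb)}")
print(f"Players for projected store: {players_needed(projected_tb)}")
print(f"Projected growth: {projected_tb // current_tb}x")
```

On these assumptions each player holds roughly 59 gigabytes, the current store would fit on 85 players and the projected store on 5,100, a sixtyfold increase.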

However, the timescale the library is working to is highly unusual. Its collections have to last for hundreds of years. “Long term for the industry is five years,” says Ms Brindley. “This is a different sense of long term. We’re going to be looking after this for hundreds of years.”

If it escapes the ravages of damp, pests and fire, a well-produced book will look much the same today as it did a thousand years ago. But digital collections are much harder to preserve. In a century’s time, the hardware and software they run on will have been obsolete for generations.

One answer to this problem is simply to copy the data from one generation of hardware to the next, every five years or so. “There are concerns around the process of repeatedly doing that. You are losing information each time and you may be reforming and changing the document over time,” says Richard Boulderstone, director of e-strategy.
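
One common safeguard when copying data between generations of storage is to compare a checksum of the copy against the original. The sketch below illustrates that general idea and is not a description of the British Library's own process; the file names and the `migrate` helper are hypothetical, and temporary directories stand in for successive generations of hardware.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source: Path, destination: Path) -> str:
    """Copy a file to new storage and confirm the copy is bit-for-bit identical."""
    expected = sha256_of(source)
    shutil.copyfile(source, destination)
    if sha256_of(destination) != expected:
        raise IOError(f"Copy of {source} does not match the original")
    return expected


# Usage: simulate three generations of storage with a temporary directory.
with tempfile.TemporaryDirectory() as root:
    current = Path(root) / "generation_0.dat"
    current.write_bytes(b"born digital material")
    original_digest = sha256_of(current)
    for generation in range(1, 4):
        target = Path(root) / f"generation_{generation}.dat"
        migrate(current, target)
        current = target
    final_matches = sha256_of(current) == original_digest
    print("Final copy matches original:", final_matches)
```

A check like this guards only against loss during copying. It does not address the second concern Mr Boulderstone raises, that converting a document to newer formats may change it over time.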

To see a digital document exactly as it would have looked, you really need to have the original machines it was designed for. A text document would not change too much, but a more dynamic web page would most likely become unreadable. And future historians investigating, say, early 21st century computer games would find a copy of Grand Theft Auto useless without a console to play it on.

The British Library cannot keep a working example of every different type of computer hardware. The alternative is to run a program on contemporary hardware which pretends to be the old-fashioned hardware, an approach called emulation. It is not an easy strategy to execute, as the emulation software will itself become obsolete in due course. The National Library of the Netherlands is currently working on an answer to this problem.
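
The idea of emulation can be shown in miniature. The sketch below invents a very simple imaginary machine with a single accumulator and three instructions (`LOAD`, `ADD` and `PRINT`), then runs a program written for it on a modern computer. The machine, its instruction set and the `emulate` function are all assumptions made for illustration; a real emulator has to reproduce far more of the original hardware's behaviour.

```python
# A toy emulator for a hypothetical "old" machine with one accumulator.
# The instruction set (LOAD, ADD, PRINT) is invented for this example.

def emulate(program: list[tuple[str, int]]) -> list[int]:
    """Run a program written for the imaginary machine; return what it printed."""
    accumulator = 0
    output = []
    for opcode, operand in program:
        if opcode == "LOAD":
            accumulator = operand        # replace the accumulator's value
        elif opcode == "ADD":
            accumulator += operand       # add to the accumulator
        elif opcode == "PRINT":
            output.append(accumulator)   # operand is ignored for PRINT
        else:
            raise ValueError(f"Unknown instruction: {opcode}")
    return output


# A program "written for" the old machine, run on today's hardware.
old_program = [("LOAD", 2), ("ADD", 3), ("PRINT", 0)]
print(emulate(old_program))
```

The old program never changes; only the emulator has to be rewritten when today's hardware and software are replaced, which is why the emulator's own obsolescence is the hard part of the problem.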

Selecting particular areas of future interest is hard enough, but the library’s ultimate aim is to archive the entirety of the UK web domain, that is, every site with an address that ends ‘.uk’. It is the lead partner in the six-member UK Web Archive Consortium. “The task is far too big,” says Ms Brindley. “We couldn’t do it on our own.”

Using software developed by the National Library of Australia, the project has started to take snapshots of some of the most important sites in the UK, in politics, science and technology, medicine, the arts and culture.

The project only went online in May, and only has a few dozen sites included, all by request from the web publishers themselves. Even a digital library has to be built slowly, brick by brick. “If we could just get a complete collection of scientific journals, that would be a start,” says Mr Boulderstone.

Copyright The Financial Times Limited 2017.