The British Library is to embark on the biggest expansion of its archiving power since the 17th century.
The guardian of Britain’s cultural heritage will gain legal backing to collect and store every UK website, digital publications such as ebooks, and even chatter on social networks such as Facebook and Twitter.
Ed Vaizey, culture minister, will sign the regulations into law on Friday. At midnight, the legislation will come into effect and the British Library will start harvesting the web.
The project will start by archiving the .uk domain, or about 5m sites and 1bn pages. In the coming years it will expand to UK content on domains such as .com and .org.
Roly Keating, chief executive of the British Library, said its new powers were “a reassertion of what it means to be a library in the digital age”.
The British Library has 800km of shelving in vast warehouses and underground tunnels. Since 1662, English law has required publishers to provide authorities with a copy of every printed work they produce. But until now digital publications have had no equivalent legal requirement.
A huge amount of material has been lost since the rise of digital publishing in the 1990s. With the average life of a web page at just 75 days, future historians are likely to confront a “digital black hole” when they look back to the end of the 20th century.
“As the years go by, this will increasingly become the only record that survives of a huge range of content,” Mr Keating said. “The full range of how British people are using the web in the 21st century will be there for scholars, researchers, historians, filmmakers, writers, to explore.”
To win support for the project, the British Library had to convince the UK’s biggest publishers that the archive would not damage their commercial interests. Because the database will be free for the public to access, some publishers had feared the archive could hurt sales of their publications.
Since the government gave the go-ahead for the archive in 2003, it has taken a decade of talks to reach an agreement that balanced the needs of libraries and copyright holders.
“There was a huge concern early on that it could damage commercial publishers,” said Chris Fell, digital publishing director at Cambridge University Press.
But he said the final design of the archive was welcomed by the industry. “We want things to be archived for posterity just as much as libraries do, in order to tell our authors and our readers that such things will continue to exist,” he said.
Readers will be able to access the digital archive only from the premises of the British Library or the five other “legal deposit” libraries such as the Bodleian at the University of Oxford.
Publishers of high-value content will be allowed to put an embargo of up to three years on their work. In addition, any single item can only be viewed by one person at a time.
Some publishers could reap gains from the switch to digital, by providing digital files instead of more costly printed versions. PwC, the accountancy firm, has estimated that change could save the publishing industry £14m, more than the estimated £9m cost of upgrading their IT systems.
Yet some observers are concerned that the ambitious scale of the archive may lead to trouble.
Nick Pickles, director of the privacy campaign group Big Brother Watch, said the library’s plan to collect material posted by individuals on social networks such as Facebook went far beyond its historical remit of archiving publications such as newspapers.
While the archive cannot access private or password-protected websites, many people might not realise that what they upload to the public web would be enshrined forever, Mr Pickles said.
“The danger of unintended consequences is magnified by how wide they’ve cast the net,” he said.