Library of Congress Lags in Archiving Digital Media
Concern as the Nation's Creations Turn Electronic
By KATIE HAFNER
The Library of Congress is charged with collecting the creative work of the American people. This has come to include such varied output as the papers of Thomas Jefferson and the Wright brothers, the original compositions of Leonard Bernstein and the video archives of the Martha Graham Dance Company.
But now the nation's creativity extends to Web sites, electronic journals and magazines, and CD-ROM's of every sort. And the library is lagging in collecting and archiving that digital material, according to a report released today by the National Academy of Sciences. Unless its administrators act swiftly, the report says, the library risks diminishing in relevance.
The 260-page report is the most comprehensive overview to date of the Library of Congress's digital preparedness.
"The nation's creativity is at this point significantly represented by what's happening electronically," said James O'Donnell, vice provost for information systems and computing at the University of Pennsylvania and chairman of the committee that produced the National Academy's report. The committee was composed of outside experts in digital libraries, databases, computer networking and digital preservation.
"If you keep the mission of the library what it has been, the change in the landscape brought about by networked information is dramatic and brings about unavoidable challenges for the library," Professor O'Donnell said.
Those challenges are, of course, shared by scores of other research libraries faced with the task of collecting, making accessible and preserving the nation's assorted electronica: scholarly journals, books and magazines published in electronic form; multimedia products like CD-ROM's; digital photographs, music and films; and millions of miscellaneous pieces of Internet-based material.
Further, the report stated, the library has built-in limits on its ability to respond nimbly to new trends.
Its role as a creature of Congress bogs the library down in governmental bureaucracy.
In its report, the committee said it saw "signs that the library is already losing the momentum and purchase required to make the next steep ascent" to respond to the new challenges "in a timely and effective way."
One problem is inadequate infrastructure. The library urgently needs a sophisticated system for receiving and managing digital documents deposited with the library and registered for copyright. The library also needs to recruit and retain computer experts from a field already suffering a labor shortage.
The report cited a lack of strategic planning and urged the library to set a strategic course immediately. To that end, it recommended that the library appoint a new deputy librarian for strategic initiatives.
The report also emphasized the need for the library to work more collaboratively with other national libraries and institutions. "At a time when we are trying to build large, interconnected systems, it is unlikely that a single institution could create such complicated systems independently of other key players," said Ann Okerson, a committee member who is associate university librarian at Yale University. "That goes for the Library of Congress or any other large library or organization."
The committee expressed less concern over the library's ongoing efforts to make digital copies of items in its physical collection.
"Digitizing your analog material is less urgent," Professor O'Donnell said. "You can manage that by asking: 'What's there a need for? Why should it be made digital?' But if you don't do it this year, it'll still be there in five years, and you could do it then. Digital information that you're losing is probably lost forever."
James H. Billington, the Librarian of Congress, who commissioned the study two years ago, said he was encouraged by the committee's findings.
"We view it as extremely positive that they are stressing the importance of this," Mr. Billington said in an interview.
He pointed out that a "Digital Futures" group within the library is already at work on many of the problems pinpointed by the National Academy report. But the report, he said, brings a renewed sense of urgency to the situation. "We've seen that this is a problem and we now have a strong reinforcement for doing something on it at a more accelerated rate," he said.
The report is a set of recommendations only, and the library is not required to act on them. "I suppose to some degree, a test of how serious they are is which of them they act on, and how soon," said Margaret Hedstrom, a committee member who is an associate professor in the School of Information at the University of Michigan.
Mr. Billington said one of the main obstacles to implementing the report's recommendations is financing. In its 2001 budget request, the library asked for a $21 million increase in the allocation for digital archiving; what it will actually receive is likely to fall far short of what is needed. "A major unresolved issue is how to fund this effort," Mr. Billington said.
Private partnerships are one answer.
Six years ago, the library embarked on the National Digital Library Program, paid for in large part by corporations, foundations and individual donors. By the end of this year, the library will have placed 5 million of its 119 million items on its American Memory site (memory.loc.gov), for use by the public.
The American Memory site receives more than four million hits each day, most of them from schoolchildren, who go there to view artifacts like old baseball cards, Lincoln's papers and historic pamphlets from the National American Woman Suffrage Association.
A few other pieces of the digital puzzle are falling into place. The library has preserved many of its own digital resources, including the full-text databases of the Thomas system for legislative information and its own bibliographic databases.
The library has an extensive collection of CD's and CD-ROM's, and it is about to sign an agreement with the American Physical Society, which will regularly deposit its eight online physics journals.
The library is also referring researchers to databases that serve as portals to specialized sources of online information, including the tables of contents of 10,000 journals and the full text of every doctoral dissertation written in this country since 1861.
Still, the rapid clip at which digital information of all kinds is proliferating has caught the Library of Congress largely unprepared. "The sheer quantity of networked information is astonishing, and it keeps increasing at astonishing rates," Professor O'Donnell said.
Many Web pages created before 1996 have been lost because no one thought to take periodic snapshots for archival purposes until then. In 1998, working with Alexa Internet, a San Francisco company, and the Internet Archive, a nonprofit organization there, the library received 44 tapes containing 2 trillion bytes of 1997 Web data, the equivalent of 500,000 Web pages, a small fraction of the total number of Web pages in existence today.
The Internet Archive, which has snapshots of the Web dating from 1996, is willing to provide those snapshots to the library when the library is prepared to take them on a continuing basis, said Nancy A. Davenport, the library's director of acquisitions.
The library applies the same criteria to digital material that it applies to physical material, Ms. Davenport said. "We say, 'If you would have been selecting it in physical form, then select it,'" she said. "The format does not make a difference."
Yet in the course of studying the library's digital future, committee members pondered some fundamental questions. What, for instance, constitutes a publication? And what really deserves to be saved?
"If you accept that the Internet should also be collected, what does that mean?" asked Ginnie Cooper, director of libraries for the Multnomah County Library in Portland, Ore., and a committee member. "Today's Internet? Tomorrow's? All? And do you collect just the first layer of a Web page and none of the links? Three layers? Ten?"
Merely collecting digital material isn't enough, however. The Library of Congress and other research libraries are also wrestling with how to preserve it effectively.
Digital archives are more vulnerable than their acid-free-paper counterparts. That is because computer hardware and software quickly become obsolete, and the durability of magnetic storage media like tapes and disks is limited.
Web-based documents that are filled with links pose yet another preservation problem because keeping an electronic research paper vital and relevant means keeping its links alive.
Mr. Billington said the library is looking into a number of preservation methods.
In general, Professor O'Donnell said, "I think they've done a worthy job of thinking their way through some pilots, saying, 'Let's dip our toe firmly into the water and see what we can do.' We say, 'We applaud that, but now it's time to jump in.' "