REDMOND, Wash., Oct. 4, 2004 — The state of Washington opens the doors today on its new digital archive center in Cheney, Washington, a facility designed to preserve the state’s historical and legally significant documents and make them more accessible to the public. The archive solution — the first state and local government digital archives designed from the ground up to preserve and make accessible all the historical, legal and fiscal records for an entire state — was designed by Microsoft and EDS and built on the Microsoft Windows Server System. PressPass asked Steven Excell, Washington’s Assistant Secretary of State, to discuss the challenges of document preservation for state and local governments, and to describe the development of Washington’s new digital archive.
Steven Excell, Assistant Secretary of State, State of Washington
PressPass: Why is building a digital archive for state and local records important?
Excell: Since the late 1800s, most local, state and federal government agencies have done a pretty good job of preserving their historical and legally significant paper-based records. Since these are public records, they must remain available to scholars, researchers, legal experts and other interested individuals. However, maintaining public access to these often increasingly fragile public documents is both expensive to provide and threatening to the documents themselves. And since these physical documents usually exist in only one location, people must travel to where they are stored to view them.
Since the start of the computer revolution in the 1970s, the document archive situation has become more complex. In Washington state government offices, we estimate that the destruction rate for electronic documents is over 50 percent. This isn’t due to malicious activity. As in any large organization, server storage space in government offices costs money, and the need to create more room on the drives sometimes tempts people to hit that delete key and destroy records that are historically or legally significant.
PressPass: Are these documents duplicated elsewhere or are they gone when they are gone?
Excell: Increasingly, many records are born electronic from the first moment, so they never have a paper existence. You can go to places in the United States where obtaining a marriage license involves filling out and signing forms in an online kiosk terminal. The document is born electronic and lives as a file in a database; paper-based copies may not be stored. And as e-commerce becomes more commonplace and electronic workflow processes dominate the workplace, we’re actually eliminating the need for paper altogether in many government offices. For an archivist, records that are born electronic are an even greater challenge because they may not have a paper counterpart. If the file is deleted, the history is lost forever. We often pose the question: If Abraham Lincoln went to dedicate a cemetery at Gettysburg, Pennsylvania, and had used a laptop to write his Gettysburg Address, would we have it today?
So Washington state’s new digital archive is a preservation tool in two ways. It helps preserve records that were born electronic. It also enables us to make electronic images of rare records, manuscripts, maps or legal documents that we prefer people not handle, and allow the public to access them through a Web browser from the convenience of their home or office.
PressPass: What were the challenges of building Washington state’s Digital Archives?
Excell: The scope of the project called for us to develop a digital archive solution that would preserve vast amounts of historical documents and information. We also needed to make the process of pulling in and storing all the electronic documents that state and local governments are currently producing as seamless and automatic as possible, so it would not cause an undue burden on other agencies and local governments. The public is another audience for the archives, so we also needed to make certain that accessing the electronic records was easy and convenient.
So our challenge was to create a digital archive solution that featured vast and expandable capacities of mass storage, and that allowed us to automatically pull in and “ingest” electronic digital records from a wide variety of sources and in a variety of file formats from across the state. We also needed the solution to automatically collect key information about the records so they are searchable and easily arrayed for viewing on the Web. The solution needed to be totally secure, so that no one could tamper with the records, yet they needed to be accessible to the public.
PressPass: How did Microsoft and EDS become involved in the digital archive project?
Excell: The money for Washington state’s digital archives was allocated in 2001, and we started by doing a lot of experiments on our own as well as kicking the tires with different software companies and solution providers. We considered content management software solutions from a number of vendors, as they initially seemed to address our needs. These turned out to be useful from the point of view of document creation, version control and document security, but they weren’t designed to manage a mass storage environment. We were more interested in a solution that could handle 800 terabytes of data and still have the robustness to quickly dish content out to the Web.
We also considered using a data warehousing approach, but this didn’t address our need to easily interact with our stakeholders out there in state and local government who don’t want to change their platforms and their applications just so it’s easier for our storage solution. Data warehousing also doesn’t provide a quick and easy way to address the expectations of the public, who demand a fast and robust experience when downloading documents to their Web browser.
I can’t tell you how many company presentations we have sat through in the last three years, but the one that caught our imagination was with Microsoft. We were invited to the Microsoft campus, where, in conjunction with EDS, their engineers had created a proof-of-concept demonstration at the Microsoft Partner Solution Center. They showed us a solution based on Microsoft BizTalk Server 2004 as a core technology that would allow us to pull data in from multiple sources in multiple file formats, and convert it into a consistent and searchable database.
Microsoft and EDS also showed us Microsoft SQL Server databases that were huge in scale, holding many hundreds of terabytes of satellite photos and imagery. We liked what we saw because it gave us robustness on a mass scale while remaining very scalable. With SQL Server, you can start small and keep growing organically. It grows with you, and you don’t have to make an investment in more capacity or software than you need at the time. In the end, I called it the Goldilocks solution, because we weren’t buying too little or too much, but buying just right.
PressPass: Could you provide an overview of the technology components in the Washington State digital archive solution?
Excell: Microsoft and EDS used the Microsoft Windows Server System and Visual Studio .NET to create the Digital Archive system and delivered it to the state in under nine months, and amazingly, under budget. BizTalk Server 2004 is used to connect hundreds of state and local government offices with the Digital Archives so legal and historical records can be automatically and electronically transmitted and archived with no human involvement of any kind.
This is where the BizTalk Server 2004 solution is magic, because it’s our Rosetta Stone. It takes the data in the format that our stakeholders and partners provide it in. From this it creates metadata, which is the index data that the search tools use, and then converts it in such a way that we can have uniform databases that are completely searchable yet secure. So we don’t have to go to all 39 Washington counties and ask them to convert to our standards. No matter what their standards are, we can use the power of the .NET Framework, XML, Web services and other underlying technologies to convert it. Local offices and governments can continue to work with their current vendors and systems. BizTalk Server also allows us to lock archived records so they can’t be tampered with, and to generate index and metadata on the fly so records can be easily searched. BizTalk Server 2004 also allows us to set up levels of protections and permissions to monitor access to different types of materials, such as email, video and audio files.
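To make the "Rosetta Stone" idea concrete, here is a minimal sketch of format normalization, written in Python purely for illustration. It is not the BizTalk Server implementation described in the interview, and every field name, column name and XML tag in it is a hypothetical stand-in. The point is only that each source keeps its own format while a per-format converter maps it onto one uniform, searchable set of metadata.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical uniform metadata fields; not the archive's actual schema.
UNIFORM_FIELDS = ("record_type", "county", "date", "title")


def from_csv(text):
    """Convert a CSV export (hypothetical column names) to uniform metadata."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {
            "record_type": row["Type"].strip().lower(),
            "county": row["County"].strip(),
            "date": row["Filed"].strip(),
            "title": row["Title"].strip(),
        }
        for row in rows
    ]


def from_xml(text):
    """Convert an XML export (hypothetical tag names) to uniform metadata."""
    root = ET.fromstring(text)
    return [
        {
            "record_type": rec.get("kind", "").strip().lower(),
            "county": rec.findtext("county", default="").strip(),
            "date": rec.findtext("date", default="").strip(),
            "title": rec.findtext("title", default="").strip(),
        }
        for rec in root.iter("record")
    ]


# One converter per source format; sources never have to change what they send.
CONVERTERS = {"csv": from_csv, "xml": from_xml}


def ingest(fmt, text):
    """Normalize one submission and check it matches the uniform field set."""
    records = CONVERTERS[fmt](text)
    for record in records:
        if set(record) != set(UNIFORM_FIELDS):
            raise ValueError("converter produced a non-uniform record")
    return records


def search(index, **criteria):
    """Return records whose metadata equals every given criterion."""
    return [r for r in index if all(r.get(k) == v for k, v in criteria.items())]
```

With converters like these, a county that submits CSV and another that submits XML end up in the same index, and a call such as `search(index, county="King")` works across both without either county changing its systems.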
The Microsoft SQL Server 2000 database is used to store archived records for long-term retrieval. There are a lot of myths around mass storage, and one of them is that Oracle is made for the big guys and SQL Server is made for the middleweights. There’s nothing in our experience with SQL Server to make us believe that it couldn’t stand up to Oracle toe to toe. With the Washington state archive solution, we are looking at something that is actually beyond enterprise size — it is designed to handle 800 terabytes of data storage, or the equivalent of 200 billion pages of text, coming in from 33,000 local and state agencies. We’re looking at holding the record for the size of a mass storage solution in a consistently searchable format.
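As a back-of-the-envelope check on those figures, the short Python sketch below works out how many bytes per page the "800 terabytes, or 200 billion pages" equivalence implies. The interview does not say whether a terabyte here means 10^12 or 2^40 bytes, so both are shown as assumptions; the function name is illustrative only.

```python
# Two common definitions of a terabyte; the interview does not say which applies.
TERABYTE_DECIMAL = 10**12
TERABYTE_BINARY = 2**40

CAPACITY_TB = 800          # design capacity quoted in the interview
PAGES = 200 * 10**9        # "200 billion pages of text"


def bytes_per_page(capacity_tb, pages, bytes_per_tb):
    """Average storage available per page for a given capacity and page count."""
    return capacity_tb * bytes_per_tb / pages


decimal_estimate = bytes_per_page(CAPACITY_TB, PAGES, TERABYTE_DECIMAL)
binary_estimate = bytes_per_page(CAPACITY_TB, PAGES, TERABYTE_BINARY)

print(f"decimal terabytes: {decimal_estimate:.0f} bytes per page")
print(f"binary terabytes:  {binary_estimate:.0f} bytes per page")
```

Either way the result is roughly 4 kilobytes per page, which is in line with a page of plain text rather than a scanned image.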
PressPass: Washington state’s digital archive solution is the first of its kind. Why is this solution only being developed now?
Excell: The question I often get is why didn’t you build this 10 years earlier? Or why don’t you wait 10 more years? Well, we couldn’t do it 10 years earlier, because the unit cost of mass storage was too high. The standards for storage area network (SAN) technology have now been established, so we know that SANs provide a stable storage environment that will continue to grow organically. Also, 10 years ago it wasn’t clear that we were going to have the kind of interoperability across platforms that we have now using XML and Web services. The convergence of these three things has meant that this kind of archival solution became affordable and technically practical at the same moment. And if we waited 10 years, all we’d do is lose 10 more years of records. It’s a credit to our legislature, our governor and other key individuals that they saw the wisdom of dealing with this issue now.