Masinter's Musings: archiving

May 8, 2019

Using Secret Sharing for National Archives

Topic: preservation

Reading about the National Archives budget woes: perhaps as digital material is becoming more prevalent, the volume of paper documents isn't growing?

What are the unique requirements of the National Archives for online storage of digital material?
Unlike most business archives, the lifetime of archived documents is measured in centuries.
Archives must be secured against both accidental and intentional loss for that entire lifetime; unauthorized revelation must be prevented for at least decades.

For public records, LOCKSS (Lots of Copies Keeps Stuff Safe) might be a solution. Distribute copies to each state or region.

But for confidential records, the more copies, the more likely it is the information will be revealed.

There is an approach worth investigating, though: Secret Sharing, in which the material is split into shares and each State maintains one share in a separate secure facility under its own control.

This would provide some resilience against meddling, take-down, or premature release of information: none of these could succeed unless a large number of states agreed.
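To make the idea concrete, here is a minimal sketch of one well-known secret sharing construction, Shamir's threshold scheme. The post doesn't say which scheme it has in mind, so the choice of Shamir's scheme, the field prime, and the function names `split_secret` and `recover_secret` are all assumptions made for illustration. The secret is treated as an integer (in practice probably an encryption key for the archived material rather than the material itself), and any `threshold` of the `num_shares` shares recover it.

```python
import secrets

# Assumption: arithmetic is done modulo the Mersenne prime 2**127 - 1,
# which is large enough for a short integer secret such as a key fragment.
PRIME = 2**127 - 1


def split_secret(secret, threshold, num_shares):
    """Split an integer secret into num_shares points on a random polynomial
    of degree threshold - 1 whose constant term is the secret."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Coefficients: the secret followed by threshold - 1 random values.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        # Evaluate the polynomial at x using Horner's rule.
        y = 0
        for c in reversed(coeffs):
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Recover the secret by Lagrange interpolation at x = 0.
    The caller must supply at least `threshold` distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical usage: 50 state facilities, any 30 of which can reconstruct.
state_shares = split_secret(secret=123456789, threshold=30, num_shares=50)
recovered = recover_secret(state_shares[:30])
```

In this sketch, fewer than `threshold` shares reveal nothing about the secret, while destroying it would require the loss of more than `num_shares - threshold` shares, which is the kind of "large number of states" property described above.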

April 9, 2019

The Paperless Office and the Horseless Carriage

Topic: preservation

You know the story of the horseless carriage with a buggy-whip holder (in case you needed to put a horse in front of it). But what we wound up with is a wide variety of forms: motorcycle, automobile, tank, train, etc. The transition was accompanied by corresponding developments in infrastructure.

When Xerox started its Palo Alto Research Center nearly 50 years ago, part of its mission was the Paperless Office -- a world where work was done without the use of paper.

We're still in the middle of a long transition to paperless processes in billing, statements, advertising, news, receipts, insurance, medical records, government, textbooks, and fiction, with documents still playing a major role in data-based activities. In most cases, these processes are being converted from paper documents to electronic ones, with PDF being an important carrier because of its ability to straddle the divide between the paper and electronic worlds.

In many of these processes, we're only seeing the beginnings of a transition to another phase, that of data connections, where electronic documents and email are being supplanted by shared data, and documents are generated only on demand by those who are outside of the data-centric roles.

HTML may have been originally designed as a document format for scientific papers at CERN, but its primary thrust in the last decade has been as a way of delivering applications to consumers.

The problem of the digital dark age is not so much technological obsolescence as it is that there are no document by-products of work done; this is not something that can be solved by new kinds of archival documents. In the records management community, documents must have or carry their own context which allows auditing of past behavior by examining the documents the process left behind.

We need some better ways of straddling the document and data worlds, such that data archives are produced in a way that allows them to be audited, redacted, and processed without having all of the original context. Most data channels are, for efficiency reasons, context free. Metadata (embedded or supplemental) is a document-centric way of supplying context, but it isn't enough.

Medley Interlisp Project, by Larry Masinter et al.
