April 9, 2019

The Paperless Office and the Horseless Carriage

Topic: preservation

You know the story of the horseless carriage that still came with a buggy-whip holder (in case you needed to put a horse in front of it). But what we wound up with was a wide variety of forms: motorcycles, automobiles, tanks, trains, and so on. The transition was accompanied by corresponding developments in infrastructure.

When Xerox started its Palo Alto Research Center nearly 50 years ago, part of its mission was the Paperless Office: a world where work was done without the use of paper.

We're still in the middle of a long transition to paperless processes in areas ranging from billing, statements, advertising, news, receipts, insurance, medical records and government to textbooks and fiction, with documents still playing a major role in data-based activities. In most cases, these processes have shifted from paper documents to electronic ones, with PDF an important carrier because of its ability to straddle the divide between the paper and electronic worlds.

In many of these processes, we're only now seeing the beginning of a transition to another phase, one of data connections. In this phase, electronic documents and email are being supplanted by shared data, and documents are generated only on demand for those outside the data-centric roles.

HTML may have originally been designed as a document format for scientific papers at CERN, but its primary role in the last decade has been as a way of delivering applications to consumers.

The problem of the digital dark age is not so much technological obsolescence as the fact that work no longer leaves documents behind as by-products. That is not something new kinds of archival documents can solve. In the records management community, documents must have or carry their own context, which allows past behavior to be audited by examining the documents a process left behind.

We need better ways of straddling the document and data worlds, so that data archives are produced in a form that allows them to be audited, redacted and processed without requiring all of the original context. Most data channels are, for efficiency reasons, context-free. Metadata (embedded or supplemental) is a document-centric way of supplying context, but it isn't enough.