April 9, 2019

The Paperless Office and the Horseless Carriage

Topic: preservation

You know the story of the horseless carriage with a buggy-whip holder (in case you needed to put a horse in front of it). But what we wound up with is a wide variety of forms: motorcycle, automobile, tank, train, etc. The transition was accompanied by corresponding developments in infrastructure.

When Xerox started its Palo Alto Research Center nearly 50 years ago, part of its mission was the Paperless Office -- a world where work was done without the use of paper.

We're still in the middle of a long transition to paperless processes in billing, statements, advertising, news, receipts, insurance, medical records, government, textbooks, and fiction, with documents still playing a major role in data-based activities. In most cases, these processes are moving from paper documents to electronic ones, with PDF being an important carrier because of its ability to straddle the divide between the paper and electronic worlds.

In many of these processes, we're only seeing the beginnings of a transition to another phase, that of data connections, where electronic documents and email are being supplanted by shared data, and documents are generated only on demand by those outside the data-centric roles.

HTML may have been originally designed as a document format for scientific papers at CERN, but its primary thrust in the last decade has been as a way of delivering applications to consumers.

The problem of the digital dark age is not so much technological obsolescence as it is that the work done leaves no document by-products; this is not something that can be solved by new kinds of archival documents. In the records management community, documents must have or carry their own context, which allows past behavior to be audited by examining the documents the process left behind.

We need better ways of straddling the document and data worlds, so that data archives are produced in a way that allows them to be audited, redacted, and processed without having all of the original context. Most data channels are, for efficiency reasons, context free. Metadata (embedded or supplemental) is a document-centric way of supplying context, but it isn't enough.


3 comments:

  1. Auditability can be supplied by keeping a private Merkle tree or Merkle chain (e.g., a blockchain) and periodically publishing hashes publicly (to verify that what I claim as history today can't be changed).

    Redactability can be either baked-in at publish-time, or must function as a view filter.

    Processing, and keeping context - This may be the most difficult part, because worst case it means archiving the (history of the) entire system that produced the data, to be able to re-place the data in context. Best case, given a well-known problem space, a fully reflective, self-describing document format should be okay.

  2. Sure you can create a document to be a record of an event, but unless the document is actually used in the work, there's no reason to believe that the document reflects the event.

    Redaction of the external presentation doesn't redact the data if it is replicated.

  3. Right, if your sources are indelible "semantic logs" (I'm including arbitrary graphs here), then you can create whatever "documents" you want -- they are just temporary artifacts generated for presentation, not the primary representation of ground truth.

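
The hash-chain idea raised in the first comment can be illustrated with a small sketch. This is only one possible reading of that suggestion, not a recommended design: each log entry's hash covers the previous entry's hash, so publishing the latest hash commits to the whole history before it. The names `GENESIS`, `entry_hash`, `append`, and `verify` are hypothetical, and the choice of SHA-256 over canonical JSON is an assumption made for illustration.

```python
import hashlib
import json

# Assumed placeholder standing in for "the hash before the first entry".
GENESIS = "0" * 64


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    # Sorted keys and fixed separators give a stable serialization.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list, record: dict) -> str:
    """Add a record to the private log and return the new head hash."""
    prev = log[-1]["hash"] if log else GENESIS
    head = entry_hash(prev, record)
    log.append({"record": record, "hash": head})
    return head


def verify(log: list, published_head: str) -> bool:
    """Recompute the chain and compare it with a previously published hash."""
    prev = GENESIS
    for entry in log:
        if entry_hash(prev, entry["record"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return prev == published_head
```

In this sketch only the head hash would be published; the records themselves stay private. Anyone later shown the log can rerun `verify` against the published value, and an edit to any earlier record makes the check fail. Note that this addresses only auditability, not the questions of context or redaction raised in the post and the other comments.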