Showing posts with label PDF. Show all posts

April 9, 2019

The Paperless Office and the Horseless Carriage

Topic: preservation

You know the story of the horseless carriage with a buggy-whip holder (in case you needed to put a horse in front of it). But what we wound up with is a wide variety of forms: motorcycle, automobile, tank, train, etc.  The transition was accompanied by corresponding developments in infrastructure.

When Xerox started its Palo Alto Research Center nearly 50 years ago, part of its mission was the Paperless Office -- a world where work was done without the use of paper.

We're still in the middle of a long transition to paperless processes for billing, statements, advertising, news, receipts, insurance, medical records, government, textbooks, and fiction, and documents still play a major role in data-based activities. In most cases, these processes have moved from paper documents to electronic ones, with PDF being an important carrier because of its ability to straddle the divide between the paper and electronic worlds.

In many of these processes, we're only seeing the beginnings of a transition to another phase, that of data connections, where electronic documents and email are being supplanted by shared data, and documents are generated only on demand for those who are outside the data-centric roles.

HTML may have been originally designed as a document format for scientific papers at CERN, but its primary thrust in the last decade has been as a way of delivering applications to consumers.

The problem of the digital dark age is not so much technological obsolescence as it is that there are no document by-products of work done; this is not something that can be solved by new kinds of archival documents. In the records management community, documents must have or carry their own context which allows auditing of past behavior by examining the documents the process left behind.

We need better ways of straddling the document and data worlds, so that data archives are produced in a way that allows them to be audited, redacted, and processed without having all of the original context. Most data channels are, for efficiency reasons, context free. Metadata (embedded or supplemental) is a document-centric way of supplying context, but it isn't enough.


September 10, 2016

IETF "Security Considerations" and PDF

One of the things I've been doing lately is trying to dampen some of the misconceptions and misdirections concerning the Portable Document Format (PDF). I'm not sure why, except people seem to forget what was good about it and what wasn't (isn't).

Some background about PDF

Everyone has heard of PDF, but I'm not sure there's widespread understanding of its role and history. The Wikipedia article on PDF isn't too bad: page-independent data structures but based on PostScript, a way of getting licenses to embed fonts, first released in 1993. Originally a "distill" of a printed page, PDF gained features over the years: forms, 3D, compression, reflow, accessibility.

PDF is over 20 years old ... "as old as the Web"-- I first heard about it at GopherCon '93. Has it run its course, time for something new? But PDF supplies a unique solution for an application that spans the work between paper documents and electronic, and assurances of fidelity: if I send you a document, and you say you got it and read it (using a conforming reader) then I know what you saw.

 

MIME types

In email and web, file types are labeled by a two part label, like text/html, image/jpeg, application/pdf. This "Internet Media Type" is (supposed to be) used in content-type headers in email and web as a way for the sender to say how a receiver should interpret the content (except for "sniffing" but that's another blog post).
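As a small illustration of the two-part label, here is a minimal sketch in Python that splits a content-type header value into its type and subtype using the standard library's email package. The helper name split_media_type is made up for this example, and the sample header values are just the ones mentioned above; this says nothing about how any particular mail client or browser actually does it.

```python
from email.message import EmailMessage
from typing import Tuple


def split_media_type(content_type_value: str) -> Tuple[str, str]:
    """Return (type, subtype) for a Content-Type header value.

    split_media_type is a hypothetical helper for illustration only.
    Parameters such as charset are ignored here.
    """
    msg = EmailMessage()
    msg["Content-Type"] = content_type_value
    # The standard library lowercases and splits the label for us.
    return msg.get_content_maintype(), msg.get_content_subtype()


if __name__ == "__main__":
    for value in ("text/html; charset=utf-8", "image/jpeg", "application/pdf"):
        print(value, "->", split_media_type(value))
```

The label is only the sender's claim about the content; the receiver still has to decide whether to trust it.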

There's an official list of media types managed by IANA (in the news lately for other reasons, another blog post). IANA, the Internet Assigned Numbers[sic] Authority, is in charge of maintaining the registries, as directed by the IETF.

IETF has a different decision-making process than other standards groups. However, as usual, the process involves creating and distributing a draft, and asking for comments. Comments need to be responded to, even if you don't make changes because of the comment. Different kinds of documents have different criteria for advancement, and it's sometimes hard to figure out what rules apply.

 

Getting draft-hardy-pdf-mime to RFC

I got into updating the registration of PDF a while back, while working on "PDF for RFCs". After consultation, I took the path of revising the RFC which authorized the current registration, RFC 3778, in the form of a document that replaces 3778 and includes the registration template for application/pdf. That's the document I'm trying to get passed.

There were lots of comments during the review period, and I responded to most of them last week, in a single email.

Which process?

I won't go into the detailed rules, but the path we chose involved getting IESG approval for an Informational specification, one of the paths laid out in RFC 6838, which defines the rules for the IANA media type registry.

But which rules apply?  RFC 6838 Section 3.1 for types "registered by a recognized standards-related organization using the 'Specification Required' IANA registration policy [RFC5226]"? Or do we follow Section 6.1, "in cases where the original definition of the scheme is contained in an IESG-approved document, updates of the specification also requires IESG approval."?

And does the "DISCUSS" laid on the document's approval meet any of the criteria of the rules for a DISCUSS?

But I'd like to accommodate the  common request that the document say more about security of PDF software. It's well-known that PDF has been a vector for several infamous exploits... why can't we say more?

"Security Considerations"

IETF has an unusual policy of requiring ALL documents (https://tools.ietf.org/html/rfc3552 Section 5) to consider security, and to document threats and possible mitigations. ISO has no such rule; security is considered the responsibility of the implementation. W3C nominally does through TAG review, I think, but WHATWG is more haphazard. The question remains: does conforming to the specification require that an implementation expose the user to security risks?

I'm sure we could say more. And if this were a new registration or the PDF spec itself I'd try. But application/pdf has been around over 20 years, and the exploits and their prevention have been publicized. There is no single valid account of software vulnerabilities, though; the paper suggested (in a COMMENT, not a DISCUSS) isn't anything I could cite, because I disagree with too many parts.

I’ll go back to the question of the purpose of “Security Considerations” in MIME registrations; for whom should it be written?  For a novice, it is not enough. For an expert, you wind up enumerating the exploits that are understood and can be explained. The situation is fluid because the deployment of browser-based PDF interpreters is changing for desktop and mobile, and PDF is just another part of the web.

I agreed with the reasoning behind the requirement, that requiring everyone to write about Security might make them think a little more about security.

But I think there’s another view, that security is a feature of the implementation. It’s the implementation’s job to mitigate vulnerabilities. So for any security problems, blame the implementation, not the protocol. And the implementors need to worry about not writing buggy code, not just about security per se.

And there is no point in saying “write your implementations carefully”, because there are so many ways to write software badly. Talking about the obvious easy-to-describe exploits isn’t really useful, because we know how to avoid those.

Now perhaps this is just "don't set a bad precedent". So maybe the clue is to follow text/html, and suggest that "entire novels" have been written about PDF security, but not here in the Internet Media Type registration.

Medley Interlisp Project, by Larry Masinter et al.

I haven't been blogging -- most of my focus has been on Medley Interlisp. Tell me what you think!