Masinter's Musings: 09/2016

One of the things I've been doing lately is trying to dampen some of the misconceptions and misdirections concerning the Portable Document Format (PDF). I'm not sure why, except people seem to forget what was good about it and what wasn't (isn't).

Some background about PDF

Everyone has heard of PDF, but I'm not sure there's widespread understanding of its role and history. The Wikipedia article on PDF isn't too bad: page-independent data structures, but based on PostScript; a way of getting licenses to embed fonts; first released in 1993. It started as a "distill" of a printed page, and over the years features were added: forms, 3D, compression, reflow, accessibility.

PDF is over 20 years old ... "as old as the Web" (I first heard about it at GopherCon '93). Has it run its course? Is it time for something new? But PDF supplies a unique solution for an application that spans the gap between paper and electronic documents, with assurances of fidelity: if I send you a document, and you say you got it and read it (using a conforming reader), then I know what you saw.

MIME types

In email and on the web, file types are labeled with a two-part label, like text/html, image/jpeg, application/pdf. This "Internet Media Type" is (supposed to be) used in Content-Type headers in email and on the web as a way for the sender to say how a receiver should interpret the content (except for "sniffing", but that's another blog post).
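To make the two-part label concrete, here is a minimal sketch in Python of pulling the type and subtype out of a Content-Type header value. It leans on the standard library's email package; the helper name media_type_parts is made up for this illustration, and the sketch ignores parameters and does no sniffing.

```python
from email.message import Message
from typing import Tuple


def media_type_parts(content_type_header: str) -> Tuple[str, str]:
    """Split a Content-Type header value into (type, subtype).

    media_type_parts is a hypothetical helper for illustration only.
    Parameters such as charset are dropped, and the result is lowercased.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    # The stdlib falls back to text/plain when the value is malformed.
    return msg.get_content_maintype(), msg.get_content_subtype()


if __name__ == "__main__":
    for header in ("application/pdf", "Text/HTML; charset=utf-8", "image/jpeg"):
        print(header, "->", media_type_parts(header))
```

The label names what the sender claims the content is; whether the receiver believes it is a separate matter.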

There's an official list of media types managed by IANA (in the news lately for other reasons, another blog post). IANA, Internet Assigned Numbers[sic] Authority, is in charge of maintaining the registries, as directed by the IETF.

IETF has a different decision-making process than other standards groups. However, as usual, the process involves creating and distributing a draft, and asking for comments. Comments need to be responded to, even if you don't make changes because of the comment. Different kinds of documents have different criteria for advancement, and it's sometimes hard to figure out what rules apply.

Getting draft-hardy-pdf-mime to RFC

I got into updating the registration of PDF a while back, while working on "PDF for RFCs", and, after consultation, took the path of revising RFC 3778, the RFC that authorized the current registration, in the form of a document that replaces 3778 and includes the registration template for application/pdf. That's the document I'm trying to get passed.

There were lots of comments during the review period, and I responded to most of them last week, in a single email.

Which process?

I won't go into the detailed rules, but the path we chose involved getting IESG approval for an Informational specification, one of the paths described in RFC 6838, which lays out the rules for the IANA media type registry.

But which rules apply? RFC 6838 Section 3.1 for types "registered by a recognized standards-related organization using the 'Specification Required' IANA registration policy [RFC5226]"? Or do we follow Section 6.1, "in cases where the original definition of the scheme is contained in an IESG-approved document, updates of the specification also requires IESG approval."?

And does the "DISCUSS" laid on the document's approval meet any of the criteria of the rules for a DISCUSS?

But I'd like to accommodate the common request that the document say more about security of PDF software. It's well-known that PDF has been a vector for several infamous exploits... why can't we say more?

"Security Considerations"

IETF has an unusual policy of requiring ALL documents (https://tools.ietf.org/html/rfc3552 Section 5) to consider security and document threats and possible mitigations. ISO has no such rule; security is considered the responsibility of the implementation. W3C nominally does through TAG review, I think, but WHATWG is more haphazard. The question remains: does conforming to the specification require an implementation to expose the user to security risks?

I'm sure we could say more. And if this were a new registration or the PDF spec itself I'd try. But application/pdf has been around for over 20 years, and the exploits and their prevention have been publicized. Still, there is no single valid account of software vulnerabilities; the paper suggested (in a COMMENT, not a DISCUSS) isn't anything I could cite, because I disagree with too many parts of it.

I’ll go back to the question of the purpose of “Security Considerations” in MIME registrations; for whom should it be written? For a novice, it is not enough. For an expert, you wind up enumerating the exploits that are understood and can be explained. The situation is fluid because the deployment of browser-based PDF interpreters is changing for desktop and mobile, and PDF is just another part of the web.

I agreed with the reasoning behind the requirement, that requiring everyone to write about Security might make them think a little more about security.

But I think there’s another view, that Security is a feature of the implementation. It’s the implementation’s job to mitigate vulnerabilities. So for any security problems, blame the implementation, not the protocol. And the implementors need to worry about not writing buggy code, not just about security per se.

And there is no point in saying “write your implementations carefully”, because there are so many ways to write software badly. Talking about the obvious easy-to-describe exploits isn’t really useful, because we know how to avoid those.

Now perhaps this is just "don't set a bad precedent". So maybe the clue is to follow text/html, and suggest that "entire novels" have been written about PDF security, but not here in the Internet Media Type registration.

September 10, 2016

IETF "Security Considerations" and PDF

Medley Interlisp Project, by Larry Masinter et al.