Showing posts with label IETF.

May 9, 2019

On the nature of hierarchical URLs

Topic: URL 

Dear Dr Masinter, 
I am a French software engineer and have a technical question about RFC 3986, Uniform Resource Identifier (URI): Generic Syntax. 
What do you mean exactly by “hierarchical data” in this paragraph? 
“The path component contains data, usually organized in hierarchical form, that, along with data in the non-hierarchical query component (Section 3.4), serves to identify a resource within the scope of the URI’s scheme and naming authority (if any).” 
The language "usually" in RFC 3986 indicates the text isn't normative, but rather explanatory. So the terms used may not be precise.

For arbitrary URIs, the interpretation of "/" and the hierarchy it establishes depends on the scheme. For http: and https: URIs, the interpretation depends on the server implementation. While many HTTP servers map the path hierarchy to a hierarchy of file names in the server's file system, such mappings are not usually 1-1, and might vary depending on server configuration, case sensitivity, or the interpretation of unnormalized Unicode strings. On the client side, the WHATWG URL spec defines how web clients parse and interpret URLs (which correspond roughly to RFC 3987 IRIs).
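To make that concrete, here is a minimal sketch (not from the original exchange; example.org is an invented host) of how a generic parser separates the path and query components. How the path segments are then interpreted is still up to the server.

```python
# A minimal sketch: splitting a URL into path and query with the standard library.
from urllib.parse import urlsplit, parse_qs

parts = urlsplit("https://example.org/animalia/mammalia/dog?view=summary")

print(parts.path)             # '/animalia/mammalia/dog'
print(parts.path.split("/"))  # ['', 'animalia', 'mammalia', 'dog']
print(parse_qs(parts.query))  # {'view': ['summary']}
```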

For an idea of what was intended, a search for "hierarchical data" yields a Wikipedia entry for "Hierarchical data model" that has relevant examples and shows that many data models are in use.
When I look at a directed graph of resources (where resources are the vertices and links between them are the edges), I don’t know how to decide that a particular link between two resources A and B is “hierarchical” or “non hierarchical”. Should it be just antisymmetric (A → B, but not B → A)? Should it be more restrictive, for instance the inclusion relationship (A ⊃ B)? Or another property?
The choice of data model is up to your application, and you can decide for each application which model matches your use.
For instance, which of the following directed graphs of resources should I consider as having “hierarchical” links and which should I consider as having “non hierarchical” links? 
1. animalia → mammalia → dog
2. Elisabeth II → Charles → William
3. 1989 → 01 → 31
4. Pink Floyd → The Dark Side of the Moon → Money
 
If we use antisymmetry as the criterion for being “hierarchical”, the links of all directed graphs will be “hierarchical”.
If we use inclusion as the criterion for being “hierarchical”, only the links of 1 will be “hierarchical”, since in 2 a child has two parents, in 3 a month is shared by every year and in 4 an album can be created by several artists.
 
This information is needed to know if we should use the path component: 
1. /animalia/mammalia/dog
2. /elisabeth-ii/charles/william
3. /1989/01/31
4. /pink-floyd/the-dark-side-of-the-moon/money
 
or the query component: 
1. ?kingdom=animalia&class=mammalia&species=dog
2. ?grandparent=elisabeth-ii&parent=charles&child=william
3. ?year=1989&month=01&day=31
4. ?artist=pink-floyd&album=the-dark-side-of-the-moon&song=money
 
to identify a resource. 
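As a rough sketch of the two alternatives (using case 4, with example.org as an invented authority, not part of the original question), both forms can carry the same three values; the path form relies on segment order, while the query form names each value explicitly.

```python
# A sketch of the two alternatives; example.org and the /songs path are invented.
from urllib.parse import quote, urlencode

values = {"artist": "pink-floyd",
          "album": "the-dark-side-of-the-moon",
          "song": "money"}

# Path form: the order of the segments carries the assumed hierarchy.
path_url = "https://example.org/" + "/".join(quote(v) for v in values.values())

# Query form: each value is named explicitly, and order doesn't matter.
query_url = "https://example.org/songs?" + urlencode(values)

print(path_url)   # https://example.org/pink-floyd/the-dark-side-of-the-moon/money
print(query_url)  # https://example.org/songs?artist=pink-floyd&album=...&song=money
```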
Which one of these you use depends not on one case but on the entire set of data you're trying to organize.

In case 1, if you're identifying properties of groups of organisms by their taxonomic designation, the namespace is by definition hierarchical, established and managed by complex rules to resolve non-hierarchical conflicts, overlaps and disagreements. But the taxonomy of ranks has 7 or 8 levels (depending on how you count), not 3, and your example says "dog": it's not clear whether you mean Canis lupus familiaris or the entire family Canidae. The inclusion relation is not hierarchical.

In case 2, you might have a hierarchy if you restricted the relationship to "heir apparent of".
In case 3, many use the hierarchy to organize the data such that 1989/01/31 identifies a resource for the 31st of January of the year 1989.
In case 4, many music organizers sort files by inventing a new "album artist" category, using "A, B, C" for albums with multiple artists and "Various Artists" for "albums" where each song has a different artist.

People often invent hierarchies as a way of managing access control. WebDAV includes operations based on hierarchies.

Sometimes what is desired is a mix of different data models, with the hierarchical model as a "default" view and the query parameters used to search the space and redirect to a canonical hierarchical URI.
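Here is a minimal sketch of that mixed model (the paths, mapping, and handler are invented for illustration): a search expressed in the query component is answered with a redirect to the canonical hierarchical URI.

```python
# A minimal sketch of the mixed model; names and mapping are invented.
from urllib.parse import parse_qs

CANONICAL = {
    ("pink-floyd", "money"): "/pink-floyd/the-dark-side-of-the-moon/money",
}

def handle(path, query):
    """Return a (status, location-or-body) pair for a request."""
    if path == "/search":
        args = {k: v[0] for k, v in parse_qs(query).items()}
        target = CANONICAL.get((args.get("artist"), args.get("song")))
        if target:
            return (302, target)         # redirect to the canonical hierarchical URI
        return (404, "no such resource")
    return (200, "resource at " + path)  # canonical hierarchical URIs served directly

print(handle("/search", "artist=pink-floyd&song=money"))  # (302, '/pink-floyd/...')
```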

If you're defining the space for an API, the JSON-HAL design pattern adds a layer of indirection by using link relations rather than URL patterns; this is especially useful when the URL syntax is chosen to suit high-performance, load-balanced servers rather than to reflect the data model.
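For illustration, a HAL-style representation of the case 4 "song" resource might look roughly like this (the link relation names and URLs are assumptions, not an actual API): clients follow the entries in "_links" rather than constructing URL paths themselves.

```python
# Illustration only: a HAL-style resource; link relations and URLs are assumed.
import json

song = {
    "_links": {
        "self":   {"href": "/pink-floyd/the-dark-side-of-the-moon/money"},
        "album":  {"href": "/pink-floyd/the-dark-side-of-the-moon"},
        "artist": {"href": "/pink-floyd"},
    },
    "title": "Money",
}
print(json.dumps(song, indent=2))
```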

 Yours faithfully,
XXXX XXXX (name redacted)

September 10, 2016

IETF "Security Considerations" and PDF

One of the things I've been doing lately is trying to dampen some of the misconceptions and misdirections concerning the Portable Document Format (PDF). I'm not sure why, except people seem to forget what was good about it and what wasn't (isn't).

Some background about PDF

Everyone has heard of PDF, but I'm not sure there's widespread understanding of its role and history. The Wikipedia entry on PDF isn't too bad: page-independent data structures, but based on PostScript; a way of getting licenses to embed fonts; first released in 1993. Originally a "distill" of a printed page, over the years features were added: forms, 3D, compression, reflow, accessibility.

PDF is over 20 years old ... "as old as the Web" -- I first heard about it at GopherCon '93. Has it run its course; is it time for something new? But PDF supplies a unique solution for an application that spans the world between paper and electronic documents, with assurances of fidelity: if I send you a document, and you say you got it and read it (using a conforming reader), then I know what you saw.

 

MIME types

In email and web, file types are labeled by a two part label, like text/html, image/jpeg, application/pdf. This "Internet Media Type" is (supposed to be) used in content-type headers in email and web as a way for the sender to say how a receiver should interpret the content (except for "sniffing" but that's another blog post).
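As a small illustration (not part of the registration story), Python's standard mimetypes module shows the kind of filename-to-media-type mapping involved, and the header a sender would emit for a PDF.

```python
# Illustration only: mapping file names to Internet Media Types, and the
# Content-Type header a sender would emit for a PDF.
import mimetypes

for name in ("index.html", "photo.jpeg", "report.pdf"):
    print(name, "->", mimetypes.guess_type(name)[0])
# index.html -> text/html
# photo.jpeg -> image/jpeg
# report.pdf -> application/pdf

print("Content-Type: application/pdf")
```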

There's an official list of media types managed by IANA (in the news lately for other reasons, another blog post). IANA, the Internet Assigned Numbers Authority, is in charge of maintaining the registries, as directed by the IETF.

IETF has a different decision-making process than other standards groups. However, as usual, the process involves creating and distributing a draft, and asking for comments. Comments need to be responded to, even if you don't make changes because of the comment. Different kinds of documents have different criteria for advancement, and it's sometimes hard to figure out what rules apply.

 

Getting draft-hardy-pdf-mime to RFC

I got into updating the registration of PDF a while back, while working on "PDF for RFCs", and, after consultation, took the path of revising the RFC which authorized the current registration, RFC 3778, in the form of a document that replaced 3778, including the registration template for application/pdf. That's the document I'm trying to get passed.

There were lots of comments during the review period, and I responded to most of them last week, in a single email.

Which process?

 I won't go into the detailed rules, but the path we chose involved getting IESG approval for an Informational specification, one of the paths laid out in RFC 6838, which lays out the rules for the IANA media type registry.

But which rules apply?  RFC 6838 Section 3.1 for types "registered by a recognized standards-related organization using the 'Specification Required' IANA registration policy [RFC5226]"? Or do we follow Section 6.1, "in cases where the original definition of the scheme is contained in an IESG-approved document, updates of the specification also requires IESG approval."?

And does the "DISCUSS" laid on the document's approval meet any of the criteria of the rules for a DISCUSS?

But I'd like to accommodate the  common request that the document say more about security of PDF software. It's well-known that PDF has been a vector for several infamous exploits... why can't we say more?

"Security Considerations"

IETF has an unusual policy of requiring ALL documents (https://tools.ietf.org/html/rfc3552 Section 5) to consider security and document threats and possible mitigations. ISO has no such rule; security is considered the responsibility of the implementation. W3C nominally does, through TAG review, I think, but WHATWG is more haphazard. The question remains: does conforming to the specification require an implementation to expose the user to security risks?

I'm sure we could say more. And if this were a new registration or the PDF spec itself, I'd try. But application/pdf has been around for over 20 years, and the exploits and their prevention have been publicized.
There is no single valid account of software vulnerabilities; the paper suggested (in a COMMENT, not a DISCUSS) isn't anything I could cite; I disagree with too many parts.

I’ll go back to the question of the purpose of “Security Considerations” in MIME registrations; for whom should it be written?  For a novice, it is not enough. For an expert, you wind up enumerating the exploits that are understood and can be explained. The situation is fluid because the deployment of browser-based PDF interpreters is changing for desktop and mobile, and PDF is just another part of the web.

I agreed with the reasoning behind the requirement, that requiring everyone to write about Security might make them think a little more about security.

But I think there’s another view, that  Security is a feature of the implementation. It’s the implementation’s job to mitigate vulnerabilities. So any security problems, blame the implementation, not the protocol.  And the implementors need to worry about not writing buggy code, not just about security per se.

And there is no point in saying "write your implementations carefully", because there are so many ways to write software badly. Talking about the obvious, easy-to-describe exploits isn't really useful, because we know how to avoid those.

Now perhaps this is just "don't set a bad precedent". So maybe the clue is to follow text/html, and suggest that "entire novels" have been written about PDF security, but not here in the Internet Media Type registration.

February 15, 2015

Birthday greetings, Packed committees, community, standards

Today is my birthday. I woke to find many birthday greetings on Facebook, and more roll in throughout the day. It's hard to admit how pleasing it is, embarrassing. I haven't asked for birthday greetings and don't usually give them. Maybe I'll change my mind.
Perhaps I'm late to the party but I'm still trying to understand the 'why' of social networking -- why does Facebook encourage birthday greetings? What human pleasure does getting 11 "happy birthday" notes trigger?
But it fits into the need to have and build community, and the mechanism for community requires periodic acknowledgement. We engage in sharing our humanity (everyone has a birthday) by greeting. Hello, goodbye, I'm here, poke. But not too often, once a year is enough.
I wrote about standards and community yesterday on the IETF list,  but people didn't get it.
Explaining that message and its relationship to birthday greetings is hard.
The topic of discussion was "Updating BCP 10 -- NomCom ELEGIBILITY".
  • IETF : group that does Internet Standards
  • BCP10: the unique process for how IETF recruits and picks leadership
  • NOMCOM: the "Nominating Committee" which picks leadership amongst volunteers
  • eligibility: qualifications for getting on the NOMCOM
I think BCP10 is a remarkable piece of social engineering, at the center of the question of governance of the Internet: how to make hard decisions, who has final authority, who gets to choose them, and how to choose the choosers.   Most standards groups get their authority from governments and treaties or are consortia. But IETF developed a complex algorithm for trying to create structure without any other final authority to resolve disputes.

But it looks like this complex algorithm is buggy, and the IETF is trying to debug the process without being too open about the problem. The idea was to let people volunteer, and to choose randomly among qualified volunteers. But what qualifications? There's been some concern about the latest round of nomcom volunteers; that's what started this thread.

During the long email thread on the topic, the discussion turned to the tradeoffs between attending a meeting in person vs. using new Internet tools for virtual meetings or more support for remote participation.  Various people noted that the advantage of meeting in person is the ability to have conversations in the hallways outside the formal, minuted meetings. 

I thought people were too focused on their personal preferences rather than the needs of the community. What are we trying to accomplish, and how do meetings help with that? How would we satisfy the requirements for effective work?

A few more bits: I mention some of the conflicts between IETF and other standards groups over URLs and JSON because W3C, WHATWG, ECMA are different tribes, different communities.

Creating effective standards is a community activity to avoid the Tragedy of the Commons that would result if individuals and organizations all went their own way. The common good is “the Internet works consistently for everyone” which needs to compete against “enough of the Internet works ok for my friends” where everyone has different friends.
For voluntary standards to happen, you need rough consensus — enough people agree to force the remainder to go along. 
It’s a community activity, and for that to work there has to be a sense of community. And video links with remote participation aren’t enough to create a sense of community. 
There are groups that purport to manage with minimal face-to-face meetings, but I think those mainly have a narrow scope and a small number of relevant players, or an already established community, and they rely heavily on 24/7 online chat, social media, open source tools, and wikis, which become requirements for full participation.

The “hallway conversations” are not a nice-to-have, they’re how the IETF preserves community with open participation.
One negative aspect of IETF “culture” (loosely, the way in which the IETF community interacts) is that it doesn’t make it friendly or easy to coordinate and negotiate with other SDOs, so we see the unnecessary WHATWG / W3C / IETF forking of URL / URI / IRI, encodings, and MIME sniffing, and the competing JSON specs (RFC 7159 and ECMA's), based at least partly on cultural misunderstandings.
The main thing nomcom needs to select for is technical leadership (the skill of getting people to follow) in service of the common good. And nomcom members should have enough experience to have witnessed successful leadership. One hopes there might be some chance of that just by attending 3 meetings, although the most effective leadership is often exercised in those private hallway conversations where compromises are made.




September 14, 2014

Living Standards: "Who Needs IANA?"

I'm reading about two tussles, which seem completely disconnected, although they are about the same thing, and I'm puzzled why there isn't a connection.

This is about the IANA protocol parameter registries. Over in ianaplan@ietf.org people are worrying about preserving the IANA function and the relationship between IETF and IANA, because it is working well and shouldn't be disturbed (by misplaced US political maneuvering claiming that the long-planned transition away from NTIA oversight somehow amounts to the administration giving something away).

Meanwhile, over in www-international@w3.org, there's a discussion of the Encodings document, being copied from WHATWG's document of that name into a W3C Recommendation. See the thread (started by me) about the "false statement".

Living Standards don't need or want registries for most things the web uses registries for now: encodings, MIME types, URL schemes. A Living Standard has an exhaustive list, and if you want to add a new one or change one, you just change the standard. Who needs IANA with its fussy separate set of rules? Who needs any registry, really?
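To illustrate the contrast, here is a tiny, non-exhaustive excerpt of the label table that the WHATWG Encoding Living Standard carries in the spec itself (the helper function is mine, for illustration): in that model, "registering" a new label just means editing the standard, not consulting a separately maintained IANA list.

```python
# Illustration only: a small excerpt of the Encoding Standard's label table.
LABELS = {
    "utf-8": "UTF-8",
    "unicode-1-1-utf-8": "UTF-8",
    "latin1": "windows-1252",
    "iso-8859-1": "windows-1252",
}

def encoding_for(label):
    # The whole mapping lives in the standard; no external registry lookup.
    return LABELS[label.strip().lower()]

print(encoding_for("Latin1"))  # windows-1252
```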

So that's the contradiction: why doesn't the web need registries while other applications do? Or is IANAPLAN deluded?

September 9, 2014

The multipart/form-data mess

OK, this is only a tiny mess, in comparison with the URL mess,  and I have more hope for this one.

Way back when (1995), I spec'ed a way of doing "file upload" in RFC1867. I got into this because some Xerox printing product in the 90s wanted it, and enough other folks in the web community seemed to want it too. I was happy to find something that a Xerox product actually wanted from Xerox research.

It seemed natural, if you were sending files, to use MIME's methods for doing so, in the hopes that the design constraints were similar and that implementors would already be familiar with email MIME implementations.  The original file upload spec was done in IETF because at the time, all of the web, including HTML, was being standardized in the IETF.   RFC 1867 was "experimental," which in IETF used to be one way of floating a proposal for new stuff without having to declare it ready.

After some experimentation we wanted to move the spec toward standardization. Part of the process of making the proposal standard was to modularize the specification, so that it wasn't just about uploading files in web pages.   Rather, all the stuff about extending forms and names of form fields and so forth went with HTML. And the container, the holder of "form data"-- independent of what kind of form you had or whether it had any files at all -- went into the definition of multipart/form-data (in RFC2388).   Now, I don't know if it was "theoretical purity" or just some sense of building things that are general purpose to allow unintended mash-ups, but RFC2388 was pretty general, and HTML 3.2 and HTML 4.0 were being developed by people who were more interested in spec-ing a markup language than a form processing application, so there was a specification gap between RFC 2388 and HTML 4.0 about when and how and what browsers were supposed to do to process a form and produce multipart/form-data.
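For readers who haven't looked inside one, here is roughly what a multipart/form-data body looks like for a form with one text field and one uploaded file, in the style of RFC 2388 and its successors (the boundary, field names, and file contents are invented for illustration).

```python
# A rough sketch of a multipart/form-data body; boundary and fields are invented.
boundary = "----illustrative-boundary-1867"
CRLF = "\r\n"

body = CRLF.join([
    "--" + boundary,
    'Content-Disposition: form-data; name="description"',
    "",
    "quarterly report",
    "--" + boundary,
    'Content-Disposition: form-data; name="upload"; filename="report.pdf"',
    "Content-Type: application/pdf",
    "",
    "%PDF-1.4 ... (file bytes) ...",
    "--" + boundary + "--",
    "",
])

print("Content-Type: multipart/form-data; boundary=" + boundary)
print(body)
```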

February of last year (2013) I got a request to find someone to update RFC 2388. After many months of trying to find another volunteer (most declined because of lack of time to deal with the politics) I went ahead and started work: update the spec, investigate what browsers did, make some known changes.  See GitHub repo for multipart/form-data and the latest Internet Draft spec.

Now, I admit I got distracted trying to build a test framework for a "test the web forward" kind of automated test, and spent way too much time building what wound up being a fairly arcane system. But I've updated the document and recommended its "working group last call". The only problem is that I just made stuff up based on some unvalidated guesswork reported second hand ... there is no working group of people willing to do work. No browser implementor has reviewed the latest drafts, as far as I can tell.

I'm not sure what it takes to actually get technical reviewers who will actually read the document and compare it to one or more implementations to justify the changes in the draft.

Go to it! Review the spec! Make concrete suggestions for change, comments or even better, send GitHub pull requests!



September 6, 2013

HTTP meeting in Hamburg

I was going to do a trip report about the HTTPbis meeting August 5-7 at the Adobe Hamburg office, but wound up writing up a longer essay about HTTP/2.0 (which I will post soon, promise.) So, to post the photo:

It was great to have so many knowledgeable implementors working on live interoperability: 30 people from around the industry and around the world came, including participants from Adobe, Akamai, Canon, Google, Microsoft, Mozilla, Twitter, and many others representing browsers, servers, proxies and other intermediaries.
It's good that the standard's development is being driven by implementation and testing. While testing across the Internet is feasible, meeting face-to-face helped with establishing coordination on the standard.
I do have some concerns about things that might go wrong, which I'll also post soon.

Medley Interlisp Project, by Larry Masinter et al.

I haven't been blogging -- most of my focus has been on Medley Interlisp. Tell me what you think!