Masinter's Musings: 2011

In the late 80s, we saw the fall of AI and Expert Systems as a "hot" technology -- the "AI winter". The methodology, in brief: build a representation system (a way of talking about facts about the world) and an inference engine (a way of making logical inferences bet of a set of facts). Get experts to tell you facts about the world. Grind the inference engine, and get new facts. Voila!

I always felt that the problem with the methodology was the failure of model theory to scale: the more people and time involved in developing the "facts" about the world, the more likely it is that the terminology in the representation system would fuzz -- that different people involved in entering and maintaining the "knowledge base" would disagree about what the terms in the representation system stood for.

The "semantic web" chose to use URIs as the terminology for grounding abstract assertions and creating a model where those assertions were presumed to be about the real world.

This exacerbates the scalability problem. URIs are intrinsically ambiguous and were not designed to be precise denotation terms. The semantic web terminology of "definition" and "assignment" of URIs reflects a point of view I fundamentally disagree with. URIs don't "denote". People may use them to denote, but it is a communication act; the fact that I say by "http://larry.masinter.net" I mean *me* does not imbue that URI with any intrinsic semantics.

I've been trying to get at these issues around ambiguity with the "duri" and "tdb" URI schemes, for example, but I think the fundamental perspective still simmers.

html,html5,w3c,standards,open source

I've been meaning to post some version of this forever, and it's been getting in the way of me blogging more. So ... here goes... incomplete and warty as this post is.

I've come to think that many of these differences might be from a "implementation" vs. "specification" view, but I'll have to say more about that later....

The ongoing battle for future control over HTML is dominated not only by the usual forces ("whose technology wins?") but also some very polarized views of what standards are, what they should be, how standards should work and so forth. The debate over these prinicples has really slowed down the development of web standards. Many of these issues were also presented at the November 2010 W3C Technical Plenary in the "HTML Next" session.

I've written down some of these polarized viewpoints, as an extreme position and a counterposition.

Matching Reality:

Standards should be written to "match reality": the standard should follow what (some, all, most, the important, the open source) systems have implemented (or are willing to implement in the very near future.)
Standards should try to "lead reality": The standard should try to move things in directions that improve modularity, reliability, and other values.

Of course, having standards that do not "match reality" in the long run is not a good situation, but the question is whether backward compatibility with (admittedly buggy) implementations should dominate the discussion of "where standards should go". If new standards always match the "reality" of existing content and systems, then you could never add any features at all. But if you're willing to add new features, why not also try to 'fix' things that are misimplemented or done badly? There does need to be a transition plan (how to make changes in a way that doesn't break existing content or viewers), but that's often feasible.

Precision:

Standards should precisely specify behavior, and give sufficient details for how to implement something "compatible" with the what is currently deployed, sufficiently that no user will complain that some implementation doesn't work "the same". Such behavior MUST be mandated by the standard.
Standards should minimize the compliance requirements to allow widest possible range of implementations; "interoperability" doesn't necessarily mean that even badly written web pages must be supported. Conformance ("MUST") should be used very sparingly.

Personally, I'm more on the "blue" side: the more precisely behavior is specified, the narrower the applicability of the standard. There's a tradeoff, but it seems better to err on the side of under- rather than over-specifying, if a standard is going to have a long-term value. If a subset of implementations want a more precise guideline, doing so could be in a separate implementation guide or profile.

Leading:

Standards should lead the community and add exciting new features. New features should ideally appear first in the standard.
Standards should follow innovative practice only after wide experience with technology. Sample implementations should be widely reviwed and tested; only after wide experience with technology should it be added to the standard.

In general standards should follow innovation. Refinements during the standardization phase might be seen as "leading", in order to satisfy the broader requirements brought to bear as the standard gets reviewed. There's a compromise, but looking for innovation from a committee.... well, we all know about "design by committee".

Extensibility:

Non-standard extensions should be avoided. Ideally, we should eliminate any non-standard extensions; everyone's experience should be the same.
Non-standard extensions are valuable. Innovations have (and will continue to) come from competing (non-standard) extensions, including plugins. Not all plugins are universally deployed; sites can choose to use non-standard extensions if they want.

In the past, plugins and other non-standard extensions have fueled new features; why should this trend stop? There are trade-offs, but moves to eliminate non-standard extensions or make them less viable are conter-productive.

Modularity:

Modularity is disruptive. Independent evolution of components leads to divergence and confusion. Independent committees go their own way. Subsets just mean unwanted choices and chaos.
Modularity is valuable. Specifying technology into smaller separate parts is beneficial: the ability to choose subsets extends the range of applications; modules can evolve independently.

Modularity is important, but it has to be done "right". Architecture recapitulations organizational structure; separate committes with independent specs requires a great deal of good-faith effort to coordinate, and there's not a lot of "good faith" going around.

Timely:

Standards take too long, move faster. Implementing and shipping the latest proposal is a good way to validate proposed standards and get technology in the hands of users. Standards that take years aren't interesting.
Encouraging users to deploy experimental extensions before they are completed will cause fragmentation, because not all experiments succeed.

The community can see innovation pretty quickly, but good standards take time. I'd rather see experimental features as "proposals" rather than passed around as "the standard" misleadingly.

Web Content Authors Ignore Standards:

Web authors don't care about standards. Most individual authors, designers, developers and content providers ignore standards anyway, so any efforts based on assuming authors will change isn't helpful.
Influencing authors is possible. Authors can and will adopt standards if popular browsers tie new features to standard-conforming content.

I'm not convincued that influencing content authors is impossible. Doing so requires some agreement from "leading implementors" to give authors sufficient feedback to make them care, but this isn't impossible. It's happened in other standards when it was important.

Versionless Standards and Always On Committee:

Standards committees should be chartered to work forever, because the technology needs to evolve continuously. A stable "standard" is just a meaningless snapshot. Standard committees should be "always on", to allow for rapid evolution. The notion of "version numbers" for standards is obsolete in a world where there are continual improvements.
Standards should be stable. Continual innovation is good for technology suppliers, but bad for standards; evolution should be handled by allowing individual technology providers to innovate, and then to bring these innovations into standards in specific versions.

We shouldn't guarantee "lifetime employement for standards writers". A stable document should have a long lifetime, not subject to constant revision. If we're not ready to settle on a feature, it should likely move into a separate document and be designed as a (perhaps proprietary) extension. An "always on" committee is more likely to concentrate power in the few who can afford to commit resources, independently of how deeply they are affected by changes.

Open Source:

Standards should always have an open source implementation. Allowing any company or software developer to provide their own private extensions is harmful; a content standard should be managed by the group of major (or major open source) implementors, so that any "standard" extension is available to all.
Open source is useful but unnecessary. Proprietary extensions and capabilities (originally from a single source or a consortium) have benefited the web in the past and will continue to be sources of innovation. While "open source" may be beneficial, not everything will or can be open source.

Working on open source implementations can go hand in hand with working with standards. However, a standard is very different from open source software. In the end, users care about compatibility of a wide variety of implementations. We shouldn't guarantee "lifetime employement for standards writers".

The "Web" is defined by "What Browsers Do":

The web is first and foremost “what browsers do”, and secondly a source of "web applications" technology (browser technology used for installable applications)
Other needs can dominate browser needs Web technologies extend to the widest range of Internet applications, including email, instant messaging, news distribution, syndication and aggregation, help systems, electronic publishing; requirements of these applications should have equal weight, even when those requirements are meaningless for what “browsers” are used for.

Royalty Free:

Avoid all patented technology. Every component of a browser MUST be implementable without any restriction based on patents or copyright (although creation tools, search engines, analysis, translation gateways, traffic analysis may not be)
Patented technology has a place. In some cases, patented technology cannot be avoided, or is so widespread that “royalty free” is just one more requirement among many tradeoffs.

Forking:

Forking a spec allows innovation. Having multiple specifications which offer different definitions same thing (such as HTML) allows leading features to be widely known and implemented, and allows group to work around organizational bottlenecks.
Forking a spec is harmful. Multiple specifications which claim to define the same thing is a power trip, causing confusion.

Accessibility:

Accessibility is just one of many requirements Accessibility is an important requirement for the web platform, but only one of many sets of requirements, to be traded off against the requirements of other user communities when developing standards
Accessibility is not an option. Insuring that those who deploy products implementing W3C standards allow building accessible content is necessary before W3C can endorse or recommend that standard.

Architecture:

Architecture is mainly theoretical; it is not a very useful concern; rather, invoking "architecture" is mainly a way of adding requirements that aren’t very useful.
Architecture and consistency is crucial. Consistency between components of the web architecture and guidelines for consistency and orthogonality are important enough that existing work should slow down to insure architectural consistency.

And a few other topics I ran out of time to elaborate:

Digital Rights Management: DRM is Evil? DRM is an Important feature?

Privacy: Up to browsers? Mandated in specs?

Voice: Integrated? Separate spec?

Applications: Great? Misuse: use Browser?

JavaScript: Essential, stable? Fundamentally broken?

Masinter's Musings

December 14, 2011

HTTP Status Cat: 418 - I'm a teapot

August 15, 2011

Expert System Scalability and the Semantic Web

August 7, 2011

Internet Privacy: TELLING a friend may mean telling THE ENEMY

July 16, 2011

Leadership: getting others to follow

June 26, 2011

Irreconcilable differences

Medley Interlisp Project, by Larry Masinter et al.

Links