Masinter's Musings

Medley Interlisp Project, by Larry Masinter et al.

2023-04-08T18:42:00.000-07:00

I haven't been blogging -- most of my focus has been on Medley Interlisp.

Tell me what you think!

2021-12-05T18:00:00.000-08:00

A year and a half into the project, I'm still having fun at Interlisp.Org.

One of my favorite hacks: On April 1 some spring in the late 70s I changed the site GREET file (which everyone used) to use "Screen Melt" as a screen saver. No one was amused.

Restoring Medley Interlisp: running well on modern systems

2020-08-11T12:13:00.004-07:00

at Interlisp.org

and on GitHub (and see issue list)

It's great to work with old friends, as if 30 years hadn't passed.

We have lispcore@googlegroups.com and interlisp@googlegroups.com mailing lists and weekly Zoom calls.

Using old code feels like driving a vintage car. The quirks are more noticeable. I'm hoping we can smooth some of the rough edges of using it on different platforms and also bring up some of the applications built in Interlisp.

Shown running on a $60 Raspberry Pi (lower left) about 50 times faster than a 1982 Dorado.

The Office of the Future: the Officeless Office

2020-07-05T18:37:00.004-07:00

The office of the future is becoming clear: it's "no office location". The economics of that are clear
(Economist video via YouTube).
Keeping expensive real estate and wasting time commuting to an office in a big building, to do a job on the off chance you might bump into someone, is a waste.

The Xerox Star system was a cornerstone of the vision of "the office of the future" but the next step after getting rid of paper is getting rid of "the office" completely.

The Epidemiology of Bad Ideas

2020-06-12T11:18:00.000-07:00

Lately there's been a lot of discussion about Section 230 and the responsibility (if there is any) of the distributor of user opinions. But this discussion is misdirected.

The problem the Internet brings is that it allows for unfettered spread of bad ideas and lies (attributable to some combination of stupidity and malice).

Think of a bad idea like a virus, fact checking and filtering like testing and quarantine, and Twitter and Facebook postings like mass spreading events. From that perspective, what we need is the equivalent of washing hands and social distancing.

Any site distributing an idea, a post, a share to more than N people should add some social distance -- a time delay (depending on reach), fact checking, annotating the post or adding a click-through.... sufficient filtering/delay to flatten the curve. How much of their business model depends on posts going viral QUICKLY?
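One way to picture "a time delay (depending on reach)" is a hold time that is zero below the threshold N and grows with each doubling of the audience beyond it. The sketch below is only an illustration: the function name, the default threshold of 1000 and the ten minutes per doubling are all assumptions, not figures from any real platform.

```python
import math


def distribution_delay_minutes(reach: int,
                               threshold_n: int = 1000,
                               minutes_per_doubling: float = 10.0) -> float:
    """Return a hypothetical hold time before a post reaches `reach` people.

    Posts at or below the threshold N go out immediately. Beyond it, every
    doubling of the audience adds a fixed number of minutes, so wide
    distribution is slowed while small conversations are untouched.
    Both default values are made up for illustration.
    """
    if reach <= threshold_n:
        return 0.0
    return minutes_per_doubling * math.log2(reach / threshold_n)


# A post shown to 500 people is not delayed; one headed for 8000 people
# (three doublings past N = 1000) waits three delay units.
small = distribution_delay_minutes(500)
large = distribution_delay_minutes(8000)
```

A logarithmic curve is just one choice; the point is only that the delay scales with reach rather than applying uniformly.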

Recently, Carol tried to read me an article about the latest outrageous behavior. But I protested: I'd developed antibodies that protect me from the awfulness. By continued exposure, nothing was that bad anymore, we had all seen more. And there are teams of people all working full time developing new outrageous things.

So bad ideas spread like viruses. Some people are able to develop a defense, through outrage. The epidemic is spread by "news" organizations and social media as contagious in thought as a mass event is for the virus.

Facebook and Twitter and Google and other social media sites need to implement thought-social distancing to reduce R(0) of a bad idea.

Facebook's ads promoting CDC would be OK if it weren't for the corruption of CDC management by the wave of bad ideas, spread by immune-compromised individuals (don't recognize ideas as bad) and allowed unchecked by those who exhibit no symptoms (outrage).

Building Going-Remote.Info

2020-04-18T21:45:00.000-07:00

In a sort of frenetic panic, I started the Covid-19 community group at W3C, with the idea of working together with others to help people "Going Remote". It seemed like an area that required people to jump in without a lot of guidance.

But the community group didn't quite jell because we didn't really agree on the scope of work.

So after considering lots of possibilities I decided to put up a stake for https://Going-Remote.info as a web site/wiki/discussion forum. This is old fashioned but I think it will appeal to the kind of community I think we can build.

Log into https://Going-Remote.info and check it out.

W3C: CoVid-19 Remote Meet, Work, Class Community Group created

2020-03-12T12:04:00.001-07:00

In the face of the covid-19 pandemic and travel restrictions, lots of meetings, schools, other kinds of events are trying to "go virtual". To help people understand their choices and other concerns, we started a "community group" at W3C.

Inviting experts in the current state of usable technology for remote meetings, classes, work from home, etc. to help put together best practices documents for the transition.

From the announcement:

The CoVid-19 Remote Meet, Work, Class Community Group has been launched:
http://www.w3.org/community/covid-19/

A clearinghouse for experience and guidelines for people who are suddenly called to avoid travel or meetings, work-at-home or do classes online. Focus on current capabilities and future needs.

To join:
http://www.w3.org/community/covid-19/join

If you do not have one already, you will need a W3C account to join:
http://www.w3.org/accounts/request

This is a community initiative. W3C's hosting of this group does not imply endorsement of the activities.

The group must now choose a chair:
http://www.w3.org/community/about/faq/#how-do-we-choose-a-chair

For more information about getting started in the new group, see:
http://www.w3.org/community/about/faq/#how-do-we-get-started-in-a-new-group

and good practice for running a group:
http://www.w3.org/community/about/good-practice-for-running-a-group/

We invite you to share news of this new group in social media and other channels.

Building AI with computation and data distributed using cpus of autonomous vehicles

2020-02-06T22:55:00.000-08:00

I've been trying to calculate how much compute power would be available for machine learning tasks if the entire fleet of ~1M Teslas were available (at some trickle usage of incremental battery use, or maybe just when driving) and compare that to what Amazon has with Echos (not anything like as powerful) or just spare load on Amazon / Apple / Google cloud platforms.

It might be a unique opportunity to develop distributed AI in an interesting computational base.
I'll update as I find out more.

I've been Wikipedia'd!

2019-12-08T17:25:00.000-08:00

At some point I had the silly idea that I should be listed in Wikipedia. Now like a monkey's paw, like Midas' Touch, I discovered it happened, my wish has turned into a curse. My Wikipedia page is full of nonsense. It is hard to find a sentence that doesn't have an error or two or three. And each error correction requires four things, including cite-able third party proof that the change is justified.

Is this typical? To see so many errors in Wikipedia articles?

All of the Interlisp work was at Xerox. Although I was listed as a student at Stanford and didn't get my PhD until 1980, I was working at Xerox full-time after 1976.
I had nothing to do with Interlisp-Jericho.
There wasn't a port of Interlisp to the VAX; there was an effort to build one, and I wrote a document trying to scope out how much work there was to be done. That document wasn't to "document the port".
My work at Stanford was on the Dendral project as an employee (my Alternative Service), not as a student. The program was in Lisp.
My work on document management was almost all at Xerox, not for Adobe. I didn't do "pioneering work on the PDF format" (for anyone).
I remained an employee of Xerox PARC, becoming a "Principal Scientist", but never had the title "Chief Scientist" and never reported to "Xerox AI Systems".
I wasn't "instrumental in the development of the PDF MIME type" (I helped publish it at best.)
My work on internet standards through IETF and W3C was over many years, between Xerox, AT&T Labs and Adobe. But it was mainly a volunteer effort on my part.
Internet standards are not published in "peer reviewed journals"; they are reviewed, but for different reasons than peer-reviewed journals.
I never worked on Apache. I never collaborated with Nick Kew or Kim Veltman or anyone else on any book.
The footnote references don't correspond very well to the topics discussed.

Why these odd anonymous comments?

2019-07-01T00:20:00.003-07:00

I've been getting comments on my blog (hosted by blogger) that puzzle me.

You have brought up a very wonderful details, thanks for the post
Appreciate it for helping out, excellent info
This site definitely has all the information I needed about this subject and didn't know who to ask.
Hi Dear, are you actually visiting this web page on a regular basis, if so afterward you will definitely obtain fastidious know-how.
Excellent post. I was checking continuously this blog and I am impressed! Extremely helpful info particularly the last part :) I care for such information much. I was seeking this certain info for a very long time. Thank you and best of luck.
You have made some good points there. I checked on the web for additional information about the issue and found most individuals will go along with your views on this site.
I am in fact pleased to glance at this webpage posts which contains lots of useful data, thanks for providing these kinds of statistics.

It wasn't until I had gotten 2 or 3 that I went from "Publish" to "Delete" to "Mark as spam" (the three options offered by blogger.)

The things that distinguish these comments:

They exist. I rarely get comments. These happen once or twice a month.
They always are Anonymous.
They always have bad English grammar.
They don't fit any known category of spam; not promoting anything or links anywhere.
They are flattering.
They contain nothing that would associate them with the content of the blog post they are commenting on.

Here are some theories:

This is a data communication method, using steganography based on grammatical or punctuation or word choices.
There is some grad student running experiments / training an AI etc. based on human spam detection.
Someone is running a background check for blogs that auto-publish anonymous comments.

Any better theories? Leave a comment 😀

On the nature of hierarchical URLs

2019-05-09T15:08:00.000-07:00

Topic: URL

Dear Dr Masinter,

I am a French software engineer and have a technical question about RFC 3986, Uniform Resource Identifier (URI): Generic Syntax.

What do you mean exactly by “hierarchical data” in this paragraph?

“The path component contains data, usually organized in hierarchical form, that, along with data in the non-hierarchical query component (Section 3.4), serves to identify a resource within the scope of the URI’s scheme and naming authority (if any).”

The language "usually" in RFC 3986 indicates the text isn't normative, but rather explanatory. So the terms used may not be precise.

For arbitrary URIs, interpretation of "/" and the hierarchy it established depends on the scheme. For http: and https: URIs, the interpretation depends on the server implementation. While many HTTP servers map the path hierarchy to hierarchy of file names in the server's file system, such mappings are not usually 1-1, and might vary depending on server configuration, case sensitivity or interpretation of unnormalized Unicode strings. For web clients, the WHATWG URL spec defines how web clients parse and interpret URLs (which correspond roughly to RFC 3987 IRIs).
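As a small illustration of the generic syntax only (not of how any particular server interprets it), Python's standard `urllib.parse` module can split a URI into its path segments and its query parameters. The example URI and the function name `describe` are made up for this sketch.

```python
from urllib.parse import urlsplit, parse_qs


def describe(uri: str):
    """Split a URI into hierarchical path segments and flat query parameters.

    This reflects only the generic RFC 3986 syntax; what the segments
    *mean* is still up to the scheme and the server.
    """
    parts = urlsplit(uri)
    # "/" separates the hierarchical segments of the path component.
    segments = [s for s in parts.path.split("/") if s]
    # The query component is a flat set of key/value pairs with no nesting.
    query = parse_qs(parts.query)
    return segments, query


segments, query = describe("https://example.org/animalia/mammalia/dog?lang=fr")
# segments is ['animalia', 'mammalia', 'dog'] and query is {'lang': ['fr']}
```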

For an idea of what was intended, a search for "hierarchical data" yields a Wikipedia entry for "Hierarchical data model" that has relevant examples and how there are many data models in use.

When I look at a directed graph of resources (where resources are the vertices and links between them are the edges), I don’t know how to decide that a particular link between two resources A and B is “hierarchical” or “non hierarchical”. Should it be just antisymmetric (A → B, but not B → A)? Should it be more restrictive, for instance the inclusion relationship (A ⊃ B)? Or another property?

The choice of data model is up to your application, and you can decide for each application which model matches your use.

For instance, which of the following directed graphs of resources should I consider as having “hierarchical” links and which should I consider as having “non hierarchical” links?

1. animalia → mammalia → dog
2. Elisabeth II → Charles → William
3. 1989 → 01 → 31
4. Pink Floyd → The Dark Side of the Moon → Money

If we use antisymmetry as the criterion for being “hierarchical”, the links of all directed graphs will be “hierarchical”.
If we use inclusion as the criterion for being “hierarchical”, only the links of 1 will be “hierarchical”, since in 2 a child has two parents, in 3 a month is shared by every year and in 4 an album can be created by several artists.

This information is needed to know if we should use the path component:

1. /animalia/mammalia/dog
2. /elisabeth-ii/charles/william
3. /1989/01/31
4. /pink-floyd/the-dark-side-of-the-moon/money

or the query component:

1. ?kingdom=animalia&class=mammalia&species=dog
2. ?grandparent=elisabeth-ii&parent=charles&child=william
3. ?year=1989&month=01&day=31
4. ?artist=pink-floyd&album=the-dark-side-of-the-moon&song=money

to identify a resource.

Which one of these you use depends on not one case but the entire set of data you're trying to organize.

In case 1, if you're identifying properties of groups of organisms by their taxonomic designation, the namespace is by definition hierarchical, established and managed by complex rules to resolve non-hierarchical conflicts, overlaps and disagreements. But the taxonomy of rankings has 7 or 8 levels (depending on how you count), not 3, and your example says "dog" and it's not clear if you mean Canis lupus familiaris or the entire family of Canidae. The inclusion relation is not hierarchical.

In case 2, you might have a hierarchy if you restricted the relationship to "heir apparent of".
In case 3, many use the hierarchy to organize the data such that 1989/01/31 is used to identify a resource from the 31st of January of the year 1989.
In case 4, many music organizers sort out files by inventing a new category of "artist" for the album-artist and using "A, B, C" for albums where there are multiple artists, and "Various Artists" for "albums" that have varieties of artists for each song.

People often invent hierarchies as a way of managing access control. WebDAV includes operations based on hierarchies.

Sometimes what is desired is a combination of different data models mixed, with the hierarchical model a "default" view, and the query parameters used to search the space and redirect to a canonical hierarchical URI.
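A minimal sketch of that combination, using the music example from the question: query parameters are treated as a search, and the server would redirect to the canonical hierarchical path built from them. The level names in `HIERARCHY` and the function name are assumptions for illustration, not part of any specification.

```python
from typing import Optional
from urllib.parse import parse_qs, quote

# Hypothetical ordering of levels for a music catalogue.
HIERARCHY = ("artist", "album", "song")


def canonical_path(query: str) -> Optional[str]:
    """Map a query string onto the canonical hierarchical path, if possible.

    Levels are consumed in order and the path stops at the first missing
    level, so "?artist=x&album=y" maps to "/x/y". Returns None when even
    the top level is absent.
    """
    params = parse_qs(query)
    segments = []
    for key in HIERARCHY:
        values = params.get(key)
        if not values:
            break
        # Percent-encode each value so it is safe as a single path segment.
        segments.append(quote(values[0], safe=""))
    return "/" + "/".join(segments) if segments else None


target = canonical_path("artist=pink-floyd&album=the-dark-side-of-the-moon&song=money")
# target is "/pink-floyd/the-dark-side-of-the-moon/money"; a server could
# answer the query URL with a redirect to this path.
```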

If you're defining the space for an API, the JSON-HAL design pattern adds a layer of indirection using link relations rather than URL patterns, especially in applications where the hierarchical pattern is used for high-performance web servers which use the URL syntax for optimized load-balanced servers with multiple patterns.

Yours faithfully,

XXXX XXXX (name redacted)

Using Secret Sharing for National Archives

2019-05-08T16:58:00.000-07:00

Topic: preservation

Reading about the National Archives budget woes: perhaps as digital material is becoming more prevalent, the volume of paper documents isn't growing?

What are the unique requirements of the National Archives for online storage of digital material?
Unlike most business archives, the lifetime of archived documents is measured in centuries.
The security of archives from both accidental and intentional loss is for that lifetime; unauthorized revelation must be prevented for at least decades.

For public records, LOCKSS (Lots of Copies Keeps Stuff Safe) might be a solution. Distribute copies to each state or region.

But for confidential records, the more copies, the more likely it is the information will be revealed.

But there is an approach worth investigating, using Secret Sharing where each State could maintain a separate secure facility under its own control.

This would allow some resilience to meddling, take-down, or premature release of information unless a large number of states agree.
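The post doesn't name a particular scheme. One well-known option is Shamir's threshold secret sharing, sketched below under several assumptions: the secret is a small integer (in practice it would more likely be an encryption key for the archived records), there are 50 share holders, and any 26 of them can reconstruct it. The function names, the prime and those numbers are all illustrative choices.

```python
import secrets

# A Mersenne prime; all arithmetic is done modulo this value, so the
# secret must be smaller than it.
PRIME = 2**127 - 1


def split_secret(secret: int, n_shares: int, threshold: int):
    """Split `secret` into n_shares points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= n_shares:
        raise ValueError("threshold must be between 1 and n_shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x: int) -> int:
        result = 0
        for c in reversed(coeffs):  # Horner's rule
            result = (result * x + c) % PRIME
        return result

    return [(x, poly(x)) for x in range(1, n_shares + 1)]


def recover_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 to get the secret back."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical setup: one share per state, a simple majority needed.
shares = split_secret(123456789, n_shares=50, threshold=26)
recovered = recover_secret(shares[:26])
```

With fewer shares than the threshold, interpolation yields an unrelated value, which is the property that would keep any small group of facilities from releasing the material on its own.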

The Paperless Office and the Horseless Carriage

2019-04-09T13:04:00.001-07:00

Topic: preservation

You know the story of the horseless carriage with a buggy-whip holder (in case you needed to put a horse in front of it). But what we wound up with is a wide variety of forms: motorcycle, automobile, tank, train, etc. The transition was accompanied by corresponding developments in infrastructure.

When Xerox started its Palo Alto Research Center nearly 50 years ago, part of its mission was the Paperless Office -- a world where work was done without the use of paper.

We're still in the middle of a long transition to paperless processes in billing, statements, advertising, news, receipts, insurance, medical records, government, textbooks, fiction, with documents still playing a major role in data-based activities. In most cases, these processes are turned from using paper documents to electronic ones, with PDF being an important carrier because of its ability to straddle the divide between the paper and electronic world.

In many of these processes, we're only seeing the beginnings of transition to another phase, of data connections, where electronic documents and email are being supplanted by sharing data, and documents only generated on demand by those who are outside of the data-centric roles.

HTML may have been originally designed as a document format for scientific papers at CERN, but its primary thrust in the last decade has been as a way of delivering applications to consumers.

The problem of the digital dark age is not so much technological obsolescence as it is that there are no document by-products of work done; this is not something that can be solved by new kinds of archival documents. In the records management community, documents must have or carry their own context which allows auditing of past behavior by examining the documents the process left behind.

We need some better ways of straddling the document and data world such that data archives are produced in a way that allows them to be audited, redacted, processed, without having all of the original context. Most data channels are, for efficiency reasons, context free. Metadata (embedded or supplemental) is a document-centric way of supplying context, but it isn't enough.

The Internet is a WMD (Weapon of Mass Delusion)

2019-01-10T21:47:00.000-08:00

Some thoughts on the Internet as the root problem, based on this Washington Post Op-Ed.

When I was a kid, I remember reading about Robert Oppenheimer's work on the atomic bomb and his thought that scientists developing technologies that could be weaponized had some responsibility to counter its misuse. It was probably in the phase of reading bios.

If the Internet, the Web, social media, analytics and targeted advertising have been weaponized, turned into weapons of mass delusion, what are we doing to effectively counter this threat?

I see some focus on decentralization, but that doesn't seem to mitigate the problem.

IETF, ISOC, ICANN, W3C, ACM, somewhere else? where?

It seems like this isn't a problem that can be solved by fixing a handful of sites. Or applying sanctions to individuals and companies. How can we counter the susceptibility of mass communication systems to this kind of manipulation and the continuing escalating arms race of hacks and prevention hacks?

Pseudonymous postings seem to be essential, with an increasing sophistication of operational techniques.

Debugging my brain

2018-04-26T17:35:00.000-07:00

Today was my fifth session with Dr. Nengchun Huang trying to adjust the signals going to my brain to improve my Parkinson's symptoms. I've been doing this every two weeks. Each session has lasted an hour to an hour and a half, as the doctor adjusts the signals and watches my reaction: does my thumb wiggle, my eyes droop or blink rapidly, does my foot twist uncontrollably or steady. Periodically he asks me to do various tasks -- tap my thumb and fingers together like playing castanets; touch my nose and then his finger, back and forth; tap my foot rapidly and evenly. Periodically we take off the collar and I get up and go out to walk down the hall and back while he watches the regularity of my pace and whether my hand shakes while I'm walking. It's almost always been possible to stop the tremor for a moment by focusing my attention on the body part (like with yoga or tai chi), but usually only for a few seconds. "Don't suppress the tremor", he reminds me often, but it's often reflex.
The ideal is "no symptoms, steady state", and each session has gotten better. For example, the strong impulse to close my eyes is less. My remote control (here wired to the PC) allows me to adjust the overall voltage slightly, and choose between the latest and one of three previously saved programs.

They call this "programming" but it's more like debugging; not modern debugging but the old-fashioned kind, where you're pawing through core dumps and using binary search.

Here's what I think is going on

The device implanted below my right collarbone communicates wirelessly with the programmer. There are eight contacts one after another on the end of the electrodes skewering my Globus Pallidus Interna on each side, different parts of which are connected to different parts of the brain, controlling movement of my legs and feet and hands and other parts. Each electrode can emit pulses at a given frequency (up to 179 Hz) for a periodic burst (I think, the "pulse width") and a given voltage or amperage. Pulses can be monopolar or bipolar (not sure, but I'm guessing either negative only or negative and positive). The bipolar signals have a narrower effect; the monopolar signals are stronger.
There's no exact map of which parts of the GPi connect to which function of the brain. I think the theory is that the pulses disrupt the beta-rhythms which synchronize the brain function. Anyway, the Boston Scientific device I have is new, giving the programmer lots of options not possible before.

My procedure

2018-02-07T08:40:00.001-08:00

"STEREOTACTIC IMAGE GUIDED IMPLANTATION OF BILATERAL GPi DEEP BRAIN STIMULATION ELECTRODES WITH VOLUMETRIC ANALYSIS USING MAZOR ROBOT AND IMPLANTATION OF RIGHT INFRACLAVICULAR PULSE GENERATOR"

was on all the forms I had to sign, describing what was done to me...

"Stereotactic image" (the 3D result of an MRI scan on Tuesday and a CT scan Friday morning)
"... guided" (the team looked at the 3D image to plan the route of the wires into my brain)
"Implantation" (skewering)
"of bilateral" (both sides)
"GPi" (globus pallidus interna -- the part of my brain they were targeting)
"deep brain" (well, it was pretty deep in there, and hard to get to)
"stimulation electrodes" (two thin cables with 8 strands each leading to 8 points within the GPi)
"with volumetric analysis" (checking the path so they don't skewer something important)
"using Mazor robot" (a device they attached to a 'platform' that was screwed into my skull first thing Friday morning after first making me a numb-skull, then drilling pilot holes and screwing in; the Mazor robot then guides the surgery exactly)
"and implantation" (this one goes in a little pocket they cut in my chest)
"of right" (only one)
"infraclavicular" (under my collar bone, the electrode leads pushed under scalp, neck skin, down chest)
"pulse generator" (the thing that will send pulses to the electrodes)

Right now my brain is recovering from the trauma of this disturbance... my Parkinson's symptoms are mainly worse than usual -- both hands shaking, having trouble standing, using a walker to scoot around the house, speech soft and distorted enough that Alexa doesn't understand me, mild headache, among other indignities.
Today I get to shower! Yay!

My appointment for having the whole thing turned on and to start tuning the pulses to my body is February 16th. Happy Birthday!

The process of fine tuning can take 6 months of appointments every two weeks. Each of 8 leads on each side, with their own pulse intensity, frequency and width ... I'm assuming there's some method to it, will let you know.

Going Bionic

2018-01-25T10:10:00.001-08:00

I hadn't talked about it publicly, but it's pretty obvious now if you see me in person: my hands (often) shake, my foot (often) drags. I have Parkinson's disease -- first diagnosed 12 years ago, it's a slowly progressive condition. There are whole books about all the horrible potential symptoms, but I've been fortunate (among the millions with it) that my symptoms have been relatively mild. Medications, exercise and determination mainly controlled the worst.

But progressive diseases progress, and interfere with getting on with life.

I'm mainly risk-averse, but (after declining 5 years ago) I'm now scheduled next week for a surgical treatment called "Deep Brain Stimulation": like a pacemaker, but for the brain rather than the heart, and more of a pace-disruptor than -maker.

The procedure is described as "minimally invasive" and "reversible" and it's been done 100,000 times (300 by my surgeon). But still, it requires MRI and CAT scan to place wires to the exact spot without hitting good brain or blood vessels. (I got a Rift VR tour of some patient's anonymized brain as part of the explanation.)

Besides the wires in my brain ("Will I be able to listen to the radio without a radio?"), there will be wires under skin from scalp down to a not-so-tiny controller implanted -- wirelessly charged and programmed, battery rated to last 15 years. The "programming" usually takes months of adjustment.

I remember ~40 years ago admiring someone's programming skills, to the point I told people "he's so good I'd let him program my pacemaker". I'm not going to ask Boston Scientific to review their source code, but I do hope they aspire to better than "five nines".

Wish me luck...

1/31 added: for all the good wishes, expressed and felt: thank you, it's meant a lot...

Mea Culpa

2017-10-22T18:13:00.003-07:00

I hate Twitter. It amplifies the emotional content of what should be a rational discussion.

I was dismayed to see a twitterstorm of complaints about the ODI report on PDF and data.

I'm especially embarrassed that I asserted (too convincingly) anything about stars. The 5-star evaluation of file formats was a nice idea, but shouldn't apply to formats at all.

Whether something is "open" depends on the tools you have, as I explore in Open Data and Documents; the scale doesn't match what people actually value.

I love Twitter. I think the twitter responses have been useful in at least reaching the user community. Thanks.

Join in on the discussion.

Open Data and Open Documents: Framework

2017-10-21T13:28:00.000-07:00

I moved this post to Google docs for comment and collaboration. But see email thread too.

IETF "Security Considerations" and PDF

2016-09-10T18:59:00.000-07:00

One of the things I've been doing lately is trying to dampen some of the misconceptions and misdirections concerning the Portable Document Format (PDF). I'm not sure why, except people seem to forget what was good about it and what wasn't (isn't).

Some background about PDF

Everyone has heard of PDF, but I'm not sure there's widespread understanding of its role and history. Wikipedia PDF isn't too bad; page independent data structures but based on PostScript, a way of getting licenses to embed fonts, first released in 1993. Originally a "distill" of a printed page, over the years, features were added: forms, 3D, compression, reflow, accessibility.

PDF is over 20 years old ... "as old as the Web"-- I first heard about it at GopherCon '93. Has it run its course, time for something new? But PDF supplies a unique solution for an application that spans the work between paper documents and electronic, and assurances of fidelity: if I send you a document, and you say you got it and read it (using a conforming reader) then I know what you saw.

MIME types

In email and web, file types are labeled by a two part label, like text/html, image/jpeg, application/pdf. This "Internet Media Type" is (supposed to be) used in content-type headers in email and web as a way for the sender to say how a receiver should interpret the content (except for "sniffing" but that's another blog post).
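To make the two-part label concrete, here is a small sketch that uses Python's standard `email.message` module to pull the type and subtype out of a Content-Type header value. The helper name `split_media_type` is made up for this example.

```python
from email.message import Message


def split_media_type(content_type_header: str) -> tuple:
    """Return (type, subtype) for a Content-Type header value.

    Parameters such as "charset=utf-8" are ignored here; only the
    two-part media type label is extracted, lowercased by the library.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_maintype(), msg.get_content_subtype()


pdf = split_media_type("application/pdf")
html = split_media_type("text/html; charset=utf-8")
# pdf is ("application", "pdf") and html is ("text", "html")
```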

There's an official list of media types managed by IANA (in the news lately for other reasons, another blog post). IANA, Internet Assigned Numbers[sic] Authority, is in charge of maintaining the registries, as directed by the IETF.

IETF has a different decision-making process than other standards groups. However, as usual, the process involves creating and distributing a draft, and asking for comments. Comments need to be responded to, even if you don't make changes because of the comment. Different kinds of documents have different criteria for advancement, and it's sometimes hard to figure out what rules apply.

Getting draft-hardy-pdf-mime to RFC

I got into updating the registration of PDF a while back, while working on "PDF for RFCs", and, after consultation, took the path of revising the RFC which authorized the current registration, RFC 3778, in the form of a document that replaced 3778, including the registration template for application/pdf. That's the document I'm trying to get passed.

There were lots of comments during the review period, and I responded to most of them last week, in a single email.

Which process?

I won't go into the detailed rules, but the path we chose involved getting IESG approval for an Informational specification, one of the paths laid out in RFC 6838, which lays out the rules for the IANA media type registry.

But which rules apply? RFC 6838 Section 3.1 for types "registered by a recognized standards-related organization using the 'Specification Required' IANA registration policy [RFC5226]"? Or do we follow Section 6.1, "in cases where the original definition of the scheme is contained in an IESG-approved document, updates of the specification also requires IESG approval."?

And does the "DISCUSS" laid on the document's approval meet any of the criteria of the rules for a DISCUSS?

But I'd like to accommodate the common request that the document say more about security of PDF software. It's well-known that PDF has been a vector for several infamous exploits... why can't we say more?

"Security Considerations"

IETF has an unusual policy of requiring ALL documents (https://tools.ietf.org/html/rfc3552 Section 5) to consider security and document threats and possible mitigations. ISO has no such rule; security is considered the responsibility of the implementation. W3C nominally does through TAG review, I think, but WHATWG is more haphazard. The question remains: does a conforming implementation require that the implementation expose the user to security risks?

I'm sure we could say more. And if this were a new registration or the PDF spec itself I'd try. But application/pdf has been around over 20 years, the exploits and their prevention publicized.
But there is no single valid account of software vulnerabilities; the paper suggested (in a COMMENT, not a DISCUSS) isn't anything I could cite; I disagree with too many parts.

I’ll go back to the question of the purpose of “Security Considerations” in MIME registrations; for whom should it be written? For a novice, it is not enough. For an expert, you wind up enumerating the exploits that are understood and can be explained. The situation is fluid because the deployment of browser-based PDF interpreters is changing for desktop and mobile, and PDF is just another part of the web.

I agreed with the reasoning behind the requirement, that requiring everyone to write about Security might make them think a little more about security.

But I think there’s another view, that Security is a feature of the implementation. It’s the implementation’s job to mitigate vulnerabilities. So any security problems, blame the implementation, not the protocol. And the implementors need to worry about not writing buggy code, not just about security per se.

And there is no point in saying “write your implementations carefully”, because there are so many ways to write software badly. Talking about the obvious easy-to-describe exploits isn’t really useful, because we know how to avoid those.

Now perhaps this is just "don't set a bad precedent". So maybe the clue is to follow text/html, and suggest that "entire novels" have been written about PDF security, but not here in the Internet Media Type registration.

Birthday greetings, Packed committees, community, standards

2015-02-15T12:40:00.000-08:00

Today is my birthday. I woke to find many birthday greetings on Facebook, and more roll in throughout the day. It's hard to admit how pleasing it is, embarrassing. I haven't asked for birthday greetings and don't usually give them. Maybe I'll change my mind.

Perhaps I'm late to the party but I'm still trying to understand the 'why' of social networking -- why does Facebook encourage birthday greetings? What human pleasure does getting 11 "happy birthday" notes trigger?
But it fits into the need to have and build community, and the mechanism for community requires periodic acknowledgement. We engage in sharing our humanity (everyone has a birthday) by greeting. Hello, goodbye, I'm here, poke. But not too often, once a year is enough.
I wrote about standards and community yesterday on the IETF list, but people didn't get it.
Explaining that message and its relationship to birthday greetings is hard.
The topic of discussion was "Updating BCP 10 -- NomCom ELEGIBILITY".

IETF : group that does Internet Standards
BCP10: the unique process for how IETF recruits and picks leadership
NOMCOM: the "Nominating Committee" which picks leadership amongst volunteers
elegibility: qualifications for getting on the NOMCOM

I think BCP10 is a remarkable piece of social engineering, at the center of the question of governance of the Internet: how to make hard decisions, who has final authority, who gets to choose them, and how to choose the choosers. Most standards groups get their authority from governments and treaties or are consortia. But IETF developed a complex algorithm for trying to create structure without any other final authority to resolve disputes.

But it looks like this complex algorithm is buggy, and the IETF is trying to debug the process without being too open about the problem. The idea was to let people volunteer, and choose randomly among qualified volunteers. But what qualifications? Some concern about the latest round of nomcom volunteers is what started this thread.

During the long email thread on the topic, the discussion turned to the tradeoffs between attending a meeting in person vs. using new Internet tools for virtual meetings or more support for remote participation. Various people noted that the advantage of meeting in person is the ability to have conversations in the hallways outside the formal, minuted meetings.

I thought people were too focused on their personal preferences rather than the needs of the community. What are we trying to accomplish, and how do meetings help with that? How would we satisfy the requirements for effective work?

A few more bits: I mention some of the conflicts between IETF and other standards groups over URLs and JSON because W3C, WHATWG, ECMA are different tribes, different communities.

Creating effective standards is a community activity to avoid the Tragedy of the Commons that would result if individuals and organizations all went their own way. The common good is “the Internet works consistently for everyone” which needs to compete against “enough of the Internet works ok for my friends” where everyone has different friends.

For voluntary standards to happen, you need rough consensus — enough people agree to force the remainder to go along.

It’s a community activity, and for that to work there has to be a sense of community. And video links with remote participation aren’t enough to create a sense of community.

There are groups that purport to manage with minimal face-to-face meetings, but I think those are mainly narrow scope and a small number of relevant players, or an already established community, and they regularly rely heavily on 24/7 online chat, social media, open source tools, wikis which are requirements for full participation.

The “hallway conversations” are not a nice-to-have, they’re how the IETF preserves community with open participation.
One negative aspect of IETF “culture” (loosely, the way in which the IETF community interacts) is that it isn’t friendly or easy to match and negotiate with other SDOs, so we see the WHATWG / W3C / IETF unnecessary forking of URL / URI / IRI, encodings, MIME sniffing, and the RFC7159-JSON competing specs based at least partly on cultural misunderstandings.
The main thing nomcom needs to select for is technical leadership (the skill of getting people to follow in service of the common good). And nomcom members should have enough experience to have witnessed successful leadership. One hopes there might be some chance of that just by attending 3 meetings, although the most effective leadership is often exercised in those private hallway conversations where compromises are made.

Ambiguity, Semantic web, speech acts, truth and beauty

2014-11-20T12:15:00.001-08:00

(I think this post is pretty academic for the web dev crowd, oh well)

When talking about URLs and URNs or semantic web or linked data, I keep on returning to a topic. Carl Hewitt gave me a paper about inconsistency which this post reacts to.

The traditional AI model of semantics and meaning doesn't work well for the web.

Maybe this is old-hat somewhere but if you know any writings on this topic, send me references.

In the traditional model (from Bobrow's essay in Representation and Understanding), the real world has objects and people and places and facts; there is a KRL Knowledge Representation Language in which statements about the world are written, using terms that refer to the objects in the real world. Experts use their expertise to write additional statements about the world, and an "Inference Engine" processes those statements together to derive new statements of facts.

This is like classic deduction "Socrates is a man, all men are mortal, thus Socrates is mortal" or arithmetic (37+53) by adding 7+3, write 0 carry 1 plus 3 plus 5 write 9, giving 90.

And to a first approximation, the semantic web was based on the idea of using URLs as the terms to refer to real world, and relationships, and RDF as an underlying KRL where statements consisted of triples.
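As a toy illustration of that traditional model (not of any real RDF tooling), here is a minimal sketch in Python. Statements are triples, terms are URLs, and an "inference engine" applies one rule until nothing new appears. The example.org URLs, the `is-a` and `subclass-of` predicates, and the `infer` function are all made up for this sketch.

```python
# Hypothetical terms and predicates, invented for illustration only.
SOCRATES = "http://example.org/Socrates"
MAN = "http://example.org/Man"
MORTAL = "http://example.org/Mortal"
IS_A = "is-a"
SUBCLASS_OF = "subclass-of"


def infer(facts):
    """Forward-chain a single rule until no new triples are derived.

    Rule: (x is-a C) and (C subclass-of D)  =>  (x is-a D)
    """
    known = set(facts)
    while True:
        new = {
            (x, IS_A, d)
            for (x, p1, c) in known if p1 == IS_A
            for (c2, p2, d) in known if p2 == SUBCLASS_OF and c2 == c
        } - known
        if not new:
            return known
        known |= new


facts = {
    (SOCRATES, IS_A, MAN),       # "Socrates is a man"
    (MAN, SUBCLASS_OF, MORTAL),  # "all men are mortal"
}
derived = infer(facts)
print((SOCRATES, IS_A, MORTAL) in derived)  # → True
```

Note that the engine only ever compares strings. Whether `http://example.org/Socrates` names a person, a page about him, or something else is not something it can see.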

Now we get to the great and horrible debate over "what is the range of the http function", which has so many untenable presumptions that it's almost impossible to discuss: that the question makes sense; that you can talk about two resources being "the same"; that URLs are 'unambiguous enough', and the only question is how to deal with some niggly ambiguity problems, with a proposal for new HTTP result codes.

So does http://larry.masinter.net refer to me or my web page? To my web page now or for all history, to just the HTML of the home page or does it include the images loaded, or maybe the whole site?

"http://larry.masinter.net" "looks" "good".

So I keep on coming back to the fundamental assumption, the model for the model.

Coupled with my concern that we're struggling with identity (what is a customer, what is a visitor) in every field, and phishing and fraud on another front.

Another influence has been thinking about "speech acts". It's one thing to say "Socrates is a man" and a completely different thing to say "Wow!". "Wow!" isn't an assertion (by itself), so what is it? It's a "speech act" and you distinguish between assertions and questions and speech acts.

A different model for models, with some different properties:

Every speech is a speech act.

There is no division into categories of assertion, question, speech act. Each message passed is just some message intending to cause a reaction, on receipt. And information theory applies: you can't supply more than the bits sent will carry. "http://larry.masinter.net" doesn't intrinsically carry any more than the entropy of the string can hold. You can't tell by any process whether it was intended to refer to me or to my web page.

Truth is too simple, make belief fundamental.

So in this model, individuals do not 'know' assertions, they only 'believe' to a degree. Some things are believed so strongly that they are treated as if they were known. Some things we don't believe at all. A speech act accomplishes its mission if the belief of the recipient changes in the way the sender wanted. Trust is a measure of influence: your speech acts that look like statements influence my beliefs about the world insofar as I trust you. The web page telling me my account balance influences my beliefs about how much I owe.

Changing the model helps think about security

Part of the problem with security and authorization is we don't have a good model for reasoning about it. Usually we divide the world into "Good guys" and "bad guys": Good guys make true statements ("this web page comes from bank trustme") while bad guys lie. (Let's block the bad guys.) By putting trust and ambiguity at the base of the model and not as an after-patch we have a much better way of describing what we're trying to accomplish.

Inference, induction, intuition are just different kinds of processing

In this model, you would like influence of belief to resemble logic in the cases where there is trust and those communicating have some agreement about what the terms used refer to. But inference is subject to its own flaws ("Which Socrates? What do you mean by mortal? Or 'all men'").

Every identifier is intrinsically ambiguous

Among all of the meanings the speaker might have meant, there is no in-band right way to disambiguate. Other context, out of band, might give the receiver of the message with a URL more information about what the sender might have meant. But part of the inference, part of the assessment of trust, would have to take into account belief about the sender's model as to what the sender might have meant. Precision of terms is not absolute.

URNs are not 'permanent' nor 'unambiguous', they're just terms with a registrar

I've written more on this which I'll expand elsewhere. But URNs aren't exempt from ambiguity, they're generally just URLs with different assigned organizations to disambiguate if called on.

Metadata, linked data, are speech acts too.

When you look in or around an object on the net, you can often find additional data, trying to tell you things about the object. This is the metadata. But it isn't "truth", metadata is also a communication act, just one where one of the terms used is the object.

There's more but I think I'll stop here. What do you think?

Living Standards: "Who Needs IANA?"

2014-09-14T14:04:00.001-07:00

I'm reading about two tussles, which seem completely disconnected, although they are about the same thing, and I'm puzzled why there isn't a connection.

This is about the IANA protocol parameter registries. Over in ianaplan@ietf.org people are worrying about preserving the IANA function and the relationship between IETF and IANA, because it is working well and shouldn't be disturbed (by misplaced US political maneuvering which claims that, with the long-planned transition from NTIA, the administration is somehow giving something away).

Meanwhile, over in www-international@w3.org, there's a discussion of the Encodings document, being copied from WHATWG's document of that name into a W3C recommendation. See the thread (started by me), about the "false statement".

Living Standards don't need or want registries for most things the web uses registries for now: Encodings, MIME types, URL schemes. A Living Standard has an exhaustive list, and if you want to add a new one or change one, you just change the standard. Who needs IANA with its fussy separate set of rules? Who needs any registry really?

So that's the contradiction: why doesn't the web need registries while other applications do? Or is IANAPLAN deluded?

The multipart/form-data mess

2014-09-09T14:47:00.002-07:00

OK, this is only a tiny mess, in comparison with the URL mess, and I have more hope for this one.

Way back when (1995), I spec'ed a way of doing "file upload" in RFC1867. I got into this because some Xerox printing product in the 90s wanted it, and enough other folks in the web community seemed to want it too. I was happy to find something that a Xerox product actually wanted from Xerox research.

It seemed natural, if you were sending files, to use MIME's methods for doing so, in the hopes that the design constraints were similar and that implementors would already be familiar with email MIME implementations. The original file upload spec was done in IETF because at the time, all of the web, including HTML, was being standardized in the IETF. RFC 1867 was "experimental," which in IETF used to be one way of floating a proposal for new stuff without having to declare it ready.

After some experimentation we wanted to move the spec toward standardization. Part of the process of making the proposal standard was to modularize the specification, so that it wasn't just about uploading files in web pages. Rather, all the stuff about extending forms and names of form fields and so forth went with HTML. And the container, the holder of "form data"-- independent of what kind of form you had or whether it had any files at all -- went into the definition of multipart/form-data (in RFC2388). Now, I don't know if it was "theoretical purity" or just some sense of building things that are general purpose to allow unintended mash-ups, but RFC2388 was pretty general, and HTML 3.2 and HTML 4.0 were being developed by people who were more interested in spec-ing a markup language than a form processing application, so there was a specification gap between RFC 2388 and HTML 4.0 about when and how and what browsers were supposed to do to process a form and produce multipart/form-data.
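To make the "container" concrete, here is a rough sketch in Python of producing a multipart/form-data body by hand. The function name `encode_multipart`, the field names, and the fixed boundary `XYZ` are assumptions for illustration. It deliberately skips the hard parts: escaping quotes or non-ASCII characters in names and filenames, and checking that the boundary doesn't occur in the content.

```python
import uuid


def encode_multipart(fields, files, boundary=None):
    """Return (content_type, body) for simple text fields and files.

    fields: dict mapping name -> str value
    files:  dict mapping name -> (filename, content_bytes, mime_type)
    Simplification: names and filenames are assumed to be plain ASCII
    with no quotes, so no escaping is attempted.
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",
            value.encode("utf-8"),
        ]
    for name, (filename, content, mime_type) in files.items():
        lines += [
            delimiter,
            (f'Content-Disposition: form-data; name="{name}"; '
             f'filename="{filename}"').encode("utf-8"),
            f"Content-Type: {mime_type}".encode("ascii"),
            b"",
            content,
        ]
    # Closing delimiter, followed by a final CRLF.
    lines += [delimiter + b"--", b""]
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)


content_type, body = encode_multipart(
    {"title": "hello"},
    {"doc": ("a.txt", b"abc", "text/plain")},
    boundary="XYZ",
)
print(content_type)  # → multipart/form-data; boundary=XYZ
print(body.decode("ascii"))
```

Nothing in the container says which fields a form has or how a browser should name them; that part belongs to the HTML side.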

February of last year (2013) I got a request to find someone to update RFC 2388. After many months of trying to find another volunteer (most declined because of lack of time to deal with the politics) I went ahead and started work: update the spec, investigate what browsers did, make some known changes. See GitHub repo for multipart/form-data and the latest Internet Draft spec.

Now, I admit I got distracted trying to build a test framework for a "test the web forward" kind of automated test, and spent way too much time building what wound up being a fairly arcane system. But I've updated the document, and recommended its "working group last call". The only problem is that I just made stuff up based on some unvalidated guesswork reported second hand ... there is no working group of people willing to do work. No browser implementor has reviewed the latest drafts that I can tell.

I'm not sure what it takes to actually get technical reviewers who will actually read the document and compare it to one or more implementations to justify the changes in the draft.

Go to it! Review the spec! Make concrete suggestions for change, comments or even better, send GitHub pull requests!

The URL mess

2014-09-07T19:54:00.000-07:00

(updated 9/8/14)

One of the main inventions of the Web was the URL. And I've gotten stuck trying to help fix up the standards so that they actually work.

The standards around URLs, though, have gotten themselves into an organizational political quandary to the point where it's like many other situations where a polarized power struggle keeps the right thing from happening.

Here's an update to an earlier description of the situation:

URLs were originally defined as ASCII only. Although it was quickly determined that it was desirable to allow non-ASCII characters, shoehorning UTF-8 into ASCII-only systems was unacceptable; at the time, Unicode was not so widely deployed, and there were other issues. The tack was taken to leave "URI" alone and define a new protocol element, "IRI"; RFC 3987 was published in 2005 (in sync with the RFC 3986 update to the URI definition). (This is a very compressed history of what really happened.)

The IRI-to-URI transformation specified in RFC 3987 had options; it wasn't a deterministic path. The URI-to-IRI transformation was also heuristic, since there was no guarantee that %xx-encoded bytes in the URI were actually meant to be %xx percent-hex-encoded bytes of a utf8 encoding of a Unicode string.
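A small Python sketch shows why the reverse direction is only a heuristic. This is not the RFC 3987 algorithm: it handles only a path, ignores the rules about characters that must stay escaped, and the function names `iri_path_to_uri` and `uri_path_to_iri` are made up for illustration.

```python
import re
from urllib.parse import quote, unquote_to_bytes

# Runs of %xx escapes whose byte value is >= 0x80 (i.e. non-ASCII bytes).
_HIGH_BYTE_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def iri_path_to_uri(path):
    """Percent-encode the UTF-8 bytes of any non-ASCII characters."""
    # '%' is kept safe so existing escapes are not double-encoded.
    return quote(path, safe="/%:@!$&'()*+,;=")


def uri_path_to_iri(path):
    """Guess: turn escaped high bytes back into text IF they parse as UTF-8."""
    def decode_run(match):
        raw = unquote_to_bytes(match.group(0))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return match.group(0)  # not UTF-8; leave the escapes alone
    return _HIGH_BYTE_RUN.sub(decode_run, path)


print(iri_path_to_uri("/café"))       # → /caf%C3%A9
print(uri_path_to_iri("/caf%C3%A9"))  # → /café
print(uri_path_to_iri("/caf%E9"))     # → /caf%E9  (Latin-1 byte, not valid UTF-8)
```

The second call "works", but only by assumption: the same bytes `%C3%A9` could have been produced by a system that meant two Latin-1 characters, and nothing in the URI says which.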

To address issues and to fix URL for HTML5, a new working group was established in IETF in 2009 (the IRI working group). Despite years of development, the group didn't get the attention of those active in WHATWG, W3C or the Unicode Consortium, and the IRI group was closed in 2014, with the consolation that the documents that were being developed in the IRI working group could be updated as individual submissions or within the "applications area" working group. In particular, one of the IRI working group items was to update the "scheme guidelines and registration process", which is currently under development in IETF's application area.

Independently, the HTML5 specs in WHATWG/W3C defined "Web Address", in an attempt to match what some of the browsers were doing. This definition (mainly a published parsing algorithm) was moved out into a separate WHATWG document called "URL".

The world has also moved on. ICANN has approved non-ASCII top level domains, and IDN 2003 and 2008 didn't really address IRI encoding. The Unicode Consortium is working on UTS #46.

The big issue is to make the IRI-to-URI transformation non-ambiguous and stable. But I don't know what to do about non-domain-name non-ASCII 'authority' fields. There is some evidence that some processors are %xx-hex-encoding the UTF-8 of domain names in some circumstances.
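The two competing treatments of a non-ASCII host name can be put side by side in a few lines of Python. This sketch uses Python's built-in `idna` codec, which implements IDNA 2003 (not 2008); `bücher.example` is just an illustrative name, and the helper names are made up.

```python
from urllib.parse import quote


def host_to_punycode(host):
    """IDNA 2003 ToASCII via Python's built-in codec."""
    return host.encode("idna").decode("ascii")


def host_to_percent_encoded(host):
    """What a processor does if it treats the host like any other IRI text."""
    return quote(host)  # percent-encodes the UTF-8 bytes


host = "bücher.example"
print(host_to_punycode(host))         # → xn--bcher-kva.example
print(host_to_percent_encoded(host))  # → b%C3%BCcher.example
```

Both outputs are pure ASCII and both are plausible encodings of the same name, but they are different strings, so any software comparing or resolving them has to know which convention was applied.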

There are four umbrella organizations (IETF, W3C, WHATWG, Unicode consortium) and multiple documents, and it's unclear whether there's a trajectory to make them consistent:

IETF

Dave Thaler (mainly) has updated http://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg, which needs community review.

The IRI working group closed, but work can continue in the APPS area working group. Three drafts originally intended to obsolete RFC 3987 (iri-3987bis, iri-comparison, iri-bidi-guidelines) now sit abandoned and in need of update.

Other work in IETF that is relevant but that I'm not as familiar with is the IDN/IDNA work for internationalizing domain names, since the rules for canonicalization, equivalence, encoding, parsing, and displaying domain names need to be compatible with the rules for doing those things to URLs that contain domain names.

In addition, there's quite a bit of activity around URNs and library identifiers in the URN working group, work that is ignored by other organizations.

W3C

The W3C has many existing recommendations which reference the IETF URI/IRI specs in various ways (for example, XML has its own restricted/expanded allowed syntax for URL-like-things). The HTML5 spec references something, the TAG seems to be involved, as well as the sysapps working group, I believe. I haven't tracked what's happened in the last few months.

WHATWG

The WHATWG spec is http://url.spec.whatwg.org/ (Anne, Leif). This fits in with the WHATWG principle of focusing on specifying what is important for browsers, so it leaves out many of the topics in the IETF specs. I don't think there is any reference to registration, and (when I last checked) it had a fixed set of relative schemes: ftp, file, gopher (a mistake?), http, https, ws, wss. It used IDNA 2003 not 2008, and was (perhaps, perhaps not) at odds with IETF specs.

Unicode consortium

Early versions of UTS #46, and I think others, recommend translating toAscii and back using punycode? But they weren't specific about which schemes.

Conclusion

From a user or developer point of view, it makes no sense for there to be a proliferation of definitions of URL, or a large variety of URL syntax categories. Yes, currently there is a proliferation of slightly incompatible implementations. This shouldn't be a competitive feature. Yet the organizations involved have little incentive to incur the overhead of cooperation, especially since there is an ongoing power struggle for legitimacy and control. The same dynamic applies to the Encoding spec, and, to a lesser degree, handling of MIME types (sniffing) and multipart/form-data.