August 15, 2011

Expert System Scalability and the Semantic Web

In the late 80s, we saw the fall of AI and Expert Systems as a "hot" technology -- the "AI winter".  The methodology, in brief: build a representation system (a way of talking about facts about the world) and an inference engine (a way of making logical inferences from a set of facts).  Get experts to tell you facts about the world. Grind the inference engine, and get new facts. Voila!
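
As a rough sketch of that methodology, here is a toy forward-chaining loop in Python. It is only an illustration, not how any particular expert system shell worked: facts are plain strings, rules are (premises, conclusion) pairs, and the function name and all fact names below are made up for the example.

```python
def forward_chain(facts, rules):
    """Repeatedly apply rules to the known facts until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule only if all its premises are known and it adds something new.
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


# Hypothetical knowledge base: one fact "entered by an expert", two rules.
facts = {"vat_over_300C"}
rules = [
    (("vat_over_300C",), "vat_likely_to_explode"),
    (("vat_likely_to_explode",), "sound_alarm"),
]

derived = forward_chain(facts, rules)
print(sorted(derived))
```

Note that the engine only matches strings. Whether "vat_over_300C" means the same thing to everyone who enters or reads it is decided entirely outside the code.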

I always felt that the problem with the methodology was the failure of model theory to scale: the more people and time involved in developing the "facts" about the world, the more likely it is that the terminology in the representation system would fuzz -- that different people involved in entering and maintaining the "knowledge base" would disagree about what the terms in the representation system stood for.

The "semantic web" chose to use URIs as the terminology for grounding abstract assertions and creating a model where those assertions were presumed to be about the real world.

This exacerbates the scalability problem. URIs are intrinsically ambiguous and were not designed to be precise denotation terms. The semantic web terminology of "definition" and "assignment" of URIs reflects a point of view I fundamentally disagree with.  URIs don't "denote". People may use them to denote, but it is a communication act; the fact that I say by "http://larry.masinter.net" I mean *me* does not imbue that URI with any intrinsic semantics.

I've been trying to get at these issues around ambiguity with the "duri" and "tdb" URI schemes, for example, but I think the fundamental perspective still simmers.

3 comments:

  1. The claim "URIs are intrinsically ambiguous and were not designed to be precise denotation terms" is vacuous because this can be said of any string or utterance. Ambiguity and precision are a matter of degree, and engineering approaches exist (such as logic, formal curation protocols, etc) to decrease/increase them. These approaches are not sensitive to the syntax of the string (URI vs. non-URI). You're trying to talk about problems of method and scaling, then you attempt to implicate URIs, without giving any evidence or explaining what is special about URIs that would make them different from any other syntax.

    I think you have a particular opinion about appropriate use of URIs that isn't captured in RFC 3986. It would be interesting to hear you articulate your view on URIs - what led you to say the things you did in this post, which otherwise is sort of uninformative.

    By the way the term "definition" is not in general use in semantic web contexts; it's one I introduced into a TAG document, with a particular meaning, for want of a better term. The "semantic web" cannot be held responsible for it. Also I recommend you review what "model theory" is; I don't think it's what you think it is.

  2. A formal knowledge representation system shouldn't have terms as ambiguous as "strings" (without a clearer namespace) and "utterances". Some might have hoped that using URIs would give more precision, and my point is that it doesn't.

    "failure of model theory to scale" wasn't what I meant (a thinko if not a typo). "Failure of the resulting systems to scale" would be closer to what I meant.

    Chemical control plant expert system:
    "If the vat is over 300 degrees Celsius, it will likely explode":
    Not too much ambiguity about what "over 300 degrees" means, although the sensor might fail, give wrong values, etc.

    "If there are more people than chairs in the room, then someone can't sit down":
    What is a chair? Is a stool a chair? Is a 3-legged stool with one broken leg a chair? If the tree falls in the forest, and the stump looks like a chair, but nobody sits in it, is it a chair?

    I'm confused about your citation of RFC 3986, since I wrote some of the text, and chaired the URI committee. It wasn't in the charter of the group to define something that could be used in knowledge representation systems as terms. It wasn't a requirement.

  3. ARGH... entered a reply, which then got deleted when I had to log in... so this will be briefer than the first time I wrote it.

    Of course use of URIs does not, in itself, sprinkle precision dust on any endeavor. That's lunacy - nobody thinks that. People like URIs not for that reason but because they provide for collision avoidance and for a follow-your-nose method (the spec tree).

    Your two ambiguity examples are the same as far as I can tell; what is an explosion? what's a vat? how likely is 'likely'? Ambiguity is the reciprocal of information. If something is ambiguous it means you don't have information that you need. In that case you ask for more information. This is not magic.

    3986 seems to promote the URI namespace as a general purpose federated namespace, into which you can put anything. The whole point was extensibility, right? If the charter had been limited to a finite set of uses, there would have been no need for a URI scheme registry or review process. So what in 3986 says that the URI namespace *should not* be used for KR systems? Maybe the "identification" paragraph?

    There's plenty wrong with the semantic web and its use of URIs, and risks ahead for those who have bought into the architecture (even for just the linked data part - which ditches the whole KR business). So I think you're right to be critical. Maybe scaling is the issue, but I'm not sure. I have other suspects in mind which I'll write about one of these days.
