November 20, 2014

Ambiguity, Semantic web, speech acts, truth and beauty

(I think this post is pretty academic for the web dev crowd, oh well)

When talking about URLs and URNs, the Semantic Web, or linked data, I keep returning to the same topic. Carl Hewitt gave me a paper about inconsistency, and this post is a reaction to it.

The traditional AI model of semantics and meaning doesn't work well for the web.
Maybe this is old hat somewhere, but if you know of any writings on this topic, send me references.

In the traditional model (from Bobrow's essay in Representation and Understanding), the real world has objects and people and places and facts; there is a KRL (Knowledge Representation Language) in which statements about the world are written, using terms that refer to the objects in the real world. Experts use their expertise to write additional statements about the world, and an "Inference Engine" processes those statements together to derive new statements of fact.

This is like classic deduction ("Socrates is a man; all men are mortal; thus Socrates is mortal") or like arithmetic: to compute 37+53, add 7+3, write 0 and carry 1, then add 1+3+5 and write 9, giving 90.
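The arithmetic half of that analogy is a purely mechanical procedure, which is what makes it a good picture of an inference engine: rules applied to symbols, with no appeal to what the symbols mean. A minimal Python sketch of the schoolbook column addition described above (the function name `column_add` is hypothetical, chosen for illustration):

```python
def column_add(a: int, b: int) -> int:
    """Add two non-negative integers column by column, right to left,
    writing the low digit of each column sum and carrying the rest."""
    a_digits = [int(d) for d in reversed(str(a))]  # least significant digit first
    b_digits = [int(d) for d in reversed(str(b))]
    result_digits = []
    carry = 0
    for i in range(max(len(a_digits), len(b_digits))):
        da = a_digits[i] if i < len(a_digits) else 0
        db = b_digits[i] if i < len(b_digits) else 0
        column_sum = da + db + carry
        result_digits.append(column_sum % 10)  # "write" the low digit
        carry = column_sum // 10               # "carry" the rest to the next column
    if carry:
        result_digits.append(carry)
    return int("".join(str(d) for d in reversed(result_digits)))


print(column_add(37, 53))  # prints 90
```

On 37 and 53 it writes 0 and carries 1 in the first column, then writes 9 in the second, reproducing the 90 from the text.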

And to a first approximation, the semantic web was based on the idea of using URLs as the terms that refer to things in the real world and to the relationships among them, with RDF as an underlying KRL in which statements consist of triples.
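To make the traditional model concrete, here is a minimal Python sketch, assuming statements are stored as (subject, predicate, object) triples and the "inference engine" is naive forward chaining over a single rule, similar in spirit to RDFS subclass entailment. The `ex:` terms are hypothetical stand-ins for URLs; `rdf:type` and `rdfs:subClassOf` are RDF/RDFS vocabulary terms. This sketches the model the post is questioning, not a proposal.

```python
# Statements as (subject, predicate, object) triples.
# The "ex:" terms are hypothetical placeholders; in RDF they would be URLs (IRIs).
facts = {
    ("ex:Socrates", "rdf:type", "ex:Man"),        # "Socrates is a man"
    ("ex:Man", "rdfs:subClassOf", "ex:Mortal"),   # "all men are mortal"
}


def infer(triples):
    """Naive forward chaining with one rule:
    if (X rdf:type C) and (C rdfs:subClassOf D), then (X rdf:type D).
    Repeats until no new triples appear (a fixed point)."""
    known = set(triples)
    while True:
        new = {
            (x, "rdf:type", d)
            for (x, p1, c) in known if p1 == "rdf:type"
            for (c2, p2, d) in known if p2 == "rdfs:subClassOf" and c2 == c
        }
        if new <= known:  # nothing new derived: done
            return known
        known |= new


derived = infer(facts)
print(("ex:Socrates", "rdf:type", "ex:Mortal") in derived)  # prints True
```

Note what the sketch takes for granted: that `ex:Socrates` denotes exactly one thing, and that every triple in `facts` is simply true.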

Now we get to the great and horrible debate over "what is the range of the http function", which rests on so many untenable presumptions that it's almost impossible to discuss: that the question makes sense; that you can talk about two resources being "the same"; that URLs are 'unambiguous enough', so the only remaining task is to deal with some niggly ambiguity problems, with a proposal for new HTTP result codes.

So does refer to me or my web page? To my web page now, or for all history? To just the HTML of the home page, or does it include the images it loads, or maybe the whole site?

"" "looks" "good".

So I keep on coming back to the fundamental assumption, the model for the model.

This is coupled with my concern that we're struggling with identity (what is a customer? what is a visitor?) in every field, and with phishing and fraud on another front.

Another influence has been thinking about "speech acts". It's one thing to say "Socrates is a man" and a completely different thing to say "Wow!". "Wow!" isn't an assertion (by itself), so what is it? It's a "speech act", and so you distinguish between assertions, questions, and speech acts.

A different model for models, with some different properties:

Every speech is a speech act.

There is no division into assertions, questions, and speech acts. Each message passed is just a message intended to cause a reaction on receipt. And information theory applies: you can't convey more than the bits sent will carry. "" doesn't intrinsically carry any more than the entropy of the string can hold. No process can tell you whether it was intended to refer to me or to my web page.

Truth is too simple; make belief fundamental.

So in this model, individuals do not 'know' assertions; they only 'believe' them, to a degree. Some things are believed so strongly that they are treated as if they were known. Some things we don't believe at all. A speech act accomplishes its mission if the recipient's belief changes in the way the sender wanted. Trust is a measure of influence: your speech acts that look like statements influence my beliefs about the world insofar as I trust you. The web page telling me my account balance influences my beliefs about how much I owe.
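The post doesn't give a formula for how trust turns a speech act into a change of belief. One minimal sketch, assuming beliefs and trust are both numbers in [0, 1] and that a receiver moves its belief toward what the sender asserts in proportion to trust (a hypothetical linear rule chosen for illustration, one of many possible; `update_belief` is not from the post):

```python
def update_belief(prior: float, claimed: float, trust: float) -> float:
    """Hypothetical linear trust-weighted update (not taken from the post).
    prior:   receiver's current degree of belief, in [0, 1]
    claimed: degree of belief the sender's speech act is trying to induce, in [0, 1]
    trust:   how much the receiver trusts the sender, in [0, 1]
    trust=0 leaves the belief unchanged; trust=1 adopts the claim outright."""
    for name, value in (("prior", prior), ("claimed", claimed), ("trust", trust)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return prior + trust * (claimed - prior)


# The same message ("you owe this amount") from two senders.
bank_page = update_belief(0.5, 1.0, trust=0.9)   # a page I trust
phish_page = update_belief(0.5, 1.0, trust=0.1)  # a page I barely trust
print(round(bank_page, 2))   # prints 0.95
print(round(phish_page, 2))  # prints 0.55
```

The message is identical in both cases; only the receiver's trust in the sender differs, and with it the influence on belief. Belief jumps all the way to the asserted value only when trust is exactly 1.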

Changing the model helps us think about security

Part of the problem with security and authorization is that we don't have a good model for reasoning about it. Usually we divide the world into "good guys" and "bad guys": good guys make true statements ("this web page comes from bank trustme"), while bad guys lie. (Let's block the bad guys.) By putting trust and ambiguity at the base of the model, rather than patching them on afterward, we have a much better way of describing what we're trying to accomplish.

Inference, induction, intuition are just different kinds of processing

In this model, you would like the influence on belief to resemble logic in the cases where there is trust and the parties communicating have some agreement about what the terms they use refer to. But inference is subject to its own flaws (Which Socrates? What do you mean by "mortal"? Or by "all men"?).

Every identifier is intrinsically ambiguous

Among all of the meanings the speaker might have intended, there is no in-band "right" way to disambiguate. Other context, out of band, might give the receiver of a message containing a URL more information about what the sender meant. But part of the inference, part of the assessment of trust, has to take into account the receiver's beliefs about the sender's model of what the sender might have meant. Precision of terms is not absolute.

URNs are neither 'permanent' nor 'unambiguous'; they're just terms with a registrar

I've written more on this, which I'll expand on elsewhere. But URNs aren't exempt from ambiguity; they're generally just URLs with a different assigned organization to disambiguate them if called on.

Metadata and linked data are speech acts too.

When you look in or around an object on the net, you can often find additional data that tries to tell you things about the object. This is the metadata. But it isn't "truth": metadata is also a communication act, just one in which one of the terms used is the object.

There's more but I think I'll stop here. What do you think?


  1. Trust is indeed a very interesting part of it. You said "Trust is a measure of influence: your speech acts that look like statements influence my beliefs about the world insofar as I trust you."

    I also consider trust through its secondary effect, as an accelerator. Basically, we trust because we need to be faster; it's a shortcut mechanism. We could imagine an (absurd) world where nothing is trusted. From your first morning you would need to question everything, every single bit of your actions and your thoughts. We would basically be brought to a stop. So we decide to trust because we know (or understand) that it will be faster; often this goes through contracts, explicit or implicit. Person A tells me that if I do this, I will get that. I decide that I can indeed rely on this information because many people have told me that they trust person A (influence by others), or because I have a long history of seeing some actions always produce the same results (consistency).

    In our models, we tend to map what we see as beneficial properties of the good (what you call truth), but very rarely the things we don't like yet that are good for helping us infer more things down the road. In science, it would be that you accept an imperfect model of the world because you know it will help you further down the road. It's not that you trust the model; you just know it will help set up a world **good enough** for making progress. In the same way, you might lie to someone to help that person go further (the death of someone, Santa Claus, etc.). That's why ambiguity is so much more powerful than 0/1.

  2. Karl, I have another take on motivation ("why"), but I don't think we need to ask "why do we trust".

    Trust isn't binary (or transitive). I'm using 'trust' to mean a receiver of an assertion trusting its source, and I want to use some other word for "belief in memory of previous belief".

    When you wake up in the morning, today you tend to believe the things you believed before.

    "rely on this information" -- I want to recast this as "believe the assertion with tenacity".

    "can rely" becomes "if I act as if this assertion is true, that won't cause me too much trouble"

    I want to avoid judgments like "beneficial" and "good" along with motivation.

    Science is study using the "scientific method" to try to separate evidence of correlation from evidence of causality. All models are imperfect. Science is prediction.

    History is a story we tell to help ourselves understand and predict. If there were something like "truth", it would take as much entropy to represent as the reality it is trying to describe.

    So, because of intrinsic ambiguity and limits on bandwidth, anything said is only partly true and partly false.

  3. I was waiting until I had written at least the second installment before engaging you on this subject, but you're forcing my hand... here's what I wrote:

    You may or may not recognize it as being about the same problems, as I'm being quite a bit more oblique than you. But I am struggling with the challenges you pose.

    (this is Jonathan in case my name doesn't come up)