
From ned+ima@mrochek.com  Wed Feb  5 09:48:13 2014
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=iso-8859-1
From: ned+ima@mrochek.com
Message-id: <01P3ZRRC0OJA0000CD@mauve.mrochek.com>
Date: Wed, 05 Feb 2014 09:39:07 -0800 (PST)
To: ima <ima@ietf.org>
Subject: [EAI] Correct LDAP matching rule for EAI addresses?
X-BeenThere: ima@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: "EAI \(Email Address Internationalization\)" <ima.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/ima>, <mailto:ima-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/ima/>
List-Post: <mailto:ima@ietf.org>
List-Help: <mailto:ima-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/ima>, <mailto:ima-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 05 Feb 2014 17:48:13 -0000

So what's the matching rule/syntax to use for EAI addresses stored in LDAP?
AFAICT there isn't an appropriate one: At a minimum you want something that
does ASCII case mapping while allowing the full UTF-8 repertoire. (Better still
would be one that does the right sorts of normalization, but that's probably
too much to ask.)

I note in passing that Sieve's i;ascii-casemap comparator does more or less
the right thing. The thing that's missing in Sieve is a way to decode
encoded domains as part of the address test.
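
A minimal Python sketch of the "ASCII case mapping, full UTF-8 repertoire"
comparison described above. It mirrors what i;ascii-casemap does (map a-z to
A-Z, leave everything else alone); the function names are hypothetical and
this is not any LDAP matching rule.

```python
def ascii_casemap(s: str) -> str:
    """Upper-case only the ASCII letters a-z; leave all other characters as-is."""
    return "".join(
        chr(ord(c) - 32) if "a" <= c <= "z" else c
        for c in s
    )


def addresses_match(a: str, b: str) -> bool:
    """Compare two addresses after ASCII-only case mapping (no normalization)."""
    return ascii_casemap(a) == ascii_casemap(b)
```

Non-ASCII characters are compared exactly, so two spellings that differ only
in the case of an accented letter do not match under this rule.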

				Ned

From klensin@jck.com  Wed Feb  5 10:21:49 2014
Date: Wed, 05 Feb 2014 13:21:19 -0500
From: John C Klensin <klensin@jck.com>
To: ned+ima@mrochek.com, ima <ima@ietf.org>
Message-ID: <0243CD0BDA9444A48730C80D@JcK-HP8200.jck.com>
In-Reply-To: <01P3ZRRC0OJA0000CD@mauve.mrochek.com>
References: <01P3ZRRC0OJA0000CD@mauve.mrochek.com>
X-Mailer: Mulberry/4.0.8 (Win32)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Subject: Re: [EAI] Correct LDAP matching rule for EAI addresses?

--On Wednesday, February 05, 2014 09:39 -0800
ned+ima@mrochek.com wrote:

> So what's the matching rule/syntax to use for EAI addresses
> stored in LDAP? AFAICT there isn't an appropriate one: At a
> minimum you want something that does ASCII case mapping while
> allowing the full UTF-8 repertoire. (Better still would be one
> that does the right sorts of normalization, but that's probably
> too much to ask.)

Ned, the problem is that the EAI WG, after considering a number
of alternatives and the forest of rat holes to which they led,
decided on a very strict reading of the "no one but (or before)
the delivery server can determine whether two addresses match"
rule that is implicit in RFC 5321 and its predecessors.  The net
result is that, for local parts, even
   NFC(local-part-in-address) == NFC(local-part-in-database)
cannot be guaranteed to represent the same destination address
and things are sort of downhill from there.  Of course, if one
were strictly following 5321, then 
   toLowerCase(local-part-in-address) ==
   toLowerCase(local-part-in-database)
can't be guaranteed to represent the same address either.
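
To make the two comparisons concrete, here is a minimal Python sketch (names
are hypothetical). Either check only shows that two strings are equivalent
under that transformation; per the rule above, it says nothing about whether
the delivery server treats them as the same mailbox.

```python
import unicodedata

# Two spellings of the same visible local part.
composed = "jos\u00e9"      # 'e with acute' as one code point (U+00E9)
decomposed = "jose\u0301"   # 'e' followed by combining acute (U+0301)


def nfc_equal(a: str, b: str) -> bool:
    """True if the two strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def lower_equal(a: str, b: str) -> bool:
    """True if the two strings are identical after lower-casing."""
    return a.lower() == b.lower()
```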

The domain part is easy by comparison: each non-ASCII label
should be converted to A-label form (and rejected as nonsense if
the conversion fails), lower-cased if needed, and then compared
on a bitstring basis.
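
A sketch of that domain comparison in Python, under one loud assumption:
Python's built-in "idna" codec implements IDNA2003 (RFC 3490), not IDNA2008,
so it stands in here only to illustrate the shape of the procedure. The
function names are hypothetical.

```python
def domain_key(domain: str) -> bytes:
    """Convert every label to A-label form, lower-case it, return the bytes.

    Assumption: the stdlib "idna" codec (IDNA2003) is an acceptable stand-in.
    """
    try:
        # Non-ASCII labels become xn-- labels; ASCII labels pass through.
        return domain.encode("idna").lower()
    except UnicodeError as exc:
        # Conversion failed: reject the domain rather than guess.
        raise ValueError(f"not a convertible domain: {domain!r}") from exc


def domains_match(a: str, b: str) -> bool:
    """Compare two domains octet-for-octet after A-label conversion."""
    return domain_key(a) == domain_key(b)
```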

We probably agree on what ought to happen, but, if we want
consistency, a number of statements to the effect that "if you
want this to work in LDAP or Sieve, then the delivery server
better be configured to treat these as equivalent" probably need
to be scattered in various places.

> I note in passing that Sieve's i;ascii-casemap comparator does
> more or less the right thing. The thing that's missing in
> Sieve is a way to decode encoded domains as part of the
> address test.

Ack.

best,
   john

From ned+ima@mrochek.com  Wed Feb  5 12:11:09 2014
MIME-version: 1.0
Content-type: TEXT/PLAIN; CHARSET=iso-8859-1
From: ned+ima@mrochek.com
Message-id: <01P3ZWREEHUK0000CD@mauve.mrochek.com>
Date: Wed, 05 Feb 2014 11:41:28 -0800 (PST)
In-reply-to: "Your message dated Wed, 05 Feb 2014 13:21:19 -0500" <0243CD0BDA9444A48730C80D@JcK-HP8200.jck.com>
References: <01P3ZRRC0OJA0000CD@mauve.mrochek.com> <0243CD0BDA9444A48730C80D@JcK-HP8200.jck.com>
To: ima <ima@ietf.org>
Subject: Re: [EAI] Correct LDAP matching rule for EAI addresses?

> --On Wednesday, February 05, 2014 09:39 -0800
> ned+ima@mrochek.com wrote:

> > So what's the matching rule/syntax to use for EAI addresses
> > stored in LDAP? AFAICT there isn't an appropriate one: At a
> > minimum you want something that does ASCII case mapping while
> > allowing the full UTF-8 repertoire. (Better still would be one
> > that does the right sorts of normalization, but that's probably
> > too much to ask.)

The good news is that caseIgnoreMatch appears to be sufficient for
this purpose. It doesn't handle spaces correctly, but anyone that puts
sequences of spaces into an address and expects it to work is losing anyway.
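
For reference, caseIgnoreMatch (RFC 4517) compares values after the RFC 4518
string preparation, which case-folds the whole Unicode repertoire, normalizes
to NFKC, and treats runs of spaces as insignificant. The sketch below is only
a rough Python approximation of that behavior, not the exact stringprep
tables, and case_ignore_key is a hypothetical name.

```python
import re
import unicodedata


def case_ignore_key(value: str) -> str:
    """Approximate the prepared form that caseIgnoreMatch compares.

    Assumption: NFKC + str.casefold() + space collapsing is close enough to
    the RFC 4518 preparation for illustration; it is not a full implementation.
    """
    folded = unicodedata.normalize("NFKC", value).casefold()
    folded = unicodedata.normalize("NFKC", folded)
    # Runs of spaces compare like a single space; edge spaces are ignored.
    return re.sub(r" +", " ", folded).strip(" ")
```

Under this approximation, two addresses that differ only in the number of
consecutive spaces compare equal, which is the space-handling quirk noted
above.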

> Ned, the problem is that the EAI WG, after considering a number
> of alternatives and the forest of rat holes to which they led,
> decided on a very strict reading of the "no one but (or before)
> the delivery server can determine whether two addresses match"
> rule that is implicit in RFC 5321 and its predecessors. 

Well, that's actually total nonsense, as use cases abound where addresses have
to be compared during message transport. Various antispam measures are only the
most obvious example; there are many, many others.

This may be another one of those head-in-the-purity-sand situations the IETF
sometimes gets itself into, like the notion that messages never undergo
transformations in transit (despite the fact that we have standards that
specifically sanction such activity).

But even if this were a reasonable rule, it's irrelevant, because it is
*precisely* the delivery server that's most commonly looking up addresses in
LDAP. So the rule, even if valid, in no way lets us skirt the problem.

> The net result is that, for local parts, even
>    NFC(local-part-in-address) == NFC(local-part-in-database)
> cannot be guaranteed to represent the same destination address
> and things are sort of downhill from there.

I don't actually care about that. My view of the local-part-normalization punt
EAI ended up with is that it forces clients and provisioning systems to deal
with the local part normalization issue.

My issue is that for better or worse the expectation is that the ASCII subset
of characters in addresses needs to be compared case-insensitively. So you have
to use a matching rule that handles that while not restricting the syntax to
IA5.

A normalizing matching rule would be better, but getting such a thing designed
and implemented is beyond my purview.

> Of course, if one
> were strictly following 5321, then
>    toLowerCase(local-part-in-address) ==
>    toLowerCase(local-part-in-database)
> can't be guaranteed to represent the same address either.

Yep. But the reality on the ground is that nobody wants that. We actually have
support for that in our product, but to the best of my knowledge we've only
had one customer who ever used it.

> The domain part is easy by comparison: each non-ASCII label
> should be converted to A-label form (and rejected as nonsense if
> the conversion fails), lower-cased if needed, and then compared
> on a bitstring basis.

Sorry, that's not going to work. People are going to want to enter pure utf-8
addresses into the directory. Asking provisioning systems to enter addresses
with the local part in utf-8 and the domain part in A-label form is pretty
clearly a losing proposition.

The only approach that makes sense is properly normalized utf-8 addresses in
the directory. So you actually want the opposite: Convert all A-labels to utf-8
prior to lookup, and normalize any utf-8 already present in the domain part,
then search. Coding this is tedious but straightforward.
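
A rough Python sketch of that pre-lookup step, with assumptions stated up
front: the stdlib "idna" codec (IDNA2003, not IDNA2008) does the A-label
decoding, NFC stands in for "normalize", and the function names are
hypothetical. The local part is passed through untouched and left to the
directory's matching rule.

```python
import unicodedata


def label_to_unicode(label: str) -> str:
    """Return one domain label as lower-cased, NFC-normalized Unicode."""
    lowered = label.lower()
    if lowered.startswith("xn--"):
        # A-labels are pure ASCII; decode the Punycode form to Unicode.
        # Assumption: the stdlib codec (IDNA2003) is good enough here.
        lowered = lowered.encode("ascii").decode("idna")
    return unicodedata.normalize("NFC", lowered)


def directory_lookup_key(address: str) -> str:
    """Build the all-UTF-8 form of an address to search the directory with."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an address: {address!r}")
    labels = [label_to_unicode(part) for part in domain.split(".")]
    return local + "@" + ".".join(labels)
```

Both an A-label domain and a decomposed UTF-8 spelling of the same domain
reduce to one search key, so the directory only ever needs to hold the
normalized utf-8 form.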

Domains stored in the directory are another matter. An argument can be made for
using A-labels there. But since there's no encoding defined for local parts,
the simplest thing is to just use normalized utf-8 again.

				Ned
