[discuss] IDNA and Arabic (and other RTL scripts) (was: IANA missing information)

Andrew Sullivan ajs at anvilwalrusden.com
Tue Mar 18 13:53:19 UTC 2014


On Mon, Mar 17, 2014 at 08:28:48PM +0100, Dominique Lacroix wrote:

> And in Arabic writing, the dot must be on the right side, not on the
> left one.

In short, exactly how to display whole domain names using Arabic
script is surprisingly fraught.  There appears to be more than one
implementation, and there is no well-described standard for this,
unfortunately.

The longer version:

IDNA is defined for labels, not domain names.  This means that there's
no obvious "correct side" for the dot.  I don't have an Arabic-script
input method well-configured on this computer, but my point is that
there's more than one view about how to do this.  Suppose we have
Arabic-script labels.  For most users, I think the most natural way to
write a name in analogy with the current use is something like this:

    [TLD].[2LD].[3LD].[4LD]

That's also how most implementations do it.  However, you could just
as easily make the argument that this is the right approach (and I've
seen this once):

    [4LD].[3LD].[2LD].[TLD]
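
Because IDNA is applied label by label, the dot is only a separator and has no direction of its own.  Here is a minimal sketch in Python. It assumes the standard library's built-in "idna" codec, which implements the older IDNA2003 rules rather than IDNA2008, and the helper names are hypothetical.  It shows that each label is converted on its own.  The labels stay in logical order (most specific first), so the encoding never decides which visual order, or which side of a label the dot is drawn on:

```python
# Sketch: IDNA conversion happens per label; the dots are only separators.
# Assumption: Python's built-in "idna" codec (IDNA2003 rules, not IDNA2008).
# to_ascii_labels / to_unicode_labels are hypothetical helpers.

def to_ascii_labels(name: str) -> list[bytes]:
    """Split a domain name on dots and convert each label separately."""
    return [label.encode("idna") for label in name.split(".")]

def to_unicode_labels(ascii_labels: list[bytes]) -> list[str]:
    """Convert each ASCII-compatible label back to Unicode."""
    return [label.decode("idna") for label in ascii_labels]

name = "مثال.example.com"  # one Arabic-script label, two ASCII labels
encoded = to_ascii_labels(name)
# The Arabic label becomes an "xn--" label; the ASCII labels pass through.
# The list is in logical order; nothing here says how to lay it out visually.
print(encoded)
print(".".join(to_unicode_labels(encoded)))
```

Which visual order a display then chooses is a separate decision, and that is exactly where implementations differ.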

The reason you could do that is that it better represents the
resolution path: from root on down.  (In left-to-right writing, we
might well have instead written names com.example.www.)  An
alternative explanation is that domain names themselves are
left-to-right, even though the labels in them might be right-to-left.
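
The root-down ordering is easy to make concrete.  The following is a small sketch with hypothetical helper names.  It assumes a plain name without the trailing dot, and it rewrites the name in resolution order and lists the names from the TLD downward.  It says nothing about where zone cuts actually fall:

```python
# Sketch: root-down (resolution) order is the reverse of the usual
# written order. Hypothetical helpers; assumes no trailing dot.

def root_first(name: str) -> str:
    """Rewrite www.example.com as com.example.www."""
    return ".".join(reversed(name.split(".")))

def names_from_top(name: str) -> list[str]:
    """List names from the TLD down, adding one label at each step."""
    labels = name.split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

print(root_first("www.example.com"))      # com.example.www
print(names_from_top("www.example.com"))  # ['com', 'example.com', 'www.example.com']
```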
    
Moreover, there's actually a dot on _both_ sides, believe it or not.
That is, a genuinely fully-qualified domain name is really written
like this:

    www.example.com.

Note the final dot.  There are technical reasons for this that are
probably not relevant here, but the important point is that everyone
always leaves off the final dot "to save typing" (as the RFC says).  
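
One way to see the final dot: it separates the last visible label from the root, whose label is empty (zero length).  Splitting on dots makes the empty root label visible.  The helpers below are hypothetical, and they do no validation of label syntax:

```python
# Sketch: the trailing dot of a fully-qualified name marks the empty
# root label. Hypothetical helpers; no label-syntax validation.

def labels_of(name: str) -> list[str]:
    """Return the labels, including the empty root label if the dot is there."""
    return name.split(".")

def is_fully_qualified(name: str) -> bool:
    return name.endswith(".")

def qualify(name: str) -> str:
    """Add the usually-omitted final dot."""
    return name if is_fully_qualified(name) else name + "."

print(labels_of("www.example.com."))  # ['www', 'example', 'com', '']
print(qualify("www.example.com"))     # www.example.com.
```

In right-to-left display, that final dot is one more separator whose visual position has to be decided.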

So, really, "where the dot goes" is the least of these problems for
Arabic-script names.  And we haven't even touched on what to do when
you have the combination of an RTL label and an LTR label together in a
domain name, other than leaving it to the bidi algorithm.
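
Even the first step in the mixed case is a matter of definition: deciding which labels count as right-to-left.  Here is a sketch using Python's unicodedata module.  It assumes the definition in RFC 5893 as I understand it: an "RTL label" contains at least one character of bidi class R, AL, or AN, and a "Bidi domain name" has at least one RTL label.  The helper names are hypothetical.  The code only classifies labels and does not decide how a mixed name should be displayed:

```python
import unicodedata

# Sketch: classify labels by the Unicode bidi classes of their characters.
# Assumption: RFC 5893-style definition of an RTL label (any R, AL, or AN
# character). Classification only; no display ordering is attempted.

RTL_CLASSES = {"R", "AL", "AN"}

def is_rtl_label(label: str) -> bool:
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)

def label_directions(name: str) -> list[tuple[str, str]]:
    """Tag each label of a name as 'RTL' or 'LTR/other'."""
    return [(label, "RTL" if is_rtl_label(label) else "LTR/other")
            for label in name.split(".")]

def is_bidi_domain_name(name: str) -> bool:
    """True if the name has at least one RTL label."""
    return any(is_rtl_label(label) for label in name.split("."))

# Logical order: "www", then an Arabic-script label, then "com".
print(label_directions("www.مثال.com"))
```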

IDNs are relatively straightforward in the abstract.  They're actually
really quite hard in the concrete, as I believe the ICANN Variant
Issues Project indicated fairly clearly.

Best regards,

A

-- 
Andrew Sullivan
ajs at anvilwalrusden.com


