<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent">
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="exp" number="8771" docName="draft-mayrhofer-i-dunno-02" ipr="trust200902" obsoletes="" updates="" submissionType="independent" xml:lang="en" tocInclude="true" symRefs="true" sortRefs="true" version="3">
  <!-- xml2rfc v2v3 conversion 2.40.1 -->

  <front>
    <title abbrev="I-DUNNO">The Internationalized Deliberately Unreadable Network NOtation (I-DUNNO)</title>
    <seriesInfo name="RFC" value="8771"/>
    <author initials="A." surname="Mayrhofer" fullname="Alexander Mayrhofer">
      <organization>nic.at GmbH</organization>
      <address>
   <email>alexander.mayrhofer@nic.at</email>
        <uri>https://i-dunno.at/</uri>
      </address>
    </author>
    <author initials="J." surname="Hague" fullname="Jim Hague">
      <organization>Sinodun</organization>
      <address>
        <email>jim@sinodun.com</email>
        <uri>https://www.sinodun.com/</uri>
      </address>
    </author>
    <date day="1" month="April" year="2020"/>

    <abstract>
      <t>
        Domain Names were designed for humans, IP addresses were not. But more
        than 30 years after the introduction of the DNS, a minority of
        mankind persists in invading the realm of machine-to-machine
        communication by reading, writing, misspelling, memorizing, permuting,
        and confusing IP addresses.  This memo describes the Internationalized
        Deliberately Unreadable Network NOtation ("I-DUNNO"), a notation
        designed to replace current textual representations of IP addresses
        with something that is not only more concise but will also
        discourage this small, but obviously important, subset of 
        human activity.
      </t>
    </abstract>
  </front>
  <middle>
    <section numbered="true" toc="default">
      <name>Introduction</name>
      <t>
        In <xref target="RFC0791" format="default" section="2.3"
        sectionFormat="of"/>, the original designers of the Internet Protocol
        carefully defined names and addresses as separate quantities. While
        they did not explicitly reserve names for human consumption and
        addresses for machine use, they did consider the matter indirectly in
        their philosophical communal statement: "A name indicates what we
        seek."  This clearly indicates that names rather than addresses should
        be of concern to humans.
      </t>

      <t>
        The specification of domain names in <xref target="RFC1034"
        format="default"/>, and indeed the continuing enormous effort put into
        the Domain Name System, reinforces the view that humans should use
        names and leave worrying about addresses to the machines. RFC 1034
        mentions "users" several times, and even includes the word "humans",
        even though it is positioned slightly unfortunately, though perfectly
        understandably, in a context of "annoying" and "can wreak havoc" (see
        <xref target="RFC1034" section="5.2.3"
        sectionFormat="of"/>). Nevertheless, this is another clear indication
        that domain names are made for human use, while IP addresses are for
        machine use.
      </t>
      <t>
        Given this, and a long error-strewn history of human attempts to
        utilize addresses directly, it is obviously desirable that humans
        should not meddle with IP addresses. For that reason, it appears quite
        logical that a human-readable (textual) representation of IP addresses
        was just very vaguely specified in <xref target="RFC1123"
        format="default" section="2.1" sectionFormat="of"/>.  Subsequently, a
        directed effort to further discourage human use by making IP addresses
        more confusing was introduced in <xref target="RFC1883"
        format="default"/> (which was obsoleted by <xref target="RFC8200"/>), and additional options for human puzzlement were
        offered in <xref target="RFC4291" format="default" section="2.2"
        sectionFormat="of"/>.  These noble early attempts to hamper efforts by
        humans to read, understand, or even spell IP addressing schemes were
        unfortunately severely compromised in <xref target="RFC5952"
        format="default"/>.
      </t>

      <t>
        In order to prevent further damage from human meddling with IP
        addresses, there is a clear urgent need for an address notation that
        replaces these "Legacy Notations", and efficiently discourages humans
        from reading, modifying, or otherwise manipulating IP addresses.
        Research in this area long ago recognized the potential in
        ab^H^Hperusing the intricacies, inaccuracies, and chaotic disorder of
        what humans are pleased to call a "Cultural Technique" (also known as
        "Script"), and with a certain inexorable inevitability has focused of
        late on the admirable confusion (and thus discouragement) potential of
        <xref target="UNICODE" format="default"/> as an address notation.  In
        <xref target="confreq" format="default"/>, we introduce a framework of
        Confusion Levels as an aid to the evaluation of the effectiveness of
        any Unicode-based scheme in producing notation in a form designed to
        be resistant to ready comprehension or, heaven forfend, mutation of
        the address, and so effecting the desired confusion and
        discouragement.
      </t>
      <t>
        The authors welcome <xref target="RFC8369" format="default"/> as a
        major step in the right direction.  However, we have some reservations
        about the scheme proposed therein:
      </t>
      <ul spacing="normal">
        <li>Our analysis of the proposed scheme indicates that, while
        impressively concise, it fails to attain more than at best a
        Minimum Confusion Level in our classification.</li>
        <li>Humans, especially younger ones, are becoming skilled at handling
        emoji. Over time, this will negatively impact the discouragement
        factor.</li>
        <li>The proposed scheme is specific to IPv6; if a solution to this
        problem is to be in any way timely, it must, as a matter of the
        highest priority, address IPv4. After all, even taking the regrettable
        effects of RFC 5952 into account, IPv6 does at least remain inherently
        significantly more confusing and discouraging than IPv4.</li>
      </ul>
      <t>
        This document therefore specifies an alternative Unicode-based
        notation, the Internationalized Deliberately Unreadable Network
        NOtation (I-DUNNO).  This notation addresses each of the concerns
        outlined above:
      </t>
      <ul spacing="normal">
        <li>I-DUNNO can generate Minimum, Satisfactory, or Delightful levels of confusion.</li>
        <li>As well as emoji, it takes advantage of other areas of Unicode confusion.</li>
        <li>It can be used with IPv4 and IPv6 addresses.</li>
      </ul>
      <t>
        We concede that I-DUNNO notation is markedly less concise than that of
        RFC 8369.  However, by permitting multiple code points in the
        representation of a single address, I-DUNNO opens up the full spectrum
        of Unicode-adjacent code point interaction. This is a significant
        factor in allowing I-DUNNO to achieve higher levels of
        confusion. I-DUNNO also requires no change to the current size of
        Unicode code points, and so its chances of adoption and implementation
        are (slightly) higher.
      </t>
      <t>Note that the use of I-DUNNO in the reverse DNS system is currently
      out of scope. The occasional human-induced absence of the magical
      one-character sequence <u format="num">.</u> is believed to cause sufficient disorder
      there.</t>
      <t>Media Access Control (MAC) addresses are totally out of the question.</t>
    </section>
    <section anchor="terminology" numbered="true" toc="default">
      <name>Terminology</name>

        <t>
    The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>",
    "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL NOT</bcp14>",
    "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>",
    "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
    "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be
    interpreted as described in BCP&nbsp;14 <xref target="RFC2119"/> <xref
    target="RFC8174"/> when, and only when, they appear in all capitals, as
    shown here.
        </t>

      <t>Additional terminology from <xref target="RFC6919" format="default"/> MIGHT apply.</t>
    </section>
    <section numbered="true" toc="default">
      <name>The Notation</name>
      <t>
             I-DUNNO leverages <xref target="RFC3629"
             format="default">UTF-8</xref> to obfuscate IP addresses for
             humans. UTF-8 uses sequences between 1 and 4 octets to represent
             code points as follows:
      </t>

<table anchor="UTF8"> 
  <thead>
    <tr>
      <th>Char. number range</th>
      <th>UTF-8 octet sequence</th>
    </tr>
    <tr>
      <th>(hexadecimal)</th> 
      <th>(binary)</th>
    </tr>
  </thead>
  <tbody>        
    <tr>
      <td>0000 0000 - 0000 007F</td>
      <td>0xxxxxxx</td>
    </tr>
    <tr>
      <td>0000 0080 - 0000 07FF</td>
      <td>110xxxxx 10xxxxxx</td>
    </tr>
    <tr>
      <td>0000 0800 - 0000 FFFF</td>
      <td>1110xxxx 10xxxxxx 10xxxxxx</td>
    </tr>
    <tr>
      <td>0001 0000 - 0010 FFFF</td>
      <td>11110xxx 10xxxxxx 10xxxxxx 10xxxxxx</td>
    </tr>
  </tbody>
</table>

      <t>I-DUNNO uses that structure to convey addressing information as follows:</t>
      <section numbered="true" toc="default">
        <name>Forming I-DUNNO</name>
        <t>
               In order to form an I-DUNNO based on the Legacy Notation of an
               IP address, the following steps are performed:
        </t>
        <ol spacing="normal" type="1">
          <li>
                   The octets of the IP address are written as a bitstring in network byte order.
                 </li>
          <li>
            <t>
                   Working from left to right, the bitstring (32 bits for
                   IPv4; 128 bits for IPv6) is used to generate a list of
                   valid UTF-8 octet sequences.  To allocate a single UTF-8
                   sequence:
            </t>
            <ol spacing="normal" type="a">
              <li>
                       Choose whether to generate a UTF-8
                       sequence of 1, 2, 3, or 4 octets. 

		       The choice OUGHT TO be guided by the
                       requirement to generate a satisfactory <xref
                       target="minconf" format="default">Minimum Confusion
                       Level</xref> (not to be confused with the minimum <xref
                       target="satconf" format="default">Satisfactory
                       Confusion Level</xref>). Refer to the character number
                       range in <xref target="UTF8" format="default"/> in
                       order to identify which octet sequence lengths are
                       valid for a given bitstring. For example, a 2-octet
                       UTF-8 sequence requires the next 11 bits to have a
                       value in the range 0080-07ff.
                     </li>
              <li>
                       Allocate bits from the bitstring to fill the vacant
                       positions 'x' in the UTF-8 sequence (see <xref
                       target="UTF8" format="default"/>) from left to right.
                     </li>
              <li>
                       UTF-8 sequences of 1, 2, 3, and 4 octets require 7, 11,
                       16, and 21 bits, respectively, from the
                       bitstring. Since the number of combinations of UTF-8
                       sequences accommodating exactly 32 or 128 bits is
                       limited, in sequences where the number of bits required
                       does not exactly match the number of available bits,
                       the final UTF-8 sequence <bcp14>MUST</bcp14> be padded with additional
                       bits once the available address bits are exhausted. The
                       sequence may therefore require up to 20 bits of
                       padding. The content of the padding <bcp14>SHOULD</bcp14> be chosen to
                       maximize the resulting Confusion Level.
                     </li>
            </ol>
          </li>
          <li>
                   Once the bits in the bitstring are exhausted, the
                   conversion is complete. The I-DUNNO representation of the
                   address consists of the Unicode code points described by
                   the list of generated UTF-8 sequences, and it <bcp14>MAY</bcp14> now be
                   presented to unsuspecting humans.
                 </li>
        </ol>
      </section>
      <section anchor="deform" numbered="true" toc="default">
        <name>Deforming I-DUNNO</name>
        <t>
               This section is intentionally omitted. The machines will know
               how to do it, and by definition humans <bcp14>SHOULD NOT</bcp14> attempt the
               process.
        </t>
      </section>
    </section>
    <section anchor="confreq" numbered="true" toc="default">
      <name>I-DUNNO Confusion Level Requirements</name>

      <t>A sequence of characters is considered I-DUNNO only when there's
      enough potential to confuse humans.</t>

      <t>Unallocated code points <bcp14>MUST</bcp14> be avoided. While they might appear to
      have great confusion power at the moment, there's a minor chance that a
      future allocation to a useful, legible character will reduce this
      capacity significantly. Worse, in the (unlikely, but not impossible --
      see <xref target="RFC5894" sectionFormat="of" section="3.1.3"/>) event
      of a code point losing its DISALLOWED property per IDNA2008
      <xref target="RFC5894" format="default"/>, existing
      I-DUNNOs could be rendered less than minimally confusing, with
      disastrous consequences.</t>
      <t>The following Confusion Levels are defined:</t>
      <section anchor="minconf" numbered="true" toc="default">
        <name>Minimum Confusion Level</name>
        <t>As a minimum, a valid I-DUNNO <bcp14>MUST</bcp14>:
        </t>
        <ul spacing="normal">
          <li>Contain at least one UTF-8 octet sequence with a length greater
          than one octet.</li>
          <li>Contain at least one character that is DISALLOWED in
          IDNA2008. No code point left behind! Note that this allows machines
          to distinguish I-DUNNO from Internationalized Domain Name
          labels.</li>
        </ul>
        <t>
             I-DUNNOs on this level will at least puzzle most human users with
             knowledge of the Legacy Notation.
        </t>
      </section>
      <section anchor="satconf" numbered="true" toc="default">
        <name>Satisfactory Confusion Level</name>
        <t>An I-DUNNO with Satisfactory Confusion Level <bcp14>MUST</bcp14> adhere to the
        Minimum Confusion Level, and additionally contain two of the
        following:
        </t>
        <ul spacing="normal">
          <li>At least one non-printable character.</li>
          <li>Characters from at least two different Scripts.</li>
          <li>A character from the "Symbol" category.</li>
        </ul>
        <t>
             The Satisfactory Confusion Level will make many human-machine
             interfaces beep, blink, silently fail, or any combination
             thereof. This is considered sufficient to discourage most humans
             from deforming I-DUNNO.</t>
      </section>
      <section anchor="delconf" numbered="true" toc="default">
        <name>Delightful Confusion Level</name>
        <t>An I-DUNNO with Delightful Confusion Level <bcp14>MUST</bcp14> adhere to the
        Satisfactory Confusion Level, and additionally contain at least two of
        the following:
        </t>
        <ul spacing="normal">
          <li>Characters from scripts with different directionalities.</li>
          <li>Character classified as "Confusables".</li>
          <li>One or more emoji.</li>
        </ul>
        <t>
    An I-DUNNO conforming to this level will cause almost all humans to
    U+1F926, with the exception of those subscribed to the idna-update mailing
    list.</t>
        <t>(We have also considered a further, higher Confusion Level, tentatively entitled "BReak EXaminatIon or Twiddling" or "BREXIT" Level Confusion,
        but currently we have no idea how to go about actually implementing it.)</t>
      </section>
    </section>
    <section numbered="true" toc="default">
      <name>Example</name>
      <t>An I-DUNNO based on the Legacy Notation IPv4 address "198.51.100.164"
      is formed and validated as follows: First, the Legacy Notation is
      written as a string of 32 bits in network byte order:</t>
      <artwork align="center" name="" type="" alt=""><![CDATA[
11000110001100110110010010100100
  ]]></artwork>
      <t>Since I-DUNNO requires at least one UTF-8 octet sequence with a length greater than one octet, we allocate bits in the following form:</t>
      <artwork align="center" name="" type="" alt=""><![CDATA[
  seq1  |   seq2  |   seq3  |   seq4 
--------+---------+---------+------------
1100011 | 0001100 | 1101100 | 10010100100
  ]]></artwork>
      <t>This translates into the following code points:</t>

<table anchor="example">  

  <thead>
    <tr>
      <th>Bit Seq.</th>   
      <th>Character Number (Character Name)</th>
    </tr>
  </thead>
  <tbody>          
    <tr>
      <td>1100011</td>
      <td><u format="num-name">c</u></td>
    </tr>
    <tr>
      <td>0001100</td>
      <td>U+000C (FORM FEED (FF))</td>
    </tr>
    <tr>
      <td>1101100</td>
      <td><u format="num-name">l</u></td>
    </tr>
    <tr>
      <td>10010100100</td>
      <td><u format="num-name">Ҥ</u></td>
    </tr>
  </tbody>
</table>

<!--
      <artwork align="center" name="" type="" alt=""><![CDATA[
Bit seq.    | Char. number | Character Name
____________+______________+______________________
1100011     | U+0063       | LATIN SMALL LETTER C
0001100     | U+000C       | FORM FEED (FF)
1101100     | U+006C       | LATIN SMALL LETTER L
10010100100 | U+04A4       | CYRILLIC CAPITAL LIGATURE EN GHE
  ]]></artwork>
-->
      <t>
  The resulting string <bcp14>MUST</bcp14> be evaluated against the Confusion Level
  Requirements before I-DUNNO can be declared. Given the example above:
      </t>
      <ul spacing="normal">
        <li>There is at least one UTF-8 octet sequence with a length greater
	than 1 (<u format="num">Ҥ</u>) <!--U+04A4-->.</li>
        <li>There are two IDNA2008 DISALLOWED characters: <u
	format="num-name"></u> U+000C (for good reason!) and <u format="num">Ҥ</u>.</li>
        <li>There is one non-printable character (U+000C).</li>
        <li>There are characters from two different Scripts (Latin and Cyrillic).</li>
      </ul>
      <t>
  Therefore, the example above constitutes valid I-DUNNO with a Satisfactory
  Confusion Level. U+000C in particular has great potential in environments
  where I-DUNNOs would be sent to printers.
</t>
    </section>
    <section numbered="true" toc="default">
      <name>IANA Considerations</name>

      <t>If this work is standardized, IANA is kindly requested to revoke all
      IPv4 and IPv6 address range allocations that do not allow for at least
      one I-DUNNO of Delightful Confusion Level. IPv4 prefixes are more likely
      to be affected, hence this can easily be marketed as an effort to foster
      IPv6 deployment.</t>

      <t>Furthermore, IANA is urged to expand the <xref target="RFC5513"
      format="default">Internet TLA Registry</xref> to accommodate
      Seven-Letter Acronyms (SLA) for obvious reasons, and register
      'I-DUNNO'. For that purpose, <u format="num-lit-name">-</u>
      <bcp14>SHALL</bcp14> be declared a Letter.</t>

    </section>
    <section numbered="true" toc="default">
      <name>Security Considerations</name>

      <t>I-DUNNO is not a security algorithm. Quite the contrary -- many humans
      are known to develop a strong feeling of insecurity when confronted with
      I-DUNNO.</t>
      <t>In the tradition of many other RFCs, the evaluation of other security
      aspects of I-DUNNO is left as an exercise for the reader.</t>
    </section>
  </middle>
  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.3629.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.5894.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.6919.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
      </references>
      <references>
        <name>Informative References</name>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.0791.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.1034.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.1123.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.1883.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4291.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.5513.xml"/>
        <xi:include
	    href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.5952.xml"/>
        <xi:include
	    href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8200.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8369.xml"/>


        <reference anchor="UNICODE" target="http://www.unicode.org/versions/latest/">
          <front>
            <title>The Unicode Standard (Current Version)</title>
            <author>
              <organization>The Unicode Consortium</organization>
            </author>
            <date year="2019"/>
          </front>
	</reference>
      </references>
    </references>
  </back>
</rfc>
