<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>

<rfc number="9285" docName="draft-faltstrom-base45-12" xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" submissionType="IETF" category="info" obsoletes="" updates="" consensus="true" xml:lang="en" tocInclude="true" symRefs="true" sortRefs="true" version="3">
  <!-- xml2rfc v2v3 conversion 3.12.10 -->
  <front>
    <title abbrev="Base45">
      The Base45 Data Encoding
    </title>
    <seriesInfo name="RFC" value="9285" /> 
    <author fullname="Patrik Fältström" initials="P." surname="Fältström">
      <organization>Netnod</organization>
      <address>
        <email>paf@netnod.se</email>
      </address>
    </author>
    <author fullname="Fredrik Ljunggren" initials="F." surname="Ljunggren">
      <organization>Kirei</organization>
      <address>
        <email>fredrik@kirei.se</email>
      </address>
    </author>
    <author fullname="Dirk-Willem van Gulik" initials="D.W." surname="van Gulik">
      <organization>Webweaving</organization>
      <address>
        <email>dirkx@webweaving.org</email>
      </address>
    </author>
    <date month="August" year="2022"/>
    <keyword>BASE45</keyword>
    <abstract>
      <t>
	This document describes the Base45 encoding scheme, which is
	built upon the Base64, Base32, and Base16 encoding schemes.
      </t>
    </abstract>
  </front>
  <middle>
    <section anchor="intro" numbered="true" toc="default">
      <name>Introduction</name>
      <t>
	A QR code is used to encode text as a graphical
	image. Depending on the characters used in the text, various
	encoding options for a QR code exist, e.g., Numeric,
	Alphanumeric, and Byte mode.  Even in Byte mode, a typical
	QR code reader tries to interpret a byte sequence as text encoded in UTF-8
	or ISO/IEC 8859-1. Thus, QR codes cannot be used
	to encode arbitrary binary data directly. Such data has to be
	converted into an appropriate text before that text could be
	encoded as a QR code.  Compared to already established Base64,
	Base32, and Base16 encoding schemes that are described in
	<xref target="RFC4648" format="default"></xref>, the Base45 scheme
	described in this document offers a more compact QR code
	encoding.
      </t>
      <t>
	One important difference from those others and Base45 is the
	key table and that the padding with '=' is not required.
      </t>
    </section>
    <section numbered="true" toc="default">
      <name>Conventions Used in This Document</name>
        <t>
    The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
    NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
    "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
    described in BCP&nbsp;14 <xref target="RFC2119"/> <xref target="RFC8174"/> 
    when, and only when, they appear in all capitals, as shown here.
        </t>
    </section>
    <section numbered="true" toc="default">
      <name>Interpretation of Encoded Data</name>
      <t>
	Encoded data is to be interpreted as described in <xref target="RFC4648" format="default"></xref> with the exception that a
	different alphabet is selected.
      </t>
    </section>
    <section numbered="true" toc="default">
      <name>The Base45 Encoding</name>
      <t>
	QR codes have a limited ability to store binary data. In
	practice, binary data have to be encoded in characters
	according to one of the modes already defined in the standard
	for QR codes. The easiest mode to use in called Alphanumeric
	mode (see Section 7.3.4 and Table 2 of <xref target="ISO18004" format="default"></xref>.  Unfortunately
	Alphanumeric mode uses 45 different characters which implies
	neither Base32 nor Base64 are very effective encodings.
      </t>
      <t>
	A 45-character subset of US-ASCII is used; the 45 characters
	usable in a QR code in Alphanumeric mode (see Section 7.3.4
	and Table 2 of <xref target="ISO18004" format="default"></xref>). Base45 encodes 2 bytes in 3 characters,
	compared to Base64, which encodes 3 bytes in 4 characters.
      </t>
      <t>
	For encoding, two bytes [a, b] <bcp14>MUST</bcp14> be
	interpreted as a number n in base 256, i.e. as an unsigned
	integer over 16 bits so that the number n = (a * 256) + b.
      </t>
      <t>
	This number n is converted to base 45 [c, d, e] so that n = c
	+ (d * 45) + (e * 45 * 45). Note the order of c, d and e which
	are chosen so that the left-most [c] is the least significant.
      </t>
      <t>
	The values c, d, and e are then looked up in <xref
	target="table1"/> to produce a three character string. The
	process is reversed when decoding.
      </t>
      <t>
	For encoding a single byte [a], it <bcp14>MUST</bcp14> be
	interpreted as a base 256 number, i.e. as an unsigned integer
	over 8 bits. That integer <bcp14>MUST</bcp14> be converted to
	base 45 [c d] so that a = c + (45 * d). The values c and d are
	then looked up in <xref target="table1"/> to produce a
	two-character string.
      </t>
      <t>
	A byte string [a b c d ... x y z] with arbitrary content and
	arbitrary length <bcp14>MUST</bcp14> be encoded as follows:
	From left to right pairs of bytes <bcp14>MUST</bcp14> be
	encoded as described above. If the number of bytes is even,
	then the encoded form is a string with a length that is evenly
	divisible by 3. If the number of bytes is odd, then the last
	(rightmost) byte <bcp14>MUST</bcp14> be encoded on two
	characters as described above.
      </t>
      <t>
	For decoding a Base45 encoded string the inverse operations
	are performed.
      </t>
      <section numbered="true" toc="default">
        <name>When to Use and Not Use Base45</name>
        <t>
	  If binary data is to be stored in a QR code, the suggested
	  mechanism is to use the Alphanumeric mode that uses 11 bits
	  for 2 characters as defined in Section 7.3.4 of <xref target="ISO18004" format="default"></xref>. The Extended Channel Interpretation (ECI) mode
	  indicator for this encoding is 0010.
        </t>
        <t>
	  On the other hand if the data is to be sent via some other
	  transport, a transport encoding suitable for that transport
	  should be used instead of Base45. For example, it is not
	  recommended to first encode data in Base45 and then encode
	  the resulting string in Base64 if the data is to be sent via
	  email. Instead, the Base45 encoding should be removed, and
	  the data itself should be encoded in Base64.
        </t>
      </section>
      <section numbered="true" toc="default">
        <name>The Alphabet Used in Base45</name>
        <t>
	  The Alphanumeric mode is defined to use 45 characters as specified
	  in this alphabet.
        </t>
       
<table anchor="table1"> 
  <name>The Base45 Alphabet</name>
  <thead>
    <tr>
      <th align="right">Value</th><th>Encoding</th>    
      <th align="right">Value</th><th>Encoding</th>
      <th align="right">Value</th><th>Encoding</th>
      <th align="right">Value</th><th>Encoding</th>
    </tr>
  </thead>
  <tbody>         
    <tr>
      <td align="right">00</td><td>0</td>
      <td align="right">12</td><td>C</td>
      <td align="right">24</td><td>O</td>
      <td align="right">36</td><td>Space</td>
    </tr>
    <tr>
      <td align="right">01</td><td>1</td>
      <td align="right">13</td><td>D</td>
      <td align="right">25</td><td>P</td>
      <td align="right">37</td><td>$</td>
    </tr>
    <tr>
      <td align="right">02</td><td>2</td>
      <td align="right">14</td><td>E</td>
      <td align="right">26</td><td>Q</td>
      <td align="right">38</td><td>%</td>
    </tr>
    <tr>
      <td align="right">03</td><td>3</td>
      <td align="right">15</td><td>F</td>
      <td align="right">27</td><td>R</td>
      <td align="right">39</td><td>*</td>
     </tr>
    <tr>
      <td align="right">04</td><td>4</td>
      <td align="right">16</td><td>G</td>
      <td align="right">28</td><td>S</td>
      <td align="right">40</td><td>+</td>
    </tr>
<tr>
      <td align="right">05</td><td>5</td>
      <td align="right">17</td><td>H</td>
      <td align="right">29</td><td>T</td>
      <td align="right">41</td><td>-</td>
    </tr>
<tr>
      <td align="right">06</td><td>6</td>
      <td align="right">18</td><td>I</td>
      <td align="right">30</td><td>U</td>
      <td align="right">42</td><td>.</td>
    </tr>
<tr>
      <td align="right">07</td><td>7</td>
      <td align="right">19</td><td>J</td>
      <td align="right">31</td><td>V</td>
      <td align="right">43</td><td>/</td>
    </tr>
<tr>
      <td align="right">08</td><td>8</td>
      <td align="right">20</td><td>K</td>
      <td align="right">32</td><td>W</td>
      <td align="right">44</td><td>:</td>
    </tr>
    <tr>
      <td align="right">09</td><td>9</td>
      <td align="right">21</td><td>L</td>
      <td align="right">33</td><td>X</td>
      <td></td><td></td>
    </tr>
      <tr>
      <td align="right">10</td><td>A</td>
      <td align="right">22</td><td>M</td>
      <td align="right">34</td><td>Y</td>
      <td></td><td></td>
    </tr>
<tr>
      <td align="right">11</td><td>B</td>
      <td align="right">23</td><td>N</td>
      <td align="right">35</td><td>Z</td>
      <td></td><td></td>
    </tr>
  </tbody>
</table>
      </section>
      <section numbered="true" toc="default">
        <name>Encoding Examples</name>
        <t>
	  It should be noted that although the examples are all text,
	  Base45 is an encoding for binary data where each octet can
	  have any value 0-255.
        </t>

<t>Encoding example 1:</t>
<t indent="3">
  The string "AB" is the byte sequence [[65 66]].
  
  If we look at all 16 bits, we get 65 * 256 + 66 = 16706.
  
  16706 equals 11 + (11 * 45) + (8 * 45 * 45), so the sequence in base 45 is [11 11 8].
  
  Referring to <xref target="table1"/>,  we get the encoded string "BB8".
</t>

	<table>
	  <name>Example 1 in Detail</name>
	  <tbody>
	    <tr>
	      <td>AB</td>
	      <td>Initial string</td>
	    </tr>
	    <tr>
	      <td>[[65 66]]</td>
	      <td>Decimal value</td>
	    </tr>
	    <tr>
	      <td>[16706]</td>
	      <td>Value in base 16</td>
	    </tr>
	    <tr>
	      <td>[11 11 8]</td>
	      <td>Value in base 45</td>
	    </tr>
	    <tr>
	      <td>BB8</td>
	      <td>Encoded string</td>
	    </tr>
	  </tbody>
	</table>

<t>Encoding example 2:</t>
<t indent="3">
  The string "Hello!!" as ASCII is the byte sequence [[72 101] [108 108] [111 33] [33]].
  
  If we look at this 16 bits at a time, we get [18533 27756 28449 33]. Note the 33 for the last byte.
  
     When looking at the values in base 45, we get [[38 6 9] [36 31 13] [9 2 14] [33 0]], where the last byte is represented by two values.
  
     The resulting string "%69 VD92EX0" is created by looking up these values in <xref target="table1"/>.  It should be noted it includes a space.
</t>

	<table>
	  <name>Example 2 in Detail</name>
	  <tbody>
	    <tr>
	      <td>Hello!!</td>
	      <td>Initial string</td>
	    </tr>
	    <tr>
	      <td>[[72 101] [108 108] [111 33] [33]]</td>
	      <td>Decimal value</td>
	    </tr>
	    <tr>
	      <td>[18533 27756 28449 33]</td>
	      <td>Value in base 16</td>
	    </tr>
	    <tr>
	      <td>[[38 6 9] [36 31 13] [9 2 14] [33 0]]</td>
	      <td>Value in base 45</td>
	    </tr>
	    <tr>
	      <td>%69 VD92EX0</td>
	      <td>Encoded string</td>
	    </tr>
	  </tbody>
	</table>

<t>Encoding example 3:</t>
<t indent="3">
  The string "base-45" as ASCII is the byte sequence [[98 97] [115 101]
   [45 52] [53]].
  
  If we look at this two bytes at a time, we get [25185 29541 11572
   53].  Note the 53 for the last byte.
  
  When looking at the values in base 45, we get [[30 19 12] [21 26 14]
   [7 32 5] [8 1]] where the last byte is represented by two values.
  
  Referring to <xref target="table1"/>, we get the encoded string
   "UJCLQE7W581".
</t>  
	<table>
	  <name>Example 3 in Detail</name>
	  <tbody>
	    <tr>
	      <td>base-45</td>
	      <td>Initial string</td>
	    </tr>
	    <tr>
	      <td>[[98 97] [115 101] [45 52] [53]]</td>
	      <td>Decimal value</td>
	    </tr>
	    <tr>
	      <td>[25185 29541 11572 53]</td>
	      <td>Value in base 16</td>
	    </tr>
	    <tr>
	      <td>[[30 19 12] [21 26 14] [7 32 5] [8 1]]</td>
	      <td>Value in base 45</td>
	    </tr>
	    <tr>
	      <td>UJCLQE7W581</td>
	      <td>Encoded string</td>
	    </tr>
	  </tbody>
	</table>
      </section>
      <section numbered="true" toc="default">
	<name>Decoding Example</name>
        <t>Decoding example 1:</t>
<t indent="3">
   The string "QED8WEX0" represents, when looked up in Table 1, the
   values [26 14 13 8 32 14 33 0].

     We arrange the numbers in chunks of three, except for the last one
   which can be two numbers, and get [[26 14 13] [8 32 14] [33 0]].
  
  In base 45, we get [26981 29798 33] where the bytes are [[105 101]
   [116 102] [33]].
  
     If we look at the ASCII values, we get the string "ietf!".
</t>

	<table>
	  <name>Example 4 in Detail</name>
	  <tbody>
	    <tr>
	      <td>QED8WEX0</td>
	      <td>Initial string</td>
	    </tr>
	    <tr>
	      <td>[26 14 13 8 32 14 33 0]</td>
	      <td>Looked up values</td>
	    </tr>
	    <tr>
	      <td>[[26 14 13] [8 32 14] [33 0]]</td>
	      <td>Groups of three</td>
	    </tr>
	    <tr>
	      <td>[26981 29798 33]</td>
	      <td>Interpreted as base 45</td>
	    </tr>
	    <tr>
	      <td>[[105 101] [116 102] [33]]</td>
	      <td>Values in base 8</td>
	    </tr>
	    <tr>
	      <td>ietf!</td>
	      <td>Decoded string</td>
	    </tr>
	  </tbody>
	</table>
	
      </section>
    </section>
    <section numbered="true" toc="default">
      <name>IANA Considerations</name>
      <t>
	This document has no IANA actions.
      </t>
    </section>
    <section numbered="true" toc="default">
      <name>Security Considerations</name>
      <t>
	When implementing encoding and decoding it is important to be
	very careful so that buffer overflow or similar issues do
	not occur.  This of course includes the calculations in base
	45 and lookup in the table of characters (<xref
	target="table1"/>). A decoder must also be robust regarding
	input, including proper handling of any octet value 0-255,
	including the NUL character (ASCII 0).
      </t>
      <t>
	It should be noted that Base64 and some other encodings pad
	the string so that the encoding starts with an aligned number
	of characters while Base45 specifically avoids padding. Because of
	this, special care has to be taken when an odd number of octets
	is to be encoded. Similarly, care must be taken if the number
	of characters to decode are not evenly divisible by 3.
      </t>
      <t>
	Base encodings use a specific, reduced alphabet to encode
	binary data. Non-alphabet characters could exist within
	base-encoded data, caused by data corruption or by
	design. Non-alphabet characters may be exploited as a "covert
	channel", where non-protocol data can be sent for nefarious
	purposes. Non-alphabet characters might also be sent in order
	to exploit implementation errors leading to, for example, buffer
	overflow attacks.
      </t>
      <t>
	Implementations <bcp14>MUST</bcp14> reject any input that is
	not a valid encoding. For example, it <bcp14>MUST</bcp14>
	reject the input (encoded data) if it contains characters
	outside the base alphabet (in <xref target="table1"/>) when
	interpreting base-encoded data.
      </t>
      <t>
	Even though a Base45-encoded string contains only characters
	from the alphabet in <xref target="table1"/>, cases like the
	following have to be considered: The string "FGW" represents
	65535 (FFFF in base 16), which is a valid encoding of 16 bits.
	A slightly different encoded string of the same length, "GGW",
	would represent 65536 (10000 in base 16), which is represented
	by more than 16 bits.  Implementations <bcp14>MUST</bcp14>
	also reject the encoded data if it contains a triplet of
	characters that, when decoded, results in an unsigned integer
	that is greater than 65535 (FFFF in base 16).
      </t>
      <t>
	It should be noted that the resulting string after encoding to
	Base45 might include non-URL-safe characters so if the URL
	including the Base45 encoded data has to be URL-safe, one
	has to use percent-encoding.
      </t>
    </section>
  </middle>
  <back>
    <references>
      <name>Normative References</name>
      <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4648.xml"/>
      <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
      <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>

      <reference anchor="ISO18004" target="https://www.iso.org/standard/62021.html">
        <front>
          <title>
	    Information technology - Automatic identification and data capture
	    techniques - QR Code bar code symbology specification
          </title>
          <author>
            <organization>ISO/IEC</organization>
          </author>
          <date month="February" year="2015"/>
        </front>
        <seriesInfo name="ISO/IEC" value="18004:2015"/>
      </reference>
    </references>
    <section numbered="false" toc="default">
      <name>Acknowledgements</name>
      <t>
	The authors thank <contact fullname="Mark Adler"/>, <contact fullname="Anders Ahl"/>, <contact fullname="Alan Barrett"/>, <contact fullname="Sam Spens Clason"/>, <contact fullname="Alfred Fiedler"/>, <contact fullname="Tomas Harreveld"/>, <contact fullname="Erik Hellman"/>, <contact fullname="Joakim Jardenberg"/>, <contact fullname="Michael Joost"/>, <contact fullname="Erik Kline"/>, <contact fullname="Christian Landgren"/>, <contact fullname="Anders Lowinger"/>, <contact fullname="Mans Nilsson"/>, <contact fullname="Jakob Schlyter"/>, <contact fullname="Peter Teufl"/>, and <contact fullname="Gaby Whitehead"/> for the feedback. Also, everyone who has been working with Base64 over a long period of years and has proven the implementations are stable.
      </t>
    </section>
  </back>
</rfc>
