<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" docName="draft-ietf-cbor-cddl-control-07" number="9165" obsoletes="" updates="" submissionType="IETF" category="std" consensus="true" xml:lang="en" tocInclude="true" sortRefs="true" symRefs="true" version="3">
  
<!-- xml2rfc v2v3 conversion 3.9.1 -->

  <front>
    <title abbrev="CDDL Control Operators">Additional Control Operators for the Concise Data Definition Language (CDDL)</title>
    <seriesInfo name="RFC" value="9165"/>
    <author initials="C." surname="Bormann" fullname="Carsten Bormann">
      <organization>Universität Bremen TZI</organization>
      <address>
        <postal>
          <street>Postfach 330440</street>
          <city>Bremen</city>
          <code>D-28359</code>
          <country>Germany</country>
        </postal>
        <phone>+49-421-218-63921</phone>
        <email>cabo@tzi.org</email>
      </address>
    </author>
    <date year="2021" month="December"/>

    <keyword>binary format</keyword>
    <keyword>data interchange format</keyword>
    <keyword>description language</keyword>
    <keyword>schema language</keyword>
    <keyword>tree grammar</keyword>
    <keyword>ABNF</keyword>
    <keyword>Augmented BNF</keyword>
    <keyword>feature indication</keyword>

    <abstract>
      <t>The Concise Data Definition Language (CDDL), standardized in RFC 8610,
provides "control operators" as its main language extension point.</t>
      <t>The present document defines a number of control operators that were
not yet ready at the time RFC 8610 was completed:
<tt>.plus</tt>, <tt>.cat</tt>, and <tt>.det</tt> for the construction of constants;
<tt>.abnf</tt>/<tt>.abnfb</tt> for including ABNF (RFC 5234 and RFC 7405) in CDDL specifications; and
<tt>.feature</tt> for indicating the use of a non-basic feature in an instance.</t>
    </abstract>
  </front>
  <middle>
    <section anchor="intro" numbered="true" toc="default">
      <name>Introduction</name>
      <t>The Concise Data Definition Language (CDDL), standardized in <xref target="RFC8610" format="default"/>,
provides "control operators" as its main language extension point
(<xref section="3.8" sectionFormat="of" target="RFC8610" format="default"/>).</t>
      <t>The present document defines a number of control operators that were
not yet ready at the time <xref target="RFC8610"/> was completed:</t>
      <table anchor="tbl-new" align="center">
        <name>New Control Operators in this Document</name>
        <thead>
          <tr>
            <th align="left">Name</th>
            <th align="left">Purpose</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td align="left">.plus</td>
            <td align="left">Numeric addition</td>
          </tr>
          <tr>
            <td align="left">.cat</td>
            <td align="left">String concatenation</td>
          </tr>
          <tr>
            <td align="left">.det</td>
            <td align="left">String concatenation, pre-dedenting</td>
          </tr>
          <tr>
            <td align="left">.abnf</td>
            <td align="left">ABNF in CDDL (text strings)</td>
          </tr>
          <tr>
            <td align="left">.abnfb</td>
            <td align="left">ABNF in CDDL (byte strings)</td>
          </tr>
          <tr>
            <td align="left">.feature</td>
            <td align="left">Indicates name of feature used (extension point)</td>
          </tr>
        </tbody>
      </table>
      <section anchor="terminology" numbered="true" toc="default">
        <name>Terminology</name>
	        <t>
    The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
    NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
    "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
    described in BCP&nbsp;14 <xref target="RFC2119"/> <xref target="RFC8174"/> 
    when, and only when, they appear in all capitals, as shown here.
        </t>

        <t>This specification uses terminology from <xref target="RFC8610" format="default"/>.
In particular, with respect to control operators, "target" refers to
the left-hand side operand and "controller" to the right-hand side operand.
"Tool" refers to tools along the lines of that described in <xref section="F" sectionFormat="of" target="RFC8610" format="default"/>.
Note also that the data model underlying CDDL provides for text
strings as well as byte strings as two separate types, which are
then collectively referred to as "strings".</t>
        <t>The term "ABNF" in this specification stands for the combination of
<xref target="RFC5234" format="default"/> and <xref target="RFC7405" format="default"/>; i.e., the ABNF control operators defined by
this document allow use of the case-sensitive extensions defined in
<xref target="RFC7405" format="default"/>.</t>
      </section>
    </section>
    <section anchor="computed-literals" numbered="true" toc="default">
      <name>Computed Literals</name>
      <t>CDDL as defined in <xref target="RFC8610" format="default"/> does not have any mechanisms to compute
literals.  To cover a large part of the use cases, this specification adds three control
operators: <tt>.plus</tt> for numeric addition, <tt>.cat</tt> for string
concatenation, and <tt>.det</tt> for string concatenation with dedenting of
both sides (target and controller).</t>
      <t>For these operators, as with all control operators, targets and
controllers are types.  The resulting type is therefore formally a
function of the elements of the cross-product of the two types.
Not all tools may be able to work with non-unique targets or
controllers.</t>
      <section anchor="numeric-addition" numbered="true" toc="default">
        <name>Numeric Addition</name>
        <t>In many cases, numbers are needed relative to a
base number in a specification.  The <tt>.plus</tt> control identifies a number that is
constructed by adding the numeric values of the target and the
controller.</t>
        <t>The target and controller both <bcp14>MUST</bcp14> be numeric.
If the target is a floating point number and the controller an integer
number, or vice versa, the sum is converted into the type of the
target; converting from a floating point number to an integer selects
its floor (the largest integer less than or equal to the floating
point number, i.e., rounding towards negative infinity).</t>
        <figure anchor="exa-plus">
          <name>An Example of Addition to a Base Value</name>
          <sourcecode type="cddl"><![CDATA[
interval<BASE> = (
  BASE => int             ; lower bound
  (BASE .plus 1) => int   ; upper bound
  ? (BASE .plus 2) => int ; tolerance
)

X = 0
Y = 3
rect = {
  interval<X>
  interval<Y>
}
]]></sourcecode>
        </figure>
        <t>The example in <xref target="exa-plus" format="default"/> contains the generic definition of a CDDL
group <tt>interval</tt> that gives a lower and upper bound and, optionally,
a tolerance.
The parameter BASE allows the non-conflicting use of a multiple of these
interval groups in one map by assigning different labels to the
entries of the interval.
The rule <tt>rect</tt> combines two of these interval groups into a map, one group for
the X dimension (using 0, 1, and 2 as labels) and one for the Y dimension
(using 3, 4, and 5 as labels).</t>
      </section>
      <section anchor="string-concatenation" numbered="true" toc="default">
        <name>String Concatenation</name>
        <t>It is often useful to be able to compose string literals out of
component literals defined in different places in the specification.</t>
        <t>The <tt>.cat</tt> control identifies a string that is built from a
concatenation of the target and the controller.
The target and controller both <bcp14>MUST</bcp14> be strings.
The result of the operation has the same type as the target.                                    
The concatenation is performed on the bytes in both strings.
If the target is a text string, the result of that concatenation <bcp14>MUST</bcp14>
be valid UTF-8.</t>
        <figure anchor="exa-cat">
          <name>An Example of Concatenation of Text and Byte Strings</name>
          <sourcecode type="cddl"><![CDATA[
c = "foo" .cat '
  bar
  baz
'
; on a system where the newline is \n, is the same string as:
b = "foo\n  bar\n  baz\n"
]]></sourcecode>
        </figure>
        <t>The example in <xref target="exa-cat" format="default"/>
builds a text string named <tt>c</tt> from concatenating the target text string <tt>"foo"</tt>
and the controller byte string entered in a text form byte string literal.
(This particular idiom is useful when the text string contains
newlines, which, as shown in the example for <tt>b</tt>, may be harder to
read when entered in the format that the pure CDDL text string
notation inherits from JSON.)</t>
      </section>
      <section anchor="string-concatenation-with-dedenting" numbered="true" toc="default">
        <name>String Concatenation with Dedenting</name>
        <t>Multi-line string literals for various applications, including
embedded ABNF (<xref target="embedded-abnf" format="default"/>), need to be set flush left, at least
partially.
Often, having some indentation in the source code for the literal can
promote readability, as in <xref target="exa-det" format="default"/>.</t>
        <figure anchor="exa-det">
          <name>An Example of Dedenting Concatenation</name>
          <sourcecode type="cddl"><![CDATA[
oid = bytes .abnfb ("oid" .det cbor-tags-oid)
roid = bytes .abnfb ("roid" .det cbor-tags-oid)

cbor-tags-oid = '
  oid = 1*arc
  roid = *arc
  arc = [nlsb] %x00-7f
  nlsb = %x81-ff *%x80-ff
'
]]></sourcecode>
        </figure>
        <t>The control operator <tt>.det</tt> works like <tt>.cat</tt>, except that both
arguments (target and controller) are independently <em>dedented</em> before
the concatenation takes place.</t>
        <t>For the first rule in <xref target="exa-det" format="default"/>, the result is
equivalent to <xref target="exa-det-result" format="default"/>.</t>
        <figure anchor="exa-det-result">
          <name>Dedenting Example: Result of First .det</name>
          <sourcecode type="cddl"><![CDATA[
oid = bytes .abnfb 'oid
oid = 1*arc
roid = *arc
arc = [nlsb] %x00-7f
nlsb = %x81-ff *%x80-ff
'
]]></sourcecode>
        </figure>
        <t>For the purposes of this specification, we define "dedenting" as:</t>
        <ol spacing="normal" type="1"><li>determining the smallest amount of leftmost blank space (number of
leading space characters) present in all the non-blank lines, and</li>
          <li>removing exactly that number of leading space characters from each
line.  For blank (blank space only or empty) lines, there may be
fewer (or no) leading space characters than this amount, in which
case all leading space is removed.</li>
        </ol>
        <t>(The name <tt>.det</tt> is a shortcut for "dedenting cat".
The maybe more obvious name <tt>.dedcat</tt> has not been chosen
as it is longer and may invoke unpleasant images.)</t>
        <t>Occasionally, dedenting of only a single item is needed.
This can be achieved by using this operator with an empty string,
e.g., <tt>"" .det rhs</tt> or <tt>lhs .det ""</tt>, which can in turn be combined
with a <tt>.cat</tt>: in the construct <tt>lhs .cat ("" .det rhs)</tt>, only <tt>rhs</tt>
is dedented.</t>
      </section>
    </section>
    <section anchor="embedded-abnf" numbered="true" toc="default">
      <name>Embedded ABNF</name>
      <t>Many IETF protocols define allowable values for their text strings in
ABNF <xref target="RFC5234" format="default"/> <xref target="RFC7405" format="default"/>.
It is often desirable to define a text string type in CDDL by
employing existing ABNF embedded into the CDDL specification.
Without specific ABNF support in CDDL, that ABNF would usually need to
be translated into a regular expression (if that is even possible).</t>
      <t>ABNF is added to CDDL in the same way that regular
expressions were added: by defining a <tt>.abnf</tt> control operator.
The target is usually <tt>text</tt> or some restriction on it, and the controller
is the text of an ABNF specification.</t>
      <t>There are several small issues; the solutions are given here:</t>
      <ul spacing="normal">
        <li>ABNF can be used to define byte sequences as well as UTF-8 text
strings interpreted as Unicode scalar sequences.  This means this
specification defines two control operators: <tt>.abnfb</tt> for ABNF
denoting byte sequences and <tt>.abnf</tt> for denoting sequences of
Unicode scalar values (code points) represented as UTF-8 text strings.

Both control operators can be applied to targets of either string
type; the ABNF is applied to the sequence of bytes in the string
and interprets it as a sequence of bytes (<tt>.abnfb</tt>) or as a sequence
of code points represented as an UTF-8 text string (<tt>.abnf</tt>).
	The controller string <bcp14>MUST</bcp14> be a string. When a byte string, it <bcp14>MUST</bcp14> be valid UTF-8 and is interpreted as the text string that has the same sequence of bytes.</li>
        <li>ABNF defines a list of rules, not a single expression (called
"elements" in <xref target="RFC5234" format="default"/>).  This is resolved by requiring the
controller string to be one valid "element", followed by zero or
more valid "rules" separated from the element by a newline; thus, the
controller string can be built by preceding a piece
of valid ABNF by an "element" that selects from that ABNF and a newline.</li>
        <li>For the same reason, ABNF requires newlines; specifying newlines in
CDDL text strings is tedious (and leads to essentially unreadable
ABNF).  The workaround employs the <tt>.cat</tt> operator introduced in
<xref target="string-concatenation" format="default"/> and the syntax for text in byte strings.
As is customary for ABNF, the syntax of ABNF itself (<em>not</em> the syntax
expressed in ABNF!) is relaxed to allow a single line feed as a
newline:</li>
      </ul>
      <sourcecode type="abnf"><![CDATA[
   CRLF = %x0A / %x0D.0A
]]></sourcecode>
      <ul spacing="normal">
        <li>One set of rules provided in an ABNF specification is often used in
multiple positions, particularly staples such as DIGIT and ALPHA.
(Note that all rules referenced need to be defined in each ABNF
operator controller string --
there is no implicit import of core ABNF rules from <xref target="RFC5234" format="default"/> or other rules.)
The composition this calls for can be provided by the <tt>.cat</tt>
operator and/or by <tt>.det</tt> if there is indentation to be disposed of.</li>
      </ul>
      <t>These points are combined into an example in <xref target="exa-abnf" format="default"/>, which uses
ABNF from <xref target="RFC3339" format="default"/> to specify one of each of the Concise Binary Object Representation (CBOR) tags defined in
      <xref target="RFC8943" format="default"/> and <xref target="RFC8949" format="default"/>.</t>
      <figure anchor="exa-abnf">
        <name>An Example of Employing ABNF from RFC 3339 for Defining CBOR Tags</name>
<sourcecode type="cddl"><![CDATA[
; for RFC 8943
Tag1004 = #6.1004(text .abnf full-date)
; for RFC 8949
Tag0 = #6.0(text .abnf date-time)

full-date = "full-date" .cat rfc3339
date-time = "date-time" .cat rfc3339

; Note the trick of idiomatically starting with a newline, separating
;   off the element in the concatenations above from the rule-list
rfc3339 = '
   date-fullyear   = 4DIGIT
   date-month      = 2DIGIT  ; 01-12
   date-mday       = 2DIGIT  ; 01-28, 01-29, 01-30, 01-31 based on
                             ; month/year
   time-hour       = 2DIGIT  ; 00-23
   time-minute     = 2DIGIT  ; 00-59
   time-second     = 2DIGIT  ; 00-58, 00-59, 00-60 based on leap sec
                             ; rules
   time-secfrac    = "." 1*DIGIT
   time-numoffset  = ("+" / "-") time-hour ":" time-minute
   time-offset     = "Z" / time-numoffset

   partial-time    = time-hour ":" time-minute ":" time-second
                     [time-secfrac]
   full-date       = date-fullyear "-" date-month "-" date-mday
   full-time       = partial-time time-offset

   date-time       = full-date "T" full-time
' .det rfc5234-core

rfc5234-core = '
   DIGIT          =  %x30-39 ; 0-9
   ; abbreviated here
'
]]></sourcecode>
      </figure>
    </section>
    <section anchor="features" numbered="true" toc="default">
      <name>Features</name>
      <t>Commonly, the kind of validation enabled by languages such as
CDDL provides a Boolean result: valid or invalid.</t>
      <t>In rapidly evolving environments, this is too simplistic.  The data
models described by a CDDL specification may continually be enhanced
by additional features, and it would be useful even for a
specification that does not yet describe a specific future feature to
identify the extension point the feature can use and accept such
extensions while marking them as extensions.</t>
      <t>The <tt>.feature</tt> control annotates the target as making use of the
feature named by the controller.  The latter will usually be a string.
A tool that validates an instance against that specification may mark
the instance as using a feature that is annotated by the
specification.</t>
      <t>More specifically, the tool's diagnostic output might contain
the controller (right-hand side) as a feature name and the target
(left-hand side) as a feature detail.  However, in some cases, the target has
too much detail, and the specification might want to hint to the tool
that more limited detail is appropriate.  In this case, the controller
should be an array, with the first element being the feature name
(that would otherwise be the entire controller) and the second
element being the detail (usually another string), as illustrated in
<xref target="exa-feat-array" format="default"/>.</t>
      <figure anchor="exa-feat-array">
        <name>Providing Explicit Detail with .feature</name>
        <sourcecode type="cddl"><![CDATA[
foo = {
  kind: bar / baz .feature (["foo-extensions", "bazify"])
}
bar = ...
baz = ... ; complex stuff that doesn't all need to be in the detail
]]></sourcecode>
      </figure>
      <t><xref target="exa-feat-map" format="default"/> shows what could be the definition of a person, with
potential extensions beyond <tt>name</tt> and <tt>organization</tt> being marked
<tt>further-person-extension</tt>.
Extensions that are known at the time this definition is written can be
collected into <tt>$$person-extensions</tt>.  However, future extensions
would be deemed invalid unless the wildcard at the end of the map is
added.
These extensions could then be specifically examined by a user or a
tool that makes use of the validation result; the label (map key)
actually used makes a fine feature detail for the tool's diagnostic
output.</t>
      <t>Leaving out the entire extension point would mean that instances that
make use of an extension would be marked as wholesale invalid, making
the entire validation approach much less useful.
Leaving the extension point in but not marking its use as special
would render mistakes (such as using the label "<tt>organisation</tt>" instead of
"<tt>organization</tt>") invisible.</t>
      <figure anchor="exa-feat-map">
        <name>Map Extensibility with .feature</name>
        <sourcecode type="cddl"><![CDATA[
person = {
  ? name: text
  ? organization: text
  $$person-extensions
  * (text .feature "further-person-extension") => any
}

$$person-extensions //= (? bloodgroup: text)
]]></sourcecode>
      </figure>
      <t><xref target="exa-feat-type" format="default"/> shows another example where <tt>.feature</tt> provides for
type extensibility.</t>
      <figure anchor="exa-feat-type">
        <name>Type Extensibility with .feature</name>
        <sourcecode type="cddl"><![CDATA[
allowed-types = number / text / bool / null
              / [* number] / [* text] / [* bool]
              / (any .feature "allowed-type-extension")
]]></sourcecode>
      </figure>
      <t>A CDDL tool may simply report the set of features being used; the
control then only provides information to the process requesting the
validation.
One could also imagine a tool that takes arguments, allowing the tool to accept
certain features and reject others (enable/disable).  The latter approach
could, for instance, be used for a JSON/CBOR switch, as illustrated in
<xref target="exa-feat-variants" format="default"/>, using Sensor Measurement Lists (SenML) <xref target="RFC8428" format="default"/> as the example data model
used with both JSON and CBOR.</t>
      <figure anchor="exa-feat-variants">
        <name>Describing Variants with .feature</name>
        <sourcecode type="cddl"><![CDATA[
SenML-Record = {
; ...
  ? v => number
; ...
}
v = JC<"v", 2>
JC<J,C> = J .feature "json" / C .feature "cbor"
]]></sourcecode>
      </figure>
      <t>It remains to be seen if the enable/disable approach can lead to new
idioms of using CDDL.  The language currently has no way to enforce
mutually exclusive use of features, as would be needed in this example.</t>
    </section>
    <section anchor="iana-considerations" numbered="true" toc="default">
      <name>IANA Considerations</name>
<t>IANA has registered the contents of
<xref target="tbl-iana-reqs" format="default"/> into the "CDDL Control Operators" registry of <xref target="IANA.cddl" format="default"/>:</t>
      <table anchor="tbl-iana-reqs" align="center">
        <name>New Control Operators</name>
        <thead>
          <tr>
            <th align="left">Name</th>
            <th align="left">Reference</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td align="left">.plus</td>
            <td align="left">RFC 9165</td>
          </tr>
          <tr>
            <td align="left">.cat</td>
            <td align="left">RFC 9165</td>
          </tr>
          <tr>
            <td align="left">.det</td>
            <td align="left">RFC 9165</td>
          </tr>
          <tr>
            <td align="left">.abnf</td>
            <td align="left">RFC 9165</td>
          </tr>
          <tr>
            <td align="left">.abnfb</td>
            <td align="left">RFC 9165</td>
          </tr>
          <tr>
            <td align="left">.feature</td>
            <td align="left">RFC 9165</td>
          </tr>
        </tbody>
      </table>
    </section>
    <section anchor="security-considerations" numbered="true" toc="default">
      <name>Security Considerations</name>
      <t>The security considerations of <xref target="RFC8610" format="default"/> apply.</t>
      <t>While both <xref target="RFC5234" format="default"/> and <xref target="RFC7405" format="default"/> state that security is truly
believed to be irrelevant to the respective document, the use of
formal description techniques cannot only simplify but sometimes also
complicate a specification.
This can lead to security problems in implementations and in the
specification itself.
As with CDDL itself, ABNF should be judiciously applied, and overly
complex (or "cute") constructions should be avoided.</t>
    </section>
  </middle>
  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>

<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8610.xml"/>

        <reference anchor="IANA.cddl" target="https://www.iana.org/assignments/cddl">
          <front>
            <title>Concise Data Definition Language (CDDL)</title>
            <author>
              <organization>IANA</organization>
            </author>
            <date/>
          </front>
        </reference>

<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5234.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7405.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>

      </references>
      <references>
        <name>Informative References</name>

<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8428.xml"/>
	
<!--
        <reference anchor="CDDL-RS" target="https://github.com/anweiss/cddl">
          <front>
            <title>cddl-rs</title>
            <author initials="A." surname="Weiss" fullname="Andrew Weiss">
              <organization/>
            </author>
            <date/>
          </front>
        </reference>
-->
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3339.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8943.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8949.xml"/>

      </references>
    </references>
    <section numbered="false" anchor="acknowledgements" toc="default">
      <name>Acknowledgements</name>
      <t><contact fullname="Jim Schaad"/> suggested several improvements.
The <tt>.feature</tt> feature was developed out of a discussion with <contact fullname="Henk Birkholz"/>.
<contact fullname="Paul Kyzivat"/> helped isolate the need for <tt>.det</tt>.</t>
      <t>.det is an abbreviation for "dedenting cat", but Det is also the name
of a German TV cartoon character created in the 1960s.</t>
</section>
  </back>
</rfc>
