<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent">

<rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="info"
     docName="draft-google-self-published-geofeeds-09" ipr="trust200902"
     obsoletes="" updates="" submissionType="independent" xml:lang="en"
     tocInclude="true" tocDepth="4" sortRefs="true" symRefs="true"
     version="3" number="8805"> 
  <!-- xml2rfc v2v3 conversion 2.41.0 -->
  <?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>

  <front>
    <title abbrev="Self-Published IP Geofeeds">A Format for Self-Published IP
    Geolocation Feeds</title>
    <seriesInfo name="RFC" value="8805"/>
    <author fullname="Erik Kline" initials="E." surname="Kline">
      <organization>Loon LLC</organization>
      <address>
        <postal>
          <street>1600 Amphitheatre Parkway</street>
          <city>Mountain View</city>
          <region>CA</region>
          <code>94043</code>
          <country>United States of America</country>
        </postal>
        <email>ek@loon.com</email>
      </address>
    </author>
    <author fullname="Krzysztof Duleba" initials="K." surname="Duleba">
      <organization>Google</organization>
      <address>
        <postal>
          <street>1600 Amphitheatre Parkway</street>
          <city>Mountain View</city>
	  <region>CA</region>
          <code>94043</code>
          <country>United States of America</country>
        </postal>
        <email>kduleba@google.com</email>
      </address>
    </author>
    <author fullname="Zoltan Szamonek" initials="Z." surname="Szamonek">
      <organization>Google Switzerland GmbH</organization>
      <address>
        <postal>
          <street>Brandschenkestrasse 110</street>
          <code>8002</code>
          <city>Zürich</city>
          <country>Switzerland</country>
        </postal>
        <email>zszami@google.com</email>
      </address>
    </author>
    <author fullname="Stefan Moser" initials="S." surname="Moser">
      <organization>Google Switzerland GmbH</organization>
      <address>
        <postal>
          <street>Brandschenkestrasse 110</street>
          <code>8002</code>
          <city>Zürich</city>
          <country>Switzerland</country>
        </postal>
        <email>smoser@google.com</email>
      </address>
    </author>
    <author fullname="Warren Kumari" initials="W." surname="Kumari">
      <organization>Google</organization>
      <address>
        <postal>
          <street>1600 Amphitheatre Parkway</street>
          <city>Mountain View</city>
	  <region>CA</region>
          <code>94043</code>
          <country>United States of America</country>
        </postal>
        <email>warren@kumari.net</email>
      </address>
    </author>
    <date month="August" year="2020"/>

    <abstract>
      <t>This document records a format whereby a network operator can publish
      a mapping of IP address prefixes to simplified geolocation information,
      colloquially termed a "geolocation feed".  Interested parties can poll
      and parse these feeds to update or merge with other geolocation data
      sources and procedures.  This format intentionally only allows specifying
      coarse-level location.</t> 
      <t>Some technical organizations operating networks that move from one
      conference location to the next have already experimentally published
      small geolocation feeds.</t>
      <t>This document describes a currently deployed format. At
      least one consumer (Google) has incorporated these feeds into a
      geolocation data pipeline, and a significant number of ISPs are
      using it to inform consumers where their prefixes should be
      geolocated.</t>
    </abstract>
  </front>
  <middle>
    <section numbered="true" toc="default">
      <name>Introduction</name>
      <section numbered="true" toc="default">
        <name>Motivation</name>
        <t>Providers of services over the Internet have grown to depend on
        best-effort geolocation information to improve the user experience.
        Locality information can aid in directing traffic to the nearest
        serving location, inferring likely native language, and providing
        additional context for services involving search queries.</t>
        <t>When an ISP, for example, changes the location where an IP prefix
        is deployed, services that make use of geolocation information may
        begin to suffer degraded performance. This can lead to customer
        complaints, possibly to the ISP directly. Dissemination of correct
        geolocation data is complicated by the lack of any centralized means
        to coordinate and communicate geolocation information to all
        interested consumers of the data.</t>
        <t>This document records a format whereby a network operator (an ISP,
        an enterprise, or any organization that deems the geolocation of its
        IP prefixes to be of concern) can publish a mapping of IP address
        prefixes to simplified geolocation information, colloquially termed a
        "geolocation feed". Interested parties can poll and parse these feeds
        to update or merge with other geolocation data sources and
        procedures.</t>
        <t>This document describes a currently deployed format. At
        least one consumer (Google) has incorporated these feeds into a
        geolocation data pipeline, and a significant number of ISPs are
        using it to inform consumers where their prefixes should be
        geolocated.</t>
      </section>
      <section numbered="true" toc="default">
        <name>Requirements Notation</name>
        <t>
    The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>",
    "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL 
    NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>",
    "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>", 
    "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are
    to be interpreted as 
    described in BCP&nbsp;14 <xref target="RFC2119"/> <xref target="RFC8174"/> 
    when, and only when, they appear in all capitals, as shown here.
        </t>
        <t>As this is an informational document about a data format and set of
        operational practices presently in use, requirements notation captures
        the design goals of the authors and implementors.</t>
      </section>
      <section numbered="true" toc="default">
        <name>Assumptions about Publication</name>
        <t>This document describes both a format and a mechanism for
        publishing data, with the assumption that the network operator to whom
        operational responsibility has been delegated for any published data
        wishes it to be public. Any privacy risk is bounded by the format, and
        feed publishers <bcp14>MAY</bcp14> omit prefixes or any location field associated with
        a given prefix to further protect privacy (see <xref target="spec" format="default"/>
        for details about which fields exactly may be omitted). Feed publishers
        assume the responsibility of determining which data should be made
        public.</t>
        <t>This document does not incorporate a mechanism to communicate
        acceptable use policies for self-published data. Publication itself is
        inferred as a desire by the publisher for the data to be usefully
        consumed, similar to the publication of information like host names,
        cryptographic keys, and Sender Policy Framework (SPF) records <xref
	target="RFC7208" format="default"/> in the DNS.</t>
      </section>
    </section>
    <section numbered="true" toc="default">
      <name>Self-Published IP Geolocation Feeds</name>
      <t>The format described here was developed to address the need of
      network operators to rapidly and usefully share geolocation information
      changes. Originally, there arose a specific case where regional
      operators found it desirable to publish location changes rather than
      wait for geolocation algorithms to "learn" about them. Later, technical
      conferences that frequently use the same network prefixes advertised
      from different conference locations experimented by publishing
      geolocation feeds updated in advance of network location changes in
      order to better serve conference attendees.</t>
      <t>At its simplest, the mechanism consists of a network operator
      publishing a file (the "geolocation feed") that contains several text
      entries, one per line. Each entry is keyed by a unique (within the feed)
      IP prefix (or single IP address) followed by a sequence of network
      locality attributes to be ascribed to the given prefix.</t>
      <section anchor="spec" numbered="true" toc="default">
        <name>Specification</name>
        <t>For operational simplicity, every feed should contain data about
        all IP addresses the provider wants to publish. Alternatives, like
        publishing only entries for IP addresses whose geolocation data has
        changed or differs from current observed geolocation behavior "at
        large", are likely to be too operationally complex.</t>
        <t>Feeds <bcp14>MUST</bcp14> use UTF-8 <xref target="RFC3629"
	format="default"/> character encoding. 
        Lines are delimited by a line break (CRLF) (as specified in
        <xref target="RFC4180" format="default"/>), and blank lines are ignored.
        Text from a '#' character to the end of the current line is treated
        as a comment only and is similarly ignored (note that this does not
        strictly follow <xref target="RFC4180" format="default"/>, which has no
	support for comments).</t>
        <t>Feed lines that are not comments <bcp14>MUST</bcp14> be formatted
	as comma-separated values (CSV), as described in <xref
	target="RFC4180" format="default"/>. Each feed entry is a text line of
	the form:</t>  
<artwork name="" type="" align="left" alt=""><![CDATA[
ip_prefix,alpha2code,region,city,postal_code
]]></artwork>
        <t>The IP prefix field is <bcp14>REQUIRED</bcp14>; all others are
	<bcp14>OPTIONAL</bcp14> (can be empty), though the requisite minimum
	number of commas <bcp14>SHOULD</bcp14> be present.</t>
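        <t>As a non-normative illustration, the line handling described above
        can be sketched in Python roughly as follows. The function name
        parse_feed_line is hypothetical, and the choices to pad missing
        trailing fields and to trim surrounding whitespace are assumptions of
        this sketch rather than requirements of the format.</t>
<sourcecode type="python"><![CDATA[
import csv

FIELD_COUNT = 5  # ip_prefix, alpha2code, region, city, postal_code


def parse_feed_line(line):
    """Return the five fields of one feed line, or None if it has none."""
    # Text from '#' to the end of the line is a comment.
    content = line.split("#", 1)[0].strip()
    if not content:
        return None  # blank line or comment-only line
    fields = next(csv.reader([content]))
    # Assumption: pad missing trailing fields, drop any extra ones.
    fields = (fields + [""] * FIELD_COUNT)[:FIELD_COUNT]
    return [field.strip() for field in fields]
]]></sourcecode>
        <t>With this sketch, a line such as "192.0.2.0/24,US,US-AL,," yields
        five fields, the last two of which are empty, and a comment-only line
        yields no entry at all.</t>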
        <section numbered="true" toc="default">
          <name>Geolocation Feed Individual Entry Fields</name>
          <section numbered="true" toc="default">
            <name>IP Prefix</name>
            <t><bcp14>REQUIRED</bcp14>: Each IP prefix field
	    <bcp14>MUST</bcp14> be either a single IP address or an IP prefix
	    in Classless Inter-Domain Routing (CIDR) notation in conformance
	    with <xref target="RFC4632" sectionFormat="of" section="3.1"/> for
	    IPv4 or <xref target="RFC4291" sectionFormat="of" section="2.3"/>
	    for IPv6.</t> 
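            <t>A minimal, non-normative sketch of parsing this field with
            the Python standard library "ipaddress" module is shown below.
            The function name parse_ip_prefix is hypothetical, and rejecting
            prefixes that have host bits set is an assumption of this sketch,
            not something this document specifies.</t>
<sourcecode type="python"><![CDATA[
import ipaddress


def parse_ip_prefix(text):
    """Return a network object for a prefix or address, else None."""
    try:
        # A bare address becomes a /32 (IPv4) or /128 (IPv6) network.
        # Assumption: strict=True rejects input such as 192.0.2.1/24.
        return ipaddress.ip_network(text.strip(), strict=True)
    except ValueError:
        return None  # the entry would be discarded
]]></sourcecode>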
            <t>Examples include "192.0.2.1" and "192.0.2.0/24" for IPv4 and
            "2001:db8::1" and "2001:db8::/32" for IPv6.</t>
          </section>
          <section numbered="true" toc="default">
            <name>Alpha2code (Previously: 'country')</name>
            <t><bcp14>OPTIONAL</bcp14>: The alpha2code field, if non-empty,
	    <bcp14>MUST</bcp14> be a 2-letter 
            ISO country code conforming to ISO 3166-1 alpha 2 <xref
	    target="ISO.3166.1alpha2" format="default"/>. Parsers
	    <bcp14>SHOULD</bcp14> treat this field 
            case-insensitively.</t>
            <t>Earlier versions of this document called this field "country",
	    and it may still be referred to as such in existing
	    tools/interfaces.</t>  
            <t>Parsers <bcp14>MAY</bcp14> additionally support other 2-letter
	    codes outside the ISO 3166-1 alpha 2 codes, such as the 2-letter
	    codes from the "Exceptionally reserved codes" <xref
	    target="ISO-GLOSSARY" format="default"/> set.</t>  
            <t>Examples include "US" for the United States, "JP" for Japan,
	    and "PL" for Poland.</t>
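            <t>The case-insensitive handling of this field might be sketched
            as follows. The function name normalize_alpha2code is
            hypothetical. The sketch checks only that the value is two ASCII
            letters; it does not check that the code is actually assigned in
            ISO 3166-1, which would require a code table.</t>
<sourcecode type="python"><![CDATA[
import re

_ALPHA2 = re.compile(r"[A-Za-z]{2}")


def normalize_alpha2code(value):
    """Return the upper-cased code, "" if empty, or None if malformed."""
    value = value.strip()
    if not value:
        return ""  # the field is optional
    if _ALPHA2.fullmatch(value):
        return value.upper()
    return None
]]></sourcecode>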

	    
          </section>
          <section numbered="true" toc="default">
            <name>Region</name>
            <t><bcp14>OPTIONAL</bcp14>: The region field, if non-empty,
	    <bcp14>MUST</bcp14> be an ISO region code conforming to ISO 3166-2
	    <xref target="ISO.3166.2" format="default"/>. Parsers
	    <bcp14>SHOULD</bcp14> treat this field case-insensitively.</t> 
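            <t>A shape check for this field could look like the following
            non-normative sketch. The function name check_region is
            hypothetical. Requiring the region to begin with the entry's
            alpha2code is an assumption of this sketch and is not required by
            this document; the sketch also does not check that the code is
            actually assigned in ISO 3166-2.</t>
<sourcecode type="python"><![CDATA[
import re

# Two letters, a hyphen, then one to three letters or digits.
_REGION = re.compile(r"([A-Z]{2})-[A-Z0-9]{1,3}")


def check_region(region, alpha2code=""):
    """Return the upper-cased region, "" if empty, or None if rejected."""
    region = region.strip().upper()
    if not region:
        return ""  # the field is optional
    match = _REGION.fullmatch(region)
    if match is None:
        return None
    # Assumption: a region should belong to the stated country, if any.
    country = alpha2code.strip().upper()
    if country and match.group(1) != country:
        return None
    return region
]]></sourcecode>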
            <t>Examples include "ID-RI" for the Riau province of Indonesia and
            "NG-RI" for Rivers State in Nigeria.</t>
          </section>
          <section numbered="true" toc="default">
            <name>City</name>
            <t><bcp14>OPTIONAL</bcp14>: The city field, if non-empty,
	    <bcp14>SHOULD</bcp14> be free UTF-8 
            text, excluding the comma (',') character.</t>
            <t>Examples include "Dublin", "New York", and "Sao Paulo"
            (specifically "S" followed by 0xc3, 0xa3, and "o Paulo").</t>
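            <t>The byte sequence given above can be reproduced with a short
            Python snippet. The helper name valid_city is hypothetical and
            checks only the comma restriction stated in this section.</t>
<sourcecode type="python"><![CDATA[
# U+00E3 is "a" with a tilde; in UTF-8 it is the bytes 0xc3 0xa3.
city = "S\u00e3o Paulo"
encoded = city.encode("utf-8")
print(encoded)  # b'S\xc3\xa3o Paulo'


def valid_city(value):
    """A city value is free text that contains no comma."""
    return "," not in value
]]></sourcecode>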
          </section>
          <section anchor="postal" numbered="true" toc="default">
            <name>Postal Code</name>
            <t><bcp14>OPTIONAL</bcp14>, DEPRECATED: The postal code field, if
	    non-empty, <bcp14>SHOULD</bcp14> be free UTF-8 text, excluding the
	    comma (',') character. The use of this field is deprecated;
	    consumers of feeds should be able to parse feeds containing this
	    field, but new feeds <bcp14>SHOULD NOT</bcp14> include it because
	    of the fine granularity of this information. See <xref
	    target="privacy" format="default"/> for additional discussion.</t>
            <t>Examples include "106-6126" (in Minato ward, Tokyo, Japan).</t>
          </section>
        </section>
        <section numbered="true" toc="default">
          <name>Prefixes with No Geolocation Information</name>
          <t>Feed publishers may indicate that some IP prefixes should not
          have any associated geolocation information. It may be that some
          prefixes under their administrative control are reserved, not yet
          allocated or deployed, or in the process of being redeployed
          elsewhere and existing geolocation information can, from the
          perspective of the publisher, safely be discarded.</t>
          <t>This special case can be indicated by explicitly leaving blank
          all fields that specify any degree of geolocation information. For
          example: </t>
<artwork name="" type="" align="left" alt=""><![CDATA[
192.0.2.0/24,,,,
2001:db8:1::/48,,,,
2001:db8:2::/48,,,,
]]></artwork>
          <t>Historically, the user-assigned alpha2code identifier of "ZZ" has
	  been used for this same purpose. This is not necessarily preferred,
	  and no specific interpretation of any of the other user-assigned
	  alpha2code codes is currently defined.</t> 
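          <t>A consumer might detect this special case along the lines of the
          following non-normative sketch. The function name
          has_no_geolocation is hypothetical, and treating "ZZ" as equivalent
          only when every other location field is empty is an assumption of
          this sketch.</t>
<sourcecode type="python"><![CDATA[
def has_no_geolocation(alpha2code, region, city, postal_code):
    """Return True if an entry carries no location information."""
    fields = [alpha2code, region, city, postal_code]
    if all(not field.strip() for field in fields):
        return True
    # Historical convention: "ZZ" with nothing else filled in.
    is_zz = alpha2code.strip().upper() == "ZZ"
    return is_zz and not any(field.strip() for field in fields[1:])
]]></sourcecode>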
        </section>
        <section numbered="true" toc="default">
          <name>Additional Parsing Requirements</name>
          <t>Feed entries that do not have an IP address or prefix field or have an
	  IP address or prefix field that fails to parse correctly
	  <bcp14>MUST</bcp14> be discarded.</t> 
          <t>While publishers <bcp14>SHOULD</bcp14> follow <xref
	  target="RFC5952" format="default"/> for IPv6 prefix fields,
	  consumers <bcp14>MUST</bcp14> nevertheless accept all valid string
	  representations.</t> 
          <t>Duplicate IP address or prefix entries <bcp14>MUST</bcp14> be
	  considered an error, and consumer implementations
	  <bcp14>SHOULD</bcp14> log the repeated entries for further
	  administrative review. Publishers <bcp14>SHOULD</bcp14> take
	  measures to ensure there is one and only one entry per IP address
	  and prefix.</t> 
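          <t>Duplicate detection is easiest after prefixes have been parsed,
          because different strings can denote the same prefix (for example,
          "192.0.2.1" and "192.0.2.1/32"). The sketch below is non-normative;
          the function name index_entries is hypothetical, and keeping the
          first occurrence of a duplicated prefix is an assumption of this
          sketch, since this document only requires that duplicates be
          treated as an error.</t>
<sourcecode type="python"><![CDATA[
import ipaddress


def index_entries(rows):
    """Map each parsed prefix to its location; collect duplicates.

    rows is an iterable of (prefix_text, location) pairs whose
    prefix_text values are assumed to have parsed successfully.
    """
    table, duplicates = {}, []
    for prefix_text, location in rows:
        network = ipaddress.ip_network(prefix_text)
        if network in table:
            duplicates.append(prefix_text)  # to be logged for review
            continue
        table[network] = location
    return table, duplicates
]]></sourcecode>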
          <t>Multiple entries that constitute nested prefixes are
	  permitted. Consumers <bcp14>SHOULD</bcp14> consider the entry with
	  the longest matching prefix (i.e., the "most specific") to be the
	  best matching entry for a given IP address.</t> 
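          <t>Longest-prefix selection can be sketched as below. This is a
          non-normative linear scan intended only to show the rule; the
          function name best_match and the table layout (a mapping from
          parsed prefixes to location data) are hypothetical, and a
          production consumer would more likely use a radix tree.</t>
<sourcecode type="python"><![CDATA[
import ipaddress


def best_match(address_text, table):
    """Return the (network, location) pair with the longest prefix."""
    address = ipaddress.ip_address(address_text)
    matches = [
        (network, location)
        for network, location in table.items()
        if network.version == address.version and address in network
    ]
    if not matches:
        return None
    # The most specific entry has the greatest prefix length.
    return max(matches, key=lambda item: item[0].prefixlen)
]]></sourcecode>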
          <t>Feed entries with non-empty optional fields that fail to parse,
          either in part or in full, <bcp14>SHOULD</bcp14> be discarded. It is
	  <bcp14>RECOMMENDED</bcp14> that they also be logged for further
	  administrative review.</t> 
          <t>For compatibility with future additional fields, a parser
	  <bcp14>MUST</bcp14> ignore any fields beyond those it expects. The
	  data from fields that are expected and that parse successfully
	  <bcp14>MUST</bcp14> still be considered valid. Per <xref
	  target="future_work" format="default"/>, no extensions to this format
	  are in use nor are any anticipated.</t> 
        </section>
      </section>
      <section numbered="true" toc="default">
        <name>Examples</name>
        <t>Example entries using different IP address formats and describing
	locations at alpha2code ("country code"), region, and city granularity
	level, respectively: </t> 
<artwork name="" type="" align="left" alt=""><![CDATA[
192.0.2.0/25,US,US-AL,,
192.0.2.5,US,US-AL,Alabaster,
192.0.2.128/25,PL,PL-MZ,,
2001:db8::/32,PL,,,
2001:db8:cafe::/48,PL,PL-MZ,,
]]></artwork>
        <t>The IETF network publishes geolocation information for the meeting
        prefixes, and generally just comments out the last meeting's
        information and appends the new meeting information. The <xref
        target="GEO_IETF" format="default"/>, at the time of this writing,
        contains:</t>
<artwork name="" type="" align="left" alt=""><![CDATA[
# IETF106 (Singapore) - November 2019 - Singapore, SG
130.129.0.0/16,SG,SG-01,Singapore,
2001:df8::/32,SG,SG-01,Singapore,
31.133.128.0/18,SG,SG-01,Singapore,
31.130.224.0/20,SG,SG-01,Singapore,
2001:67c:1230::/46,SG,SG-01,Singapore,
2001:67c:370::/48,SG,SG-01,Singapore,
]]></artwork>
        <t>Experimentally, RIPE has published geolocation information for
        their conference network prefixes, which change location in accordance
        with each new event. <xref target="GEO_RIPE_NCC" format="default"/>, at the time of
        writing, contains:</t>
<artwork name="" type="" align="left" alt=""><![CDATA[
193.0.24.0/21,NL,NL-ZH,Rotterdam,
2001:67c:64::/48,NL,NL-ZH,Rotterdam,
]]></artwork>
        <t>Similarly, ICANN has published geolocation information for their
        portable conference network prefixes. <xref target="GEO_ICANN" format="default"/>, at
        the time of writing, contains: </t>
<artwork name="" type="" align="left" alt=""><![CDATA[
199.91.192.0/21,MA,MA-07,Marrakech
2620:f:8000::/48,MA,MA-07,Marrakech
]]></artwork>
        <t>A longer example is the <xref target="GEO_Google" format="default"/> Google Corp
        Geofeed, which lists the geolocation information for Google corporate
        offices.</t>
        <t>At the time of writing, Google processes approximately 400 feeds
        comprising more than 750,000 IPv4 and IPv6 prefixes.</t>
      </section>
    </section>
    <section anchor="consumers" numbered="true" toc="default">
      <name>Consuming Self-Published IP Geolocation Feeds</name>
      <t>Consumers <bcp14>MAY</bcp14> treat published feed data as a hint only
      and <bcp14>MAY</bcp14> choose 
      to prefer other sources of geolocation information for any given IP
      prefix. Regardless of a consumer's stance with respect to a given
      published feed, there are some points of note for sensibly and
      effectively consuming published feeds.</t>
      <section anchor="integrity" numbered="true" toc="default">
        <name>Feed Integrity</name>
        <t>The integrity of published information <bcp14>SHOULD</bcp14> be
	protected by securing the means of publication, for example, by using
	HTTP over TLS <xref target="RFC2818" format="default"/>. Whenever
	possible, consumers <bcp14>SHOULD</bcp14> prefer retrieving
	geolocation feeds in a manner that guarantees integrity of the
	feed.</t> 
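        <t>One way a consumer might enforce this preference is sketched
        below, non-normatively. The function name fetch_feed and the
        30-second timeout are hypothetical choices of this sketch. It relies
        on the default behavior of the Python standard library, which
        verifies the server certificate for "https" URLs.</t>
<sourcecode type="python"><![CDATA[
import urllib.request
from urllib.parse import urlsplit


def fetch_feed(url, timeout=30):
    """Fetch a feed over HTTPS and return its text."""
    if urlsplit(url).scheme != "https":
        raise ValueError("refusing to fetch feed without TLS: " + url)
    # A stricter client might also reject redirects to non-HTTPS URLs.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
]]></sourcecode>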
      </section>
      <section anchor="authority" numbered="true" toc="default">
        <name>Verification of Authority</name>
        <t>Consumers of self-published IP geolocation feeds <bcp14>SHOULD</bcp14> perform
        some form of verification that the publisher is in fact authoritative
        for the addresses in the feed. The actual means of verification is
        likely dependent upon the way in which the feed is discovered. Ad hoc
        shared URIs, for example, will likely require an ad hoc verification
        process. Future automated means of feed discovery <bcp14>SHOULD</bcp14> have an
        accompanying automated means of verification.</t>
        <t>A consumer should only trust geolocation information for IP addresses
        or prefixes for which the publisher has been verified as
        administratively authoritative. All other geolocation feed entries
        should be ignored and logged for further administrative review.</t>
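        <t>Assuming a consumer has already established, by whatever means,
        the set of prefixes for which a publisher is authoritative, the
        filtering step might be sketched as follows. The function name
        split_by_authority and its inputs are hypothetical; how the
        authorized set is obtained is outside the scope of this sketch.</t>
<sourcecode type="python"><![CDATA[
import ipaddress


def split_by_authority(entries, authorized):
    """Partition feed prefixes into (trusted, rejected) lists.

    A feed prefix is trusted only if some authorized prefix of the
    same address family covers it entirely.
    """
    allowed = [ipaddress.ip_network(p) for p in authorized]
    trusted, rejected = [], []
    for prefix_text in entries:
        network = ipaddress.ip_network(prefix_text)
        covered = any(
            network.version == a.version and network.subnet_of(a)
            for a in allowed
        )
        (trusted if covered else rejected).append(prefix_text)
    return trusted, rejected
]]></sourcecode>
        <t>Entries in the rejected list would then be ignored and logged
        for review rather than merged into geolocation data.</t>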
      </section>
      <section anchor="accuracy" numbered="true" toc="default">
        <name>Verification of Accuracy</name>
        <t>Errors and inaccuracies may occur at many levels, and publication
        and consumption of geolocation data are no exceptions. To the extent
        practical, consumers <bcp14>SHOULD</bcp14> take steps to verify the accuracy of
        published locality. Verification methodology, resolution of
        discrepancies, and preference for alternative sources of data are left
        to the discretion of the feed consumer.</t>
        <t>Consumers <bcp14>SHOULD</bcp14> decide on discrepancy thresholds
	and <bcp14>SHOULD</bcp14> flag, for administrative review, feed entries
	that exceed set thresholds.</t> 
      </section>
      <section numbered="true" toc="default">
        <name>Refreshing Feed Information</name>
        <t>As a publisher can change geolocation data at any time and without
        notification, consumers <bcp14>SHOULD</bcp14> implement mechanisms to periodically
        refresh local copies of feed data. In the absence of any other refresh
        timing information, consumers <bcp14>SHOULD</bcp14> refresh
        feeds no less often than weekly and no more often than is likely
        to cause issues for the publisher.</t>
        <t>For feeds available via HTTPS (or HTTP), the publisher <bcp14>MAY</bcp14>
        communicate refresh timing information by means of the standard HTTP
        expiration model (<xref target="RFC7234"/>). Specifically, publishers can 
        include either an Expires header (<xref target="RFC7234"
	sectionFormat="of" section="5.3"/>) or a Cache-Control header (<xref
	target="RFC7234" sectionFormat="of" section="5.2"/>) specifying the
	max-age. Where practical, consumers <bcp14>SHOULD</bcp14> refresh feed
	information before the expiry time is reached.</t> 
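        <t>The refresh timing described above might be computed as in the
        following non-normative sketch. The function name next_refresh is
        hypothetical, and the sketch makes simplifying assumptions: header
        names are looked up with exactly the capitalization shown, max-age
        is measured from the time of the fetch, and the result is capped at
        one week.</t>
<sourcecode type="python"><![CDATA[
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

WEEK = timedelta(days=7)


def next_refresh(headers, now):
    """Return the time by which the feed should be fetched again.

    headers is a dict of response headers; now is an aware datetime.
    """
    deadline = now + WEEK
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match:
        # max-age takes precedence over Expires.
        expiry = now + timedelta(seconds=int(match.group(1)))
    elif "Expires" in headers:
        try:
            expiry = parsedate_to_datetime(headers["Expires"])
            if expiry.tzinfo is None:
                expiry = expiry.replace(tzinfo=timezone.utc)
        except (TypeError, ValueError):
            expiry = now  # an invalid date counts as already expired
    else:
        expiry = deadline
    return min(expiry, deadline)
]]></sourcecode>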
      </section>
    </section>
    <section anchor="privacy" numbered="true" toc="default">
      <name>Privacy Considerations</name>
      <t>Publishers of geolocation feeds are advised to have fully considered
      any and all privacy implications of the disclosure of such information
      for the users of the described networks prior to publication. A thorough
      comprehension of the security considerations (<xref target="RFC6772"
      sectionFormat="of" section="13"/>) of a chosen geolocation policy is
      highly recommended, including an understanding of some of the
      limitations of information obscurity (<xref target="RFC6772"
      sectionFormat="of" section="13.5"/>) (see also <xref target="RFC6772"
      format="default"/>).</t> 

      <t>As noted in <xref target="spec" format="default"/>, each location
      field in an entry is 
      optional, in order to support expressing only the level of specificity
      that the publisher has deemed acceptable. There is no requirement that
      the level of specificity be consistent across all entries within a feed.
      In particular, the Postal Code field (<xref target="postal" format="default"/>) can
      provide very specific geolocation, sometimes within a building. Such
      specific Postal Code values <bcp14>MUST NOT</bcp14> be published in geofeeds without
      the express consent of the parties being located.</t>
      <t>Operators who publish geolocation information are strongly encouraged
      to inform affected users/customers of this fact and of the potential
      privacy-related consequences and trade-offs.</t>
    </section>
    <section numbered="true" toc="default">
      <name>Relation to Other Work</name>
      <t>While not originally done in conjunction with the GEOPRIV Working
      Group <xref target="GEOPRIV" format="default"/>, Richard Barnes
      observed that this work is nevertheless consistent with that which the
      group has defined, both for address format and for privacy. The data
      elements in geolocation feeds are equivalent to the following XML
      structure (<xref target="RFC5139" format="default"/> <xref
      target="W3C.REC-xml-20081126" format="default"/>):</t>
<sourcecode type="xml"><![CDATA[
<civicAddress>
  <country>country</country>
  <A1>region</A1>
  <A2>city</A2>
  <PC>postal_code</PC>
</civicAddress>
]]></sourcecode>
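      <t>For illustration only, the mapping above could be produced from a
      feed entry with a few lines of Python. The function name
      to_civic_address is hypothetical, and two simplifications are
      assumptions of this sketch: empty fields are omitted from the output,
      and no XML namespace is attached to the elements.</t>
<sourcecode type="python"><![CDATA[
import xml.etree.ElementTree as ET


def to_civic_address(alpha2code, region, city, postal_code):
    """Serialize the four location fields as a civicAddress element."""
    root = ET.Element("civicAddress")
    pairs = (
        ("country", alpha2code),
        ("A1", region),
        ("A2", city),
        ("PC", postal_code),
    )
    for tag, value in pairs:
        if value:  # assumption: omit empty fields
            ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")
]]></sourcecode>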
      <t>Providing geolocation information to this granularity is equivalent
      to the following privacy policy (the definition of the 'building'
      <xref target="RFC6772" sectionFormat="of" section="6.5.1"/> level of
      disclosure):</t>  
<sourcecode type="xml"><![CDATA[
<ruleset>
  <rule>
    <conditions/>
    <actions/>
    <transformations>
      <provide-location profile="civic-transformation">
        <provide-civic>building</provide-civic>
      </provide-location>
    </transformations>
  </rule>
</ruleset>
]]></sourcecode>
    </section>
    <section anchor="Security" numbered="true" toc="default">
      <name>Security Considerations</name>
      <t>As there is no true security in the obscurity of the location of any
      given IP address, self-publication of this data fundamentally opens no
      new attack vectors. For publishers, self-published data may increase
      the ease with which such location data might be exploited (it can, for
      example, make easy the discovery of prefixes populated with customers
      as distinct from prefixes not generally in use).</t>
      <t>For consumers, feed retrieval processes may receive input from
      potentially hostile sources (e.g., in the event of hijacked traffic). As
      such, proper input validation and defense measures <bcp14>MUST</bcp14> be taken (see
      the discussion in <xref target="integrity" format="default"/>).</t>
      <t>Similarly, consumers who do not perform sufficient verification of
      published data bear the same risks as from other forms of geolocation
      configuration errors (see the discussion in Sections <xref
      target="authority" format="counter"/> 
      and <xref target="accuracy" format="counter"/>).</t>
      <t>Validation of a feed's contents includes verifying that the publisher
      is authoritative for the IP prefixes included in the feed. Failure to
      verify IP prefix authority would, for example, allow ISP Bob to make
      geolocation statements about IP space held by ISP Alice. At this time,
      only out-of-band verification methods are implemented (i.e., an ISP's
      feed may be verified against publicly available IP allocation data).
      </t>
    </section>
    <section anchor="future_work" numbered="true" toc="default">
      <name>Planned Future Work</name>
      <t>In order to more flexibly support future extensions, use of a more
      expressive feed format has been suggested. Use of JavaScript Object
      Notation (JSON) <xref target="RFC8259" format="default"/>,
      specifically, has been 
      discussed. However, at the time of writing, no such specification nor
      implementation exists. Nevertheless, work on extensions is deferred
      until a more suitable format has been selected.</t>
      <t>The authors are planning on writing a document describing such a
      new format. This document describes a currently deployed and used
      format. Given the extremely limited extensibility of the present format,
      no extensions to it are anticipated. Extensibility requirements are
      instead expected to be integral to the development of a new format.</t>
    </section>
    <section numbered="true" toc="default">
      <name>Finding Self-Published IP Geolocation Feeds</name>
      <t>The issue of finding, and later verifying, geolocation feeds is not
      formally specified in this document. At this time, only ad hoc feed
      discovery and verification have a modicum of established practice (see
      below); discussion of other mechanisms has been removed for clarity.</t>
      <section numbered="true" toc="default">
        <name>Ad Hoc 'Well-Known' URIs</name>
        <t>To date, geolocation feeds have been shared informally in the form
        of HTTPS URIs exchanged in email threads. Three example URIs (<xref
	target="GEO_IETF" format="default"/>, <xref target="GEO_RIPE_NCC"
	format="default"/>, and <xref target="GEO_ICANN" format="default"/>)
	describe networks that change locations periodically, the operators
	and operational practices of which are well known within their
	respective technical communities.</t> 
        <t>The contents of the feeds are verified by a similarly ad hoc
        process, including: </t>
        <ul spacing="normal">
          <li>personal knowledge of the parties involved in the exchange
	  and</li> 
          <li>comparison of feed-advertised prefixes with the BGP-advertised
	  prefixes of Autonomous System Numbers known to be operated by the
	  publishers.</li> 
        </ul>
        <t>Ad hoc mechanisms, while useful for early experimentation by
        producers and consumers, are unlikely to be adequate for long-term,
        widespread use by multiple parties. Future versions of any such
        self-published geolocation feed mechanism <bcp14>SHOULD</bcp14> address scalability
        concerns by defining a means for automated discovery and verification
        of operational authority of advertised prefixes.</t>
      </section>
      <section numbered="true" toc="default">
        <name>Other Mechanisms</name>
        <t>Previous versions of this document referenced use of the
        WHOIS service <xref target="RFC3912" format="default"/> operated by
	Regional Internet Registries (RIRs), as well as
        possible DNS-based schemes to discover and validate geofeeds.
        To the authors' knowledge, support for such mechanisms has never been
        implemented, and this speculative text has been removed to avoid
        ambiguity.</t>
      </section>
    </section>
    <section numbered="true" toc="default">
      <name>IANA Considerations</name>
      <t>This document has no IANA actions.</t>
    </section>
  </middle>
  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <reference anchor="ISO.3166.1alpha2" target="http://www.iso.org/iso/home/standards/country_codes/iso-3166-1_decoding_table.htm">
          <front>
            <title>ISO 3166-1 decoding table</title>
            <author>
              <organization>ISO</organization>
            </author>
          </front>
        </reference>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
        <xi:include
	    href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.7234.xml"/>

        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.3629.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4180.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4291.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4632.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.5952.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
        <reference anchor="ISO.3166.2" target="http://www.iso.org/iso/home/standards/country_codes.htm#2012_iso3166-2">
          <front>
            <title>ISO 3166-2:2007</title>
            <author>
              <organization>ISO</organization>
            </author>
          </front>
        </reference>

<reference anchor="W3C.REC-xml-20081126" target="http://www.w3.org/TR/2008/REC-xml-20081126" quoteTitle="true" derivedAnchor="W3C.REC-xml-20081126">
          <front>
            <title>Extensible Markup Language (XML) 1.0 (Fifth Edition)</title>
            <author initials="T." surname="Bray" fullname="Tim Bray">
              <organization showOnFrontPage="true"/>
            </author>
            <author initials="J." surname="Paoli" fullname="Jean Paoli">
              <organization showOnFrontPage="true"/>
            </author>
            <author initials="M." surname="Sperberg-McQueen" fullname="Michael Sperberg-McQueen">
              <organization showOnFrontPage="true"/>
            </author>
            <author initials="E." surname="Maler" fullname="Eve Maler">
              <organization showOnFrontPage="true"/>
            </author>
            <author initials="F." surname="Yergeau" fullname="François Yergeau">
              <organization showOnFrontPage="true"/>
            </author>
            <date month="November" year="2008"/>
          </front>
          <seriesInfo name="World Wide Web Consortium Recommendation" value="REC-xml-20081126"/>
          <format type="HTML" target="http://www.w3.org/TR/2008/REC-xml-20081126"/>
        </reference>
	
      </references>
      <references>
        <name>Informative References</name>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2818.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.3912.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.7208.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8259.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.5139.xml"/>
        <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.6772.xml"/>

        <reference anchor="GEO_IETF" target="https://noc.ietf.org/geo/google.csv">
          <front>
            <title>IETF Meeting Network Geolocation Data</title>
            <author  initials="W." surname="Kumari" fullname="Warren Kumari">
              <organization/>
            </author>
          </front>
        </reference>
        <reference anchor="GEO_RIPE_NCC" target="https://meetings.ripe.net/geo/google.csv">
          <front>
            <title>RIPE NCC Meeting Geolocation Data</title>
            <author fullname="Menno Schepers" initials="M." surname="Schepers">
              <organization abbrev="RIPE NCC">Réseaux IP Européens
            Network Coordination Centre</organization>
            </author>
          </front>
        </reference>
        <reference anchor="GEO_ICANN" target="https://meeting-services.icann.org/geo/google.csv">
          <front>
            <title>ICANN Meeting Geolocation Data</title>
            <author>
              <organization>ICANN</organization>
            </author>
          </front>
        </reference>
        <reference anchor="GEO_Google" target="https://www.gstatic.com/geofeed/corp_external">
          <front>
            <title>Google Corp Geofeed</title>
            <author>
              <organization>Google, LLC</organization>
            </author>
          </front>
        </reference>
        <reference anchor="GEOPRIV" target="http://datatracker.ietf.org/wg/geopriv/">
          <front>
            <title>Geographic Location/Privacy (geopriv)</title>
            <author>
              <organization>IETF</organization>
            </author>
            <date/>
          </front>
        </reference>
        <reference anchor="IPADDR_PY" target="http://code.google.com/p/ipaddr-py/">
          <front>
            <title>Google's Python IP address manipulation library</title>
            <author fullname="Mike Shields" initials="M." surname="Shields">
              <organization abbrev="Google">Google Inc.</organization>
            </author>
            <author fullname="Peter Moody" initials="P." surname="Moody">
              <organization abbrev="Google">Google Inc.</organization>
            </author>
            <date/>
          </front>
        </reference>
	<reference anchor="ISO-GLOSSARY"
		   target="https://www.iso.org/glossary-for-iso-3166.html">
	  <front>
	    <title>Glossary for ISO 3166</title>
	    <author>
	      <organization>ISO</organization>
	    </author>
	    <date/>
	  </front>
	</reference>


      </references>
    </references>
    <section numbered="true" toc="default">
      <name>Sample Python Validation Code</name>
      <t>Included here is a simple format validator in Python for
      self-published ipgeo feeds. This tool reads CSV data in the
      self-published ipgeo feed format from the standard input and performs
      basic validation.  It is intended for use by feed publishers before
      launching a feed. Note that this validator checks only the syntax of
      each individual line; it does not verify that every IP prefix entry
      is unique within the feed as a whole. A complete
      validator <bcp14>MUST</bcp14> also ensure IP prefix uniqueness.</t>
      <t>The main source file "ipgeo_feed_validator.py" follows. It requires
      use of the open source ipaddr Python library for IP address and CIDR
      parsing and validation <xref target="IPADDR_PY" format="default"/>.</t>
<sourcecode name="" type="python" markers="true"><![CDATA[
#!/usr/bin/python
#
# Copyright (c) 2012 IETF Trust and the persons identified as
# authors of the code.  All rights reserved.  Redistribution and use
# in source and binary forms, with or without modification, is
# permitted pursuant to, and subject to the license terms contained
# in, the Simplified BSD License set forth in Section 4.c of the
# IETF Trust's Legal Provisions Relating to IETF
# Documents (http://trustee.ietf.org/license-info).

"""Simple format validator for self-published ipgeo feeds.

This tool reads CSV data in the self-published ipgeo feed format
from the standard input and performs basic validation.  It is
intended for use by feed publishers before launching a feed.
"""

import csv
import ipaddr
import re
import sys


class IPGeoFeedValidator(object):
  def __init__(self):
    self.prefixes = {}
    self.line_number = 0
    self.output_log = {}
    self.SetOutputStream(sys.stderr)

  def Validate(self, feed):
    """Check validity of an IPGeo feed.

    Args:
      feed: iterable with feed lines
    """

    for line in feed:
      self._ValidateLine(line)

  def SetOutputStream(self, logfile):
    """Controls where the output messages go to (STDERR by default).

    Use None to disable logging.

    Args:
      logfile: a file object (e.g., sys.stdout) or None.
    """
    self.output_stream = logfile

  def CountErrors(self, severity):
    """How many ERRORs or WARNINGs were generated."""
    return len(self.output_log.get(severity, []))

  ############################################################
  def _ValidateLine(self, line):
    line = line.rstrip('\r\n')
    self.line_number += 1
    self.line = line.split('#')[0]
    self.is_correct_line = True

    if self._ShouldIgnoreLine(line):
      return

    fields = [field for field in csv.reader([line])][0]

    self._ValidateFields(fields)
    self._FlushOutputStream()

  def _ShouldIgnoreLine(self, line):
    line = line.strip()
    if line.startswith('#'):
      return True
    return len(line) == 0

  ############################################################
  def _ValidateFields(self, fields):
    assert(len(fields) > 0)

    is_correct = self._IsIPAddressOrPrefixCorrect(fields[0])

    if len(fields) > 1:
      if not self._IsAlpha2CodeCorrect(fields[1]):
        is_correct = False

    if len(fields) > 2 and not self._IsRegionCodeCorrect(fields[2]):
      is_correct = False

    if len(fields) != 5:
      self._ReportWarning('5 fields were expected (got %d).'
                          % len(fields))

  ############################################################
  def _IsIPAddressOrPrefixCorrect(self, field):
    if '/' in field:
      return self._IsCIDRCorrect(field)
    return self._IsIPAddressCorrect(field)

  def _IsCIDRCorrect(self, cidr):
    try:
      ipprefix = ipaddr.IPNetwork(cidr)
      if ipprefix.network._ip != ipprefix._ip:
        self._ReportError('Incorrect IP Network.')
        return False
      if ipprefix.is_private:
        self._ReportError('IP Address must not be private.')
        return False
    except Exception:
      self._ReportError('Incorrect IP Network.')
      return False
    return True

  def _IsIPAddressCorrect(self, ipaddress):
    try:
      ip = ipaddr.IPAddress(ipaddress)
    except Exception:
      self._ReportError('Incorrect IP Address.')
      return False
    if ip.is_private:
      self._ReportError('IP Address must not be private.')
      return False
    return True

  ############################################################
  def _IsAlpha2CodeCorrect(self, alpha2code):
    if len(alpha2code) == 0:
      return True
    if len(alpha2code) != 2 or not alpha2code.isalpha():
      self._ReportError(
          'Alpha 2 code must be in the ISO 3166-1 alpha 2 format.')
      return False
    return True

  def _IsRegionCodeCorrect(self, region_code):
    if len(region_code) == 0:
      return True
    if '-' not in region_code:
      self._ReportError('Region code must be in ISO 3166-2 format.')
      return False

    parts = region_code.split('-')
    if not self._IsAlpha2CodeCorrect(parts[0]):
      return False
    return True

  ############################################################
  def _ReportError(self, message):
    self._ReportWithSeverity('ERROR', message)

  def _ReportWarning(self, message):
    self._ReportWithSeverity('WARNING', message)

  def _ReportWithSeverity(self, severity, message):
    self.is_correct_line = False
    output_line = '%s: %s\n' % (severity, message)

    if severity not in self.output_log:
      self.output_log[severity] = []
    self.output_log[severity].append(output_line)

    if self.output_stream is not None:
      self.output_stream.write(output_line)

  def _FlushOutputStream(self):
    if self.is_correct_line: return
    if self.output_stream is None: return

    self.output_stream.write('line %d: %s\n\n'
                             % (self.line_number, self.line))


############################################################
def main():
  feed_validator = IPGeoFeedValidator()
  feed_validator.Validate(sys.stdin)

  if feed_validator.CountErrors('ERROR'):
    sys.exit(1)

if __name__ == '__main__':
  main()
]]></sourcecode>
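      <t>As a hedged aside, the prefix checks in the validator above can be
      approximated without the third-party ipaddr dependency. The sketch
      below is not part of the validator: it assumes Python 3.7 or later and
      swaps in the standard-library "ipaddress" module. The function name
      check_prefix and the PRIVATE_NETWORKS list are hypothetical, and the
      list is an assumption that covers only the RFC 1918 IPv4 blocks and the
      IPv6 unique local block, which may differ from what ipaddr's
      is_private property reports.</t>
<sourcecode name="" type="python"><![CDATA[
import ipaddress

# Assumption: only these ranges are treated as private here (the
# RFC 1918 IPv4 blocks and the IPv6 unique local block).
PRIVATE_NETWORKS = [
    ipaddress.ip_network('10.0.0.0/8'),
    ipaddress.ip_network('172.16.0.0/12'),
    ipaddress.ip_network('192.168.0.0/16'),
    ipaddress.ip_network('fc00::/7'),
]


def check_prefix(field):
  """Returns an error message for a bad prefix field, else None."""
  try:
    # strict=True rejects prefixes with host bits set, such as
    # 55.66.77.88/24.  A bare address parses as a /32 or /128.
    network = ipaddress.ip_network(field, strict=True)
  except ValueError:
    return 'Incorrect IP Network.'
  for private in PRIVATE_NETWORKS:
    # subnet_of() requires both networks to share an IP version.
    if (network.version == private.version
        and network.subnet_of(private)):
      return 'IP Address must not be private.'
  return None
]]></sourcecode>
      <t>Under these assumptions, check_prefix('55.66.77.88/24') reports an
      incorrect network because host bits are set, while
      check_prefix('55.66.77.0/24') returns None.</t>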
      <t>A unit test file, "ipgeo_feed_validator_test.py", is provided as well.
      It provides basic test coverage of the validator above, though it does
      not test correct handling of non-ASCII UTF-8 strings.</t>
<sourcecode name="" type="python" markers="true"><![CDATA[
#!/usr/bin/python
#
# Copyright (c) 2012 IETF Trust and the persons identified as
# authors of the code.  All rights reserved.  Redistribution and use
# in source and binary forms, with or without modification, is
# permitted pursuant to, and subject to the license terms contained
# in, the Simplified BSD License set forth in Section 4.c of the
# IETF Trust's Legal Provisions Relating to IETF
# Documents (http://trustee.ietf.org/license-info).

import sys
from ipgeo_feed_validator import IPGeoFeedValidator

class IPGeoFeedValidatorTest(object):
  def __init__(self):
    self.validator = IPGeoFeedValidator()
    self.validator.SetOutputStream(None)
    self.successes = 0
    self.failures = 0

  def Run(self):
    self.TestFeedLine('# asdf', 0, 0)
    self.TestFeedLine('   ', 0, 0)
    self.TestFeedLine('', 0, 0)

    self.TestFeedLine('asdf', 1, 1)
    self.TestFeedLine('asdf,US,,,', 1, 0)
    self.TestFeedLine('aaaa::,US,,,', 0, 0)
    self.TestFeedLine('zzzz::,US', 1, 1)
    self.TestFeedLine(',US,,,', 1, 0)
    self.TestFeedLine('55.66.77', 1, 1)
    self.TestFeedLine('55.66.77.888', 1, 1)
    self.TestFeedLine('55.66.77.asdf', 1, 1)

    self.TestFeedLine('2001:db8:cafe::/48,PL,PL-MZ,,02-784', 0, 0)
    self.TestFeedLine('2001:db8:cafe::/48', 0, 1)

    self.TestFeedLine('55.66.77.88,PL', 0, 1)
    self.TestFeedLine('55.66.77.88,PL,,,', 0, 0)
    self.TestFeedLine('55.66.77.88,,,,', 0, 0)
    self.TestFeedLine('55.66.77.88,ZZ,,,', 0, 0)
    self.TestFeedLine('55.66.77.88,US,,,', 0, 0)
    self.TestFeedLine('55.66.77.88,USA,,,', 1, 0)
    self.TestFeedLine('55.66.77.88,99,,,', 1, 0)

    self.TestFeedLine('55.66.77.88,US,US-CA,,', 0, 0)
    self.TestFeedLine('55.66.77.88,US,USA-CA,,', 1, 0)
    self.TestFeedLine('55.66.77.88,USA,USA-CA,,', 2, 0)

    self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,', 0, 0)
    self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043',
                      0, 0)
    self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043,'
                      '1600 Amphitheatre Parkway', 0, 1)

    self.TestFeedLine('55.66.77.0/24,US,,,', 0, 0)
    self.TestFeedLine('55.66.77.88/24,US,,,', 1, 0)
    self.TestFeedLine('55.66.77.88/32,US,,,', 0, 0)
    self.TestFeedLine('55.66.77/24,US,,,', 1, 0)
    self.TestFeedLine('55.66.77.0/35,US,,,', 1, 0)

    self.TestFeedLine('172.15.30.1,US,,,', 0, 0)
    self.TestFeedLine('172.28.30.1,US,,,', 1, 0)
    self.TestFeedLine('192.167.100.1,US,,,', 0, 0)
    self.TestFeedLine('192.168.100.1,US,,,', 1, 0)
    self.TestFeedLine('10.0.5.9,US,,,', 1, 0)
    self.TestFeedLine('10.0.5.0/24,US,,,', 1, 0)
    self.TestFeedLine('fc00::/48,PL,,,', 1, 0)
    self.TestFeedLine('fe00::/48,PL,,,', 0, 0)

    print('%d tests passed, %d failed'
          % (self.successes, self.failures))

  def IsOutputLogCorrectAtSeverity(self, severity,
                                   expected_msg_count):
    msg_count = self.validator.CountErrors(severity)

    if msg_count != expected_msg_count:
      print('TEST FAILED: %s\nexpected %d %s[s], observed %d\n%s\n'
            % (self.validator.line, expected_msg_count, severity,
               msg_count,
               str(self.validator.output_log[severity])))
      return False
    return True

  def IsOutputLogCorrect(self, new_errors, new_warnings):
    retval = True

    if not self.IsOutputLogCorrectAtSeverity('ERROR', new_errors):
      retval = False
    if not self.IsOutputLogCorrectAtSeverity('WARNING',
                                             new_warnings):
      retval = False

    return retval

  def TestFeedLine(self, line, error_count, warning_count):
    self.validator.output_log['WARNING'] = []
    self.validator.output_log['ERROR'] = []
    self.validator._ValidateLine(line)

    if not self.IsOutputLogCorrect(error_count, warning_count):
      self.failures += 1
      return False

    self.successes += 1
    return True


if __name__ == '__main__':
  IPGeoFeedValidatorTest().Run()
]]></sourcecode>
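      <t>The validator above checks each line in isolation. A minimal sketch
      of the feed-wide prefix uniqueness check called for at the start of
      this appendix might look as follows. It is an illustration only: it
      assumes Python 3 and the standard-library "ipaddress" module rather
      than ipaddr, and the function name find_duplicate_prefixes is
      hypothetical.</t>
<sourcecode name="" type="python"><![CDATA[
import csv
import ipaddress


def find_duplicate_prefixes(feed):
  """Returns (line_number, prefix) pairs for repeated prefixes.

  Args:
    feed: iterable with feed lines
  """
  seen = set()
  duplicates = []
  for line_number, line in enumerate(feed, start=1):
    line = line.strip()
    if not line or line.startswith('#'):
      continue
    fields = next(csv.reader([line]))
    try:
      # Parsing normalizes the text, so 2001:DB8:0::/32 and
      # 2001:db8::/32 compare equal.
      prefix = ipaddress.ip_network(fields[0], strict=True)
    except ValueError:
      # Assumption: syntax errors are reported by the per-line
      # validator, so they are skipped here.
      continue
    if prefix in seen:
      duplicates.append((line_number, str(prefix)))
    seen.add(prefix)
  return duplicates
]]></sourcecode>
      <t>Only exact repeats are reported; a more specific prefix nested
      inside a shorter one is not treated as a duplicate in this sketch. A
      publisher could run it over the same input as the validator and treat
      any returned pair as an error.</t>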

    </section>
    <section numbered="false" toc="default">
      <name>Acknowledgements</name>
      <t>The authors would like to express their gratitude to reviewers and
      early implementors, including but not limited to <contact
      fullname="Mikael Abrahamsson"/>, <contact fullname="Andrew Alston"/>,
      <contact fullname="Ray Bellis"/>, <contact fullname="John Bond"/>,
      <contact fullname="Alissa Cooper"/>, <contact fullname="Andras Erdei"/>,
      <contact fullname="Stephen Farrell"/>, <contact fullname="Marco
      Hogewoning"/>, <contact fullname="Mike Joseph"/>, <contact
      fullname="Maciej Kuzniar"/>, <contact fullname="George Michaelson"/>,
      <contact fullname="Menno Schepers"/>, <contact fullname="Justyna
      Sidorska"/>, <contact fullname="Pim van Pelt"/>, and <contact
      fullname="Bjoern A. Zeeb"/>.</t> 
      <t>In particular, <contact fullname="Richard L. Barnes"/> and <contact
      fullname="Andy Newton"/> contributed substantial review,
      text, and advice.</t> 
    </section>
  </back>
</rfc>
