
From nobody Thu Jul  2 17:23:23 2020
Return-Path: <agenda@ietf.org>
X-Original-To: wpack@ietf.org
Delivered-To: wpack@ietfa.amsl.com
Received: from ietfa.amsl.com (localhost [IPv6:::1]) by ietfa.amsl.com (Postfix) with ESMTP id 68DBA3A0997; Thu,  2 Jul 2020 17:20:38 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
From: "\"IETF Secretariat\"" <agenda@ietf.org>
To: <wpack-chairs@ietf.org>, <sean+ietf@sn3rd.com>
Cc: superuser@gmail.com, wpack@ietf.org
X-Test-IDTracker: no
X-IETF-IDTracker: 7.7.0
Auto-Submitted: auto-generated
Precedence: bulk
Message-ID: <159373563841.30173.4177033308348030769@ietfa.amsl.com>
Date: Thu, 02 Jul 2020 17:20:38 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/AzCosJgV48jGHGvRbOg-6n1Mzys>
Subject: [Wpack] wpack - Requested session has been scheduled for IETF 108
X-BeenThere: wpack@ietf.org
X-Mailman-Version: 2.1.29
List-Id: Web Packaging <wpack.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/wpack>, <mailto:wpack-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/wpack/>
List-Post: <mailto:wpack@ietf.org>
List-Help: <mailto:wpack-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/wpack>, <mailto:wpack-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Jul 2020 00:20:51 -0000

Dear Sean Turner,

The session(s) that you have requested have been scheduled.
Below is the scheduled session information followed by
the original request. 


    wpack Session 1 (0:50 requested)
    Friday, 31 July 2020, Session II 1300-1350
    Room Name: Room 3 size: 3
    ---------------------------------------------


iCalendar: https://datatracker.ietf.org/meeting/108/sessions/wpack.ics

Request Information:


---------------------------------------------------------
Working Group Name: Web Packaging
Area Name: Applications and Real-Time Area
Session Requester: Sean Turner


Number of Sessions: 1
Length of Session(s):  50 Minutes
Number of Attendees: None
Conflicts to Avoid: 
 Chair Conflict: add dnsop dprive tls mls quic
 Technology Overlap: httpbis dispatch secdispatch saag acme






People who must be present:
  Sean Turner
  Murray Kucherawy
  David C Lawrence

Resources Requested:

Special Requests:
  
---------------------------------------------------------



From nobody Wed Jul  8 15:47:28 2020
Return-Path: <jyasskin@google.com>
X-Original-To: wpack@ietfa.amsl.com
Delivered-To: wpack@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 26D013A08AC for <wpack@ietfa.amsl.com>; Wed,  8 Jul 2020 15:47:27 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -9.25
X-Spam-Level: 
X-Spam-Status: No, score=-9.25 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HEADER_FROM_DIFFERENT_DOMAINS=0.249, HTML_MESSAGE=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001, USER_IN_DEF_SPF_WL=-7.5] autolearn=no autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=chromium.org
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id wQ58JwGQhzDt for <wpack@ietfa.amsl.com>; Wed,  8 Jul 2020 15:47:24 -0700 (PDT)
Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 49D493A08AA for <wpack@ietf.org>; Wed,  8 Jul 2020 15:47:24 -0700 (PDT)
Received: by mail-qt1-x842.google.com with SMTP id b25so281273qto.2 for <wpack@ietf.org>; Wed, 08 Jul 2020 15:47:24 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=gS5EQF6V8iM/Htyz506beZYPRD4kw0gVfR73XJWDCkk=; b=ZD5S4MgBc8l3eD2dODIJyd24OomIkTkiwVPu4xh/fi54XbsoDYUztjC/5PybuCtLzy vfdj1kVhKiOcEiFAGGRgqdFoUd0FLp8mtiJ5AUywrrW0GxJIsNuLcpbqNsmjwCdeEeZ9 2coiztnjdCcIj+n5jsaaERC4DbUJtByGzS3ZA=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=gS5EQF6V8iM/Htyz506beZYPRD4kw0gVfR73XJWDCkk=; b=QTRDfE5FRJPo+sKOgPhPVEoL7Dle/hR5SVKsM9R4NQWet/5fAN3LJ43XF2eRHTnXwy Fyn47CUtdgrg4uSfaa2NO4mKvGV7v+kP4K+Ym7lmcbX5Nm7gQ7HgvY7gcUOZUnSwx3Fb l/Vi25H0tt+Bt3WVr/hHqyIqSypCacr7h+FKtl7F+mXKEF7yOPQFgkHlpu+HJTh3X7AH WNVEXTBYQAjs+daO6u+wrkNEOr79euc9WPhnC3PKV3gl+9a2PktZjOm9I4Hl3jPZgOJI 0A04kRmjE6xnISbEheZUFZtb5tUPbbjIffttwekOWGzWlKE6AxXbru2FNR/fJAXj5oIy tZ8A==
X-Gm-Message-State: AOAM531zOalBdXHwuEg02pFlUO/MVDsCI+9YwDWJrQ94VSxFjJXCVrq/ /GqW9Js9xSgz8+f4rnk5m4jKzEOK2qmUQxTqvJzzGQ==
X-Google-Smtp-Source: ABdhPJzirn76Bj1UZQg1flSAyaXfU+2SvCT5aP6qLT6ILtUGOtIiJ0e/HehnbwvPm4yPX//LZuxYbLKZv6drOSnAMO0=
X-Received: by 2002:ac8:371c:: with SMTP id o28mr10008803qtb.153.1594248442575;  Wed, 08 Jul 2020 15:47:22 -0700 (PDT)
MIME-Version: 1.0
References: <CANh-dXndPaue3zAADhpc+wyNb8dxs=nVKOAp1n=6SMCKoUe=eQ@mail.gmail.com> <97bcac95-c220-41ae-b957-d93fc57f4a74@www.fastmail.com> <CANh-dXkXnvi+1YK-+CjPaiiN9VhAecLEEjpever7D-gVB-sN0A@mail.gmail.com> <c52ed6da-a7fa-4802-8923-d9782f498daf@www.fastmail.com> <CANh-dXmoWarbW=9wy1=t6rcFh2T8ph7LhhBg5aZ_6auLSYp7-w@mail.gmail.com>
In-Reply-To: <CANh-dXmoWarbW=9wy1=t6rcFh2T8ph7LhhBg5aZ_6auLSYp7-w@mail.gmail.com>
From: Jeffrey Yasskin <jyasskin@chromium.org>
Date: Wed, 8 Jul 2020 15:47:11 -0700
Message-ID: <CANh-dXnBSpOdCEt7s2tb8f441QraXHGpA1QLb-yRzurSj27GEg@mail.gmail.com>
To: Jeffrey Yasskin <jyasskin@chromium.org>
Cc: Martin Thomson <mt@lowentropy.net>, mknodel@cdt.org, WPACK List <wpack@ietf.org>
Content-Type: multipart/alternative; boundary="0000000000001af3ba05a9f5e4e1"
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/iEUT-102ThCcoUXnf3OezMIMRig>
Subject: Re: [Wpack] package: URL scheme
X-BeenThere: wpack@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Web Packaging <wpack.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/wpack>, <mailto:wpack-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/wpack/>
List-Post: <mailto:wpack@ietf.org>
List-Help: <mailto:wpack-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/wpack>, <mailto:wpack-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Jul 2020 22:47:27 -0000

--0000000000001af3ba05a9f5e4e1
Content-Type: text/plain; charset="UTF-8"

I've put an overall discussion of the new scheme and the plausible
alternatives at
https://github.com/WICG/webpackage/blob/master/explainers/bundle-urls-and-origins.md.
The URL encoding variants section
<https://github.com/WICG/webpackage/blob/master/explainers/bundle-urls-and-origins.md#url-encoding-variants>
discusses several of the alternatives that have come up in this thread,
along with their downsides. We can definitely switch to one of them (or
another new option) if folks think the tradeoffs are worth it.

Jeffrey

On Mon, Jun 15, 2020 at 11:16 AM Jeffrey Yasskin <jyasskin@chromium.org>
wrote:

> Replies inline.
>
> On Sun, Jun 14, 2020 at 6:38 PM Martin Thomson <mt@lowentropy.net> wrote:
>
>> On Sat, Jun 13, 2020, at 09:07, Jeffrey Yasskin wrote:
>> > 1. https://foo.example/page.html
>> > 2. https://bar.example/page.html
>> > 3. A resource in the bundle at https://foo.example/bundle.wbn named
>> > https://bar.example/page.html.
>> > 4. A resource in the bundle at https://foo.example/bundle.wbn named
>> > https://quux.example/page.html.
>> > 5. A resource in the bundle at https://foo.example/otherbundle.wbn
>> > named https://bar.example/page.html.
>> >
>> > If we say (1) and (3) get the same origin, it means the result can't
>> > help the Internet Archive serve their pages more safely, and we force
>> > (3), (4), and (5) to have the same origins.
>>
>> Yes, I think that having 1 and 3 being distinct is an important
>> property.  I tend to stop at this point and not try to add too much more,
>> but there you go.
>>
>
> I think distinguishing (1) and (3) requires a new scheme.
>
>> > If we say (2) and (3) get the same origin, we break the entire web
>> > origin security model. :-D
>>
>> (1) and (2) more so.
>>
>> Part of what we're trying to do is find a way to bring (2) and (3) closer
>> (and (5) too). That doesn't necessarily need to mean that we treat them
>> identically.  Though we might find a path to doing so.
>>
>
> I think the signing or transfer effort described
> in draft-thomson-wpack-content-origin
> and draft-yasskin-http-origin-signed-responses tries to get (2) and (3)
> closer together, but here I'm focusing on use cases that don't need the
> browser to recognize (2) and (3) as related at all.
>
>> > If we say (3) and (4) get the same origin, it means that El Paquete
>> > Semanal couldn't safely put multiple websites into the same bundle
>> > without risking them stepping on each other's storage. It could put
>> > them in separate files, but then they'd have trouble linking to each
>> > other.
>>
>> What cannot be solved with one level of abstraction might be solved with
>> two.
>>
>> If the distinctive characteristic of a bundle is that it creates a new
>> origin, then a bundle that contains other bundles can be used to isolate
>> content from different origins.  The El Paquete Semanal as a whole could be
>> assembled from multiple bundles from different "real" origins.
>>
>> As you say, that complicates the process of inter-origin references.  Say
>> the El Paquete Semanal bundle was served by non-determined means, then you
>> can't rely on having a shared identifier for the outer bundle, unless you
>> have some means of minting one.  (This is a case where you can't refer to
>> the outer bundle from the inner one using content identification, for
>> reasons that I hope are obvious.)  So you mint one (a signing key might
>> suffice) and then you can refer from one inner bundle to another inner
>> bundle via that identifier.
>>
>> The reference chain might become convoluted, but I don't see a need to
>> constrain URL design to fit that use case.  As long as references are
>> possible, then we can work on making it more usable.  If you have
>> siteA.example and siteB.example served from bundle.example from
>> distributor.example, then my hope would be that your inter-site references
>> are of the form:
>>
>> <something>://siteX.example/<a hint that mentions bundle.example and
>> maybe distributor.example, but the latter might be best left implied>
>>
>> I say that because while siteA and siteB might expect to be served from
>> the same meta-bundle, there shouldn't need to be a strong binding to that
>> situation.  If one is and the other not, then I would hope that the
>> identifiers would support that without requiring strong changes.
>>
>
> There's a downside that this still requires rewriting all the cross-origin
> links inside the documents, but as El Paquete Semanal is already rewriting
> links to put everything on a filesystem, maybe that's not a big problem.
>
> I'm curious what you think the hint could look like. You've sketched it in
> the path area, but these URLs also have paths, so we'd need something to
> separate the two parts of the path in the same way my first attempt
> separated two parts of the authority.
>
>> > If we say (3) and (5) get the same origin, it means that if an archive
>> > stores multiple versions of the same website, but those versions use
>> > storage differently, users couldn't easily try more than one version.
>>
>> I think that this is a key question.  My assertion is that maintaining a
>> clean abstraction for bundles is important and that would dictate having
>> (3) and (5) distinct.  At least initially.  Mechanisms that allow transfer
>> of state or content between origins might allow an upgrade to occur from
>> (3) to (5).
>>
>> > Abandoning some of these use cases definitely makes the URL design
>> > easier, if there's consensus to go that direction.
>> >
>> > However, if we want to keep the use cases, and we want to put the
>> > bundle's server in the authority position of the URL, we get something
>> > like Larry's suggestion:
>> > pkg+https://foo.example/bundle.wbn?query#https://bar.example/page.html?q=query%23fragment.
>> > To give (3)-(5) distinct origins, the origin algorithm
>> > <https://url.spec.whatwg.org/#origin> for pkg+https needs to take the
>> > fragment into account, returning something like ("pkg+https",
>> > "foo.example/bundle.wbn?query#https://bar.example", null, null). This
>> > design also makes it possible to resolve relative URLs relative to a
>> > pkg+https:// base URL, and it gives what's probably the wrong answer,
>> > moving relative to the bundle instead of the active subresource. That's
>> > probably ok: links inside a bundle need to explicitly search the bundle so
>> > that absolute references search the bundle first.
>>
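
[Editorial sketch] The origin tuple described in the quoted paragraph above
can be illustrated with a few lines of Python. This is only a sketch under
stated assumptions: pkg+https is a proposal from this thread, not a
registered scheme, and the function name pkg_https_origin and the way the
URL is split are hypothetical. The only thing taken from the thread is the
shape of the resulting tuple.

```python
from urllib.parse import urlsplit


def pkg_https_origin(url):
    """Return an origin tuple for a hypothetical pkg+https URL.

    Assumption: everything before the first '#' locates the bundle, and the
    fragment is the full URL of the subresource inside the bundle.
    """
    scheme, sep, rest = url.partition("://")
    if scheme != "pkg+https" or not sep:
        raise ValueError("not a pkg+https URL")
    bundle, sep, inner = rest.partition("#")
    if not sep:
        raise ValueError("no subresource named in the fragment")
    inner_parts = urlsplit(inner)
    inner_origin = f"{inner_parts.scheme}://{inner_parts.netloc}"
    # The "host" slot carries both the bundle location and the inner origin,
    # so resources (3), (4), and (5) from the thread get distinct origins.
    return (scheme, f"{bundle}#{inner_origin}", None, None)


case3 = pkg_https_origin(
    "pkg+https://foo.example/bundle.wbn#https://bar.example/page.html")
case4 = pkg_https_origin(
    "pkg+https://foo.example/bundle.wbn#https://quux.example/page.html")
case5 = pkg_https_origin(
    "pkg+https://foo.example/otherbundle.wbn#https://bar.example/page.html")
```

With this definition, two resources from the same bundle that share an inner
origin still compare equal, which is the behaviour the origin tuple is meant
to preserve.
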
>> Rather than try to work the identity of the serving entity into the
>> origin, it might be better to look to the limits of the origin concept.
>> We're seeing browsers move more toward placing origins within a greater
>> context.  Double-keying storage by top-level browsing context shows us how
>> maybe the origin isn't able to capture the entire context.  The same might
>> apply here.  Maybe it is important that this bundle originally came from
>> foo.example, but it might not be important to the concept of the origin
>> model.
>>
>
> This is an interesting point: we could use the new storage keys
> <https://storage.spec.whatwg.org/#storage-keys> being created by
> https://github.com/privacycg/storage-partitioning to declare that an
> environment settings object with a bundle attached uses a storage key
> derived from both the bundle's URL and the subresource's URL, instead of
> trying to encapsulate both into a single origin.
>
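
[Editorial sketch] The storage-key idea in the reply above could look roughly
like the following. This is an assumption-heavy illustration: the thread only
says the key is "derived from both the bundle's URL and the subresource's
URL" and does not say how. Here the key is assumed to pair the bundle URL
(without fragment) with the subresource's origin. The names StorageKey,
origin_of, and storage_key are hypothetical.

```python
from typing import NamedTuple, Optional
from urllib.parse import urldefrag, urlsplit


class StorageKey(NamedTuple):
    bundle_url: Optional[str]  # None for content fetched directly
    origin: str                # ordinary scheme://host[:port] origin


def origin_of(url):
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def storage_key(resource_url, bundle_url=None):
    """Derive a storage key; the origin itself stays an ordinary origin."""
    if bundle_url is not None:
        # Assumption: a fragment on the bundle URL does not change the key.
        bundle_url = urldefrag(bundle_url).url
    return StorageKey(bundle_url, origin_of(resource_url))


# Cases (2) through (5) from the start of the thread.
key2 = storage_key("https://bar.example/page.html")
key3 = storage_key("https://bar.example/page.html",
                   "https://foo.example/bundle.wbn")
key4 = storage_key("https://quux.example/page.html",
                   "https://foo.example/bundle.wbn")
key5 = storage_key("https://bar.example/page.html",
                   "https://foo.example/otherbundle.wbn")
```

Under these assumptions all four cases get separate storage, while the origin
component of (2), (3), and (5) is the same plain https://bar.example origin.
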
>> Relative references within a bundle should be simple and possible, but I
>> fear that expecting fragments to support that is likely to run afoul of all
>> sorts of existing expectations.  Using fragments for bundles is likely good
>> if you don't want to mint a new URI scheme, but if you are in the business
>> of defining a new scheme, then go for it.
>>
>> If you are defining a new media type, then the fragment can do whatever
>> you like.  With no new URI scheme needed:
>>
>> https://foo.example/bundle.wbn?query#<use whatever you feel like, but
>> don't expect internal references to work>
>>
>> Similarly, once you are defining a new URI scheme, then you have a lot of
>> options available.  The URI standard is perhaps unnecessarily narrow in its
>> definition of authority, but there's a lot of room for creative
>> interpretation: a registered name doesn't need to be a domain name, and
>> there is no real mechanism that ensures that it is "registered" (whatever
>> that means).
>>
>> pkg://<the authority for the above bundle, which might not be
>> bar.example>/page.html?q=query%23fragment
>>
>
> This is how I got
> to package:https:,,distributor.example,package.wbn;q=query$https:,,publisher.example/page.html?q=query
> (or package://...). It claims that the "authority" for a subresource inside
> a bundle consists of both the bundle's location and the subresource's
> origin. arcp:// did the same, but without the subresource's origin, which
> would have the effect of unifying (3) and (4).
>
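
[Editorial sketch] The package: URL shown in the reply above can be
reproduced with a small encoder. The mapping used here ("/" becomes ",",
"?" becomes ";", and "$" separates the bundle URL from the subresource
origin) is inferred from that single example only. The percent-escaping of
literal ",", ";", and "$" is an added assumption, and the function names are
hypothetical. The explainer linked at the top of this message is the
authoritative description.

```python
from urllib.parse import urlsplit


def _encode(url):
    # Assumption: characters reused by the encoding are percent-escaped first.
    for ch, esc in ((",", "%2C"), (";", "%3B"), ("$", "%24")):
        url = url.replace(ch, esc)
    return url.replace("/", ",").replace("?", ";")


def package_url(bundle_url, resource_url):
    """Build a package: URL naming resource_url inside bundle_url.

    Expects resource_url to be written with a lowercase scheme, so that it
    starts with the origin string computed below.
    """
    parts = urlsplit(resource_url)
    origin = f"{parts.scheme}://{parts.netloc}"
    rest = resource_url[len(origin):]  # path and query are left unencoded
    return f"package:{_encode(bundle_url)}${_encode(origin)}{rest}"


example = package_url(
    "https://distributor.example/package.wbn?q=query",
    "https://publisher.example/page.html?q=query",
)
```

Everything before the first unencoded "/" then plays the role of the
"authority", which is why it contains both the bundle location and the
subresource origin.
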
>> The definition of origin necessarily becomes more flexible.  Rather than
>> insist on the reduction to a simple tuple for an origin, we should regard
>> these as having a same-origin comparison operation, which sometimes allows
>> two origins to be regarded the same, even with vastly different inputs.  The
>> reduction to a tuple is useful, but potentially quite confining.
>>
>> If pkg://<bar> and https://<foo> happen to be same origin because we
>> decide that is useful (and safe), then we should be free to define that as
>> we choose, within the constraints of the origin tuple or not.
>>
>> Despite it being quite clearly invalid for domain name and port, this is a
>> completely valid URI if we so choose: <pkg://$:1000000/>. That doesn't mean
>> that it couldn't be same-origin with <https://example.com/>, but it
>> might be a little tricky to reduce to an origin tuple.
>>
>
> I don't mind diverging from the origin-tuple concept, but so far I haven't
> found a need to.
>
> Thanks,
> Jeffrey
>

--0000000000001af3ba05a9f5e4e1--


From nobody Wed Jul  8 17:55:36 2020
Return-Path: <masinter@gmail.com>
Sender: Larry Masinter <masinter@gmail.com>
From: Larry Masinter <LMM@acm.org>
X-Google-Original-From: "Larry Masinter" <lmm@acm.org>
To: "'WPACK List'" <wpack@ietf.org>
Date: Wed, 8 Jul 2020 17:55:21 -0700
Message-ID: <027801d6558b$9f9deab0$ded9c010$@acm.org>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_NextPart_000_0279_01D65550.F33F87E0"
X-Mailer: Microsoft Outlook 16.0
Content-Language: en-us
Thread-Index: AdZVhKUZCoH/GxJwSlW0r6t9t4EayQ==
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/DxUnQxsNrmaTkrrbRjV1-1Es1lE>
Subject: [Wpack] Fragment-based URL scheme

This is a multipart message in MIME format.

------=_NextPart_000_0279_01D65550.F33F87E0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit

I don't understand the cases where this doesn't meet the goal.

 

"the fragment might sometimes get dropped..." When? By what process?

"some code might only be looking at the host" Is this some existing code?
Something web packaging would work with?

  

And the advantages seem to be enough to take a harder look:

 

*	being able to use other "kinds of things with subresources, like
zip, tar, 7z, etc. files" gives you instant deployment (including PDF)
*	it works with nested containers to any depth
*	you don't need to deploy a new URL scheme

These seem compelling.

--

https://LarryMasinter.net <https://larrymasinter.net/>
https://going-remote.info <https://going-remote.info/> 

 


------=_NextPart_000_0279_01D65550.F33F87E0--


From nobody Thu Jul  9 13:32:49 2020
Return-Path: <jyasskin@google.com>
MIME-Version: 1.0
References: <027801d6558b$9f9deab0$ded9c010$@acm.org>
In-Reply-To: <027801d6558b$9f9deab0$ded9c010$@acm.org>
From: Jeffrey Yasskin <jyasskin@chromium.org>
Date: Thu, 9 Jul 2020 13:32:32 -0700
Message-ID: <CANh-dXnP9g_y6kvgkDz+58X9JHtc_KS2+6e2HA0UMV3xueh2qg@mail.gmail.com>
To: Larry Masinter <LMM@acm.org>
Cc: WPACK List <wpack@ietf.org>
Content-Type: multipart/alternative; boundary="0000000000006e724f05aa08204d"
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/dfpPaDbxAiZndtbAPbxX_Vns75s>
Subject: Re: [Wpack] Fragment-based URL scheme

--0000000000006e724f05aa08204d
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Thanks for the counter-proposal. I'm going to argue here in order to make
sure that we've fully understood the pros and cons of the fragment-based
scheme, not because I'm firmly opposed to it.

https://github.com/WICG/webpackage/blob/master/explainers/bundle-urls-and-o=
rigins.md#fragment-based-url-scheme
describes downsides of:

1. =E2=80=9Cthe fragment might sometimes get dropped=E2=80=A6=E2=80=9D: I b=
elieve this would
primarily be by code inside browsers that hasn't ever needed to pay
attention to the fragment when making security decisions. I'm not certain
that this code exists.
2. =E2=80=9Csome code might only be looking at the host=E2=80=9D: Here I'm =
more worried
about server-side or JavaScript code that receives this new kind of URL in
an Origin header or the 'event.origin' field of a postMessage() event. If
they parse the URL but don't check the scheme, their bug might become more
exploitable with a fragment-based scheme.

I think the three advantages you mention aren't actually differences
between a fragment-based and an encoding-based scheme. Specifically:

A. It's straightforward to define that `package:bundle$content` looks up
the `content` part in a way that depends on the MIME type of the `bundle`,
and I think we should do that no matter what URL format we pick. For PDF,
we could define that as a mapping to the existing fragment format.
B. Because the `bundle` and `content` parts of a `package:bundle$content`
URL are themselves encoded URLs, they can also be `package:` URLs, which
allows nested containers of any depth. The fragment format might win on
readability here, like it does for non-nested containers.
C. As
https://github.com/WICG/webpackage/blob/master/explainers/bundle-urls-and-o=
rigins.md#fragment-based-url-scheme
mentions, it looks difficult to say that `
https://distributor.example/package.wbn#url=3Dcontent` uses different stora=
ge
than `https://distributor.example/index.html`. I think we'd need to
define `pkg+https://distributor.example/package.wbn#url=3Dcontent` as you
suggested
<https://mailarchive.ietf.org/arch/msg/wpack/XkQ8OSlGn18xsVNtSXvoD7s0FIU/>
last month, at which point we need to deploy a new URL scheme anyway.

Jeffrey

On Wed, Jul 8, 2020 at 5:55 PM Larry Masinter <LMM@acm.org> wrote:

> I don=E2=80=99t understand the cases where this doesn=E2=80=99t meet the =
goal.
>
>
>
> =E2=80=9Cthe fragment might sometimes get dropped=E2=80=A6=E2=80=9D  when=
? By what process?
>
> =E2=80=9Csome code might only be looking at the host=E2=80=9D This is som=
e existing code?
> Something web packaging would work with?
>
>
>
> And the advantage seem to be enough to take a harder look
>
>
>
>    - being able to use other =E2=80=9Ckinds of things with subresources, =
like
>    zip, tar, 7z, etc. files=E2=80=9D
>    gives you instant deployment  =E2=80=9C (including PDF)
>    - works with nested containers to any depth
>    - You don=E2=80=99t need to deploy a new URL scheme seem compelling
>
> --
>
> https://LarryMasinter.net <https://larrymasinter.net/>
> https://going-remote.info
>
>
> _______________________________________________
> Wpack mailing list
> Wpack@ietf.org
> https://www.ietf.org/mailman/listinfo/wpack
>

--0000000000006e724f05aa08204d--


From nobody Fri Jul 10 15:30:25 2020
Return-Path: <ikreymer@gmail.com>
MIME-Version: 1.0
From: Ilya Kreymer <ikreymer@gmail.com>
Date: Fri, 10 Jul 2020 15:30:08 -0700
Message-ID: <CANAUx6juQjKmJZpj+_gzmz6i+SRK3wYDW0g0zmCr7DY2kXKqyA@mail.gmail.com>
To: wpack@ietf.org
Content-Type: multipart/alternative; boundary="000000000000c5866d05aa1de2cc"
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/0uRzQkJeOXDv8ieaLJOZAlg3dTI>
Subject: [Wpack] Web Archives, Replaying Web Pages and WPACK

--000000000000c5866d05aa1de2cc
Content-Type: text/plain; charset="UTF-8"

Hi,

I wanted to reach out again to the wpack group, as I believe I am working
on solving some of the same problems as wpack, but from the perspective of
web archiving. It would be great if there were some way to collaborate with
this group, though I am struggling to understand how that could be done.

I believe the overarching goal is the same: to replay HTTP network
traffic in a way that recreates an authentic representation of a website,
and to have a way to verify that the traffic was not forged. It seems a
'web archive' and a 'bundled HTTP exchange' are fundamentally describing
the same type of object, with perhaps different storage requirements and
use cases.

I wanted to share a system called https://replayweb.page/, which can replay
HTTP network traffic stored in a variety of formats directly in the
browser, using existing web standards, particularly Service Workers,
Fetch and IndexedDB (for caching).

Here are a few examples, which replay bundled HTTP traffic, and can even be
embedded in other pages as iframes:
https://webrecorder.net/embed-demo-1.html - replaying smaller
archives/bundles
https://webrecorder.net/embed-demo-2.html - replaying from a 17GB
archive/bundle
https://webrecorder.net/embed-demo-3.html - replaying more complex web
sites, including one with a 3D viewer

These examples are all isolated and rendered independently of each other:
through JavaScript rewriting and injection, the original Origin of the page
is emulated so that the site behaves as if it were running on its initial
origin. This allows for replaying of complex, interactive web pages, though
it is not perfect.

As I come from the web archiving community, I've focused mostly on the WARC
format, as that is an existing ISO standard and widely in use. The system
also supports replaying from HAR and from the web bundles created via the
WBN tool.

However, the WARC format alone is a bit limiting, and there seems to be a
misunderstanding about WARC
<https://github.com/WICG/webpackage/blob/fc9b3e75309546c805b5cdb1db74b2d58a8e0b28/explainers/navigation-to-unsigned-bundles.md#warc>:
WARC supports random access to HTTP traffic, but it does not contain the
built-in index that random access requires (it is assumed the index is
maintained separately). To work around this, I've created a new 'bundling
format', a 'bespoke zip format', which can contain WARCs (and other types
of data, even .wbn bundles), along with other metadata and a compressed
index.
This ZIP-based format is explained here:
https://github.com/webrecorder/web-archive-collection-format
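
To illustrate why a separately maintained index matters, here is a minimal
Python sketch that scans a WARC-like byte stream once and records where each
record starts. It is a simplification and not the index format used by the
collection format above: it assumes uncompressed records, and the helper
names (build_offset_index, make_record) are hypothetical.

```python
def build_offset_index(warc):
    """Map each WARC-Target-URI to the (offset, length) of its record."""
    index = {}
    pos = 0
    while pos < len(warc):
        # The header block ends at the first blank line after the record start.
        header_end = warc.index(b"\r\n\r\n", pos)
        headers = {}
        for line in warc[pos:header_end].split(b"\r\n")[1:]:  # skip "WARC/1.0"
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()
        body_len = int(headers[b"content-length"])
        # Header block, blank line, body, then the two CRLFs that end a record.
        record_end = header_end + 4 + body_len + 4
        uri = headers.get(b"warc-target-uri")
        if uri is not None:
            index[uri.decode()] = (pos, record_end - pos)
        pos = record_end
    return index


def make_record(uri, body):
    """Build a tiny uncompressed record for the demo (illustrative only)."""
    head = (
        "WARC/1.0\r\n"
        "WARC-Type: resource\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode()
    return head + body + b"\r\n\r\n"


first = make_record("https://a.example/", b"hello")
second = make_record("https://b.example/", b"world!!")
data = first + second
index = build_offset_index(data)
offset, length = index["https://b.example/"]
record = data[offset:offset + length]  # one slice, no rescan of the file
```

Without such an index, finding one URL means reading the file from the
start; with it, a reader can jump straight to the record's byte range.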

Since the ZIP format allows for random access (see: ZipInfo
<https://github.com/Rob--W/zipinfo.js/>), it is possible to load all
bundled data on-demand via range requests. This allows the format to scale
to tens and probably hundreds of GBs.
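
As a rough sketch of how that on-demand loading can start, the Python below
reads only the tail of a ZIP file to find its central directory, then
fetches the directory itself with a second range read. The fetch_range
callable is a hypothetical stand-in for an HTTP Range request, the entry
names are made up, and ZIP64 archives (needed for very large files) are not
handled here.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_MIN_SIZE = 22     # fixed part of the end-of-central-directory record
MAX_COMMENT = 65535    # the trailing ZIP comment can be at most this long


def locate_central_directory(fetch_range, total_size):
    """Return (entry_count, directory_size, directory_offset).

    fetch_range(start, end) is a hypothetical callable returning bytes
    [start, end) of the archive, e.g. by issuing an HTTP Range request.
    """
    start = max(0, total_size - (EOCD_MIN_SIZE + MAX_COMMENT))
    tail = fetch_range(start, total_size)
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("end-of-central-directory record not found")
    fields = struct.unpack("<4s4H2LH", tail[pos:pos + EOCD_MIN_SIZE])
    _sig, _disk, _cd_disk, _disk_entries, entries, cd_size, cd_offset, _clen = fields
    return entries, cd_size, cd_offset


# Demo: an in-memory archive stands in for a remote file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("archive/data.warc", b"example record bytes")
    archive.writestr("indexes/index.cdx", b"example index bytes")
blob = buffer.getvalue()

requests = []


def fetch_range(start, end):
    requests.append((start, end))  # record which byte ranges were requested
    return blob[start:end]


entries, cd_size, cd_offset = locate_central_directory(fetch_range, len(blob))
directory = fetch_range(cd_offset, cd_offset + cd_size)
```

Two range reads are enough to learn the name and offset of every entry,
regardless of how large the archive is.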

The system also supports referencing URLs via query params in the fragment.
For example,
https://replayweb.page/?source=/examples/netpreserve-twitter.warc#view=replay&url=https%3A%2F%2Ftwitter.com%2Fnetpreserve&ts=20190603053135
loads the WARC file, then loads the specified URL from the archive/bundle.
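
For illustration, the example URL above can be taken apart with Python's
standard urllib.parse module. This is only a sketch of the URL layout, not
the code replayweb.page actually runs, and parse_replay_url is a
hypothetical name.

```python
from urllib.parse import parse_qs, urlsplit


def parse_replay_url(url):
    """Split a replay URL into the archive source and the fragment params."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)        # ?source=... selects the archive
    fragment = parse_qs(parts.fragment)  # #view=...&url=...&ts=... selects the page
    return {
        "source": query["source"][0],
        "view": fragment["view"][0],
        "url": fragment["url"][0],
        "ts": fragment["ts"][0],
    }


example = (
    "https://replayweb.page/?source=/examples/netpreserve-twitter.warc"
    "#view=replay&url=https%3A%2F%2Ftwitter.com%2Fnetpreserve&ts=20190603053135"
)
parsed = parse_replay_url(example)
```

Because browsers do not send the fragment to the server, the archived URL
and timestamp are interpreted on the client side.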

I wanted to share all of this to see if there's perhaps some way to align
with the work you're doing here, though I must admit it is not easy to
understand if that is possible or of interest to this group. Again, from my
perspective, it seems like you're working on a very similar problem,
attempting to standardize this at the browser level, but perhaps for
different use cases.

One area I'm especially interested in is verification for Saving a Bundle
in the Browser
<https://github.com/WICG/webpackage/blob/2a78f2930a228ee6872630ecb023fa71151cc164/draft-yasskin-wpack-use-cases.md#save-and-share-a-web-page-snapshot>.
Unfortunately, it seems that this use case is currently out of scope. I am
especially interested in building tools to solve this problem, so that (to
use this example) Casey can save the page in their browser, share it with
Dakota, and *Dakota can verify that this is what Casey saw in their
browser*, and that it was not forged. I think being able to cite a web
bundle from a client's perspective would be extremely useful for archival,
fact-checking, sharing, and similar use cases, and could make the web more
trustworthy.

Please let me know if there is any interest in collaborating, or if these
existing tools could somehow help this spec move forward.

(If anyone is interested, the replayweb.page tool can be found on GitHub
at: https://github.com/webrecorder/replayweb.page (UI frontend) and
https://github.com/webrecorder/wabac.js (service worker backend).)

Thank you,
Ilya
webrecorder.net

--000000000000c5866d05aa1de2cc
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Hi,<div><br></div><div>I wanted to reach out again to the =
wpack group, as I believe I am working on solving some of the same proble=
ms as wpack, but from the perspective of web archiving. It would be great i=
f there was some way to collaborate with this group, though I am struggling=
 to understand how that could be done.</div><div><br></div><div>The overarc=
hing goal I believe seems to be the same: to replay HTTP network traffic in=
 a way that recreates an authentic representation of a website, and to have=
 a way to verify that the traffic was not forged. It seems a &#39;web archi=
ve&#39; and a &#39;bundled http exchange&#39; are fundamentally describing =
the same type of object, with perhaps different storage requirements and us=
e cases.</div><div><br></div><div>I wanted to share a system, called <a hre=
f=3D"https://replayweb.page/" target=3D"_blank">https://replayweb.page/</a>=
=C2=A0which can replay HTTP network traffic stored in a variety of formats =
directly in the browser, using existing web standards, particularly=C2=A0Se=
rvice Workers, Fetch=C2=A0and IndexedDB (for caching).</div><div><br></div>=
<div>Here are a few examples, which replay bundled HTTP traffic, and can ev=
en be embedded in other pages as iframes:</div><div><a href=3D"https://webr=
ecorder.net/embed-demo-1.html" target=3D"_blank">https://webrecorder.net/em=
bed-demo-1.html</a>=C2=A0- replaying smaller archives/bundles<br></div><div=
><a href=3D"https://webrecorder.net/embed-demo-2.html" target=3D"_blank">ht=
tps://webrecorder.net/embed-demo-2.html</a>=C2=A0- replaying from a 17GB ar=
chive/bundle<br></div><div><a href=3D"https://webrecorder.net/embed-demo-3.=
html" target=3D"_blank">https://webrecorder.net/embed-demo-3.html</a>=C2=A0=
- replaying more complex web sites, including one with 3D viewer<br></div><=
div><br></div><div>These examples are all isolated and rendered independent=
 of each other: through JavaScript rewriting and injection, the original Or=
igin of the page is emulated so that the site behaves as it is running on i=
ts initial origin. This allows for replaying of complex, interactive web pa=
ges, though it is not perfect.</div><div><br></div><div>As I come from the=
 web=
 archiving community, I&#39;ve focused mostly on WARC format, as that is an=
 existing ISO standard and widely in use, and the system also supports repl=
aying from HAR and the web bundles created via the WBN tool.</div><div><br>=
</div><div>However, the WARC format alone is a bit limiting, and there seem=
s to be a <a href=3D"https://github.com/WICG/webpackage/blob/fc9b3e75309546=
c805b5cdb1db74b2d58a8e0b28/explainers/navigation-to-unsigned-bundles.md#war=
c" target=3D"_blank">misunderstanding about WARC</a>: It provides random ac=
cess to HTTP traffic, but does not contain a built-in index necessary for r=
andom access (it is assumed the index is maintained separately). To work ar=
ound this, I&#39;ve created a new &#39;bundling format&#39;, a &#39;bespoke=
 zip format&#39;, which can contain WARCs (and other types of data, even .w=
bn bundles), along with other metadata, and a compressed index.</div><div>T=
his ZIP-based format is explained here:=C2=A0<a href=3D"https://github.com/=
webrecorder/web-archive-collection-format" target=3D"_blank">https://github=
.com/webrecorder/web-archive-collection-format</a></div><div><br></div><div=
>Since the ZIP format allows for random access (see: <a href=3D"https://git=
hub.com/Rob--W/zipinfo.js/" target=3D"_blank">ZipInfo</a>), it is possible =
to load all bundled data on-demand via range requests. This allows the form=
at to scale to tens and probably hundreds of GBs.</div><div><br></div><div>=
The system also supports referencing URLs via query params in the fragment,=
 for example:=C2=A0<a href=3D"https://replayweb.page/?source=3D/examples/ne=
tpreserve-twitter.warc#view=3Dreplay&amp;url=3Dhttps%3A%2F%2Ftwitter.com%2F=
netpreserve&amp;ts=3D20190603053135" target=3D"_blank">https://replayweb.pa=
ge/?source=3D/examples/netpreserve-twitter.warc#view=3Dreplay&amp;url=3Dhtt=
ps%3A%2F%2Ftwitter.com%2Fnetpreserve&amp;ts=3D20190603053135</a>=C2=A0loads=
 the WARC file, then loads the specified URL from the archive/bundle.</div>=
<div><br></div><div>I wanted to share all of this to see if there&#39;s per=
haps some way to align with the work you&#39;re doing here, though I must a=
dmit it is not easy to understand if that is possible or of interest to thi=
s group. Again, from my perspective, it seems like you&#39;re working on a =
very similar problem, attempting to standardize this at the browser level, =
but perhaps for different use cases.</div><div><br></div><div>One area I&#3=
9;m especially interested in is verification for=C2=A0<a href=3D"https://gi=
thub.com/WICG/webpackage/blob/2a78f2930a228ee6872630ecb023fa71151cc164/draf=
t-yasskin-wpack-use-cases.md#save-and-share-a-web-page-snapshot" target=3D"=
_blank">Saving a Bundle in the Browser</a>.</div><div>Unfortunately, it see=
ms that this use case is currently out of scope. I am especially interested=
 in building tools to solve this problem, so that (to use this example) Cas=
ey can save the page in their browser, share it with Dakota, and that *Dako=
ta can verify that this is what Casey saw in their browser*, and it was not=
 forged. I think being able to cite a web bundle from a client&#39;s perspe=
ctive would be extremely useful for archival, fact-checking, sharing, etc..=
 use cases and could make the web more trustworthy.</div><div><br></div><di=
v>Please let me know if there is any interest in collaborating, or if these=
 existing tools could somehow help this spec move forward.</div><div><br></=
div><div>(If anyone is interested, the <a href=3D"http://replayweb.page" ta=
rget=3D"_blank">replayweb.page</a> tool can be found on github at:=C2=A0<a =
href=3D"https://github.com/webrecorder/replayweb.page" target=3D"_blank">ht=
tps://github.com/webrecorder/replayweb.page</a>=C2=A0(UI frontend) and=C2=
=A0<a href=3D"https://github.com/webrecorder/wabac.js" target=3D"_blank">ht=
tps://github.com/webrecorder/wabac.js</a>=C2=A0(service worker backend)</di=
v><div><br></div><div>Thank you,</div><div>Ilya</div><div><a href=3D"http:/=
/webrecorder.net" target=3D"_blank">webrecorder.net</a></div><div><br></div=
><div><br></div><div><br></div><div><br></div><div><br></div></div>

--000000000000c5866d05aa1de2cc--


From nobody Sun Jul 12 15:48:53 2020
Return-Path: <masinter@gmail.com>
X-Original-To: wpack@ietfa.amsl.com
Delivered-To: wpack@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id A24D03A0955 for <wpack@ietfa.amsl.com>; Sun, 12 Jul 2020 15:48:52 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -0.096
X-Spam-Level: 
X-Spam-Status: No, score=-0.096 tagged_above=-999 required=5 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FORGED_FROMDOMAIN=0.001, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.001, HTML_MESSAGE=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id uPDIyHgn8sAC for <wpack@ietfa.amsl.com>; Sun, 12 Jul 2020 15:48:50 -0700 (PDT)
Received: from mail-pf1-x430.google.com (mail-pf1-x430.google.com [IPv6:2607:f8b0:4864:20::430]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 8FFD43A0954 for <wpack@ietf.org>; Sun, 12 Jul 2020 15:48:50 -0700 (PDT)
Received: by mail-pf1-x430.google.com with SMTP id t11so5112935pfq.11 for <wpack@ietf.org>; Sun, 12 Jul 2020 15:48:50 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;  h=sender:from:to:references:in-reply-to:subject:date:message-id :mime-version:thread-index:content-language; bh=axyZNXkqEjKRR5TkZrZ4EeM2RiZdZN51OVLEsrXlpSA=; b=XiwQzx1df/usKw77qozsbJvXBWb2sKzJcb9xfJ888/9R95ceIlaWgospKT2r/sZ9Pn vtqx2THrb6OmzpE1Qte2hq1xwx240BNVW1t/l0psJ9JjUQRbu8DTTt0O+RprRm4IFl+S rk5WVsDI69nGhigv3BFSnKoR/OTLZ5cQigGU17j4IKU6GqAC3PwXAThrEFdiXY+PR7BR kz1yEcKdI+FbfBIrA39kyOCmsSLaVLreS7vtUrvCjQsij1QA1dbu74Z7Ib2EKyRXRxxl AldU0Wqn86w/YqmmN7fK7s7hRlV4lN2XDX0ZGgb/Jj3MGcE9pYfsTmELyXFlNrQNgfU/ WmIA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:references:in-reply-to:subject :date:message-id:mime-version:thread-index:content-language; bh=axyZNXkqEjKRR5TkZrZ4EeM2RiZdZN51OVLEsrXlpSA=; b=MoEuGFUzlhjQZI2LbO3uPZbp+kzeKfDPtYLvIo+PzZ1u3LLGICSmesZqKSRtUeN3Fu 3eKQhXlQ3aaVwhDTBi3K3LGwFAD6MCRCx+ZuWtP9mWHHo763or4Qi93Kp+fs+Rd2Bx1+ ZCepqElUH0tICDor+6D/CT0oB1p2Hv1eZaLHnLg4xwO5EMwUyK/Ul4cR5T9eRmAGFZ/O GRlwlRSmwMhBfGCMB5UJboOHPthb0LwOy5egwEcMJy2NEqqweA4yE3VpfgbbXsgrFNAc nGQafp2bxJD5vjESrMBhDKwxcjrRSxtUVO12tzlKoTNtvPX91iFGYgAYVo6L30JzcMxB pGkg==
X-Gm-Message-State: AOAM533YJD5OjKcfmfO+k4BOl7C3qFE6+SU7ppNTTNDh0jPNa2cgQMxv KsC2D1Y71LHZMGTivEjIbBQ=
X-Google-Smtp-Source: ABdhPJxNMKKyHngU9F9BhD4ioYsje3SLZSWLAf7PD4GNHTQO9VqhTgYyiFFn1THEhjsiWJAeDaHCWA==
X-Received: by 2002:a63:a361:: with SMTP id v33mr63519902pgn.101.1594594129846;  Sun, 12 Jul 2020 15:48:49 -0700 (PDT)
Received: from TVPC (c-67-169-101-78.hsd1.ca.comcast.net. [67.169.101.78]) by smtp.gmail.com with ESMTPSA id 199sm11850147pgc.79.2020.07.12.15.48.48 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sun, 12 Jul 2020 15:48:48 -0700 (PDT)
Sender: Larry Masinter <masinter@gmail.com>
From: Larry Masinter <LMM@acm.org>
X-Google-Original-From: "Larry Masinter" <lmm@acm.org>
To: "'Ilya Kreymer'" <ikreymer@gmail.com>, <wpack@ietf.org>
References: <CANAUx6juQjKmJZpj+_gzmz6i+SRK3wYDW0g0zmCr7DY2kXKqyA@mail.gmail.com>
In-Reply-To: <CANAUx6juQjKmJZpj+_gzmz6i+SRK3wYDW0g0zmCr7DY2kXKqyA@mail.gmail.com>
Date: Sun, 12 Jul 2020 15:48:47 -0700
Message-ID: <023301d6589e$9b4ead30$d1ec0790$@acm.org>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_NextPart_000_0234_01D65863.EEF02350"
X-Mailer: Microsoft Outlook 16.0
Thread-Index: AQH5nmeyykLDa275OehSS5Sk/OneUKi91zIQ
Content-Language: en-us
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/jbTrReTQnC1hQJJNMmegYGvsXvM>
Subject: Re: [Wpack] Web Archives, Replaying Web Pages and WPACK
X-BeenThere: wpack@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Web Packaging <wpack.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/wpack>, <mailto:wpack-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/wpack/>
List-Post: <mailto:wpack@ietf.org>
List-Help: <mailto:wpack-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/wpack>, <mailto:wpack-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sun, 12 Jul 2020 22:48:53 -0000

This is a multipart message in MIME format.

------=_NextPart_000_0234_01D65863.EEF02350
Content-Type: text/plain;
	charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Most of the replay of HTTP is irrelevant to the archive use case because =
there is no point in trying to reach out to original servers long after =
HTTP/n is obsolete. Mostly it=E2=80=99s a privacy threat to record =
irrelevant transaction metadata.=20

=20

Instead you need to define a layer (like PDF/A did for paged documents), =
which preserves the meaning of the original experience without =
necessarily being able to enter in new data and have it recompute. For =
example, there is no good way to archive an empty chat room and preserve =
the experience of saying something new.

=20

=20

The archive use case needs a different security model from the online =
same origin policy.

The model used in PDF is pretty simple:

=20

Intra-package links are trusted. Links from inside the package to out =
require user verification (once for that package).


------=_NextPart_000_0234_01D65863.EEF02350
Content-Type: text/html;
	charset="utf-8"
Content-Transfer-Encoding: quoted-printable

<html xmlns:v=3D"urn:schemas-microsoft-com:vml" =
xmlns:o=3D"urn:schemas-microsoft-com:office:office" =
xmlns:w=3D"urn:schemas-microsoft-com:office:word" =
xmlns:m=3D"http://schemas.microsoft.com/office/2004/12/omml" =
xmlns=3D"http://www.w3.org/TR/REC-html40"><head><meta =
http-equiv=3DContent-Type content=3D"text/html; charset=3Dutf-8"><meta =
name=3DGenerator content=3D"Microsoft Word 15 (filtered =
medium)"><style><!--
/* Font Definitions */
@font-face
	{font-family:"Cambria Math";
	panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
	{font-family:Calibri;
	panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
	{margin:0in;
	margin-bottom:.0001pt;
	font-size:11.0pt;
	font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
	{mso-style-priority:99;
	color:blue;
	text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
	{mso-style-priority:99;
	color:purple;
	text-decoration:underline;}
p.msonormal0, li.msonormal0, div.msonormal0
	{mso-style-name:msonormal;
	mso-margin-top-alt:auto;
	margin-right:0in;
	mso-margin-bottom-alt:auto;
	margin-left:0in;
	font-size:11.0pt;
	font-family:"Calibri",sans-serif;}
span.EmailStyle18
	{mso-style-type:personal-reply;
	font-family:"Calibri",sans-serif;
	color:windowtext;}
.MsoChpDefault
	{mso-style-type:export-only;
	font-size:10.0pt;
	font-family:"Calibri",sans-serif;}
@page WordSection1
	{size:8.5in 11.0in;
	margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
	{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext=3D"edit" spidmax=3D"1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext=3D"edit">
<o:idmap v:ext=3D"edit" data=3D"1" />
</o:shapelayout></xml><![endif]--></head><body lang=3DEN-US link=3Dblue =
vlink=3Dpurple><div class=3DWordSection1><p class=3DMsoNormal =
style=3D'margin-left:5.25pt'><span style=3D'font-size:10.0pt'>Most of =
the replay of HTTP is irrelevant to the archive use case because there =
is no point in trying to reach out to original servers long after HTTP/n =
is obsolete. Mostly it=E2=80=99s a privacy threat to record irrelevant =
transaction metadata. <o:p></o:p></span></p><p class=3DMsoNormal =
style=3D'margin-left:5.25pt'><span =
style=3D'font-size:10.0pt'><o:p>&nbsp;</o:p></span></p><p =
class=3DMsoNormal style=3D'margin-left:5.25pt'><span =
style=3D'font-size:10.0pt'>Instead you need to define a layer (like =
PDF/A did for paged documents), which preserves the meaning of the =
original experience without necessarily being able to enter in new data =
and have it recompute. For example, there is no good way to archive an =
empty chat room and preserve the experience of saying something =
new.<o:p></o:p></span></p><p class=3DMsoNormal =
style=3D'margin-left:5.25pt'><span =
style=3D'font-size:10.0pt'><o:p>&nbsp;</o:p></span></p><p =
class=3DMsoNormal><span =
style=3D'font-size:10.0pt'><o:p>&nbsp;</o:p></span></p><div><p =
class=3DMsoNormal>The archive use case needs a different security model =
from the online same origin policy.<o:p></o:p></p><p =
class=3DMsoNormal>The model used in PDF is pretty =
simple:<o:p></o:p></p><p class=3DMsoNormal><o:p>&nbsp;</o:p></p><p =
class=3DMsoNormal>Intra-package links are trusted. Links from inside the =
package to out require user verification (once for that =
package).<o:p></o:p></p><p class=3DMsoNormal> =
<o:p></o:p></p></div></div></body></html>
------=_NextPart_000_0234_01D65863.EEF02350--


From nobody Sun Jul 12 20:06:27 2020
Return-Path: <ehs@pobox.com>
X-Original-To: wpack@ietfa.amsl.com
Delivered-To: wpack@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 192B23A0880 for <wpack@ietfa.amsl.com>; Sun, 12 Jul 2020 20:06:26 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 2.38
X-Spam-Level: **
X-Spam-Status: No, score=2.38 tagged_above=-999 required=5 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HTML_MESSAGE=0.001, HTML_MIME_NO_HTML_TAG=0.635, MIME_HTML_ONLY=0.1, MISSING_MIMEOLE=1.843, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=no autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=pobox.com; domainkeys=pass (1024-bit key) header.from=ehs@pobox.com header.d=pobox.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id aFmt7UQiO1-Y for <wpack@ietfa.amsl.com>; Sun, 12 Jul 2020 20:06:25 -0700 (PDT)
Received: from pb-smtp21.pobox.com (pb-smtp21.pobox.com [173.228.157.53]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 3F0203A087D for <wpack@ietf.org>; Sun, 12 Jul 2020 20:06:25 -0700 (PDT)
Received: from pb-smtp21.pobox.com (unknown [127.0.0.1]) by pb-smtp21.pobox.com (Postfix) with ESMTP id A45B3E6BAD; Sun, 12 Jul 2020 23:06:24 -0400 (EDT) (envelope-from ehs@pobox.com)
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=date:subject :message-id:in-reply-to:from:to:cc:mime-version:content-type :content-transfer-encoding; s=sasl; bh=Ko3umyGs0DbcqDrnJ9HZWQ0jf wc=; b=IDQwI+R1nHr0g786EykDWwEwKIt++PDyLnrNYqHDz2Lp17tIxPWNz2nmC tfp5qjQZYabAWsTx7no0SQF2m4LT5knxd+5aPPuv//KsDcl5j+pbwIEl+g8Db6f1 vsal3KXCw/eadZA6Aqpm/s2Pz2bzre2bT8wK7J8lkg+gjDy4Ss=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=date:subject :message-id:in-reply-to:from:to:cc:mime-version:content-type :content-transfer-encoding; q=dns; s=sasl; b=jfHvyujAMMer3MLFIPW 3BWYQWxqVilWEfppbLm0xgnUEA+xbzcNZwOL6d4wLk+kqsazQh0mSvk0kfjiLI9P gVTsZAHQqUzKHjwHaf3zuyNyeJVbUBnlW8TE1XXbdFZ0b9lKUCzz9jEoPgYHqxq+ J5aIspKOz3W8ltJqjDloe6Gk=
Received: from pb-smtp21.sea.icgroup.com (unknown [127.0.0.1]) by pb-smtp21.pobox.com (Postfix) with ESMTP id 8FCB9E6BAC; Sun, 12 Jul 2020 23:06:24 -0400 (EDT) (envelope-from ehs@pobox.com)
Received: from [192.168.1.228] (unknown [173.79.71.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by pb-smtp21.pobox.com (Postfix) with ESMTPSA id 8F5FCE6BAB; Sun, 12 Jul 2020 23:06:21 -0400 (EDT) (envelope-from ehs@pobox.com)
Date: Sun, 12 Jul 2020 23:06:16 -0400
Message-ID: <a67020c2-6b6e-4d00-9a4f-dc4f83cf3f89@email.android.com>
X-Android-Message-ID: <a67020c2-6b6e-4d00-9a4f-dc4f83cf3f89@email.android.com>
In-Reply-To: <023301d6589e$9b4ead30$d1ec0790$@acm.org>
From: ehs@pobox.com
To: Larry Masinter <LMM@acm.org>
Cc: 'Ilya Kreymer' <ikreymer@gmail.com>, wpack@ietf.org
Importance: Normal
X-Priority: 3
X-MSMail-Priority: Normal
MIME-Version: 1.0
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: base64
X-Pobox-Relay-ID: D464EDCC-C4B5-11EA-A60B-843F439F7C89-07615111!pb-smtp21.pobox.com
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/JIHH-AoUvtmYlwYtLr4ZwLH-Fd4>
Subject: Re: [Wpack] Web Archives, Replaying Web Pages and WPACK
X-BeenThere: wpack@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Web Packaging <wpack.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/wpack>, <mailto:wpack-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/wpack/>
List-Post: <mailto:wpack@ietf.org>
List-Help: <mailto:wpack-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/wpack>, <mailto:wpack-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Jul 2020 03:06:26 -0000

PGRpdiBkaXI9J2F1dG8nPjxkaXY+PGRpdiBjbGFzcz0iZ21haWxfZXh0cmEiPjxkaXYgY2xhc3M9
ImdtYWlsX3F1b3RlIj5PbiBKdWwgMTIsIDIwMjAgNjo0OCBQTSwgTGFycnkgTWFzaW50ZXIgJmx0
O0xNTUBhY20ub3JnJmd0OyB3cm90ZTo8YnIgdHlwZT0iYXR0cmlidXRpb24iPjxibG9ja3F1b3Rl
IGNsYXNzPSJxdW90ZSIgc3R5bGU9Im1hcmdpbjowIDAgMCAuOGV4O2JvcmRlci1sZWZ0OjFweCAj
Y2NjIHNvbGlkO3BhZGRpbmctbGVmdDoxZXgiPjxkaXY+PGRpdj48cCBzdHlsZT0ibWFyZ2luLWxl
ZnQ6NS4yNXB0Ij48c3BhbiBzdHlsZT0iZm9udC1zaXplOjEwcHQiPk1vc3Qgb2YgdGhlIHJlcGxh
eSBvZiBIVFRQIGlzIGlycmVsZXZhbnQgdG8gdGhlIGFyY2hpdmUgdXNlIGNhc2UgYmVjYXVzZSB0
aGVyZSBpcyBubyBwb2ludCBpbiB0cnlpbmcgdG8gcmVhY2ggb3V0IHRvIG9yaWdpbmFsIHNlcnZl
cnMgbG9uZyBhZnRlciBIVFRQL24gaXMgb2Jzb2xldGUuIE1vc3RseSBpdOKAmXMgYSBwcml2YWN5
IHRocmVhdCB0byByZWNvcmQgaXJyZWxldmFudCB0cmFuc2FjdGlvbiBtZXRhZHRhLjwvc3Bhbj48
L3A+PC9kaXY+PC9kaXY+PC9ibG9ja3F1b3RlPjwvZGl2PjwvZGl2PjwvZGl2PjxkaXYgZGlyPSJh
dXRvIj5JcyB0aGVyZSBhIHN1Y2NpbmN0IGRlc2NyaXB0aW9uIG9mIHRoZSAiYXJjaGl2ZSB1c2Ug
Y2FzZSI/IFByb3ZlbmFuY2UgaXMgdXN1YWxseSBwcmV0dHkgaW1wb3J0YW50IGZvciBhcmNoaXZl
cywgYW5kIHRoZSBIVFRQIHRyYW5zYWN0aW9uIGlzIHF1aXRlIGltcG9ydGFudCBpbmZvcm1hdGlv
biBpbiB0aGF0IHJlZ2FyZC48YnI+PC9kaXY+PGRpdiBkaXI9ImF1dG8iPjxkaXYgZGlyPSJhdXRv
Ij48YnI+PC9kaXY+PGRpdiBkaXI9ImF1dG8iPi8vRWQ8L2Rpdj48L2Rpdj48ZGl2IGRpcj0iYXV0
byI+PGRpdiBjbGFzcz0iZ21haWxfZXh0cmEiPjxkaXYgY2xhc3M9ImdtYWlsX3F1b3RlIj48Ymxv
Y2txdW90ZSBjbGFzcz0icXVvdGUiIHN0eWxlPSJtYXJnaW46MCAwIDAgLjhleDtib3JkZXItbGVm
dDoxcHggI2NjYyBzb2xpZDtwYWRkaW5nLWxlZnQ6MWV4Ij48ZGl2PjxkaXY+PGRpdj48cD4gPC9w
PjwvZGl2PjwvZGl2PjwvZGl2PjwvYmxvY2txdW90ZT48L2Rpdj48YnI+PC9kaXY+PC9kaXY+PC9k
aXY+


From nobody Sun Jul 12 21:24:56 2020
Return-Path: <ikreymer@gmail.com>
X-Original-To: wpack@ietfa.amsl.com
Delivered-To: wpack@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id D1D3A3A0CC9 for <wpack@ietfa.amsl.com>; Sun, 12 Jul 2020 21:24:54 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -0.197
X-Spam-Level: 
X-Spam-Status: No, score=-0.197 tagged_above=-999 required=5 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ROFJ849pOlIT for <wpack@ietfa.amsl.com>; Sun, 12 Jul 2020 21:24:53 -0700 (PDT)
Received: from mail-ed1-x533.google.com (mail-ed1-x533.google.com [IPv6:2a00:1450:4864:20::533]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id F0BDB3A0CC8 for <wpack@ietf.org>; Sun, 12 Jul 2020 21:24:52 -0700 (PDT)
Received: by mail-ed1-x533.google.com with SMTP id a1so5866901edt.10 for <wpack@ietf.org>; Sun, 12 Jul 2020 21:24:52 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;  h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=O2gRGeJLae2jMLREqrEFt3WckFCD5oNumxE+2K+ie6Q=; b=ZBR9JhynMCnr3gbywWf9WUr14Bt9X1nygGiJfQoAxqJk5sJd7hGI1VHbjZW9Ju4wyJ LDkphJgW2AB0L1GsEeHr0eGgzQ3aQwMAnb0oD8xD9sIqptSb0xshOTALyo7j9CqgrTKu CRKSz92zRYdus1CVtMmOejLmYbifos1C6X1r+ro3Sz68kFNDNqPhVfZhZIsVOWvBBah8 zuoXe1GnTO2WSjqHNVdgNCHYcz/rQZ+pZOAzyLCB7x8nH0JS1+CKKwSNGH4cP/OOujzL 2M2gZt95970MFCfgfdOULOfNhRLzf9JBfNDALLIEs6ND1hZmonwtU8n9qQ92Lo8bmZfc L+Xw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=O2gRGeJLae2jMLREqrEFt3WckFCD5oNumxE+2K+ie6Q=; b=q98sGnfGUTsNd9X9RWtpyq9f2a48rUlUwir5XUCJJS04lsqzdjM+zraj3meb/LynHP obn/lXyXAn6TZU5lktzLEjr+YbifFd/oT1QPDe97R3HLNMq3bylOPGfW+HcvKFHXfFnH rcCA39ohLZrMNqhPYcismkD9r/aD8Afm0CxpYqtQRIcocl1tNz/FWm0knuD+QxfxAFSq jxBd3D/BGJTnwoqRH8DTek3n3FevH/OJlKR/Zq2QxZPHjYQhA/fhqSZlWqtnMhN8pAH6 PB85pE+ah8wpk061Jkf/L57kausUWmWJmruDLuDqDjS5BrDm4yUWnOUqbvzWD5equLCV adoA==
X-Gm-Message-State: AOAM531o2g1OlIXCNka1iTL2X0BEpE4EMIpCg5i+ShRj8t2yqMmaRQSn 6wJaN5sxOCokwwHKhpEr6YAk7Ic8bvsGe5M/78Y=
X-Google-Smtp-Source: ABdhPJwpbSzcrEeStgyE+XnRuc2FCndhGQRKJUfVuOdPjB7VIGvd7lTiP5svggacee+w+tS9UMJxJgG6B+7LLC+rq4A=
X-Received: by 2002:a50:a881:: with SMTP id k1mr87075646edc.12.1594614291350;  Sun, 12 Jul 2020 21:24:51 -0700 (PDT)
MIME-Version: 1.0
References: <CANAUx6juQjKmJZpj+_gzmz6i+SRK3wYDW0g0zmCr7DY2kXKqyA@mail.gmail.com> <023301d6589e$9b4ead30$d1ec0790$@acm.org>
In-Reply-To: <023301d6589e$9b4ead30$d1ec0790$@acm.org>
From: Ilya Kreymer <ikreymer@gmail.com>
Date: Sun, 12 Jul 2020 21:24:40 -0700
Message-ID: <CANAUx6hfm61DBHRu4takYRNXd6iM=_uJzv+Dbgmgb9PUGW1yLA@mail.gmail.com>
To: Larry Masinter <LMM@acm.org>
Cc: wpack@ietf.org
Content-Type: multipart/alternative; boundary="00000000000063c67b05aa4b126b"
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/rce8VYfk7ZEO25fReenk66WX5j4>
Subject: Re: [Wpack] Web Archives, Replaying Web Pages and WPACK
X-BeenThere: wpack@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Web Packaging <wpack.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/wpack>, <mailto:wpack-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/wpack/>
List-Post: <mailto:wpack@ietf.org>
List-Help: <mailto:wpack-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/wpack>, <mailto:wpack-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Jul 2020 04:24:55 -0000

--00000000000063c67b05aa4b126b
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sun, Jul 12, 2020 at 3:48 PM Larry Masinter <LMM@acm.org> wrote:

> Most of the replay of HTTP is irrelevant to the archive use case because
> there is no point in trying to reach out to original servers long after
> HTTP/n is obsolete. Mostly it=E2=80=99s a privacy threat to record irrele=
vant
> transaction metadata.
>

Web archive replay is the replay of HTTP exchanges to recreate the original
page as accurately as possible, but within an isolated context. When
visiting, for example, Internet Archive's Wayback Machine, what it does is
replay (unsigned) HTTP exchanges, with necessary modifications to make the
pages load from a different origin.
I have implemented a similar system entirely in the browser, using service
workers to match a request to an archived response. The exact HTTP protocol
itself is generally not relevant, and the abstraction of matching an HTTP
request to a stored HTTP response is sufficient.
Making the response seem like it was loaded from the original origin, not
the origin of the archive, is the harder part, however.

The original servers are never being contacted, as the goal is to load the
archive in an isolated context, much like what is proposed with web bundles
for offline use.
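
As a rough illustration of that request-to-response matching (a sketch
only, not the wabac.js implementation; the ArchivedResponse type, the
in-memory index and the replay function are all hypothetical names), the
lookup could be modeled in Python like this:

```python
from typing import Dict, NamedTuple, Optional, Tuple


class ArchivedResponse(NamedTuple):
    status: int
    headers: Dict[str, str]
    body: bytes


# Hypothetical in-memory index keyed by (method, URL); a real archive
# would be backed by WARC records or a bundle index instead.
Archive = Dict[Tuple[str, str], ArchivedResponse]


def replay(archive: Archive, method: str, url: str) -> ArchivedResponse:
    """Answer a request from the archive only, never from the network."""
    key = (method.upper(), url)
    hit: Optional[ArchivedResponse] = archive.get(key)
    if hit is None:
        # Outside the archive: synthesize a 404 rather than contact
        # the original server.
        return ArchivedResponse(404, {}, b"")
    return hit
```

A request for a URL that was never captured simply yields the synthetic
404, which keeps the replay isolated from the live web.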



>
>
> Instead you need to define a layer (like PDF/A did for paged documents),
> which preserves the meaning of the original experience without necessaril=
y
> being able to enter in new data and have it recompute. For example, there
> is no good way to archive an empty chat room and preserve the experience =
of
> saying something new.
>
>
Yes, a way to define boundaries would be nice to have for archives;
currently there is no such spec -- if you hit an archive 'boundary', you'd
end up with a 404 when trying to replay. This is not a requirement for
reasonable replay, though.


>
>
>
>
> The archive use case needs a different security model from the online sam=
e
> origin policy.
>
> The model used in PDF is pretty simple:
>
>
>
> Intra-package links are trusted. Links from inside the package to out
> require user verification (once for that package).
>

Yes, I agree, the current security model is unfortunately insufficient;
that's why I was hoping that this spec could help the web archiving use
case, which today remains an actively used example of replaying HTTP
exchanges.

Archives are generally isolated bundles, and should not link outside the
archive. (This is currently easier to enforce with a CSP.)

In my systems, I have implemented sandboxing and isolation via JavaScript.
It has taken a lot of effort to do so, and it is probably not entirely
foolproof.

Generally, this is done via URL prefixes.

For example, an archive of 'https://example.com/' at:

https://my-archive.example.com/bundle-A/20200701/https://example.com/

should not have access to:

https://my-archive.example.com/bundle-B/20200701/https://example.com/

But both pages should believe they are loaded from the origin of
'https://example.com/' in order to operate properly. This is currently done
by 'emulating' the origin via JavaScript injection, since there is no other
way, but it would be great if instead the browser could support this for a
trusted archive directly.

The intent of the above URLs is to say:
'load URL https://example.com/ archived on 2020-07-01 from
https://my-archive.example.com/bundle-A.bundle' or
'load URL https://example.com/ archived on 2020-07-01 from
https://my-archive.example.com/bundle-B.bundle'

and the web archiving community settled on this de-facto URL scheme for
expressing such requests.
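
The prefix scheme above can be sketched as a small parser. This is an
illustrative assumption, not code from any existing replay tool: the
names ReplayRequest, REPLAY_PATH, parse_replay_path and same_bundle are
hypothetical, and the 4-to-14-digit timestamp is borrowed from
Wayback-style YYYYMMDDhhmmss stamps.

```python
import re
from typing import NamedTuple, Optional


class ReplayRequest(NamedTuple):
    bundle: str      # e.g. "bundle-A"
    timestamp: str   # capture time as digits, e.g. "20200701"
    url: str         # original URL being replayed


# /<bundle>/<timestamp>/<original URL>
REPLAY_PATH = re.compile(r"^/([^/]+)/(\d{4,14})/(.+)$")


def parse_replay_path(path: str) -> Optional[ReplayRequest]:
    """Split a replay path into bundle, timestamp and original URL."""
    match = REPLAY_PATH.match(path)
    if match is None:
        return None
    return ReplayRequest(match.group(1), match.group(2), match.group(3))


def same_bundle(a: ReplayRequest, b: ReplayRequest) -> bool:
    """Isolation check: a page may only see its own bundle's resources."""
    return a.bundle == b.bundle
```

Both example paths parse to the same original URL but to different
bundles, which is the distinction the isolation check relies on.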

I think the main difference between the archival use case and some of the
other use cases being considered is the signing/verification and
duration of the signatures. Since the HTTP exchange is a two-way exchange,
what if the client browser had equal ability to create a signed exchange?

If the client, rather than only the server, could sign an HTTP exchange and
the signature could be verifiable for longer than 7 days, it could be a
significant help to the archival use case, which includes users making
their own snapshots. Archives like Internet Archive, smaller, lesser-known
archives, and even individual users could produce verifiable HTTP exchanges
that could be loaded offline, could be more trustworthy than screenshots,
etc. The other mechanics involved in loading and replaying an HTTP
exchange bundle, such as URL fragments and request-->response matching, are
also no different for the archival use case than for any of the other ones.

Ilya

--00000000000063c67b05aa4b126b
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div dir=3D"ltr"><br></div><br><div class=3D"gmail_quote">=
<div dir=3D"ltr" class=3D"gmail_attr">On Sun, Jul 12, 2020 at 3:48 PM Larry=
 Masinter &lt;<a href=3D"mailto:LMM@acm.org" target=3D"_blank">LMM@acm.org<=
/a>&gt; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0=
px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><=
div lang=3D"EN-US"><div><p class=3D"MsoNormal" style=3D"margin-left:5.25pt"=
><span style=3D"font-size:10pt">Most of the replay of HTTP is irrelevant to=
 the archive use case because there is no point in trying to reach out to o=
riginal servers long after HTTP/n is obsolete. Mostly it=E2=80=99s a privac=
y threat to record irrelevant transaction metadata.</span></p></div></div>=
</=
blockquote><div><br></div><div>Web archive replay is the replay of HTTP exc=
hanges to recreate the original page as accurately as possible, but within =
an isolated=C2=A0context. When visiting, for example, Internet Archive&#39;=
s Wayback Machine, what it does is replay (unsigned) HTTP exchanges, with n=
ecessary modifications to make the pages load from a different origin.</div=
><div>I have implemented a similar system entirely in the browser, using se=
rvice workers to match a request to an archived response. The exact HTTP pr=
otocol itself is generally not relevant, and the abstraction of matching an=
 HTTP request to a stored HTTP response is sufficient,=C2=A0</div><div>Maki=
ng the response seem like it was loaded from the original origin, not the o=
rigin of the archive is the harder part, however.</div><div><br></div><div>=
The original servers are never being contacted, as the goal is to load the =
archive in an isolated context, much like what is proposed with web=C2=A0bu=
ndles</div><div>for offline use.</div><div><br></div><div>=C2=A0</div><bloc=
kquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:=
1px solid rgb(204,204,204);padding-left:1ex"><div lang=3D"EN-US"><div><p cl=
ass=3D"MsoNormal" style=3D"margin-left:5.25pt"><span style=3D"font-size:10p=
t"> <u></u><u></u></span></p><p class=3D"MsoNormal" style=3D"margin-left:5.=
25pt"><span style=3D"font-size:10pt"><u></u>=C2=A0<u></u></span></p><p clas=
s=3D"MsoNormal" style=3D"margin-left:5.25pt"><span style=3D"font-size:10pt"=
>Instead you need to define a layer (like PDF/A did for paged documents),=
 wh=
ich preserves the meaning of the original experience without necessarily be=
ing able to enter in new data and have it recompute. For example, there is =
no good way to archive an empty chat room and preserve the experience of sa=
ying something new.<u></u><u></u></span></p><p class=3D"MsoNormal" style=3D=
"margin-left:5.25pt"><span style=3D"font-size:10pt"><u></u></span></p></div=
></div></blockquote><div><br></div><div>Yes, a way to define boundaries wou=
ld be nice to have for archives; currently there is no such spec -- if you =
hit an archive &#39;boundary&#39;, you&#39;d end up with a 404 when tryin=
g to replay. This is not a requirement for reasonable replay, though.</div>=
<div>=C2=A0<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px =
0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div=
 lang=3D"EN-US"><div><p class=3D"MsoNormal" style=3D"margin-left:5.25pt"><s=
pan style=3D"font-size:10pt">=C2=A0<u></u></span></p><p class=3D"MsoNormal"=
><span style=3D"font-size:10pt"><u></u>=C2=A0<u></u></span></p><div><p clas=
s=3D"MsoNormal">The archive use case needs a different security model from =
the online same origin policy.<u></u><u></u></p><p class=3D"MsoNormal">The =
model used in PDF is pretty simple:<u></u><u></u></p><p class=3D"MsoNormal"=
><u></u>=C2=A0<u></u></p><p class=3D"MsoNormal">Intra-package links are tru=
sted. Links from inside the package to out require user verification (once =
for that package).</p></div></div></div></blockquote><div><br></div><div>Ye=
s, I agree, the current security model is unfortunately insufficient, that&=
#39;s why I was hoping that this spec could help the web archiving use case=
,</div><div>which today remains an actively used example of replaying HTTP =
exchanges.</div><div><br></div><div>Archives are generally isolated bundles=
, and should not link outside the archive. (This is easier to enforce with =
a CSP policy currently)</div><div><br></div><div>In my systems, I have impl=
emented sandboxing and isolation via Javascript, and it has taken a lot of =
effort to do so, and probably is not entirely foolproof:</div><div><br></di=
v><div>Generally, this is done via URL prefixes</div><div><br></div><div>Fo=
r example, an archive of &#39;<a href=3D"https://example.com/" target=3D"_b=
lank">https://example.com/</a>&#39; at:</div><div><br></div><div><a href=3D=
"https://my-archive.example.com/bundle-A/20200701//https://example.com/">ht=
tps://my-archive.example.com/bundle-A/20200701//https://example.com/</a></d=
iv><div><br></div><div>should not have access to:</div><div><br></div><div>=
<a href=3D"https://my-archive.example.com/bundle-B/20200701/https://example=
.com/">https://my-archive.example.com/bundle-B/20200701/https://example.com=
/</a><br></div><div><br></div><div>But both pages should believe they are l=
oaded from the origin of &#39;<a href=3D"https://example.com/">https://exam=
ple.com/</a>&#39; in order to operate properly. This is currently done by &=
#39;emulating&#39; the origin</div><div>via Javascript injection, since the=
re is no other way, but would be great if instead the browser could support=
 this for a trusted archive directly.</div><div><br></div><div>The intent o=
f the above URLs is to say:</div><div>&#39;load URL <a href=3D"https://exam=
ple.com/">https://example.com/</a> archived on 2020-06-01 from a <a href=3D=
"https://my-archive.example.com/bundle-B.bundle">https://my-archive.example=
.com/bundle-B.bundle</a>&#39; or=C2=A0</div><div>&#39;load URL <a href=3D"h=
ttps://example.com/">https://example.com/</a> archived on 2020-07-01 from a=
 <a href=3D"https://my-archive.example.com/bundle-A.bundle">https://my-arch=
ive.example.com/bundle-A.bundle</a>&#39;</div><div><br></div><div>and the w=
eb archiving community settled on this de-facto URL scheme for expressing s=
uch requests.</div><div><br></div><div>I think the main difference between =
the archival use case and some of the other use cases being considered seem=
s to be the=C2=A0signing/verification and duration of the signatures. Since=
 the HTTP exchange is a two way exchange, what if the client browser had eq=
ual ability to create a signed exchange?</div><div><br></div><div>If the cl=
ient, rather than only the server, could sign an HTTP exchange and the sign=
ature could be verifiable for longer than 7 days, it could be a significant=
=C2=A0help to the archival use case, which includes users making their own =
snapshots. Archives like Internet Archive, smaller less known archives and =
even individual users could produce verifiable HTTP exchanges that could be=
 loaded offline, could be more trustworthy than screenshots, etc..=C2=A0 Th=
e other mechanics involved in loading and replaying an HTTP exchange bundle=
, such as url fragments, request--&gt;response matching are also no differe=
nt for the archival use case than any of the other ones.</div><div><br></di=
v><div>Ilya</div><div><br></div><div><br></div><div><br></div><div><br></di=
v><div><br></div><div><br></div><div><br></div><div><br></div><div><br></di=
v><div>=C2=A0</div></div></div>

--00000000000063c67b05aa4b126b--


From nobody Wed Jul 22 15:45:29 2020
Return-Path: <jyasskin@google.com>
X-Original-To: wpack@ietfa.amsl.com
Delivered-To: wpack@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id C186C3A08A3 for <wpack@ietfa.amsl.com>; Wed, 22 Jul 2020 15:45:26 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -9.498
X-Spam-Level: 
X-Spam-Status: No, score=-9.498 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HEADER_FROM_DIFFERENT_DOMAINS=0.001, HTML_MESSAGE=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001, USER_IN_DEF_SPF_WL=-7.5] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=chromium.org
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Y4kLPbBu0cbn for <wpack@ietfa.amsl.com>; Wed, 22 Jul 2020 15:45:23 -0700 (PDT)
Received: from mail-qk1-x732.google.com (mail-qk1-x732.google.com [IPv6:2607:f8b0:4864:20::732]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 9B1A53A08A5 for <wpack@ietf.org>; Wed, 22 Jul 2020 15:45:23 -0700 (PDT)
Received: by mail-qk1-x732.google.com with SMTP id j187so3628011qke.11 for <wpack@ietf.org>; Wed, 22 Jul 2020 15:45:23 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=ZPFmb19EYPTPrMnm01L9zAoAh0fHQDp6hAO9l9VBXOw=; b=NJ/sVySFQMvXYe4VVj0JKuhKdjz3l4Ugm7tjK691uGLEhhaMTr+9L2moDTFkloV+1f 0IclHiTudm2hfY4foM1x89fFIgQ2aH2qsufhNtnBPW+Urbea86LItzvYcSEAyTw2a0UN 0tQOWxrCQumtUTvuQw7QBEgsqaW3/ckLapd2U=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=ZPFmb19EYPTPrMnm01L9zAoAh0fHQDp6hAO9l9VBXOw=; b=pNvIeYOqC0mm0vpLALz+ZkBeQI0XeHuuasQQ8fpfBhzY4rmUpMv8GOjujUjyc825HP qYcIaDy562NIUBfu+8R6B6oW+4yXuPlESy3VDoSu8cqz5er8e4UOcWRK1Ko0lgrrc2E/ dZH1rPmGEvSy4f2/JdlVc23qwxt8xJUSAlqvtXPH7VSh4hohWG5St9HqH8jRaJzKv8X2 BFPye1BCPA8X7jd8EukPdQAmQDZRvZrkZrfWMgfdrwKGWX3mwjETixd7Tt2eMiwsgpMj Gw9NVot4e87yyFK6dw1/bsDZeIYRGP+dbVV4VJizEnPEM/5dj9n3GFx4I7vzKVjc7lIL 1i0g==
X-Gm-Message-State: AOAM531tmMgFJAE7HBq18/pKSBBOhfHT8VQlM3tM1hDlSlwLVSwONyJa ckGcBTkI4lDre02XzAXoCUsLoBQT50aboGqmDrYnyuUhpykpmw==
X-Google-Smtp-Source: ABdhPJwpzkkIO/zz+dfSFii3On/cccsfP4pMDpKgwf9RHA41/Q9x2cbbKSbOb6hOmeo3nNaSm3YCQsG8KezQVPM6kZo=
X-Received: by 2002:a37:8946:: with SMTP id l67mr2284489qkd.457.1595457922162;  Wed, 22 Jul 2020 15:45:22 -0700 (PDT)
MIME-Version: 1.0
References: <CANAUx6juQjKmJZpj+_gzmz6i+SRK3wYDW0g0zmCr7DY2kXKqyA@mail.gmail.com>
In-Reply-To: <CANAUx6juQjKmJZpj+_gzmz6i+SRK3wYDW0g0zmCr7DY2kXKqyA@mail.gmail.com>
From: Jeffrey Yasskin <jyasskin@chromium.org>
Date: Wed, 22 Jul 2020 15:45:10 -0700
Message-ID: <CANh-dXk61a3WE_L72=QqEz0pztc5mopD_DTY4hB2HeiC23S_8Q@mail.gmail.com>
To: Ilya Kreymer <ikreymer@gmail.com>
Cc: WPACK List <wpack@ietf.org>
Content-Type: multipart/alternative; boundary="000000000000b4db7905ab0f7e51"
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/LrJ-Kj_9Ot2Cdwvlwdu8d29341k>
Subject: Re: [Wpack] Web Archives, Replaying Web Pages and WPACK
X-BeenThere: wpack@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Web Packaging <wpack.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/wpack>, <mailto:wpack-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/wpack/>
List-Post: <mailto:wpack@ietf.org>
List-Help: <mailto:wpack-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/wpack>, <mailto:wpack-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 22 Jul 2020 22:45:27 -0000

--000000000000b4db7905ab0f7e51
Content-Type: text/plain; charset="UTF-8"

Thanks for reaching out, and sorry for the slowness of my reply. In
general, I'd like this group's work to be useful to the archiving
community. Have you seen
https://www.iab.org/wp-content/IAB-uploads/2019/06/sawood-alam-2.pdf with
another archiving group's take on how we could cooperate?

I have a couple comments on your message, but please let me know if I've
failed to reply to something you wanted an answer to.

1. Your replay system and others show that a change to browsers to natively
support a bundling format isn't strictly necessary for archivists to be
able to show the archived content, but I suspect that having some native
support will make several aspects of your job easier. Sawood's paper calls
out "live-leakage
<https://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html>,
temporal
violations <https://arxiv.org/abs/1402.0928>, origin violations
<https://ws-dl.blogspot.com/2017/01/2017-01-20-cnncom-has-been-unarchivable.html>,
cookie violations
<https://ws-dl.blogspot.com/2019/03/2019-03-18-cookie-violations-cause.html>,
broken links, and security risks
<https://dl.acm.org/doi/10.1145/3133956.3134042>". Browser support for
bundles won't fix all of that, but it will fix some of the security issues
and origin violations ... if we can find a good conclusion to the
discussion around
https://github.com/WICG/webpackage/blob/master/explainers/bundle-urls-and-origins.md
.

2. I'd like to fix the misunderstanding you've found in
https://github.com/WICG/webpackage/blob/@%7B2020-07-22%7D/explainers/navigation-to-unsigned-bundles.md#warc.
I didn't follow what that misunderstanding was, though. Your message seems
to agree that WARC itself is not random-access, even though a random-access
structure can be built on top of it.

3. There are interesting technical constraints around giving a client the
tools to prove that it didn't forge an archive it presents to a peer. As
far as I've been able to design, you can't get real proof, just a
collection of assertions by variously-trusted entities that the content is
accurate. If the original site cooperates, it can sign a bundle of its
content to vouch for it, and then the client could pass the bundle around
to various archiving entities for them to add additional signatures
representing their claims that the content is authentic. The client can add
a signature too, but it's unlikely their signing key will be known well
enough to be trusted.
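
To make point 2 concrete, here is a rough sketch of what "a random-access
structure built on top of WARC" can look like. It assumes an uncompressed
WARC file in which every record carries a Content-Length header; the
function names (make_record, build_warc_index, read_record) are hypothetical
and are not taken from any existing tool. One sequential scan records the
byte offset of each record, after which a single record can be fetched with
a seek instead of re-reading the file.

```python
import io


def make_record(uri, body):
    """Build a minimal WARC-style record (hypothetical helper, for demo data only)."""
    head = (
        b"WARC/1.0\r\n"
        b"WARC-Type: response\r\n"
        b"WARC-Target-URI: " + uri.encode() + b"\r\n"
        b"Content-Length: " + str(len(body)).encode() + b"\r\n"
        b"\r\n"
    )
    # Each record block is followed by two CRLFs.
    return head + body + b"\r\n\r\n"


def build_warc_index(stream):
    """Scan the stream once; map each target URI to [(offset, length), ...]."""
    index = {}
    while True:
        offset = stream.tell()
        version = stream.readline()
        if not version:
            break  # end of file
        if version in (b"\r\n", b"\n"):
            continue  # blank separator lines between records
        if not version.startswith(b"WARC/"):
            raise ValueError(f"not a WARC record at offset {offset}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header section
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()
        # Skip over the record block without reading it.
        stream.seek(int(headers[b"content-length"]), io.SEEK_CUR)
        uri = headers.get(b"warc-target-uri")
        if uri is not None:
            entry = (offset, stream.tell() - offset)
            index.setdefault(uri.decode(), []).append(entry)
    return index


def read_record(stream, offset, length):
    """Random access: fetch one record using an index entry."""
    stream.seek(offset)
    return stream.read(length)
```

The index is kept outside the WARC data itself, which matches the
observation that the format has no built-in index. Real archives are
usually gzip-compressed per record, which this sketch does not handle.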

Jeffrey

On Fri, Jul 10, 2020 at 3:30 PM Ilya Kreymer <ikreymer@gmail.com> wrote:

> Hi,
>
> I wanted to reach out again to the wpack group, as I believe I am
> working on solving some of the same problems as wpack, but from the
> perspective of web archiving. It would be great if there was some way to
> collaborate with this group, though I am struggling to understand how that
> could be done.
>
> The overarching goal I believe seems to be the same: to replay HTTP
> network traffic in a way that recreates an authentic representation of a
> website, and to have a way to verify that the traffic was not forged. It
> seems a 'web archive' and a 'bundled http exchange' are fundamentally
> describing the same type of object, with perhaps different storage
> requirements and use cases.
>
> I wanted to share a system, called https://replayweb.page/ which can
> replay HTTP network traffic stored in a variety of formats directly in the
> browser, using existing web standards, particularly Service Workers,
> Fetch and IndexedDB (for caching).
>
> Here are a few examples, which replay bundled HTTP traffic, and can even
> be embedded in other pages as iframes:
> https://webrecorder.net/embed-demo-1.html - replaying smaller
> archives/bundles
> https://webrecorder.net/embed-demo-2.html - replaying from a 17GB
> archive/bundle
> https://webrecorder.net/embed-demo-3.html - replaying more complex web
> sites, including one with 3d viewer
>
> These examples are all isolated and rendered independent of each other:
> through Javascript rewriting and injection, the original Origin of the page
> is emulated so that the site behaves as if it were running on its initial
> origin. This allows for replaying of complex, interactive web pages, though
> it is not perfect.
>
> As I come from the web archiving community, I've focused mostly on WARC
> format, as that is an existing ISO standard and widely in use, and the
> system also supports replaying from HAR and the web bundles created via the
> WBN tool.
>
> However, the WARC format alone is a bit limiting, and there seems to be a misunderstanding
> about WARC
> <https://github.com/WICG/webpackage/blob/fc9b3e75309546c805b5cdb1db74b2d58a8e0b28/explainers/navigation-to-unsigned-bundles.md#warc>:
> It provides random access to HTTP traffic, but does not contain a built in
> index necessary for random access (it is assumed the index is maintained
> separately). To work around this, I've created a new 'bundling format', a
> 'bespoke zip format', which can contain WARCs (and other types of data,
> even .wbn bundles), along with other metadata, and a compressed index.
> This ZIP-based format is explained here:
> https://github.com/webrecorder/web-archive-collection-format
> <https://github.com/webrecorder/web-archive-collection-format>
>
> Since the ZIP format allows for random access (see: ZipInfo
> <https://github.com/Rob--W/zipinfo.js/>), it is possible to load all
> bundled data on-demand via range requests. This allows the format to scale
> to tens and probably hundreds of GBs.
>
> The system also supports referencing URLs via query params in the
> fragment, for example:
> https://replayweb.page/?source=/examples/netpreserve-twitter.warc#view=replay&url=https%3A%2F%2Ftwitter.com%2Fnetpreserve&ts=20190603053135 loads
> the WARC file, then loads the specified URL from the archive/bundle.
>
> I wanted to share all of this to see if there's perhaps some way to align
> with the work you're doing here, though I must admit it is not easy to
> understand if that is possible or of interest to this group. Again, from my
> perspective, it seems like you're working on a very similar problem,
> attempting to standardize this at the browser level, but perhaps for
> different use cases.
>
> One area I'm especially interested in is verification for Saving a Bundle
> in the Browser
> <https://github.com/WICG/webpackage/blob/2a78f2930a228ee6872630ecb023fa71151cc164/draft-yasskin-wpack-use-cases.md#save-and-share-a-web-page-snapshot>
> .
> Unfortunately, it seems that this use case is currently out of scope. I am
> especially interested in building tools to solve this problem, so that (to
> use this example) Casey can save the page in their browser, share it with
> Dakota, and that *Dakota can verify that this is what Casey saw in their
> browser*, and it was not forged. I think being able to cite a web bundle
> from a client's perspective would be extremely useful for archival,
> fact-checking, sharing, and similar use cases, and could make the web more
> trustworthy.
>
> Please let me know if there is any interest in collaborating, or if these
> existing tools could somehow help this spec move forward.
>
> (If anyone is interested, the replayweb.page tool can be found on GitHub
> at: https://github.com/webrecorder/replayweb.page (UI frontend) and
> https://github.com/webrecorder/wabac.js (service worker backend).)
>
> Thank you,
> Ilya
> webrecorder.net
>
>
>
>
>
> _______________________________________________
> Wpack mailing list
> Wpack@ietf.org
> https://www.ietf.org/mailman/listinfo/wpack
>

--000000000000b4db7905ab0f7e51--


From nobody Thu Jul 23 18:45:27 2020
Return-Path: <ikreymer@gmail.com>
X-Original-To: wpack@ietfa.amsl.com
Delivered-To: wpack@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id E6DE13A0808 for <wpack@ietfa.amsl.com>; Thu, 23 Jul 2020 18:45:25 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.097
X-Spam-Level: 
X-Spam-Status: No, score=-2.097 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id HxjC7Egs6Ik6 for <wpack@ietfa.amsl.com>; Thu, 23 Jul 2020 18:45:22 -0700 (PDT)
Received: from mail-ed1-x531.google.com (mail-ed1-x531.google.com [IPv6:2a00:1450:4864:20::531]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id C1AEC3A0807 for <wpack@ietf.org>; Thu, 23 Jul 2020 18:45:21 -0700 (PDT)
Received: by mail-ed1-x531.google.com with SMTP id a8so5903991edy.1 for <wpack@ietf.org>; Thu, 23 Jul 2020 18:45:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;  h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=dG/QA4CRcgkHV3w3ltm9G5b2Ptd4mwrcyax7FFaoetM=; b=RQAdbOvbAQbAPHB0aw3g/foRjS4ECCyZDRNM+Nm4eizSjl4Iwead/L6SYGQm2VaOIN ZhRYeVz2FYFiamNXCbqlo/7Zop9J0PSBFB3TEvX4Biw+ZjVpHjxIJIEP6NptToIuusLv vkWpQ5NLl7F8NCZ3M9yKkE272KyrRJzrrz+XSiqfrVGG7ncQuN53+Ujw+xK16XAhuVm4 64eQH+JIIB8tjDDCrm42Woiy2OUx8NLc0FjyZJ1LMNYxHlb8DqHTVIUi16EBPAGvRGZY ZuABmI4zKMJ77Dqf9KBg2zO0O/UJ6+tA0qgEK2+EeFKOkhFpIl1XpcN1z2riFOVeZ4HP zi7g==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=dG/QA4CRcgkHV3w3ltm9G5b2Ptd4mwrcyax7FFaoetM=; b=lGr5yK+QuuoJPHgwkBTq0BmP+6dLu+rKO78jIap9HsKUkDrkBMaTFeVfRoFn09Ks6T /LkOS8QaG7q0jsdRgoWFQQAhm8eZoIN2dYZKXOYSZennGBtxl6Dd19c3zy9y/0KfRfmQ nJsMHeWTv8cr+NH02CaUcxbzwadUQGOhMYP/daPPlUAivsOQoKGujPtyyKaZV05M33vV G7wBhLzMUArs01jH8QvSd7o0T3j9iJlBQZRMq/X8crmQbbjZcaG/sSZHugfh7MypbuOP UNqw/Lw/pUsOVRAyOAeZIltFjcECjL4gRQ2kYnmhYp63S1AZ4pik0x+2JjQZOZSzE7Vm 5Oqg==
X-Gm-Message-State: AOAM530eVEBqcZpfecNJ6Cz4utZDucy9HiQNBQF8iekc11HPJbVKxZEO QM6EzyNYRL0LiO7rTy3k27WVgMKG5WjpOji/7l4=
X-Google-Smtp-Source: ABdhPJyZS3PTUZmnz552dAIRgs1WMDy2gWqbKghvrUh7GMjc8FKh43jExb4sOltvdhjpW/krmMc+8WwTxZ9TwpEgS/k=
X-Received: by 2002:a05:6402:ca3:: with SMTP id cn3mr7079669edb.64.1595555119981;  Thu, 23 Jul 2020 18:45:19 -0700 (PDT)
MIME-Version: 1.0
References: <CANAUx6juQjKmJZpj+_gzmz6i+SRK3wYDW0g0zmCr7DY2kXKqyA@mail.gmail.com> <CANh-dXk61a3WE_L72=QqEz0pztc5mopD_DTY4hB2HeiC23S_8Q@mail.gmail.com>
In-Reply-To: <CANh-dXk61a3WE_L72=QqEz0pztc5mopD_DTY4hB2HeiC23S_8Q@mail.gmail.com>
From: Ilya Kreymer <ikreymer@gmail.com>
Date: Thu, 23 Jul 2020 18:45:08 -0700
Message-ID: <CANAUx6jmE5txboA7SeTo60s0fcidkC0XJkY0GFM=VntiMd1Fjw@mail.gmail.com>
To: Jeffrey Yasskin <jyasskin@chromium.org>
Cc: WPACK List <wpack@ietf.org>
Content-Type: multipart/alternative; boundary="00000000000025680405ab262016"
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/S5TD5o0Xa-F7R1lTkVp3mYxNCSc>
Subject: Re: [Wpack] Web Archives, Replaying Web Pages and WPACK
X-BeenThere: wpack@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Web Packaging <wpack.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/wpack>, <mailto:wpack-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/wpack/>
List-Post: <mailto:wpack@ietf.org>
List-Help: <mailto:wpack-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/wpack>, <mailto:wpack-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 24 Jul 2020 01:45:26 -0000

--00000000000025680405ab262016
Content-Type: text/plain; charset="UTF-8"

Hi Jeffrey,

On Wed, Jul 22, 2020 at 3:45 PM Jeffrey Yasskin <jyasskin@chromium.org>
wrote:

> Thanks for reaching out, and sorry for the slowness of my reply. In
> general, I'd like this group's work to be useful to the archiving
> community. Have you seen
> https://www.iab.org/wp-content/IAB-uploads/2019/06/sawood-alam-2.pdf with
> another archiving group's take on how we could cooperate?
>

No worries, thanks so much for writing back! Yes, I know Sawood and the
other authors of the paper quite well, and we communicate regularly.
I strongly agree with some of the general proposals that they suggest,
especially regarding the potential for improving trust on the web and for
archives with signed exchanges (more in response to point 3 below), while
I strongly disagree with others, such as the focus on HTTP content
negotiation as a useful mechanism for specifying or accessing archives
(which I don't think would be relevant to bundles, and which is not a good
idea for many reasons, including those outlined in
https://wiki.whatwg.org/wiki/Why_not_conneg).

I am not a researcher, but a developer/implementer, so I wanted to get a more
concrete understanding/path to what cooperation may look like, and I think
this conversation is a great start! More comments below.


>
> I have a couple comments on your message, but please let me know if I've
> failed to reply to something you wanted an answer to.
>
> 1. Your and other replay systems show that a change to browsers to
> natively support a bundling format isn't strictly necessary for archivists
> to be able to show the archived content, but I suspect that having some
> native support will make several aspects of your job easier. Sawood's paper
> calls out "live-leakage
> <https://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html>, temporal
> violations <https://arxiv.org/abs/1402.0928>, origin violations
> <https://ws-dl.blogspot.com/2017/01/2017-01-20-cnncom-has-been-unarchivable.html>,
> cookie violations
> <https://ws-dl.blogspot.com/2019/03/2019-03-18-cookie-violations-cause.html>,
> broken links, and security risks
> <https://dl.acm.org/doi/10.1145/3133956.3134042>". Browser support for
> bundles won't fix all of that, but it will fix some of the security issues
> and origin violations ... if we can find a good conclusion to the
> discussion around
> https://github.com/WICG/webpackage/blob/master/explainers/bundle-urls-and-origins.md
> .
>

Indeed, I am very happy to hear that this would be important for you as
well! I definitely welcome native support for web bundle replay if it can
make my job easier, but I wasn't sure how/if that was going to happen. Yes,
I wouldn't expect this proposal to fix all of the issues raised in Sawood's
research: some have already been addressed, others apply specifically to
large-scale archives. I am also working with archives at a much smaller
scale, for example an archive of a single site at a single point in time,
where things like temporal violations are less of a concern.

Thanks for pointing to the Origin document, I had not seen it before.

I would say that being able to emulate the original origin/location of a
bundle/archive in an isolated environment would probably be the biggest
improvement to make my job easier -- I am maintaining thousands of lines of
JS code which essentially tries to emulate the original location of the
page when it's loaded from an archive.

If the browser could do so natively, that would be an immense improvement!
My immediate feedback on the proposal is that bundles should be replayed
with all contents served from their original, exact origins but in an
isolated session, similar to Chrome's guest session mode.
Anything less than the original origin, and I'm back to injecting thousands
of lines of JS code to emulate the original origin. If a page was archived
from 'https://example.com' but is then served from 'package:example.com' or
some variation of that, it will still break any site that checks
'window.location'.
(The complex JS injection that I maintain deals precisely with emulating
window.location so that it is what the site expects.)
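
To illustrate the kind of check that breaks, here is a minimal sketch in
TypeScript. The function name siteCheckPasses and the EXPECTED_ORIGIN
constant are hypothetical, and 'package:example.com' is only the
illustrative scheme mentioned above, not something the proposal defines.

```typescript
// Hypothetical stand-in for a site script that inspects window.location.
const EXPECTED_ORIGIN = "https://example.com";

// Returns true when the page location has the origin the site was written for.
function siteCheckPasses(locationHref: string): boolean {
  // Parsing the href yields the same origin string that
  // window.location.origin would report for that address.
  const origin = new URL(locationHref).origin;
  return origin === EXPECTED_ORIGIN;
}

console.log(siteCheckPasses("https://example.com/some/page")); // original origin
console.log(siteCheckPasses("package:example.com")); // rewritten scheme
```

The first call passes and the second does not, which is why replay under
any rewritten scheme still needs the location to be emulated.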

Probably a guest session window (or tab?) that is isolated to the contents
of the bundle, and cannot access any data outside of it, would be the
ideal solution, but of course I don't know if that is possible.


>
> 2. I'd like to fix the misunderstanding you've found in
> https://github.com/WICG/webpackage/blob/@%7B2020-07-22%7D/explainers/navigation-to-unsigned-bundles.md#warc.
> I didn't follow what that misunderstanding was, though. Your message seems
> to agree that WARC itself is not random-access, even though a random-access
> structure can be built on top of it.
>

I should have been clearer as well, as this is a bit confusing. The way
WARCs are structured allows for seeking to byte N to read URL U, without
having to read the entire WARC first. However, *where* to seek in the WARC
is not specified in the WARC itself, and does require maintaining an
auxiliary index. Uses of WARCs generally maintain such an auxiliary
index to provide fast random access to the WARC data. This is perhaps both
a strength and a weakness of the WARC format. So yes, perhaps it's
partially random access, or random access with extra data, depending on how
you define that :) In a browser, the auxiliary index could be kept in an
IndexedDB store, and the WARC data loaded over HTTP range requests, which
seems to work pretty well.
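
As a rough sketch of that lookup (in TypeScript, with made-up offsets): the
index maps a URL to a byte offset and length, and that entry becomes the
Range header for a single record. The names IndexEntry, warcIndex and
rangeHeaderFor are hypothetical, and a plain object stands in for the
IndexedDB store.

```typescript
// One index entry: where the record for a URL lives inside the WARC file.
interface IndexEntry {
  offset: number; // byte offset of the record in the WARC
  length: number; // length of the record in bytes
}

// Hypothetical in-memory stand-in for the auxiliary index.
const warcIndex: Record<string, IndexEntry | undefined> = {
  "https://example.com/": { offset: 0, length: 1200 },
  "https://example.com/style.css": { offset: 1200, length: 800 },
};

// Build the HTTP Range header value that fetches exactly one record.
// Byte ranges are inclusive at both ends, hence the "- 1".
function rangeHeaderFor(url: string): string | undefined {
  const entry = warcIndex[url];
  if (entry === undefined) {
    return undefined; // URL is not in the archive
  }
  const lastByte = entry.offset + entry.length - 1;
  return `bytes=${entry.offset}-${lastByte}`;
}

console.log(rangeHeaderFor("https://example.com/style.css"));
```

The point is only that the WARC never has to be read from the start: one
index lookup plus one range request is enough to retrieve a record.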

This leads to another question/suggestion that could make my work
immensely easier. There are petabytes of WARCs out there, and it would be
great if existing archives could be used with this system without having to
be converted or re-serialized. What if the web bundle spec could decouple
the replay from the storage, via a customizable API?

The idea I had in mind is something similar to the CacheStorage interface,
where the user could implement a match() function, like this:

interface BaseWebBundle
{
    Promise<Response> match(Request, options);
}

Then, one could implement a 'class WARCBundle extends BaseWebBundle' which
handles lookup from WARC (or any other format) and performs any custom
matching rules. The current implementation would of course also be provided
by default. This could be used to provide a more flexible matching system,
and perhaps simplify some of what needs to happen in:
https://wicg.github.io/webpackage/loading.html#request-matching. Likely,
this would have to be limited to unsigned exchanges only, though.
To get accurate replay, matching a request to a response often requires
custom 'fuzzy' matching, e.g. a request for https://example.com/?_=1235
should match the response for https://example.com/?_=1234. I think this is
outside the scope of what should be standardized, but having a flexible way
to add custom matching, and support for loading WARC (or HAR, or other
formats), would greatly help my work as well.
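
To make the fuzzy matching idea a bit more concrete, here is a small
TypeScript sketch. Everything in it is an assumption for illustration: the
BundleMatcher and FuzzyBundle names, the choice to ignore only the '_'
parameter, and the simplified synchronous match() that takes a URL string
(the interface proposed above would take a Request and return a
Promise<Response>).

```typescript
// Minimal stand-in for a stored response; a browser version would use
// the Fetch API's Response instead.
interface StoredResponse {
  status: number;
  body: string;
}

// Hypothetical shape of the pluggable lookup, modeled on the match() idea.
interface BundleMatcher {
  match(requestUrl: string): StoredResponse | undefined;
}

// Query parameters treated as cache-busters and ignored when matching.
const IGNORED_PARAMS = ["_"];

// Normalize a URL by dropping the ignored query parameters.
function fuzzyKey(rawUrl: string): string {
  const url = new URL(rawUrl);
  for (const name of IGNORED_PARAMS) {
    url.searchParams.delete(name);
  }
  return url.toString();
}

// Tries an exact lookup first, then falls back to the normalized key.
class FuzzyBundle implements BundleMatcher {
  private exact: Record<string, StoredResponse | undefined> = {};
  private fuzzy: Record<string, StoredResponse | undefined> = {};

  add(url: string, response: StoredResponse): void {
    this.exact[url] = response;
    this.fuzzy[fuzzyKey(url)] = response;
  }

  match(requestUrl: string): StoredResponse | undefined {
    return this.exact[requestUrl] ?? this.fuzzy[fuzzyKey(requestUrl)];
  }
}

const bundle = new FuzzyBundle();
bundle.add("https://example.com/?_=1234", { status: 200, body: "archived page" });

const hit = bundle.match("https://example.com/?_=1235");
console.log(hit ? hit.body : "no match");
```

Which parameters to ignore is exactly the kind of site-specific rule that
seems better left to the implementer than to the spec.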



> 3. There are interesting technical constraints around giving a client the
> tools to prove that it didn't forge an archive it presents to a peer. As
> far as I've been able to design, you can't get real proof, just a
> collection of assertions by variously-trusted entities that the content is
> accurate. If the original site cooperates, it can sign a bundle of its
> content to vouch for it, and then the client could pass the bundle around
> to various archiving entities for them to add additional signatures
> representing their claims that the content is authentic. The client can add
> a signature too, but it's unlikely their signing key will be known well
> enough to be trusted.
>
>
Yes, I agree that there are additional constraints, though it seems like a
'problem worth solving' to give more power to individual users vs larger
content aggregators, as Sawood et al. suggest. Under the current scheme,
only the content provider/server operator can choose whether their data is
verifiable, even though two parties are involved in any HTTP exchange. I
think peers could exchange public keys for verification as they do in other
systems. Archives large and small could share their public keys, for
instance, and offer verifiable archives. I agree that more design is needed
on this, though, and I don't have any exact answers myself.

A major motivation for this is a project I've been working on to support
archiving/bundle creation directly in the browser via an extension. (The
extension relies on the chrome.debugger APIs and so is currently limited to
Chrome.)

Here is a video of a demo of how it works, archiving several pages as you
browse:
https://drive.google.com/file/d/1aH5X1GN8X84jQXSnS4NJB78BqN3ZjrEA/view

In a perfect world, the browser could provide this functionality natively,
as a working replacement for 'Save Page As...', which is of course broken
for most dynamic pages.
For now, the extension supports exporting the content as WARC, but of
course it could support exporting as a web bundle, and better still if the
exported bundle could be signed by the browser. If this kind of saving of
bundles could be done from the browser, that would really make my job, and
probably that of many others in the archiving community, easier :)

Let me know if there are any questions; I'm happy to be more involved in
this process!

Thanks again,
Ilya



> Jeffrey
>
> On Fri, Jul 10, 2020 at 3:30 PM Ilya Kreymer <ikreymer@gmail.com> wrote:
>
>> Hi,
>>
>> I wanted to reach out again to the wpack group, as I believe I am
>> working on solving some of the same problems as wpack, but from the
>> perspective of web archiving. It would be great if there was some way to
>> collaborate with this group, though I am struggling to understand how that
>> could be done.
>>
>> The overarching goal I believe seems to be the same: to replay HTTP
>> network traffic in a way that recreates an authentic representation of a
>> website, and to have a way to verify that the traffic was not forged. It
>> seems a 'web archive' and a 'bundled http exchange' are fundamentally
>> describing the same type of object, with perhaps different storage
>> requirements and use cases.
>>
>> I wanted to share a system, called https://replayweb.page/ which can
>> replay HTTP network traffic stored in a variety of formats directly in the
>> browser, using existing web standards, particularly Service Workers,
>> Fetch and IndexedDB (for caching).
>>
>> Here are a few examples, which replay bundled HTTP traffic, and can even
>> be embedded in other pages as iframes:
>> https://webrecorder.net/embed-demo-1.html - replaying smaller
>> archives/bundles
>> https://webrecorder.net/embed-demo-2.html - replaying from a 17GB
>> archive/bundle
>> https://webrecorder.net/embed-demo-3.html - replaying more complex web
>> sites, including one with 3d viewer
>>
>> These examples are all isolated and rendered independent of each other:
>> through JavaScript rewriting and injection, the original Origin of the page
>> is emulated so that the site behaves as if it is running on its initial
>> origin. This allows for replaying of complex, interactive web pages, though
>> is not perfect.
>>
>> As I come from the web archiving community, I've focused mostly on WARC
>> format, as that is an existing ISO standard and widely in use, and the
>> system also supports replaying from HAR and the web bundles created via the
>> WBN tool.
>>
>> However, the WARC format alone is a bit limiting, and there seems to be a misunderstanding
>> about WARC
>> <https://github.com/WICG/webpackage/blob/fc9b3e75309546c805b5cdb1db74b2d58a8e0b28/explainers/navigation-to-unsigned-bundles.md#warc>:
>> It provides random access to HTTP traffic, but does not contain a built in
>> index necessary for random access (it is assumed the index is maintained
>> separately). To work around this, I've created a new 'bundling format', a
>> 'bespoke zip format', which can contain WARCs (and other types of data,
>> even .wbn bundles), along with other metadata, and a compressed index.
>> This ZIP-based format is explained here:
>> https://github.com/webrecorder/web-archive-collection-format
>>
>> Since the ZIP format allows for random access (see: ZipInfo
>> <https://github.com/Rob--W/zipinfo.js/>), it is possible to load all
>> bundled data on-demand via range requests. This allows the format to scale
>> to tens and probably hundreds of GBs.
>>
>> The system also supports referencing URLs via query params in the
>> fragment, for example:
>> https://replayweb.page/?source=/examples/netpreserve-twitter.warc#view=replay&url=https%3A%2F%2Ftwitter.com%2Fnetpreserve&ts=20190603053135 loads
>> the WARC file, then loads the specified URL from the archive/bundle.
>>
>> I wanted to share all of this to see if there's perhaps some way to align
>> with the work you're doing here, though I must admit it is not easy to
>> understand if that is possible or of interest to this group. Again, from my
>> perspective, it seems like you're working on a very similar problem,
>> attempting to standardize this at the browser level, but perhaps for
>> different use cases.
>>
>> One area I'm especially interested in is verification for Saving a
>> Bundle in the Browser
>> <https://github.com/WICG/webpackage/blob/2a78f2930a228ee6872630ecb023fa71151cc164/draft-yasskin-wpack-use-cases.md#save-and-share-a-web-page-snapshot>
>> .
>> Unfortunately, it seems that this use case is currently out of scope. I
>> am especially interested in building tools to solve this problem, so that
>> (to use this example) Casey can save the page in their browser, share it
>> with Dakota, and that *Dakota can verify that this is what Casey saw in
>> their browser*, and it was not forged. I think being able to sign a web
>> bundle from a client's perspective would be extremely useful for archival,
>> fact-checking, sharing, etc.. use cases and could make the web more
>> trustworthy.
>>
>> Please let me know if there is any interest in collaborating, or if these
>> existing tools could somehow help this spec move forward.
>>
>> (If anyone is interested, the replayweb.page tool can be found on github
>> at: https://github.com/webrecorder/replayweb.page (UI frontend) and
>> https://github.com/webrecorder/wabac.js (service worker backend).)
>>
>> Thank you,
>> Ilya
>> webrecorder.net
>>
>>
>>
>>
>>
>> _______________________________________________
>> Wpack mailing list
>> Wpack@ietf.org
>> https://www.ietf.org/mailman/listinfo/wpack
>>
>

--00000000000025680405ab262016--


From nobody Sun Jul 26 20:30:01 2020
Return-Path: <sean@sn3rd.com>
X-Original-To: wpack@ietfa.amsl.com
Delivered-To: wpack@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 13EBE3A165D for <wpack@ietfa.amsl.com>; Sun, 26 Jul 2020 20:29:59 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.099
X-Spam-Level: 
X-Spam-Status: No, score=-2.099 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=sn3rd.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id jkCfyl-QWVxx for <wpack@ietfa.amsl.com>; Sun, 26 Jul 2020 20:29:57 -0700 (PDT)
Received: from mail-qt1-x836.google.com (mail-qt1-x836.google.com [IPv6:2607:f8b0:4864:20::836]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 8A4BE3A165C for <wpack@ietf.org>; Sun, 26 Jul 2020 20:29:57 -0700 (PDT)
Received: by mail-qt1-x836.google.com with SMTP id o22so11001758qtt.13 for <wpack@ietf.org>; Sun, 26 Jul 2020 20:29:57 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sn3rd.com; s=google; h=from:content-transfer-encoding:mime-version:subject:message-id:date :to; bh=xwFCEBX1JHPlPgm5P1NYCKiokNpaQ+Gvxska3aCudlE=; b=jFS25UKRQBZxFNoxN//iZ9aQde4qmo+8mRLdeFQD/Fo+9DCpe4CyUAi0vgRk+AyaOs pbQYoYgWZknvwrp1vJrR8gT0TP/FF9VxJTurhW4Lojq2yEV2UcffslpJmsugFIEC8fYG t0l6avnWySNy+9TsVpJcUfya8q93gJUW1CRA4=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:content-transfer-encoding:mime-version :subject:message-id:date:to; bh=xwFCEBX1JHPlPgm5P1NYCKiokNpaQ+Gvxska3aCudlE=; b=JdLVNhjc1/xKH3POV8WN4NEMn4ady8+Jx43u+3x3S9jRb0oE8oyu4eLia+i4xRCdsb ak9LNRCkxxtUwUFpGTAYK72Lgi0IbJxLffC63ZTppkCz3+/v3xt58rP9MpRVNdqpdVmX C6/Jrc4LhxugIYHyvt5M8sMSRTEwSeMQnvEiaV2mSGLMyu5v/91Y3OHidmYdNOHycFj4 jMj5gMXBvNXbJ26/19WZ3Xcvot8RJOjS2DEzpTCD3ZAzKMgbmaqFyeL3h9HqKyOC8ahm GT+gFoV/QgThNKxzu+4g/MV3O2kvgbo/xI4hdYHGw2CQgWkF06KVev22DEv1V7Uq6Iw+ 5rmw==
X-Gm-Message-State: AOAM5333iirhCcbN+xLVpF7xv8d/uJ4x4DUZoGoxfkUc8fCWQTEahk7b x/d2kEMSpxPrgLVf1r8RbdJH9SCsvmA=
X-Google-Smtp-Source: ABdhPJzeVPnIV4MHZSBDHXZrXStsIIEXqI4cQQ0FZrleleExq79kE5hY+BG2LbMokg2oCOR4cV2tAQ==
X-Received: by 2002:ac8:d86:: with SMTP id s6mr20522119qti.343.1595820596154;  Sun, 26 Jul 2020 20:29:56 -0700 (PDT)
Received: from [192.168.1.152] (pool-108-31-39-252.washdc.fios.verizon.net. [108.31.39.252]) by smtp.gmail.com with ESMTPSA id y24sm512684qtv.71.2020.07.26.20.29.55 for <wpack@ietf.org> (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sun, 26 Jul 2020 20:29:55 -0700 (PDT)
From: Sean Turner <sean@sn3rd.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.1\))
Message-Id: <79974E9C-0B33-49B9-A648-17CBCC2D9417@sn3rd.com>
Date: Sun, 26 Jul 2020 23:29:54 -0400
To: WPACK List <wpack@ietf.org>
X-Mailer: Apple Mail (2.3608.120.23.2.1)
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/qFqLkq4tIxgtLRICgGhI88DE2HU>
Subject: [Wpack] WPACK@IETF108: Meeting Logistics
X-BeenThere: wpack@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Web Packaging <wpack.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/wpack>, <mailto:wpack-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/wpack/>
List-Post: <mailto:wpack@ietf.org>
List-Help: <mailto:wpack-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/wpack>, <mailto:wpack-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Jul 2020 03:29:59 -0000

Hi! We=E2=80=99re scheduled for Friday 2020-07-31 @ 13:00 UTC. =
Additional logistics:

Agenda:
https://www.ietf.org/proceedings/108/agenda/agenda-108-wpack-00

Meetecho (a/v and chat):
https://meetings.conf.meetecho.com/ietf108/?group=3Dwpack&short=3D&item=3D=
1

Audio (only):
http://mp3.conf.meetecho.com/ietf/ietf1083.m3u

Etherpad (virtual blue sheets and notes):
https://codimd.ietf.org/notes-ietf-108-wpack#

spt=


From nobody Thu Jul 30 18:39:31 2020
Return-Path: <sean@sn3rd.com>
X-Original-To: wpack@ietfa.amsl.com
Delivered-To: wpack@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 2E2673A09A9 for <wpack@ietfa.amsl.com>; Thu, 30 Jul 2020 18:39:30 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.099
X-Spam-Level: 
X-Spam-Status: No, score=-2.099 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=sn3rd.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id loh_qboBwObr for <wpack@ietfa.amsl.com>; Thu, 30 Jul 2020 18:39:28 -0700 (PDT)
Received: from mail-qk1-x729.google.com (mail-qk1-x729.google.com [IPv6:2607:f8b0:4864:20::729]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 1F11C3A09A8 for <wpack@ietf.org>; Thu, 30 Jul 2020 18:39:20 -0700 (PDT)
Received: by mail-qk1-x729.google.com with SMTP id l23so27549154qkk.0 for <wpack@ietf.org>; Thu, 30 Jul 2020 18:39:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sn3rd.com; s=google; h=from:content-transfer-encoding:mime-version:subject:message-id:date :to; bh=KVsYEp5rHjVZ9MelJSRg9oI/K/yE8jVLtPck4++uTZ0=; b=lIQQnf9Q2d4Clor26WegtA9G7vCngDCFGp4FB58Z7vPlA1a7mpDcHzEotwYcgS9rmt kBrFpzqjmKcTkwnkkbpLXZaAsu1F2dyDF3QWbKVqCZfHcw/wYMmrA64rj4Xt9mqWdanF Goxz55BkyxuBLz5CcoUlN+OOMQfd/6XSAyqKM=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:content-transfer-encoding:mime-version :subject:message-id:date:to; bh=KVsYEp5rHjVZ9MelJSRg9oI/K/yE8jVLtPck4++uTZ0=; b=H1wwOEjQ81RI6sIirlSyMvi7zReIrVlOQi9YjQjdpQ10aKuSZGe1HFddMqaLFwQXkm DqouG0OkF4Q/mL3q89QNsANRvMZf8vODn9JjfN2+cSd72xZWwvhIT5s8X72tKpCTkinp 8DiaVDGc5B6O6OvnBml8YMwI6fCQOwU0a6RqqPqJCs8WfMxADYDiDg1F96FUBwTMPn7S GAz1X/2PsGJn7pMa0H7/cPNLjajtR/qcv2OulgLj/IHujTDzEfyUwQNdW+sPLfrQtwCP FRYf8BW50bNcWHp/7LDBhfuecMnb9Nbe8zweiKrG4W1Aro/BK1/B4cdK4eDAHhzgqcC0 Gw+Q==
X-Gm-Message-State: AOAM532Ype2gJM6vlaIgy/pyW466ZXR3e2P9F9LSDV+S7jyneNMshac4 TaNVexRaQqh9+QqpA6WNFsm7XeYDy0Y=
X-Google-Smtp-Source: ABdhPJxGzgb5u5rXhrXD122Q96434zQXORtm/usGOWtVOUei1RJWQom+Behx2LyxykC9F+TAjwK0FA==
X-Received: by 2002:a37:4048:: with SMTP id n69mr1808204qka.421.1596159559763;  Thu, 30 Jul 2020 18:39:19 -0700 (PDT)
Received: from [192.168.1.152] (pool-108-31-39-252.washdc.fios.verizon.net. [108.31.39.252]) by smtp.gmail.com with ESMTPSA id x24sm7217602qtj.8.2020.07.30.18.39.18 for <wpack@ietf.org> (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 30 Jul 2020 18:39:18 -0700 (PDT)
From: Sean Turner <sean@sn3rd.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.1\))
Message-Id: <118C40AE-254F-49A6-BC60-935D0053CB83@sn3rd.com>
Date: Thu, 30 Jul 2020 21:39:18 -0400
To: WPACK List <wpack@ietf.org>
X-Mailer: Apple Mail (2.3608.120.23.2.1)
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/nr31vo0qm1BaJalTMX8p_dpufZo>
Subject: [Wpack] WPACK@IETF108: Revised Agenda and Slides Now Posted
X-BeenThere: wpack@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Web Packaging <wpack.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/wpack>, <mailto:wpack-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/wpack/>
List-Post: <mailto:wpack@ietf.org>
List-Help: <mailto:wpack-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/wpack>, <mailto:wpack-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 31 Jul 2020 01:39:30 -0000

Apologies for being tardy, but I have uploaded a revised agenda and slides:

https://datatracker.ietf.org/meeting/108/session/wpack

spt


From nobody Fri Jul 31 07:10:24 2020
Return-Path: <twifkak@google.com>
X-Original-To: wpack@ietfa.amsl.com
Delivered-To: wpack@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 466133A0DC3 for <wpack@ietfa.amsl.com>; Fri, 31 Jul 2020 07:10:23 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -17.6
X-Spam-Level: 
X-Spam-Status: No, score=-17.6 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, ENV_AND_HDR_SPF_MATCH=-0.5, HTML_MESSAGE=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, USER_IN_DEF_DKIM_WL=-7.5, USER_IN_DEF_SPF_WL=-7.5] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=google.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id P0jero_6Rr37 for <wpack@ietfa.amsl.com>; Fri, 31 Jul 2020 07:10:22 -0700 (PDT)
Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id E53963A0C16 for <wpack@ietf.org>; Fri, 31 Jul 2020 07:10:21 -0700 (PDT)
Received: by mail-wr1-x432.google.com with SMTP id a15so28114397wrh.10 for <wpack@ietf.org>; Fri, 31 Jul 2020 07:10:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=OSyyLJCninmAyz6zb6VCmqGTYye/Mo/k0rAywOUkFe4=; b=mE+LtmJU6fYDhomurjzzVJ5Ts5Hea/2ttm0wn4F5SYV01k+GbJlYSvZShMQsZsBT7s h8ar2ofRBf14bqzT/MkXw2S7SURX8pF9WIC8aMnq90uvJMIOY3JVIbHE7dGn8uN4IwtR QWX+FmD5um75m2DZk4BPVBc+N0p+diJU6l8YnG/3CKJt8g3eZG76iwMCiYYSD47wIV+X rT/9g3ZZtmNRsRe/cwnAwXQn2xLy+OCHoW8iP20wDYCREScoNPS8JLJfUOXWyWIF9Y+q 0Thx94eWFEUz0KYDhnJ95LSp+DASmoBOE5fieAJWuZkjiIiTG0p2Oh0lY4zH8MFVNeOR Rjzw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=OSyyLJCninmAyz6zb6VCmqGTYye/Mo/k0rAywOUkFe4=; b=Vr6qpNKxUEtFE7sVQGGcOhPL15cUnR4BwXY+3FCVJtlBil75tmhuWuJDnbdtOagDTC c/lmXgzfChwhqx41TKcxIWG/YzDxSPJka3w02ZH+9wnpmUc+lVt91RB8YByEjRAeNXGg w6htIA7y1HQepcSS2V1L6Iftqp1cFkkohG7vvdkrSYxtaa2Ziw/FH0FFlAPKcy4bOTIy x1fuo9xi6ghAVDuFh3bZr9YxapDEBQJgKGOwlWjS9lanGJXV+JAd6bnIxlvDt2MDKLUA bwa7xNAcjBX/UJBIhsQJuTZiRZ2UE2X/yy1WW3HoWfHiKXX3siL7GQ//V77rZrlo0lAZ 7klA==
X-Gm-Message-State: AOAM532l+YglBUZkTkgRu3w2U/dlFjnCDs/YRbGdiZ66hK0kOWea6G05 mGtM3jbujX6CMJB4OP3tHXE0kKwEyCGB5aDPxzQcgTEZIKg=
X-Google-Smtp-Source: ABdhPJzAo08QXHaizVFg4u5qSyLYVJfkBcWN7LlBd1bz3N0hZXC6ZCPhK8T957UG1Cjz2CfvsBYJG+FmQeWeIW+q300=
X-Received: by 2002:adf:e647:: with SMTP id b7mr3980831wrn.220.1596204619918;  Fri, 31 Jul 2020 07:10:19 -0700 (PDT)
MIME-Version: 1.0
From: Devin Mullins <twifkak@google.com>
Date: Fri, 31 Jul 2020 07:09:51 -0700
Message-ID: <CANjwSi=T4YeCZ=tmbs_dwCe_L8gPsh9yRxKxFLMwTn6BJ9WCrQ@mail.gmail.com>
To: wpack@ietf.org
Content-Type: multipart/alternative; boundary="0000000000005c496305abbd59d5"
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/nQ0ppPsE6FZ3vc3Zq83nGrL8t5E>
Subject: [Wpack] file: package URLs under rename
X-BeenThere: wpack@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Web Packaging <wpack.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/wpack>, <mailto:wpack-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/wpack/>
List-Post: <mailto:wpack@ietf.org>
List-Help: <mailto:wpack-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/wpack>, <mailto:wpack-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 31 Jul 2020 14:10:23 -0000

--0000000000005c496305abbd59d5
Content-Type: text/plain; charset="UTF-8"

To expand on Benjamin's question, there is potentially surprising behavior
here. If I download a bundle of a PWA (e.g. a game or a text editor) and
generate some storage, I may be surprised to find that storage disappear
when I rename the bundle on my hard drive.

I'm not sure if this is a use-case that is in scope or a priority for
anybody in this group? But it seems like a usage that would naturally arise
from the browser capability.

I'm not proposing any particular solution. If folks don't feel too strongly
about this, maybe it's fine if we just warn against this usage.
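
As a rough illustration of the surprise described above, here is a
minimal Python sketch. It assumes, hypothetically, that a browser keys
a bundle's storage on the bundle's file: URL. The names storage_key and
browser_storage are made up for this example and do not come from any
spec or browser.

```python
import os
import tempfile
from pathlib import Path


def storage_key(bundle_path):
    # Assumption for illustration: storage is keyed on the file: URL.
    return Path(os.path.abspath(bundle_path)).as_uri()


# Stand-in for per-bundle browser storage (hypothetical, not a real API).
browser_storage = {}

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "editor.wbn")
    renamed = os.path.join(tmp, "my-editor.wbn")
    with open(original, "wb") as f:
        f.write(b"placeholder bundle bytes")

    # The PWA in the bundle saves some state.
    browser_storage[storage_key(original)] = {"draft": "hello"}

    # The user renames the file outside the browser.
    os.rename(original, renamed)

    # Same bytes, different URL: the lookup misses the saved state.
    state_after_rename = browser_storage.get(storage_key(renamed))

print(state_after_rename)
```

Under that assumption the saved state is still present under the old
key, but nothing opened from the renamed file can reach it.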

--0000000000005c496305abbd59d5
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div>To expand on Benjamin&#39;s question, there is potent=
ially surprising behavior here. If I download a bundle of a PWA (e.g. a gam=
e or a text editor) and generate some storage, I may be surprised to find t=
hat storage disappear when I rename the bundle on my hard drive.</div><div>=
<br></div><div>I&#39;m not sure if this is a use-case that is in scope or a=
 priority for anybody in this group? But it seems like a usage that would n=
aturally arise from the browser capability.<br></div><div><br></div><div>I&=
#39;m not proposing any particular solution. If folks don&#39;t feel too st=
rongly about this, maybe it&#39;s fine if we just warn against this usage.<=
/div></div>

--0000000000005c496305abbd59d5--


From nobody Fri Jul 31 07:31:02 2020
Return-Path: <jyasskin@google.com>
X-Original-To: wpack@ietfa.amsl.com
Delivered-To: wpack@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 01AAF3A0B37 for <wpack@ietfa.amsl.com>; Fri, 31 Jul 2020 07:31:01 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -9.499
X-Spam-Level: 
X-Spam-Status: No, score=-9.499 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HEADER_FROM_DIFFERENT_DOMAINS=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001, USER_IN_DEF_SPF_WL=-7.5] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=chromium.org
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 6Js5St8Z7zgu for <wpack@ietfa.amsl.com>; Fri, 31 Jul 2020 07:30:59 -0700 (PDT)
Received: from mail-qv1-xf36.google.com (mail-qv1-xf36.google.com [IPv6:2607:f8b0:4864:20::f36]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id AC5F23A0AD5 for <wpack@ietf.org>; Fri, 31 Jul 2020 07:30:59 -0700 (PDT)
Received: by mail-qv1-xf36.google.com with SMTP id y11so11206978qvl.4 for <wpack@ietf.org>; Fri, 31 Jul 2020 07:30:59 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=Ai6oU9xrbQ/9i8YWA0g9R3Istcvm7FBzOi8QGaely1g=; b=ltHZRIT8nwMlXnUs3VjPWrgBI1un1GgSCYjkiPgh/n1AEh8DMnw4Ef58NDJvrjjI8h vlpr1BEFRV1WMv5ZGoPmh7lZRhybO3BO1v18o7D/3e2aSw6AkglgiN+WhltyG7J0Ztle HviGyWFV60S+0h7N7Cske9etycYymaTtvvLpg=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=Ai6oU9xrbQ/9i8YWA0g9R3Istcvm7FBzOi8QGaely1g=; b=sUm/JFuUAhpqzTEQOVAn1bSoBOxUGNeHs50IkwwklUwz4nJhSODd2gA1B86bJ/GzHd cnUC3NI9VlgV51kQzNUdgrwkBlxOQQg8NmI1Zq6ZJ8a12vWk4ALWGktXuBov8rQST4zf AABcqX9L2KRtlYbi2tcofTul5guogQ3Lp2EObUviXWMFulneZpRBkwkG0Cffr7pg1I70 h0riYzbDAIFV61lWINCAb00vvM42in9i/pemR+O9kZfJDufLdwyA4IotupRh3kmpzlHy 0g+4UE1Osxrxr1kubZu4NsXCUqzmmKsQz35DSxdi5Mb1yq6jLLmFRKyaoedeDNzD1TBN ekYg==
X-Gm-Message-State: AOAM5320P5zWicvWFj0IaZuJqMJjlTa9Q2+zOxDmna8MFX1XDSTjb3Tx +2L0p67D4A07ssZTRMKSV6J8QbpSVwR3DNlYHk0+mQ==
X-Google-Smtp-Source: ABdhPJxRl7bmklG8tXSRa3COZEjZbuaSQQqjWwI7Fnx9wdoWBnzSiJPC0EkFZGTMpod7BK8wE5AeeHfXqhW53zj7w+4=
X-Received: by 2002:ad4:4992:: with SMTP id t18mr4081418qvx.193.1596205858363;  Fri, 31 Jul 2020 07:30:58 -0700 (PDT)
MIME-Version: 1.0
References: <CANjwSi=T4YeCZ=tmbs_dwCe_L8gPsh9yRxKxFLMwTn6BJ9WCrQ@mail.gmail.com>
In-Reply-To: <CANjwSi=T4YeCZ=tmbs_dwCe_L8gPsh9yRxKxFLMwTn6BJ9WCrQ@mail.gmail.com>
From: Jeffrey Yasskin <jyasskin@chromium.org>
Date: Fri, 31 Jul 2020 07:30:46 -0700
Message-ID: <CANh-dX=3U81HTQBuvEcKg-moE8FO6Cc2hyhzdb5Fu89aTv5gZQ@mail.gmail.com>
To: Devin Mullins <twifkak=40google.com@dmarc.ietf.org>
Cc: WPACK List <wpack@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/q9YjuIdaIUwczbFPj4xZ3KdMlOY>
Subject: Re: [Wpack] file: package URLs under rename
X-BeenThere: wpack@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Web Packaging <wpack.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/wpack>, <mailto:wpack-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/wpack/>
List-Post: <mailto:wpack@ietf.org>
List-Help: <mailto:wpack-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/wpack>, <mailto:wpack-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 31 Jul 2020 14:31:01 -0000

https://github.com/WICG/webpackage/blob/master/explainers/bundle-urls-and-o=
rigins.md#downloading-bundles
covers the case where the browser downloads the bundle, but I
hadn't considered the case where the user renames the file outside
the browser. If the format included support for incrementally
hashing the content, the browser could at least tell the user
what's happened...

On Fri, Jul 31, 2020 at 7:10 AM Devin Mullins
<twifkak=3D40google.com@dmarc.ietf.org> wrote:
>
> To expand on Benjamin's question, there is potentially surprising behavio=
r here. If I download a bundle of a PWA (e.g. a game or a text editor) and =
generate some storage, I may be surprised to find that storage disappear wh=
en I rename the bundle on my hard drive.
>
> I'm not sure if this is a use-case that is in scope or a priority for any=
body in this group? But it seems like a usage that would naturally arise fr=
om the browser capability.
>
> I'm not proposing any particular solution. If folks don't feel too strong=
ly about this, maybe it's fine if we just warn against this usage.
> _______________________________________________
> Wpack mailing list
> Wpack@ietf.org
> https://www.ietf.org/mailman/listinfo/wpack

