
From nobody Sat Nov  3 23:07:46 2018
Return-Path: <rcross@amsl.com>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 34F84130DD7 for <tools-discuss@ietfa.amsl.com>; Sat,  3 Nov 2018 23:07:44 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.2
X-Spam-Level: 
X-Spam-Status: No, score=-4.2 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id kaHsOjRwIPZX for <tools-discuss@ietfa.amsl.com>; Sat,  3 Nov 2018 23:07:42 -0700 (PDT)
Received: from mail.amsl.com (c8a.amsl.com [4.31.198.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 5664D1271FF for <tools-discuss@ietf.org>; Sat,  3 Nov 2018 23:07:42 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1]) by c8a.amsl.com (Postfix) with ESMTP id C981D1C0439; Sat,  3 Nov 2018 23:07:30 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
Received: from c8a.amsl.com ([127.0.0.1]) by localhost (c8a.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id hivrie2QbHuW; Sat,  3 Nov 2018 23:07:30 -0700 (PDT)
Received: from [IPv6:2001:67c:370:128:b974:561d:5d2c:1f6f] (unknown [IPv6:2001:67c:370:128:b974:561d:5d2c:1f6f]) by c8a.amsl.com (Postfix) with ESMTPSA id DA5561C0092; Sat,  3 Nov 2018 23:07:29 -0700 (PDT)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 11.5 \(3445.9.1\))
From: Ryan Cross <rcross@amsl.com>
In-Reply-To: <B138260D-0523-49C8-B92E-CD929EBE40A8@tzi.org>
Date: Sun, 4 Nov 2018 13:07:37 +0700
Cc: "tools-discuss@ietf.org Discussion" <tools-discuss@ietf.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <64D7AF0F-45B4-448B-8E46-7FC63C249BEA@amsl.com>
References: <B138260D-0523-49C8-B92E-CD929EBE40A8@tzi.org>
To: Carsten Bormann <cabo@tzi.org>
X-Mailer: Apple Mail (2.3445.9.1)
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/Bcfa-9HnJ2bYbk2iqR2iDqwiAVQ>
Subject: Re: [Tools-discuss]  =?utf-8?q?Can=27t_search_for_=F0=9F=94=94_in_mai?= =?utf-8?q?l_archive?=
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sun, 04 Nov 2018 06:07:44 -0000

Hi Carsten,

This appears to be a bug.  I have raised a ticket and will investigate =
further.

Ryan

> On Oct 24, 2018, at 11:52 PM, Carsten Bormann <cabo@tzi.org> wrote:
>=20
> I just noticed that the mailarchive.ietf.org doesn=E2=80=99t allow me =
to search for =F0=9F=94=94
> It seems to act as if I hadn=E2=80=99t put in anything into the search =
box.
>=20
> Now, =F0=9F=94=94 is an =E2=80=9Castral=E2=80=9D character (came to =
unicode after 1990 or so), but that should not be a reason I can=E2=80=99t=
 search for it.
>=20
> (It also seems the search function doesn=E2=80=99t really return most =
of what should be hits, but I=E2=80=99m still investigating that.)
>=20
> Gr=C3=BC=C3=9Fe, Carsten
>=20
> ___________________________________________________________
> Tools-discuss mailing list
> Tools-discuss@ietf.org
> https://www.ietf.org/mailman/listinfo/tools-discuss
>=20
> Please report datatracker.ietf.org and mailarchive.ietf.org
> bugs at http://tools.ietf.org/tools/ietfdb
> or send email to datatracker-project@ietf.org
>=20
> Please report tools.ietf.org bugs at
> http://tools.ietf.org/tools/issues
> or send email to webmaster@tools.ietf.org
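
A note on the "astral" point quoted above: the search term is sent as
the UTF-8 octets F0 9F 94 94, which decode to U+1F514. "Astral" here
means a code point above U+FFFF, that is, outside the Basic
Multilingual Plane. The sketch below only illustrates that property;
the thread does not establish that this is the cause of the search bug,
and the helper name describe is made up for illustration.

```python
# Octets F0 9F 94 94, taken from the quoted message body.
BELL_UTF8 = bytes.fromhex("f09f9494")


def describe(ch: str) -> dict:
    """Summarise how a single character is represented (hypothetical helper)."""
    code_point = ord(ch)
    return {
        "code_point": f"U+{code_point:04X}",
        # Astral characters live above the Basic Multilingual Plane.
        "astral": code_point > 0xFFFF,
        # Number of octets in UTF-8.
        "utf8_len": len(ch.encode("utf-8")),
        # Number of 16-bit code units in UTF-16 (2 means a surrogate pair).
        "utf16_units": len(ch.encode("utf-16-be")) // 2,
    }


bell = BELL_UTF8.decode("utf-8")
info = describe(bell)
```

Software that assumes at most three UTF-8 octets per character, or one
UTF-16 code unit per character, can mishandle such input, so that is
one place a search backend might be checked.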


From nobody Sun Nov  4 06:43:23 2018
Return-Path: <mankamis@cisco.com>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 2FF18120072 for <tools-discuss@ietfa.amsl.com>; Sun,  4 Nov 2018 06:43:21 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -14.97
X-Spam-Level: 
X-Spam-Status: No, score=-14.97 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.47, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_HI=-5, SPF_PASS=-0.001, USER_IN_DEF_DKIM_WL=-7.5] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=cisco.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5RCIuSOKw2-A for <tools-discuss@ietfa.amsl.com>; Sun,  4 Nov 2018 06:43:19 -0800 (PST)
Received: from rcdn-iport-5.cisco.com (rcdn-iport-5.cisco.com [173.37.86.76]) (using TLSv1.2 with cipher DHE-RSA-SEED-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 3FA191274D0 for <tools-discuss@ietf.org>; Sun,  4 Nov 2018 06:43:19 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=cisco.com; i=@cisco.com; l=5680; q=dns/txt; s=iport; t=1541342599; x=1542552199; h=from:to:subject:date:message-id:mime-version; bh=KDpAprzUy3aS4/j0Cc/64FMsNi70yKdQ46IY+S2BUSk=; b=i2LMRwWNj2yc+IJPShZh72EQhZSsLSR1Un1Sv36YrX3Bl3RdX0hzo4G7 FatYSodUYjN8YHd2hqUoL9of2TyXjJeNfYxFPACWKumALRx6pWXbBLVzt dLzhYVYWha3r4K7jmdqUB3SG5lAOuoZ56k/t8UW/9z3RFpQEv5LaCrO1x Q=;
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: =?us-ascii?q?A0DFAACIBN9b/5RdJa1jHAEBAQQBAQc?= =?us-ascii?q?EAQGBVAQBAQsBgQ13Zn8yg2yULZNmh04LAQGFBYMnIjcKDQEDAQECAQECbR0?= =?us-ascii?q?LhWRoAQw+AgQwJwSDNAGBHWSmdIEuhTyEWIgSg2QXgUE/gTgfgh8BiC4xgiY?= =?us-ascii?q?CiTOFKxaQPgkCgV6PMRiBVYUAiguXHwIRFIEmMyKBVXAVOyoBgkKQWI18gR8?= =?us-ascii?q?BAQ?=
X-IronPort-AV: E=Sophos;i="5.54,464,1534809600";  d="scan'208,217";a="257624770"
Received: from rcdn-core-12.cisco.com ([173.37.93.148]) by rcdn-iport-5.cisco.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 04 Nov 2018 14:43:18 +0000
Received: from XCH-ALN-008.cisco.com (xch-aln-008.cisco.com [173.36.7.18]) by rcdn-core-12.cisco.com (8.15.2/8.15.2) with ESMTPS id wA4EhIbm011061 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=FAIL) for <tools-discuss@ietf.org>; Sun, 4 Nov 2018 14:43:18 GMT
Received: from xch-rcd-008.cisco.com (173.37.102.18) by XCH-ALN-008.cisco.com (173.36.7.18) with Microsoft SMTP Server (TLS) id 15.0.1395.4; Sun, 4 Nov 2018 08:43:17 -0600
Received: from xch-rcd-008.cisco.com ([173.37.102.18]) by XCH-RCD-008.cisco.com ([173.37.102.18]) with mapi id 15.00.1395.000; Sun, 4 Nov 2018 08:43:17 -0600
From: "Mankamana Mishra (mankamis)" <mankamis@cisco.com>
To: "tools-discuss@ietf.org" <tools-discuss@ietf.org>
Thread-Topic: Suggestion for note taking pad 
Thread-Index: AQHUdEy5UZpuSyVP/UyboeGaqI1xMQ==
Date: Sun, 4 Nov 2018 14:43:17 +0000
Message-ID: <09B271E2-9DA6-438A-B9CC-A807A59D9D1C@cisco.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-ms-exchange-messagesentrepresentingtype: 1
x-ms-exchange-transport-fromentityheader: Hosted
x-originating-ip: [10.24.106.111]
Content-Type: multipart/alternative; boundary="_000_09B271E29DA6438AB9CCA807A59D9D1Cciscocom_"
MIME-Version: 1.0
X-Outbound-SMTP-Client: 173.36.7.18, xch-aln-008.cisco.com
X-Outbound-Node: rcdn-core-12.cisco.com
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/M2xzvSUpwCYAUnFw4IZYBgvkDYI>
Subject: [Tools-discuss] Suggestion for note taking pad
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sun, 04 Nov 2018 14:43:21 -0000

--_000_09B271E29DA6438AB9CCA807A59D9D1Cciscocom_
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

Hi Team,

This might be a crazy idea; if it is, please ignore it. I was just
thinking out loud about whether we could add a capability to import a
person's name from the attendee list into the note-taking page. When
someone is taking notes and needs to write down the name of the person
asking a question or commenting on a draft, it is sometimes hard to get
the spelling right on the spot. One reason is that IETF has participants
from across the world, and names are sometimes not familiar.

So I was thinking: what if, while taking notes, I could type @manka and
it would automatically give me the name Mankamana Mishra, taken from the
attendee list?

Do you think there is any possibility of doing something like this?

Thanks
Mankamana

--_000_09B271E29DA6438AB9CCA807A59D9D1Cciscocom_--
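
The @-mention idea in this message can be sketched as a simple prefix
lookup. This is only an illustration under stated assumptions: the
function name complete_mention and the ATTENDEES list are hypothetical,
and nothing here reflects how the note-taking pad or the registration
system actually works.

```python
from typing import List


def complete_mention(token: str, attendees: List[str]) -> List[str]:
    """Return attendee names matching an '@prefix' token, case-insensitively.

    A name matches when any of its whitespace-separated parts starts
    with the prefix that follows the '@' sign.
    """
    if not token.startswith("@") or len(token) < 2:
        return []
    prefix = token[1:].casefold()
    return [
        name
        for name in attendees
        if any(part.casefold().startswith(prefix) for part in name.split())
    ]


# Hypothetical attendee list; real data would come from meeting registration.
ATTENDEES = ["Mankamana Mishra", "Alice Example", "Bob Example"]
```

Calling complete_mention("@manka", ATTENDEES) yields the single
candidate "Mankamana Mishra"; a pad integration would offer the
returned names as completions and let the note taker pick one.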


From nobody Fri Nov 16 12:38:46 2018
Return-Path: <kaduk@mit.edu>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 996F7130DC0 for <tools-discuss@ietfa.amsl.com>; Fri, 16 Nov 2018 12:38:44 -0800 (PST)
X-Quarantine-ID: <2UKWumBZ2KTR>
X-Virus-Scanned: amavisd-new at amsl.com
X-Amavis-Alert: BAD HEADER SECTION, Non-encoded 8-bit data (char 9C hex): Received: ...s kaduk@ATHENA.MIT.EDU)\n\t\234by outgoing.mit[...]
X-Spam-Flag: NO
X-Spam-Score: -4.199
X-Spam-Level: 
X-Spam-Status: No, score=-4.199 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, UNPARSEABLE_RELAY=0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 2UKWumBZ2KTR for <tools-discuss@ietfa.amsl.com>; Fri, 16 Nov 2018 12:38:43 -0800 (PST)
Received: from dmz-mailsec-scanner-8.mit.edu (dmz-mailsec-scanner-8.mit.edu [18.7.68.37]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 908EC128CF2 for <tools-discuss@ietf.org>; Fri, 16 Nov 2018 12:38:42 -0800 (PST)
X-AuditID: 12074425-67dff70000004ad8-3b-5bef2acf9a83
Received: from mailhub-auth-3.mit.edu ( [18.9.21.43]) (using TLS with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by dmz-mailsec-scanner-8.mit.edu (Symantec Messaging Gateway) with SMTP id 5C.80.19160.0DA2FEB5; Fri, 16 Nov 2018 15:38:40 -0500 (EST)
Received: from outgoing.mit.edu (OUTGOING-AUTH-1.MIT.EDU [18.9.28.11]) by mailhub-auth-3.mit.edu (8.14.7/8.9.2) with ESMTP id wAGKccqG006433 for <tools-discuss@ietf.org>; Fri, 16 Nov 2018 15:38:39 -0500
Received: from kduck.kaduk.org (24-107-191-124.dhcp.stls.mo.charter.com [24.107.191.124]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) œby outgoing.mit.edu (8.14.7/8.12.4) with ESMTP id wAGKcZY6008672 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for <tools-discuss@ietf.org>; Fri, 16 Nov 2018 15:38:37 -0500
Date: Fri, 16 Nov 2018 14:38:35 -0600
From: Benjamin Kaduk <kaduk@mit.edu>
To: tools-discuss@ietf.org
Message-ID: <20181116203549.GC11132@kduck.kaduk.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.9.1 (2017-09-22)
X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFtrNIsWRmVeSWpSXmKPExsUixCmqrXtB6320wYJpShbbj8xldGD0WLLk J1MAYxSXTUpqTmZZapG+XQJXxtK7LUwF/9kqptyLaGA8ztrFyMkhIWAisfzvT0YQW0hgDZPE p3dMXYxcQPY5RolL714xQzi/mCTuHTnEDFLFIqAq8efOTTYQm01ARaKh+zJYXERASmJa4yoW EFtYwEXixPtOsDgv0IYbiz8yQtiCEidnPgGrYRbQkrjx7yXQNg4gW1pi+T8OkLCogLLE3r5D 7BMYeWch6ZiFpGMWQscCRuZVjLIpuVW6uYmZOcWpybrFyYl5ealFuhZ6uZkleqkppZsYQWHE 7qK6g3HOX69DjAIcjEo8vAoP3kULsSaWFVfmHmKU5GBSEuXllnofLcSXlJ9SmZFYnBFfVJqT WnyIUYKDWUmE98cLoHLelMTKqtSifJiUNAeLkjjvH5HH0UIC6YklqdmpqQWpRTBZGQ4OJQne bE2goYJFqempFWmZOSUIaSYOTpDhPEDDN4LU8BYXJOYWZ6ZD5E8xKkqJ8x4ASQiAJDJK8+B6 QXEukb2/5hWjONArwrz7QKp4gCkCrvsV0GAmoMEnpr4GGVySiJCSamCUSrtka7FEKGrejy2v azPDNCev8fCJPSHAPmWFW7mQb5uqqtVmyz38C7bN9iz+51i2rmdiT3PPOqXqredjG87EcE2b Xuh3L7FkVtqtWRdcTmZb57omvFI6eW/djSWnU1SZrl1gk0tOzz/SzdVlqrvW7JxO79Z3RqoZ PybPkKyuTp/ycH1q6A8lluKMREMt5qLiRADwkU/kzgIAAA==
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/EF5z55NuS4-MNr9ErYPhDv03Ml4>
Subject: [Tools-discuss] changed alignment of main content in HTML versions of docs?
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 16 Nov 2018 20:38:45 -0000

Hi all,

I think that "recently" (in the past few weeks, maybe?  I was not good at
taking notes), the HTML rendering of documents on tools.ietf.org has
changed from having the main <pre> content being basically left-aligned to
being more center-aligned.  On the face of it this would be good, as
centering is kinder to the reader and meets modern expectations for page
behavior.  However, when I go to copy/paste a chunk of text, there is now a
huge left margin that I can't start my click+drag on in order to select
text.  Instead, I have to find something within the <pre>, and if I want
uniform indentation, I have to hunt for the actual boundary of the <pre>
where my cursor changes from a pointer to a text-selection 'I'.

Does this change ring a bell for anyone, and is there any hope for a tweak
that would make text-selection easier with the new layout?

Thanks,

Ben


From nobody Fri Nov 16 13:13:54 2018
Return-Path: <henrik@levkowetz.com>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 55043129AB8 for <tools-discuss@ietfa.amsl.com>; Fri, 16 Nov 2018 13:13:52 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.899
X-Spam-Level: 
X-Spam-Status: No, score=-1.899 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id AzDjecYBjYNJ for <tools-discuss@ietfa.amsl.com>; Fri, 16 Nov 2018 13:13:50 -0800 (PST)
Received: from zinfandel.tools.ietf.org (zinfandel.tools.ietf.org [IPv6:2001:1890:126c::1:2a]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id BA3FE127332 for <tools-discuss@ietf.org>; Fri, 16 Nov 2018 13:13:50 -0800 (PST)
Received: from h-37-140.a357.priv.bahnhof.se ([94.254.37.140]:51026 helo=tannat.localdomain) by zinfandel.tools.ietf.org with esmtpsa (TLS1.2:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.80) (envelope-from <henrik@levkowetz.com>) id 1gNlQz-0005Tu-WF; Fri, 16 Nov 2018 13:13:50 -0800
To: Benjamin Kaduk <kaduk@mit.edu>, tools-discuss@ietf.org
References: <20181116203549.GC11132@kduck.kaduk.org>
From: Henrik Levkowetz <henrik@levkowetz.com>
Message-ID: <3421ff27-e322-10de-793f-490b0f38330d@levkowetz.com>
Date: Fri, 16 Nov 2018 22:13:42 +0100
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <20181116203549.GC11132@kduck.kaduk.org>
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="ibGma5pjXifDsbk0fJm5LrKjw6DoK2jcM"
X-SA-Exim-Connect-IP: 94.254.37.140
X-SA-Exim-Rcpt-To: tools-discuss@ietf.org, kaduk@mit.edu
X-SA-Exim-Mail-From: henrik@levkowetz.com
X-SA-Exim-Version: 4.2.1 (built Mon, 26 Dec 2011 16:24:06 +0000)
X-SA-Exim-Scanned: Yes (on zinfandel.tools.ietf.org)
X-Clacks-Overhead: GNU Terry Pratchett
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/TF2twCi1tUEjSm1qntf2_STErYw>
Subject: Re: [Tools-discuss] changed alignment of main content in HTML versions of docs?
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 16 Nov 2018 21:13:52 -0000

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--ibGma5pjXifDsbk0fJm5LrKjw6DoK2jcM
Content-Type: multipart/mixed; boundary="jqCV6QU6mdN46fjAo4LqVQQTp07lv0CcG";
 protected-headers="v1"
From: Henrik Levkowetz <henrik@levkowetz.com>
To: Benjamin Kaduk <kaduk@mit.edu>, tools-discuss@ietf.org
Message-ID: <3421ff27-e322-10de-793f-490b0f38330d@levkowetz.com>
Subject: Re: [Tools-discuss] changed alignment of main content in HTML
 versions of docs?
References: <20181116203549.GC11132@kduck.kaduk.org>
In-Reply-To: <20181116203549.GC11132@kduck.kaduk.org>

--jqCV6QU6mdN46fjAo4LqVQQTp07lv0CcG
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Hi Ben,

On 2018-11-16 21:38, Benjamin Kaduk wrote:
> Hi all,
>=20
> I think that "recently" (in the past few weeks, maybe?  I was not good =
at
> taking notes), the HTML rendering of documents on tools.ietf.org has
> changed from having the main <pre> content being basically left-aligned=
 to
> being more center-aligned.  On the face of it this would be good, as
> centering is kinder to the reader and meets modern expectations for pag=
e
> behvaior.  However, when I go to copy/paste a chunk of text, there is n=
ow a
> huge left margin that I can't start my click+drag on in order to select=

> text.  Instead, I have to find something within the <pre>, and if I wan=
t
> uniform indentation, I have to hunt for the actual boundary of the <pre=
>
> where my cursor changes from a pointer to a text-selection 'I'.

Hmm.  Yes.  I see that this works as I'd expect in Firefox, but in Safari=

it will start the selection at the top of the page if I start the drag in=

the margin.

Ok, I'll see if there is a simple approach which will improve this.


Regards,

	Henrik


> Does this change ring a bell for anyone, and is there any hope for a tw=
eak
> that would make text-selection easier with the new layout?
>=20
> Thanks,
>=20
> Ben
>=20
> ___________________________________________________________
> Tools-discuss mailing list
> Tools-discuss@ietf.org
> https://www.ietf.org/mailman/listinfo/tools-discuss
>=20
> Please report datatracker.ietf.org and mailarchive.ietf.org
> bugs at http://tools.ietf.org/tools/ietfdb
> or send email to datatracker-project@ietf.org
>=20
> Please report tools.ietf.org bugs at
> http://tools.ietf.org/tools/issues
> or send email to webmaster@tools.ietf.org
>=20


--jqCV6QU6mdN46fjAo4LqVQQTp07lv0CcG--

--ibGma5pjXifDsbk0fJm5LrKjw6DoK2jcM
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEifjc5+rnL1MJBcZSTptXS4+7FxoFAlvvMwYACgkQTptXS4+7
Fxoa7w//dcHpegtWuX4z87jAM+d2FMP7zYO6rQX/qpMNWbcXXNsMJWJJj2dqKcOl
6fkHIQHZE5GyVxQ83QUzpadWYuxHiY8ZIQtX43cvZZi9yzJ742Kjrjl/PSuG1NLO
4Bzr5wVv4ffpcM9a2zmilg++8B45nS/IsC/ks4bcrovN5w8JfWgV4XvJEFu3NgGZ
VIiRScDKppjnNjDdBfhyv4uxUEigPAZBHpRye4NJwPHbO7bedpAx8WSavLhoojep
/omkJQGCKNc8oFhHEXApUGk6NWInyLHKTxdas7zyLQOzxA2MASmHgl4K4qDyb75T
Gsuj/M7tEGoYhu8ZxG/mKNMHV7yaRtKWIGsFB72WbSQIvJJZ4dYMkv9RgkzALjDx
BT4gz1ADGCKIHoBPPgjrh+v7Sj1T/Pe3sQtEnzqiO3zMw5V+pMhC4rFcUqrQV8Bu
y36y56CuOs4uDSHTzkk2y/jUtX27Ar/WEz6DT2uarnR8rwJYF/S3y/Jo8poMuRej
xXOvSPjB8Z3VCMvxqpDx140Fpn/14JIdCR6UrVuZkxWHnY/9L4yuPJfrlTAMhlol
UMyo1WDtf9dNVTtSHZF1xxsqKMUMjv5drMRObysaabt4Dg/QOv+xOF7CK2iw0VL4
rDgHFHe7PuOpklA5gF2KRM7YuaBEKJSxs2ha0Smgj8c4SXK2Pos=
=QY8R
-----END PGP SIGNATURE-----

--ibGma5pjXifDsbk0fJm5LrKjw6DoK2jcM--


From nobody Mon Nov 19 16:50:48 2018
Return-Path: <dkg@fifthhorseman.net>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 2544A128766 for <tools-discuss@ietfa.amsl.com>; Mon, 19 Nov 2018 16:50:47 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.19
X-Spam-Level: 
X-Spam-Status: No, score=-4.19 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, T_SPF_PERMERROR=0.01] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id fL42k4Hiawlt for <tools-discuss@ietfa.amsl.com>; Mon, 19 Nov 2018 16:50:44 -0800 (PST)
Received: from che.mayfirst.org (che.mayfirst.org [162.247.75.118]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id F221B124C04 for <tools-discuss@ietf.org>; Mon, 19 Nov 2018 16:50:43 -0800 (PST)
Received: from fifthhorseman.net (ool-6c3a0662.static.optonline.net [108.58.6.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by che.mayfirst.org (Postfix) with ESMTPSA id 0DC22F99A for <tools-discuss@ietf.org>; Mon, 19 Nov 2018 19:50:41 -0500 (EST)
Received: by fifthhorseman.net (Postfix, from userid 1000) id 5FDAF1FFEF; Mon, 19 Nov 2018 18:53:23 -0500 (EST)
From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: "tools-discuss\@ietf.org" <tools-discuss@ietf.org>
Date: Mon, 19 Nov 2018 18:53:18 -0500
Message-ID: <87zhu4oc9d.fsf@fifthhorseman.net>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha512; protocol="application/pgp-signature"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/qrKoyK9Q5_rFixW9zEWKlAb8q6k>
Subject: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Nov 2018 00:50:47 -0000

--=-=-=
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable

I ran the following to get all html files:

     rsync -az rsync.tools.ietf.org::tools.html/ html/

But i found that there were several files in that repo that contained
non-ASCII and non-UTF-8 octet sequences.

In particular, i ran:

    find html/ -iname '*.html' -print0 | xargs -0 file | grep -v 'ASCII tex=
t' | grep -v 'UTF-8 Unicode text'

and it produced this output:

html/draft-calabrese-requir-logprot-04.html:                     data
html/draft-dewinter-queue-start-00.html:                         data
html/draft-dewinter-queue-start-01.html:                         data
html/draft-duerst-iri-05.html:                                   data
html/draft-dusse-smime-msg-03.html:                              data
html/draft-felton-universal-language-00.html:                    data
html/draft-fujiwara-dnsop-resolver-update-00.html:               data
html/draft-iana-special-ipv4-03.html:                              data
html/draft-iana-special-ipv4-04.html:                              data
html/draft-ietf-2000-issue-00.html:                                data
html/draft-ietf-2000-issue-03.html:                                data
html/draft-ietf-2000-issue-04.html:                                data
html/draft-ietf-2000-issue-05.html:                                data
html/draft-ietf-2000-issue-06.html:                                data
html/draft-ietf-ccamp-mpls-tp-rsvpte-ext-associated-lsp-08.html: data
html/draft-ietf-dhc-mdhcp-00.html:                               data
html/draft-ietf-dhc-multopt-00.html:                             data
html/draft-ietf-disman-remops-mib-00.html:                       data
html/draft-ietf-disman-remops-mib-01.html:                       data
html/draft-ietf-dnsop-interim-signed-root-01.html:               data
html/draft-ietf-hubmib-mau-mib-03.html:                          data
html/draft-ietf-hubmib-mau-mib-04.html:                          data
html/draft-ietf-hubmib-repeater-dev-02.html:                     data
html/draft-ietf-hubmib-repeater-dev-03.html:                     data
html/draft-ietf-ipcdn-pktc-signaling-07.html:                   data
html/draft-ietf-ippm-ipdv-01.html:                              data
html/draft-ietf-isis-opexp-01.html:                             data
html/draft-ietf-ldapext-matchedval-06.html:                       data
html/draft-ietf-mpls-p2mp-loose-path-reopt-01.html:              data
html/draft-ietf-ospf-hmac-sha-01.html:                           data
html/draft-ietf-pkix-cmmf-02.html:                               data
html/draft-ietf-pkix-scvp-15.html:                               data
html/draft-ietf-rmonmib-pi-ipv6-01.html:                         data
html/draft-ietf-rtfm-new-traffic-flow-00.html:                   data
html/draft-ietf-smime-domsec-00.html:                                data
html/draft-ietf-spki-cert-req-00.html:                               data
html/draft-ietf-spki-cert-structure-00.html:                         data
html/draft-ihren-dnsop-interim-signed-root-01.html:                   data
html/draft-klensin-dns-role-01.html:                             data
html/draft-lear-ietf-rfc2026bis-00.html:                         data
html/draft-leung-sigtran-stream-sctp-00.html:                    data
html/draft-loughney-sctp-sig-prot-00.html:                        data
html/draft-murray-auth-ftp-ssl-03.html:                              data
html/draft-newnan-isomib-internet-00.html:                           data
html/draft-rabbat-ccamp-carrier-survey-01.html:                      data
html/draft-rfced-info-dudley-01.html:                                data
html/draft-vasseur-mpls-backup-computation-02.html:               data
html/draft-vrancken-oauth-redelegation-01.html:                   data
html/draft-wenzel-cctld-bcp-00.html:                              data
html/draft-white-slapm-mib-00.html:                               data
html/draft-worster-mpls-in-ip-02.html:                            data
html/draft-zhu-rmcat-nada-02.html:                              data
html/draft-zhu-rmcat-nada-03.html:                              data
html/rfc1142.html:                                              data
html/rfc542.html:                                            data
html/rfc652.html:                                            data
html/rfc674.html:                                            data
html/rfc684.html:                                            data
html/rfc731.html:                                            data
html/rfc734.html:                                            data
html/rfc736.html:                                            data
html/rfc752.html:                                            data
html/rfc774.html:                                            data
html/rfc776.html:                                            data
html/rfc783.html:                                            data

Are these things that can be cleaned up by a future html re-rendering?
they're likely to break html parsers or other transformations that work
from the html source.

Regards,

    --dkg

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iHUEARYKAB0WIQTTaP514aqS9uSbmdJsHx7ezFD6UwUCW/NM7gAKCRBsHx7ezFD6
U8hXAQCnN8PqoPdVYFOkUPveX3u5jRaE/WS6kXgz3SGFZ4fUjgD+PPGrieI76ifX
Q+e2bJCjvRgIwKmTquyhA9eHZ/PamAs=
=vQWf
-----END PGP SIGNATURE-----
--=-=-=--


From nobody Tue Nov 20 05:45:46 2018
Return-Path: <henrik@levkowetz.com>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 667F512D7F8 for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 05:45:44 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.899
X-Spam-Level: 
X-Spam-Status: No, score=-1.899 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id iLGz-UDhuxkJ for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 05:45:42 -0800 (PST)
Received: from zinfandel.tools.ietf.org (zinfandel.tools.ietf.org [IPv6:2001:1890:126c::1:2a]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 3CFD3127148 for <tools-discuss@ietf.org>; Tue, 20 Nov 2018 05:45:42 -0800 (PST)
Received: from h-37-140.a357.priv.bahnhof.se ([94.254.37.140]:49221 helo=tannat.localdomain) by zinfandel.tools.ietf.org with esmtpsa (TLS1.2:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.80) (envelope-from <henrik@levkowetz.com>) id 1gP6LT-0001iX-PF; Tue, 20 Nov 2018 05:45:41 -0800
To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>, "tools-discuss@ietf.org" <tools-discuss@ietf.org>
References: <87zhu4oc9d.fsf@fifthhorseman.net>
From: Henrik Levkowetz <henrik@levkowetz.com>
Message-ID: <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com>
Date: Tue, 20 Nov 2018 14:45:28 +0100
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <87zhu4oc9d.fsf@fifthhorseman.net>
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="KR60lnfTbUSjS7NA8Txg74TLK7P4VrLU5"
X-SA-Exim-Connect-IP: 94.254.37.140
X-SA-Exim-Rcpt-To: tools-discuss@ietf.org, dkg@fifthhorseman.net
X-SA-Exim-Mail-From: henrik@levkowetz.com
X-SA-Exim-Version: 4.2.1 (built Mon, 26 Dec 2011 16:24:06 +0000)
X-SA-Exim-Scanned: Yes (on zinfandel.tools.ietf.org)
X-Clacks-Overhead: GNU Terry Pratchett
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/YhMY_TJlJR0ruc_McW3Ql382IF8>
Subject: Re: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Nov 2018 13:45:45 -0000

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--KR60lnfTbUSjS7NA8Txg74TLK7P4VrLU5
Content-Type: multipart/mixed; boundary="6KXonG8KhRTpLTUvRoXPIAGlEX82o5LRG";
 protected-headers="v1"
From: Henrik Levkowetz <henrik@levkowetz.com>
To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>,
 "tools-discuss@ietf.org" <tools-discuss@ietf.org>
Message-ID: <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com>
Subject: Re: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
References: <87zhu4oc9d.fsf@fifthhorseman.net>
In-Reply-To: <87zhu4oc9d.fsf@fifthhorseman.net>

--6KXonG8KhRTpLTUvRoXPIAGlEX82o5LRG
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Hi Daniel,

On 2018-11-20 00:53, Daniel Kahn Gillmor wrote:
> I ran the following to get all html files:
>=20
>      rsync -az rsync.tools.ietf.org::tools.html/ html/
>=20
> But i found that there were several files in that repo that contained
> non-ASCII and non-UTF-8 octet sequences.

Yes.  Many of these (possibly all, but I haven't inspected each file
individually) are due to control characters in the text documents.
SUB (^Z) seems to be the most common one, but there are BS, DC2, SI,
ACK, FS, and more.

Yes, I could have a go at filtering out those at the same time as I
generate the htmlized versions.  I'll do so for control characters,
then we can revisit and see if additional action is needed.
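
A minimal sketch of that kind of filter, for illustration only; this is
not the htmlizer's actual code.  The helper name and the choice of
which characters to keep (TAB, LF, FF, CR) are assumptions; FF is kept
because paginated text documents use it as a page separator.

```python
# Hypothetical helper: delete C0 control characters (and DEL) that are
# not ordinary whitespace.  Bytes above 0x7f are left untouched.
KEEP = {0x09, 0x0A, 0x0C, 0x0D}  # TAB, LF, FF, CR (assumed safe to keep)

# The set of byte values to delete: 0x00-0x1f and 0x7f, minus KEEP.
DROP = bytes(b for b in list(range(0x20)) + [0x7F] if b not in KEEP)


def strip_control_chars(data: bytes) -> bytes:
    """Return data with SUB, BS, DC2, SI, ACK, FS, etc. removed."""
    # With a table of None, bytes.translate only deletes the given bytes.
    return data.translate(None, DROP)
```

Working on bytes rather than decoded text means the filter does not
need to know the encoding of the input.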


Best regards,

	Henrik

> In particular, i ran:
>=20
>     find html/ -iname '*.html' -print0 | xargs -0 file | grep -v 'ASCII=
 text' | grep -v 'UTF-8 Unicode text'
>=20
> and it produced this output:
>=20
> html/draft-calabrese-requir-logprot-04.html:                     data
> html/draft-dewinter-queue-start-00.html:                         data
> html/draft-dewinter-queue-start-01.html:                         data
> html/draft-duerst-iri-05.html:                                   data
> html/draft-dusse-smime-msg-03.html:                              data
> html/draft-felton-universal-language-00.html:                    data
> html/draft-fujiwara-dnsop-resolver-update-00.html:               data
> html/draft-iana-special-ipv4-03.html:                              data=

> html/draft-iana-special-ipv4-04.html:                              data=

> html/draft-ietf-2000-issue-00.html:                                data=

> html/draft-ietf-2000-issue-03.html:                                data=

> html/draft-ietf-2000-issue-04.html:                                data=

> html/draft-ietf-2000-issue-05.html:                                data=

> html/draft-ietf-2000-issue-06.html:                                data=

> html/draft-ietf-ccamp-mpls-tp-rsvpte-ext-associated-lsp-08.html: data
> html/draft-ietf-dhc-mdhcp-00.html:                               data
> html/draft-ietf-dhc-multopt-00.html:                             data
> html/draft-ietf-disman-remops-mib-00.html:                       data
> html/draft-ietf-disman-remops-mib-01.html:                       data
> html/draft-ietf-dnsop-interim-signed-root-01.html:               data
> html/draft-ietf-hubmib-mau-mib-03.html:                          data
> html/draft-ietf-hubmib-mau-mib-04.html:                          data
> html/draft-ietf-hubmib-repeater-dev-02.html:                     data
> html/draft-ietf-hubmib-repeater-dev-03.html:                     data
> html/draft-ietf-ipcdn-pktc-signaling-07.html:                   data
> html/draft-ietf-ippm-ipdv-01.html:                              data
> html/draft-ietf-isis-opexp-01.html:                             data
> html/draft-ietf-ldapext-matchedval-06.html:                       data
> html/draft-ietf-mpls-p2mp-loose-path-reopt-01.html:              data
> html/draft-ietf-ospf-hmac-sha-01.html:                           data
> html/draft-ietf-pkix-cmmf-02.html:                               data
> html/draft-ietf-pkix-scvp-15.html:                               data
> html/draft-ietf-rmonmib-pi-ipv6-01.html:                         data
> html/draft-ietf-rtfm-new-traffic-flow-00.html:                   data
> html/draft-ietf-smime-domsec-00.html:                                da=
ta
> html/draft-ietf-spki-cert-req-00.html:                               da=
ta
> html/draft-ietf-spki-cert-structure-00.html:                         da=
ta
> html/draft-ihren-dnsop-interim-signed-root-01.html:                   d=
ata
> html/draft-klensin-dns-role-01.html:                             data
> html/draft-lear-ietf-rfc2026bis-00.html:                         data
> html/draft-leung-sigtran-stream-sctp-00.html:                    data
> html/draft-loughney-sctp-sig-prot-00.html:                        data
> html/draft-murray-auth-ftp-ssl-03.html:                              da=
ta
> html/draft-newnan-isomib-internet-00.html:                           da=
ta
> html/draft-rabbat-ccamp-carrier-survey-01.html:                      da=
ta
> html/draft-rfced-info-dudley-01.html:                                  =
             data
> html/draft-vasseur-mpls-backup-computation-02.html:               data
> html/draft-vrancken-oauth-redelegation-01.html:                   data
> html/draft-wenzel-cctld-bcp-00.html:                              data
> html/draft-white-slapm-mib-00.html:                               data
> html/draft-worster-mpls-in-ip-02.html:                            data
> html/draft-zhu-rmcat-nada-02.html:                              data
> html/draft-zhu-rmcat-nada-03.html:                              data
> html/rfc1142.html:                                              data
> html/rfc542.html:                                            data
> html/rfc652.html:                                            data
> html/rfc674.html:                                            data
> html/rfc684.html:                                            data
> html/rfc731.html:                                            data
> html/rfc734.html:                                            data
> html/rfc736.html:                                            data
> html/rfc752.html:                                            data
> html/rfc774.html:                                            data
> html/rfc776.html:                                            data
> html/rfc783.html:                                            data
>=20
> Are these things that can be cleaned up by a future html re-rendering?
> they're likely to break html parsers or other transformations that work=

> from the html source.
>=20
> Regards,
>=20
>     --dkg


--6KXonG8KhRTpLTUvRoXPIAGlEX82o5LRG--

--KR60lnfTbUSjS7NA8Txg74TLK7P4VrLU5
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIyBAEBCAAdFiEEifjc5+rnL1MJBcZSTptXS4+7FxoFAlv0D/kACgkQTptXS4+7
Fxqb6g/4zK/sKz6MIpX7RfyjOi5siLfbjAZXTK6dvNIGC5btuQ570Nr9EOa8Q38t
Il6Xx+XQD7moNOMylG+ZSTLhgHegmyP4QRFTBvA5YkzLG8CiaBB+QE3158hJFRFC
FdVbEX9FK/6eVt859ZXaLhUmNnoMhrZE0nT0b9UYLmuMyLezPnsZDmzfEfi2CWr4
8mJPZz1mszH3azyOLCBpCZrxgHgRgDR77DS9Km6qI2gU6+sK4rsvBaHhEbI+Iwwq
bKP3tmjo03UvAlLjzGy4MTjBCIuuqABpmQDYB2JEjeikA3InzHikDb38Sf9WMGy7
ki6rCJeOhaMPlugzucfVz4OOIFE7kpzF1MNnT5TWRny3QMXgMc4jMfFN/E6Ggy+R
pe3OOMhTgHROlfJCXWJ3aLwZ1JY2j5GzwQb1txsGgHlvcjtEqrnZguidRZqNOXVd
LeuWT+iWMp9a/GmB4thKDuBPPds15rOsESeOUC/svtjmvktaMMzYW9QV5A0IOTWK
MQoc43HohzUcKzuvMBqPFebOb/bLEh+yPvytskBBxlGDvX6zIJoP5A0+PDC5BT26
O5Zg3lNEF3wkbEh2Yb0VvVJacE3iKKPm8wNi99htoojNeDE7TDlSQNdSExtyEC75
W5tMWW3zDrMliHyKeNtxDrpCWQuzi9fIk2CQWrr4wfFlwjP1KQ==
=z4iv
-----END PGP SIGNATURE-----

--KR60lnfTbUSjS7NA8Txg74TLK7P4VrLU5--


From nobody Tue Nov 20 08:18:14 2018
Return-Path: <dkg@fifthhorseman.net>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 2DD6812D4F2 for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 08:18:13 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.89
X-Spam-Level: 
X-Spam-Status: No, score=-1.89 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, T_SPF_PERMERROR=0.01] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id T3j2bJXW5uA7 for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 08:18:11 -0800 (PST)
Received: from che.mayfirst.org (che.mayfirst.org [IPv6:2001:470:1:116::7]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 5177112F295 for <tools-discuss@ietf.org>; Tue, 20 Nov 2018 08:18:11 -0800 (PST)
Received: from fifthhorseman.net (unknown [38.109.115.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by che.mayfirst.org (Postfix) with ESMTPSA id 0088DF99A; Tue, 20 Nov 2018 11:18:08 -0500 (EST)
Received: by fifthhorseman.net (Postfix, from userid 1000) id 8601F202A8; Tue, 20 Nov 2018 11:16:29 -0500 (EST)
From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: Henrik Levkowetz <henrik@levkowetz.com>, "tools-discuss\@ietf.org" <tools-discuss@ietf.org>
In-Reply-To: <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com>
References: <87zhu4oc9d.fsf@fifthhorseman.net> <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com>
Date: Tue, 20 Nov 2018 11:16:26 -0500
Message-ID: <87pnuzohb9.fsf@fifthhorseman.net>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha512; protocol="application/pgp-signature"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/Lam4F71WhBguGMAZOP-UsUO7FBg>
Subject: Re: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Nov 2018 16:18:13 -0000

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On Tue 2018-11-20 14:45:28 +0100, Henrik Levkowetz wrote:
> On 2018-11-20 00:53, Daniel Kahn Gillmor wrote:
>
>> i found that there were several files in that repo that contained
>> non-ASCII and non-UTF-8 octet sequences.
>
> Yes.  Many of these (possibly all, but I haven't inspected each file
> individually) are due to control characters in the text documents.
> SUB (^Z) seems to be the most common one, but there are BS, DC2, SI,
> ACK, FS, and more.

Thanks!  Filtering those out for the HTML would be good.  But that's
clearly not the only thing remaining, since some files contain high
bytes.

I did a review of all of the documents that contain octets above 0x7f:

 * draft-ietf-2000-issue-05.html is the most troubling one -- it appears
   to contain fragments of an embedded pdf document, if i'm reading it
   correctly.

 * draft-felton-universal-language-00.html contains what appears to be
   attempted embedded unicode Chinese, but it has been garbled.

 * draft-ietf-dnsop-interim-signed-root-01.html and
   draft-ihren-dnsop-interim-signed-root-01.html contain more mangled
   unicode in the author names.

 * draft-ietf-pkix-cmmf-02.html contains some kind of mangled em dashes
   and smartquotes

 * draft-ietf-rmonmib-pi-ipv6-01.html has a lot of weirdness in the very
   end of the document

 * draft-ietf-smime-domsec-00.html appears to have mangled smartquotes

 * draft-klensin-dns-role-01.html has some breakage around "Montr=C3=A9al"
   and 'German glyph "=C3=B6"'

 * draft-loughney-sctp-sig-prot-00.html contains some mangled characters
   in the organization addresses

 * draft-rabbat-ccamp-carrier-survey-01.html contains a weird character
   in the page headers between "Expires" and "April 2005"

 * draft-vasseur-mpls-backup-computation-02.html mangles "Espa=C3=B1a"

 * draft-worster-mpls-in-ip-02.html has a damaged Fax number for Rick
   Wilder
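
For anyone who wants to repeat a review like this, here is a rough
sketch of a check for high octets.  The function names are
hypothetical, and the byte strings in the usage note are made-up
samples, not content taken from the drafts.

```python
def classify(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'invalid' for a byte string."""
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # High octets are present but do not form valid UTF-8.
        return "invalid"
    return "utf-8"


def high_byte_offsets(data: bytes, limit: int = 10) -> list:
    """Return the offsets of the first `limit` octets above 0x7f."""
    return [i for i, b in enumerate(data) if b > 0x7F][:limit]
```

For example, classify(b"Montr\xe9al") returns "invalid", because a
lone 0xe9 is not a complete UTF-8 sequence, and
high_byte_offsets(b"Espa\xf1a") returns [4].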

Regards,

        --dkg

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iHUEARYKAB0WIQTTaP514aqS9uSbmdJsHx7ezFD6UwUCW/QzWgAKCRBsHx7ezFD6
U3nIAQDr3J6uDsG/XGDWKp4gMFNjzLMBzFqRxGDYMA5+wytONQEA5770T+CJxawu
HKOAqEbolATuY3dlNRA7NHoOEwSkkQ8=
=DRs0
-----END PGP SIGNATURE-----
--=-=-=--


From nobody Tue Nov 20 08:29:14 2018
Return-Path: <cabo@tzi.org>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id B0BEF12008A for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 08:29:12 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.199
X-Spam-Level: 
X-Spam-Status: No, score=-4.199 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 2gBwl03JV0zC for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 08:29:11 -0800 (PST)
Received: from mailhost.informatik.uni-bremen.de (mailhost.informatik.uni-bremen.de [IPv6:2001:638:708:30c9::12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 3ADDC12F1AC for <tools-discuss@ietf.org>; Tue, 20 Nov 2018 08:29:11 -0800 (PST)
X-Virus-Scanned: amavisd-new at informatik.uni-bremen.de
Received: from submithost.informatik.uni-bremen.de (submithost2.informatik.uni-bremen.de [134.102.200.7]) by mailhost.informatik.uni-bremen.de (8.14.5/8.14.5) with ESMTP id wAKGT06S024820; Tue, 20 Nov 2018 17:29:05 +0100 (CET)
Received: from sev.informatik.uni-bremen.de (sev.informatik.uni-bremen.de [134.102.218.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by submithost.informatik.uni-bremen.de (Postfix) with ESMTPSA id 42zrkN22wkz1Bqf; Tue, 20 Nov 2018 17:29:00 +0100 (CET)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 11.5 \(3445.9.1\))
From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <87pnuzohb9.fsf@fifthhorseman.net>
Date: Tue, 20 Nov 2018 17:28:59 +0100
Cc: Henrik Levkowetz <henrik@levkowetz.com>, "tools-discuss@ietf.org" <tools-discuss@ietf.org>
X-Mao-Original-Outgoing-Id: 564424136.373087-dff77675f7c7d757f5c084ce267918ff
Content-Transfer-Encoding: quoted-printable
Message-Id: <EDF23977-8580-4C65-8F45-BB210C7B7081@tzi.org>
References: <87zhu4oc9d.fsf@fifthhorseman.net> <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com> <87pnuzohb9.fsf@fifthhorseman.net>
To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
X-Mailer: Apple Mail (2.3445.9.1)
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/NbllRs2XtAmCmh8AdizvZmmVYZ4>
Subject: Re: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Nov 2018 16:29:13 -0000

On Nov 20, 2018, at 17:16, Daniel Kahn Gillmor <dkg@fifthhorseman.net> =
wrote:
>=20
> I did a review of all of the documents that contain octets above 0x7f:

Are these all expired I-Ds?

I don=E2=80=99t think anything can really be done about those, as the =
archival value of keeping them around is based on them being unchanged.

Gr=C3=BC=C3=9Fe, Carsten


From nobody Tue Nov 20 09:39:34 2018
Return-Path: <dkg@fifthhorseman.net>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 4091C12008A for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 09:39:33 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.189
X-Spam-Level: 
X-Spam-Status: No, score=-4.189 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, T_SPF_PERMERROR=0.01, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id TJPrxilPQxq3 for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 09:39:31 -0800 (PST)
Received: from che.mayfirst.org (che.mayfirst.org [162.247.75.118]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id D9DC6130DCC for <tools-discuss@ietf.org>; Tue, 20 Nov 2018 09:39:29 -0800 (PST)
Received: from fifthhorseman.net (unknown [38.109.115.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by che.mayfirst.org (Postfix) with ESMTPSA id 801B6F99E; Tue, 20 Nov 2018 12:39:27 -0500 (EST)
Received: by fifthhorseman.net (Postfix, from userid 1000) id F2DAD20387; Tue, 20 Nov 2018 12:31:01 -0500 (EST)
From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: Carsten Bormann <cabo@tzi.org>
Cc: Henrik Levkowetz <henrik@levkowetz.com>, "tools-discuss\@ietf.org" <tools-discuss@ietf.org>
In-Reply-To: <EDF23977-8580-4C65-8F45-BB210C7B7081@tzi.org>
References: <87zhu4oc9d.fsf@fifthhorseman.net> <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com> <87pnuzohb9.fsf@fifthhorseman.net> <EDF23977-8580-4C65-8F45-BB210C7B7081@tzi.org>
Date: Tue, 20 Nov 2018 12:30:58 -0500
Message-ID: <87k1l7odv1.fsf@fifthhorseman.net>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha512; protocol="application/pgp-signature"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/T4ZFLA5KW7lsb51-hu-s1Hv6Gk0>
Subject: Re: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Nov 2018 17:39:33 -0000

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On Tue 2018-11-20 17:28:59 +0100, Carsten Bormann wrote:
> On Nov 20, 2018, at 17:16, Daniel Kahn Gillmor <dkg@fifthhorseman.net> wr=
ote:
>>=20
>> I did a review of all of the documents that contain octets above 0x7f:
>
> Are these all expired I-Ds?
>
> I don=E2=80=99t think anything can really be done about those, as the arc=
hival
> value of keeping them around is based on them being unchanged.

what is the value of unchanged invalid bytestreams?  we re-render other
old documents when the html rendering process is improved.  why not do
the same for these if we know there is cleanup that will make the
document valid?

I'm willing to entertain specific arguments, but it's not clear to me
that we've ever held out that "these HTML renderings must remain
byte-for-byte identical" -- have we?

         --dkg

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iHUEARYKAB0WIQTTaP514aqS9uSbmdJsHx7ezFD6UwUCW/RE0wAKCRBsHx7ezFD6
U6m9AQDPr4GYt4/rsVcIvJ80AbUHjhgsODoe3tjBXEtWnBIGSwD+L/KL5Aias0+Z
8AHTXlGHNCNlhD1+3mh9E+5wq3rHGwA=
=xLat
-----END PGP SIGNATURE-----
--=-=-=--


From nobody Tue Nov 20 09:40:37 2018
Return-Path: <henrik@levkowetz.com>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id AA4E7130E99 for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 09:40:29 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.9
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 tagged_above=-999 required=5 tests=[BAYES_00=-1.9] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id G1xtj5Ra6UZE for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 09:40:27 -0800 (PST)
Received: from zinfandel.tools.ietf.org (zinfandel.tools.ietf.org [IPv6:2001:1890:126c::1:2a]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id E7359130E83 for <tools-discuss@ietf.org>; Tue, 20 Nov 2018 09:40:27 -0800 (PST)
Received: from h-37-140.a357.priv.bahnhof.se ([94.254.37.140]:50084 helo=tannat.localdomain) by zinfandel.tools.ietf.org with esmtpsa (TLS1.2:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.80) (envelope-from <henrik@levkowetz.com>) id 1gPA0h-00078M-5Z; Tue, 20 Nov 2018 09:40:27 -0800
To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>, "tools-discuss@ietf.org" <tools-discuss@ietf.org>
References: <87zhu4oc9d.fsf@fifthhorseman.net> <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com> <87pnuzohb9.fsf@fifthhorseman.net>
From: Henrik Levkowetz <henrik@levkowetz.com>
Message-ID: <cd480b02-f907-1bc5-1265-570d4277b28d@levkowetz.com>
Date: Tue, 20 Nov 2018 18:40:19 +0100
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <87pnuzohb9.fsf@fifthhorseman.net>
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="r0ONwNh9MCqXPiF0FBMk0uB5oBnknI4ms"
X-SA-Exim-Connect-IP: 94.254.37.140
X-SA-Exim-Rcpt-To: tools-discuss@ietf.org, dkg@fifthhorseman.net
X-SA-Exim-Mail-From: henrik@levkowetz.com
X-SA-Exim-Version: 4.2.1 (built Mon, 26 Dec 2011 16:24:06 +0000)
X-SA-Exim-Scanned: Yes (on zinfandel.tools.ietf.org)
X-Clacks-Overhead: GNU Terry Pratchett
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/TIRjTYCS0tjgSk9BkgQHhbeJU3U>
Subject: Re: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Nov 2018 17:40:36 -0000

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--r0ONwNh9MCqXPiF0FBMk0uB5oBnknI4ms
Content-Type: multipart/mixed; boundary="wnlj2p31IiV4dSOI4hBpuA2eIX5IxS6LN";
 protected-headers="v1"
From: Henrik Levkowetz <henrik@levkowetz.com>
To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>,
 "tools-discuss@ietf.org" <tools-discuss@ietf.org>
Message-ID: <cd480b02-f907-1bc5-1265-570d4277b28d@levkowetz.com>
Subject: Re: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
References: <87zhu4oc9d.fsf@fifthhorseman.net>
 <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com>
 <87pnuzohb9.fsf@fifthhorseman.net>
In-Reply-To: <87pnuzohb9.fsf@fifthhorseman.net>

--wnlj2p31IiV4dSOI4hBpuA2eIX5IxS6LN
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 2018-11-20 17:16, Daniel Kahn Gillmor wrote:
> On Tue 2018-11-20 14:45:28 +0100, Henrik Levkowetz wrote:
>> On 2018-11-20 00:53, Daniel Kahn Gillmor wrote:
>>
>>> i found that there were several files in that repo that contained
>>> non-ASCII and non-UTF-8 octet sequences.
>>
>> Yes.  Many of these (possibly all, but I haven't inspected each file
>> individually) are due to control characters in the text documents.
>> SUB (^Z) seems to be the most common one, but there are BS, DC2, SI,
>> ACK, FS, and more.
>=20
> Thanks!  Filtering those out for the HTML would be good.  But that's
> clearly not the only thing remaining, since some files contain high
> bytes.
>=20
> I did a review of all of the documents that contain octets above 0x7f:

I've looked through the equivalent .txt files, and I believe that what
all of them except draft-ietf-2000-issue have in common is that they use
unknown code pages: not latin-1 and certainly not utf-8.  An attempt to
check some of them against other, less common code pages failed to find
a match.
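
A sketch of how such a check might be done, not the procedure actually
used here.  The candidate list and the function name are assumptions.
Note that latin-1 decodes any byte sequence without error, so a
successful decode proves nothing by itself; the decoded text still has
to be inspected by eye.

```python
# Hypothetical candidate list; any codec name known to Python works.
CANDIDATES = ["utf-8", "latin-1", "cp1252", "cp437", "cp850", "mac_roman"]


def try_codecs(data: bytes, codecs=CANDIDATES) -> dict:
    """Map each codec name to the decoded text, or None if decoding fails."""
    results = {}
    for name in codecs:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results
```

With a made-up sample such as b"Espa\xf1a", utf-8 fails to decode,
latin-1 and cp1252 give a plausible word, and cp437 gives a word with
a plus-minus sign in it, which is how a wrong guess typically shows up.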

Any fixup of these would have to be built into the htmlizer as custom
tweaks for each specific document (except for draft-ietf-2000-issue,
where some other solution would be needed).

All of these are old: I think the latest is from 2005, two are from
2004, and the rest are earlier.  I doubt it's meaningful to add custom
fixes...


Best regards,

	Henrik

>  * draft-ietf-2000-issue-05.html is the most troubling one -- it appear=
s
>    to contain fragments of an embedded pdf document, if i'm reading it
>    correctly.
>=20
>  * draft-felton-universal-language-00.html contains what appears to be
>    attempted embedded unicode Chinese, but it has been garbled.
>=20
>  * draft-ietf-dnsop-interim-signed-root-01.html and
>    draft-ihren-dnsop-interim-signed-root-01.html contain more mangled
>    unicode in the author names.
>=20
>  * draft-ietf-pkix-cmmf-02.html contains some kind of mangled em dashes=

>    and smartquotes
>=20
>  * draft-ietf-rmonmib-pi-ipv6-01.html has a lot of weirdness in the ver=
y
>    end of the document
>=20
>  * draft-ietf-smime-domsec-00.html appears to have mangled smartquotes
>=20
>  * draft-klensin-dns-role-01.html has some breakage around "Montr=C3=A9=
al"
>    and 'German glyph "=C3=B6"'
>=20
>  * draft-loughney-sctp-sig-prot-00.html contains some mangled character=
s
>    in the organization addresses
>=20
>  * draft-rabbat-ccamp-carrier-survey-01.html contains a weird character=

>    in the page headers between "Expires" and "April 2005"
>=20
>  * draft-vasseur-mpls-backup-computation-02.html mangles "Espa=C3=B1a"
>=20
>  * draft-worster-mpls-in-ip-02.html has a damaged Fax number for Rick
>    Wilder
>=20
> Regards,
>=20
>         --dkg
>=20


--wnlj2p31IiV4dSOI4hBpuA2eIX5IxS6LN--

--r0ONwNh9MCqXPiF0FBMk0uB5oBnknI4ms
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEifjc5+rnL1MJBcZSTptXS4+7FxoFAlv0RwMACgkQTptXS4+7
Fxq/5RAAs6kPTjQjWTlpLFIYQTqtT5iGpxI6Z+Xq2qAdJWWgaLV7t+NiJBO6JVOv
nCuUTMgAMurNduBrQ3JJtPZWNG7Yl9xAxS829T9NZQC1p8nfXO5RU9Om9tjHoZbb
pN7eZC1G3TQNBIBya0lErPB2t/SldOQcsSZCIXjZInQNHjhBvsPzZdUL3Q0F7TRr
h2f/Se1VaIb2MgpB8H31vQ5FMzo8e2aRsPiUE/v3/8bDDsHGAV4PM8QUvcm9ph6h
ETrknHa3gATbqgK1c1/gBQ12JRqi3RorWX37LxDQ+/46vpbc6tLzDkuZxXFZ6N+X
aX18oAVCw3IOn6/fA26ACRwFtjIHYvqXNqfJHNtmvRepfxoH8S1WZi6XQuMbzo0Z
oINWQbySZpJ290h9zJgDMWlTTBEWBUkWSgHbiY/4jXJr/RhLgsKv3rlZ+1JnebXV
4SqLa7mQV/RVfoBZWd+AAc/H3rMFDyJeadLDwNwOxXdRyoSHptFkAEQYUBQtjY05
l3aw1nrwL6nRqX4mU5ku0+BqKUD+Q3C/NFiAUhNf+LZQ9jV6UTJ46U+Wg/rX/FPb
Ol5Qrwf76lNB3oCNDEkJxALn8ZSZLtzCKXhJFMvLeSJxiX+Yctm/b8XKdYPbCnE+
8hn5Wzq61OVZTXZTKZ9q58piJn7fSdy9G3XZlYcwhRhHsVCClgU=
=yET7
-----END PGP SIGNATURE-----

--r0ONwNh9MCqXPiF0FBMk0uB5oBnknI4ms--


From nobody Tue Nov 20 10:11:20 2018
Return-Path: <dkg@fifthhorseman.net>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id CFEB1130DCD for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 10:11:18 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.89
X-Spam-Level: 
X-Spam-Status: No, score=-1.89 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, T_SPF_PERMERROR=0.01] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id s1rFZ090W2ag for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 10:11:17 -0800 (PST)
Received: from che.mayfirst.org (che.mayfirst.org [IPv6:2001:470:1:116::7]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 33B66130DCF for <tools-discuss@ietf.org>; Tue, 20 Nov 2018 10:11:09 -0800 (PST)
Received: from fifthhorseman.net (unknown [38.109.115.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by che.mayfirst.org (Postfix) with ESMTPSA id 5BC37F99B; Tue, 20 Nov 2018 13:11:07 -0500 (EST)
Received: by fifthhorseman.net (Postfix, from userid 1000) id 163B220387; Tue, 20 Nov 2018 13:11:04 -0500 (EST)
From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: Henrik Levkowetz <henrik@levkowetz.com>, "tools-discuss\@ietf.org" <tools-discuss@ietf.org>
In-Reply-To: <cd480b02-f907-1bc5-1265-570d4277b28d@levkowetz.com>
References: <87zhu4oc9d.fsf@fifthhorseman.net> <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com> <87pnuzohb9.fsf@fifthhorseman.net> <cd480b02-f907-1bc5-1265-570d4277b28d@levkowetz.com>
Date: Tue, 20 Nov 2018 13:11:00 -0500
Message-ID: <87bm6joc0b.fsf@fifthhorseman.net>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha512; protocol="application/pgp-signature"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/_OVpNkxLf3uwGNJtgP_NFa5rHgw>
Subject: Re: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Nov 2018 18:11:19 -0000

--=-=-=
Content-Type: text/plain

On Tue 2018-11-20 18:40:19 +0100, Henrik Levkowetz wrote:
> I've looked through the equivalent .txt files, and I believe that all of
> them except draft-ietf-2000-issue use unknown code pages: not latin-1 and
> certainly not utf-8.  An attempt to check some of them against other, less
> common code pages failed to find a match.

yep, i think there's been some sort of multi-trip conversion that
happened to them in the past, which makes them not quite right.
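
As an illustration of how a multi-trip conversion can damage text, here
is a minimal sketch.  The sample string and the latin-1 misreading are
assumptions chosen for the example, not a diagnosis of what actually
happened to these drafts:

```python
# Illustrative only: UTF-8 bytes misread as latin-1, then saved as UTF-8 again.
original = u"Espa\u00f1a"

# Trip 1: correct UTF-8 bytes are wrongly decoded as latin-1.
garbled = original.encode("utf-8").decode("latin-1")

# Trip 2: the garbled text is written out as UTF-8 again.
stored = garbled.encode("utf-8")

# If the wrong codec is known, the damage can be undone step by step.
repaired = stored.decode("utf-8").encode("latin-1").decode("utf-8")
```

When the intermediate code page is unknown, as seems to be the case
here, this kind of mechanical reversal is not available.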

> Any fixup of these would have to be built into the htmlizer as custom
> tweaks for each specific document (except for draft-ietf-2000-issue,
> where some other solution would be needed).
>
> All of these are old, I think the latest from 2005; two from 2004; and
> the rest earlier.  I doubt it's meaningful to add custom fixes...

What does "i doubt it's meaningful" mean?  I'm working on trying to
import all of these drafts into a single index as a first step toward
providing nice/simple search and inference about these drafts.  It's
certainly meaningful to me to have syntactically valid documents for
doing this kind of work.

Of course, i could add custom fixes to the docs myself before i index
them, but that seems like the wrong way to fix things for the broader
community.
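
For reference, a minimal sketch of the kind of validity check an
indexer might run before importing a file.  The function name is
hypothetical, and it only tests for well-formed UTF-8, nothing more:

```python
def first_invalid_utf8(raw):
    """Return the offset of the first byte that breaks UTF-8 decoding.

    `raw` is the file content as bytes.  Returns None when the whole
    input is well-formed UTF-8 (plain ASCII included).
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None
```

Feeding it the bytes of each file (opened in binary mode) would flag
the documents that need attention and show where the trouble starts.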

        --dkg

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iHUEARYKAB0WIQTTaP514aqS9uSbmdJsHx7ezFD6UwUCW/RONAAKCRBsHx7ezFD6
U0YyAP9Y5twF7qOZkxH8H3y1GpmUzf2WYQ7mdX4jikz9hHhoYQEA2LpFKSBhlWPN
WE6FkC5Ntv07asUIz/ppMDBCxYcEkwA=
=CuBV
-----END PGP SIGNATURE-----
--=-=-=--


From nobody Tue Nov 20 10:41:59 2018
Return-Path: <paul.hoffman@vpnc.org>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id E0F58130DCF for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 10:41:57 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.9
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 tagged_above=-999 required=5 tests=[BAYES_00=-1.9] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Vq3STgM_XEcg for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 10:41:56 -0800 (PST)
Received: from mail.proper.com (Opus1.Proper.COM [207.182.41.91]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 918C51277BB for <tools-discuss@ietf.org>; Tue, 20 Nov 2018 10:41:56 -0800 (PST)
Received: from [10.32.60.84] (50-1-51-141.dsl.dynamic.fusionbroadband.com [50.1.51.141]) (authenticated bits=0) by mail.proper.com (8.15.2/8.15.2) with ESMTPSA id wAKIet3Q052569 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Tue, 20 Nov 2018 11:40:57 -0700 (MST) (envelope-from paul.hoffman@vpnc.org)
X-Authentication-Warning: mail.proper.com: Host 50-1-51-141.dsl.dynamic.fusionbroadband.com [50.1.51.141] claimed to be [10.32.60.84]
From: "Paul Hoffman" <paul.hoffman@vpnc.org>
To: "Daniel Kahn Gillmor" <dkg@fifthhorseman.net>
Cc: tools-discuss@ietf.org
Date: Tue, 20 Nov 2018 10:41:51 -0800
X-Mailer: MailMate (1.12.1r5552)
Message-ID: <8EBB9EBA-7893-4E93-8AD6-7BB122D90ACD@vpnc.org>
In-Reply-To: <87bm6joc0b.fsf@fifthhorseman.net>
References: <87zhu4oc9d.fsf@fifthhorseman.net> <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com> <87pnuzohb9.fsf@fifthhorseman.net> <cd480b02-f907-1bc5-1265-570d4277b28d@levkowetz.com> <87bm6joc0b.fsf@fifthhorseman.net>
MIME-Version: 1.0
Content-Type: text/plain; format=flowed
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/dt4-mxZzBWDC28IscRpqiCRL7tU>
Subject: Re: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Nov 2018 18:41:58 -0000

On 20 Nov 2018, at 10:11, Daniel Kahn Gillmor wrote:

> What does "i doubt it's meaningful" mean?  I'm working on trying to
> import all of these drafts into a single index as a first step toward
> providing nice/simple search and inference about these drafts.  It's
> certainly meaningful to me to have syntactically valid documents for
> doing this kind of work.

Why are you doing this on the HTML instead of the source text files? The 
latter seems less likely to be borked.

--Paul


From nobody Tue Nov 20 10:58:51 2018
Return-Path: <henrik@levkowetz.com>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id A1B8F130DC7 for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 10:58:49 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.9
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 tagged_above=-999 required=5 tests=[BAYES_00=-1.9] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ksQUZoRsju4W for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 10:58:48 -0800 (PST)
Received: from zinfandel.tools.ietf.org (zinfandel.tools.ietf.org [IPv6:2001:1890:126c::1:2a]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 61776130DC6 for <tools-discuss@ietf.org>; Tue, 20 Nov 2018 10:58:48 -0800 (PST)
Received: from h-37-140.a357.priv.bahnhof.se ([94.254.37.140]:50406 helo=tannat.localdomain) by zinfandel.tools.ietf.org with esmtpsa (TLS1.2:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.80) (envelope-from <henrik@levkowetz.com>) id 1gPBEU-0000wV-H4; Tue, 20 Nov 2018 10:58:47 -0800
To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>, "tools-discuss@ietf.org" <tools-discuss@ietf.org>
References: <87zhu4oc9d.fsf@fifthhorseman.net> <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com> <87pnuzohb9.fsf@fifthhorseman.net> <cd480b02-f907-1bc5-1265-570d4277b28d@levkowetz.com> <87bm6joc0b.fsf@fifthhorseman.net>
From: Henrik Levkowetz <henrik@levkowetz.com>
Message-ID: <093a816b-e7ee-d2cb-3583-67f89d4b7e4d@levkowetz.com>
Date: Tue, 20 Nov 2018 19:58:36 +0100
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <87bm6joc0b.fsf@fifthhorseman.net>
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="1ALFgiduO9CpiD3D2SDsLQiW1s2CCxdq8"
X-SA-Exim-Connect-IP: 94.254.37.140
X-SA-Exim-Rcpt-To: tools-discuss@ietf.org, dkg@fifthhorseman.net
X-SA-Exim-Mail-From: henrik@levkowetz.com
X-SA-Exim-Version: 4.2.1 (built Mon, 26 Dec 2011 16:24:06 +0000)
X-SA-Exim-Scanned: Yes (on zinfandel.tools.ietf.org)
X-Clacks-Overhead: GNU Terry Pratchett
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/2NMKBQsuFckj6pbPRTZwvpFtipc>
Subject: Re: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Nov 2018 18:58:50 -0000

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--1ALFgiduO9CpiD3D2SDsLQiW1s2CCxdq8
Content-Type: multipart/mixed; boundary="vcHXD5B0ds3WV7INMJcWpSP3SHjAVEDfQ";
 protected-headers="v1"
From: Henrik Levkowetz <henrik@levkowetz.com>
To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>,
 "tools-discuss@ietf.org" <tools-discuss@ietf.org>
Message-ID: <093a816b-e7ee-d2cb-3583-67f89d4b7e4d@levkowetz.com>
Subject: Re: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
References: <87zhu4oc9d.fsf@fifthhorseman.net>
 <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com>
 <87pnuzohb9.fsf@fifthhorseman.net>
 <cd480b02-f907-1bc5-1265-570d4277b28d@levkowetz.com>
 <87bm6joc0b.fsf@fifthhorseman.net>
In-Reply-To: <87bm6joc0b.fsf@fifthhorseman.net>

--vcHXD5B0ds3WV7INMJcWpSP3SHjAVEDfQ
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 2018-11-20 19:11, Daniel Kahn Gillmor wrote:
> On Tue 2018-11-20 18:40:19 +0100, Henrik Levkowetz wrote:
>> I've looked through the equivalent .txt files, and I believe that all
>> of them except draft-ietf-2000-issue use unknown code pages: not
>> latin-1 and certainly not utf-8.  An attempt to check some of them
>> against other, less common code pages failed to find a match.
>=20
> yep, i think there's been some sort of multi-trip conversion that
> happened to them in the past, which makes them not quite right.
>=20
>> Any fixup of these would have to be built into the htmlizer as custom
>> tweaks for each specific document (except for draft-ietf-2000-issue,
>> where some other solution would be needed).
>>
>> All of these are old, I think the latest from 2005; two from 2004; and=

>> the rest earlier.  I doubt it's meaningful to add custom fixes...
>=20
> What does "i doubt it's meaningful" mean?

It means that I already work way too many hours per week and have a hard
time justifying putting hours into this, I'm afraid.  It is more a matter=

of what I can justify at the moment, not an attempt to state what's
meaningful in a wider context.

> I'm working on trying to
> import all of these drafts into a single index as a first step toward
> providing nice/simple search and inference about these drafts.  It's
> certainly meaningful to me to have syntactically valid documents for
> doing this kind of work.

Good background info; makes me understand better what's driving this.

> Of course, i could add custom fixes to the docs myself before i index
> them, but that seems like the wrong way to fix things for the broader
> community.

Ack.

If you could provide it, I can easily insert python code on this pattern
into the htmlizer:

    if draftname =3D=3D "draft-ietf-dnsop-interim-signed-root-01":
        text =3D text.replace(u"F\x84ltstr\xF7m", u"F\u00e4ltstr\u00f6m")
        text =3D text.replace(u"Ihr\x89n", u"Ihr\u00e9n")
        # ...


Best,

	Henrik


--vcHXD5B0ds3WV7INMJcWpSP3SHjAVEDfQ--

--1ALFgiduO9CpiD3D2SDsLQiW1s2CCxdq8
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEifjc5+rnL1MJBcZSTptXS4+7FxoFAlv0WVwACgkQTptXS4+7
FxrIuBAAmu9U6IyuBq7W02QvJr1T+HDP3TBpYa1UYI9MmVIVW9ITsDud9tCjssqo
H/Fwhc3+5QTu444uy+g4/Me+bQt0/6KDPKwI4Uua7UTxhKvx++Om0EoRaW18xH3K
hCxRTo/CM+KyKvqUg02z/DCv5UTXFzmbeUQM2HJSumbLlMzM3GOBEE4d7mlcvXsU
3ZZsF0rwvuVxk5cf3XP4XZVh3XB+TsQXoDH7ThOONW6jn1cNdFpV+VCy6jGp0+ft
kSAk19SKjGVR94ONmJITpvHqL02ICstb4uiptmZeM2uUZNlXyzShgqry0IA3waXq
aa5YZiOBRRSOctl1VjLwjpntImQ+fg3tG7W2KnnHdbYeXH9agP77qu33Fk0Ngw+A
RwpU1iCCer345pa/wC0RPNtbmyWI80tgT4gpFPHUv30Id5H3le4dl0uasKp39u/t
fOsJCzZvIJKkt0A1jySJXkq75H6B/GjgkGJGbCf39mI5aktb/ouA6quHnleE1eIH
P6ESU3PnjzdoWvp/7PNqSnoOM/HOdgfaoevERL7BP2pS0nDbzszOo1KqMlsudUs1
IBCKbLBzJYqdreL8N05WItFKYFnwcvT1N23WZqSBG2Hl+LglqEoPg2dk9SdwpiSF
65C8LUIrCZPvwaRRWfG5fGuGGVBgdqk/oWoPsUBXBNV7xPzmxRA=
=Y2nu
-----END PGP SIGNATURE-----

--1ALFgiduO9CpiD3D2SDsLQiW1s2CCxdq8--


From nobody Tue Nov 20 11:05:35 2018
Return-Path: <henrik@levkowetz.com>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 1778F12D4E9 for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 11:05:34 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.9
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 tagged_above=-999 required=5 tests=[BAYES_00=-1.9] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 0w2je9Y1HCOX for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 11:05:33 -0800 (PST)
Received: from zinfandel.tools.ietf.org (zinfandel.tools.ietf.org [IPv6:2001:1890:126c::1:2a]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 0BC521277BB for <tools-discuss@ietf.org>; Tue, 20 Nov 2018 11:05:33 -0800 (PST)
Received: from h-37-140.a357.priv.bahnhof.se ([94.254.37.140]:50445 helo=tannat.localdomain) by zinfandel.tools.ietf.org with esmtpsa (TLS1.2:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.80) (envelope-from <henrik@levkowetz.com>) id 1gPBL1-0002fY-Cj; Tue, 20 Nov 2018 11:05:32 -0800
To: Paul Hoffman <paul.hoffman@vpnc.org>, Daniel Kahn Gillmor <dkg@fifthhorseman.net>
References: <87zhu4oc9d.fsf@fifthhorseman.net> <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com> <87pnuzohb9.fsf@fifthhorseman.net> <cd480b02-f907-1bc5-1265-570d4277b28d@levkowetz.com> <87bm6joc0b.fsf@fifthhorseman.net> <8EBB9EBA-7893-4E93-8AD6-7BB122D90ACD@vpnc.org>
Cc: tools-discuss@ietf.org
From: Henrik Levkowetz <henrik@levkowetz.com>
Message-ID: <c4b046cf-f474-8b70-4ff0-e0182e4c0b3c@levkowetz.com>
Date: Tue, 20 Nov 2018 20:05:21 +0100
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <8EBB9EBA-7893-4E93-8AD6-7BB122D90ACD@vpnc.org>
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="5695lS64SatGFBOSATfxBnK2tqCjJEn6v"
X-SA-Exim-Connect-IP: 94.254.37.140
X-SA-Exim-Rcpt-To: tools-discuss@ietf.org, dkg@fifthhorseman.net, paul.hoffman@vpnc.org
X-SA-Exim-Mail-From: henrik@levkowetz.com
X-SA-Exim-Version: 4.2.1 (built Mon, 26 Dec 2011 16:24:06 +0000)
X-SA-Exim-Scanned: Yes (on zinfandel.tools.ietf.org)
X-Clacks-Overhead: GNU Terry Pratchett
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/w2mX2EaogwIAPtrVqDiAcZMw6pY>
Subject: Re: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Nov 2018 19:05:34 -0000

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--5695lS64SatGFBOSATfxBnK2tqCjJEn6v
Content-Type: multipart/mixed; boundary="A9B2Wg1L0TD8FfRflw0t68JipXxQOp79O";
 protected-headers="v1"
From: Henrik Levkowetz <henrik@levkowetz.com>
To: Paul Hoffman <paul.hoffman@vpnc.org>,
 Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Cc: tools-discuss@ietf.org
Message-ID: <c4b046cf-f474-8b70-4ff0-e0182e4c0b3c@levkowetz.com>
Subject: Re: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
References: <87zhu4oc9d.fsf@fifthhorseman.net>
 <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com>
 <87pnuzohb9.fsf@fifthhorseman.net>
 <cd480b02-f907-1bc5-1265-570d4277b28d@levkowetz.com>
 <87bm6joc0b.fsf@fifthhorseman.net>
 <8EBB9EBA-7893-4E93-8AD6-7BB122D90ACD@vpnc.org>
In-Reply-To: <8EBB9EBA-7893-4E93-8AD6-7BB122D90ACD@vpnc.org>

--A9B2Wg1L0TD8FfRflw0t68JipXxQOp79O
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 2018-11-20 19:41, Paul Hoffman wrote:
> On 20 Nov 2018, at 10:11, Daniel Kahn Gillmor wrote:
>=20
>> What does "i doubt it's meaningful" mean?  I'm working on trying to
>> import all of these drafts into a single index as a first step toward
>> providing nice/simple search and inference about these drafts.  It's
>> certainly meaningful to me to have syntactically valid documents for
>> doing this kind of work.
>=20
> Why are you doing this on the HTML instead of the source text files? Th=
e=20
> latter seems less likely to be borked.

The source text files *are* broken.  That's the origin of the problem.


	Henrik



--A9B2Wg1L0TD8FfRflw0t68JipXxQOp79O--

--5695lS64SatGFBOSATfxBnK2tqCjJEn6v
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEifjc5+rnL1MJBcZSTptXS4+7FxoFAlv0WvEACgkQTptXS4+7
Fxqpjg/+LvHrzrQwGsBs5ynG8tAgZGD/ogfKODXhLTw8Sm4nfgFcZcKcMGL1dZsv
ZCRC+M/gkPBNKZsmckEXzuz5Em6Ykh8Rk3XQqbcVhkVfs5M72iBuAHoJoZNcqmq1
M2JI/CeOKGR9I3UeQ+IrFh+Iby2F2XLnduStq01h+Pit33bsOin3pmPbE8rTXaap
cLtCHJnRxMh+Es+69ziEvKz4xAXIEnhwrYTyMGoiLTZqt8bRX96KahJhMtmhVewa
Tca3wyOrWWxClShiZOnUWCMJmxeGjI6ZqsNWK0NpTiDkgwu75amw1AZPCL/lmWv6
dpPePaoWEyqvMWuxhNiirI3HpDNPxqRMj5ASZO9kw0EWBA/XLSlIhMKgpTXpa8L/
m2r/6tp6sVbLxVKrjiiZ0HvLqctd86LhwAwhdSHeAvZIP2/N3U7fLRdfgBoNyZ4X
Lm8p8w7/gXNi+s4n2Ed2N3nSKlw4xcsOPt8gt14XSj2Lsv5cMisSA+U7oN8TxhOH
RlqeCMRWm80/FoTroAQoG0k5MtlZ4oljdi4osjrbcdPUUgIHCC0wt1UHnpxOxTBW
zKKl/eOsF/JEfHbsl+QJJUizavWqYvcWvdN72SegGPxV99Hs5V3PXzV/PzmB9k6b
e9X6I2no7GFi2R3XIeRXbo9FrZKH1dkOcT0dDv1lNdm0z8Q5c8s=
=epZF
-----END PGP SIGNATURE-----

--5695lS64SatGFBOSATfxBnK2tqCjJEn6v--


From nobody Tue Nov 20 11:33:59 2018
Return-Path: <dkg@fifthhorseman.net>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 87BF312785F for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 11:33:55 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.19
X-Spam-Level: 
X-Spam-Status: No, score=-4.19 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, T_SPF_PERMERROR=0.01] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 8WxjCChaZj9l for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 11:33:54 -0800 (PST)
Received: from che.mayfirst.org (che.mayfirst.org [162.247.75.118]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id DFF02130DC6 for <tools-discuss@ietf.org>; Tue, 20 Nov 2018 11:33:53 -0800 (PST)
Received: from fifthhorseman.net (unknown [38.109.115.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by che.mayfirst.org (Postfix) with ESMTPSA id 41938F99B; Tue, 20 Nov 2018 14:33:51 -0500 (EST)
Received: by fifthhorseman.net (Postfix, from userid 1000) id 6423720275; Tue, 20 Nov 2018 14:26:40 -0500 (EST)
From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: Henrik Levkowetz <henrik@levkowetz.com>, "tools-discuss\@ietf.org" <tools-discuss@ietf.org>
In-Reply-To: <093a816b-e7ee-d2cb-3583-67f89d4b7e4d@levkowetz.com>
References: <87zhu4oc9d.fsf@fifthhorseman.net> <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com> <87pnuzohb9.fsf@fifthhorseman.net> <cd480b02-f907-1bc5-1265-570d4277b28d@levkowetz.com> <87bm6joc0b.fsf@fifthhorseman.net> <093a816b-e7ee-d2cb-3583-67f89d4b7e4d@levkowetz.com>
Date: Tue, 20 Nov 2018 14:26:33 -0500
Message-ID: <875zwro8ie.fsf@fifthhorseman.net>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha512; protocol="application/pgp-signature"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/lBvHAcHblHgpBoAUcAJvaX2Kwek>
Subject: Re: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Nov 2018 19:33:56 -0000

--=-=-=
Content-Type: text/plain

On Tue 2018-11-20 19:58:36 +0100, Henrik Levkowetz wrote:
> It means that I already work way too many hours per week and have a hard
> time justifying putting hours into this, I'm afraid.  It is more a matter
> of what I can justify at the moment, not an attempt to state what's
> meaningful in a wider context.

gotcha, thanks!  I wasn't trying to imply that you should drop
everything and work on this :)

>> Of course, i could add custom fixes to the docs myself before i index
>> them, but that seems like the wrong way to fix things for the broader
>> community.
>
> Ack.
>
> If you could provide it, I can easily insert python code on this pattern
> into the htmlizer:
>
>     if draftname == "draft-ietf-dnsop-interim-signed-root-01":
>         text = text.replace(u"F\x84ltstr\xF7m", u"F\u00e4ltstr\u00f6m")
>         text = text.replace(u"Ihr\x89n", u"Ihr\u00e9n")
>         # ...

cool, great. any particular flavor of python that you prefer (when
dealing with strings and bytes, python2 vs. python3 code tends to be
pretty different)?
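
One possible shape for such fixes, as a sketch only: the quoted pattern
expressed as a lookup table.  FIXUPS and apply_fixups are hypothetical
names, `text` is assumed to be an already-decoded unicode string, and
the u"" prefixes are there so the same source is accepted by python 2.7
and by python 3.3 or later.  Note that the replacement keeps the
trailing "m" of the name:

```python
# Hypothetical table; only the entries from the quoted snippet are filled in.
FIXUPS = {
    u"draft-ietf-dnsop-interim-signed-root-01": [
        (u"F\x84ltstr\xF7m", u"F\u00e4ltstr\u00f6m"),
        (u"Ihr\x89n", u"Ihr\u00e9n"),
    ],
}


def apply_fixups(draftname, text):
    """Apply the per-draft replacements; unknown drafts pass through unchanged."""
    for bad, good in FIXUPS.get(draftname, []):
        text = text.replace(bad, good)
    return text
```

Adding another document would then mean adding one more key to the
table rather than another if-branch.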

If you point me toward the preferred revision control system where this
work would go in, maybe i can send patches, rather than just raw python
snippets.

thanks,

        --dkg

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iHUEARYKAB0WIQTTaP514aqS9uSbmdJsHx7ezFD6UwUCW/Rf6QAKCRBsHx7ezFD6
U/GYAQCxJxEUfIpsoPvqDRtNO5mUWeR8kNGWB70mU4mHNBcdGQEArmMLhdBitn2u
K12NPyEApVswYklFDIlH6HipjhPSZA0=
=IgxN
-----END PGP SIGNATURE-----
--=-=-=--


From nobody Tue Nov 20 13:08:23 2018
Return-Path: <henrik@levkowetz.com>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id AE6B2130DD0 for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 13:08:21 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.899
X-Spam-Level: 
X-Spam-Status: No, score=-1.899 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NnijEXeOO0FS for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 13:08:19 -0800 (PST)
Received: from zinfandel.tools.ietf.org (zinfandel.tools.ietf.org [IPv6:2001:1890:126c::1:2a]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 88E8B130DCC for <tools-discuss@ietf.org>; Tue, 20 Nov 2018 13:08:19 -0800 (PST)
Received: from h-37-140.a357.priv.bahnhof.se ([94.254.37.140]:50944 helo=tannat.localdomain) by zinfandel.tools.ietf.org with esmtpsa (TLS1.2:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.80) (envelope-from <henrik@levkowetz.com>) id 1gPDFp-0006El-Sz; Tue, 20 Nov 2018 13:08:18 -0800
To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>, "tools-discuss@ietf.org" <tools-discuss@ietf.org>
References: <87zhu4oc9d.fsf@fifthhorseman.net> <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com> <87pnuzohb9.fsf@fifthhorseman.net> <cd480b02-f907-1bc5-1265-570d4277b28d@levkowetz.com> <87bm6joc0b.fsf@fifthhorseman.net> <093a816b-e7ee-d2cb-3583-67f89d4b7e4d@levkowetz.com> <875zwro8ie.fsf@fifthhorseman.net>
From: Henrik Levkowetz <henrik@levkowetz.com>
Message-ID: <3ac36e86-60dd-004d-84fb-3e046a204b72@levkowetz.com>
Date: Tue, 20 Nov 2018 22:08:09 +0100
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <875zwro8ie.fsf@fifthhorseman.net>
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="CqFi0CD8xkom5NllfX7GDlEmH1meRe4Kn"
X-SA-Exim-Connect-IP: 94.254.37.140
X-SA-Exim-Rcpt-To: tools-discuss@ietf.org, dkg@fifthhorseman.net
X-SA-Exim-Mail-From: henrik@levkowetz.com
X-SA-Exim-Version: 4.2.1 (built Mon, 26 Dec 2011 16:24:06 +0000)
X-SA-Exim-Scanned: Yes (on zinfandel.tools.ietf.org)
X-Clacks-Overhead: GNU Terry Pratchett
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/uhW56NoMfNg0j0QFebPRhfFFivA>
Subject: Re: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Nov 2018 21:08:22 -0000

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--CqFi0CD8xkom5NllfX7GDlEmH1meRe4Kn
Content-Type: multipart/mixed; boundary="wXgW8Scq4JT3hfcsHf2qXP4PLg29BV8CX";
 protected-headers="v1"
From: Henrik Levkowetz <henrik@levkowetz.com>
To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>,
 "tools-discuss@ietf.org" <tools-discuss@ietf.org>
Message-ID: <3ac36e86-60dd-004d-84fb-3e046a204b72@levkowetz.com>
Subject: Re: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
References: <87zhu4oc9d.fsf@fifthhorseman.net>
 <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com>
 <87pnuzohb9.fsf@fifthhorseman.net>
 <cd480b02-f907-1bc5-1265-570d4277b28d@levkowetz.com>
 <87bm6joc0b.fsf@fifthhorseman.net>
 <093a816b-e7ee-d2cb-3583-67f89d4b7e4d@levkowetz.com>
 <875zwro8ie.fsf@fifthhorseman.net>
In-Reply-To: <875zwro8ie.fsf@fifthhorseman.net>

--wXgW8Scq4JT3hfcsHf2qXP4PLg29BV8CX
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 2018-11-20 20:26, Daniel Kahn Gillmor wrote:
> On Tue 2018-11-20 19:58:36 +0100, Henrik Levkowetz wrote:
>> It means that I already work way too many hours per week and have a ha=
rd
>> time justifying putting hours into this, I'm afraid.  It is more a mat=
ter
>> of what I can justify at the moment, not an attempt to state what's
>> meaningful in a wider context.
>=20
> gotcha, thanks!  I wasn't trying to imply that you should drop
> everything and work on this :)

All right :-)

>>> Of course, i could add custom fixes to the docs myself before i index=

>>> them, but that seems like the wrong way to fix things for the broader=

>>> community.
>>
>> Ack.
>>
>> If you could provide it, I can easily insert python code on this patte=
rn
>> into the htmlizer:
>>
>>     if draftname =3D=3D "draft-ietf-dnsop-interim-signed-root-01":
>>         text =3D text.replace(u"F\x84ltstr\xF7m", u"F\u00e4ltstr\u00f6=
m")
>>         text =3D text.replace(u"Ihr\x89n", u"Ihr\u00e9n")
>>         # ...
>=20
> cool, great. any particular flavor of python that you prefer (when
> dealing with strings and bytes, python2 vs. python3 code tends to be
> pretty different)?

These days I try to write code that works on 2.7, 3.4, 3.5, and 3.6.

In the code above, 'text' is unicode under 2.7 and str under 3.x; and
the code should work as written under both 2.7 and 3.4 and upwards.

> If you point me toward the preferred revision control system where this=

> work would go in, maybe i can send patches, rather than just raw python=

> snippets.

It will have to go into both the standalone htmlizer lib and the legacy
rfcmarkup scripts, so I'll have to adapt a bit.  Fwiw, the lib is
rfc2html at https://pypi.org/project/rfc2html/ , with SVN at
https://svn.tools.ietf.org/svn/src/rfc2html/.

The legacy script is in https://svn.tools.ietf.org/svn/src/rfcmarkup.
Please don't feel obliged to provide patches for both; I'm happy to do the
adaptations.

The legacy script has code to pull all the stuff in the 'header' above th=
e
actual document from various places, and also insert CSS and dublin core
meta-information.  At some point I'll have to rewrite it to use the lib,
and pull information from the datatracker database instead of from misc.
files.

I'll have to add an optional 'name' or 'draftname' parameter to the funct=
ion
call in the lib in order to have something to test the document name agai=
nst,
but conceptually the code would look like what you see above.

This was converted using the rfc2html lib under python 3.6:
http://durif.tools.ietf.org/src/rfc2html/tmp/draft-ietf-dnsop-interim-sig=
ned-root-01.html

and this using the legacy script under 2.7:
http://durif.tools.ietf.org/src/rfcmarkup/tmp/draft-ietf-dnsop-interim-si=
gned-root-01.html


Best regards,

	Henrik

>=20
> thanks,
>=20
>         --dkg
>=20
>=20
>=20
> ___________________________________________________________
> Tools-discuss mailing list
> Tools-discuss@ietf.org
> https://www.ietf.org/mailman/listinfo/tools-discuss
>=20
> Please report datatracker.ietf.org and mailarchive.ietf.org
> bugs at http://tools.ietf.org/tools/ietfdb
> or send email to datatracker-project@ietf.org
>=20
> Please report tools.ietf.org bugs at
> http://tools.ietf.org/tools/issues
> or send email to webmaster@tools.ietf.org
>=20


--wXgW8Scq4JT3hfcsHf2qXP4PLg29BV8CX--

--CqFi0CD8xkom5NllfX7GDlEmH1meRe4Kn
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEifjc5+rnL1MJBcZSTptXS4+7FxoFAlv0d7kACgkQTptXS4+7
FxpmyRAAscP+S57TSqqHXemyp1gQnUh1xSnHcqjt/HWfr4oSifTfKR3zfe+6NXnZ
GogNeUqoZMdwHq12zDouRBAlEleZLZNelzvOYXUrOGzbvQJKQggoGflAmR1BXrmI
GTlVGei6wKa997ZdLByn9MBrykfruSt9CJLLjTIMqwhbFQ+cStHRb6yIm63+HF6g
/v0+zkfJoVtnAQ+3uEdiUeqoe+/j2cjfUH7Jsh/pFCED+5wsgbd7kBaYYlMIIaEc
lulaSaeB6s2XZ9I3DmAdqUt+zOuoIor/kvpIbtaq0ECdJ9/a1LSPHpGdPAv8EQjW
S3f3bn5LMKtqKW8RU6DxMHug4fyK8DFUrCNqUsxV1gKS2AFA5a1lhd+Ygb+60uuU
fyI/q8987zDc9v07Aw3SvoIvKs+6tW9BfptqB2fv+XTm3zP2xkLe5sBfSbZiVXFX
B7bcQq2w3Qq64aZN8/iCCoNSJ+5LfHKIg3jspnmMLAyd5AGQpd7m3XE2YFyDZ49f
YFeX8rH1/iKEll7qN7BEyFgE6CKXXAq+6iudpNt2pAlTRFGC2DlfX4HtQ0J5hYex
rFOE/1caNHNCxE3vtIGEVsgYjGrOxrMZXdWnVNkuysDSgyG0LzsJTq8cufeZtPrz
A4k/8nmAvxjwgYGkqC9888A3Lz7kLS+bQNI+BtNOaMS3elQxb6U=
=9n6K
-----END PGP SIGNATURE-----

--CqFi0CD8xkom5NllfX7GDlEmH1meRe4Kn--


From nobody Tue Nov 20 22:05:33 2018
Return-Path: <cabo@tzi.org>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id BF555130EDF for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 22:05:32 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.2
X-Spam-Level: 
X-Spam-Status: No, score=-4.2 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ezVC-TDpxXKz for <tools-discuss@ietfa.amsl.com>; Tue, 20 Nov 2018 22:05:30 -0800 (PST)
Received: from mailhost.informatik.uni-bremen.de (mailhost.informatik.uni-bremen.de [IPv6:2001:638:708:30c9::12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 6F45B12D4EA for <tools-discuss@ietf.org>; Tue, 20 Nov 2018 22:05:30 -0800 (PST)
X-Virus-Scanned: amavisd-new at informatik.uni-bremen.de
Received: from submithost.informatik.uni-bremen.de (submithost2.informatik.uni-bremen.de [134.102.200.7]) by mailhost.informatik.uni-bremen.de (8.14.5/8.14.5) with ESMTP id wAL65I8I020964; Wed, 21 Nov 2018 07:05:23 +0100 (CET)
Received: from [192.168.217.102] (p54A6CE66.dip0.t-ipconnect.de [84.166.206.102]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by submithost.informatik.uni-bremen.de (Postfix) with ESMTPSA id 430BrG144kz1Bqf; Wed, 21 Nov 2018 07:05:18 +0100 (CET)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 11.5 \(3445.9.1\))
From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <87k1l7odv1.fsf@fifthhorseman.net>
Date: Wed, 21 Nov 2018 07:05:17 +0100
Cc: Henrik Levkowetz <henrik@levkowetz.com>, "tools-discuss@ietf.org" <tools-discuss@ietf.org>
X-Mao-Original-Outgoing-Id: 564473113.734902-3dee603d254b74a068483f04f10c89fb
Content-Transfer-Encoding: quoted-printable
Message-Id: <9837F457-CEEA-4D44-B0D1-E2B92F4628E6@tzi.org>
References: <87zhu4oc9d.fsf@fifthhorseman.net> <45bd7024-1866-88ff-0289-157aef2cfe99@levkowetz.com> <87pnuzohb9.fsf@fifthhorseman.net> <EDF23977-8580-4C65-8F45-BB210C7B7081@tzi.org> <87k1l7odv1.fsf@fifthhorseman.net>
To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
X-Mailer: Apple Mail (2.3445.9.1)
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/AXQWyqJVaVn-hACBWSY6jnPwpj4>
Subject: Re: [Tools-discuss] non-ASCII, non-UTF-8 in rsync'ed html
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 21 Nov 2018 06:05:33 -0000

Ah, this is about modern HTML generated from ancient borken expired =
I-Ds.

In that case, scrubbing the input indeed seems like the thing to do.

Sorry for misunderstanding the thread.

Gr=C3=BC=C3=9Fe, Carsten


> On Nov 20, 2018, at 18:30, Daniel Kahn Gillmor <dkg@fifthhorseman.net> =
wrote:
>=20
> Signed PGP part
> On Tue 2018-11-20 17:28:59 +0100, Carsten Bormann wrote:
>> On Nov 20, 2018, at 17:16, Daniel Kahn Gillmor =
<dkg@fifthhorseman.net> wrote:
>>>=20
>>> I did a review of all of the documents that contain octets above =
0x7f:
>>=20
>> Are these all expired I-Ds?
>>=20
>> I don=E2=80=99t think anything can really be done about those, as the =
archival
>> value of keeping them around is based on them being unchanged.
>=20
> what is the value of unchanged invalid bytestreams?  we re-render =
other
> old documents when the html rendering process is improved.  why not do
> the same for these if we know there is cleanup that will make the
> document valid?
>=20
> I'm willing to entertain specific arguments, but it's not clear to me
> that we've ever held out that "these HTML renderings must remain
> byte-for-byte identical" -- have we?
>=20
>         --dkg
>=20
>=20


From nobody Sun Nov 25 13:38:01 2018
Return-Path: <kaduk@mit.edu>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 563CC129C6A for <tools-discuss@ietfa.amsl.com>; Sun, 25 Nov 2018 13:37:59 -0800 (PST)
X-Quarantine-ID: <NNIrY7MMDIjy>
X-Virus-Scanned: amavisd-new at amsl.com
X-Amavis-Alert: BAD HEADER SECTION, Non-encoded 8-bit data (char 9C hex): Received: ...s kaduk@ATHENA.MIT.EDU)\n\t\234by outgoing.mit[...]
X-Spam-Flag: NO
X-Spam-Score: -4.2
X-Spam-Level: 
X-Spam-Status: No, score=-4.2 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, UNPARSEABLE_RELAY=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NNIrY7MMDIjy for <tools-discuss@ietfa.amsl.com>; Sun, 25 Nov 2018 13:37:58 -0800 (PST)
Received: from dmz-mailsec-scanner-8.mit.edu (dmz-mailsec-scanner-8.mit.edu [18.7.68.37]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id D59A812426E for <tools-discuss@ietf.org>; Sun, 25 Nov 2018 13:37:57 -0800 (PST)
X-AuditID: 12074425-385ff700000064d4-1a-5bfb16324d23
Received: from mailhub-auth-4.mit.edu ( [18.7.62.39]) (using TLS with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by dmz-mailsec-scanner-8.mit.edu (Symantec Messaging Gateway) with SMTP id AC.E8.25812.3361BFB5; Sun, 25 Nov 2018 16:37:55 -0500 (EST)
Received: from outgoing.mit.edu (OUTGOING-AUTH-1.MIT.EDU [18.9.28.11]) by mailhub-auth-4.mit.edu (8.14.7/8.9.2) with ESMTP id wAPLbp2R012513 for <tools-discuss@ietf.org>; Sun, 25 Nov 2018 16:37:52 -0500
Received: from kduck.kaduk.org (24-107-191-124.dhcp.stls.mo.charter.com [24.107.191.124]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) œby outgoing.mit.edu (8.14.7/8.12.4) with ESMTP id wAPLbmYq032312 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for <tools-discuss@ietf.org>; Sun, 25 Nov 2018 16:37:50 -0500
Date: Sun, 25 Nov 2018 15:37:47 -0600
From: Benjamin Kaduk <kaduk@mit.edu>
To: tools-discuss@ietf.org
Message-ID: <20181125213747.GF70217@kduck.kaduk.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.9.1 (2017-09-22)
X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFjrGIsWRmVeSWpSXmKPExsUixG6nrmsi9jva4B6XxfYjcxkdGD2WLPnJ FMAYxWWTkpqTWZZapG+XwJWxYNIPtoIutor7h3YwNzB2s3YxcnJICJhInHixmLGLkYtDSGAN k8TOSQ+YQRJCAucYJSZsdIawfzFJNC+oB7FZBFQl3s85zw5iswmoSDR0XwarFxGQkpjWuIoF xBYW8JC43t7CBmLzAi3YPu86E4QtKHFy5hOwGmYBLYkb/14CxTmAbGmJ5f84QMKiAsoSe/sO sU9g5J2FpGMWko5ZCB0LGJlXMcqm5Fbp5iZm5hSnJusWJyfm5aUW6Vro5WaW6KWmlG5iBAeR i+oOxjl/vQ4xCnAwKvHwTvj1K1qINbGsuDL3EKMkB5OSKO98R6AQX1J+SmVGYnFGfFFpTmrx IUYJDmYlEV4e+5/RQrwpiZVVqUX5MClpDhYlcd4/Io+jhQTSE0tSs1NTC1KLYLIyHBxKErwz RH9HCwkWpaanVqRl5pQgpJk4OEGG8wANnwpSw1tckJhbnJkOkT/FqMvxbsH/6cxCLHn5ealS 4rzVIEUCIEUZpXlwc0DRL5G9v+YVozjQW8K8n0WAqniAiQNu0iugJUxAS+TnfwdZUpKIkJJq YDRof3I6eNOF1c2nhDe3sd7M3zLnS6Muy5SkCuYzn759/ZXuGLg5x5A754rA13W75QpF6ha9 07FPi/P/9kDaZvXb5Zkyd4R+h0s4CybtKOp9oBT8Nvuo7rSwbdf6xHg0FtxRLmtWS961/+ge 1U9KU2vvHl+5Q2BLtp5ng/7yp5s7Zvx6tsrv/UIlluKMREMt5qLiRACj7Sow2QIAAA==
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/ZwFIPPXoWS_J3gOtzNIWHd0FuS0>
Subject: [Tools-discuss] a case of lingering local superiority of MHonArc to mailarchive
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sun, 25 Nov 2018 21:37:59 -0000

Hi folks,

I know that email quoting is something of a disaster zone, but I just ran
into a case where I had to use MHonArc to get the actual semantic content
of a message.

Compare
https://mailarchive.ietf.org/arch/msg/tram/7b84m3qXldrZfy07q-beYM5Vb6c
with
https://www.ietf.org/mail-archive/web/tram/current/msg02655.html

the latter uses horizontal whitespace to indicate quoted text, apparently
via a <p class="MsoNormal" style="margin-left:35.4pt"> HTML element (the
accompanying text/plain component fails to have any quoting indication at
all).  Is mailarchive preferring the text/plain component or does it just
not know to render the margin-left style (or am I sufficiently ignorant in
these matters that my conjectures are nonsense)?
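
[Illustration only, not taken from the mailarchive source.  The first
conjecture above, an archiver that prefers the text/plain alternative,
could be sketched as follows with Python's standard email package.  The
sample message, the boundary "b1", and the helper name pick_body are
all made up for this sketch.]

```python
from email import message_from_string
from email.policy import default

# Hypothetical two-part message: the text/plain part has lost the quoting
# that the HTML part expresses through a margin-left style.
RAW = """\
From: someone@example.com
Subject: demo
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset=us-ascii

quoted line
reply line
--b1
Content-Type: text/html; charset=us-ascii

<p style="margin-left:35.4pt">quoted line</p><p>reply line</p>
--b1--
"""


def pick_body(raw, prefer=("plain", "html")):
    """Return (content_type, text) for the first available preferred part."""
    msg = message_from_string(raw, policy=default)
    part = msg.get_body(preferencelist=prefer)
    if part is None:
        return None, ""
    return part.get_content_type(), part.get_content()


ctype, text = pick_body(RAW)
print(ctype)
```

[With the default preference order the plain part wins, so the indentation
cue never reaches the reader; passing prefer=("html", "plain") would select
the HTML part instead.]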

Thanks,

Ben


From nobody Sun Nov 25 21:17:47 2018
Return-Path: <rjsparks@nostrum.com>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 1AE40130F6F for <tools-discuss@ietfa.amsl.com>; Sun, 25 Nov 2018 21:17:40 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.879
X-Spam-Level: 
X-Spam-Status: No, score=-1.879 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, T_SPF_HELO_PERMERROR=0.01, T_SPF_PERMERROR=0.01, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 3Oi8qwMmU-xe for <tools-discuss@ietfa.amsl.com>; Sun, 25 Nov 2018 21:17:38 -0800 (PST)
Received: from nostrum.com (raven-v6.nostrum.com [IPv6:2001:470:d:1130::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 1C672130F2D for <tools-discuss@ietf.org>; Sun, 25 Nov 2018 21:17:38 -0800 (PST)
Received: from unescapeable.local ([47.186.18.66]) (authenticated bits=0) by nostrum.com (8.15.2/8.15.2) with ESMTPSA id wAQ5HanU000786 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Sun, 25 Nov 2018 23:17:37 -0600 (CST) (envelope-from rjsparks@nostrum.com)
X-Authentication-Warning: raven.nostrum.com: Host [47.186.18.66] claimed to be unescapeable.local
To: Benjamin Kaduk <kaduk@mit.edu>, tools-discuss@ietf.org
References: <20181125213747.GF70217@kduck.kaduk.org>
From: Robert Sparks <rjsparks@nostrum.com>
Message-ID: <06f3d3c0-06a7-f544-c042-18a81b58b72b@nostrum.com>
Date: Sun, 25 Nov 2018 23:17:36 -0600
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:60.0) Gecko/20100101 Thunderbird/60.3.1
MIME-Version: 1.0
In-Reply-To: <20181125213747.GF70217@kduck.kaduk.org>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Content-Language: en-US
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/EJHbfdg3hu1WORjpK3A9NvQUJyw>
Subject: Re: [Tools-discuss] a case of lingering local superiority of MHonArc to mailarchive
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Nov 2018 05:17:46 -0000

Thanks for calling this out Ben -

On 11/25/18 3:37 PM, Benjamin Kaduk wrote:
> Hi folks,
>
> I know that email quoting is something of a disaster zone,
something...
> but I just ran
> into a case where I had to use MHonArc to get the actual semantic content
> of a message.
>
> Compare
> https://mailarchive.ietf.org/arch/msg/tram/7b84m3qXldrZfy07q-beYM5Vb6c
> with
> https://www.ietf.org/mail-archive/web/tram/current/msg02655.html
>
> the latter uses horizontal whitespace to indicate quoted text, apparently
> via a <p class="MsoNormal" style="margin-left:35.4pt"> HTML element (the
> accompanying text/plain component fails to have any quoting indication at
> all).  Is mailarchive preferring the text/plain component

yes, this.

> or does it just
> not know to render the margin-left style (or am I sufficiently ignorant in
> these matters that my conjectures are nonsense)?
Thanks again for a specific report.
>
> Thanks,
>
> Ben


From nobody Mon Nov 26 15:23:54 2018
Return-Path: <kaduk@mit.edu>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 85802131058 for <tools-discuss@ietfa.amsl.com>; Mon, 26 Nov 2018 15:23:51 -0800 (PST)
X-Quarantine-ID: <8MXCXQIZYUgm>
X-Virus-Scanned: amavisd-new at amsl.com
X-Amavis-Alert: BAD HEADER SECTION, Non-encoded 8-bit data (char 9C hex): Received: ...s kaduk@ATHENA.MIT.EDU)\n\t\234by outgoing.mit[...]
X-Spam-Flag: NO
X-Spam-Score: -4.199
X-Spam-Level: 
X-Spam-Status: No, score=-4.199 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, UNPARSEABLE_RELAY=0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 8MXCXQIZYUgm for <tools-discuss@ietfa.amsl.com>; Mon, 26 Nov 2018 15:23:48 -0800 (PST)
Received: from dmz-mailsec-scanner-8.mit.edu (dmz-mailsec-scanner-8.mit.edu [18.7.68.37]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id F247E131062 for <tools-discuss@ietf.org>; Mon, 26 Nov 2018 15:23:47 -0800 (PST)
X-AuditID: 12074425-279ff70000007b6e-ea-5bfc8081cab9
Received: from mailhub-auth-1.mit.edu ( [18.9.21.35]) (using TLS with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by dmz-mailsec-scanner-8.mit.edu (Symantec Messaging Gateway) with SMTP id CA.04.31598.1808CFB5; Mon, 26 Nov 2018 18:23:46 -0500 (EST)
Received: from outgoing.mit.edu (OUTGOING-AUTH-1.MIT.EDU [18.9.28.11]) by mailhub-auth-1.mit.edu (8.14.7/8.9.2) with ESMTP id wAQNNhlF008557; Mon, 26 Nov 2018 18:23:44 -0500
Received: from kduck.kaduk.org (24-107-191-124.dhcp.stls.mo.charter.com [24.107.191.124]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) œby outgoing.mit.edu (8.14.7/8.12.4) with ESMTP id wAQNNcsh002615 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 26 Nov 2018 18:23:42 -0500
Date: Mon, 26 Nov 2018 17:23:38 -0600
From: Benjamin Kaduk <kaduk@mit.edu>
To: Robert Sparks <rjsparks@nostrum.com>
Cc: tools-discuss@ietf.org
Message-ID: <20181126232337.GE10033@kduck.kaduk.org>
References: <20181125213747.GF70217@kduck.kaduk.org> <06f3d3c0-06a7-f544-c042-18a81b58b72b@nostrum.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <06f3d3c0-06a7-f544-c042-18a81b58b72b@nostrum.com>
User-Agent: Mutt/1.9.1 (2017-09-22)
X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFvrGIsWRmVeSWpSXmKPExsUixCmqrNvU8CfaYM8BLotrcxrZLLYfmcvo wOSxZMlPJo9ZO5+wBDBFcdmkpOZklqUW6dslcGW8X7eItaCds+Llj4ksDYzL2bsYOTgkBEwk Vi2p7mLk4hASWMMkceFHMyuEs5FRYklnPzOEc5dJYtvU+WxdjJwcLAKqEm0nt4DZbAIqEg3d l5lBbBEBDYlrS5awg9jMAlIS5z5MYAKxhQXiJKasvQpWwwu07fDjFlYQW0ggU+L5lCcsEHFB iZMzIWxmAS2JG/9eMoFcxywgLbH8HwdImFPAXuLvwYVgI0UFlCX29h1in8AoMAtJ9ywk3bMQ uhcwMq9ilE3JrdLNTczMKU5N1i1OTszLSy3StdDLzSzRS00p3cQIClJ2F9UdjHP+eh1iFOBg VOLhjfj6O1qINbGsuDL3EKMkB5OSKK/HX6AQX1J+SmVGYnFGfFFpTmrxIUYJDmYlEV7fJUA5 3pTEyqrUonyYlDQHi5I47x+Rx9FCAumJJanZqakFqUUwWRkODiUJ3uP1f6KFBItS01Mr0jJz ShDSTBycIMN5gIbng9TwFhck5hZnpkPkTzHqcrxb8H86sxBLXn5eqpQ47zWQIgGQoozSPLg5 oOQikb2/5hWjONBbwrxzQap4gIkJbtIroCVMQEuuTQT5oLgkESEl1cDYte1FcabjgnWJW3f3 Sjr5O7mGTD7fVPUjrvSfjoTQpt3JXXINaoftm7h0X3a80Gt6+P+3BsNJ5d3bl3yd+PuruuWC 9bteNT4+lbXu8I5//x8cvKlT6pUh/f9D62kR/sb3i5OnZjuu1prjePvCLjv78+6/379Zs6nr ol3Xpsju3PxKn4aOjVNUlFiKMxINtZiLihMB04w9TAkDAAA=
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/bJ4rCFPPR2M5ht-cIbwvDCpk9ZA>
Subject: Re: [Tools-discuss] a case of lingering local superiority of MHonArc to mailarchive
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Nov 2018 23:23:51 -0000

On Sun, Nov 25, 2018 at 11:17:36PM -0600, Robert Sparks wrote:
> Thanks for calling this out Ben -
> 
> On 11/25/18 3:37 PM, Benjamin Kaduk wrote:
> > Hi folks,
> >
> > I know that email quoting is something of a disaster zone,
> something...
> > but I just ran
> > into a case where I had to use MHonArc to get the actual semantic content
> > of a message.
> >
> > Compare
> > https://mailarchive.ietf.org/arch/msg/tram/7b84m3qXldrZfy07q-beYM5Vb6c
> > with
> > https://www.ietf.org/mail-archive/web/tram/current/msg02655.html
> >
> > the latter uses horizontal whitespace to indicate quoted text, apparently
> > via a <p class="MsoNormal" style="margin-left:35.4pt"> HTML element (the
> > accompanying text/plain component fails to have any quoting indication at
> > all).  Is mailarchive preferring the text/plain component
> 
> yes, this.

Ah.  MIME/multipart with an unusable text/plain is a lot more common than
just using style=margin-left for quoting, unfortunately.  <blockquote> in
the HTML and using color to indicate new text are the two that I remember
as being most common, but I'm sure there are many others.
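
[Illustration only: a rough sketch of how a tool might notice that the
text/plain part has lost quoting that the HTML part still carries.  It
looks only for <blockquote> and a margin-left style; colour-based marking
is not handled.  The class and function names are invented for this
sketch and do not come from any existing archiver.]

```python
from html.parser import HTMLParser


class QuoteCueCounter(HTMLParser):
    """Count HTML constructs that mail clients commonly use to mark quotes."""

    def __init__(self):
        super().__init__()
        self.blockquotes = 0
        self.indented = 0

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.blockquotes += 1
        # Attribute values may be None, so fall back to an empty string.
        style = dict(attrs).get("style") or ""
        if "margin-left" in style.replace(" ", "").lower():
            self.indented += 1


def plain_part_lost_quoting(plain, html):
    """True if the HTML marks quoted text but the plain text has no '>' lines."""
    counter = QuoteCueCounter()
    counter.feed(html)
    counter.close()
    html_has_cues = (counter.blockquotes + counter.indented) > 0
    plain_has_cues = any(
        line.lstrip().startswith(">") for line in plain.splitlines()
    )
    return html_has_cues and not plain_has_cues
```

[A check like this is only a heuristic; it could flag a message for
closer inspection, but it says nothing about whether rendering the HTML
part would be safe.]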

-Ben


From nobody Tue Nov 27 11:31:16 2018
Return-Path: <housley@vigilsec.com>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id CAB12130DD6 for <tools-discuss@ietfa.amsl.com>; Tue, 27 Nov 2018 11:31:14 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.899
X-Spam-Level: 
X-Spam-Status: No, score=-1.899 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id D_Aw8QL14Gjq for <tools-discuss@ietfa.amsl.com>; Tue, 27 Nov 2018 11:31:12 -0800 (PST)
Received: from mail.smeinc.net (mail.smeinc.net [209.135.209.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id C6F18130E8D for <tools-discuss@ietf.org>; Tue, 27 Nov 2018 11:31:12 -0800 (PST)
Received: from localhost (localhost [127.0.0.1]) by mail.smeinc.net (Postfix) with ESMTP id 98860300AAE for <tools-discuss@ietf.org>; Tue, 27 Nov 2018 14:31:10 -0500 (EST)
X-Virus-Scanned: amavisd-new at mail.smeinc.net
Received: from mail.smeinc.net ([127.0.0.1]) by localhost (mail.smeinc.net [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id Zuj1BoiCHwZy for <tools-discuss@ietf.org>; Tue, 27 Nov 2018 14:31:08 -0500 (EST)
Received: from [192.168.1.161] (pool-71-178-45-35.washdc.fios.verizon.net [71.178.45.35]) by mail.smeinc.net (Postfix) with ESMTPSA id D750130078C; Tue, 27 Nov 2018 14:31:08 -0500 (EST)
From: Russ Housley <housley@vigilsec.com>
Message-Id: <CED9B21E-E9D0-4C90-9827-325DB23D97F1@vigilsec.com>
Content-Type: multipart/alternative; boundary="Apple-Mail=_008AA3CC-DDFC-4866-B01C-F2568B576CA3"
Mime-Version: 1.0 (Mac OS X Mail 11.5 \(3445.9.1\))
Date: Tue, 27 Nov 2018 14:31:09 -0500
In-Reply-To: <20181126232337.GE10033@kduck.kaduk.org>
Cc: Tools Discussion <tools-discuss@ietf.org>
To: Ben Kaduk <kaduk@MIT.EDU>
References: <20181125213747.GF70217@kduck.kaduk.org> <06f3d3c0-06a7-f544-c042-18a81b58b72b@nostrum.com> <20181126232337.GE10033@kduck.kaduk.org>
X-Mailer: Apple Mail (2.3445.9.1)
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/LuETU2R48bt4nnmxCXfz6eOoJBU>
Subject: Re: [Tools-discuss] a case of lingering local superiority of MHonArc to mailarchive
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 27 Nov 2018 19:31:15 -0000

--Apple-Mail=_008AA3CC-DDFC-4866-B01C-F2568B576CA3
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii



> On Nov 26, 2018, at 6:23 PM, Benjamin Kaduk <kaduk@MIT.EDU> wrote:
>=20
> On Sun, Nov 25, 2018 at 11:17:36PM -0600, Robert Sparks wrote:
>> Thanks for calling this out Ben -
>>=20
>> On 11/25/18 3:37 PM, Benjamin Kaduk wrote:
>>> Hi folks,
>>>=20
>>> I know that email quoting is something of a disaster zone, =
something...
>>> but I just ran
>>> into a case where I had to use MHonArc to get the actual semantic =
content
>>> of a message.
>>>=20
>>> Compare
>>> =
https://mailarchive.ietf.org/arch/msg/tram/7b84m3qXldrZfy07q-beYM5Vb6c
>>> with
>>> https://www.ietf.org/mail-archive/web/tram/current/msg02655.html
>>>=20
>>> the latter uses horizontal whitespace to indicate quoted text, =
apparently
>>> via a <p class=3D"MsoNormal" style=3D"margin-left:35.4pt"> HTML =
element (the
>>> accompanying text/plain component fails to have any quoting =
indication at
>>> all).  Is mailarchive preferring the text/plain component
>>=20
>> yes, this.
>=20
> Ah.  MIME/multipart with an unusable text/plain is a lot more common =
than
> just using style=3Dmargin-left for quoting, unfortunately.  =
<blockquote> in
> the HTML and using color to indicate new text are the two that I =
remember
> as being most common, but I'm sure there are many others.

Ben:

See the big yellow text box at https://www.mhonarc.org/ =
<https://www.mhonarc.org/> ...

> MHonArc releases prior to v2.6.18 have known vulnerabilities to the =
HTML filter, making web sites hosting MHonArc web archives vulnerable to =
XSS attacks. All users are STRONGLY encouraged to upgrade to the latest =
release.
>=20
> If you are unable to upgrade immediately, and you are operating a site =
that archives messages from untrusted sources, please see the following =
item in the MHonArc FAQ: So how can I exclude HTML mail?. Even with the =
fixes provided, it is HIGHLY RECOMMENDED to neutralize HTML data for any =
archive containing content from untrusted sources.

This seems like a good reason to continue to prefer the plaintext version =
of a message in the mail archive tool, even though the message nesting =
issue that you raise does impact readability.

Russ=

--Apple-Mail=_008AA3CC-DDFC-4866-B01C-F2568B576CA3
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html;
	charset=us-ascii

<html><head><meta http-equiv=3D"Content-Type" content=3D"text/html; =
charset=3Dus-ascii"></head><body style=3D"word-wrap: break-word; =
-webkit-nbsp-mode: space; line-break: after-white-space;" class=3D""><br =
class=3D""><div><br class=3D""><blockquote type=3D"cite" class=3D""><div =
class=3D"">On Nov 26, 2018, at 6:23 PM, Benjamin Kaduk &lt;<a =
href=3D"mailto:kaduk@MIT.EDU" class=3D"">kaduk@MIT.EDU</a>&gt; =
wrote:</div><br class=3D"Apple-interchange-newline"><div class=3D""><div =
class=3D"">On Sun, Nov 25, 2018 at 11:17:36PM -0600, Robert Sparks =
wrote:<br class=3D""><blockquote type=3D"cite" class=3D"">Thanks for =
calling this out Ben -<br class=3D""><br class=3D"">On 11/25/18 3:37 PM, =
Benjamin Kaduk wrote:<br class=3D""><blockquote type=3D"cite" =
class=3D"">Hi folks,<br class=3D""><br class=3D"">I know that email =
quoting is something of a disaster zone, =
something...</blockquote><blockquote type=3D"cite" class=3D"">but I just =
ran<br class=3D"">into a case where I had to use MHonArc to get the =
actual semantic content<br class=3D"">of a message.<br class=3D""><br =
class=3D"">Compare<br class=3D""><a =
href=3D"https://mailarchive.ietf.org/arch/msg/tram/7b84m3qXldrZfy07q-beYM5=
Vb6c" =
class=3D"">https://mailarchive.ietf.org/arch/msg/tram/7b84m3qXldrZfy07q-be=
YM5Vb6c</a><br class=3D"">with<br =
class=3D"">https://www.ietf.org/mail-archive/web/tram/current/msg02655.htm=
l<br class=3D""><br class=3D"">the latter uses horizontal whitespace to =
indicate quoted text, apparently<br class=3D"">via a &lt;p =
class=3D"MsoNormal" style=3D"margin-left:35.4pt"&gt; HTML element =
(the<br class=3D"">accompanying text/plain component fails to have any =
quoting indication at<br class=3D"">all). &nbsp;Is mailarchive =
preferring the text/plain component<br class=3D""></blockquote><br =
class=3D"">yes, this.<br class=3D""></blockquote><br class=3D"">Ah. =
&nbsp;MIME/multipart with an unusable text/plain is a lot more common =
than<br class=3D"">just using style=3Dmargin-left for quoting, =
unfortunately. &nbsp;&lt;blockquote&gt; in<br class=3D"">the HTML and =
using color to indicate new text are the two that I remember<br =
class=3D"">as being most common, but I'm sure there are many others.<br =
class=3D""></div></div></blockquote></div><br class=3D""><div =
class=3D"">Ben:</div><div class=3D""><br class=3D""></div><div =
class=3D"">See the big yellow text box at&nbsp;<a =
href=3D"https://www.mhonarc.org/" =
class=3D"">https://www.mhonarc.org/</a>&nbsp;...<br class=3D""><br =
class=3D""><blockquote type=3D"cite" class=3D"">MHonArc releases prior =
to v2.6.18 have known vulnerabilities to the HTML filter, making web =
sites hosting MHonArc web archives vulnerable to XSS attacks. All users =
are STRONGLY encouraged to upgrade to the latest release.<br =
class=3D""><br class=3D"">If you are unable to upgrade immediately, and =
you are operating a site that archives messages from untrusted sources, =
please see the following item in the MHonArc FAQ: So how can I exclude =
HTML mail?. Even with the fixes provided, it is HIGHLY RECOMMENDED to =
neutralize HTML data for any archive containing content from untrusted =
sources.<br class=3D""></blockquote><br class=3D"">This seems like a =
good reason continue to prefer the plaintext version of a message in the =
mail archive tool, even though the message nesting issue that you raise =
does impact readability.</div><div class=3D""><br class=3D""></div><div =
class=3D"">Russ</div></body></html>=

--Apple-Mail=_008AA3CC-DDFC-4866-B01C-F2568B576CA3--


From nobody Tue Nov 27 18:37:57 2018
Return-Path: <dkg@fifthhorseman.net>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id EDF7912F1A5 for <tools-discuss@ietfa.amsl.com>; Tue, 27 Nov 2018 18:37:55 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.19
X-Spam-Level: 
X-Spam-Status: No, score=-4.19 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, T_SPF_PERMERROR=0.01] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id mPl1of16z1YN for <tools-discuss@ietfa.amsl.com>; Tue, 27 Nov 2018 18:37:54 -0800 (PST)
Received: from che.mayfirst.org (che.mayfirst.org [162.247.75.118]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id AC09012777C for <tools-discuss@ietf.org>; Tue, 27 Nov 2018 18:37:54 -0800 (PST)
Received: from fifthhorseman.net (ool-6c3a0662.static.optonline.net [108.58.6.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by che.mayfirst.org (Postfix) with ESMTPSA id 39367F99A; Tue, 27 Nov 2018 21:37:52 -0500 (EST)
Received: by fifthhorseman.net (Postfix, from userid 1000) id 80DA620379; Tue, 27 Nov 2018 21:12:11 -0500 (EST)
From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: "Salz\, Rich" <rsalz@akamai.com>, Robert Sparks <rjsparks@nostrum.com>
Cc: "tools-discuss\@ietf.org" <tools-discuss@ietf.org>
In-Reply-To: <DB39AA05-0E98-4C53-BEAD-FDEA2F8AAE6A@akamai.com>
References: <33d1a5a4-5117-a264-316f-52465fdae372@nostrum.com> <DB39AA05-0E98-4C53-BEAD-FDEA2F8AAE6A@akamai.com>
Date: Tue, 27 Nov 2018 21:12:11 -0500
Message-ID: <87mupuympw.fsf@fifthhorseman.net>
MIME-Version: 1.0
Content-Type: text/plain
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/EI3WX6vFrkiO2LNG1URYiYDJ9eM>
Subject: Re: [Tools-discuss] Improvements to the mailarch tool, and plans to discontinue the mhonarc archives
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 28 Nov 2018 02:37:56 -0000

On Thu 2018-10-04 16:58:46 +0000, Salz, Rich wrote:
> Another thing that would be useful is being able to retrieve a mail
> message in pure text/plain format.  I recently had to screen-scrape an
> old message so that I could reply-all to it. :)

In offlist discussion, Rich and I were talking about how you can kind of
do this already by downloading the relevant files in mbox or maildir
format, and then importing them into a local MUA.  But this is only
available when you are logged in.  Why only when logged in?  It'd be
nice to make this easier.
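
For anyone who just wants the text/plain body without importing into an
MUA, here is a minimal sketch using only the Python standard library.  It
assumes you have already downloaded an mbox export; the function name
plain_text_bodies is made up for illustration and is not part of the
mailarch tool.

```python
import email
import mailbox
from email import policy


def plain_text_bodies(mbox_path):
    """Yield (subject, text) for every message in a local mbox file.

    Hypothetical helper for illustration; mbox_path is assumed to point
    at an mbox file that was exported from the archive beforehand.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for raw in box:
            # Re-parse with the modern policy so that get_body() is available.
            msg = email.message_from_bytes(raw.as_bytes(), policy=policy.default)
            # Pick the text/plain part, ignoring any HTML alternative.
            part = msg.get_body(preferencelist=("plain",))
            text = part.get_content() if part is not None else ""
            yield msg["Subject"], text
    finally:
        box.close()
```

Calling list(plain_text_bodies("export.mbox")) gives one (subject, text)
pair per message; messages with no text/plain part yield an empty string.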

     --dkg


From nobody Fri Nov 30 18:02:44 2018
Return-Path: <kaduk@mit.edu>
X-Original-To: tools-discuss@ietfa.amsl.com
Delivered-To: tools-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 0D227129C6A for <tools-discuss@ietfa.amsl.com>; Fri, 30 Nov 2018 18:02:42 -0800 (PST)
X-Quarantine-ID: <8GVHYlakcAy7>
X-Virus-Scanned: amavisd-new at amsl.com
X-Amavis-Alert: BAD HEADER SECTION, Non-encoded 8-bit data (char 9C hex): Received: ...s kaduk@ATHENA.MIT.EDU)\n\t\234by outgoing.mit[...]
X-Spam-Flag: NO
X-Spam-Score: -4.2
X-Spam-Level: 
X-Spam-Status: No, score=-4.2 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, UNPARSEABLE_RELAY=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 8GVHYlakcAy7 for <tools-discuss@ietfa.amsl.com>; Fri, 30 Nov 2018 18:02:40 -0800 (PST)
Received: from dmz-mailsec-scanner-4.mit.edu (dmz-mailsec-scanner-4.mit.edu [18.9.25.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id DE4031293FB for <tools-discuss@ietf.org>; Fri, 30 Nov 2018 18:02:39 -0800 (PST)
X-AuditID: 1209190f-1e1ff70000003805-c4-5c01ebbd6b88
Received: from mailhub-auth-3.mit.edu ( [18.9.21.43]) (using TLS with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by dmz-mailsec-scanner-4.mit.edu (Symantec Messaging Gateway) with SMTP id 68.E1.14341.DBBE10C5; Fri, 30 Nov 2018 21:02:38 -0500 (EST)
Received: from outgoing.mit.edu (OUTGOING-AUTH-1.MIT.EDU [18.9.28.11]) by mailhub-auth-3.mit.edu (8.14.7/8.9.2) with ESMTP id wB122abO010261; Fri, 30 Nov 2018 21:02:37 -0500
Received: from kduck.kaduk.org (24-107-191-124.dhcp.stls.mo.charter.com [24.107.191.124]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) œby outgoing.mit.edu (8.14.7/8.12.4) with ESMTP id wB122XhO022918 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 30 Nov 2018 21:02:35 -0500
Date: Fri, 30 Nov 2018 20:02:32 -0600
From: Benjamin Kaduk <kaduk@mit.edu>
To: Russ Housley <housley@vigilsec.com>
Cc: Tools Discussion <tools-discuss@ietf.org>
Message-ID: <20181201020232.GH87441@kduck.kaduk.org>
References: <20181125213747.GF70217@kduck.kaduk.org> <06f3d3c0-06a7-f544-c042-18a81b58b72b@nostrum.com> <20181126232337.GE10033@kduck.kaduk.org> <CED9B21E-E9D0-4C90-9827-325DB23D97F1@vigilsec.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CED9B21E-E9D0-4C90-9827-325DB23D97F1@vigilsec.com>
User-Agent: Mutt/1.9.1 (2017-09-22)
X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFmphleLIzCtJLcpLzFFi42IR4hTV1t33mjHGYN05VotXL26yW2w/MpfR gcljyZKfTB6r7nxhDWCK4rJJSc3JLEst0rdL4MqYuXkqY8EK4YrPP14yNjAe5u9i5OSQEDCR OHd0AVsXIxeHkMAaJomnLydDORsZJZbuv8YI4dxlkrg1rYkVpIVFQFVi1c0zbCA2m4CKREP3 ZWYQW0RAXeLv/AvsIDazgK5E47U1TCC2sECcxJS1V8FqeIHW/fi+jRli6ClGifYju6ASghIn Zz5hgWjWkrjx7yVQMweQLS2x/B8HSJhTwEHizLQ3jCC2qICyxN6+Q+wTGAVmIemehaR7FkL3 AkbmVYyyKblVurmJmTnFqcm6xcmJeXmpRbomermZJXqpKaWbGMGBKsm/g3FOg/chRgEORiUe 3gk5jDFCrIllxZW5hxglOZiURHn/SgCF+JLyUyozEosz4otKc1KLDzFKcDArifCWXgTK8aYk VlalFuXDpKQ5WJTEeX+JPI4WEkhPLEnNTk0tSC2CycpwcChJ8B57BdQoWJSanlqRlplTgpBm 4uAEGc4DNFzwNcjw4oLE3OLMdIj8KUZLjncL/k9n5lg1o2MGM0fHumVzmIVY8vLzUqXEeTVB GgRAGjJK8+BmghKPRPb+mleM4kAvCvNeBFnNA0xacFNfAS1kAloY0/M/GmhhSSJCSqqBMfqT 0qriFW+vnWXm8DzcfyXv7SmFy/81ly5j/meR8c7qjELJAempL1o2vfJ2eTztyqQvnctKWD9X skjP/DpriW+GwVnbPkHuI7beu2eotrAqnyyIfh22XHKBgdslTkvHOyFVjImFz8tjy6Zu8jxU 9iv5VHnO2ptaDJFTbh0UKCzjW8//rV48SomlOCPRUIu5qDgRADHgk5cXAwAA
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/O3VjvotJapSz2q2r7aDkvj2cT0I>
Subject: Re: [Tools-discuss] a case of lingering local superiority of MHonArc to mailarchive
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sat, 01 Dec 2018 02:02:42 -0000

On Tue, Nov 27, 2018 at 02:31:09PM -0500, Russ Housley wrote:
> 
> 
> > On Nov 26, 2018, at 6:23 PM, Benjamin Kaduk <kaduk@MIT.EDU> wrote:
> > 
> > On Sun, Nov 25, 2018 at 11:17:36PM -0600, Robert Sparks wrote:
> >> Thanks for calling this out Ben -
> >> 
> >> On 11/25/18 3:37 PM, Benjamin Kaduk wrote:
> >>> Hi folks,
> >>> 
> >>> I know that email quoting is something of a disaster zone, but I just ran
> >>> into a case where I had to use MHonArc to get the actual semantic content
> >>> of a message.
> >>> 
> >>> Compare
> >>> https://mailarchive.ietf.org/arch/msg/tram/7b84m3qXldrZfy07q-beYM5Vb6c
> >>> with
> >>> https://www.ietf.org/mail-archive/web/tram/current/msg02655.html
> >>> 
> >>> the latter uses horizontal whitespace to indicate quoted text, apparently
> >>> via a <p class="MsoNormal" style="margin-left:35.4pt"> HTML element (the
> >>> accompanying text/plain component fails to have any quoting indication at
> >>> all).  Is mailarchive preferring the text/plain component
> >> 
> >> yes, this.
> > 
> > Ah.  MIME/multipart with an unusable text/plain is a lot more common than
> > just using style=margin-left for quoting, unfortunately.  <blockquote> in
> > the HTML and using color to indicate new text are the two that I remember
> > as being most common, but I'm sure there are many others.
> 
> Ben:
> 
> See the big yellow text box at https://www.mhonarc.org/ ...
> 
> > MHonArc releases prior to v2.6.18 have known vulnerabilities to the HTML filter, making web sites hosting MHonArc web archives vulnerable to XSS attacks. All users are STRONGLY encouraged to upgrade to the latest release.
> > 
> > If you are unable to upgrade immediately, and you are operating a site that archives messages from untrusted sources, please see the following item in the MHonArc FAQ: So how can I exclude HTML mail?. Even with the fixes provided, it is HIGHLY RECOMMENDED to neutralize HTML data for any archive containing content from untrusted sources.
> 
> This seems like a good reason to continue to prefer the plaintext version of a message in the mail archive tool, even though the message nesting issue that you raise does impact readability.

I don't disagree ... but at the same time I am sad, since this will put a
wrench into several of my workflows, once MHonArc goes away.  "Surely it
can't be too hard to strip out all JavaScript, right?" ;)
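
One possible middle ground, sketched below under stated assumptions, is to
never emit the sender's HTML at all and instead flatten it to plain text
while keeping the quoting.  This is not how mailarchive or MHonArc works;
the names QuotedTextExtractor and html_to_quoted_text are invented for the
example.  It treats <blockquote> and any <p> whose style mentions
margin-left as one quote level, drops <script> and <style> content, and
assumes <p> elements are explicitly closed.

```python
from html.parser import HTMLParser


class QuotedTextExtractor(HTMLParser):
    """Collect text from HTML, prefixing quoted text with '> ' markers."""

    BLOCK_TAGS = {"p", "div", "br", "blockquote"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []      # finished output lines
        self.current = []    # text fragments of the line being built
        self.depth = 0       # current quote depth
        self.skip = 0        # > 0 while inside <script> or <style>
        self.indented = []   # per open <p>: 1 if it added a quote level

    def _flush(self):
        # Collapse whitespace and emit the pending line at the current depth.
        text = " ".join("".join(self.current).split())
        if text:
            self.lines.append("> " * self.depth + text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip += 1
        if tag in self.BLOCK_TAGS:
            self._flush()
        if tag == "blockquote":
            self.depth += 1
        elif tag == "p":
            style = (dict(attrs).get("style") or "").replace(" ", "").lower()
            level = 1 if "margin-left" in style else 0
            self.indented.append(level)
            self.depth += level

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip:
            self.skip -= 1
        if tag in self.BLOCK_TAGS:
            self._flush()
        if tag == "blockquote" and self.depth:
            self.depth -= 1
        elif tag == "p" and self.indented:
            self.depth -= self.indented.pop()

    def handle_data(self, data):
        if not self.skip:
            self.current.append(data)


def html_to_quoted_text(html):
    """Return a plain-text rendering of html with '> ' quote prefixes."""
    parser = QuotedTextExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()
    return "\n".join(parser.lines)
```

The result is plain text, so it still has to be HTML-escaped before being
placed in a web page.  Color-based quoting, which also came up in this
thread, is not handled by this sketch.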

-Ben

