From owner-ips@ece.cmu.edu  Thu Dec 21 18:51:34 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id SAA04909
	for <ips-archive@ietf.org>; Thu, 21 Dec 2000 18:51:33 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id SAA08044;
	Thu, 21 Dec 2000 18:51:34 -0500 (EST)
Date: Thu, 21 Dec 2000 18:51:34 -0500 (EST)
Message-Id: <200012212351.SAA08044@ece.cmu.edu>
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
To: ips-archive@ietf.org
From: Majordomo@ece.cmu.edu
Subject: Welcome to ips
Reply-To: Majordomo@ece.cmu.edu

--

Welcome to the ips mailing list!

Please save this message for future reference.  Thank you.

If you ever want to remove yourself from this mailing list,
you can send mail to <Majordomo@ece.cmu.edu> with the following
command in the body of your email message:

    unsubscribe ips

or from another account, besides ips-archive@ietf.org:

    unsubscribe ips ips-archive@ietf.org

If you ever need to get in contact with the owner of the list,
(if you have trouble unsubscribing, or have questions about the
list itself) send email to <owner-ips@ece.cmu.edu> .
This is the general rule for most mailing lists when you need
to contact a human.

 Here's the general information for the list you've subscribed to,
 in case you don't already have it:

No info has been entered


From owner-ips@ece.cmu.edu  Thu Dec 21 21:23:14 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id VAA07492
	for <ips-archive@ietf.org>; Thu, 21 Dec 2000 21:23:14 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id TAA08787
	for ips-outgoing; Thu, 21 Dec 2000 19:25:54 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from bsd.tuan.com (bsd.tuan.com [207.126.98.98])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id TAA08782
	for <ips@ece.cmu.edu>; Thu, 21 Dec 2000 19:25:50 -0500 (EST)
Received: from IETF ([208.184.100.249]) by bsd.tuan.com
          (Post.Office MTA v3.1.2 release (PO203-101c)
          ID# 3-54234U100L2S100) with SMTP id AAA3665;
          Thu, 21 Dec 2000 16:25:48 -0800
From: "Douglas Otis" <dotis@sanlight.net>
To: <julian_satran@il.ibm.com>, <ips@ece.cmu.edu>
Subject: RE: Framing Discussion
Date: Thu, 21 Dec 2000 16:24:37 -0800
Message-ID: <NEBBJGDMMLHHCIKHGBEJIELNCDAA.dotis@sanlight.net>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)
In-Reply-To: <C12569BC.0023B907.00@d12mta02.de.ibm.com>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400
Sender: owner-ips@ece.cmu.edu
Precedence: bulk
Content-Transfer-Encoding: 7bit

Julian,

Although the IPS workgroup declares it is using TCP as the transport layer,
there is a strong desire to view this transport as framed rather than as a
byte stream.  Clever schemes to mark the end of a frame or the beginning of
a PDU (some hope at the segment boundary) are still unresolved.  At least
this is openly the case.  I have given up asking how such a scheme of
placing SCSI data into SCSI buffers delivered in out-of-sequence TCP
segments can be done without modifying TCP.  David has clearly declared such
a question out-of-scope.  I will blink and say that there is an application
running iSCSI on an adapter and that the interface for this adapter is akin
to some existing SCSI adapter product.  At least this limits much of the
standard's damage to the adapters.  I still hold only a dim hope with
respect to damage as I have yet to see clarification on the framing scheme.
If you wish to shed additional clarity on this subject, your comments are
always welcome.  Clearly we see this subject from different perspectives but
I would not say I see the truth.

As SCSI is just one of three applications to be supported by this new
framing scheme, then I guess we will see three additional interfaces exposed
at these adapters.  One for network use, one for SCSI, one for VI and yet
another for IPC.  I expect that in the time it will take these products to
mature to the point of not causing disruption with various routing
equipment, SCTP will have supplanted TCP in these applications.  The needs
already expressed within this WG are reasons SCTP will be available before
Bill has his version ready.

Doug

> JP,
>
> I think that many people on this list have already discussed the so called
> out of order processing but Doug Otis seems adamant that he is the holder
> of THE truth.
> The reason why framing recovery is deemed necessary is to keep to a
> minimum the reassembly memory on the NIC cards.  Data can be placed in
> host memory without being delivered (i.e., the ULP made aware that data
> is there).
>
> Julo
>
> Douglas Otis <dotis@sanlight.net> on 20/12/2000 20:55:56
>
> Please respond to Douglas Otis <dotis@sanlight.net>
>
> To:   JP Raghavendra Rao <Jp.Raghavendra@EBay.Sun.COM>, ips@ece.cmu.edu
> cc:
> Subject:  RE: Framing Discussion
>
>
>
>
> JP,
>
> The VM page flipping will require blocks to be greater than or equal to
> pages in size and that these blocks are aligned at page boundaries.
> Neither of these assumptions is true. The goal is clear: the NIC will
> examine the content and then direct data payload to the system through
> the NIC interface.  The desire is to keep the buffers on the NIC small
> and thus allow out of sequence processing of the TCP stream.  This must
> be seen as a modification to the normal TCP implementation.  Such
> operation should include a complete description of the API to allow
> consideration of NIC design, inter-operability and security requirements.
>
> Doug
>
>
> > >> The design goal behind the framing discussion
> > >> is avoidance of that copy.  In contrast to the
> > >> typical case for NICs and TCP/IP, read data
> > >> buffers for SCSI data are usually *not*
> > >> interchangeable.
> > >>
> > >Why are they not interchangeable ? This is an
> > >assumption not stated anywhere.  Is there
> > >a list of other assumptions that is documented ?
> > >
> >
> > Typically, the original data buffer is with the application that
> > made the SCSI READ request - and this buffer may not have been
> > allocated with the constraints that should be ready for a simple
> > swapping. Assuming that the constraints are met, it would require
> > VM page flipping, which is considered to be an implementation hack.
> >
> > A protocol level solution to locate the buffer and its offset for
> > the incoming TCP segment is probably a better thing to have.
> >
> > -JP
> >
> >



From owner-ips@ece.cmu.edu  Fri Dec 22 00:16:24 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id AAA12164
	for <ips-archive@ietf.org>; Fri, 22 Dec 2000 00:16:24 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id WAA12585
	for ips-outgoing; Thu, 21 Dec 2000 22:18:12 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from server1.NishanSystems.COM (smtp.nishansystems.com [216.217.36.162])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id WAA12571
	for <ips@ece.cmu.edu>; Thu, 21 Dec 2000 22:18:01 -0500 (EST)
Received: by smtp.nishansystems.com with Internet Mail Service (5.5.2448.0)
	id <YGDLXZVK>; Thu, 21 Dec 2000 19:26:26 -0800
Message-ID: <B300BD9620BCD411A366009027C21D9B091BAE@ariel.nishansystems.com>
From: Joshua Tseng <jtseng@NishanSystems.com>
To: Black_David@emc.com, ips@ece.cmu.edu
Subject: RE: A couple of iFCP questions
Date: Thu, 21 Dec 2000 19:17:45 -0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain;
	charset="windows-1252"
Sender: owner-ips@ece.cmu.edu
Precedence: bulk

David,

> 
> A couple of questions about iFCP:
> 
> (1) iFCP has been described as NAT-like in translating
> 24-bit FC identifiers (S_ID and D_ID) to an IP address of
> another iFCP gateway and a 24 bit identifier with respect
> to that gateway.  The iFCP document describes how
> translations are accumulated by an iFCP gateway.  How
> are they discarded?  To be specific, when and under
> what circumstances is it safe for a gateway to discard
> a (no longer used) translation, and what are the consequences
> of an erroneous discard?

It is safe to discard a translation when there are no active N_PORT
sessions for a remote N_PORT.  In that case, the iFCP gateway SHOULD
invalidate and remove the mapping to that remote N_PORT.  Of course,
even if it doesn't, a device establishing a new N_PORT session has the
responsibility of validating any pre-existing mapping by checking WWPN's
of the remote device.

An erroneous discard of an FC_ID mapping would be the result of a bug,
and we would consider it to be a fatal iFCP gateway error.  While the
handling of such an error is implementation-specific, we would expect
some sort of fail-safe behavior.
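
A minimal sketch of the discard rule described above, assuming a gateway
keeps a per-identifier count of active N_PORT sessions.  The class and
method names are hypothetical and are not taken from the iFCP document;
the WWPN check on session setup mirrors the validation step mentioned in
the reply.

```python
class TranslationTable:
    """Hypothetical iFCP gateway mapping table (illustrative names only)."""

    def __init__(self):
        # local 24-bit FC_ID -> (remote gateway IP, remote FC_ID, remote WWPN)
        self.mappings = {}
        # local 24-bit FC_ID -> number of active N_PORT sessions
        self.sessions = {}

    def open_session(self, local_id, gateway_ip, remote_id, wwpn):
        existing = self.mappings.get(local_id)
        # Validate any pre-existing mapping by WWPN; replace it if stale.
        if existing is None or existing[2] != wwpn:
            self.mappings[local_id] = (gateway_ip, remote_id, wwpn)
        self.sessions[local_id] = self.sessions.get(local_id, 0) + 1

    def close_session(self, local_id):
        remaining = self.sessions.get(local_id, 0) - 1
        if remaining > 0:
            self.sessions[local_id] = remaining
            return
        # No active N_PORT sessions remain: invalidate and remove the mapping.
        self.sessions.pop(local_id, None)
        self.mappings.pop(local_id, None)
```

With two sessions open to the same remote N_PORT, the mapping survives the
first close_session() call and is removed by the second.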

In Fibre Channel, the fabric is responsible for assigning
the FC identifiers (used in S_ID and D_ID).  The iFCP gateway
will emulate this behavior in response to the FLOGI by assigning
its own locally-significant (to that particular iFCP gateway)
FC identifiers.  The behavior of the iFCP gateway should be no
different from any other FC fabric with regard to address assignment
and reassignment.
> 
> (2) If an FC fabric were to be connected to an iFCP gateway,
> the fabric may change how the 24 bit identifiers
> are mapped to ports (as identified by Port WWN) in some
> circumstances (recabling can cause this).  When
> such a change occurs, does this invalidate translations
> in other iFCP gateways?  If yes, how do those gateways find
> out/get updated translations in a fashion that ensures traffic
> will not be delivered to the wrong FC port?  If not, what
> does the first iFCP gateway do to preserve the old versions
> of the changed translations?

Anything resulting in a change to the value
of the 24-bit FC identifiers would be a major event, in
both FC and iFCP.  If it should occur, the iFCP gateway shall
immediately terminate the N_PORT sessions and supporting TCP
connections affected by the changed FC identifier.  This
includes events such as LIPs and cable disconnect.  Following
such an event, the iFCP gateways should automatically update
their FC_ID mappings in the normal process of re-establishing
N_PORT sessions.

iSNS facilitates the interoperation among iFCP gateways.  iSNS
is how other iFCP gateways learn of the FC identifier mappings
to IP address and WWPN.  If these mappings are changed, the iFCP
gateway shall register the change in the iSNS (and as mentioned
before, the old N_PORT sessions will be terminated).  The iSNS server
in turn, will issue state change notifications to all affected
iFCP gateways, informing them of the new 24-bit FC identifier
mappings.
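
The notification flow above can be sketched roughly as follows.  This is
an illustration only: the IsnsServer and Gateway classes, their methods,
and the choice to notify every subscribed gateway other than the one that
registered the change are assumptions, not the iSNS message formats.

```python
class Gateway:
    """Hypothetical iFCP gateway holding mappings learned through iSNS."""

    def __init__(self, ip):
        self.ip = ip
        self.remote = {}       # WWPN -> (gateway IP, 24-bit FC identifier)
        self.sessions = set()  # WWPNs with an active N_PORT session

    def on_state_change(self, wwpn, gateway_ip, fc_id):
        # The old N_PORT session is terminated before the mapping is updated.
        self.sessions.discard(wwpn)
        self.remote[wwpn] = (gateway_ip, fc_id)


class IsnsServer:
    """Hypothetical iSNS server: stores registrations, fans out changes."""

    def __init__(self):
        self.registry = {}     # WWPN -> (gateway IP, 24-bit FC identifier)
        self.subscribers = []

    def subscribe(self, gateway):
        self.subscribers.append(gateway)

    def register(self, source, wwpn, fc_id):
        self.registry[wwpn] = (source.ip, fc_id)
        # Simplification: every other subscribed gateway counts as affected.
        for gateway in self.subscribers:
            if gateway is not source:
                gateway.on_state_change(wwpn, source.ip, fc_id)
```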

Josh
> 
> Thanks,
> --David
> 
> ---------------------------------------------------
> David L. Black, Senior Technologist
> EMC Corporation, 42 South St., Hopkinton, MA  01748
> +1 (508) 435-1000 x75140     FAX: +1 (508) 497-8500
> black_david@emc.com       Mobile: +1 (978) 394-7754
> ---------------------------------------------------
> > 


From owner-ips@ece.cmu.edu  Fri Dec 22 06:10:02 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id GAA28165
	for <ips-archive@ietf.org>; Fri, 22 Dec 2000 06:10:01 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id EAA19146
	for ips-outgoing; Fri, 22 Dec 2000 04:01:52 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from d12lmsgate.de.ibm.com (d12lmsgate.de.ibm.com [195.212.91.199])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id EAA19138
	for <ips@ece.cmu.edu>; Fri, 22 Dec 2000 04:01:24 -0500 (EST)
From: julian_satran@il.ibm.com
Received: from d12relay01.de.ibm.com (d12relay01.de.ibm.com [9.165.215.22])
	by d12lmsgate.de.ibm.com (1.0.0) with ESMTP id KAA183138
	for <ips@ece.cmu.edu>; Fri, 22 Dec 2000 10:00:50 +0100
Received: from d12mta05.de.ibm.com (d12mta05_cs0 [9.165.222.239])
	by d12relay01.de.ibm.com (8.8.8m3/NCO v4.95) with SMTP id KAA129216
	for <ips@ece.cmu.edu>; Fri, 22 Dec 2000 10:00:49 +0100
Received: by d12mta05.de.ibm.com(Lotus SMTP MTA v4.6.5  (863.2 5-20-1999))  id C12569BD.00317F99 ; Fri, 22 Dec 2000 10:00:39 +0100
X-Lotus-FromDomain: IBMIL@IBMDE
To: "Y P Cheng" <ycheng@advansys.com>
cc: "'Ips@Ece. Cmu. Edu'" <ips@ece.cmu.edu>
Message-ID: <C12569BD.00317D34.00@d12mta05.de.ibm.com>
Date: Fri, 22 Dec 2000 10:56:32 +0200
Subject: RE: Framing Discussion
Mime-Version: 1.0
Content-type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-ips@ece.cmu.edu
Precedence: bulk



Dear Mr. Cheng,

You can't fit data to tags if you have lost an iSCSI data header.  However
you could (in theory) place them in anonymous buffers and tag them later
(once you have recovered the header). In this case you either copy them or
have the storage-end do some scatter-gather on the fragments you have
placed in anonymous buffers.

Julo

"Y P Cheng" <ycheng@advansys.com> on 21/12/2000 18:21:12

Please respond to "Y P Cheng" <ycheng@advansys.com>

To:   "'Ips@Ece. Cmu. Edu'" <ips@ece.cmu.edu>
cc:
Subject:  RE: Framing Discussion




> I agree with David. For the sake of clarity however I would like to add
> that even for those cases in which buffers are anonymous (like
> some appliance fileservers or disk controllers) the data that comes
> from the wire are not necessarily entire storage data blocks and have
> to be placed somewhere within a storage buffer. That particular
> placement can be made itself anonymous only by "placing" a
> "scatter-gather" burden on the storage end of the wire
> and that is either expensive or outright impossible.
>
> Julo
>

Julo,

You lost me on the very last statement.  For appliance fileservers or disk
controllers, there is typically a small RTOS running.  The anonymous
non-storage data are received into a small bucket, say 256 bytes, which is
passed back to the application for processing.  These types of data are
known as control data, like mode-select and login, etc. The component
interacting with the wire is given a number of these small buckets. There
is no scatter-gather burden on the storage end.

In the context of TCP/iSCSI implementation, the wire interacting component
needs to sort out between solicited and unsolicited iSCSI PDUs. For
solicited PDU, it places the data based on target task tags.  For
unsolicited control data, it places them into these small buckets.  For
unsolicited disk block data -- like streamed writes -- it places the data
into anonymous buffers prepared by the application.  Scatter/gather
handling is a part of the "wire interacting component" design.  No extra
burden.  In fact, the HBA for an iSCSI initiator or target is specialized
in moving data to destination buffers quickly.  Its main task is parsing
the TCP/IP segments.
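
The three-way sorting described above could look roughly like this.  It is
a sketch under stated assumptions: the Pdu fields and the WireComponent
class are hypothetical, and only the 256-byte bucket size comes from the
text.

```python
from dataclasses import dataclass, field

SMALL_BUCKET_SIZE = 256  # bytes, the example figure used above


@dataclass
class Pdu:
    """Hypothetical, already-parsed iSCSI PDU."""
    solicited: bool
    is_control: bool
    payload: bytes
    target_task_tag: int = 0
    offset: int = 0


@dataclass
class WireComponent:
    """Hypothetical 'wire interacting component' that sorts incoming PDUs."""
    tagged_buffers: dict = field(default_factory=dict)    # tag -> bytearray
    control_buckets: list = field(default_factory=list)
    anonymous_buffers: list = field(default_factory=list)

    def place(self, pdu):
        if pdu.solicited:
            # Solicited data is placed according to the target task tag.
            buf = self.tagged_buffers[pdu.target_task_tag]
            end = pdu.offset + len(pdu.payload)
            if end > len(buf):
                raise ValueError("payload exceeds tagged buffer")
            buf[pdu.offset:end] = pdu.payload
            return "tagged"
        if pdu.is_control:
            # Unsolicited control data goes into a small bucket.
            if len(pdu.payload) > SMALL_BUCKET_SIZE:
                raise ValueError("control data exceeds bucket size")
            self.control_buckets.append(pdu.payload)
            return "bucket"
        # Unsolicited block data goes into an anonymous buffer.
        self.anonymous_buffers.append(pdu.payload)
        return "anonymous"
```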

This is the way I understand how an appliance fileserver or disk controller
works.  Either big companies have totally different designs or I must have
misunderstood your statement.

Y.P. Cheng, Connectcom Solutions.






From owner-ips@ece.cmu.edu  Fri Dec 22 12:49:15 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id MAA08787
	for <ips-archive@ietf.org>; Fri, 22 Dec 2000 12:49:14 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id KAA27902
	for ips-outgoing; Fri, 22 Dec 2000 10:45:23 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from palrel3.hp.com (palrel3.hp.com [156.153.255.226])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id KAA27895
	for <ips@ece.cmu.edu>; Fri, 22 Dec 2000 10:45:18 -0500 (EST)
Received: from hpindlm.cup.hp.com (hpindlm.cup.hp.com [15.13.95.89])
	by palrel3.hp.com (Postfix) with ESMTP
	id CDF44965; Fri, 22 Dec 2000 07:45:12 -0800 (PST)
Received: from mk731913.cup.hp.com (mk731912.cup.hp.com [15.8.80.111])
	by hpindlm.cup.hp.com (8.9.3 (PHNE_18979)/8.9.3 SMKit7.02) with ESMTP id HAA05647;
	Fri, 22 Dec 2000 07:47:00 -0800 (PST)
Message-Id: <5.0.0.25.2.20001222071007.04026358@hpindlm.cup.hp.com>
X-Sender: krause@hpindlm.cup.hp.com
X-Mailer: QUALCOMM Windows Eudora Version 5.0
Date: Fri, 22 Dec 2000 07:30:20 -0800
To: Mohan Parthasarathy <Mohan.Parthasarathy@eng.sun.com>
From: Michael Krause <krause@cup.hp.com>
Subject: Re: Framing Discussion
Cc: ips@ece.cmu.edu
In-Reply-To: <200012210134.eBL1YgT12979@locked.eng.sun.com>
References: <5.0.0.25.2.20001220154325.0197ecf0@hpindlm.cup.hp.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ips@ece.cmu.edu
Precedence: bulk

At 05:34 PM 12/20/2000 -0800, Mohan Parthasarathy wrote:
>If every TCP segment has an iSCSI header, which is a waste, then this
>problem is relatively simple in identifying which iscsi command
>a given TCP segment belongs to and also indexing into the right
>offset inside the buffer.

Why would such a solution be a waste?  As a percentage of bandwidth, this 
falls very much into the noise category even with 1500 Byte segments.  Most 
high-speed protocols including new ones such as InfiniBand use a small 
header per packet to allow hardware a simple mechanism to understand where 
to place the data (also applies to software implementations which can do 
better with minor hardware assists).

At the first BOF, I spoke about aligning the protocol with InfiniBand since 
that will eventually become the server point of attach in the coming 
years.  The suggestion was made to include the same RDMA semantics (if they 
are supported) as InfiniBand.  It was further suggested there and in other 
e-mail that a simple 8-byte header with a 4-byte CRC be associated with 
each segment and that these fields be contained within the data payload so 
that TCP is not impacted.  The contents of this header would include an 
8-bit op-code, VA, length, etc. allowing the responder to target the memory 
on the host / controller if all fields were valid.  If there was an 
out-of-order delivery, the data could be spilled to temporary memory either 
in the host or the adapter and upon recovery, delivered to the correct 
target buffer without requiring host processor intervention with a little 
creativity.  These 12 bytes of overhead for SCSI operations would have 
minimal impact on link utilization and overall solution efficiency.  I 
believe these same concepts have been stated in the various RDMA proposals 
that have been distributed and given the eventual movement to InfiniBand 
for servers and the new SRP (SCSI RDMA Protocol), one might want to create 
an iSCSI solution that can easily bridge into these other technologies.
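
To make the 12-byte figure concrete, here is a minimal sketch of a
per-segment header and trailer carried inside the TCP payload.  The field
layout (op-code, flags, length, offset) is a hypothetical stand-in, since
the message only lists "op-code, VA, length, etc.", and zlib's CRC-32 is
an assumption because no CRC polynomial is named.

```python
import struct
import zlib

# Hypothetical layout: 8-bit op-code, 8-bit flags, 16-bit length, 32-bit offset.
HEADER_FORMAT = "!BBHI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 bytes
CRC_SIZE = 4


def frame(opcode, offset, data):
    """Prefix data with the 8-byte header and append a 4-byte CRC."""
    header = struct.pack(HEADER_FORMAT, opcode, 0, len(data), offset)
    crc = zlib.crc32(header + data) & 0xFFFFFFFF
    return header + data + struct.pack("!I", crc)


def unframe(segment):
    """Validate the CRC and length, then return (opcode, offset, data)."""
    if len(segment) < HEADER_SIZE + CRC_SIZE:
        raise ValueError("segment too short")
    header = segment[:HEADER_SIZE]
    body = segment[HEADER_SIZE:-CRC_SIZE]
    (expected,) = struct.unpack("!I", segment[-CRC_SIZE:])
    if zlib.crc32(header + body) & 0xFFFFFFFF != expected:
        raise ValueError("CRC mismatch")
    opcode, _flags, length, offset = struct.unpack(HEADER_FORMAT, header)
    if length != len(body):
        raise ValueError("length mismatch")
    return opcode, offset, body


def overhead_fraction(segment_bytes):
    """Share of a segment consumed by the header plus CRC."""
    return (HEADER_SIZE + CRC_SIZE) / segment_bytes
```

For a 1500-byte segment, overhead_fraction(1500) is 12/1500, i.e. 0.8
percent of the bytes carried.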

Note: The arguments about adapter complexity, impact to OS, etc. are rather 
moot in many ways.  The work will be done to support InfiniBand over the 
next couple of years and thus the cost to implement / support is going to 
be fairly minimal.  It should also be noted that many of these changes have 
already been done using PCI / PCI-X based solutions that support VIA, 
Scheduled Transfer, Oracle, MPI, Sockets Direct, etc. so the ability to 
deploy solutions in the highly desirable I/O interconnect independent way 
is available today as well.  One does not need to wait or rely upon InfiniBand 
to make all of this happen.  It should also be noted that many companies 
will be working to have Linux support for this type of technology in the 
upcoming year so solutions should be available to all by the time iSCSI 
ramps to volume in 2002.

Mike



From owner-ips@ece.cmu.edu  Fri Dec 22 12:50:07 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id MAA08823
	for <ips-archive@ietf.org>; Fri, 22 Dec 2000 12:50:06 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id KAA27921
	for ips-outgoing; Fri, 22 Dec 2000 10:45:32 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.242])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id KAA27909
	for <ips@ece.cmu.edu>; Fri, 22 Dec 2000 10:45:25 -0500 (EST)
Received: from hpindlm.cup.hp.com (hpindlm.cup.hp.com [15.13.95.89])
	by palrel1.hp.com (Postfix) with ESMTP
	id 1F01311FB; Fri, 22 Dec 2000 07:45:23 -0800 (PST)
Received: from mk731913.cup.hp.com (mk731912.cup.hp.com [15.8.80.111])
	by hpindlm.cup.hp.com (8.9.3 (PHNE_18979)/8.9.3 SMKit7.02) with ESMTP id HAA05652;
	Fri, 22 Dec 2000 07:47:03 -0800 (PST)
Message-Id: <5.0.0.25.2.20001222073340.040df828@hpindlm.cup.hp.com>
X-Sender: krause@hpindlm.cup.hp.com
X-Mailer: QUALCOMM Windows Eudora Version 5.0
Date: Fri, 22 Dec 2000 07:42:02 -0800
To: Douglas Otis <dotis@sanlight.net>
From: Michael Krause <krause@cup.hp.com>
Subject: RE: Framing Discussion
Cc: ips@ece.cmu.edu
In-Reply-To: <NEBBJGDMMLHHCIKHGBEJOELDCDAA.dotis@sanlight.net>
References: <5.0.0.25.2.20001220154325.0197ecf0@hpindlm.cup.hp.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ips@ece.cmu.edu
Precedence: bulk

At 04:58 PM 12/20/2000 -0800, Douglas Otis wrote:

>problem of the block size being less than the page size.  Unless the SCSI
>application is forced to allocate in pages and you have a means to force the
>alignment of these blocks as they are delivered by the network, then this
>MMU technique is not available.

Many file systems allocate in multiples of 4KB, making much of this fairly 
straightforward to implement.  Partials are either placed in a buffer that 
is a multiple of 4KB or tend to align on power-of-2 sized buffers, so the 
performance impacts are mitigated.

>You are suggesting that you use a standard TCP, find the PDU, process the
>iSCSI headers, place the data according to the tag within the iSCSI header,
>and because this process is located on the network adapter, it does not
>impact TCP.  I think I understand the logic.  As you expect the PDU to be
>"often" segment aligned, framing is not an issue.

Correct.

> > A zero-copy TCP is a content directed placement of the data.  Content
> > directed placement implicitly implies header / data split including the
> > iSCSI protocol headers.  Don't see what the problem is.
>
>Not at all.  The content of the payload does not impact the placement with
>Zero Copy TCP.  Here you expand the definition of this interface to now
>include iSCSI.  Rather than using TCP as the boundary between the adapter
>and the system, iSCSI is now such a boundary.

For a completely off-loaded solution, this is correct.  It is possible to 
only off-load the TCP portion and provide a scatter operation where the 
iSCSI header is targeted to a different control set of buffers and the data 
goes to the appropriate data buffers - this is done in many of today's 
implementations so the logic is well-understood.

>Fine, but you really should not call this part of TCP.  It is not.  You
>should call it an iSCSI adapter interface.  If you wish such an interface to
>be application specific, so will its interface.  Clearly SCSI dictates a
>different interface to that of TCP.  If you advocate the placement of the
>application on the adapter, then you are no longer discussing TCP.  You are
>discussing the back-end of a full or partial processing of iSCSI.  TCP is
>hidden within this application running on the adapter.  I had advocated
>using a pointer to an application routine to serve this function of placing
>the application on the adapter and should the value be less than 1024, it
>would indicate a pre-defined IANA designator for such an embedded
>application.  Perhaps the value of 1 could indicate a normal IP stack as the
>application as you bind the application to the port.  If this is the desire
>of the work group, as it seems to be, then declaring details of the
>application adapter interface is the next step.  What group makes adapters?

InfiniBand is looking to standardize the interface to InfiniBand-based TOE 
(TCP Off-load Engine) endnodes so that  a well-defined wire protocol can be 
created for all IHVs to implement above the InfiniBand transport 
protocol.  I think there might be some benefit to do the same thing for 
iSCSI as for TOE in an appropriate forum while allowing this workgroup to 
proceed with the iSCSI definition which defines the operational model and 
wire protocol being used.  In fact, if the iSCSI workgroup sets up some 
basic rules as outlined in another response from me today or along the 
lines of the various RDMA proposals, then the wire protocol defined by the 
iSCSI workgroup in essence defines much of this needed interface.  Need to 
think about what else if anything might be needed.

Mike



From owner-ips@ece.cmu.edu  Fri Dec 22 15:27:22 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id PAA13565
	for <ips-archive@ietf.org>; Fri, 22 Dec 2000 15:27:21 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id NAA03124
	for ips-outgoing; Fri, 22 Dec 2000 13:20:24 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from patan.sun.com (patan.Sun.COM [192.18.98.43])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id NAA03120
	for <ips@ece.cmu.edu>; Fri, 22 Dec 2000 13:20:16 -0500 (EST)
Received: from engmail4.Eng.Sun.COM ([129.144.134.6])
	by patan.sun.com (8.9.3+Sun/8.9.3) with ESMTP id KAA06016;
	Fri, 22 Dec 2000 10:20:02 -0800 (PST)
Received: from locked.eng.sun.com (locked.Eng.Sun.COM [129.146.85.189])
	by engmail4.Eng.Sun.COM (8.9.3+Sun/8.9.3/ENSMAIL,v1.7) with ESMTP id KAA29505;
	Fri, 22 Dec 2000 10:20:02 -0800 (PST)
Received: (from mohanp@localhost)
	by locked.eng.sun.com (8.10.1+Sun/8.10.1) id eBMIIpi14164;
	Fri, 22 Dec 2000 10:18:51 -0800 (PST)
From: Mohan Parthasarathy <Mohan.Parthasarathy@eng.sun.com>
Message-Id: <200012221818.eBMIIpi14164@locked.eng.sun.com>
Subject: Re: Framing Discussion
In-Reply-To: <5.0.0.25.2.20001222071007.04026358@hpindlm.cup.hp.com> from Michael
 Krause at "Dec 22, 2000 07:30:20 am"
To: Michael Krause <krause@cup.hp.com>
Date: Fri, 22 Dec 2000 10:18:51 -0800 (PST)
CC: Mohan Parthasarathy <Mohan.Parthasarathy@eng.sun.com>, ips@ece.cmu.edu
X-Mailer: ELM [version 2.4ME+ PL66 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-ips@ece.cmu.edu
Precedence: bulk
Content-Transfer-Encoding: 7bit

> At 05:34 PM 12/20/2000 -0800, Mohan Parthasarathy wrote:
> >If every TCP segment has an iSCSI header, which is a waste, then this
> >problem is relatively simple in identifying which iscsi command
> >a given TCP segment belongs to and also indexing into the right
> >offset inside the buffer.
> 
> Why would such a solution be a waste?  As a percentage of bandwidth, this 
> falls very much into the noise category even with 1500 Byte segments.  Most 
> high-speed protocols including new ones such as InfiniBand use a small 
> header per packet to allow hardware a simple mechanism to understand where 
> to place the data (also applies to software implementations which can do 
> better with minor hardware assists).
>
It is a waste, if there is a better solution to tackle the same thing.
If this can make things simpler, we should consider this as an option.
But how does the sender insert this header? Typically the target
side would just dump all the data to TCP - which contains an iSCSI
header followed by data. TCP then would segment the data depending
on the MSS. How do you insert such a header for each MSS sized
data?

The only RDMA proposal on InfiniBand I have seen is the one from
Microsoft. I am not sure which one you are referring to.

> At the first BOF, I spoke about aligning the protocol with InfiniBand since 
> that will eventually become the server point of attach in the coming 
> years.  The suggestion was made to include the same RDMA semantics (if they 
> are supported) as InfiniBand.  It was further suggested there and in other 
> e-mail that a simple 8-byte header with a 4-byte CRC be associated with 
> each segment and that these fields be contained within the data payload so 
> that TCP is not impacted.  The contents of this header would include an 
> 8-bit op-code, VA, length, etc. allowing the responder to target the memory 

If VA is the virtual address, then this has security problems. How do
you prevent arbitrary packets over-writing memory ? Actually all
you need is a tag to identify the buffer pool, offset within a
stream, and some smarts in the h/w to associate the tag to a
buffer pool. Is this not sufficient ?
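
A rough sketch of the tag-plus-offset alternative, assuming the receiver
registers each buffer under a tag ahead of time.  The BufferRegistry name
and its methods are hypothetical.  The point it illustrates is that an
arriving packet can only reach memory registered for its tag, and only
within that buffer's bounds.

```python
class BufferRegistry:
    """Hypothetical tag -> buffer table kept by the adapter or driver."""

    def __init__(self):
        self._buffers = {}

    def register(self, tag, size):
        # The receiver decides which memory a tag may reach.
        self._buffers[tag] = bytearray(size)

    def place(self, tag, offset, data):
        """Copy data into the tagged buffer; refuse anything out of range."""
        buf = self._buffers.get(tag)
        if buf is None:
            return False  # unknown tag: nothing is written
        if offset < 0 or offset + len(data) > len(buf):
            return False  # would overrun the registered buffer
        buf[offset:offset + len(data)] = data
        return True

    def contents(self, tag):
        return bytes(self._buffers[tag])
```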

> on the host / controller if all fields were valid.  If there was an 
> out-of-order delivery, the data could be spilled to temporary memory either 
> in the host or the adapter and upon recovery, delivered to the correct 
> target buffer without requiring host processor intervention with a little 
> creativity.  These 12 bytes of overhead for SCSI operations would have 
> minimal impact on link utilization and overall solution efficiency.  I 
> believe these same concepts have been stated in the various RDMA proposals 
> that have been distributed and given the eventual movement to InfiniBand 
> for servers and the new SRP (SCSI RDMA Protocol), one might want to create 
> an iSCSI solution that can easily bridge into these other technologies.
> 
> Note: The arguments about adapter complexity, impact to OS, etc. are rather 
> moot in many ways.  The work will be done to support InfiniBand over the 
> next couple of years and thus the cost to implement / support is going to 
> be fairly minimal.  It should also be noted that many of these changes have 
> already been done using PCI / PCI-X based solutions that support VIA, 
> Scheduled Transfer, Oracle, MPI, Sockets Direct, etc. so the ability to 
> deploy solutions in the highly desirable I/O interconnect independent way 
> is available today as well.  One does not need to wait or rely upon InfiniBand 
> to make all of this happen.  It should also be noted that many companies 
> will be working to have Linux support for this type of technology in the 
> upcoming year so solutions should be available to all by the time iSCSI 
> ramps to volume in 2002.
>
I am not sure which proposal you are talking about. The InfiniBand working
group is looking at the proposal from Microsoft for RDMA. I think it
is more specific to InfiniBand.

-mohan

> Mike
> 



From owner-ips@ece.cmu.edu  Fri Dec 22 15:27:50 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id PAA13612
	for <ips-archive@ietf.org>; Fri, 22 Dec 2000 15:27:50 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id NAA03490
	for ips-outgoing; Fri, 22 Dec 2000 13:31:08 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from hoemail2.firewall.lucent.com (hoemail2.lucent.com [192.11.226.163])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id NAA03484
	for <ips@ece.cmu.edu>; Fri, 22 Dec 2000 13:31:03 -0500 (EST)
Received: from hoemail2.firewall.lucent.com (localhost [127.0.0.1])
	by hoemail2.firewall.lucent.com (Pro-8.9.3/8.9.3) with ESMTP id NAA12440
	for <ips@ece.cmu.edu>; Fri, 22 Dec 2000 13:31:02 -0500 (EST)
Received: from il0015exch001h.wins.lucent.com (h135-1-23-83.lucent.com [135.1.23.83])
	by hoemail2.firewall.lucent.com (Pro-8.9.3/8.9.3) with ESMTP id NAA12423
	for <ips@ece.cmu.edu>; Fri, 22 Dec 2000 13:31:02 -0500 (EST)
Received: by il0015exch001h.ih.lucent.com with Internet Mail Service (5.5.2650.21)
	id <ZNB1NGVP>; Fri, 22 Dec 2000 12:31:02 -0600
Message-ID: <80B684C5E29FD211AA8000A0C9CDD91904D09BB4@il0015exch005u.ih.lucent.com>
From: "Rodriguez, Elizabeth G (Elizabeth)" <egrodriguez@lucent.com>
To: "IPS Mailing List (E-mail)" <ips@ece.cmu.edu>
Cc: "Allison Mankin (E-mail)" <mankin@ISI.EDU>,
        "David Black (E-mail)"
	 <black_david@emc.com>,
        "Steven M. Bellovin (E-mail)"
	 <smb@research.att.com>,
        "Scott Bradner (E-mail)" <sob@harvard.edu>
Subject: IPS Interim meeting:  High level agenda
Date: Fri, 22 Dec 2000 12:31:01 -0600
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain;
	charset="iso-8859-1"
Sender: owner-ips@ece.cmu.edu
Precedence: bulk

Hi all,

We still do not have a detailed agenda for the IPS interim meeting.  We will
address that in the first part of the year.
But we did want to publish a high-level agenda so that people can make their
travel arrangements.

Tuesday, Jan 16 8 am: Common issues
iSCSI will start immediately after common issues, no later than noon.  If
common issues finish early, e.g. at 9 or 10 am, iSCSI will start then, so
don't make travel arrangements assuming that iSCSI will start in the
afternoon...

Wed, Jan 17 8am - 9:30 am iSCSI wrap-up.  This has a 1/2 hour overlap with
CAP.

9:30-5 -- FC issues.

Also, the intent behind this meeting is to get good face time to thrash out
issues facing the various specifications.  So, we will not be doing 'status'
reports.  Instead, we need to identify issues that need to be discussed and
focus on them.  Instead of requesting time on the agenda for various
documents, please send in a list of issues that you feel need to be
addressed.  We will try to allocate time based on that.

Thanks,

Elizabeth


From owner-ips@ece.cmu.edu  Fri Dec 22 16:36:47 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id QAA14998
	for <ips-archive@ietf.org>; Fri, 22 Dec 2000 16:36:46 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id NAA04335
	for ips-outgoing; Fri, 22 Dec 2000 13:57:53 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from ludwig.troikanetworks.com (host03.troikanetworks.com [12.31.172.3] (may be forged))
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id NAA04331
	for <ips@ece.cmu.edu>; Fri, 22 Dec 2000 13:57:50 -0500 (EST)
Received: by host03.troikanetworks.com with Internet Mail Service (5.5.2650.21)
	id <XPBVZM2R>; Fri, 22 Dec 2000 10:58:11 -0800
Message-ID: <C7CA595F9B9FD311A40D009027DC4A85BB87B5@host03.troikanetworks.com>
From: Wayland Jeong <wayland@troikanetworks.com>
To: ips@ece.cmu.edu
Subject: RE: Framing Discussion
Date: Fri, 22 Dec 2000 10:58:00 -0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain;
	charset="iso-8859-1"
Sender: owner-ips@ece.cmu.edu
Precedence: bulk

I have a question regarding the single bit TCP option. Forgive me if this
has been described somewhere else, but I don't completely understand how
this works.

The single bit option in the TCP header will inform the receiver that a PDU
header is present in the current TCP segment. Furthermore, I believe that
this option also indicates that the PDU header is aligned with the current
segment (i.e. the header can be found at a known offset). This must be the
case since the option does not include a pointer to the location of the
header within the segment.

Now, here is my question. If you cannot make any guarantees regarding the
alignment of PDU's and TCP segments being sent from the originator, then how
can you make any assumptions about when you might get this alignment bit?
Let's take an example. A receiver TOE that is parsing iSCSI PDU's notices a
dropped TCP segment. The iSCSI TOE must abandon the current PDU being
re-assembled and attempt to find the next alignment point. Everything that
comes in off the wire between the dropped segment and the next aligned
segment must be squirreled off into NIC memory somewhere until the dropped
segment is re-sent. Now, one of the problems we are trying to solve is the
problem of supplying a large amount of high-bandwidth memory on the NIC in
order to save-off an RTT's worth of wire-speed data waiting for information
to perform re-alignment. If a generalized receiver cannot make an assumption
of when it might receive this alignment through the bit in the TCP options
part of the header, then, how do you avoid having this worst-case
re-assembly memory?
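
To put a rough number on that worst case: the buffer has to hold about one
round trip's worth of data arriving at line rate. The sketch below only does
that arithmetic. The link speeds and RTTs are illustrative assumptions, not
figures from this thread.

```python
def reassembly_buffer_bytes(link_bits_per_sec: int, rtt_ms: int) -> int:
    """Bytes that arrive at wire speed during one round-trip time."""
    return link_bits_per_sec * rtt_ms // (8 * 1000)

# Assumed example points: 1 Gb/s and 10 Gb/s links, LAN-like and WAN-like RTTs.
for gbps in (1, 10):
    for rtt_ms in (1, 100):
        size = reassembly_buffer_bytes(gbps * 1_000_000_000, rtt_ms)
        print(f"{gbps} Gb/s, RTT {rtt_ms} ms: {size} bytes")
```

Under these assumptions a 1 Gb/s link with a 100 ms RTT already needs
12,500,000 bytes per connection that is waiting on a retransmission.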

Now, it might be the case that the receiver assumes that the sender always
packages its PDU's aligned. In this case, some segments may be less than
MSS, but that's okay. Thus, the receiver is assured that it will receive the
aligned bit in the TCP header and thus the next aligned PDU within a PDU's
worth of TCP data. This is fine, but it requires that both the sender and
the receiver play nicely together. If this is the case, is it the assumption
that if this option was agreed upon during negotiation, then the receiver
can assume that the sender is going to both use this option and ensure
alignment? Also, what about things in the network that terminate TCP, like
NATs, firewalls and PEPs? Surely they cannot be expected to keep this
assumption regarding PDU alignment.

-Wayland


From owner-ips@ece.cmu.edu  Fri Dec 22 18:31:03 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id SAA16250
	for <ips-archive@ietf.org>; Fri, 22 Dec 2000 18:31:03 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id QAA08795
	for ips-outgoing; Fri, 22 Dec 2000 16:18:07 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from palrel3.hp.com (palrel3.hp.com [156.153.255.226])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id QAA08791
	for <ips@ece.cmu.edu>; Fri, 22 Dec 2000 16:18:03 -0500 (EST)
Received: from hpindlm.cup.hp.com (hpindlm.cup.hp.com [15.13.95.89])
	by palrel3.hp.com (Postfix) with ESMTP
	id 4999E818; Fri, 22 Dec 2000 13:18:02 -0800 (PST)
Received: from mk731913.cup.hp.com (mk731912.cup.hp.com [15.8.80.111])
	by hpindlm.cup.hp.com (8.9.3 (PHNE_18979)/8.9.3 SMKit7.02) with ESMTP id NAA12187;
	Fri, 22 Dec 2000 13:19:48 -0800 (PST)
Message-Id: <5.0.0.25.2.20001222125014.0404fb28@hpindlm.cup.hp.com>
X-Sender: krause@hpindlm.cup.hp.com
X-Mailer: QUALCOMM Windows Eudora Version 5.0
Date: Fri, 22 Dec 2000 13:17:23 -0800
To: Mohan Parthasarathy <Mohan.Parthasarathy@eng.sun.com>
From: Michael Krause <krause@cup.hp.com>
Subject: Re: Framing Discussion
Cc: ips@ece.cmu.edu
In-Reply-To: <200012221818.eBMIIpi14164@locked.eng.sun.com>
References: <5.0.0.25.2.20001222071007.04026358@hpindlm.cup.hp.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ips@ece.cmu.edu
Precedence: bulk

At 10:18 AM 12/22/2000 -0800, Mohan Parthasarathy wrote:

>It is a waste, if there is a better solution to tackle the same thing.
>If this can make things simpler, we should consider this as an option.
>But how does the sender insert this header ? Typically the target
>side would just dump all the data to TCP - which contains a iscsi
>header followed by data. TCP then would segment the data depending
>on the MSS. How do you insert such an header for each MSS sized
>data ?

This is a simple gather operation.  Most operating systems know how to gather 
different segments into a single byte stream and also account for 
additional headers.  Nothing special here.
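
As a rough illustration of such a gather, the sketch below builds a list of
small buffers (header, CRC, data) per chunk and joins them into one byte
stream. The field layout (op-code, pad, length, tag), the chunk size, and the
use of zlib's CRC-32 are assumptions made for the example; nothing in this
thread fixes them.

```python
import struct
import zlib

# Hypothetical 8-byte header: op-code (1), pad (1), length (2), tag (4).
HEADER = struct.Struct(">BxHI")

def frame_segments(payload: bytes, chunk_size: int, opcode: int, tag: int) -> list:
    """Return a gather list: [header, crc, data, header, crc, data, ...]."""
    buffers = []
    for offset in range(0, len(payload), chunk_size):
        chunk = payload[offset:offset + chunk_size]
        header = HEADER.pack(opcode, len(chunk), tag)
        # 4-byte CRC over header and data; CRC-32 is a stand-in choice.
        crc = struct.pack(">I", zlib.crc32(header + chunk))
        buffers += [header, crc, chunk]
    return buffers

# Assumed chunk size: a 1460-byte MSS minus the 12 bytes of overhead.
buffers = frame_segments(b"x" * 3000, chunk_size=1448, opcode=1, tag=0x1234)
stream = b"".join(buffers)  # the single byte stream handed to TCP
```

On Unix-like systems the same list of buffers can be passed to
socket.sendmsg() or os.writev() so that the kernel performs the gather
without an intermediate copy in the application.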


>The only RDMA proposal on Infinaband i have seen is the one from
>Microsoft. I am not sure which one you are referring to.

RDMA is defined within the InfiniBand architecture (within the workgroup I 
co-chair) - there is nothing for Microsoft to do here.  There have been a 
couple of drafts submitted to IETF on how to support RDMA over TCP which is 
what I'm referring to in this e-mail exchange.  The objective of a TCP RDMA 
solution should be to align as much of the semantics as possible so that 
bridging is simplified and products can easily inter-operate given the 
volume potential of iSCSI solutions relative to potential InfiniBand 
storage relative to InfiniBand servers in the long-run.


> > At the first BOF, I spoke about aligning the protocol with InfiniBand 
> since
> > that will eventually become the server point of attach in the coming
> > years.  The suggestion was made to include the same RDMA semantics (if 
> they
> > are supported) as InfiniBand.  It was further suggested there an in other
> > e-mail that a simple 8-byte header with a 4-byte CRC be associated with
> > each segment and that these fields be contained within the data payload so
> > that TCP is not impacted.  The contents of this header would contain a
> > 8-bit op-code, VA, length, etc. allowing the responder to target the 
> memory
>
>If VA is the virtual address, then this has security problems. How do
>you prevent arbitrary packets over-writing memory ? Actually all
>you need is a tag to identify the buffer pool, offset within a
>stream, and some smarts in the h/w to associate the tag to a
>buffer pool. Is this not sufficient ?

A VA is an address, but what that address means is a local issue; it is only 
used as a key to find the real address.  As such, I don't see any security 
issue beyond that associated with any data exchange on a network, and the 
techniques used today within FC products can be applied, so the 
technology impacts are actually well understood and can be implemented by 
multiple vendors.
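
One way to picture "a key to find the real address" is a lookup table on the
receiver that maps the on-the-wire value to locally registered memory and
rejects anything outside it. This is only a sketch; the class and method
names are hypothetical and do not come from any of the RDMA drafts.

```python
class RegionTable:
    """Maps an on-the-wire key to a locally registered buffer."""

    def __init__(self):
        self._regions = {}

    def register(self, key: int, size: int) -> None:
        """Expose `size` bytes of local memory under `key`."""
        self._regions[key] = bytearray(size)

    def write(self, key: int, offset: int, data: bytes) -> bool:
        """Place data only if it falls inside the registered region."""
        region = self._regions.get(key)
        if region is None:
            return False  # unknown key: drop the packet
        if offset < 0 or offset + len(data) > len(region):
            return False  # outside the registered window: drop the packet
        region[offset:offset + len(data)] = data
        return True

table = RegionTable()
table.register(key=7, size=16)
accepted = table.write(7, 0, b"abcd")   # inside the region
rejected = table.write(7, 14, b"abcd")  # would run past the end
```

The sender never sees a physical address, and a packet carrying an unknown
key or an out-of-range offset cannot touch memory.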

> > on the host / controller if all fields were valid.  If there was an
> > out-of-order delivery, the data could be spilled to temporary memory 
> either
> > in the host or the adapter and upon recovery, delivered to the correct
> > target buffer without requiring host processor intervention with a little
> > creativity.  This 12-bytes of overhead for SCSI operations would have
> > minimal impact on link utilization and overall solution efficiency.  I
> > believe these same concepts have been stated in the various RDMA proposals
> > that have been distributed and given the eventual movement to InfiniBand
> > for servers and the new SRP (SCSI RDMA Protocol), one might want to create
> > an iSCSI solution that can easily bridge into these other technologies.
> >
> > Note: The arguments about adapter complexity, impact to OS, etc. are 
> rather
> > moot in many ways.  The work will be done to support InfiniBand over the
> > next couple of years and thus the cost to implement / support is going to
> > be fairly minimal.  It should also be noted that many of these changes 
> have
> > already be done using PCI / PCI-X based solutions that support VIA,
> > Scheduled Transfer, Oracle, MPI, Sockets Direct, etc. so the ability to
> > deploy solutions in the highly desirable I/O interconnect independent way
> > is available today as well.  One does need to wait or rely upon InfiniBand
> > to make all of this happen.  It should also be noted that many companies
> > will be working to have Linux support for this type of technology in the
> > upcoming year so solutions should be available to all by the time iSCSI
> > ramps to volume in 2002.
> >
>I am not sure which proposal you are talking about. Infiniband working
>group is looking at the proposal from microsoft for RDMA. I think it
>is more specific to Infiniband.

I believe you are referring to the InfiniBand application workgroup which 
has been looking at how to bootstrap InfiniBand via SRP (SCSI RDMA 
Protocol) which is one possible mechanism to support storage over an 
InfiniBand fabric - there are several mechanisms being defined for 
boot.  This is a T10 not a Microsoft effort.  InfiniBand is simply trying 
to accelerate some of this work but T10 will own and drive it to 
completion. There is an upcoming T10 meeting in Houston in January focused on 
SRP that people might want to attend concerning InfiniBand.

Mike



From owner-ips@ece.cmu.edu  Fri Dec 22 22:12:39 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id WAA18618
	for <ips-archive@ietf.org>; Fri, 22 Dec 2000 22:12:39 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id UAA14677
	for ips-outgoing; Fri, 22 Dec 2000 20:10:40 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from patan.sun.com (patan.Sun.COM [192.18.98.43])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id UAA14661
	for <ips@ece.cmu.edu>; Fri, 22 Dec 2000 20:10:32 -0500 (EST)
Received: from engmail3.Eng.Sun.COM ([129.144.170.5])
	by patan.sun.com (8.9.3+Sun/8.9.3) with ESMTP id RAA29968;
	Fri, 22 Dec 2000 17:10:20 -0800 (PST)
Received: from locked.eng.sun.com (locked.Eng.Sun.COM [129.146.85.189])
	by engmail3.Eng.Sun.COM (8.9.3+Sun/8.9.3/ENSMAIL,v1.7) with ESMTP id RAA12670;
	Fri, 22 Dec 2000 17:10:19 -0800 (PST)
Received: (from mohanp@localhost)
	by locked.eng.sun.com (8.10.1+Sun/8.10.1) id eBN19AT14550;
	Fri, 22 Dec 2000 17:09:10 -0800 (PST)
From: Mohan Parthasarathy <Mohan.Parthasarathy@eng.sun.com>
Message-Id: <200012230109.eBN19AT14550@locked.eng.sun.com>
Subject: Re: Framing Discussion
In-Reply-To: <5.0.0.25.2.20001222125014.0404fb28@hpindlm.cup.hp.com> from Michael
 Krause at "Dec 22, 2000 01:17:23 pm"
To: Michael Krause <krause@cup.hp.com>
Date: Fri, 22 Dec 2000 17:09:10 -0800 (PST)
CC: Mohan Parthasarathy <Mohan.Parthasarathy@eng.sun.com>, ips@ece.cmu.edu
X-Mailer: ELM [version 2.4ME+ PL66 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-ips@ece.cmu.edu
Precedence: bulk
Content-Transfer-Encoding: 7bit

> At 10:18 AM 12/22/2000 -0800, Mohan Parthasarathy wrote:
> 
> >It is a waste, if there is a better solution to tackle the same thing.
> >If this can make things simpler, we should consider this as an option.
> >But how does the sender insert this header ? Typically the target
> >side would just dump all the data to TCP - which contains a iscsi
> >header followed by data. TCP then would segment the data depending
> >on the MSS. How do you insert such an header for each MSS sized
> >data ?
> 
> This is a simple gather operation.  Most OS understand how to gather 
> different segments into a single byte stream and also account for 
> additional headers.  Nothing special here.
>
Sure, it is possible, but it is generally not the optimized path in the stack.
There are other problems with this proposal too, I guess. As TCP
is stream oriented, it need not obey the boundaries of the various
segments. The effect is that you won't see an iSCSI header on every
datagram.
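
A small sketch of that effect, assuming the sender always fills segments to
the MSS. The 48-byte header, 8 KB of data per PDU, and 1460-byte MSS are
assumed values for illustration only.

```python
def header_offsets(pdu_sizes):
    """Stream offset at which each PDU (header first) begins."""
    offsets, position = [], 0
    for size in pdu_sizes:
        offsets.append(position)
        position += size
    return offsets

def segments_starting_with_header(pdu_sizes, mss):
    """Indices of MSS-sized segments whose first byte starts a PDU header."""
    return [off // mss for off in header_offsets(pdu_sizes) if off % mss == 0]

# Assumed sizes: 48-byte header plus 8192 bytes of data per PDU.
pdus = [48 + 8192] * 4
aligned = segments_starting_with_header(pdus, mss=1460)
print(aligned)
```

With these sizes only the very first segment begins on a header; the other
three headers land in the middle of a segment.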
> 
> 
> > > At the first BOF, I spoke about aligning the protocol with InfiniBand 
> > since
> > > that will eventually become the server point of attach in the coming
> > > years.  The suggestion was made to include the same RDMA semantics (if 
> > they
> > > are supported) as InfiniBand.  It was further suggested there an in other
> > > e-mail that a simple 8-byte header with a 4-byte CRC be associated with
> > > each segment and that these fields be contained within the data payload so
> > > that TCP is not impacted.  The contents of this header would contain a
> > > 8-bit op-code, VA, length, etc. allowing the responder to target the 
> > memory
> >
> >If VA is the virtual address, then this has security problems. How do
> >you prevent arbitrary packets over-writing memory ? Actually all
> >you need is a tag to identify the buffer pool, offset within a
> >stream, and some smarts in the h/w to associate the tag to a
> >buffer pool. Is this not sufficient ?
> 
> A VA is an address but what that address is a local issue and is only used 
> as a key to find the real address.  As such, I don't see any security issue 
> beyond that associated with any datea exchange on a network and the 
> techniques even used today within FC products can be applied so the 
> technology impacts are actually well understood and can be implemented by 
> multiple vendors.
>

Normal packets - i.e. not using RDMA - can't write anywhere. But I can see
how that can be done with RDMA, i.e. restrict the region accessed by
the NIC for a given RDMA transaction.  Do the FC products do this ?

> > > on the host / controller if all fields were valid.  If there was an
> > > out-of-order delivery, the data could be spilled to temporary memory 
> > either
> > > in the host or the adapter and upon recovery, delivered to the correct
> > > target buffer without requiring host processor intervention with a little
> > > creativity.  This 12-bytes of overhead for SCSI operations would have
> > > minimal impact on link utilization and overall solution efficiency.  I
> > > believe these same concepts have been stated in the various RDMA proposals
> > > that have been distributed and given the eventual movement to InfiniBand
> > > for servers and the new SRP (SCSI RDMA Protocol), one might want to create
> > > an iSCSI solution that can easily bridge into these other technologies.
> > >
> > > Note: The arguments about adapter complexity, impact to OS, etc. are 
> > rather
> > > moot in many ways.  The work will be done to support InfiniBand over the
> > > next couple of years and thus the cost to implement / support is going to
> > > be fairly minimal.  It should also be noted that many of these changes 
> > have
> > > already be done using PCI / PCI-X based solutions that support VIA,
> > > Scheduled Transfer, Oracle, MPI, Sockets Direct, etc. so the ability to
> > > deploy solutions in the highly desirable I/O interconnect independent way
> > > is available today as well.  One does need to wait or rely upon InfiniBand
> > > to make all of this happen.  It should also be noted that many companies
> > > will be working to have Linux support for this type of technology in the
> > > upcoming year so solutions should be available to all by the time iSCSI
> > > ramps to volume in 2002.
> > >
> >I am not sure which proposal you are talking about. Infiniband working
> >group is looking at the proposal from microsoft for RDMA. I think it
> >is more specific to Infiniband.
> 
> I believe you are referring to the InfiniBand application workgroup which 
> has been looking at how to bootstrap InfiniBand via SRP (SCSI RDMA 
> Protocol) which is one possible mechanism to support storage over an 
> InfiniBand fabric - there are several mechanisms being defined for 

This is the Winsock Direct proposal. It came to the IBTA working group,
I think. I have the document with me. It proposes to bypass the
normal TCP/IP stack and use RDMA to achieve high throughput.

-mohan

> boot.  This is a T10 not a Microsoft effort.  InfiniBand is simply trying 
> to accelerate some of this work but T10 will own and drive it to 
> completion. There is an upcoming T10 meeting Houston in January focused on 
> SRP that people might want to attend concerning InfiniBand.
> 
> Mike
> 



From owner-ips@ece.cmu.edu  Sat Dec 23 12:39:38 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id MAA05656
	for <ips-archive@ietf.org>; Sat, 23 Dec 2000 12:39:38 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id KAA28999
	for ips-outgoing; Sat, 23 Dec 2000 10:45:36 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from d12lmsgate.de.ibm.com (d12lmsgate.de.ibm.com [195.212.91.199])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id KAA28991
	for <ips@ece.cmu.edu>; Sat, 23 Dec 2000 10:45:29 -0500 (EST)
From: julian_satran@il.ibm.com
Received: from d12relay01.de.ibm.com (d12relay01.de.ibm.com [9.165.215.22])
	by d12lmsgate.de.ibm.com (1.0.0) with ESMTP id QAA207780
	for <ips@ece.cmu.edu>; Sat, 23 Dec 2000 16:44:56 +0100
Received: from d12mta02.de.ibm.com (d12mta01_cs0 [9.165.222.237])
	by d12relay01.de.ibm.com (8.8.8m3/NCO v4.95) with SMTP id QAA63614
	for <ips@ece.cmu.edu>; Sat, 23 Dec 2000 16:44:57 +0100
Received: by d12mta02.de.ibm.com(Lotus SMTP MTA v4.6.5  (863.2 5-20-1999))  id C12569BE.00567F69 ; Sat, 23 Dec 2000 16:44:47 +0100
X-Lotus-FromDomain: IBMIL@IBMDE
To: ips@ece.cmu.edu
Message-ID: <C12569BE.00567E20.00@d12mta02.de.ibm.com>
Date: Sat, 23 Dec 2000 17:31:01 +0200
Subject: Re: [Fwd: Review feedback on draft-ietf-ips-iSCSI-02.txt]
Mime-Version: 1.0
Content-type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-ips@ece.cmu.edu
Precedence: bulk



Santosh,

That is in my "to consider" stack and that is the reason you did not hear
back (don't get nervous!).
But here is my position:
(interspersed in text)

Thanks,
Julo

Santosh Rao <santoshr@cup.hp.com> on 23/12/2000 02:06:03

Please respond to Santosh Rao <santoshr@cup.hp.com>

To:   Julian Satran/Haifa/IBM@IBMIL
cc:   santoshr@cup.hp.com
Subject:  [Fwd: Review feedback on draft-ietf-ips-iSCSI-02.txt]




Julian,

I am re-sending this message to you since I didn't hear any response back
on this.
Your response on some/all of these issues would be highly appreciated.

Thanks and Regards,
Santosh Rao


Santosh Rao wrote:

> Julian/All,
>
> Enclosed is some review feedback on  draft-ietf-ips-iSCSI-02.txt.
>
> Thanks,
> Santosh Rao
>
> ----------------------------------------------------------------------
> Reference : draft-ietf-ips-iSCSI-02.txt
> ========================================
>
> o       When command numbering is turned off (by setting InitCmdRN to 0),
>         what is the value of CmdRN to be used for commands ?
>         It would be preferable to use a "don't care" value for the CmdRN
>         like 0xFFFFFFFF instead of using 0 which is intended to indicate
>         Immediate Delivery.
>
If you don't use numbering, CmdRN is a reserved field. Are you only after
avoiding a test on the target?  If the initiator is not using numbering,
having each command delivered as immediate conveys the right intent.


> o       Normally, command ordering is enforced by the layer above
>         SCSI (ex : file systems, volume managers, etc) and in such
>         cases command ordering should not be required.
>         The spec iSCSI-02.txt states that "iSCSI initiators MUST
>         implement the command/request numbering scheme if they support
>         more than 1 connection per session" (Section 1.2.2.1).
>         Can the use of command numbering be made optional (using the
>         existing mechanism of setting InitCmdRN to 0 during LOGIN) if
>         the initiator stack does not require command ordering, but has
>         chosen to use multiple connections per session ?
I don't see how this will ever work. How can the layer above enforce
ordering and let iSCSI mess it up?

>
> o       The reference to "Response Length" and "Response Data" in the
>         SCSI Response PDU (Section 2.3) is unclear about Response Data.
>         Does this refer to the same semantics as the Response Data used
>         in Fibre Channel's FCP_RSP terminology ?
>         Where is the Response Data defined in the standard ?
>

Will clarify in the next text

> o       It would be desirable to have a set of error codes in the SCSI
>         Response PDU that reflect any iSCSI PDU errors such as :
>         - iSCSI cmd PDU fields invalid
>         - Data Length different from desired length requested in R2T
>         - Data Relative Offset different from buffer offset requested
>           in R2T
>         ...
>

Will clarify how this is done in the next draft.

> o       A standard REJECT response would be highly desirable for all
>         command type PDUs (ex : LOGIN, LOGOUT, TEXT, NOP).
>         The REJECT response could include reason code and reason
>         explanations which allow identification of the reasons for the
>         REJECT.
>

as above (closer to a common iSCSI reject than the current command not
understood)

> o       Section 2.1.5 :
>         "The initiator assigns a task tag to each SCSI task that it
>         issues."
>         The above should read ..."to each task that it issues", since
>         initiator task tags are also used for non-scsi tasks like LOGIN,
>         LOGOUT, NOP, TEXT, etc.

Thanks - will do

>
> o       What are the "don't care" values to be used for ExpCmdRN and
>         MaxCmdRN when command numbering is turned off ? Can this be
>         explicitly specified in the draft ?
>
Will do (all reserved fields MUST be 0 unless specifically stated
otherwise)

> o       iSCSI should avoid interpreting Sense Data and creating/using
>         any iSCSI specific check conditions, if possible. iSCSI specific
>         errors should be returned in Response Data in the SCSI Response
>         PDU (when the response data gets defined). This allows for
>         cleaner layering between SCSI and iSCSI.
>

That is so now and will be even cleaner in the next text.
> o       The MaxCmdRN provides a form of dynamic queue depth management
>         at the target end, which complements the standard SCSI Queue
>         Full mechanisms.
>         However, turning off command numbering (as in the case of single
>         connection per session) also implies that this mechanism is
>         un-usable.
>         Any thoughts on some alternate schemes that would work even when
>         using single connections per session ? (Could command numbering
>         be turned on even for single connection sessions with a
>         different meaning implying use for queue depth management only
>         and not ordering ?)

The command numbering can be used for queue management even if you have a
single connection.
IF THE WG FEELS STRONGLY ABOUT IT THEN WE COULD raise command numbering to
a MUST implement for every initiator and target.
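
A minimal sketch of queue management on a single connection, assuming the
initiator may issue a command only while its next CmdRN does not exceed the
MaxCmdRN last advertised by the target. The class and method names are
hypothetical, numbering starts at 1 because 0 is discussed above as meaning
immediate delivery, and wraparound of the counter is ignored.

```python
class CommandWindow:
    """Initiator-side view of the window advertised by the target."""

    def __init__(self, exp_cmd_rn: int, max_cmd_rn: int):
        self.exp_cmd_rn = exp_cmd_rn
        self.max_cmd_rn = max_cmd_rn
        self.next_cmd_rn = exp_cmd_rn

    def update(self, exp_cmd_rn: int, max_cmd_rn: int) -> None:
        """Record the values carried in a target response."""
        self.exp_cmd_rn = exp_cmd_rn
        self.max_cmd_rn = max_cmd_rn

    def try_send(self):
        """Return the CmdRN to use, or None if the target's queue is full."""
        if self.next_cmd_rn > self.max_cmd_rn:
            return None
        cmd_rn = self.next_cmd_rn
        self.next_cmd_rn += 1
        return cmd_rn

window = CommandWindow(exp_cmd_rn=1, max_cmd_rn=3)
sent = [window.try_send() for _ in range(4)]  # the fourth attempt is held back
window.update(exp_cmd_rn=2, max_cmd_rn=5)     # target frees queue slots
resumed = window.try_send()
```

Nothing here depends on ordering across connections; the numbers serve only
as a credit count.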

> o       Section 2.7. The Response field in the SCSI Task Management
>         Response PDU could do with some enhancing.
>         - The "No Task Found" response can be removed since the target
>           should not REJECT an Abort Task received for a non-existent
>           task. Targets should respond with "Function Complete" for an
>           Abort Task sent on a completed task. This ensures that failure
>           of Abort Task does not trigger a higher level of error
>           recovery from the initiator.
>
Pierre Labat found this hole. An abort can come after a reset by some other
initiator.

>         - The following responses could be considered for addition to
>           the list :
>           i)    "No such LUN" when Abort Task Set, Clear Task Set or LUN
>                 Reset is attempted on an invalid LUN.
>           ii)   "Logical Busy" when a 2nd task management function is
>                 attempted while a prior conflicting task is in progress.
>                 (ex: attempting LUN Reset while Target Reset is
>                  in progress, etc.)
>           iii)  "Function Not Supported" to allow targets to indicate no
>                 support for certain task management requests. (ex:
>                 target reset !).
>                 This should be treated differently from "Function
>                 Rejected" which could be used to indicate an invalid
>                 request.
>
Those are already fixed in the next text (some other way though).
> o       It is still not clear as to why WRITE SCSI Data PDUs need to
>         specify the LUN. (This is not done in the case of fibre channel.)
>
To let the targets choose the Target Task Tag liberally.
> o       Section 2.12 only specifies version major and version minor. A
>         Version High and Version Low would help initiators and targets
>         negotiate amongst a range of versions. The major and minor
>         versions can be made 4 bits in size, using the same length as is
>         currently used for version major and version minor. (There have
>         already been requests for this on the reflector).
Will consider
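
For reference, range negotiation of the kind requested could be as simple as
picking the highest version both sides support. This is a sketch of the
suggestion only, with versions treated as plain integers and a hypothetical
function name; the draft defines no such exchange.

```python
def negotiate_version(initiator_range, target_range):
    """Each range is (low, high), inclusive.

    Returns the highest version common to both ranges, or None if the
    ranges do not overlap.
    """
    low = max(initiator_range[0], target_range[0])
    high = min(initiator_range[1], target_range[1])
    return high if low <= high else None

common = negotiate_version((1, 3), (2, 5))   # overlapping ranges
no_match = negotiate_version((1, 2), (3, 5))  # disjoint ranges
```
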
> o       Section 2.14.
>         The NOP Response PDU should allow a REJECT response for
>         the NOP to deal with an invalid NOP command PDU.
>
All rejects will be treated equal

> o       Section 2.14. NOP Response PDU.
>         "The target SHOULD also duplicate as much of the initiator ping
>         data as allowed by a configurable target parameter".
>         Is this referring to the PingMaxReplyLength login key ? If so,
>         the initiator should not be sending more than this length in its
>         NOP command PDU payload. Hence, the above could be re-worded to
>         indicate that initiators should honor PingMaxReplyLength and
>         targets SHOULD duplicate all of the initiator ping data in their
>         response.
>
So you are suggesting that the targets should not respond at all or reject
a ping that does not comply with your rule? Our design rule was to be only
as restrictive as needed.
> o       Section 2.15.
>         "The logout command is used by an initiator to 'clean up' the
>         target end of a failing connection and enable recovery to start".
>
>         This should be re-phrased since a logout can be used as a form
>         of a graceful close of the I-T nexus.
>
Will consider
> o       Section 2.17.
>         "The buffer offsets and lengths for consecutive PDUs should form
>         a continuous range".
>
>         Does the above imply targets are expected to request data
>         in-order on R2Ts ? This is not a requirement for Fibre Channel
>         currently. If this is not the intent, then, the above should be
>         re-phrased to make this more clear.
>
No - it is meant to convey that they should cover the whole range (will
restate)
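
Read that way, the rule is a coverage check rather than an ordering rule. A
minimal sketch, assuming each request is a (buffer offset, length) pair and
that overlapping requests are tolerated (the text does not say whether they
are); the function name is hypothetical.

```python
def covers_whole_range(requests, total_length):
    """requests: iterable of (buffer_offset, length) pairs, in any order."""
    covered = 0
    for offset, length in sorted(requests):
        if offset > covered:
            return False  # a gap that no request covers
        covered = max(covered, offset + length)
    return covered >= total_length

# Out-of-order requests that still cover all 8192 bytes.
complete = covers_whole_range([(4096, 4096), (0, 4096)], 8192)
# A hole between 4096 and 8192.
with_gap = covers_whole_range([(0, 4096), (8192, 4096)], 12288)
```
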

> o       Section 2.17.
>         "The present document does not limit the number of outstanding
>         data transfers".
>         This is not true, since MaxOutstandingR2T (as defined in Annexe
>         C) is used to pre-negotiate the limit on the number of
>         outstanding data transfers.
>
Will fix wording.

> o       Section 2.17.
>         "All outstanding R2T should have different target transfer tags".
>         Target transfer tags are used to track the state of the entire
>         task and should be the same value for all phases of that task.
>         The ability to have multiple concurrent R2Ts seems to call for
>         an additional qualifier for the task such as a task sub-id.
>         Target firmware would typically use the target task tag to
>         lookup a per I/O data structure that is tracking the state of
>         that task. In order to track multiple outstanding R2Ts within
>         the same task, a different field should be used. (something
>         equivalent to a Sequence ID in Fibre Channel, if the target task
>         tag were to be thought of as the RX_ID of fibre channel.)

That's why you have the LUN.  However within this any target can do as it
pleases and the initiator will only echo the tag.






From owner-ips@ece.cmu.edu  Sat Dec 23 12:39:42 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id MAA05667
	for <ips-archive@ietf.org>; Sat, 23 Dec 2000 12:39:42 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id KAA29001
	for ips-outgoing; Sat, 23 Dec 2000 10:45:36 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from d12lmsgate.de.ibm.com (d12lmsgate.de.ibm.com [195.212.91.199])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id KAA28992
	for <ips@ece.cmu.edu>; Sat, 23 Dec 2000 10:45:29 -0500 (EST)
From: julian_satran@il.ibm.com
Received: from d12relay01.de.ibm.com (d12relay01.de.ibm.com [9.165.215.22])
	by d12lmsgate.de.ibm.com (1.0.0) with ESMTP id QAA120966
	for <ips@ece.cmu.edu>; Sat, 23 Dec 2000 16:44:56 +0100
Received: from d12mta02.de.ibm.com (d12mta01_cs0 [9.165.222.237])
	by d12relay01.de.ibm.com (8.8.8m3/NCO v4.95) with SMTP id QAA63616
	for <ips@ece.cmu.edu>; Sat, 23 Dec 2000 16:44:57 +0100
Received: by d12mta02.de.ibm.com(Lotus SMTP MTA v4.6.5  (863.2 5-20-1999))  id C12569BE.00567F1C ; Sat, 23 Dec 2000 16:44:46 +0100
X-Lotus-FromDomain: IBMIL@IBMDE
To: ips@ece.cmu.edu
Message-ID: <C12569BE.00567D42.00@d12mta02.de.ibm.com>
Date: Sat, 23 Dec 2000 16:51:57 +0200
Subject: RE: SNIA-SNIFS: Re: SNIA-SNIFS: Clarification on SNIF marketing b
		oundaries
Mime-Version: 1.0
Content-type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-ips@ece.cmu.edu
Precedence: bulk



Tom,

You are overpowering me, if by nothing else than by sheer volume.

The reason I said something on this list is because I think that marketing
is important.
It is important to get a simple and clear and accurate message to your
potential customers.

And with my technical person's naivete I still think that you can't market
everything.

By SoiP I meant the protocol forwarded by Adaptec.

By audience I meant a specific area and I did not attach to it any
attribute (nor a class).

And FC tunnels and gateways are transitional - there won't be a world of 2
or 3 infrastructures.

As for the large installed base of FC - it is non-negligible - but if you
care to bring up the numbers you will find out that (as a percentage of
storage subsystems sold) it is rather tiny.
In fact FC is the main inhibitor to the widespread acceptance of SANs
(price, complexity, incompatibilities).  That is not to say that
there is no market need for iFCP - only that it fulfills a specific need
and if you don't have that need you had better look elsewhere.

As for speculating about tomorrow - I am not.   My only claim was that
iSCSI (in its current or future form) is about native IP connectivity.  And
it will evolve (you will have iSCSI-0 and iSCSI-n).

And a broad statement that places all IP storage solutions under the same
umbrella will harm most of the companies that form the consortium and will
force their own marketing team to work overtime to contain the damage.

Julo



Tom Clark <tclark@NishanSystems.com> on 22/12/2000 23:09:17

Please respond to Tom Clark <tclark@NishanSystems.com>

To:   Julian Satran/Haifa/IBM@IBMIL, snia-snifs@snia.org
cc:
Subject:  RE: SNIA-SNIFS: Re: SNIA-SNIFS: Clarification on SNIF marketing b
      oundaries




Julian,

I have seen your input so often on the IPS reflector, I have to admit I was
surprised to see you weigh in on this tawdry marketing discussion.

You raise some interesting issues that I would not have expected.

1.  Who's the intended market (audience) for IP storage solutions?

I think that all flavors of IP storage solutions, as well as Fibre Channel
solutions, have the same basic audience.  If it walks, crawls or puts data
on disk, I want to market to it. The audience is customers who have storage
problems.  Now, we may offer different types of solutions to them, but
that's our problem as vendors and competitors.  So I can't agree that every
IP storage protocol effort has its own unique audience.  We may be offering
different sorts of plumbing, but all the customer really wants is running
water.

Additionally, some of the companies supporting specific IP storage
solutions also sell servers, disk arrays, tape, and FC interconnect.  IBM,
for example.  So I don't think we can logically subdivide the market
audience into little portions, one per solution-type.  From what I've seen
of IBM marketing (as an example), they, like other solutions vendors, are
appealing to a wide range of enterprises.

2.  Is iSCSI really the only viable IP storage protocol?

Your statement that iSCSI is the only 'living' attempt to standardize an
(emphatically capitalized) native IP solution implies that someone died.
Given the immaturity of the standards development for a number of protocol
initiatives, I think it would be inappropriate to declare anyone's funeral
just yet.  Who knows what bright minds may bring to the discussion
tomorrow?  Having a native IP solution is a good thing.  But it doesn't
mean that customers won't find value in other IP-based block storage
solutions, even if they are not 1000% dyed-in-the-wool IP top-to-bottom.

FCIP is, as you say, a tunneling protocol.  I don't think that makes it a
second-class citizen.

iFCP is a 'transition' protocol only?  iFCP is great as a transition
protocol, because it makes it far easier to integrate the large installed
base of FC devices.  But as Steve Looby correctly points out, you could
have iFCP end systems as well.  In that case, it is no longer just a
transition protocol, but a viable means to move block storage data over IP
end-to-end.  Will that compete with iSCSI?  Sure.  Is that a crime in these
great United States?  No.

SoIP has been shelved?  By whom?  As I understand it, Storage over IP is a
trademarked marketing strategy of one company, Nishan.  SoIP as a marketing
strategy includes a number of solutions, including iSNS, mFCP, iFCP and
iSCSI, i.e. Nishan views all of these as productive paths to IP storage
solutions.  Perhaps by 'SoIP' you meant mFCP.  Perhaps you meant to say
that the IPS is not considering UDP-based protocols such as mFCP.  That
would be correct, since the IETF's focus is protocols that will be used on
the Internet.  It would be presumptuous, however, to declare even mFCP
"shelved".  As a protocol seeking standardization, maybe it just needs
another standards body besides IETF.

Obviously, if I were an iSCSI-only marketing proponent, I would like to
declare all other solutions dead or at least not worthy of discussion. But
then, that's marketing bias, isn't it?  As a technologist, however, I see
merit in all these solutions.


3.  A mistake to market all the products to everybody?

The proposed SNIF could fulfill two functions:  advance the message of IP
storage solutions in general, and advance specific messages on individual
solutions, e.g. iSCSI, iFCP, etc.  It would be the responsibility of the
subgroups within an IP Storage SNIF to generate their own collateral,
events, etc. highlighting the benefits of their own solutions.  It would be
the responsibility of the SNIF to generate the aggregate message on the
benefits of this new technology and how it will facilitate integration into
mainstream data networks.

Your statement that "it would be a blunder not to specify where the market
will be over time" is a little overboard, however.  It would indeed be a
marketing blunder if the iSCSI advocates did not paint their picture of the
iSCSI-only future.  It would also be a mistake if the FCIP proponents
didn't paint a similar picture of a future world inhabited by
next-generation FC devices joined by FCIP tunneling.  That's marketing.
But unless you have a really good Tarot deck, that's not necessarily
reality.  Where you want the market to be and where it ends up may be two
different things.

4.  Positioning of IP storage to Fibre Channel

The point of my previous email was to clarify the rules of engagement
necessary within the SNIA.  The focus needs to be on positive marketing
messages, advancing all the features/benefits that a solution offers while
avoiding overt negative marketing that may offend other SNIA members.

Granted, in the case of IP storage, I don't know if it's possible to avoid
comparisons to FC.  Then there is the upcoming issue of Infiniband I/O.
However, it is clear that some of the people interested in a SNIF within
SNIA assumed that there would be an open season on all other solutions.  We
need to avoid that, but the devil is in the details, as usual.


5.  Trying to keep an open mind

You state that:  "I would also refrain from focusing on solutions that are
entirely transitional, will not undergo standardization,  and will get the
whole group bad press from the networking community if this group starts
talking too much about them (like the UDP based solutions)."

I find this particularly disturbing.

If you don't have a transitional story to tell as part of the IP storage
initiative, you will have no solutions for customers with large installed
bases of FC end systems.  In fact, you may have nothing to talk about
except futures for some time to come.  Even if you believe wholeheartedly
in the iSCSI promised land, you have to show people how to get there.

It is also inappropriate to talk about solutions that "...will not undergo
standardization...".  In the case of mFCP (and perhaps other solutions that
neither you nor I have heard anything about), the IETF may not be the right
standards body.  There are, however, other standards bodies.  UDP-based
solutions are fine for closed, controlled enterprise network environments.
I don't think they have been declared illegal yet.

In the case of iFCP and FCIP, the discussion over how to put these into
standards tracks is still occurring.  This is a technical issue, not a
marketing issue. It would be prejudicial to mix the two.

Tom


-----Original Message-----
From: julian_satran@il.ibm.com [mailto:julian_satran@il.ibm.com]
Sent: Friday, December 22, 2000 1:22 AM
To: snia-snifs@snia.org
Subject: SNIA-SNIFS: Re: SNIA-SNIFS: Clarification on SNIF marketing
boundaries


Tom,

That will be fine and will utterly confuse the target audience.

Every one of the protocols that undergo standardization has a targeted
audience and this should be stated clearly.

iSCSI is the only "living" attempt to standardize a NATIVE IP solution.
FCIP is a tunneling protocol and iFCP is a "transition" protocol.

SoIP has been shelved.

Once you state clearly what you are addressing you don't have to be negative
about anything.

It would be a mistake anyhow to "market" all the products to everybody and
it would be a blunder not to specify where the market will be over time.

And while I agree that you don't have to bash FC, the whole marketing effort
would be worthless if you can't state why IP is better than FC in the long
run and how to get there.

I would also refrain from focusing on solutions that are entirely
transitional, will not undergo standardization,  and will get the whole
group bad press from the networking community if this group starts talking
too much about them (like the UDP based solutions).

Julo


Tom Clark <tclark@NishanSystems.com> on 22/12/2000 01:45:11

Please respond to tclark@NishanSystems.com

To:   snia-snifs@snia.org
cc:
Subject:  SNIA-SNIFS: Clarification on SNIF marketing boundaries




Friends,

Side discussions on the SNIFs concept have surfaced some possible
misunderstandings on the types of marketing initiatives that will be
encouraged.

An independent industry or trade association can do pretty much what it
wants, given the common interests of its members.   A SNIF within the SNIA
umbrella has the autonomy to generate its own marketing, but should do so
within the spirit of multivendor cooperation that the SNIA fosters.

The SNIF proposal that the board approved states that the SNIF "advocacy
activity must maintain a professional decorum and avoid disparagement of
competing technologies".

An IP Storage SNIF composed of all vendors in the block storage over IP
space would not be allowed to collectively disparage Fibre Channel, as an
example.  An iSCSI SNIF, promoting just one IP storage solution, would not
be allowed to trash FCIP or SoIP (both protocols represented by other SNIA
members).  And so on.  This would include overtly competitive venues like
performance bakeoffs (unless all parties entered into it willingly).

The enforcement for this SNIA policy is the board's ability to revoke the
charter of a SNIF if it consistently violates SNIA principles.

This is restrictive only of negative marketing efforts.  A SNIF has full
license to positively advance the industry interests it represents.

If, as I hope, we end up with an IP Storage SNIF that includes iSCSI and
other protocols as subgroups, each subgroup will be encouraged to spin
positive messages about their implementations and the SNIF as a whole will
be encouraged to create positive messaging about IP storage in general.

An inclusive SNIF that embraces all IP storage solutions is not the
limiting factor to marketing messages by any individual protocol
proponent.  The limiting factor is participation within the boundaries
established by the SNIA itself.

Thanks,

Tom


From owner-ips@ece.cmu.edu  Sat Dec 23 14:47:37 2000
Received: from ece.cmu.edu ([128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id OAA06279
	for <ips-archive@ietf.org>; Sat, 23 Dec 2000 14:47:37 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id MAA01275
	for ips-outgoing; Sat, 23 Dec 2000 12:53:28 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.242])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id MAA01269
	for <ips@ece.cmu.edu>; Sat, 23 Dec 2000 12:53:21 -0500 (EST)
Received: from hpcuhe.cup.hp.com (hpcuhe.cup.hp.com [15.0.80.203])
	by palrel1.hp.com (Postfix) with ESMTP id C06BB11AA
	for <ips@ece.cmu.edu>; Sat, 23 Dec 2000 09:53:20 -0800 (PST)
Received: (from santoshr@localhost)
	by hpcuhe.cup.hp.com (8.9.3 (PHNE_18979)/8.9.3 SMKit7.02) id JAA08423;
	Sat, 23 Dec 2000 09:53:16 -0800 (PST)
From: Santosh Rao <santoshr@cup.hp.com>
Message-Id: <200012231753.JAA08423@hpcuhe.cup.hp.com>
Subject: Multi Connection Sessions and Active/Passive Targets
To: ips@ece.cmu.edu (ips)
Date: Sat, 23 Dec 2000 09:53:15 -0800 (PST)
Cc: santoshr@hpcuhe.cup.hp.com (Santosh Rao)
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ips@ece.cmu.edu
Precedence: bulk
Content-Transfer-Encoding: 7bit

Hello,

I have a question regarding how the multi-connection session model would
work with Active/Passive target devices :

Consider the standard usage of multiple TCP connections within an
iSCSI session, where the initiator and target set up one connection
per available network interface port. The intent here is to allow the SCSI
traffic to be spread across the available network interface ports for the
purpose of load balancing, etc.

However, in the case of Active/Passive SCSI targets, traffic should be
sent only on the active port. Such multi-connection sessions would not
distinguish between the active and passive ports, causing traffic to be
sent to both the passive port and the active port.

The use of multiple connections per session needs to be turned off when
speaking to such active/passive devices. (unless such devices dedicate a
group of network interface ports as active and passive). 
A couple of ways that come to mind to do this are :

a) Initiators maintain knowledge of active/passive device types and will
not initiate more than 1 TCP connection per session on such device types.
However, the recognition of the active/passive device would be based on
the VID/PID fields in the response to an INQUIRY command, which makes this
a rather cumbersome approach.

b) Targets should indicate MaxConnections as 1 in their iSCSI LOGIN
response for active/passive devices. This removes the need for initiators
to detect Active/Passive devices and ensures the usage of the appropriate
single or multiple TCP connections per session based on the device type.
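
Option (b) can be sketched as follows. This is only an illustration, not
text from the iSCSI draft: the TargetDevice type, its fields, and the rule
that the session uses the smaller of the two values are all assumptions
made for the example.

```python
from dataclasses import dataclass


@dataclass
class TargetDevice:
    """Hypothetical description of a target, for illustration only."""
    name: str
    active_passive: bool   # True if only one port may carry traffic
    network_ports: int     # number of usable network interface ports


def target_max_connections(device: TargetDevice) -> int:
    """Value the target would put in MaxConnections in its LOGIN response."""
    # An active/passive device advertises 1 so the initiator never
    # spreads a session across the passive port.
    return 1 if device.active_passive else device.network_ports


def connections_to_open(initiator_wants: int, target_offer: int) -> int:
    """Assumed rule: the session uses the smaller of the two values."""
    return max(1, min(initiator_wants, target_offer))


if __name__ == "__main__":
    ap = TargetDevice("array-ap", active_passive=True, network_ports=2)
    aa = TargetDevice("array-aa", active_passive=False, network_ports=2)
    for dev in (ap, aa):
        offer = target_max_connections(dev)
        print(dev.name, "->", connections_to_open(4, offer), "connection(s)")
```

The point of the sketch is that the initiator needs no per-device
knowledge: it simply honors whatever limit the target returns at login.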

Any thoughts on this ?

Thanks,
Santosh

-- 
#################################
Santosh Rao
Software Design Engineer,
HP, Cupertino.
email : santoshr@cup.hp.com
Phone : 408-447-3751
#################################


From owner-ips@ece.cmu.edu  Sat Dec 23 17:17:45 2000
Received: from ece.cmu.edu ([128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id RAA07203
	for <ips-archive@ietf.org>; Sat, 23 Dec 2000 17:17:44 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id PAA03682
	for ips-outgoing; Sat, 23 Dec 2000 15:13:51 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from e31.bld.us.ibm.com (e31.co.us.ibm.com [32.97.110.129])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id PAA03678
	for <ips@ece.cmu.edu>; Sat, 23 Dec 2000 15:13:47 -0500 (EST)
Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.99.132.205])
	by e31.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id PAA37420
	for <ips@ece.cmu.edu>; Sat, 23 Dec 2000 15:07:17 -0500
Received: from f5n70e (d03nm094h.boulder.ibm.com [9.99.140.70])
	by westrelay02.boulder.ibm.com (8.11.0m3/NCO v4.95) with ESMTP id eBNKDkN39702
	for <ips@ece.cmu.edu>; Sat, 23 Dec 2000 13:13:46 -0700
X-Priority: 1 (High)
Importance: Normal
Subject: IPS ALL:Please use the approprate prefix on your Subject line
To: "ips" <ips@ece.cmu.edu>
X-Mailer: Lotus Notes Release 5.0.3 (Intl) 21 March 2000
Message-ID: <OFF4423B06.72F62FD2-ON882569BE.006EA383@LocalDomain>
From: "John Hufferd" <hufferd@us.ibm.com>
Date: Sat, 23 Dec 2000 12:13:20 -0800
X-MIMETrack: Serialize by Router on D03NM094/03/M/IBM(Release 5.0.3 (Intl)|21 March 2000) at
 12/23/2000 01:13:45 PM
MIME-Version: 1.0
Content-type: text/plain; charset=us-ascii
Sender: owner-ips@ece.cmu.edu
Precedence: bulk


Again, I am requesting that you use the appropriate prefix on your Subject
lines when you send messages, or reply to messages.  Please use iSCSI,
FCIP, iFCP, etc.  If you are responding to a message that did not include
the prefix, please add it to your response.

.
.
.
John L. Hufferd
Senior Technical Staff Member (STSM)
IBM/SSG San Jose Ca
(408) 256-0403, Tie: 276-0403
Internet address: hufferd@us.ibm.com



From owner-ips@ece.cmu.edu  Sun Dec 24 04:00:23 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id EAA06901
	for <ips-archive@ietf.org>; Sun, 24 Dec 2000 04:00:23 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id CAA14397
	for ips-outgoing; Sun, 24 Dec 2000 02:07:00 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from d12lmsgate.de.ibm.com (d12lmsgate.de.ibm.com [195.212.91.199])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id CAA14390
	for <ips@ece.cmu.edu>; Sun, 24 Dec 2000 02:06:55 -0500 (EST)
From: julian_satran@il.ibm.com
Received: from d12relay01.de.ibm.com (d12relay01.de.ibm.com [9.165.215.22])
	by d12lmsgate.de.ibm.com (1.0.0) with ESMTP id IAA30300
	for <ips@ece.cmu.edu>; Sun, 24 Dec 2000 08:06:24 +0100
Received: from d12mta02.de.ibm.com (d12mta01_cs0 [9.165.222.237])
	by d12relay01.de.ibm.com (8.8.8m3/NCO v4.95) with SMTP id IAA265204
	for <ips@ece.cmu.edu>; Sun, 24 Dec 2000 08:06:20 +0100
Received: by d12mta02.de.ibm.com(Lotus SMTP MTA v4.6.5  (863.2 5-20-1999))  id C12569BF.00270568 ; Sun, 24 Dec 2000 08:06:12 +0100
X-Lotus-FromDomain: IBMIL@IBMDE
To: ips@ece.cmu.edu
Message-ID: <C12569BF.0027048F.00@d12mta02.de.ibm.com>
Date: Sun, 24 Dec 2000 09:02:12 +0200
Subject: Re: IPS Interim meeting: High level agenda
Mime-Version: 1.0
Content-type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-ips@ece.cmu.edu
Precedence: bulk



Elizabeth,

I would appreciate it if you could reserve a slot for error reporting and
recovery in iSCSI - it has never been discussed in enough detail.

Julo

"Rodriguez, Elizabeth G (Elizabeth)" <egrodriguez@lucent.com> on 22/12/2000
20:31:01

Please respond to "Rodriguez, Elizabeth G (Elizabeth)"
      <egrodriguez@lucent.com>

To:   "IPS Mailing List (E-mail)" <ips@ece.cmu.edu>
cc:   "Allison Mankin (E-mail)" <mankin@ISI.EDU>, "David Black (E-mail)"
      <black_david@emc.com>, "Steven M. Bellovin (E-mail)"
      <smb@research.att.com>, "Scott Bradner (E-mail)" <sob@harvard.edu>
Subject:  IPS Interim meeting:  High level agenda




Hi all,

We still do not have a detailed agenda for the IPS interim meeting.  We
will be addressing that in the first part of the year.
But we did want to publish a high level agenda so that people can make
their travel arrangements.

Tuesday, Jan 16 8 am: Common issues
iSCSI will start immediately after common issues, no later than noon.  If
common issues finish early, e.g. at 9 or 10 am, iSCSI will start then, so
don't make travel arrangements assuming that iSCSI will start in the
afternoon...

Wed, Jan 17 8am - 9:30 am iSCSI wrap-up.  This has a 1/2 hour overlap with
CAP.

9:30-5 -- FC issues.

Also, the intent behind this meeting is to get good face time to thrash out
issues facing the various specifications.  So, we will not be doing
'status' reports.  Instead, we need to identify issues that need to be
discussed and focus on them.  Instead of requesting time on the agenda for
various documents, please send in a list of issues that you feel need to be
addressed.  We will try to allocate time based on that.

Thanks,

Elizabeth





From owner-ips@ece.cmu.edu  Tue Dec 26 17:22:18 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id RAA25867
	for <ips-archive@ietf.org>; Tue, 26 Dec 2000 17:22:18 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id PAA12879
	for ips-outgoing; Tue, 26 Dec 2000 15:09:20 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from ludwig.troikanetworks.com (host03.troikanetworks.com [12.31.172.3] (may be forged))
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id PAA12873
	for <ips@ece.cmu.edu>; Tue, 26 Dec 2000 15:09:14 -0500 (EST)
Received: by host03.troikanetworks.com with Internet Mail Service (5.5.2653.19)
	id <ZPSRGYK4>; Tue, 26 Dec 2000 12:08:47 -0800
Message-ID: <C7CA595F9B9FD311A40D009027DC4A85BB87BB@host03.troikanetworks.com>
From: Wayland Jeong <wayland@troikanetworks.com>
To: Wayland Jeong <wayland@troikanetworks.com>, ips@ece.cmu.edu
Subject: iSCSI: RE: Framing Discussion
Date: Tue, 26 Dec 2000 12:08:41 -0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain;
	charset="iso-8859-1"
Sender: owner-ips@ece.cmu.edu
Precedence: bulk

I think I now understand the assumptions regarding the single-bit TCP option
for indicating the presence of a PDU header in the current segment (thanks
Costa). It boils down to a hardware implementation that is tuned for the
best case (i.e. small amount of re-assembly memory on the NIC to park
fragmented PDU's with the assumption that the next aligned PDU is coming
shortly) and a method for dropping things into software when we are talking
to an ill-behaved NIC. Okay, I'll buy that.

Now, I have another dumb question. For direct data placement, the
discussions have been centered mostly around the need for alignment when
parsing PDU's on the receiving iSCSI TOE. One potential issue that has not
been discussed is the problem of how to handle re-transmission on the
sending iSCSI TOE. 

From previous discussions, I am assuming that our goal is to avoid having a
network BWDP worth of memory on the NIC. The receiver can avoid this memory
by recovering PDU alignment in the TCP stream and using the self-describing
headers in the wire protocol (either iSCSI offsets or a RDMA shim layer) to
put the data directly in the buffer cache. On the sending side, we can DMA
directly from iSCSI descriptor CDB's into the TCP pipe using a hardware
path. But, unless we keep all of those un-acked TCP segment buffers around
in the NIC, it will be difficult to recover the context when we have to
re-transmit. 

Let's suppose that we have an iSCSI TCP connection in which we have multiple
outstanding I/O's. Thus, the byte stream has interleaved within it commands
and data from different I/O's. When we detect a dropped segment either
through normal TCP congestion or via SACK, how do we map the missing byte
block to the appropriate context? If we keep the segments around, then we
could match the missing segment easily and re-transmit. But that would
require the NIC to implement a BWDP's worth of transmit buffer memory. 
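
For a sense of scale, the bandwidth-delay product (BWDP) can be worked out
directly. The link speed and round-trip times below are illustrative
assumptions, not figures from this thread.

```python
def bwdp_bytes(link_bits_per_second: int, rtt_milliseconds: int) -> int:
    """Bytes in flight on a fully used link: bandwidth * round-trip time."""
    # bits/s * ms / 1000 gives bits; / 8 gives bytes, hence the 8000.
    return link_bits_per_second * rtt_milliseconds // 8000


if __name__ == "__main__":
    gigabit = 1_000_000_000
    for rtt_ms in (1, 10, 100):  # assumed LAN, metro and WAN round trips
        print(f"1 Gb/s, {rtt_ms} ms RTT: {bwdp_bytes(gigabit, rtt_ms):,} bytes")
```

Under these assumptions a NIC that parks every un-acked segment needs
that much transmit memory per fully used connection.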

To have the iSCSI TOE re-transmit directly from the buffer cache, it seems
that we would need some sort of context that would allow us to map a byte
window to a specific, meaningful point somewhere in the middle of a CDB
context. Essentially, you need enough context to be able to re-construct the
TCP fifo since the memory in this fifo has since been effectively
re-allocated. Maybe this isn't too hard, but it sure sounds like a difficult
problem for hardware to solve. But, as the software folks around here keep
telling me, "it's just gates" ;-)

-Wayland


From owner-ips@ece.cmu.edu  Wed Dec 27 03:46:02 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id DAA15435
	for <ips-archive@ietf.org>; Wed, 27 Dec 2000 03:46:01 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id BAA22225
	for ips-outgoing; Wed, 27 Dec 2000 01:29:11 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from d12lmsgate.de.ibm.com (d12lmsgate.de.ibm.com [195.212.91.199])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id BAA22221
	for <ips@ece.cmu.edu>; Wed, 27 Dec 2000 01:29:06 -0500 (EST)
From: julian_satran@il.ibm.com
Received: from d12relay02.de.ibm.com (d12relay02.de.ibm.com [9.165.215.23])
	by d12lmsgate.de.ibm.com (1.0.0) with ESMTP id HAA43398
	for <ips@ece.cmu.edu>; Wed, 27 Dec 2000 07:28:35 +0100
Received: from d12mta02.de.ibm.com (d12mta01_cs0 [9.165.222.237])
	by d12relay02.de.ibm.com (8.8.8m3/NCO v4.95) with SMTP id HAA18608
	for <ips@ece.cmu.edu>; Wed, 27 Dec 2000 07:28:34 +0100
Received: by d12mta02.de.ibm.com(Lotus SMTP MTA v4.6.5  (863.2 5-20-1999))  id C12569C2.00239285 ; Wed, 27 Dec 2000 07:28:32 +0100
X-Lotus-FromDomain: IBMIL@IBMDE
To: ips@ece.cmu.edu
Message-ID: <C12569C2.002390B7.00@d12mta02.de.ibm.com>
Date: Wed, 27 Dec 2000 08:22:58 +0200
Subject: Re: iSCSI: RE: Framing Discussion
Mime-Version: 1.0
Content-type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-ips@ece.cmu.edu
Precedence: bulk



Wayland,

As a first approximation, let's ignore anything that is not data.
Our assumption when outlining iSCSI was that all send data (including most
of the iSCSI headers) can be rebuilt from context.  The only context
information a sender has to keep is an association of a sender TCP sequence
number with either an iSCSI header or a SCSI buffer address and length.
The details will certainly vary with implementation.
In a loosely integrated solution the TCP stack will keep this context for
you.

In a tightly integrated solution you will have TCP, on resend, call back
the iSCSI layer to rebuild headers and restate data addresses, in which
case iSCSI has to keep in its tables some relation between the TCP
sequence number and a specific data piece (Task Tag, Data-packet-#, etc.).
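
The association described above might look roughly like the sketch below.
It is not taken from any iSCSI draft or real TOE: the SentRange record, the
RetransmitMap class and the 48-byte header length are assumptions for
illustration, and TCP sequence-number wraparound is ignored for brevity.

```python
import bisect
from dataclasses import dataclass


@dataclass
class SentRange:
    """One contiguous run of bytes handed to TCP (hypothetical record)."""
    seq_start: int      # TCP sequence number of the first byte
    length: int         # number of bytes in this run
    kind: str           # "header" (rebuilt from context) or "data"
    task_tag: int       # which I/O the bytes belong to
    buffer_offset: int  # offset into the SCSI buffer for "data" runs


class RetransmitMap:
    """Maps a lost byte window back to headers and buffer addresses."""

    def __init__(self) -> None:
        self._starts: list[int] = []
        self._ranges: list[SentRange] = []

    def record(self, sent: SentRange) -> None:
        # Runs are recorded in send order, so seq_start only increases.
        self._starts.append(sent.seq_start)
        self._ranges.append(sent)

    def ack(self, ack_seq: int) -> None:
        # Forget every run that lies entirely below the cumulative ACK.
        done = 0
        while (done < len(self._ranges)
               and self._ranges[done].seq_start
               + self._ranges[done].length <= ack_seq):
            done += 1
        del self._starts[:done]
        del self._ranges[:done]

    def lookup(self, seq: int, length: int):
        """Return (run, offset_in_run, nbytes) pieces covering the window."""
        pieces = []
        end = seq + length
        i = max(bisect.bisect_right(self._starts, seq) - 1, 0)
        while i < len(self._ranges) and seq < end:
            run = self._ranges[i]
            run_end = run.seq_start + run.length
            if seq < run.seq_start or seq >= run_end:
                break  # window is not (or no longer) tracked
            nbytes = min(end, run_end) - seq
            pieces.append((run, seq - run.seq_start, nbytes))
            seq += nbytes
            i += 1
        return pieces


if __name__ == "__main__":
    m = RetransmitMap()
    m.record(SentRange(1000, 48, "header", task_tag=7, buffer_offset=0))
    m.record(SentRange(1048, 4096, "data", task_tag=7, buffer_offset=0))
    for run, off, n in m.lookup(1040, 1460):  # one lost 1460-byte segment
        print(run.kind, "tag", run.task_tag, "offset", off, "bytes", n)
```

Only these small records are kept per connection; the payload itself
stays in host memory until it is acknowledged.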

Regards,
Julo



Wayland Jeong <wayland@troikanetworks.com> on 26/12/2000 22:08:41

Please respond to Wayland Jeong <wayland@troikanetworks.com>

To:   Wayland Jeong <wayland@troikanetworks.com>, ips@ece.cmu.edu
cc:
Subject:  iSCSI: RE: Framing Discussion




I think I now understand the assumptions regarding the single-bit TCP
option for indicating the presence of a PDU header in the current segment
(thanks Costa). It boils down to a hardware implementation that is tuned
for the best case (i.e. small amount of re-assembly memory on the NIC to
park fragmented PDU's with the assumption that the next aligned PDU is
coming shortly) and a method for dropping things into software when we are
talking to an ill-behaved NIC. Okay, I'll buy that.

Now, I have another dumb question. For direct data placement, the
discussions have been centered mostly around the need for alignment when
parsing PDU's on the receiving iSCSI TOE. One potential issue that has not
been discussed is the problem of how to handle re-transmission on the
sending iSCSI TOE.

From previous discussions, I am assuming that our goal is to avoid having a
network BWDP worth of memory on the NIC. The receiver can avoid this memory
by recovering PDU alignment in the TCP stream and using the self-describing
headers in the wire protocol (either iSCSI offsets or a RDMA shim layer) to
put the data directly in the buffer cache. On the sending side, we can DMA
directly from iSCSI descriptor CDB's into the TCP pipe using a hardware
path. But, unless we keep all of those un-acked TCP segment buffers around
in the NIC, it will be difficult to recover the context when we have to
re-transmit.

Let's suppose that we have an iSCSI TCP connection in which we have
multiple outstanding I/O's. Thus, the byte stream has interleaved within it
commands and data from different I/O's. When we detect a dropped segment
either through normal TCP congestion or via SACK, how do we map the missing
byte block to the appropriate context? If we keep the segments around, then
we could match the missing segment easily and re-transmit. But that would
require the NIC to implement a BWDP's worth of transmit buffer memory.

To have the iSCSI TOE re-transmit directly from the buffer cache, it seems
that we would need some sort of context that would allow us to map a byte
window to a specific, meaningful point somewhere in the middle of a CDB
context. Essentially, you need enough context to be able to re-construct
the TCP fifo since the memory in this fifo has since been effectively
re-allocated. Maybe this isn't too hard, but it sure sounds like a
difficult problem for hardware to solve. But, as the software folks around
here keep telling me, "it's just gates" ;-)

-Wayland





From owner-ips@ece.cmu.edu  Wed Dec 27 13:17:23 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id NAA19928
	for <ips-archive@ietf.org>; Wed, 27 Dec 2000 13:17:23 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id LAA00525
	for ips-outgoing; Wed, 27 Dec 2000 11:08:04 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from main.connectcom.net (anubis.advansys.com [204.247.22.2])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id LAA00521
	for <ips@ece.cmu.edu>; Wed, 27 Dec 2000 11:07:59 -0500 (EST)
Received: from yp_portable (anubis.advansys.com [204.247.22.2]) by main.connectcom.net with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21)
	id ZJ5LTHW4; Wed, 27 Dec 2000 08:07:58 -0800
From: "Y P Cheng" <ycheng@advansys.com>
To: <ips@ece.cmu.edu>
Subject: RE: iSCSI: RE: Framing Discussion
Date: Wed, 27 Dec 2000 08:08:03 -0800
Message-ID: <000201c0701f$30f60320$90c809c0@yp_portable.advansys.com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3
Importance: Normal
In-Reply-To: <C7CA595F9B9FD311A40D009027DC4A85BB87BB@host03.troikanetworks.com>
Sender: owner-ips@ece.cmu.edu
Precedence: bulk
Content-Transfer-Encoding: 7bit

> Now, I have another dumb question. For direct data placement, the
> discussions have been centered mostly around the need for alignment when
> parsing PDU's on the receiving iSCSI TOE. One potential issue that has not
> been discussed is the problem of how to handle re-transmission on the
> sending iSCSI TOE.
> From previous discussions, I am assuming that our goal is to
> avoid having a network BWDP worth of memory on the NIC. The receiver
> can avoid this memory by recovering PDU alignment in the TCP stream
> and using the self-describing headers in the wire protocol
> (either iSCSI offsets or a RDMA shim layer) to put the data
> directly in the buffer cache. On the sending side, we can DMA
> directly from iSCSI descriptor CDB's into the TCP pipe using a hardware
> path. But, unless we keep all of those un-acked TCP segment buffers around
> in the NIC, it will be difficult to recover the context when we have to
> re-transmit.

Wayland, I enjoyed reading your questions.  Obviously, you have given a lot
of thought to implementing iSCSI.  To answer your question on
retransmit, just remember it is actually very easy.  You are the sender. You
have total control -- I assume you are not using an existing TCP
implementation.  Therefore, from the TCP segment sequence number, you as the
sender should know how to reassemble the data.  No extra buffers are needed.

> Let's suppose that we have an iSCSI TCP connection in which we
> have multiple outstanding I/O's. Thus, the byte stream has
> interleaved within it commands and data from different I/O's.
> When we detect a dropped segment either through normal TCP congestion
> or via SACK, how do we map the missing byte block to the
> appropriate context? If we keep the segments around, then we
> could match the missing segment easily and re-transmit. But that would
> require the NIC to implement a BWDP's worth of transmit buffer memory.
>
> To have the iSCSI TOE re-transmit directly from the buffer cache, it seems
> that we would need some sort of context that would allow us to map a byte
> window to a specific, meaningful point somewhere in the middle of a CDB
> context. Essentially, you need enough context to be able to re-construct
> the TCP fifo since the memory in this fifo has since been effectively
> re-allocated. Maybe this isn't too hard, but it sure sounds like
> a difficult problem for hardware to solve. But, as the software
> folks around here keep telling me, "it's just gates" ;-)

Yes, multiple I/O's and interleaved data streams require a context manager
that maps the missing segment back to its large exchange table to determine
how to retransmit the dropped segment.  No, Wayland, I would not do it in
hardware.  It is all in microcode.  The microcode size is actually not that
big.  The exchange table, by contrast, can be a few hundred KB.  All you
need is a very, very fast microengine with a small number of gates, a true
RISC.  Please keep asking the "dumb" questions.  I am mostly impressed by
your questions.

Y.P. Cheng, Connectom Solutions.



From owner-ips@ece.cmu.edu  Wed Dec 27 16:07:23 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id QAA24452
	for <ips-archive@ietf.org>; Wed, 27 Dec 2000 16:07:23 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id OAA04198
	for ips-outgoing; Wed, 27 Dec 2000 14:10:31 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from ludwig.troikanetworks.com (host03.troikanetworks.com [12.31.172.3] (may be forged))
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id OAA04192
	for <ips@ece.cmu.edu>; Wed, 27 Dec 2000 14:10:26 -0500 (EST)
Received: by host03.troikanetworks.com with Internet Mail Service (5.5.2653.19)
	id <ZPSRGZG8>; Wed, 27 Dec 2000 11:10:00 -0800
Message-ID: <C7CA595F9B9FD311A40D009027DC4A85BB87BC@host03.troikanetworks.com>
From: Wayland Jeong <wayland@troikanetworks.com>
To: "'Y P Cheng'" <ycheng@advansys.com>, ips@ece.cmu.edu
Subject: RE: iSCSI: RE: Framing Discussion
Date: Wed, 27 Dec 2000 11:09:59 -0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain;
	charset="iso-8859-1"
Sender: owner-ips@ece.cmu.edu
Precedence: bulk

[ stuff deleted ]

>> Let's suppose that we have an iSCSI TCP connection in which we
>> have multiple outstanding I/O's. Thus, the byte stream has
>> interleaved within it commands and data from different I/O's.
>> When we detect a dropped segment either through normal TCP congestion
>> or via SACK, how do we map the missing byte block to the
>> appropriate context? If we keep the segments around, then we
>> could match the missing segment easily and re-transmit. But that would
>> require the NIC to implement a BWDP's worth of transmit buffer memory.
>>
>> To have the iSCSI TOE re-transmit directly from the buffer cache, it seems
>> that we would need some sort of context that would allow us to map a byte
>> window to a specific, meaningful point somewhere in the middle of a CDB
>> context. Essentially, you need enough context to be able to re-construct
>> the TCP fifo since the memory in this fifo has since been effectively
>> re-allocated. Maybe this isn't too hard, but it sure sounds like
>> a difficult problem for hardware to solve. But, as the software
>> folks around here keep telling me, "it's just gates" ;-)
>
> Yes, multiple I/O's and interleaved data streams require a context manager
> who maps the missing segment back to its large exchange table to determine
> how to retransmit the dropped segment.  No, Wayland, I would not do it in
> hardware.  It is all in microcode.  The microcode size is actually not that
> big. On the contrary, the exchange table can be a few hundred KB's.  All you
> need is a very very fast microengine with small number of gates, a true
> RISC.  Please keep asking the "dumb" questions.  I am mostly impressed by
> your questions.
>
Thanks for the reply. You'll have to let me know which questions don't
impress you ;-)

Yes, I am assuming that the re-transmit process will be handled in
firmware/micro-code. It's still gates, though; they just happen to be in
someone's uP core ;-) It still seems like a tough problem in the general
case.

Let's assume a worst case scenario. The iSCSI PDU size is greater than the
TCP MSS and the network MTU and you are talking to a firewall that is
re-packaging your TCP stream. No matter how much you try to send nicely
aligned PDU's, the firewall is going to take your less than MSS size TCP
segments and package them up so that you get full-size TCP segments by the
time it hits the target. The target detects a missing segment and keeps the
left edge of the window constant for three consecutive ACK's. Furthermore,
we are using the SACK option in TCP to optimize our performance over LFN's.
Thus, we are presented with the exact blocks that are missing.
Unfortunately, these missing blocks have fragments of PDU's from different
I/O's (could be command, could be data). Even worse, since we chose a PDU
size greater than MSS, some segments might carry a part of a PDU that
contains no iSCSI header but does contain a digest covering the entire PDU.
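
To make this worst case concrete, here is a minimal Python sketch of how a
TCP-terminating proxy can undo sender-side alignment. The PDU sizes (a
48-byte header-only PDU and a 4144-byte data PDU), the 1460-byte MSS, and
all function names are illustrative assumptions, not values taken from the
iSCSI draft or from any real implementation.

```python
def sender_segments(pdu_lengths, mss):
    """Cut each PDU into segments of at most mss bytes, never mixing PDUs."""
    segments, seq = [], 0
    for pdu_len in pdu_lengths:
        sent = 0
        while sent < pdu_len:
            size = min(mss, pdu_len - sent)
            segments.append((seq, size))   # (stream offset, payload bytes)
            seq += size
            sent += size
    return segments


def proxy_segments(total_bytes, mss):
    """Re-cut the same byte stream into full-size segments, ignoring PDUs."""
    return [(seq, min(mss, total_bytes - seq))
            for seq in range(0, total_bytes, mss)]


def interior_boundaries(pdu_lengths):
    """Stream offsets at which one PDU ends and the next begins."""
    ends, seq = [], 0
    for pdu_len in pdu_lengths:
        seq += pdu_len
        ends.append(seq)
    return ends[:-1]


def straddles(segment, boundaries):
    """True if a PDU boundary falls strictly inside the segment."""
    start, size = segment
    return any(start < b < start + size for b in boundaries)


pdus = [48, 4144, 48, 4144]   # assumed PDU sizes, in bytes
mss = 1460                    # assumed MSS
cuts = interior_boundaries(pdus)
aligned = sender_segments(pdus, mss)
repacked = proxy_segments(sum(pdus), mss)
print(len(aligned), sum(straddles(s, cuts) for s in aligned))
print(len(repacked), sum(straddles(s, cuts) for s in repacked))
```

With these numbers the sender emits 8 aligned segments, none of which
crosses a PDU boundary, while the proxy forwards 6 segments, 2 of which do.
A loss of either of those two segments shows up at the sender as a missing
block that spans more than one PDU.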

Yeesh!! Thank goodness all I have to do is drop an embedded processor
into our chip. I'll let the firmware folks deal with this problem.
Certainly, this path does not have to be high-performance since we are going
into congestion control anyway, but we have to deal with it. We can keep a
context stack for the currently open TCP sessions which contains mappings of
TCP sequence numbers to specific CDB context locations (either command or
offset within the gather list). We can keep this stack as deep as the
maximum number of outstanding (i.e. not ack'd) TCP segments, which for
Randy's example (1.25 Gb/s and an RTT of 100 ms) is not too bad (about 8K
entries). We can recover the contexts needed to re-build the missing TCP
segment and re-construct the entire PDU(s) so that any necessary digests can
be re-calculated. We can then stage this data in memory somewhere and pull
out the exact TCP block that we need to re-send. Lovely.
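
As a rough check on the sizing, the following Python sketch computes the
stack depth from the bandwidth-delay product and models the stack as a
bounded queue trimmed by cumulative ACKs. It rests on several assumptions:
full-size 1460-byte segments, one entry per segment, no sequence-number
wraparound, and names (`ContextStack`, `SegmentContext`, the context
strings) that are hypothetical. Reading 1.25 Gb/s as a line rate that
carries about 1 Gb/s of payload is also an assumption.

```python
from collections import deque
from dataclasses import dataclass


def outstanding_segments(bits_per_second, rtt_ms, segment_bytes):
    """Segments needed to fill one bandwidth-delay product, rounded up."""
    bdp_bytes = bits_per_second * rtt_ms // 8000   # bits * ms -> bytes
    return -(-bdp_bytes // segment_bytes)          # ceiling division


@dataclass
class SegmentContext:
    seq: int       # first TCP sequence number carried by the segment
    length: int    # payload bytes in the segment
    context: str   # hypothetical handle: command slot or gather-list offset


class ContextStack:
    """Per-connection record of un-acked segments, oldest first."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = deque()

    def record(self, seq, length, context):
        if len(self.entries) >= self.depth:
            raise RuntimeError("window full: wait for ACKs before sending")
        self.entries.append(SegmentContext(seq, length, context))

    def ack(self, ack_seq):
        """Drop entries wholly covered by a cumulative ACK."""
        while (self.entries and
               self.entries[0].seq + self.entries[0].length <= ack_seq):
            self.entries.popleft()


depth = outstanding_segments(1_000_000_000, 100, 1460)
stack = ContextStack(depth)
stack.record(0, 1460, "io7 gather offset 0")
stack.record(1460, 1460, "io7 gather offset 1460")
stack.ack(1460)            # the first segment is acknowledged
print(depth, len(stack.entries))
```

Under these assumptions a 1 Gb/s payload rate gives 8,562 full-size
segments per round trip, in line with the "about 8K" estimate; treating
1.25 Gb/s as the payload rate instead gives 10,703.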

I'm not saying it's impossible, but I am saying that implementing Fibre
Channel looks like a walk in the park compared to this stuff.

BTW, regarding the current iSCSI draft: I didn't see a Login/Text key
for negotiating the iSCSI PDU size. Is it assumed that an iSCSI
implementation should handle any PDU size?

> Y.P. Cheng, Connectom Solutions.
>
-Wayland


From owner-ips@ece.cmu.edu  Wed Dec 27 17:27:54 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id RAA25659
	for <ips-archive@ietf.org>; Wed, 27 Dec 2000 17:27:54 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id PAA05955
	for ips-outgoing; Wed, 27 Dec 2000 15:32:05 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from main.connectcom.net (ns1.advansys.com [204.247.22.3] (may be forged))
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id PAA05951
	for <ips@ece.cmu.edu>; Wed, 27 Dec 2000 15:32:01 -0500 (EST)
Received: from yp_portable (anubis.advansys.com [204.247.22.2]) by main.connectcom.net with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21)
	id ZJ5LTH96; Wed, 27 Dec 2000 12:32:00 -0800
From: "Y P Cheng" <ycheng@advansys.com>
To: "'Ips@Ece. Cmu. Edu'" <ips@ece.cmu.edu>
Subject: RE: iSCSI: RE: Framing Discussion
Date: Wed, 27 Dec 2000 12:32:03 -0800
Message-ID: <000301c07044$12487780$90c809c0@yp_portable.advansys.com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3
Importance: Normal
In-Reply-To: <C7CA595F9B9FD311A40D009027DC4A85BB87BC@host03.troikanetworks.com>
Sender: owner-ips@ece.cmu.edu
Precedence: bulk
Content-Transfer-Encoding: 7bit

> Yeesh!! Thank goodness all I have to do is drop-in an embedded processor
> into our chip. I'll let the firmware folks deal with this problem.
> Certainly, this path does not have to be high-performance since
> we are going into congestion control anyway, but we have to deal with it.
> We can keep a
> context stack for the current open TCP sessions which contain mappings of
> TCP sequence numbers to specific CDB context locations (either command or
> offset within the gather list). We can keep this stack as deep as the
> maximum number of outstanding (i.e. not ack'd) TCP segments which for
> Randy's example (1.25Gbs and RTT of 100ms) is not too bad (about 8K
> entries). We can recover the contexts needed to re-build this missing TCP
> segment and re-construct entire PDU(s) so that any necessary
> digests can be re-calculated. We can then stage this data in memory
> somewhere and pull-out the exact TCP block that we need to re-send.
> Lovely.

You forget the most important fact about an iSCSI adapter.  The microcode
that does the transmit knows exactly how it creates a TCP segment.  First,
it would not mix multiple PDU's into a single segment.  Second, there is no
TCP stack in the iSCSI adapter.  While a command or status PDU is created
on the fly, all data PDUs are retransmitted from the application buffers,
which are not freed until an exchange is complete.  Third, there is a
separate TCP stream for each connection.  The task of retransmit is to find
the exchange number by using the end-point and segment sequence number.  The
microcode keeps track of the starting sequence numbers for multiple open
exchanges.  However, one can take an easy way out by having only one open
exchange per endpoint.  Of course, this slows things down.  But don't be too
surprised if some implementations do just that.  As I said before, keeping
track of a few thousand exchanges takes a lot of memory.  However, the
algorithm to go through the exchange context to retransmit a missing segment
is straightforward and does not take a rocket scientist!  (Or should I say
an iSCSI scientist?)
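
The lookup described above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: the table records the
starting sequence number of every PDU sent (not just one per exchange),
PDUs are recorded in transmit order, PDU headers and sequence-number
wraparound are ignored, and `ExchangeTable`, `sent`, and `locate` are
hypothetical names.

```python
import bisect


class ExchangeTable:
    """Maps (endpoint, TCP sequence number) to an exchange and buffer offset."""

    def __init__(self):
        self._starts = {}   # endpoint -> PDU starting sequence numbers, ascending
        self._pdus = {}     # endpoint -> parallel list of (exchange, offset, length)

    def sent(self, endpoint, start_seq, exchange_id, buffer_offset, length):
        """Record a PDU as it is transmitted (calls arrive in sequence order)."""
        self._starts.setdefault(endpoint, []).append(start_seq)
        self._pdus.setdefault(endpoint, []).append(
            (exchange_id, buffer_offset, length))

    def locate(self, endpoint, seq):
        """Return (exchange_id, offset in the application buffer) for seq."""
        starts = self._starts.get(endpoint, [])
        i = bisect.bisect_right(starts, seq) - 1   # last PDU starting at or before seq
        if i < 0:
            raise KeyError("sequence number precedes all recorded PDUs")
        exchange_id, buffer_offset, length = self._pdus[endpoint][i]
        delta = seq - starts[i]
        if delta >= length:
            raise KeyError("sequence number is past the last recorded PDU")
        return exchange_id, buffer_offset + delta


table = ExchangeTable()
endpoint = ("target-a", 1)                    # hypothetical endpoint key
table.sent(endpoint, 1000, exchange_id=7, buffer_offset=0, length=4096)
table.sent(endpoint, 5096, exchange_id=9, buffer_offset=0, length=4096)
table.sent(endpoint, 9192, exchange_id=7, buffer_offset=4096, length=4096)
print(table.locate(endpoint, 6000))
```

Because the data is re-read from the application buffer at the returned
offset, no copy of the transmitted segment has to be kept; the cost is the
table itself, which grows with the number of un-acked PDUs.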

In summary, transmit is easy because you get to call the shots as long as
the rules are followed.  I can't say the same for receive, where old TCP
implementations must be honored.



From owner-ips@ece.cmu.edu  Wed Dec 27 19:13:29 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id TAA26590
	for <ips-archive@ietf.org>; Wed, 27 Dec 2000 19:13:29 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id RAA08055
	for ips-outgoing; Wed, 27 Dec 2000 17:11:53 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from ludwig.troikanetworks.com (host03.troikanetworks.com [12.31.172.3] (may be forged))
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id RAA08051
	for <ips@ece.cmu.edu>; Wed, 27 Dec 2000 17:11:49 -0500 (EST)
Received: by host03.troikanetworks.com with Internet Mail Service (5.5.2653.19)
	id <ZPSRGZNB>; Wed, 27 Dec 2000 14:11:23 -0800
Message-ID: <C7CA595F9B9FD311A40D009027DC4A85BB87BE@host03.troikanetworks.com>
From: Wayland Jeong <wayland@troikanetworks.com>
To: "'Y P Cheng'" <ycheng@advansys.com>,
        "'ips@ece.cmu.edu'"
	 <ips@ece.cmu.edu>
Subject: RE: iSCSI: RE: Framing Discussion
Date: Wed, 27 Dec 2000 14:11:23 -0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain;
	charset="iso-8859-1"
Sender: owner-ips@ece.cmu.edu
Precedence: bulk

>> Yeesh!! Thank goodness all I have to do is drop-in an embedded processor
>> into our chip. I'll let the firmware folks deal with this problem.
>> Certainly, this path does not have to be high-performance since
>> we are going into congestion control anyway, but we have to deal with it.
>> We can keep a
>> context stack for the current open TCP sessions which contain mappings of
>> TCP sequence numbers to specific CDB context locations (either command or
>> offset within the gather list). We can keep this stack as deep as the
>> maximum number of outstanding (i.e. not ack'd) TCP segments which for
>> Randy's example (1.25Gbs and RTT of 100ms) is not too bad (about 8K
>> entries). We can recover the contexts needed to re-build this missing TCP
>> segment and re-construct entire PDU(s) so that any necessary
>> digests can be re-calculated. We can then stage this data in memory
>> somewhere and pull-out the exact TCP block that we need to re-send.
>> Lovely.
>
> You forget the most important fact of an iSCSI adapter.  The microcode does
> the transmit knows exactly how it creates a TCP segment.  First, it would
> not mix multiple PDU's into a single segment.  
>
Yes. The iSCSI adapter can make this choice. But my concern is in talking
to a proxy, like a firewall, that terminates TCP. Such a proxy may, and
probably will, re-package your TCP stream, especially if you are sending
segments smaller than the MSS, which you inevitably will if you are trying to
align PDU's at the sender. If a proxy re-bundles your TCP stream and that stream
hits the target with a missing segment, that segment could cross alignment
boundaries. Normally, this will not be the case, but if you design an iSCSI
TOE that cannot handle TCP ACK's with non-PDU aligned sequence numbers then
your TOE has some inherent limitations. At a minimum, it shouldn't break.
Thus, some amount of context and an ability to handle the situation, albeit
on a very slow path, is required.
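
A minimal sketch of that slow path, in Python: given a missing byte range
whose edges do not line up with PDU boundaries, split it into the pieces
that belong to each PDU. The record layout, the labels, and the byte
offsets are assumptions chosen for illustration, and sequence-number
wraparound is ignored.

```python
def split_missing_block(pdus, left, right):
    """Split the missing range [left, right) into per-PDU pieces.

    pdus is a list of (start_seq, length, label) in transmit order.
    Returns a list of (label, offset within the PDU, byte count).
    """
    pieces = []
    for start, length, label in pdus:
        lo = max(left, start)
        hi = min(right, start + length)
        if lo < hi:                       # the block overlaps this PDU
            pieces.append((label, lo - start, hi - lo))
    if sum(count for _, _, count in pieces) != right - left:
        raise LookupError("block is not fully covered by recorded PDUs")
    return pieces


# Assumed stream: two I/O's, each a 48-byte command PDU followed by a
# 4144-byte data PDU, sent back to back on one connection.
sent = [
    (0, 48, "cmd A"),
    (48, 4144, "data A"),
    (4192, 48, "cmd B"),
    (4240, 4144, "data B"),
]

# A re-packaged full-size segment covering bytes 2920..4379 is reported lost.
for piece in split_missing_block(sent, 2920, 4380):
    print(piece)
```

In this example the single lost segment maps to the tail of one data PDU,
a whole command PDU, and the head of another data PDU. Any piece taken
from a PDU that carries a digest still requires rebuilding that PDU far
enough to reproduce the digest bytes.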

> Second, there is no TCP stack
> in the iSCSI adapter.   While a command or status PDU is created on the fly,
> all data PDUs are retransmitted from the application buffers which are not
> freed until an exchange is complete. 
>
I'm not sure what you mean by application buffer. We want to immediately
free the NIC buffers since one of the goals discussed in Randy's framing
document is to avoid having a BWDP worth of high-speed memory on the NIC.
Current Fibre Channel HBA's are single chip solutions with no off-board
memory and only a handful (4-8) transmit FC frame buffers on-chip. If you
are referring to the buffer cache, then you have the same problem which is
how to map the TCP sequence numbers back to your SCSI protocol-level
constructs.

> Third, there is a separate TCP stream
> to each connection.  The task of retransmit is to find the exchange number
> by using the end-point and segment sequence number. The microcode keeps
> track the starting sequence numbers for multiple open exchanges.  However,
> one can take an easy way out by having only one open exchange per endpoint.
> Of course, this slows things down.  But, don't be too surprised that some
> implementation might just do that.   As I said before, to keep track of a
> few thousand exchanges, there is a lot of memory needed.  However, the
> algorithm to go through the exchange context to retransmit a missing segment
> is straight forward and does not take a rocket scientist!  (Or should I say
> an iSCSI scientist?)
>
Yes, it does simplify things if you only have a single outstanding I/O, but
speaking from experience, your performance will stink.

> 
> In summary, transmit is easy because you got to call the shots as long as
> rules are followed.  Can't say the same for receive when old TCP
> implementations must be honored.
>
Thanks again for the input. This discussion really isn't too relevant to the
efforts by this group to standardize a mapping of iSCSI to TCP. These
implementation details are more about the complexity associated with
designing high-performance implementations of iSCSI. By high-performance, I
mean implementations that can meet or exceed current Fibre Channel
performance and can be brought to market at cost parity with FC HBA's.

-Wayland


From owner-ips@ece.cmu.edu  Wed Dec 27 19:13:50 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id TAA26606
	for <ips-archive@ietf.org>; Wed, 27 Dec 2000 19:13:49 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id RAA08538
	for ips-outgoing; Wed, 27 Dec 2000 17:32:41 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from mta6.snfc21.pbi.net (mta6.snfc21.pbi.net [206.13.28.240])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id RAA08534
	for <ips@ece.cmu.edu>; Wed, 27 Dec 2000 17:32:38 -0500 (EST)
Received: from IETF ([63.202.160.80])
 by mta6.snfc21.pbi.net (Sun Internet Mail Server sims.3.5.2000.01.05.12.18.p9)
 with SMTP id <0G68007M8XGL20@mta6.snfc21.pbi.net> for ips@ece.cmu.edu; Wed,
 27 Dec 2000 13:38:59 -0800 (PST)
Date: Wed, 27 Dec 2000 13:39:47 -0800
From: Douglas Otis <dotis@sanlight.net>
Subject: RE: Framing Discussion
In-reply-to: <5.0.0.25.2.20001222073340.040df828@hpindlm.cup.hp.com>
To: Michael Krause <krause@cup.hp.com>
Cc: end2end-interest <end2end-interest@ISI.EDU>, ips@ece.cmu.edu
Message-id: <NEBBJGDMMLHHCIKHGBEJEEMECDAA.dotis@sanlight.net>
MIME-version: 1.0
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)
Content-type: text/plain; charset="iso-8859-1"
Content-transfer-encoding: 7bit
Importance: Normal
X-MSMail-Priority: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400
X-Priority: 3 (Normal)
Sender: owner-ips@ece.cmu.edu
Precedence: bulk
Content-Transfer-Encoding: 7bit

Michael,

> At 04:58 PM 12/20/2000 -0800, Douglas Otis wrote:
>
> >problem of the block size being less than the page size.  Unless the SCSI
> >application is forced to allocate in pages and you have a means
> to force the
> >alignment of these blocks as they are delivered by the network, then this
> >MMU technique is not available.
>
> Many file systems allocate in nice multiple 4KB quantities making much of
> this fairly straight to implement and for partials these are
> either placed
> in a multiple 4KB buffer or these tend to align on power of 2 quantities
> buffers so the performance impacts are mitigated.

Here we have a sticky problem of alignment.  You envision an intelligent
adapter forcing this alignment on an application specific basis.  This
intelligent adapter will recognize content of the TCP stream, remove
prefixed iSCSI headers, and pad-out non-page-sized data transfers.  The
intelligent adapter will use content of this TCP stream to direct iSCSI
headers into different buffers than SCSI data and keep a relationship
between header and data.  One means would be to extract the tag within the
header and use the pre-allocated buffer to place data.  In David Black's
view, this interface would be the same as a SCSI adapter.  In that manner,
the header-data association is automatic.  Your desire is to allow a simple
merge of InfiniBand hardware with IP technology.  Presently InfiniBand does
not automate IP or SCSI, so you are trying to solve more than a
SCSI-TCP-specific problem.

Julian has presented his view of adding a suffix to the IP header as a means
of framing and adding InfiniBand-IP differential information.  Those fearful
of this approach could use standard TCP and suffer additional buffer
requirements or retransmissions upon a segment loss.  Will there be enough
performance loss to justify such a transition and will InfiniBand be
motivation for this transition?  InfiniBand is agnostic about the protocol
and has not breached the area of IP or SCSI with perhaps the exception of
using IPv6 size addresses.  With such short-comings of TCP, why use a
modified TCP to instantiate a new SCSI interface?  Solutions being hardware
based, especially for those within InfiniBand, what benefit is derived using
this modified TCP?  You will be forced to handle two such protocols.  Normal
and the enhanced for wide screen.

In Julian's words:

  "Many of the new applications could benefit from using functions
   available from a new transport (like SCTP) but no vendor is going
   to "bet the business" on an all new and radically different
   transport protocol - unless there is a transition plan that
   allows him to start by using a mature transport protocol and then
   migrate it to a new protocol."

This transition plan is going from TCP to TAF-TCP, where the application
interacts at the datagram level to align retransmit and where additional
headers are added ahead of the transport headers.  To review iSCSI, it is
SCSI on iSCSI on TCP on TAF on IP.  How is it good for either IP or
InfiniBand to add additional application specific layers?  To deploy,
network equipment is impacted by this transition so you are not just
stepping on your own TOEs.  It would appear that even Julian agrees that
SCTP is a solution, but he can not see the transition.  RFC 2960 does
provide the framing for RDMA, VI, SCSI, IPC, gaming, multi-media, audio, and
every feature being sought with this TAF extension.  Which provides the
superior solution and allows the cleanest transition?

See:
http://www.haifa.il.ibm.com/satran/ips/draft-satran-transport-adaptation-framework-00.txt

Doug


> InfiniBand is looking to standardize the interface to
> InfiniBand-based TOE
> (TCP Off-load Engine) endnodes so that  a well-defined wire
> protocol can be
> created for all IHVs to implement above the InfiniBand transport
> protocol.  I think there might be some benefit to do the same thing for
> iSCSI as for TOE in an appropriate forum while allowing this workgroup to
> proceed with the iSCSI definition which defines the operational model and
> wire protocol being used.  In fact, if the iSCSI workgroup sets up some
> basic rules as outlined in another response from me today or along the
> lines of the various RDMA proposals, then the wire protocol
> defined by the
> iSCSI workgroup in essence defines much of this needed interface.
>  Need to
> think about what else if anything might be needed.
>
> Mike
>
>



From owner-ips@ece.cmu.edu  Wed Dec 27 22:19:17 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id WAA28826
	for <ips-archive@ietf.org>; Wed, 27 Dec 2000 22:19:16 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id UAA11464
	for ips-outgoing; Wed, 27 Dec 2000 20:11:46 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from mta5.snfc21.pbi.net (mta5.snfc21.pbi.net [206.13.28.241])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id UAA11458
	for <ips@ece.cmu.edu>; Wed, 27 Dec 2000 20:11:42 -0500 (EST)
Received: from IETF ([63.202.160.80])
 by mta5.snfc21.pbi.net (Sun Internet Mail Server sims.3.5.2000.01.05.12.18.p9)
 with SMTP id <0G69004X32IJE9@mta5.snfc21.pbi.net> for ips@ece.cmu.edu; Wed,
 27 Dec 2000 15:27:56 -0800 (PST)
Date: Wed, 27 Dec 2000 15:28:41 -0800
From: Douglas Otis <dotis@sanlight.net>
Subject: RE: iSCSI: RE: Framing Discussion
In-reply-to: <000301c07044$12487780$90c809c0@yp_portable.advansys.com>
To: Y P Cheng <ycheng@advansys.com>, "'Ips@Ece. Cmu. Edu'" <ips@ece.cmu.edu>
Message-id: <NEBBJGDMMLHHCIKHGBEJOEMECDAA.dotis@sanlight.net>
MIME-version: 1.0
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)
Content-type: text/plain; charset="iso-8859-1"
Content-transfer-encoding: 7bit
Importance: Normal
X-MSMail-Priority: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400
X-Priority: 3 (Normal)
Sender: owner-ips@ece.cmu.edu
Precedence: bulk
Content-Transfer-Encoding: 7bit

Y.P.,

With respect to networks, there is more than just SCSI.  Yes, we can create
a network interface that solves the immediate needs of SCSI, but at the
potential expense of adding a unique solution for each application demanding
content-directed services.  IPS advocates are not allowed to discuss the
impact of this interface, such as whether this connection is shared with IPs
or ports, or how an application is indicated.  Once the vendor-specific
details of SCSI are solved, this same connection may then be called upon to
create another unique interface for VI or IPC.  Are application/vendor-unique
solutions beneficial if there is only a resemblance to the original protocol
and different schemes are employed?  As these application standards change,
be sure to include Flash memory for your embedded processor.  Is this how
WinModem was created, where the application and hardware interface is
blurred by vendor-unique solutions?

Doug

> > Yeesh!! Thank goodness all I have to do is drop-in an embedded processor
> > into our chip. I'll let the firmware folks deal with this problem.
> > Certainly, this path does not have to be high-performance since
> > we are going into congestion control anyway, but we have to
> deal with it.
> > We can keep a
> > context stack for the current open TCP sessions which contain
> mappings of
> > TCP sequence numbers to specific CDB context locations (either
> command or
> > offset within the gather list). We can keep this stack as deep as the
> > maximum number of outstanding (i.e. not ack'd) TCP segments which for
> > Randy's example (1.25Gbs and RTT of 100ms) is not too bad (about 8K
> > entries). We can recover the contexts needed to re-build this
> missing TCP
> > segment and re-construct entire PDU(s) so that any necessary
> > digests can be re-calculated. We can then stage this data in memory
> > somewhere and pull-out the exact TCP block that we need to re-send.
> > Lovely.
>
> You forget the most important fact of an iSCSI adapter.  The
> microcode does
> the transmit knows exactly how it creates a TCP segment.  First, it would
> not mix multiple PDU's into a single segment.  Second, there is
> no TCP stack
> in the iSCSI adapter.   While a command or status PDU is created
> on the fly,
> all data PDUs are retransmitted from the application buffers which are not
> freed until an exchange is complete.  Third, there is a separate
> TCP stream
> to each connection.  The task of retransmit is to find the exchange number
> by using the end-point and segment sequence number. The microcode keeps
> track the starting sequence numbers for multiple open exchanges.  However,
> one can take an easy way out by having only one open exchange per
> endpoint.
> Of course, this slows things down.  But, don't be too surprised that some
> implementation might just do that.   As I said before, to keep track of a
> few thousand exchanges, there is a lot of memory needed.  However, the
> algorithm to go through the exchange context to retransmit a
> missing segment
> is straight forward and does not take a rocket scientist!  (Or
> should I say
> an iSCSI scientist?)
>
> In summary, transmit is easy because you got to call the shots as long as
> rules are followed.  Can't say the same for receive when old TCP
> implementations must be honored.
>
>



From owner-ips@ece.cmu.edu  Thu Dec 28 15:15:14 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id PAA18863
	for <ips-archive@ietf.org>; Thu, 28 Dec 2000 15:15:14 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id NAA27395
	for ips-outgoing; Thu, 28 Dec 2000 13:28:39 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from main.connectcom.net (ns1.advansys.com [204.247.22.3] (may be forged))
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id NAA27391
	for <ips@ece.cmu.edu>; Thu, 28 Dec 2000 13:28:35 -0500 (EST)
Received: from yp_portable (anubis.advansys.com [204.247.22.2]) by main.connectcom.net with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21)
	id ZJ5LT274; Thu, 28 Dec 2000 10:28:34 -0800
From: "Y P Cheng" <ycheng@advansys.com>
To: "'Ips@Ece. Cmu. Edu'" <ips@ece.cmu.edu>
Subject: RE: iSCSI: RE: Framing Discussion
Date: Thu, 28 Dec 2000 10:28:35 -0800
Message-ID: <000d01c070fb$fd879fc0$90c809c0@yp_portable.advansys.com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
In-reply-to: <NEBBJGDMMLHHCIKHGBEJOEMECDAA.dotis@sanlight.net>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3
Sender: owner-ips@ece.cmu.edu
Precedence: bulk
Content-Transfer-Encoding: 7bit

> With respect to networks, there is more than just SCSI.  Yes, we
> can create a network interface that solves immediate needs of
> SCSI, but at the potential expense of adding a unique solution
> per each application demanding content directed services.
> IPS advocates are not allowed to discuss impact of this
> interface such as if this connection is shared with IPs or ports or
> how an application is indicated.  Once vendor specific details of SCSI are
> solved, this same connection may also then be called upon to create another
> unique interface for VI or IPC.  Are application/vendor unique solutions
> beneficial if there is only a resemblance to the original protocol where
> different schemes employed?  As these application standards
> change, be sure to include Flash memory for your embedded processor.
> Is this how WinModem was created where application and hardware
> interface is blurred by vendor unique solutions?
>
> Doug

Doug,

What I said, "You can send anything you want, as long as rules are
followed," doesn't mean it is an incompatible solution requiring a special
API.  An iSCSI adapter is both a NIC and a SCSI adapter.  As a NIC, it is
capable of RDMA to support VI and Winsock Direct.  As an iSCSI adapter, it
adheres strictly to the SCSI API and the IPS specifications, syntactically
and semantically.  It has device drivers for different OS's providing SCSI
services, with the exception of making TCP connections for logins.  However,
having said all this, each iSCSI adapter still has a unique implementation,
including the ways it sends and resends TCP frames and supports multiple TCP
connections concurrently.  Being stream-oriented, TCP does give us a lot of
freedom in how to segment the byte stream and how to send ACKs.

There are a lot of issues with traditional TCP implementations in supporting
networks with long latency and a high probability of dropping frames.
But if someone puts a new implementation inside an adapter, he has the
freedom to add new options like TCPRDMA and MSGBNDRY, even the SMTP protocol,
to overcome the problems.  Not all implementations are created equal.  Not
everyone supports new options.  The magic word is interoperability.  The
secret is how two adapters that both support new options talk to each other.
I thought TCP option negotiation permits that already.  What I expect
from the IPS effort is to tell me what these new options should look like.
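
For readers unfamiliar with the mechanism, the following Python sketch
shows the general shape of TCP option negotiation: options travel as
kind/length/value fields, and a feature is used only if both the SYN and
the SYN-ACK carry it. TCPRDMA and MSGBNDRY have no assigned option numbers,
so kind 253 is used here purely as a placeholder, and the two-byte value is
invented; neither reflects any proposal before the IPS WG.

```python
EOL, NOP, MSS = 0, 1, 2
PLACEHOLDER_MSGBNDRY = 253   # stand-in kind; no such option is assigned


def encode_option(kind, data=b""):
    """Encode one option; EOL and NOP are single-byte, others carry a length."""
    if kind in (EOL, NOP):
        return bytes([kind])
    return bytes([kind, 2 + len(data)]) + data


def parse_options(raw):
    """Return {kind: value bytes} for the options field of a TCP header."""
    options, i = {}, 0
    while i < len(raw):
        kind = raw[i]
        if kind == EOL:
            break
        if kind == NOP:
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("truncated option")
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            raise ValueError("bad option length")
        options[kind] = raw[i + 2:i + length]
        i += length
    return options


def both_offer(kind, syn_options, synack_options):
    """A feature is enabled only when both ends include the option."""
    return (kind in parse_options(syn_options)
            and kind in parse_options(synack_options))


mss = encode_option(MSS, (1460).to_bytes(2, "big"))
new_adapter = mss + encode_option(NOP) + encode_option(PLACEHOLDER_MSGBNDRY, b"\x00\x01")
old_stack = mss

print(both_offer(PLACEHOLDER_MSGBNDRY, new_adapter, new_adapter))
print(both_offer(PLACEHOLDER_MSGBNDRY, new_adapter, old_stack))
```

Two adapters that both implement the option find each other at connection
setup, while a peer that omits it is simply treated as a standard TCP
endpoint. What the option value should contain is exactly the part left
for the working group to define.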

Yes, almost all adapters today have a flashable ROM that supports field
upgrades with new implementations for new TCP options approved by IPS WG.

Y.P.



From owner-ips@ece.cmu.edu  Thu Dec 28 20:15:28 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id UAA21053
	for <ips-archive@ietf.org>; Thu, 28 Dec 2000 20:15:28 -0500 (EST)
Received: by ece.cmu.edu (8.9.2/8.8.8) id SAA03100
	for ips-outgoing; Thu, 28 Dec 2000 18:01:24 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from latte.2xtreme.net (latte.2xtreme.net [209.63.222.34])
	by ece.cmu.edu (8.9.2/8.8.8) with SMTP id SAA03096
	for <ips@ece.cmu.edu>; Thu, 28 Dec 2000 18:01:20 -0500 (EST)
Received: (qmail 12365 invoked from network); 28 Dec 2000 23:01:18 -0000
Received: from p25.oak1.2xtreme.net (HELO IETF) (209.63.216.25)
  by latte.2xtreme.net with SMTP; 28 Dec 2000 23:01:18 -0000
From: "Douglas Otis" <dotis@sanlight.net>
To: "Y P Cheng" <ycheng@advansys.com>, "'Ips@Ece. Cmu. Edu'" <ips@ece.cmu.edu>
Subject: RE: iSCSI: RE: Framing Discussion
Date: Thu, 28 Dec 2000 15:00:13 -0800
Message-ID: <NEBBJGDMMLHHCIKHGBEJIEMHCDAA.dotis@sanlight.net>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)
In-reply-to: <000d01c070fb$fd879fc0$90c809c0@yp_portable.advansys.com>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400
Sender: owner-ips@ece.cmu.edu
Precedence: bulk
Content-Transfer-Encoding: 7bit

Y.P.

So you agree that each application requiring content-directed services will
necessitate an additional vendor-unique hardware interface.  The T10 group
does have the required parameters to support SCSI; Microsoft, Compaq, and
Intel have parameters to support VI; and most OS implementations already
support network devices at the transport level or below.  With Microsoft
wishing to add IPC, there are now at least four vendor-unique hardware
interfaces expected of this network connection if such a level of service is
required.  Do you see an advantage in discussing all these interfaces
together rather than each separately?  If we can find a non-application-
specific means of providing these services, then the world is spared these
hardware variants.  I would expect a great deal of complexity in attempting
to nail down all the details required to interface so many clever
independent hardware solutions.

With one interface requiring the application to reconstruct the datagrams
for sending a retry, and with the standard method handling such details
below the application, there will be a significant change required to
support both modes of operation.  iSCSI is already a transport layer in
itself by allowing a separate retry mechanism from that of either SCSI or
TCP.  iSCSI combines multiple connections and implements its own flow
control and error detection.  Combine that with a layer between IP and TCP
to inject the concept of iSCSI PDUs as an element contained and aligned with
TCP.  From this alignment or framing, out of sequence network segments can
have the data extracted from application specific encapsulation and placed
within third-hand buffers.  This allows a simple ASIC which uses embedded
memory rather than external memory.  Perhaps more importantly, the number of
drivers needed to support this plethora of interfaces becomes another
concern if we are to see standardization.

Doug

> > With respect to networks, there is more than just SCSI.  Yes, we
> > can create a network interface that solves immediate needs of
> > SCSI, but at the potential expense of adding a unique solution
> > per each application demanding content directed services.
> > IPS advocates are not allowed to discuss impact of this
> > interface such as if this connection is shared with IPs or ports or
> > how an application is indicated.  Once vendor specific details
> of SCSI are
> > solved, this same connection may also then be called upon to create
> another
> > unique interface for VI or IPC.  Are application/vendor unique solutions
> > beneficial if there is only a resemblance to the original protocol where
> > different schemes employed?  As these application standards
> > change, be sure to include Flash memory for your embedded processor.
> > Is this how WinModem was created where application and hardware
> > interface is blurred by vendor unique solutions?
> >
> > Doug
>
> Doug,
>
> What I said that "You can send anything you want, as long as rules are
> followed" doesn't meant it is an incompatible solution requiring special
> API.  An iSCSI adapter is both a NIC and a SCSI adapter.  As a NIC, it is
> capable of RDMA to support VI and Winsock Direct.  As an iSCSI adapter it
> adhere strictly to SCSI API and the IPS specifications syntactically and
> semantically.  It has device drivers for different OS's providing SCSI
> services with the exception of making TCP connections for logins.
>  However,
> having said all these, each iSCSI adapter still has a unique
> implementation
> including the ways it sends and resends TCP frames and supports
> multiple TCP
> connections concurrently.  Being stream-oriented, TCP does give
> us a lot of
> freedom how to segment the byte stream and how to send ACKs.
>
> There are a lot of issues with traditional TCP implementations to support
> networks with long latency delays and high probability of dropping frames.
> But, if someone puts a new implementation inside an adapter, he has the
> freedom to add new options like TCPRDMA and MSGBNDRY, even the
> SMTP protocol
> to overcome the problems.  Not all implementations are created equal.  Not
> everyone supports new options.  The magic word is interoperability.  The
> secret is how two adapters both supporting new options talk to each other.
> I thought the TCP option negotiation permits that already.  What I expect
> from the IPS effort is to tell me how these new options should look like.
>
> Yes, almost all adapters today have a flashable ROM that supports field
> upgrades with new implementations for new TCP options approved by IPS WG.
>
> Y.P.
>
>



From owner-ips@ece.cmu.edu  Sat Dec 30 04:41:59 2000
Received: from ece.cmu.edu ([128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id EAA01335
	for <ips-archive@ietf.org>; Sat, 30 Dec 2000 04:41:58 -0500 (EST)
Received: (from majordom@localhost)
	by ece.cmu.edu (8.11.0/8.10.2) id eBTL5R715396
	for ips-outgoing; Fri, 29 Dec 2000 16:05:27 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from falcon.vixel.com (mail.vixel.com [207.115.190.210])
	by ece.cmu.edu (8.9.2/8.8.8) with ESMTP id VAA06483
	for <ips@ece.cmu.edu>; Thu, 28 Dec 2000 21:48:32 -0500 (EST)
Received: from vixel.com ([192.168.1.195]) by falcon.vixel.com
          (Netscape Messaging Server 4.15) with ESMTP id G6B6GV00.VZ5;
          Thu, 28 Dec 2000 18:48:31 -0800 
Message-ID: <3A4BFAFC.F2F77C5C@vixel.com>
Date: Thu, 28 Dec 2000 18:46:21 -0800
From: "Ken Hirata" <Ken.Hirata@Vixel.com>
Organization: Vixel Corporation
X-Mailer: Mozilla 4.76 [en] (WinNT; U)
X-Accept-Language: en,pdf
MIME-Version: 1.0
To: ips@ece.cmu.edu
CC: Ken Hirata <Ken.Hirata@Vixel.com>
Subject: Re: iFCP - FCIP merge proposal
References: <B300BD9620BCD411A366009027C21D9B0716D7@ariel.nishansystems.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-ips@ece.cmu.edu
Precedence: bulk
Content-Transfer-Encoding: 7bit

I have to agree with what Joshua said in the "iFCP fabric attachments"
thread; it doesn't make sense to merge iFCP and FCIP.

FCIP is used to solve the problem of connecting 2 Fibre Channel SANs
via IP; it's a tunneling protocol.  As such, it is very simple; the amount
of processing on any Fibre Channel frame is minimal.  It doesn't read or
modify any of the FC-2 header, add augmentation information or manipulate
any Extended Link Service frames, and allows all FC-2 functionality.

iFCP is a gateway protocol.  It cracks the FC-2 header, handles some Fibre
Channel Extended Link Service frames in a special manner, and there is a
possibility that it won't support all FC-2 functionality.

Merging the FCIP and iFCP documents would make a single document in name
only.  FCIP doesn't need to use any of the functionality described in the
current iFCP document.

                                                        Ken

Joshua Tseng wrote:

> Venkat,
>
> <stuff deleted...>
> >
> > But as far as I can tell, iFCP requires you to remove devices
> > that support
> > E_Port, B_Port and FC-AL functionality and replace them with iFCP plus
> > OSPF/BGP/RIP implementaions, which is quite a drastic step
> > for a deployed
> > SAN to take on. Merging the two would appear to provide both
> > capabilities.
> >
> iFCP does not require you to remove anything.  There are implementation
> techniques to connect E_PORTS, Loop ports, and whatever ports you have
> in FC to the iFCP transport.  Merging the two will provide you nothing
> but a very complicated, confusing document describing two dissimilar
> techniques.
>
> Regards,
>
> Josh
>
> > Regards,
> >
> > Venkat Rangan
> > Rhapsody Networks Inc.
> > http://www.rhapsodynetworks.com
> >
> >
> > -----Original Message-----
> > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
> > Julian Satran
> > Sent: Wednesday, December 13, 2000 4:36 PM
> > To: ips@ece.cmu.edu
> > Subject: iFCP - FCIP merge proposal
> >
> >
> > Dear colleagues,
> >
> > At yesterdays IPS WG meeting and had no chance to clarify my proposal
> > regarding a merger of FCIP and iFCP into a single effort.
> >
> > iFCP attempt to provide an IP interconnect for FCP devices.
> > It has also the
> > capabilty to interconnect FC islands.
> >
> > FCIP has the narrower scope of connecting only FC islands -
> > admittedly even
> > FC devices other then FCP.
> >
> > Given that FCP devices where the main concern of this WG and that iFCP
> > serves a wider purpose than FCIP and will enable not only
> > tunneling but also
> > migration of FCP devices to IP infrastructure my intention
> > was to suggest
> > that iFCP should attempt to incorporate those FCIP functions
> > it does not
> > care about today and those two groups should work towards one
> > common draft
> > that should cover not only tunneling but also device migration to IP
> > networks.
> >
> > Julo
> > ______________________________________________________________
> > ______________
> > _________
> > Get more from the Web.  FREE MSN Explorer download :
> http://explorer.msn.com

--
Kenneth Hirata
Vixel Corporation
Irvine, CA 92618
Phone: (949) 450-6100
Email: khirata@vixel.com




From owner-ips@ece.cmu.edu  Sat Dec 30 16:25:51 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id QAA03300
	for <ips-archive@ietf.org>; Sat, 30 Dec 2000 16:25:51 -0500 (EST)
Received: (from majordom@localhost)
	by ece.cmu.edu (8.11.0/8.10.2) id eBU10V020263
	for ips-outgoing; Fri, 29 Dec 2000 20:00:31 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from server1.NishanSystems.COM (smtp.nishansystems.com [216.217.36.162])
	by ece.cmu.edu (8.11.0/8.10.2) with ESMTP id eBU10Cr20254
	for <ips@ece.cmu.edu>; Fri, 29 Dec 2000 20:00:13 -0500 (EST)
Received: by smtp.nishansystems.com with Internet Mail Service (5.5.2448.0)
	id <YGDLX895>; Fri, 29 Dec 2000 17:09:05 -0800
Message-ID: <B300BD9620BCD411A366009027C21D9B0B69A6@ariel.nishansystems.com>
From: Charles Monia <cmonia@NishanSystems.com>
To: ips@ece.cmu.edu
Subject: RE: iFCP - FCIP merge proposal
Date: Fri, 29 Dec 2000 16:59:43 -0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain;
	charset="windows-1252"
Sender: owner-ips@ece.cmu.edu
Precedence: bulk

Hi:

>........................FCIP doesn't need to use any of the functionality 
> described in the
> current
> iFCP document.
> 

Considering the limited purpose for which it was designed, the tunneling
protocol is certainly adequate as it stands. Fundamentally, these
specifications have widely different goals and constituencies and need to
evolve as separate documents.

Charles

> -----Original Message-----
> From: Ken Hirata [mailto:Ken.Hirata@Vixel.com]
> Sent: Thursday, December 28, 2000 6:46 PM
> To: ips@ece.cmu.edu
> Cc: Ken Hirata
> Subject: Re: iFCP - FCIP merge proposal
> 
> 
> I have to agree with what Joshua said in the "iFCP fabric attachments"
> thread; it doesn't make sense to merge iFCP and FCIP.
> 
> FCIP is used to solve the problem of connecting 2 Fibre Channel SANs
> via IP; it's a tunneling protocol.  As such, it is very 
> simple; the amount
> of
> processing on any Fibre Channel frame is minimal.  It doesn't 
> read or modify
> 
> any of the FC-2 header, add augmentation information or manipulate any
> Extended Link Service frames, and allows all FC-2 functionality.
> 
> iFCP is a gateway protocol.  It cracks the FC-2 header, 
> handles some Fibre
> Channel Extended Link Service frames in a special manner, and 
> there is a
> possibility that it won't support all FC-2 functionality.
> 
> Merging the FCIP and iFCP documents would make a single 
> document in name
> only.  FCIP doesn't need to use any of the functionality 
> described in the
> current
> iFCP document.
> 
>                                                         Ken
> 
> Joshua Tseng wrote:
> 
> > Venkat,
> >
> > <stuff deleted...>
> > >
> > > But as far as I can tell, iFCP requires you to remove devices
> > > that support
> > > E_Port, B_Port and FC-AL functionality and replace them 
> with iFCP plus
> > > OSPF/BGP/RIP implementaions, which is quite a drastic step
> > > for a deployed
> > > SAN to take on. Merging the two would appear to provide both
> > > capabilities.
> > >
> > iFCP does not require you to remove anything.  There are 
> implementation
> > techniques to connect E_PORTS, Loop ports, and whatever 
> ports you have
> > in FC to the iFCP transport.  Merging the two will provide 
> you nothing
> > but a very complicated, confusing document describing two dissimilar
> > techniques.
> >
> > Regards,
> >
> > Josh
> >
> > > Regards,
> > >
> > > Venkat Rangan
> > > Rhapsody Networks Inc.
> > > http://www.rhapsodynetworks.com
> > >
> > >
> > > -----Original Message-----
> > > From: owner-ips@ece.cmu.edu 
> [mailto:owner-ips@ece.cmu.edu]On Behalf Of
> > > Julian Satran
> > > Sent: Wednesday, December 13, 2000 4:36 PM
> > > To: ips@ece.cmu.edu
> > > Subject: iFCP - FCIP merge proposal
> > >
> > >
> > > Dear colleagues,
> > >
> > > At yesterdays IPS WG meeting and had no chance to clarify 
> my proposal
> > > regarding a merger of FCIP and iFCP into a single effort.
> > >
> > > iFCP attempt to provide an IP interconnect for FCP devices.
> > > It has also the
> > > capabilty to interconnect FC islands.
> > >
> > > FCIP has the narrower scope of connecting only FC islands -
> > > admittedly even
> > > FC devices other then FCP.
> > >
> > > Given that FCP devices where the main concern of this WG 
> and that iFCP
> > > serves a wider purpose than FCIP and will enable not only
> > > tunneling but also
> > > migration of FCP devices to IP infrastructure my intention
> > > was to suggest
> > > that iFCP should attempt to incorporate those FCIP functions
> > > it does not
> > > care about today and those two groups should work towards one
> > > common draft
> > > that should cover not only tunneling but also device 
> migration to IP
> > > networks.
> > >
> > > Julo
> > > ______________________________________________________________
> > > ______________
> > > _________
> > > Get more from the Web.  FREE MSN Explorer download :
> > http://explorer.msn.com
> 
> --
> Kenneth Hirata
> Vixel Corporation
> Irvine, CA 92618
> Phone: (949) 450-6100
> Email: khirata@vixel.com
> 
> 


From owner-ips@ece.cmu.edu  Sat Dec 30 18:45:39 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id SAA03685
	for <ips-archive@ietf.org>; Sat, 30 Dec 2000 18:45:38 -0500 (EST)
Received: (from majordom@localhost)
	by ece.cmu.edu (8.11.0/8.10.2) id eBULNFo17112
	for ips-outgoing; Sat, 30 Dec 2000 16:23:15 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from d12lmsgate-2.de.ibm.com (d12lmsgate-2.de.ibm.com [195.212.91.200])
	by ece.cmu.edu (8.11.0/8.10.2) with ESMTP id eBULMsQ17092
	for <ips@ece.cmu.edu>; Sat, 30 Dec 2000 16:22:54 -0500 (EST)
Received: from d12relay01.de.ibm.com (d12relay01.de.ibm.com [9.165.215.22])
	by d12lmsgate-2.de.ibm.com (1.0.0) with ESMTP id SAA244862
	for <ips@ece.cmu.edu>; Sat, 30 Dec 2000 18:15:54 +0100
From: julian_satran@il.ibm.com
Received: from d12mta02.de.ibm.com (d12mta01_cs0 [9.165.222.237])
	by d12relay01.de.ibm.com (8.8.8m3/NCO v4.95) with SMTP id SAA87842
	for <ips@ece.cmu.edu>; Sat, 30 Dec 2000 18:15:54 +0100
Received: by d12mta02.de.ibm.com(Lotus SMTP MTA v4.6.5  (863.2 5-20-1999))  id C12569C5.005ED4BF ; Sat, 30 Dec 2000 18:15:48 +0100
X-Lotus-FromDomain: IBMIL@IBMDE
To: ips@ece.cmu.edu
Message-ID: <C12569C5.005ED309.00@d12mta02.de.ibm.com>
Date: Sat, 30 Dec 2000 19:11:06 +0200
Subject: new iSCSI draft - 02.txt
Mime-Version: 1.0
Content-type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-ips@ece.cmu.edu
Precedence: bulk





Dear colleagues,

I've just submitted to the Internet-Drafts repository and to our list
archive (at CMU) a new
version of the draft.

Changes (from 2b!):

   framing by Urgent-Pointer replaced by framing through in-stream marker
   editorials and typos (not completed yet)
   simpler digests
   digest recovery and some clarifications on iSCSI specific errors (more
   to come)


I will have version 03 with many more editorial changes before the
intermediate meeting.

You can see the draft at:

http://www.haifa.il.ibm.com/satran/draft-ietf-ips-iSCSI-02.txt

Regards,
Julo







From owner-ips@ece.cmu.edu  Sat Dec 30 18:45:41 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id SAA03696
	for <ips-archive@ietf.org>; Sat, 30 Dec 2000 18:45:41 -0500 (EST)
Received: (from majordom@localhost)
	by ece.cmu.edu (8.11.0/8.10.2) id eBULHCw16914
	for ips-outgoing; Sat, 30 Dec 2000 16:17:12 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from d12lmsgate.de.ibm.com (d12lmsgate.de.ibm.com [195.212.91.199])
	by ece.cmu.edu (8.11.0/8.10.2) with ESMTP id eBULH7Q16909
	for <ips@ece.cmu.edu>; Sat, 30 Dec 2000 16:17:07 -0500 (EST)
Received: from d12relay01.de.ibm.com (d12relay01.de.ibm.com [9.165.215.22])
	by d12lmsgate.de.ibm.com (1.0.0) with ESMTP id RAA122090
	for <ips@ece.cmu.edu>; Sat, 30 Dec 2000 17:43:50 +0100
From: julian_satran@il.ibm.com
Received: from d12mta02.de.ibm.com (d12mta01_cs0 [9.165.222.237])
	by d12relay01.de.ibm.com (8.8.8m3/NCO v4.95) with SMTP id RAA182128
	for <ips@ece.cmu.edu>; Sat, 30 Dec 2000 17:43:51 +0100
Received: by d12mta02.de.ibm.com(Lotus SMTP MTA v4.6.5  (863.2 5-20-1999))  id C12569C5.005BE51E ; Sat, 30 Dec 2000 17:43:44 +0100
X-Lotus-FromDomain: IBMIL@IBMDE
To: ips@ece.cmu.edu
Message-ID: <C12569C5.005BE3E3.00@d12mta02.de.ibm.com>
Date: Sat, 30 Dec 2000 18:39:34 +0200
Subject: Re: Out of order response to R2T
Mime-Version: 1.0
Content-type: multipart/mixed; 
	Boundary="0__=FWeEhfndzheaTIeSWpYE0GlWip7hOAbfbjeqIwEyqP62L2X1vdpendJ0"
Content-Disposition: inline
Sender: owner-ips@ece.cmu.edu
Precedence: bulk

--0__=FWeEhfndzheaTIeSWpYE0GlWip7hOAbfbjeqIwEyqP62L2X1vdpendJ0
Content-type: text/plain; charset=us-ascii
Content-Disposition: inline



Barry,

I can understand the rationale and the new text is strict (see attached).
However a bad digest can result in the need to redo part or the whole R2T
(or to reclaim all the data after the failure).

   If the R2T is answered with a sequence of Data PDUs the Buffer Offset
   and Length must be within the range of those
   specified by R2T, the last PDU should have the F bit set to 1, the
   Buffer Offsets and Lengths for consecutive PDUs SHOULD form a continuous
   non-overlapping range and the PDUs should be sent in increasing offset
   order.
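
As an illustrative aside (not part of the draft text), the rules in the
wording above could be checked roughly as follows. This is only a minimal
Python sketch: the R2T and DataPDU classes and the check_r2t_response
function are hypothetical names chosen for the example, and the sketch
reports SHOULD-level rules as violations as well, which a real
implementation might treat more leniently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class R2T:
    offset: int  # Buffer Offset requested by the target
    length: int  # number of bytes requested


@dataclass
class DataPDU:
    offset: int  # Buffer Offset carried by this Data PDU
    length: int  # payload length of this Data PDU
    final: bool  # F bit


def check_r2t_response(r2t: R2T, pdus: List[DataPDU]) -> List[str]:
    """Return a list of rule violations; an empty list means none found."""
    if not pdus:
        return ["no Data PDUs were sent"]
    problems = []
    end = r2t.offset + r2t.length
    expected = pdus[0].offset  # offset the next PDU must start at
    for i, pdu in enumerate(pdus):
        # Offset and length must stay within the range given by the R2T.
        if pdu.offset < r2t.offset or pdu.offset + pdu.length > end:
            problems.append(f"PDU {i}: outside the range requested by the R2T")
        # Consecutive PDUs: continuous, non-overlapping, increasing offsets.
        if pdu.offset < expected:
            problems.append(f"PDU {i}: overlaps or precedes earlier data")
        elif pdu.offset > expected:
            problems.append(f"PDU {i}: leaves a gap after the previous PDU")
        expected = pdu.offset + pdu.length
    # The last PDU of the sequence carries the F bit.
    if not pdus[-1].final:
        problems.append("last PDU does not have the F bit set")
    return problems
```

For example, answering an R2T for offset 4096, length 8192 with two 4096-byte
PDUs in increasing order yields no violations, while sending the same two
PDUs in reverse order is reported as out of order.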


   Regards,
   Julo

"Barry Reinhold" <bbrtrebia@mediaone.net> on 29/12/2000 23:48:30

Please respond to "Barry Reinhold" <bbrtrebia@mediaone.net>

To:   Julian Satran/Haifa/IBM@IBMIL
cc:   "Jon Sreekanth" <jon.sreekanth@trebia.com>, "James Smart"
      <james.smart@trebia.com>
Subject:  Out of order response to R2T




Julian,
     This is a bit of a delayed follow up to a conversation we had in San
Diego.  The issue has to do with how an initiator is allowed to respond to
an R2T.

Right now the iSCSI draft says:

"If the R2T is
   answered with a sequence of Data PDUs the Buffer Offset and Length
   must be within the range of those
   specified by R2T and the last PDU should have the F bit set to 1; the
   Buffer Offsets and Lengths for consecutive PDUs SHOULD form a
   continuous range. "

Based on previous conversations this means that an initiator can break up
the delivery of the data into 4 segments and deliver the segments in any
order.

I would like to argue that this should be restricted such that when
responding to an R2T the data is delivered in order. I understand the logic
behind allowing the target to request data out of order based on the R2T.
This is consistent with Fibre Channel protocol and disk drive needs.
However, it is not clear to me why we should allow the iSCSI data PDUs sent
in response to the R2T to be out of order. I have three observations on
this:

1. The concept behind the target sending the R2T (based on analogy to the
FCP_XFER_RDY) is that the target is ready to receive data starting at
"offset" of a given length. This does not happen when the data is delivered
out of order, forcing the end device to reassemble the information and
making the check process more complicated.

2. This is specifically disallowed by Fibre Channel. Fibre Channel requires
that the data be delivered in one IU (sequence) and that the first frame of
the sequence must have the data offset set to the value sent in the
FCP_XFER_RDY frame. All the rest of the frames in the sequence must deliver
data in order by FC-FS rules.

For reference - FCP-2 clause 9.3

"If an FCP_XFER_RDY IU
is used to describe a data transfer and the first frame of the requested
FCP_DATA IU has a relative offset that
differs from the value in the FCP_DATA_RO field of the FCP_XFER_RDY IU, the
target shall post the error code

--0__=FWeEhfndzheaTIeSWpYE0GlWip7hOAbfbjeqIwEyqP62L2X1vdpendJ0
Content-type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable


'FCP_DATA Parameter mismatch with FCP_DATA_RO' in the FCP_RSP_INFO field of
the FCP_RSP IU."

I contacted Bob Snively (author of FCP-2) to confirm this. Bob is willing
to discuss the semantics of the FCP_XFER_RDY data transfer with you if you
wish to pick up that thread.

3. We have not seen this behavior in the field. If some device actually
took advantage of this flexibility I suspect a number of implementation
issues would come up. In my experience these types of options hinder
interoperability.

Also note that if a device is going to translate from iSCSI to FC, it is
going to have a difficult time with this. It must buffer up all the iSCSI
frames until it finds the first piece. If the data is sent in reverse order
and the transfer is large the buffering could be significant. Testing this
is a difficult process, and when testing is difficult interoperability
problems creep in.
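
To make the buffering cost concrete, here is a rough Python sketch (an
illustration only, not taken from any draft). It assumes a hypothetical
gateway that may forward data to the FC side only in increasing offset
order, and that segments never overlap; the function name
peak_buffered_bytes is invented for this example. The count includes the
segment that has just arrived.

```python
from typing import Dict, Iterable, Tuple


def peak_buffered_bytes(start: int, arrivals: Iterable[Tuple[int, int]]) -> int:
    """Peak bytes held while forwarding (offset, length) segments in order.

    start is the offset named in the R2T; arrivals lists the segments in
    the order they reach the gateway.
    """
    pending: Dict[int, int] = {}  # offset -> length, waiting for earlier data
    next_offset = start           # next offset that may be forwarded
    held = 0                      # bytes currently buffered
    peak = 0
    for offset, length in arrivals:
        pending[offset] = length
        held += length
        peak = max(peak, held)
        # Forward every segment that is now contiguous with what was sent.
        while next_offset in pending:
            sent = pending.pop(next_offset)
            held -= sent
            next_offset += sent
    return peak
```

With four 64 KiB segments delivered in order, the peak is a single segment
(65536 bytes); delivered in reverse order, all four segments (262144 bytes)
must be held before the first one can be forwarded.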

In summary I would like to suggest that out of order data delivery in
response to an R2T is not a helpful feature, hinders interoperability, is
not compatible with Fibre Channel, and therefore makes interconnecting with
Fibre Channel legacy devices more difficult. I would like to see us adopt
the same semantics for iSCSI in this regard as Fibre Channel has.

Thanks for your time on this,

Barry Reinhold

--0__=FWeEhfndzheaTIeSWpYE0GlWip7hOAbfbjeqIwEyqP62L2X1vdpendJ0--



From owner-ips@ece.cmu.edu  Sun Dec 31 09:56:47 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id JAA07620
	for <ips-archive@ietf.org>; Sun, 31 Dec 2000 09:56:47 -0500 (EST)
Received: (from majordom@localhost)
	by ece.cmu.edu (8.11.0/8.10.2) id eBVCnHp09456
	for ips-outgoing; Sun, 31 Dec 2000 07:49:17 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from d12lmsgate-3.de.ibm.com (d12lmsgate-3.de.ibm.com [195.212.91.201])
	by ece.cmu.edu (8.11.0/8.10.2) with ESMTP id eBVCmiP09447
	for <ips@ece.cmu.edu>; Sun, 31 Dec 2000 07:48:44 -0500 (EST)
Received: from d12relay01.de.ibm.com (d12relay01.de.ibm.com [9.165.215.22])
	by d12lmsgate-3.de.ibm.com (1.0.0) with ESMTP id NAA69476;
	Sun, 31 Dec 2000 13:48:05 +0100
From: julian_satran@il.ibm.com
Received: from d12mta02.de.ibm.com (d12mta01_cs0 [9.165.222.237])
	by d12relay01.de.ibm.com (8.8.8m3/NCO v4.95) with SMTP id NAA56340;
	Sun, 31 Dec 2000 13:48:06 +0100
Received: by d12mta02.de.ibm.com(Lotus SMTP MTA v4.6.5  (863.2 5-20-1999))  id C12569C6.0046508F ; Sun, 31 Dec 2000 13:48:01 +0100
X-Lotus-FromDomain: IBMIL@IBMDE
To: internet-drafts@ietf.org
cc: ips@ece.cmu.edu, bassoon@yogi.ece.cmu.edu
Message-ID: <C12569C6.00465031.00@d12mta02.de.ibm.com>
Date: Sun, 31 Dec 2000 14:43:54 +0200
Subject: new  iSCSI draft 02.txt
Mime-Version: 1.0
Content-type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-ips@ece.cmu.edu
Precedence: bulk



The correct URL is :

http://www.haifa.il.ibm.com/satran/ips/draft-ietf-ips-iSCSI-02.txt

Sorry for the inconvenience.

Julo
---------------------- Forwarded by Julian Satran/Haifa/IBM on 31/12/2000
14:39 ---------------------------

Julian Satran
30/12/2000 19:05

To:   internet-drafts@ietf.org
cc:   bassoon@yogi.ece.cmu.edu
From: Julian Satran/Haifa/IBM@Haifa/IBM@IBMIL
Subject:  new  iSCSI draft 02.txt




On behalf of a group of authors from several organizations as part of the
IPS working group  I submit a revision of an IETF-draft for immediate
publication. It specifies iSCSI - a SCSI Over TCP protocol and the file
name is "draft-ietf-ips-iSCSI-02.txt".  It completely replaces
"draft-ietf-ips-iSCSI-01.txt".

The draft can be found at:

http://www.haifa.il.ibm.com/satran/draft-ietf-ips-iSCSI-02.txt

Julian Satran - IBM Research Laboratory at Haifa







From owner-ips@ece.cmu.edu  Sun Dec 31 09:56:55 2000
Received: from ece.cmu.edu (ECE.CMU.EDU [128.2.236.200])
	by ietf.org (8.9.1a/8.9.1a) with SMTP id JAA07631
	for <ips-archive@ietf.org>; Sun, 31 Dec 2000 09:56:54 -0500 (EST)
Received: (from majordom@localhost)
	by ece.cmu.edu (8.11.0/8.10.2) id eBVChH409372
	for ips-outgoing; Sun, 31 Dec 2000 07:43:17 -0500 (EST)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from d12lmsgate-3.de.ibm.com (d12lmsgate-3.de.ibm.com [195.212.91.201])
	by ece.cmu.edu (8.11.0/8.10.2) with ESMTP id eBVCgtP09364
	for <ips@ece.cmu.edu>; Sun, 31 Dec 2000 07:42:55 -0500 (EST)
Received: from d12relay02.de.ibm.com (d12relay02.de.ibm.com [9.165.215.23])
	by d12lmsgate-3.de.ibm.com (1.0.0) with ESMTP id NAA111934
	for <ips@ece.cmu.edu>; Sun, 31 Dec 2000 13:42:48 +0100
From: julian_satran@il.ibm.com
Received: from d12mta02.de.ibm.com (d12mta01_cs0 [9.165.222.237])
	by d12relay02.de.ibm.com (8.8.8m3/NCO v4.95) with SMTP id NAA244044
	for <ips@ece.cmu.edu>; Sun, 31 Dec 2000 13:42:48 +0100
Received: by d12mta02.de.ibm.com(Lotus SMTP MTA v4.6.5  (863.2 5-20-1999))  id C12569C6.0045D3C5 ; Sun, 31 Dec 2000 13:42:41 +0100
X-Lotus-FromDomain: IBMIL@IBMDE
To: ips@ece.cmu.edu
Message-ID: <C12569C6.0045D245.00@d12mta02.de.ibm.com>
Date: Sun, 31 Dec 2000 14:38:31 +0200
Subject: Warning: could not send message for past 4 hours
Mime-Version: 1.0
Content-type: multipart/mixed; 
	Boundary="0__=48dW8Gl3mTRMXLId3o6cj8yOA7JNtHIDFsZ4C7klkLTMHWbLAUCminZz"
Content-Disposition: inline
Sender: owner-ips@ece.cmu.edu
Precedence: bulk

--0__=48dW8Gl3mTRMXLId3o6cj8yOA7JNtHIDFsZ4C7klkLTMHWbLAUCminZz
Content-type: text/plain; charset=us-ascii
Content-Disposition: inline




---------------------- Forwarded by Julian Satran/Haifa/IBM on 31/12/2000
14:38 ---------------------------

Mail Delivery Subsystem <MAILER-DAEMON@d12lmsgate.de.ibm.com> on 30/12/2000
22:48:16

Please respond to Mail Delivery Subsystem
      <MAILER-DAEMON@d12lmsgate.de.ibm.com>

To:   Julian Satran/Haifa/IBM@IBMIL
cc:
Subject:  Warning: could not send message for past 4 hours




    **********************************************
    **      THIS IS A WARNING MESSAGE ONLY      **
    **  YOU DO NOT NEED TO RESEND YOUR MESSAGE  **
    **********************************************

The original message was received at Sat, 30 Dec 2000 17:43:50 +0100
from d12relay01.de.ibm.com [9.165.215.22]

   ----- The following addresses had transient non-fatal errors -----
<ips@ece.cmu.edu>

  --- The transcript of the session follows ---
<ips@ece.cmu.edu>... Deferred: A remote host did not respond within the
timeout period. with ece.cmu.edu.
Warning: message still undelivered after 4 hours
Will keep trying until message is 3 days old


Reporting-MTA: dns; d12lmsgate.de.ibm.com
Arrival-Date: Sat, 30 Dec 2000 17:43:50 +0100

Final-Recipient: RFC822; ips@ece.cmu.edu
Action: delayed
Status: 4.4.1
Remote-MTA: DNS; ece.cmu.edu
Last-Attempt-Date: Sat, 30 Dec 2000 21:48:16 +0100
Will-Retry-Until: Tue, 2 Jan 2001 17:43:50 +0100


Return-Path: <julian_satran@il.ibm.com>
Received: from d12relay01.de.ibm.com (d12relay01.de.ibm.com [9.165.215.22])
by d12lmsgate.de.ibm.com (1.0.0) with ESMTP id RAA122090     for
<ips@ece.cmu.edu>; Sat, 30 Dec 2000 17:43:50 +0100
From: julian_satran@il.ibm.com
Received: from d12mta02.de.ibm.com (d12mta01_cs0 [9.165.222.237])      by
d12relay01.de.ibm.com (8.8.8m3/NCO v4.95) with SMTP id RAA182128  for
<ips@ece.cmu.edu>; Sat, 30 Dec 2000 17:43:51 +0100
Received: by d12mta02.de.ibm.com(Lotus SMTP MTA v4.6.5  (863.2 5-20-1999))
id C12569C5.005BE51E ; Sat, 30 Dec 2000 17:43:44 +0100
X-Lotus-FromDomain: IBMIL@IBMDE
To: ips@ece.cmu.edu
Message-ID: <C12569C5.005BE3E3.00@d12mta02.de.ibm.com>
Date: Sat, 30 Dec 2000 18:39:34 +0200
Subject: Re: Out of order response to R2T
Mime-Version: 1.0
Content-type: multipart/mixed;
Boundary="0__=FWeEhfndzheaTIeSWpYE0GlWip7hOAbfbjeqIwEyqP62L2X1vdpendJ0"
Content-Disposition: inline




Barry,

I can understand the rationale and the new text is strict (see attached).
However a bad digest can result in the need to redo part or the whole R2T
(or to reclaim all the data after the failure).

   If the R2T is answered with a sequence of Data PDUs the Buffer Offset
   and Length must be within the range of those
   specified by R2T, the last PDU should have the F bit set to 1, the
   Buffer Offsets and Lengths for consecutive PDUs SHOULD form a continuous
   non-overlapping range and the PDUs should be sent in increasing offset
   order.


   Regards,
   Julo

"Barry Reinhold" <bbrtrebia@mediaone.net> on 29/12/2000 23:48:30

Please respond to "Barry Reinhold" <bbrtrebia@mediaone.net>

To:   Julian Satran/Haifa/IBM@IBMIL
cc:   "Jon Sreekanth" <jon.sreekanth@trebia.com>, "James Smart"
      <james.smart@trebia.com>
Subject:  Out of order response to R2T




Julian,
     This is a bit of a delayed follow up to a conversation we had in San
Diego.  The issue has to do with how an initiator is allowed to respond to
an R2T.

Right now the iSCSI draft says:

"If the R2T is
   answered with a sequence of Data PDUs the Buffer Offset and Length
   must be within the range of those
   specified by R2T and the last PDU should have the F bit set to 1; the
   Buffer Offsets and Lengths for consecutive PDUs SHOULD form a
   continuous range. "

Based on previous conversations this means that an initiator can break up
the delivery of the data into 4 segments and deliver the segments in any
order.

I would like to argue that this should be restricted such that when
responding to an R2T the data is delivered in order. I understand the logic
behind allowing the target to request data out of order based on the R2T.
This is consistent with Fibre Channel protocol and disk drive needs.
However, it is not clear to me why we should allow the iSCSI data PDUs sent
in response to the R2T to be out of order. I have three observations on
this:

1. The concept behind the target sending the R2T (based on analogy to the
FCP_XFER_RDY) is that the target is ready to receive data starting at
"offset" of a given length. This does not happen when the data is delivered
out of order, forcing the end device to reassemble the information and
making the check process more complicated.

2. This is specifically disallowed by Fibre Channel. Fibre Channel requires
that the data be delivered in one IU (sequence) and that the first frame of
the sequence must have the data offset set to the value sent in the
FCP_XFER_RDY frame. All the rest of the frames in the sequence must deliver
data in order by FC-FS rules.

For reference - FCP-2 clause 9.3

"If an FCP_XFER_RDY IU
is used to describe a data transfer and the first frame of the requested
FCP_DATA IU has a relative offset that differs from the value in the
FCP_DATA_RO field of the FCP_XFER_RDY IU, the target shall post the error
code 'FCP_DATA Parameter mismatch with FCP_DATA_RO' in the FCP_RSP_INFO
field of the FCP_RSP IU."

I contacted Bob Snively (author of FCP-2) to confirm this. Bob is willing to
discuss the semantics of the FCP_XFER_RDY data transfer with you if you wish
to pick up that thread.

3. We have not seen this behavior in the field. If some device actually took
advantage of this flexibility I suspect a number of implementation issues
would come up. In my experience these types of options hinder
interoperability.
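
For comparison, the FCP-2 check quoted under point 2 amounts to a single
comparison on the first frame. A minimal Python sketch follows; the
function and constant names are made up for illustration, and only the
error text is taken from the quoted clause.

```python
from typing import Optional

# Error text as quoted from FCP-2 clause 9.3 above.
FCP_DATA_RO_MISMATCH = "FCP_DATA Parameter mismatch with FCP_DATA_RO"


def check_first_frame(fcp_data_ro: int,
                      first_frame_offset: int) -> Optional[str]:
    """Return the error to post in FCP_RSP_INFO, or None if offsets agree."""
    # The first frame of the FCP_DATA IU must start at the offset that
    # the target announced in FCP_XFER_RDY.
    if first_frame_offset != fcp_data_ro:
        return FCP_DATA_RO_MISMATCH
    return None
```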

Also note that if a device is going to translate from iSCSI to FC, it is
going to have a difficult time with this. It must buffer up all the iSCSI
frames until it finds the first piece. If the data is sent in reverse order
and the transfer is large the buffering could be significant. Testing this
is a difficult process, and when testing is difficult interoperability
problems creep in.
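
To make the buffering cost concrete, here is a rough Python sketch of a
gateway that may only forward data to the FC side in increasing offset
order. The function peak_gateway_buffer and the (offset, length)
representation are assumptions made for illustration, and segments are
assumed not to overlap.

```python
from typing import Dict, List, Tuple


def peak_gateway_buffer(r2t_offset: int,
                        arrivals: List[Tuple[int, int]]) -> int:
    """Peak number of bytes held back while waiting for earlier data.

    arrivals lists (buffer_offset, length) pairs in arrival order.
    """
    held: Dict[int, int] = {}  # offset -> length of segments not yet sent
    next_offset = r2t_offset   # the FC side must receive this offset next
    buffered = 0
    peak = 0
    for offset, length in arrivals:
        held[offset] = length
        buffered += length
        # Forward everything that is now contiguous from next_offset.
        while next_offset in held:
            sent = held.pop(next_offset)
            buffered -= sent
            next_offset += sent
        # Whatever remains is stuck behind data that has not arrived yet.
        peak = max(peak, buffered)
    return peak
```

With four 256-byte segments, in-order arrival holds nothing back, while
reverse-order arrival holds three segments (768 bytes) until the first
piece shows up.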

In summary I would like to suggest that out of order data delivery in
response to an R2T is not a helpful feature, hinders interoperability, is not
compatible with Fibre Channel, and therefore makes interconnecting with
Fibre Channel legacy devices more difficult. I would like to see us adopt
the same semantics for iSCSI in this regard as Fibre Channel has.

Thanks for your time on this,

Barry Reinhold




--0__=48dW8Gl3mTRMXLId3o6cj8yOA7JNtHIDFsZ4C7klkLTMHWbLAUCminZz--



