Dropbox: Resumable Download Support

Date: 2008-05-12
Author: J Frey

The HTTP 1.1 standard defines headers that a download agent may use to retrieve a sub-range of the resource identified by the requested URI. The most useful application of this mechanism is resuming a large download that was interrupted in transit.

Server-Supplied Headers

When the download agent initially requests the document, the server must supply the following headers in order to fully support the mechanism:

Last-Modified: [date]
ETag: [unique string]
Accept-Ranges: bytes

The [date] should be RFC 1123 format, though any acceptable format under the HTTP 1.1 specification will suffice:

Sun, 06 Nov 1994 08:49:37 GMT
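As an illustration of producing a date in this format, here is a minimal Python sketch. It is not taken from the Dropbox code; the helper name http_date is made up for this example, and the timestamp is simply the Unix time of the sample date above.

```python
from email.utils import formatdate


def http_date(epoch_seconds):
    """Format a Unix timestamp as an RFC 1123 date in GMT (hypothetical helper)."""
    # usegmt=True makes the zone read "GMT" rather than "-0000".
    return formatdate(epoch_seconds, usegmt=True)


# 784111777 is the Unix time for the sample date shown above.
print("Last-Modified: " + http_date(784111777))
# prints: Last-Modified: Sun, 06 Nov 1994 08:49:37 GMT
```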

The ETag is a unique tag that identifies the downloaded resource. The URI itself cannot be guaranteed to have a one-to-one relationship with the resource, since the URI could reference a CGI that facilitates the download action.

In testing, the Opera 9.x browser was initially the only browser that seemed to support resumable downloading. Initially, I did not include the Last-Modified and ETag headers, either. Only once I included them did Safari and OmniWeb begin treating the downloads as resumable. All of the sample code I found online for handling resumable download headers in PHP mistakenly neglected to add these two headers.

The need for them is quite logical:

  • If the file has been modified on the server since the interrupted download, then the interrupted data on the client machine is likely stale and a full re-download must happen.
  • Imagine two users each drop off a file named Finances.xls, and one user begins downloading both in his browser when suddenly his network connection drops. Without a per-file unique ETag, how are the two files to be distinguished when resuming the download?

Granted, in the second case the browser can do some additional work to match the full URI (including GET form data) to the file on-disk or in the cache. But why not provide a unique tag that the browser can use?
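One way to build such a tag is to hash values that together identify one stored file and change whenever it changes. The Python sketch below is only an assumption about how that could be done, not how Dropbox does it; the function name make_etag and the dropoff_id parameter are hypothetical.

```python
import hashlib


def make_etag(dropoff_id, size, mtime):
    """Build a quoted ETag from a per-file identifier, size and mtime.

    All three inputs are assumptions for this sketch: dropoff_id stands
    for whatever unique key the server keeps for each uploaded file.
    """
    raw = "{}:{}:{}".format(dropoff_id, size, int(mtime)).encode("utf-8")
    # ETag values are sent as quoted strings.
    return '"' + hashlib.sha1(raw).hexdigest() + '"'


# Two files that share the name Finances.xls but have different
# identifiers receive different tags.
print("ETag: " + make_etag("dropoff-17/Finances.xls", 20480, 1210587720))
print("ETag: " + make_etag("dropoff-42/Finances.xls", 20480, 1210587720))
```

Including the modification time means a file replaced on the server gets a new tag, which covers the first case in the list above as well.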

Another important set of headers to include is the cache-invalidation headers. There are myriad choices represented online in other programmers' code samples, but I found the following to work fine:

Cache-control: private
Pragma: private
Expires: 0

Headers must also be included to indicate the size of the payload and the byte-range of the payload in the context of the full resource:

HTTP/1.1 206 Partial Content
Content-Range: bytes [start]-[end]/[full size]
Content-Length: {[end] - [start] + 1}

If the payload covers the full byte range of the resource, then the first two lines (the 206 status line and the Content-Range header) should not be sent. The Content-Length is the length of the payload, calculated from the inclusive [start] and [end] byte offsets within the full resource.
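The arithmetic can be sketched in Python as follows. This is an illustration only; the function name range_response_headers is hypothetical, and falling back to a plain 200 OK status when the whole resource is sent is an assumption of the sketch.

```python
def range_response_headers(start, end, full_size):
    """Return (status_line, headers) for an inclusive byte range.

    Hypothetical helper: start and end are inclusive offsets into a
    resource that is full_size bytes long.
    """
    if not (0 <= start <= end < full_size):
        raise ValueError("range lies outside the resource")
    length = end - start + 1  # both offsets are inclusive
    if length == full_size:
        # The whole resource is being sent: no 206, no Content-Range.
        return "HTTP/1.1 200 OK", {"Content-Length": str(length)}
    headers = {
        "Content-Range": "bytes {}-{}/{}".format(start, end, full_size),
        "Content-Length": str(length),
    }
    return "HTTP/1.1 206 Partial Content", headers


status, headers = range_response_headers(100, 199, 1000)
print(status)                    # HTTP/1.1 206 Partial Content
print(headers["Content-Range"])  # bytes 100-199/1000
print(headers["Content-Length"]) # 100
```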

As of this writing, Dropbox supports only single-range partial content transfers, although the HTTP 1.1 standard allows multiple ranges to be specified in the incoming Range header. With multiple ranges, the server must send a reply of MIME type multipart/byteranges, with the payloads demarcated in standard MIME-boundary fashion and each part carrying its own Content-Range header for the data range it contains. Web browsers currently send only a single range when resuming a download, so support for multiple ranges is not yet present in Dropbox.
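For reference, the shape of a multiple-range reply body can be sketched in Python as below. Dropbox does not do this; the function name, the boundary string and the default content type are all assumptions chosen for illustration.

```python
def multipart_byteranges(data, ranges, boundary,
                         content_type="application/octet-stream"):
    """Assemble a multipart/byteranges body (hypothetical helper).

    data is the full resource as bytes; ranges is a list of inclusive
    (start, end) offsets; boundary must not occur in the data.
    """
    full_size = len(data)
    parts = []
    for start, end in ranges:
        part_header = (
            "--{}\r\n"
            "Content-Type: {}\r\n"
            "Content-Range: bytes {}-{}/{}\r\n"
            "\r\n"
        ).format(boundary, content_type, start, end, full_size)
        parts.append(part_header.encode("ascii"))
        parts.append(data[start:end + 1] + b"\r\n")
    # The closing delimiter carries two trailing hyphens.
    parts.append("--{}--\r\n".format(boundary).encode("ascii"))
    return b"".join(parts)


body = multipart_byteranges(b"0123456789", [(0, 2), (7, 9)], "SEP")
print(body.decode("ascii"))
```

The reply itself would carry the header Content-Type: multipart/byteranges; boundary=SEP together with the 206 status.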

Client-Supplied Headers

The download agent (client) sends a single header in its HTTP request in order to request a partial download of a resource:

Range: /^bytes=(([0-9]*)-([0-9]*),)*(([0-9]*)-([0-9]*))$/

The Range header value is composed of the unit prefix bytes= followed by comma-separated ranges, each containing one or both integers on either side of the - character. The left integer is a starting byte offset, the right integer is an ending byte offset; both are inclusive. Omitting the left integer requests the last N bytes of the resource (where N is the right integer). Omitting the right integer requests all bytes from M to the end of the resource content (where M is the left integer). See this page for further explanation.
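The rules above can be sketched as a small Python parser. This is an illustrative sketch rather than the Dropbox implementation; the function name parse_range is hypothetical, and raising ValueError for malformed or unsatisfiable ranges is a simplification (a server would normally answer those with an error status or ignore the header).

```python
def parse_range(value, full_size):
    """Parse a Range header value into inclusive (start, end) pairs.

    Hypothetical helper: full_size is the length of the resource in bytes.
    """
    unit, sep, spec = value.strip().partition("=")
    if sep != "=" or unit.strip().lower() != "bytes":
        raise ValueError("unsupported Range value: %r" % value)
    result = []
    for item in spec.split(","):
        left, dash, right = item.strip().partition("-")
        if dash != "-" or (left == "" and right == ""):
            raise ValueError("malformed range: %r" % item)
        if (left and not left.isdigit()) or (right and not right.isdigit()):
            raise ValueError("malformed range: %r" % item)
        if left == "":
            # "-N": the last N bytes of the resource.
            count = int(right)
            if count == 0:
                raise ValueError("empty suffix range")
            start = max(full_size - count, 0)
            end = full_size - 1
        else:
            # "M-" runs to the end; "M-N" is clipped to the last byte.
            start = int(left)
            end = int(right) if right else full_size - 1
            end = min(end, full_size - 1)
        if start > end:
            raise ValueError("unsatisfiable range: %r" % item)
        result.append((start, end))
    return result


print(parse_range("bytes=500-", 1000))          # [(500, 999)]
print(parse_range("bytes=-200", 1000))          # [(800, 999)]
print(parse_range("bytes=0-99,200-299", 1000))  # [(0, 99), (200, 299)]
```

A browser resuming an interrupted download would typically send the first form, with M set to the number of bytes it already holds.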

Browser Testing

I tested the resume-downloading capabilities within a number of mainstream web browsers. If you don't see your browser mentioned here, drop me an email and tell me what it can do.

Safari

Apple's Safari browser line has apparently supported resumable downloads since late revisions of the 1.0 version; I tested in version 3.1 and found the feature to work properly once the Last-Modified and ETag headers were included in the server response.

The latest versions of Safari actually create a bundle when one begins a download. This bundle includes the downloaded file plus an Info.plist file containing metadata that can be used across browser sessions to resume the download. Look in one of these files and you'll find amongst other things – that's right – the ETag for the resource! So I can quit out of Safari, even restart my Mac, and I can still (hypothetically) resume downloading that resource when I start up Safari once again.

Firefox

Firefox currently has limited support for resumable downloads. On a Mac with version 2.0.0.14, I was able to pause and continue downloads – this works solely because Firefox did not close the TCP stream. The Apache server on which Dropbox runs has a five-minute timeout on HTTP streams, so if I resume downloading anywhere within those five minutes, the TCP stream simply continues sending data from the original request (see footnote 1).

Real resumable download support can be added via Firefox extensions; the 3.0 feature set for Firefox reportedly includes true resumable download support.

Opera

Also on the Mac I tested Opera 9.x and found it to work properly – though the fact that it worked even when the response lacked a Last-Modified header causes me a little concern!

OmniWeb 5.x

Only available on the Mac, OmniWeb behaved in similar fashion to Safari with respect to the two additional response headers: without them it did not treat downloads as resumable, but once they were included this browser also supported resumable downloading from Dropbox.

1) Indeed, when I paused a download and waited more than five minutes before resuming, Firefox simply reported the download as complete – because the TCP stream had already been closed.
 
documentation/resumabledownloads.txt · Last modified: 2008-05-12 10:22 by frey