Reading gzip compressed data via Javascript

Monday, May 5. 2008  –  Category: Code, Songbird

This weekend while I was working on my Magnatune extension for Songbird, I found I needed to fetch, expand, and parse a remote gzip’d XML document. The fetch was easy (XMLHttpRequest), as was the parse (DOMParser), but I had no idea how to do the expand.

Fortunately, Mossop over on extdev pointed me at Mozilla’s streamConverter services.

Unfortunately there wasn’t much sample code for me to blatantly rip-off^W^W^Wlearn from, so after much bumbling around like the JS amateur that I am, I finally got something working. I’m documenting it here so that hopefully others might find it useful. Or at the very least, I can look it up again when I will inevitably need to do this again :-)

Mossop first pointed out that I wouldn’t be able to use XMLHttpRequest and that I would need to open a channel:

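// Cc and Ci are the usual shorthands for Components.classes and
// Components.interfaces; magnatuneURL holds the address of the gzip'd file

// Get the IO service
var ioService = Cc["@mozilla.org/network/io-service;1"]
        .getService(Ci.nsIIOService);
// Create an nsIURI
var mtUri = ioService.newURI(magnatuneURL, null, null);
// Create a channel from that URI
var chan = ioService.newChannelFromURI(mtUri);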

Awesome. The tricky part now is that the docs lie. They say the gzip-to-uncompressed stream converter implements both asyncConvertData() and a synchronous convert(). I opted for synchronous since it seemed easier to get working off the bat, but kept getting error messages saying it wasn’t implemented. Turns out that’s true: the gzip->uncompressed converter only implements asyncConvertData(). So now I’d need to define a stream listener (implementing the nsIStreamListener interface). This is the listener that is invoked for each uncompressed chunk. It needs to implement onStartRequest, onStopRequest, and onDataAvailable, where onDataAvailable is passed the uncompressed data:

function StreamListener() {
    this._data = null;
    this._first = true;
}   

StreamListener.prototype = {
    onStartRequest: function(aReq, aContext) {},
    onStopRequest: function(aReq, aContext, aStatusCode) {
        // this._data is my full uncompressed file now, for Magnatune this is my
        // XML file, so now I can go do whatever I want with it.
        Magnatune.Controller.completeSyncWithStore(this._data);
    },  
    onDataAvailable: function(aReq, aContext, aInputStream, aOffset, aCount) {
        var binInputStream = Cc["@mozilla.org/binaryinputstream;1"]
                    .createInstance(Ci.nsIBinaryInputStream);
        binInputStream.setInputStream(aInputStream);
        if (this._first) {
            this._data = binInputStream.readBytes(binInputStream.available());
            this._first = false;
        } else {
            this._data += binInputStream.readBytes(binInputStream.available());
        }
        binInputStream.close();
    }
};
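
As a variation on the listener above, the chunks can be collected in an array and joined once in onStopRequest, which may be faster than repeated string concatenation on large downloads. This is only a sketch: ChunkListener, readChunk, onComplete, and readBytesFromStream are names made up for illustration, not Mozilla APIs. The reading step is passed in as a function so the accumulation logic can be tried outside of Mozilla, and readBytesFromStream assumes Cc and Ci are in scope.

```javascript
// Hypothetical variant of StreamListener that buffers chunks in an array.
// readChunk(aInputStream, aCount) returns one chunk as a string;
// onComplete(aStatusCode, data) is called once with the joined result.
function ChunkListener(readChunk, onComplete) {
    this._chunks = [];
    this._readChunk = readChunk;
    this._onComplete = onComplete;
}

ChunkListener.prototype = {
    onStartRequest: function(aReq, aContext) {
        // Start from an empty buffer for every request
        this._chunks = [];
    },
    onDataAvailable: function(aReq, aContext, aInputStream, aOffset, aCount) {
        // aCount is the number of bytes available for this call
        this._chunks.push(this._readChunk(aInputStream, aCount));
    },
    onStopRequest: function(aReq, aContext, aStatusCode) {
        // Join once at the end and hand the status code to the caller
        this._onComplete(aStatusCode, this._chunks.join(""));
    }
};

// Inside an extension, the reader would wrap nsIBinaryInputStream,
// the same way the listener above does (requires Cc and Ci).
function readBytesFromStream(aInputStream, aCount) {
    var bin = Cc["@mozilla.org/binaryinputstream;1"]
            .createInstance(Ci.nsIBinaryInputStream);
    bin.setInputStream(aInputStream);
    return bin.readBytes(aCount);
}
```

In an extension this could be created as new ChunkListener(readBytesFromStream, function(status, data) { ... }), with the callback checking the status (0 is NS_OK) before parsing the data.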

So now that I have my channel created and my stream listener defined, I need to use the nsIStreamConverterService to create a converter that takes the gzip’d data from the channel and passes the uncompressed data to the stream listener so it can do its thing.

// Get the converter service
var converterService = Cc["@mozilla.org/streamConverters;1"]
            .getService(Ci.nsIStreamConverterService);

// Create an instance of the StreamListener defined above
var myListener = new StreamListener();

// Instantiate our gzip decompressor converter
var converter = converterService.asyncConvertData("gzip",
            "uncompressed", myListener, null);

So now that we have all our pieces defined, all that’s left to do is pass the converter to the channel and start the pipeline:

// Initiate the asynchronous open.  This will initiate the connection
// to Magnatune, grab the gzip'd data and pass it to our gzip converter
// which will then call the StreamListener, so our completion hook is
// fired in the StreamListener's onStopRequest()
chan.asyncOpen(converter, null);
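
For reference, the three steps above can be folded into one helper. This is a rough sketch rather than the code the extension actually uses: fetchGzipped and its parameter names are invented here, and it assumes Cc and Ci are in scope and that the caller passes in an object implementing the three stream listener methods.

```javascript
// Sketch only: fetchGzipped, aUrl, and aListener are illustrative names,
// not part of any Mozilla API. Assumes the Cc/Ci shorthands are in scope.
function fetchGzipped(aUrl, aListener) {
    var ioService = Cc["@mozilla.org/network/io-service;1"]
            .getService(Ci.nsIIOService);
    var converterService = Cc["@mozilla.org/streamConverters;1"]
            .getService(Ci.nsIStreamConverterService);

    // The channel delivers the raw gzip'd bytes
    var chan = ioService.newChannelFromURI(
            ioService.newURI(aUrl, null, null));

    // The converter is itself a stream listener: it receives the
    // compressed chunks and forwards uncompressed chunks to aListener
    var converter = converterService.asyncConvertData("gzip",
            "uncompressed", aListener, null);

    // Start the request; the caller's completion hook is whatever
    // aListener does in its onStopRequest()
    chan.asyncOpen(converter, null);
    return chan;
}
```

With the listener defined earlier, the call would look like fetchGzipped(magnatuneURL, new StreamListener()).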

Awesome. So now every Songbird/Magnatune user will be downloading a 300 KB gzip’d file instead of a massive 6 MB file each time they sync with the Magnatune DB.

10 Responses to “Reading gzip compressed data via Javascript”

  1. Boris Says:

    Is the URI not an http: URI? You could just use HTTP content-encoding and have all of this handled for you behind the scenes…

  2. Jorge Says:

    Interesting. When I needed to do something like this I opted to save the file locally and then open it with ZipReader. This sounds much more efficient and better encapsulated, though. Thank you for posting it.

  3. ToTheBatCave Says:

    If you turned this into an extension that ungzipped remote files on 3rd party servers that fail to serve the correct Content-encoding: gzip headers, you’ll find it very popular if combined with the functionality of “openinbrowser” at http://www.spasche.net/mozilla/ for when they also serve the wrong content-type. For instance, I’ve not found any free web hosting services (including archive.org) that serve .svgz files with that header, even though it would benefit them bandwidth-wise.

  4. Dennis Says:

    Instead of doing: this._data += xx.

    You should consider: this._data = []; this._data.push(xx); this._data.join("").

    On large pieces of data that’s a lot faster.

  5. Stephen Lau Says:

    Thanks for the tip Dennis, I’ll try that.

  7. Stephen Lau Says:

    @Boris: The Magnatune store sends the gzip’d XML file with mime-type application/gzip instead of text/xml, so you can’t use XMLHttpRequest.

  8. Boris Says:

    Ah, this is a URI you don’t control. That’s unfortunate…

  9. Neil Says:

    One possibility is to use a stream loader, although from JavaScript it gives you a rather useless array of byte values rather than the string you’re looking for (which I hope is ASCII for your sake.)

    Ideally what you could do with is some way of hooking up the stream listener for an XML document with the stream converter, though I don’t see how to achieve that without hacking into the Gecko codebase.

  10. Stephen Lau Says:

    Yeah - the method I used gives me nice ASCII XML data which I can then parse more easily.
