Reading gzip compressed data via Javascript
May 5th, 2008 Stephen Lau
This weekend while I was working on my Magnatune extension for Songbird, I found I needed to fetch, expand, and parse a remote gzip’d XML document. The fetch was easy (XMLHttpRequest), as was the parse (DOMParser), but I had no idea how to do the expand.
Fortunately, Mossop over on extdev pointed me at Mozilla’s streamConverter services.
Unfortunately there wasn’t much sample code for me to blatantly rip-off^W^W^Wlearn from, so after much bumbling around like the JS amateur that I am, I finally got something working. I’m documenting it here in the hope that others might find it useful. Or at the very least, I can look it up again when I inevitably need to do this again.
Mossop first pointed out that I wouldn’t be able to use XMLHttpRequest and that I would need to open a channel:
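// Cc and Ci are the usual shorthands for Components.classes and
// Components.interfaces; define them first if your code doesn't already:
//   const Cc = Components.classes, Ci = Components.interfaces;
// Get the IO service
var ioService = Cc["@mozilla.org/network/io-service;1"]
    .getService(Ci.nsIIOService);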
// Create an nsIURI
var mtUri = ioService.newURI(magnatuneURL, null, null);
// Create a channel from that URI
var chan = ioService.newChannelFromURI(mtUri);
Awesome. The tricky part now is that the docs lie. They say there is a gzip-to-uncompressed stream converter that implements both asyncConvertData() and a synchronous convert(). I opted for synchronous since it seemed easier to get working off the bat, but kept getting error messages saying it wasn’t implemented. Turns out that’s true: the gzip->uncompressed converter only implements asyncConvertData(). So now I’d need to define a stream listener (implementing the nsIStreamListener interface). This is the listener that is invoked for each uncompressed chunk. It needs to implement onStartRequest, onStopRequest, and onDataAvailable, where onDataAvailable is passed an input stream containing the uncompressed data:
function StreamListener() {
  // Accumulates the uncompressed data as it arrives
  this._data = "";
}
StreamListener.prototype = {
  onStartRequest: function(aReq, aContext) {},
  onStopRequest: function(aReq, aContext, aStatusCode) {
    // this._data is my full uncompressed file now, for Magnatune this is my
    // XML file, so now I can go do whatever I want with it.
    Magnatune.Controller.completeSyncWithStore(this._data);
  },
  onDataAvailable: function(aReq, aContext, aInputStream, aOffset, aCount) {
    var binInputStream = Cc["@mozilla.org/binaryinputstream;1"]
        .createInstance(Ci.nsIBinaryInputStream);
    binInputStream.setInputStream(aInputStream);
    // aCount is the number of bytes available in this chunk
    this._data += binInputStream.readBytes(aCount);
    binInputStream.close();
  }
};
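The accumulation step in onDataAvailable can be looked at on its own, outside XPCOM. Below is a minimal sketch in plain JavaScript, assuming a hypothetical ChunkCollector helper (not part of any Mozilla API) that stores chunks in an array and joins them once at the end. Whether this beats repeated string concatenation depends on the engine and the data size, so treat it as an option to measure rather than a guaranteed win.

```javascript
// Hypothetical helper, not part of any Mozilla API: collects string chunks
// in an array and joins them once, instead of concatenating on every chunk.
function ChunkCollector() {
  this._chunks = [];
}
ChunkCollector.prototype = {
  // Call once per chunk, e.g. from onDataAvailable
  add: function (chunk) {
    this._chunks.push(chunk);
  },
  // Call once at the end, e.g. from onStopRequest
  toString: function () {
    return this._chunks.join("");
  }
};

// Usage with made-up chunks standing in for pieces of the XML file
var collector = new ChunkCollector();
collector.add("<albums>");
collector.add("<album/>");
collector.add("</albums>");
var fullText = collector.toString();
```

In the listener, add() would take the place of the string append and toString() would be called in onStopRequest.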
So now that I have my channel created and my stream listener defined, I need to get the nsIStreamConverterService and ask it for a converter that takes the gzip’d data from the channel and passes the uncompressed data to my stream listener so it can do its thing.
// Get the converter service
var converterService = Cc["@mozilla.org/streamConverters;1"]
    .getService(Ci.nsIStreamConverterService);
// Create the listener that will receive the uncompressed data
var myListener = new StreamListener();
// Instantiate our gzip decompressor converter
var converter = converterService.asyncConvertData("gzip",
    "uncompressed", myListener, null);
So now that we have all our pieces defined, all that’s left to do is pass the converter to the channel and start the pipeline:
// Initiate the asynchronous open. This will initiate the connection
// to Magnatune, grab the gzip'd data and pass it to our gzip converter
// which will then call the StreamListener, so our completion hook is
// fired in the StreamListener's onStopRequest()
chan.asyncOpen(converter, null);
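The shape of the pipeline (channel feeds the converter, and the converter is itself a listener that forwards uncompressed data to my listener) can be mimicked outside Songbird. What follows is a rough sketch for Node.js, not XPCOM code: CollectingListener, GunzipConverter, and fakeAsyncOpen are hypothetical stand-ins, Node's zlib module plays the role of Mozilla's gzip converter, and the sample XML is made up. Unlike the real converter, this one decompresses everything at the end instead of chunk by chunk.

```javascript
// Node.js sketch of the listener chain; all names here are hypothetical.
const zlib = require("zlib");

// Stand-in for my StreamListener: appends text chunks, exposes the result.
function CollectingListener() {
  this.data = "";
  this.done = false;
}
CollectingListener.prototype = {
  onStartRequest: function () {},
  onDataAvailable: function (text) {
    this.data += text;
  },
  onStopRequest: function () {
    this.done = true;
  }
};

// Stand-in for the gzip converter: it is a listener too, and it wraps
// the listener that should receive the uncompressed data.
function GunzipConverter(nextListener) {
  this.next = nextListener;
  this.compressed = [];
}
GunzipConverter.prototype = {
  onStartRequest: function () {
    this.next.onStartRequest();
  },
  onDataAvailable: function (buffer) {
    this.compressed.push(buffer);
  },
  onStopRequest: function () {
    // Simplification: gunzip the whole payload once all chunks are in
    const whole = Buffer.concat(this.compressed);
    this.next.onDataAvailable(zlib.gunzipSync(whole).toString("utf8"));
    this.next.onStopRequest();
  }
};

// Stand-in for chan.asyncOpen(): delivers the gzip'd bytes in two chunks.
function fakeAsyncOpen(gzippedBuffer, listener) {
  const middle = Math.floor(gzippedBuffer.length / 2);
  listener.onStartRequest();
  listener.onDataAvailable(gzippedBuffer.subarray(0, middle));
  listener.onDataAvailable(gzippedBuffer.subarray(middle));
  listener.onStopRequest();
}

const sampleXml = "<albums><album name='demo'/></albums>";
const sketchListener = new CollectingListener();
const sketchConverter = new GunzipConverter(sketchListener);
fakeAsyncOpen(zlib.gzipSync(Buffer.from(sampleXml, "utf8")), sketchConverter);
```

The point of the sketch is only the wiring: the code that opens the connection never sees my listener directly, it only talks to the converter.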
Awesome. So now every Songbird/Magnatune user will be downloading a 300 KB gzip’d file instead of a massive 6 MB file each time they sync with the Magnatune DB.
Comments
-
1.
Boris | May 5th, 2008 at 11:12
Is the URI not an http: URI? You could just use HTTP content-encoding and have all of this handled for you behind the scenes…
-
2.
Jorge | May 5th, 2008 at 11:21
Interesting. When I needed to do something like this I opted to save the file locally and then open it with ZipReader. This sounds much more efficient and better encapsulated, though. Thank you for posting it.
-
3.
ToTheBatCave | May 5th, 2008 at 14:44
If you turned this into an extension that ungzipped remote files on 3rd party servers that fail to serve the correct Content-encoding: gzip headers, you’ll find it very popular if combined with the functionality of “openinbrowser” at http://www.spasche.net/mozilla/ for when they also serve the wrong content-type. For instance, I’ve not found any free web hosting services (including archive.org) that serve .svgz files with that header, even though it would benefit them bandwidth-wise.
-
4.
Dennis | May 6th, 2008 at 02:26
Instead of doing: this._data += xx.
You should consider: this._data = []; this._data.push(xx); this._data.join("").
On large pieces of data that’s a lot faster.
-
5.
Stephen Lau | May 6th, 2008 at 08:39
Thanks for the tip Dennis, I’ll try that.
-
7.
Stephen Lau | May 6th, 2008 at 17:11
@Boris: The Magnatune store sends the gzip’d XML file with mime-type application/gzip instead of text/xml, so you can’t use XMLHttpRequest.
-
8.
Boris | May 6th, 2008 at 20:01
Ah, this is a URI you don’t control. That’s unfortunate…
-
9.
Neil | May 7th, 2008 at 06:23
One possibility is to use a stream loader, although from JavaScript it gives you a rather useless array of byte values rather than the string you’re looking for (which I hope is ASCII for your sake.)
Ideally what you could do with is some way of hooking up the stream listener for an XML document with the stream converter, though I don’t see how to achieve that without hacking into the Gecko codebase.
-
10.
Stephen Lau | May 7th, 2008 at 08:33
Yeah - the method I used gives me nice ASCII XML data which I can then parse more easily.