Mucking About With SquashFS – /dev/ttyS0

SquashFS is an incredibly popular file system for embedded Linux devices. Unfortunately, it is also notorious for being hacked up by vendors, causing the standard SquashFS tools (i.e., unsquashfs) to fail when extracting these file systems.

While projects like the Firmware-Mod-Kit (FMK) have amassed many unsquashfs utilities to work with a wide range of SquashFS variations found in the wild, this approach has several drawbacks, most notably that each individual unsquashfs tool only supports its one particular variation. If you run into a SquashFS image that is mostly compatible with a given unsquashfs tool, but has some minor modification, you can’t extract it – and worse, you probably don’t know why.

So what are these “minor modifications” that cause unsquashfs to fail?

It generally comes down to compression, specifically lzma. Although SquashFS 4.0 now supports a wide variety of compression types, ’twas not always thus. Prior to version 4, SquashFS only officially supported zlib compression. However, lzma produces much smaller output than zlib, so many embedded vendors hacked in lzma support, and of course they all did it in a slightly different way.

Some vendors put the standard 13-byte lzma header in front of each compressed data block. This header contains important compression metadata, most notably the lzma properties used to compress that data block:

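#include <stdint.h>

// Packed (GCC/Clang attribute) so the struct is exactly 13 bytes, with no padding
struct lzma_header
{
    uint8_t properties;          // Encodes lc, lp, and pb as (pb * 5 + lp) * 9 + lc
    uint32_t dictionary_size;    // Little-endian
    uint64_t uncompressed_size;  // Little-endian; -1 if unknown
} __attribute__((packed));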

This makes decompressing each data block straightforward; even so, the official SquashFS tools assume that any SquashFS file system prior to 4.0 is compressed using zlib, requiring special lzma versions of these tools to be built in order to support lzma compressed file systems prior to version 4.

Some vendors omitted the uncompressed size field from the lzma header of each data block:

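// Packed (GCC/Clang attribute) so the struct is exactly 5 bytes, with no padding
struct lzma_header
{
    uint8_t properties;          // Contains the lc, lp, and pb property values
    uint32_t dictionary_size;
    //uint64_t uncompressed_size;  // Omitted by these vendors
} __attribute__((packed));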

This kind of makes sense, since the uncompressed size field is not really required anyway; SquashFS code will know the exact, or at least the maximum, size of each data block, and lzma itself will just keep uncompressing data until it’s done. While it is valid to set the uncompressed size field to -1 in the lzma header if the size of the original data is not known at compression time, lzma decompressors still expect this field to exist. If it doesn’t, the decompressor will interpret whatever bytes happen to be there as the uncompressed size field, which likely won’t make sense, and decompression will fail.
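To make the two header layouts concrete, here is a minimal sketch of a parser that accepts either one. The names `lzma_params` and `parse_lzma_header` are made up for illustration, and the rule used to guess whether the size field is present (accept it only if it is -1 or no larger than the block size) is an assumption of this sketch, not something taken from any particular unsquashfs variant.

```c
#include <stddef.h>
#include <stdint.h>

/* Parsed view of a per-block lzma header (illustrative names). */
struct lzma_params {
    unsigned lc, lp, pb;
    uint32_t dictionary_size;
    uint64_t uncompressed_size;   /* UINT64_MAX means "unknown" */
    size_t header_len;            /* bytes consumed: 13 or 5 */
};

/* Read an n-byte little-endian integer (n <= 8). */
static uint64_t read_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = n; i > 0; i--)
        v = (v << 8) | p[i - 1];
    return v;
}

/*
 * Parse the header at the start of a compressed block.
 * max_block_size is the largest size the block could uncompress to.
 * Heuristic (an assumption): if the 8 bytes where the size would be are
 * neither -1 nor <= max_block_size, treat the size field as omitted.
 * Returns 0 on success, -1 if the properties byte is invalid.
 */
int parse_lzma_header(const uint8_t *buf, size_t len,
                      uint64_t max_block_size, struct lzma_params *out)
{
    if (len < 5 || buf[0] >= 9 * 5 * 5)
        return -1;

    /* properties = (pb * 5 + lp) * 9 + lc */
    unsigned d = buf[0];
    out->lc = d % 9;
    out->lp = (d / 9) % 5;
    out->pb = d / 45;

    out->dictionary_size = (uint32_t)read_le(buf + 1, 4);
    out->uncompressed_size = UINT64_MAX;
    out->header_len = 5;

    if (len >= 13) {
        uint64_t size = read_le(buf + 5, 8);
        if (size == UINT64_MAX || size <= max_block_size) {
            out->uncompressed_size = size;
            out->header_len = 13;
        }
    }
    return 0;
}
```

A caller would skip `header_len` bytes and hand the rest of the block to a raw lzma decoder configured with the parsed lc, lp, pb, and dictionary size. The heuristic can misfire on unlucky data, so a real tool would probably fall back to the other layout if decompression fails.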

Other implementations decided to encode lzma properties for each compressed data block using their own custom structure. Take DD-WRT for example:

struct lzma_header
{
    uint8_t pb;
    uint8_t lc;
    uint8_t lp;
    uint8_t unk;
};
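One way to cope with a custom header like this is to translate it back into the standard 13-byte form that ordinary lzma decoders expect. The sketch below does that; the function names are hypothetical, and because the custom header shown above carries no dictionary size, the caller has to supply one (an assumption of this example). The uncompressed size is written as -1, meaning "unknown".

```c
#include <stdint.h>

/* Custom per-block header in the layout shown above. */
struct ddwrt_lzma_header {
    uint8_t pb;
    uint8_t lc;
    uint8_t lp;
    uint8_t unk;
};

/* Pack lc/lp/pb into the standard lzma properties byte.
 * Returns -1 if any value is outside the range lzma allows. */
int lzma_props_encode(unsigned lc, unsigned lp, unsigned pb)
{
    if (lc > 8 || lp > 4 || pb > 4)
        return -1;
    return (int)((pb * 5 + lp) * 9 + lc);
}

/* Build a standard 13-byte lzma header from the custom one.
 * dict_size is supplied by the caller (not present in the custom header).
 * Returns 0 on success, -1 on invalid properties. */
int ddwrt_to_standard(const struct ddwrt_lzma_header *in,
                      uint32_t dict_size, uint8_t out[13])
{
    int props = lzma_props_encode(in->lc, in->lp, in->pb);
    if (props < 0)
        return -1;

    out[0] = (uint8_t)props;
    for (int i = 0; i < 4; i++)          /* dictionary size, little-endian */
        out[1 + i] = (uint8_t)(dict_size >> (8 * i));
    for (int i = 0; i < 8; i++)          /* uncompressed size = -1 (unknown) */
        out[5 + i] = 0xFF;
    return 0;
}
```

With the default lzma properties (lc=3, lp=0, pb=2), `lzma_props_encode` yields 0x5D, the byte commonly seen at the start of lzma streams.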

Some just use hard-coded compression properties for all data blocks, so there’s no lzma header at the beginning of their compressed data blocks at all. Further, these properties are not necessarily the default lzma property values:

// lzma zlib simplified wrapper
#include <zlib.h>

#define ZLIB_LC 0  // The default value for lc is 3; here, it's been changed to 0
#define ZLIB_LP 0
#define ZLIB_PB 2
...

Still others throw seemingly unnecessary data into the beginning of their data blocks, like the string “7zip”.

Due to the use of non-standard compression, many vendors also change the SquashFS “magic bytes”, which makes standard unsquashfs utilities think that the SquashFS image is invalid.
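For reference, a standard SquashFS image starts with the four bytes "hsqs" (little-endian) or "sqsh" (big-endian). The sketch below classifies the first four bytes of an image against those two values only; the enum and function names are made up for this example. A strict tool rejects anything else, while a more forgiving one might treat an unknown magic as a warning and carry on.

```c
#include <stddef.h>
#include <string.h>

enum sqfs_magic {
    SQFS_MAGIC_UNKNOWN = 0,   /* not a standard magic (possibly vendor-modified) */
    SQFS_MAGIC_LE,            /* "hsqs": little-endian image */
    SQFS_MAGIC_BE             /* "sqsh": big-endian image */
};

/* Classify the first four bytes of a candidate SquashFS image. */
enum sqfs_magic classify_magic(const unsigned char *image, size_t len)
{
    if (len < 4)
        return SQFS_MAGIC_UNKNOWN;
    if (memcmp(image, "hsqs", 4) == 0)
        return SQFS_MAGIC_LE;
    if (memcmp(image, "sqsh", 4) == 0)
        return SQFS_MAGIC_BE;
    return SQFS_MAGIC_UNKNOWN;
}
```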

All this, coupled with the fact that most unsquashfs utilities are pedantic about which SquashFS version(s) they support, requires anyone interested in extracting embedded file systems to litter their system with many different unsquashfs variants.


Luckily, the latest unsquashfs utility supports all versions of SquashFS (v1 – v4). While it still suffers from all the other above problems, it provides a useful base from which to develop a more “hacker friendly” tool.

In a (perhaps futile) attempt to write one extraction tool to support as many SquashFS variations as possible, sasquatch was born. It’s basically unsquashfs v4.3 that has been modified with some nifty features:

  • Doesn’t care about the SquashFS magic bytes
  • Doesn’t trust the reported compression header field
  • Tries all supported decompressors until it finds one that works, regardless of the SquashFS version
  • Adds some vendor-specific lzma implementations to the supported decompressor list
  • Includes an “adaptive” lzma decompressor that attempts to dynamically identify lzma compression options
  • Provides more fine-grained command line control over decompression and debug output
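The "try everything until something works" idea in the list above can be sketched as a simple loop over a table of decompressors. This is only an illustration of the approach, not sasquatch's actual code: the `decompress_fn` signature and the two stand-in decompressors are hypothetical, and a real table would hold zlib, lzma, and the vendor-specific variants.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical decompressor signature: returns bytes written, or -1 on failure. */
typedef long (*decompress_fn)(const unsigned char *in, size_t in_len,
                              unsigned char *out, size_t out_cap);

struct decompressor {
    const char *name;
    decompress_fn fn;
};

/* Stand-in that always fails, as a real decompressor would on foreign data. */
long fail_decompress(const unsigned char *in, size_t in_len,
                     unsigned char *out, size_t out_cap)
{
    (void)in; (void)in_len; (void)out; (void)out_cap;
    return -1;
}

/* Stand-in that "decompresses" by copying its input unchanged. */
long copy_decompress(const unsigned char *in, size_t in_len,
                     unsigned char *out, size_t out_cap)
{
    if (in_len > out_cap)
        return -1;
    memcpy(out, in, in_len);
    return (long)in_len;
}

/* Try each decompressor in order, ignoring what the superblock claims.
 * Returns the first one that succeeds (setting *out_len), or NULL. */
const struct decompressor *try_decompressors(const struct decompressor *list,
                                             size_t count,
                                             const unsigned char *in, size_t in_len,
                                             unsigned char *out, size_t out_cap,
                                             size_t *out_len)
{
    for (size_t i = 0; i < count; i++) {
        long n = list[i].fn(in, in_len, out, out_cap);
        if (n >= 0) {
            *out_len = (size_t)n;
            return &list[i];
        }
    }
    return NULL;
}
```

Returning the winning entry lets the caller remember it and try it first for later blocks, since an image normally uses one compression scheme throughout.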

The adaptive lzma decompressor is perhaps the best feature, as it not only generically auto-detects and decompresses several known vendor variations, but potentially can detect and decompress yet-unknown variations. In fact, it has already been able to extract SquashFS images that could not be extracted by any of the unsquashfs utilities in the Firmware-Mod-Kit.
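One plausible way to identify lzma options dynamically is plain brute force: lzma only permits lc in 0..8, lp in 0..4, and pb in 0..4, which gives 225 combinations to try against a block. The sketch below shows that search under stated assumptions; it is not taken from the sasquatch source. The `try_props_fn` callback is hypothetical, and `matches_target` merely stands in for a real "attempt to decompress with these properties" routine.

```c
#include <stddef.h>

/* Hypothetical callback: attempt a decode with the given properties.
 * Returns nonzero if the attempt succeeded. */
typedef int (*try_props_fn)(unsigned lc, unsigned lp, unsigned pb, void *ctx);

/* Stand-in for a real decode attempt: succeeds only when the properties
 * equal the {lc, lp, pb} triple that ctx points to. */
int matches_target(unsigned lc, unsigned lp, unsigned pb, void *ctx)
{
    const unsigned *t = ctx;
    return lc == t[0] && lp == t[1] && pb == t[2];
}

/* Walk every valid lc/lp/pb combination (9 * 5 * 5 = 225 in total).
 * Returns the standard properties byte of the first combination that
 * works, or -1 if none does. */
int find_lzma_props(try_props_fn attempt, void *ctx)
{
    for (unsigned pb = 0; pb <= 4; pb++)
        for (unsigned lp = 0; lp <= 4; lp++)
            for (unsigned lc = 0; lc <= 8; lc++)
                if (attempt(lc, lp, pb, ctx))
                    return (int)((pb * 5 + lp) * 9 + lc);
    return -1;
}
```

A practical version would likely also vary the offset at which compressed data starts within the block, to skip over unknown or junk headers, and would cache the first working combination for the rest of the image.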

With that said, the code is still beta and there are a couple of known SquashFS images that sasquatch can’t extract (yet). Bug reports and patches welcome.
