Squashfs: Directly decompress into the page cache for file data

This introduces an implementation of squashfs_readpage_block()
that directly decompresses into the page cache.

This uses the previously added page handler abstraction to push
down the necessary kmap_atomic/kunmap_atomic operations on the
page cache buffers into the decompressors.  This enables
direct copying into the page cache without using the slow
kmap/kunmap calls.

The code detects when multiple threads are racing in
squashfs_readpage() to decompress the same block, and avoids
this regression by falling back to using an intermediate
buffer.

This patch enhances the performance of Squashfs significantly
when multiple processes are accessing the filesystem simultaneously
because it not only reduces memcopying, but it more importantly
eliminates the lock contention on the intermediate buffer.

Using single-thread decompression.

        dd if=file1 of=/dev/null bs=4096 &
        dd if=file2 of=/dev/null bs=4096 &
        dd if=file3 of=/dev/null bs=4096 &
        dd if=file4 of=/dev/null bs=4096

Before:

629145600 bytes (629 MB) copied, 45.8046 s, 13.7 MB/s

After:

629145600 bytes (629 MB) copied, 9.29414 s, 67.7 MB/s

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reviewed-by: Minchan Kim <minchan@kernel.org>
5 files changed