xz, unxz, xzcat, lzma, unlzma, and lzcat compress or decompress .xz and .lzma files.

Description

xz is a general-purpose data compression tool with command line syntax similar to gzip and bzip2. The native file format is the .xz format, but the legacy .lzma format used by LZMA Utils and raw compressed streams with no container format headers are also supported.

xz compresses or decompresses each file according to the selected operation mode. If no files are given, or if file is specified as a dash ("-"), xz reads from standard input and writes the processed data to standard output. xz refuses to write compressed data to standard output if it is a terminal; it displays an error and skips the file. Similarly, xz refuses to read compressed data from standard input if it is a terminal.

Unless --stdout is specified, files other than "-" are written to a new file whose name is derived from the source file name:

  • When compressing, the suffix of the target file format (.xz or .lzma) is appended to the source file name to get the target file name.
  • When decompressing, the .xz or .lzma suffix is removed from the file name to get the target file name. xz also recognizes the suffixes .txz and .tlz, and replaces them with the .tar suffix.
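The two naming rules above can be sketched as a small POSIX shell function. This is only an illustration: xz_target_name is a hypothetical helper and not part of xz, and it models the suffix handling only, not the other checks xz performs.

```shell
# Sketch of the naming rules described above; xz_target_name is a
# hypothetical helper, not part of xz.
# Usage: xz_target_name MODE FILE [FORMAT]
#   MODE   is "compress" or "decompress"
#   FORMAT is "xz" (default) or "lzma"; only used when compressing
xz_target_name() {
    mode=$1
    file=$2
    format=${3:-xz}
    case $mode in
        compress)
            # Append the suffix of the target file format.
            printf '%s.%s\n' "$file" "$format"
            ;;
        decompress)
            case $file in
                # .txz and .tlz are replaced with .tar
                *.txz|*.tlz) printf '%s.tar\n' "${file%.t?z}" ;;
                *.xz)        printf '%s\n' "${file%.xz}" ;;
                *.lzma)      printf '%s\n' "${file%.lzma}" ;;
                # No supported suffix: report failure to the caller.
                *)           return 1 ;;
            esac
            ;;
        *)
            return 2
            ;;
    esac
}
```

For example, xz_target_name decompress baz.txz prints baz.tar, and xz_target_name compress foo lzma prints foo.lzma.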

If the target file already exists, an error is displayed and the file is skipped.

Unless writing to standard output, xz displays a warning and skips the file if any of the following applies:

  • File is not a regular file. Symbolic links are not followed, and thus they are not considered to be regular files.
  • File has more than one hard link.
  • File has setuid, setgid, or sticky bit set.
  • The operation mode is set to compress and the file already has a suffix of the target file format (.xz or .txz when compressing to the .xz format, and .lzma or .tlz when compressing to the .lzma format).
  • The operation mode is set to decompress and the file doesn’t have a suffix of any of the supported file formats (.xz, .txz, .lzma, or .tlz).

After successfully compressing or decompressing the file, xz copies the owner, group, permissions, access time, and modification time from the source file to the target file. If copying the group fails, the permissions are modified so that the target file doesn’t become accessible to users who didn’t have permission to access the source file. xz doesn’t support copying other metadata like access control lists or extended attributes yet.

Once the target file is successfully closed, the source file is removed unless --keep was specified. The source file is never removed if the output is written to standard output.

Sending a SIGINFO or SIGUSR1 signal to the xz process makes it print progress information to standard error. This is of limited use, because when standard error is a terminal, --verbose already displays an automatically updating progress indicator.

Memory usage

The memory usage of xz varies from a few hundred kilobytes to several gigabytes depending on the compression settings. The settings used when compressing a file determine the memory requirements of the decompressor. Typically the decompressor needs 5% to 20% of the amount of memory that the compressor needed when creating the file. For example, decompressing a file created with xz -9 currently requires 65 MiB of memory. Still, it is possible to have .xz files that require several gigabytes of memory to decompress.

Users of older systems in particular may find the possibility of very large memory usage annoying. To prevent unpleasant surprises, xz has a built-in memory usage limiter, which is disabled by default. While some operating systems provide ways to limit the memory usage of processes, relying on them wasn’t deemed flexible enough.

The memory usage limiter can be enabled with the command line option --memlimit=limit. Often it is more convenient to enable the limiter by default by setting the environment variable XZ_DEFAULTS, e.g., XZ_DEFAULTS=--memlimit=150MiB. It is possible to set the limits separately for compression and decompression using --memlimit-compress=limit and --memlimit-decompress=limit. Using these two options outside XZ_DEFAULTS is rarely useful, because a single run of xz cannot do both compression and decompression, and --memlimit=limit (or -M limit) is shorter to type on the command line.
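As a configuration sketch, the limiter could be enabled for every xz run started from a shell by exporting XZ_DEFAULTS, for instance from a shell startup file. The 150MiB value is simply the example figure from the text, not a recommendation.

```shell
# Enable the memory usage limiter for every xz run started from this shell.
# 150MiB is the example value used in the text, not a recommendation.
XZ_DEFAULTS=--memlimit=150MiB
export XZ_DEFAULTS
```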

If the specified memory usage limit is exceeded when decompressing, xz displays an error and decompression fails. If the limit is exceeded when compressing, xz tries to scale the settings down so that the limit is no longer exceeded (except when using --format=raw or --no-adjust). This way the operation won’t fail unless the limit is very small. The scaling is done in steps that don’t match the compression level presets; for example, if the limit is only slightly less than the amount required for xz -9, the settings are scaled down only a little, not all the way down to xz -8.

Concatenating and padding with .xz files

It is possible to concatenate .xz files as is. xz will decompress such files as if they were a single .xz file.

It is possible to insert padding between the concatenated parts or after the last part. The padding must consist of null bytes, and the size of the padding must be a multiple of four bytes. This can be useful, for example, if the .xz file is stored on a medium that measures file sizes in 512-byte blocks.
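The arithmetic behind padding to a block boundary can be sketched in shell as follows. xz_padding_needed is a hypothetical helper, not an xz feature; it only computes how many null bytes would be needed and rejects a result that breaks the multiple-of-four rule stated above.

```shell
# xz_padding_needed SIZE BLOCK
# Hypothetical helper: prints how many null bytes would round SIZE up to
# the next multiple of BLOCK (0 if SIZE is already aligned).
xz_padding_needed() {
    size=$1
    block=$2
    pad=$(( (block - size % block) % block ))
    # The padding size must be a multiple of four bytes.
    if [ $(( pad % 4 )) -ne 0 ]; then
        echo "padding of $pad bytes is not a multiple of four" >&2
        return 1
    fi
    printf '%s\n' "$pad"
}
```

For a 1000-byte file and 512-byte blocks, xz_padding_needed 1000 512 prints 24. The padding itself would then have to be appended to the file as null bytes by some other means, for example by reading from /dev/zero.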

Concatenation and padding are not allowed with .lzma files or raw streams.

Syntax

xz [option]… [file]…

unxz is equivalent to xz --decompress.

xzcat is equivalent to xz --decompress --stdout.

lzma is equivalent to xz --format=lzma.

unlzma is equivalent to xz --format=lzma --decompress.

lzcat is equivalent to xz --format=lzma --decompress --stdout.

Options: operation modes

These options tell xz what mode to use. If more than one mode is specified, the last one takes effect.

Options: operation modifiers

Options: basic file format and compression options

Custom compressor filter chains

A custom filter chain allows specifying the compression settings in detail instead of relying on the settings associated with the preset levels. When a custom filter chain is specified, the compression preset level options (-0 ... -9 and --extreme) are silently ignored.

Supported check types:

The differences between the presets are more significant than with gzip and bzip2. The selected compression settings determine the memory requirements of the decompressor, so using too high a preset level might make it painful to decompress the file on an old system with little RAM. Specifically, it’s not a good idea to blindly use -9 for everything, as is often done with gzip and bzip2.

On the same hardware, the decompression speed is approximately a constant number of bytes of compressed data per second. In other words, the better the compression, the faster the decompression will usually be. This also means that the amount of uncompressed output produced per second can vary a lot.

The following table summarises the features of the presets:

Since there are two presets with dictionary sizes 4 MiB and 8 MiB, the presets -3e and -5e use slightly faster settings (lower CompCPU) than -4e and -6e, respectively. That way no two presets are identical.

A filter chain is comparable to piping on the command line. When compressing, the uncompressed input goes to the first filter, whose output goes to the next filter (if any). The output of the last filter gets written to the compressed file. The maximum number of filters in the chain is four, but often a filter chain has only one or two filters.

Many filters have limitations on where they can be in the filter chain: some filters can work only as the last filter in the chain, some only as a non-last filter, and some work in any position in the chain. Depending on the filter, this limitation is either inherent to the filter design or exists to prevent security issues.

A custom filter chain is specified using one or more filter options in the order they are wanted in the filter chain. That is, the order of filter options is significant! When decoding raw streams (--format=raw), the filter chain is specified in the same order as it was specified when compressing.

Filters take filter-specific options as a comma-separated list. Extra commas in options are ignored. Every option has a default value, so you need to specify only those you want to change.

Other options

Robot mode

The robot mode is activated with the --robot option. It makes the output of xz easier to parse by other programs. Currently --robot is supported only together with --version, --info-memory, and --list. It will be supported for normal compression and decompression in the future.

The following match finders are supported. The memory usage formulas below are rough approximations, which are closest to reality when dict is a power of two.

Since the BCJ-filtered data is usually compressed with LZMA2, the compression ratio may be improved slightly if the LZMA2 options are set to match the alignment of the selected BCJ filter. For example, with the IA-64 filter, it’s good to set pb=4 with LZMA2 (2^4=16). The x86 filter is an exception; it’s usually good to stick to LZMA2’s default four-byte alignment when compressing x86 executables.

All BCJ filters support the same options:

Robot mode: version

xz --robot --version prints the version number of xz and liblzma in the following format:

XZ_VERSION=XYYYZZZS
LIBLZMA_VERSION=XYYYZZZS

Here’s what the version number means, part by part:

Examples: 4.999.9beta is 49990091 and 5.0.0 is 50000002.
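The two examples above can be reproduced with a short decoding sketch. It assumes that X is the major version, YYY the minor version, ZZZ the patch level, and S a stability digit with 0 = alpha, 1 = beta, and 2 = stable; this reading is consistent with both examples but is an assumption as far as this page goes. decode_xz_version is a hypothetical helper name.

```shell
# decode_xz_version XYYYZZZS
# Hypothetical helper: prints "major.minor.patch stability" for an
# eight-digit version number. The stability mapping is an assumption.
decode_xz_version() {
    v=$1
    major=$(( v / 10000000 ))
    minor=$(( v / 10000 % 1000 ))
    patch=$(( v / 10 % 1000 ))
    case $(( v % 10 )) in
        0) stability=alpha ;;
        1) stability=beta ;;
        2) stability=stable ;;
        *) stability=unknown ;;
    esac
    printf '%s.%s.%s %s\n' "$major" "$minor" "$patch" "$stability"
}
```

With this sketch, decode_xz_version 49990091 prints "4.999.9 beta" and decode_xz_version 50000002 prints "5.0.0 stable".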

Robot mode: memory limit information

xz --robot --info-memory prints a single line with three tab-separated columns:

  • Total amount of physical memory (RAM) in bytes
  • Memory usage limit for compression in bytes. A special value of zero indicates the default setting, which for single-threaded mode is the same as no limit.
  • Memory usage limit for decompression in bytes. A special value of zero indicates the default setting, which for single-threaded mode is the same as no limit.

In the future, the output of xz --robot --info-memory may have more columns, but never more than a single line.
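A script could split that line into its three documented columns roughly as follows. This is a sketch: parse_info_memory and describe_limit are hypothetical helper names, and any extra columns are collected in a throwaway variable so that the parsing keeps working if more columns are added.

```shell
# describe_limit VALUE
# Zero is the special value that indicates the default setting.
describe_limit() {
    if [ "$1" -eq 0 ]; then
        echo "default"
    else
        echo "$1 bytes"
    fi
}

# parse_info_memory LINE
# Hypothetical helper: splits one tab-separated line into the three
# documented columns; extra columns land in "rest" and are ignored.
parse_info_memory() {
    tab=$(printf '\t')
    printf '%s\n' "$1" | {
        IFS=$tab read -r total_ram limit_compress limit_decompress rest
        echo "RAM: $total_ram bytes"
        echo "compression limit: $(describe_limit "$limit_compress")"
        echo "decompression limit: $(describe_limit "$limit_decompress")"
    }
}
```

On a system with xz installed, it could be called as parse_info_memory "$(xz --robot --info-memory)".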

Robot mode: list mode

xz --robot --list uses tab-separated output. The first column of every line has a string that indicates the type of the information found on that line:

The columns of the file lines are:

  • Number of streams in the file.
  • Total number of blocks in the stream(s).
  • Compressed size of the file.
  • Uncompressed size of the file.
  • Compression ratio, for example 0.123. If ratio is over 9.999, three dashes (---) are displayed instead of the ratio.
  • Comma-separated list of integrity check names. The following strings are used for the known check types: None, CRC32, CRC64, and SHA-256. For unknown check types, Unknown-N is used, where N is the Check ID as a decimal number (one or two digits).
  • Total size of stream padding in the file.
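The file-line columns listed above could be picked apart with awk along these lines. The sketch assumes that the type string in the first column of a file line is "file", so the documented columns start at field 2; summarize_file_lines is a hypothetical helper name.

```shell
# summarize_file_lines
# Hypothetical helper: reads tab-separated list output on standard input
# and prints one labelled summary per file line. Assumes the first column
# of such lines is the literal string "file".
summarize_file_lines() {
    awk -F '\t' '
        $1 == "file" {
            printf "streams=%s blocks=%s compressed=%s uncompressed=%s ratio=%s checks=%s padding=%s\n",
                   $2, $3, $4, $5, $6, $7, $8
        }'
}
```

On a system with xz installed, it could be used as xz --robot --list foo.xz | summarize_file_lines, where foo.xz stands for any existing .xz file.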

The columns of the stream lines are:

  • Stream number (the first stream is 1).
  • Number of blocks in the stream.
  • Compressed start offset.
  • Uncompressed start offset.
  • Compressed size (does not include stream padding).
  • Uncompressed size.
  • Compression ratio.
  • Name of the integrity check.
  • Size of stream padding.

The columns of the block lines are:

  • Number of the stream containing this block.
  • Block number relative to the beginning of the stream (the first block is 1).
  • Block number relative to the beginning of the file.
  • Compressed start offset relative to the beginning of the file.
  • Uncompressed start offset relative to the beginning of the file.
  • Total compressed size of the block (includes headers).
  • Uncompressed size.
  • Compression ratio.
  • Name of the integrity check.

If --verbose was specified twice, additional columns are included on the block lines. These are not displayed with a single --verbose, because getting this information requires many seeks and can thus be slow:

  • Value of the integrity check in hexadecimal.
  • Block header size.
  • Block flags: c indicates that compressed size is present, and u indicates that uncompressed size is present. If the flag is not set, a dash (-) is shown instead to keep the string length fixed. New flags may be added to the end of the string in the future.
  • Size of the actual compressed data in the block (this excludes the block header, block padding, and check fields).
  • Amount of memory (in bytes) required to decompress this block with this xz version.
  • Filter chain. Note that most of the options used at compression time cannot be known, because only the options that are needed for decompression are stored in the .xz headers.

The columns of the summary lines are:

  • Amount of memory (in bytes) required to decompress this file with this xz version.
  • yes or no indicating if all block headers have both compressed size and uncompressed size stored in them.
  • Minimum xz version required to decompress the file (since xz 5.1.2alpha).

The columns of the totals line:

  • Number of streams.
  • Number of blocks.
  • Compressed size.
  • Uncompressed size.
  • Average compression ratio.
  • Comma-separated list of integrity check names that were present in the files.
  • Stream padding size.
  • Number of files. This is here to keep the order of the earlier columns the same as on file lines.

If --verbose was specified twice, additional columns are included on the totals line:

  • Maximum amount of memory (in bytes) required to decompress the files with this xz version.
  • yes or no indicating if all block headers have both compressed size and uncompressed size stored in them.
  • Minimum xz version required to decompress the file.

Exit status

Environment

xz parses space-separated lists of options from the environment variables XZ_DEFAULTS and XZ_OPT, in this order, before parsing the options from the command line. Note that only options are parsed from the environment variables; all non-options are silently ignored.

Examples

xz foo

Compress the file foo into foo.xz using the default compression level (-6), and remove foo if compression is successful.

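XZ_OPT=-2v tar caf foo.tar.xz foo

Create foo.tar.xz with tar, passing the options -2 and -v to xz through the XZ_OPT environment variable. This assumes a tar implementation, such as GNU tar, that runs xz to compress the archive.

XZ_OPT=${XZ_OPT-"-7e"}; export XZ_OPT

In an sh script, set XZ_OPT to -7e as a script-specific default only if the user has not already set XZ_OPT.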

xz -dk bar.xz

Decompress bar.xz into bar and don’t remove bar.xz even if decompression is successful.

tar cf - baz | xz -4e > baz.tar.xz

Create baz.tar.xz with the preset -4e (-4 --extreme), which is slower than the default -6, but needs less memory for compression and decompression (48 MiB and 5 MiB, respectively).

xz -dcf a.txt b.txt.xz c.txt d.txt.lzma > abcd.txt

Decompress a mix of compressed and uncompressed files to standard output, using a single command.

Related commands

  • compress: Compress a file or files.
  • gzip: Create, modify, list the contents of, and extract files from GNU zip archives.
  • zip: A compression and archiving utility.