Dar Format Specification

Author: Wesley Leggette
Revision: $Revision: 1.1 $
Copyright: GNU Free Documentation License

Kaylix Series

This specification series differs from the standard dar format specification. Files written to this specification may not be binary compatible with those using a standard dar format. The kaylix series is derived from the standard specification.

A kaylix version number is formed by incrementing the version number and prepending a 'k'.

THIS IS A DRAFT DOCUMENT

This specification is a pre-release draft. Any corrections should be directed to <wleggette@kaylix.net>.

Dar Format Version k06

Dar was designed by Denis Corbin as part of the Dar Archive application (DAR). All materials related to DAR are copyright Denis Corbin and are available under the terms of the GNU General Public License.

Dar is implemented by the Dar Library (libdar), which is available only under the terms of the GNU General Public License.

The kaylix series was designed by Wesley Leggette. It was created with the aim that many of the changes will eventually be reintegrated into the standard dar specification.

Summary

The dar format is a POSIX-oriented archival format meant as a full-featured replacement for tar, cpio, or dump when used to back up file systems. The format has explicit support for compression, encryption, disk spanning, and random file access. In addition, the dar format supports an extractable catalog that can be used to create and examine differential backups. Dar archives have full support for traditional POSIX metadata and POSIX Extended Attributes.

Dar archives contain cyclic redundancy check information but cannot recover corrupted files (non-corrupted files can often be restored from corrupted archives). Parity files must be created separately with a tool like Parchive. Also, dar archives cannot be implicitly modified once created. Further changes are generally recorded by creating new differential archives using an older archive or catalog as a reference.

Archive Structure

|-- first slice size ----------------------------|

+-------------+---------+------------------------+
| slice + ext | archive | file data + EA         |
| header      | header  |                        |
+-------------+---------+------------------------+
              ^                                  ^
              zero offset at archive level       offset (A)

|-- slice size ---------------------------------------|

+--------+--------------------------------------------+
| slice  |          file data + EA                    |
| header |                                            |
+--------+--------------------------------------------+
         ^
         offset (A)

+--------+--------------------------------------------+
| slice  |          file data + EA                    |
| header |                                            |
+--------+--------------------------------------------+


|-- final slice size -----------------------------|

+--------+---------------------+-----------+------+
| slice  | file data + EA      | catalogue | term |
| header |                     |           |      |
+--------+---------------------+-----------+------+
                               ^
                               catalog offset

Dar archives are written sequentially with one exception: During writing, the last flag byte is initially set to FLAG_TERMINAL and is reset to FLAG_NON_TERMINAL on transition to a new slice.

When reading, implementations should read the first slice header, the archive header, then the terminator on the last slice and finally the catalog. The implementation may then proceed with the desired operation. Besides the mandatory access of the first and last slices, other slices may be requested only as needed.

All offsets stored at the archive level are measured from a zero offset located at the end of the first slice header, immediately before the archive header.

The abstracted archive level offset is resolved using several values. FIRST_SLICE_SIZE is queried from the file descriptor of that slice, SLICE_SIZE is set to the value stored in the slice header's size extension (if present) or set equal to the FIRST_SLICE_SIZE. LAST_SLICE_SIZE and LAST_SLICE_NUMBER are queried from the file descriptor and the file name of the last slice file respectively. FIRST_SLICE_OFFSET is set to the current position after reading the first slice header. SLICE_OFFSET is set equal to SLICE_HEADER_SIZE (the size of a slice header without extensions, currently 16 bytes). Any archive level offset can be translated to real file offsets with these values.
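
The translation described above can be sketched as follows. This is only an illustration under stated assumptions, not libdar's implementation: the names `SliceLayout` and `locate` are hypothetical, slices are assumed to be numbered from 1, and every slice after the first is assumed to carry a header of exactly SLICE_HEADER_SIZE bytes. The sketch does not check the result against LAST_SLICE_NUMBER or LAST_SLICE_SIZE.

```python
from typing import NamedTuple, Tuple

SLICE_HEADER_SIZE = 16  # size of a slice header without extensions


class SliceLayout(NamedTuple):
    """Hypothetical container for the values named in the text."""
    first_slice_size: int    # FIRST_SLICE_SIZE, size of the first slice file
    slice_size: int          # SLICE_SIZE, from the size extension or FIRST_SLICE_SIZE
    first_slice_offset: int  # FIRST_SLICE_OFFSET, position after the first slice header
    slice_offset: int = SLICE_HEADER_SIZE  # SLICE_OFFSET


def locate(layout: SliceLayout, archive_offset: int) -> Tuple[int, int]:
    """Return (slice_number, file_offset) for an archive level offset."""
    if archive_offset < 0:
        raise ValueError("archive level offsets are non-negative")
    # Payload bytes carried by the first slice, after its header and extension.
    first_payload = layout.first_slice_size - layout.first_slice_offset
    if archive_offset < first_payload:
        return 1, layout.first_slice_offset + archive_offset
    # Payload bytes carried by each following slice (assumed numbering from 1).
    payload = layout.slice_size - layout.slice_offset
    index, within = divmod(archive_offset - first_payload, payload)
    return 2 + index, layout.slice_offset + within
```

For example, with a 100 byte first slice whose header ends at byte 20 and 50 byte following slices, archive offset 80 is the first payload byte of the second slice.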

Slice Level

+--------+-------------------------------------------+
| header |  data                                     |
|        |                                           |
+--------+-------------------------------------------+

Dar archives are placed in a sort of logical envelope called a slice, allowing the archive to be spanned across one or more files. Each slice consists of a slice header followed by data.

Slice Header

+-------+----------+------+-----------+................+
| magic | internal | last | extension | extension      |
| num.  | name     | flag | flag      | data           |
+-------+----------+------+-----------+................+

The slice header, located at the beginning of every slice file, identifies the file as a dar archive, matches slices together, and indicates other basic file information.

magic number
Fixed value indicating file type.
internal name
Unique identifier for a given archive. This machine generated name should be guaranteed to be unique. The name is used to verify that two slices belong to the same archive.
last flag
Indicates if the slice is the last of the archive.
extension flag
Indicates if an extension field is present. Currently, only one extension field can be present at any time.
(extension data)
Extension data.
magic number     4 byte field, MSB order
                 SAUV_MAGIC_NUMBER = 123  0x{ 00 00 00 7B }

internal name    10 byte field
                 In libdar, the internal name is generated from the system time followed by the process id.

last flag        1 byte field
                 FLAG_NON_TERMINAL = 'N'  0x{ 4E }
                 FLAG_TERMINAL = 'T'  0x{ 54 }

extension flag   1 byte field
                 EXTENSION_NO = 'N'  0x{ 4E }
                 EXTENSION_SIZE = 'S'  0x{ 53 }

extension data   Variable size field
                 Extension specified data.
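
As an illustration of the field layout above, the 16 fixed bytes of a slice header could be decoded roughly as follows. The names `SliceHeader` and `parse_slice_header` are hypothetical, and the extension data is deliberately left unparsed because its infinint encoding is not described in this part of the specification.

```python
import struct
from typing import NamedTuple

SAUV_MAGIC_NUMBER = 123
FLAG_NON_TERMINAL, FLAG_TERMINAL = b"N", b"T"
EXTENSION_NO, EXTENSION_SIZE = b"N", b"S"


class SliceHeader(NamedTuple):
    """Hypothetical decoded form of the fixed part of a slice header."""
    internal_name: bytes
    is_last: bool
    has_size_extension: bool


def parse_slice_header(raw: bytes) -> SliceHeader:
    """Decode the 16 fixed bytes at the start of a slice file."""
    if len(raw) < 16:
        raise ValueError("slice header is truncated")
    # ">I" reads the 4 byte magic number in MSB (big endian) order,
    # "10s" the internal name, and each "c" a single flag byte.
    magic, name, last, ext = struct.unpack(">I10scc", raw[:16])
    if magic != SAUV_MAGIC_NUMBER:
        raise ValueError("bad magic number, not a dar slice")
    if last not in (FLAG_NON_TERMINAL, FLAG_TERMINAL):
        raise ValueError("unknown last flag")
    if ext not in (EXTENSION_NO, EXTENSION_SIZE):
        raise ValueError("unknown extension flag")
    return SliceHeader(name, last == FLAG_TERMINAL, ext == EXTENSION_SIZE)
```

A reader would typically compare `internal_name` across slices to confirm that they belong to the same archive.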

Extensions

+-----------+----------------+
| extension | extension      |
| flag      | data           |
+-----------+----------------+

Currently the only valid slice level header extension is the EXTENSION_SIZE header, which indicates that the size of the first slice differs from that of the other slices. The size of the first slice is determined from the file itself, and the size of the other slices is indicated in the extension data. The size of the last slice is also determined from that slice file itself.

EXTENSION_SIZE
{
   char flag = 'S';
   infinint size;   // size of following slices
}

Archive Level

+---------+-----------------------------------------+-----------+------+
| archive |     data                                | catalogue | term |
| header  |                                         |           |      |
+---------+-----------------------------------------+-----------+------+

An archive may be split into multiple slices. The slices are handled exclusively at the slice level. The archive level is accessed by logically concatenating the slice level payloads. The zero offset starts after the first slice header and continues, jumping over each slice header.

Archive Header

+---------+-------+---------------+------\..................\----------+
| format  | comp. | command line  | flag / extensions       / ext_end  |
| version | algo  |               |      \                  \          |
+---------+-------+---------------+------/................../----------+
format version
Indicates the format version of this archive.
compress algorithm
Indicates the compression algorithm used.
command line
The command line string used when originally making the archive. Usage of this field is DEPRECATED.
flag
Indicates if root or user Extended Attributes are saved; indicates if the scramble weak encryption algorithm is used. If Extended Attributes are stored it is optional to restore these values to disk. This flag was added in format version 02.
extensions
Variable number of extensions whose type is indicated by a one byte extension flag.
ext_end
A special extension whose extension flag is T.
format version       Variable length field, null terminated string
                     SUPPORTED_VERSION = "k06"  0x{ 6B 30 36 00 }

compress algorithm   1 byte field
                     none = 'n'  0x{ 6E }
                     zip = 'p'  0x{ 70 }
                     gzip = 'z'  0x{ 7A }
                     bzip2 = 'y'  0x{ 79 }

command line         Variable length field, null terminated string
                     default = "N/A"  0x{ 4E 2F 41 00 }

flag                 1 byte field
                     SAVED_EA_ROOT = 0x80
                     SAVED_EA_USER = 0x40
                     SCRAMBLED = 0x20
                     SAVED_EA_ENCAPSULATED = 0x10

extensions           1 byte field + defined payload
                     ARC_EXTENSION_SYSTEM = 's'  0x{ 73 }

ext_end              1 byte field
                     ARC_EXTENSION_TERM = 'T'  0x{ 54 }

System Extension

+-----------+----------------+
| extension | extension      |
| flag      | data           |
+-----------+----------------+

Currently the only valid archive level header extension is the ARC_EXTENSION_SYSTEM, which indicates what system a given archive was written on and serves as a guideline to the implementation for processing any alternative (non-POSIX) metadata encapsulated in the extended attributes. There are three payload fields: system, simple_desc, detail_desc. The latter two are mostly for informational purposes and should roughly follow the guidelines detailed below. The system field is for processing and must attempt to indicate what type of filesystem/operating system combination is present. For example, while POSIX indicates standard VFS file metadata, POSIX_MAC indicates a Mac OSX system where additional stream data may be present. WIN32 generally indicates NTFS or FAT metadata.

When setting the system field, implementations may default to POSIX as the lowest common denominator and upgrade to other system types only when additional metadata is being saved, or they may always upgrade to a specific system type. The actual presence of system specific metadata is indicated in the flag field of the archive header.

Implementations should generally never use the UNDEFINED field, even when the system type is not known. Instead, POSIX should generally be indicated and POSIX metadata should be guessed if not available.

As an additional note, if conflicting metadata types are ever saved (i.e. if for some reason both NTFS metadata and HFS streams are saved) it would be okay for implementations to indicate the first special metadata type found. If an NTFS ACL is found before the first HFS stream, WIN32 could be indicated. It would also be okay for implementations to default to the type of operating system. If an NTFS ACL is found before the first HFS stream, but the system is Mac OSX, then POSIX_MAC could also be indicated. This implies that implementations are also responsible for handling unexpected encapsulated metadata values when reading or writing encapsulated extended attributes.

Whatever value is finally chosen for the system field, the type of metadata implied by that field should be stored as completely as possible. For example, if WIN32 is chosen, all catalog entries should contain a complete set of encapsulated FAT and/or NTFS metadata. Implementations should also expect to have to generate their own special metadata (when restoring to a system with alternative metadata) but must default to using any encapsulated metadata except where impossible (such as inadequate access rights to restore ownership).

ARC_EXTENSION_SYSTEM
{
   char flag = 's';
   char system;
   char* simple_desc;
   char* detail_desc;
}
system        1 byte field
              UNDEFINED = 0x00
              POSIX = 0x01
              WIN32 = 0x02
              POSIX_MAC = 0x03

simple_desc   null terminated string
              Value returned by uname or one of these preset strings:
              SYS_UNIX = "Unix"
              SYS_LINUX = "Linux"
              SYS_SOLARIS = "Solaris"
              SYS_WINDOWS = "Windows"
              SYS_MAC = "Mac OSX"

detail_desc   null terminated string
              Machine type, e.g. value returned by gcc -dumpmachine.
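
Putting the archive header fields and the system extension together, a reader might walk the header as sketched below. This is a minimal sketch, not the libdar parser: `read_cstring` and `parse_archive_header` are hypothetical names, and decoding the strings as Latin-1 is an assumption, since the specification does not name a character encoding.

```python
from typing import Dict, Tuple

FLAG_BITS = {
    "SAVED_EA_ROOT": 0x80,
    "SAVED_EA_USER": 0x40,
    "SCRAMBLED": 0x20,
    "SAVED_EA_ENCAPSULATED": 0x10,
}
SYSTEMS = {0x00: "UNDEFINED", 0x01: "POSIX", 0x02: "WIN32", 0x03: "POSIX_MAC"}
COMPRESSION = {b"n": "none", b"p": "zip", b"z": "gzip", b"y": "bzip2"}


def read_cstring(buf: bytes, pos: int) -> Tuple[str, int]:
    """Read a null terminated string; return it and the position after the null."""
    end = buf.index(b"\x00", pos)
    # Assumption: Latin-1 maps every byte to a character, so decoding never fails.
    return buf[pos:end].decode("latin-1"), end + 1


def parse_archive_header(buf: bytes) -> Tuple[Dict[str, object], int]:
    """Return the decoded header fields and the offset just past ext_end."""
    header: Dict[str, object] = {}
    header["version"], pos = read_cstring(buf, 0)
    header["compression"] = COMPRESSION[buf[pos:pos + 1]]
    header["command_line"], pos = read_cstring(buf, pos + 1)
    flag = buf[pos]
    header["flags"] = sorted(name for name, bit in FLAG_BITS.items() if flag & bit)
    pos += 1
    while True:
        ext = buf[pos:pos + 1]
        pos += 1
        if ext == b"T":  # ARC_EXTENSION_TERM closes the extension list
            return header, pos
        if ext != b"s":  # ARC_EXTENSION_SYSTEM is the only other valid flag
            raise ValueError("unknown archive header extension")
        header["system"] = SYSTEMS[buf[pos]]
        header["simple_desc"], pos = read_cstring(buf, pos + 1)
        header["detail_desc"], pos = read_cstring(buf, pos)
```

The returned offset marks where the stored data (or the encryption layer, if any) begins.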

Stored Data

....--+---------------------+----+------------+-----------+----+---....
      | file data           | EA | file data  | file data | EA |
      | (may be compressed) |    | (no EA)    |           |    |
....--+---------------------+----+------------+-----------+----+---....

Roughly speaking, data is stored as a stream of file contents followed by Extended Attributes if present. Offsets to file data and Extended Attributes are stored in the catalog at the end of the archive. These offsets are counted from zero at the beginning of the archive header.

Encryption and compression are implemented as "layers" on top of the archive level. File data and Extended Attributes are written to the topmost layer. Data is not stored in a solid format, meaning that the topmost layer is flushed after each file and each set of Extended Attributes is written.

Not all layers are required in a given archive. The possible layers, in order, are compression, encryption, and archive level (below which the slice layer is located). The following set of diagrams details several possible configurations. Note that they are not drawn to scale (e.g. raw data is often larger than the compressed output).

FIXME: Are elastic buffers added to the end of each file data and EA section by the encryption layer?

When encryption is used, "elastic buffers" are added to the encryption layer before and after the data stream. These are indicated by EE in the following diagrams.

Case 1: Encrypted, compressed archive

               +------+----+------+----+..............+-----------+
               | file | EA | file | EA |              | catalogue |
               | data |    | data |    |              |           |
               +------+----+------+----+..............+-----------+

          +----+--------------------------------------------------+------+----+
          | EE | Compressed data                                  | term | EE |
          |    |                                                  |      |    |
          +----+--------------------------------------------------+------+----+

+---------+-------------------------------------------------------------------+
| archive | Encrypted data                                                    |
| header  |                                                                   |
+---------+-------------------------------------------------------------------+

Case 2: Encrypted archive

          +----+------+----+------+----+..............+-----------+------+----+
          | EE | file | EA | file | EA |              | catalogue | term | EE |
          |    | data |    | data |    |              |           |      |    |
          +----+------+----+------+----+..............+-----------+------+----+

+---------+-------------------------------------------------------------------+
| archive | Encrypted data                                                    |
| header  |                                                                   |
+---------+-------------------------------------------------------------------+

Case 3: Compressed archive

          +------+----+------+----+........................+-----------+
          | file | EA | file | EA |                        | catalogue |
          | data |    | data |    |                        |           |
          +------+----+------+----+........................+-----------+

+---------+------------------------------------------------------------+------+
| archive | Compressed data                                            | term |
|         |                                                            |      |
+---------+------------------------------------------------------------+------+

Case 4: Plain archive

+---------+------+----+------+----+........................+-----------+------+
| archive | file | EA | file | EA |                        | catalogue | term |
|         |      |    |      |    |                        |           |      |
+---------+------+----+------+----+........................+-----------+------+

Compression Level

When compression is used in an archive, the algorithm is specified in the archive header. Compression may be selectively disabled per catalog file entry. Compression of EA entries stored in the data stream may not be disabled.

When data compression is disabled, storage_size must be set to zero. The original size must be stored as size.

Encryption Level

When weak encryption is used in an archive, the variable length key is matched starting at the first byte of the encryption layer. Decryption can be performed by generally applying the following equation at the encryption level:

archive_level[offset] - key[(offset - header_length) % key_length]
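
The equation can be applied byte by byte as in the sketch below. The function names are hypothetical. Two assumptions are made that the text does not state explicitly: the subtraction wraps modulo 256 so that each result remains a byte, and encryption is the inverse operation (adding the key byte). The archive header is copied unchanged because the diagrams above show it outside the encryption layer.

```python
def unscramble(archive_level: bytes, key: bytes, header_length: int) -> bytes:
    """Apply the weak decryption equation to every byte after the archive header."""
    if not key:
        raise ValueError("key must not be empty")
    out = bytearray(archive_level[:header_length])  # header is not encrypted
    for offset in range(header_length, len(archive_level)):
        k = key[(offset - header_length) % len(key)]
        # Assumption: byte arithmetic wraps modulo 256.
        out.append((archive_level[offset] - k) % 256)
    return bytes(out)


def scramble(plain: bytes, key: bytes, header_length: int) -> bytes:
    """Assumed inverse of unscramble: add the key byte instead of subtracting it."""
    if not key:
        raise ValueError("key must not be empty")
    out = bytearray(plain[:header_length])
    for offset in range(header_length, len(plain)):
        k = key[(offset - header_length) % len(key)]
        out.append((plain[offset] + k) % 256)
    return bytes(out)
```

Because the key index depends only on the offset, any byte range can be decrypted independently, which preserves random access to file data.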

Elastic Buffer

To protect against plain text attacks when encryption is used, a special buffer of random data is added at the beginning and end of the archive. The size of the buffer is randomly chosen when the archive is created. The buffer contains self-describing size information.

FIXME: Is the following transparent offset information true?

The encryption layer transparently handles the elastic buffers. The space occupied by the elastic buffers is beyond the archive level offset count.

          +----+--------------------------------------------------+------+----+
          | EE | Compressed data                                  | term | EE |
          |    |                                                  |      |    |
          +----+--------------------------------------------------+------+----+

+---------+-------------------------------------------------------------------+
| archive | Encrypted data                                                    |
| header  |                                                                   |
+---------+-------------------------------------------------------------------+
^         ^    ^                                                         ^
zero      A    A + 1                                            EOF offset

Elastic buffers have a minimum size of 1 byte. The format imposes no inherent maximum size; however, the maximum size is defined to be 4 GiB. In libdar, the buffer size is stored in a 32 bit integer.

The format of an elastic buffer can be demonstrated by four cases. In these examples the character '.' indicates a random byte and '0' indicates part of a variable length unsigned integer. This integer is stored in big endian form.

Case 1: 1 byte buffer

Size indicated by the special character 'X'

     X

Case 2: 2 byte buffer

Size indicated by special end characters '>' and '<'

     ><

Case 3: 3 byte buffer

Size indicated by 1 byte integer 0x3

     >0<

Case 4: 20 byte buffer

Size indicated by 1 byte integer 0x14

     .....>0<............

The size indicator structure is randomly placed in the buffer. Its position is chosen when the buffer is created. Each random byte is any value from 0x00 to 0xFF with the exception of the characters 'X', '>', and '<'.

When implementing an elastic buffer interpreter, care must be taken to allow for variable size integers. When writing a buffer, the integer should be, and sometimes must be, reduced to the smallest possible size.
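
The four cases can be produced by a single routine, sketched below as a literal reading of the description. The function name `build_elastic_buffer` is hypothetical and this is not libdar's code. The sketch always uses the smallest possible integer width. It does not address sizes whose integer bytes happen to equal one of the marker characters (for example 0x3C, which is '<'), since the text does not say how that case is handled.

```python
import random

MARKERS = b"X><"
# Random filler may be any byte value except the three marker characters.
FILLER = bytes(b for b in range(256) if b not in MARKERS)


def build_elastic_buffer(size: int, rng: random.Random) -> bytes:
    """Build an elastic buffer of exactly `size` bytes."""
    if size < 1:
        raise ValueError("elastic buffers have a minimum size of 1 byte")
    if size == 1:
        return b"X"
    if size == 2:
        return b"><"
    # Smallest number of bytes that can hold the size, stored big endian.
    width = (size.bit_length() + 7) // 8
    indicator = b">" + size.to_bytes(width, "big") + b"<"
    # Place the size indicator at a random position inside the buffer.
    lead = rng.randrange(size - len(indicator) + 1)
    tail = size - len(indicator) - lead
    head_fill = bytes(rng.choice(FILLER) for _ in range(lead))
    tail_fill = bytes(rng.choice(FILLER) for _ in range(tail))
    return head_fill + indicator + tail_fill
```

Passing the random generator as a parameter keeps the sketch testable; a real writer would need a cryptographically suitable source of randomness.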

EA Data Storage Format

Extended Attributes are stored as an array of entries. Each entry contains key and value strings with a flag indicating special attribute types.

While most attributes are stored in the data stream after the file data, some or all entries may be stored in "close" form, meaning they are stored directly in the catalog. This is mostly used for encapsulated alternate entries. Entries stored in the catalog are compared directly with the file system during each archive compare operation.

+--------+------------------+-------------+-----------------------+
| entry  |  key             | value_size  |  value                |
| flag   |                  |             |                       |
+--------+------------------+-------------+-----------------------+
entry_flag
Character bit-field indicating special attribute types.
key
Null-terminated string containing Extended Attribute key minus class string.
value_size
Infinint indicating size of Extended Attribute value.
value
String containing Extended Attribute value. The string is not null-terminated.

The valid entry_flag values are indicated below:

default value            0x00   When no flags are set.
encapsulated alternate   0x20   A special encapsulated attribute that stores alternate (non-POSIX) metadata that can be translated back when restoring to a non-POSIX filesystem or stored as a user class attribute if restoring to a POSIX filesystem.

The array of Extended Attribute entries is stored in the data section directly after the file data it concerns. The array is preceded by an infinint indicating the number of entries present.

+---------\....................................+
| size    /  EA entries                        |
|         \                                    |
+---------/....................................+

Encapsulated EA Values

Alternate non-POSIX metadata may be saved and restored through the same mechanism used to store POSIX Extended Attributes. These special metadata pieces may be normal metadata pieces or non-POSIX Extended Attributes. When an EA entry is saved as an encapsulated alternate entry, it is prefixed with a string that indicates what type of entry it is and how to translate it back into usable metadata. These values are mostly freeform. Standard values and how they should be used are enumerated below. All the attributes shown below should be stored in "close" form and compared with the filesystem during each archive compare operation.

Value type
   Indicator Key    General usage

FAT File Attributes
   FAT.ATTRIB:BYTE    Used to store the traditional six FAT file attributes. Value is the attribute byte of the CFileStatus struct returned from CFile::GetStatus. See Dos/FAT Metadata in this document.

FAT Creation Time
   FAT.TIME:C    Used to store the FAT ctime value (file creation time). Value is an infinint containing the time_t value returned from the CTime::GetTime method of the ctime member of the CFileStatus struct. See Dos/FAT Metadata in this document.

NTFS File Attributes
   NTFS.ATTRIB:COMPRESSED    Used to indicate that a file is compressed on an NTFS partition. When restored, the compression flag should be reset if possible.
   NTFS.ATTRIB:ENCRYPTED    Used to indicate that a file is encrypted on an NTFS partition. The encryption descriptor stream $EFS must also be saved and restored with the file. Implementations may use ReadEncryptedFileRaw and WriteEncryptedFileRaw to read and write raw encrypted data.
   NTFS.ATTRIB:NO_CONT_INDEX    Used to indicate that a file is set to not be indexed by the operating system.
   NTFS.ATTRIB:REPARSE_POINT    Used to indicate that a file includes a reparse point. The reparse data stream BACKUP_REPARSE_DATA must also be saved and restored with the file.
   NTFS.ATTRIB:SPARSE_FILE    Used to indicate that a file uses sparse allocation. The sparse block stream BACKUP_SPARSE_BLOCK must also be saved and restored with the file.

NTFS File Times
   NTFS.TIME:C    Used to store the NTFS creation time value. Value is an infinint containing the FILETIME value stored in ftCreationTime in the WIN32_FILE_ATTRIBUTE_DATA struct.
   NTFS.TIME:M    Used to store the NTFS modification time value. Value is an infinint containing the FILETIME value stored in ftLastWriteTime in the WIN32_FILE_ATTRIBUTE_DATA struct.
   NTFS.TIME:A    Used to store the NTFS access time value. Value is an infinint containing the FILETIME value stored in ftLastAccessTime in the WIN32_FILE_ATTRIBUTE_DATA struct.

NTFS Stream Attributes
   NTFS.STREAM:MOD_READ    Used to indicate that the stream data is modified by the operating system when read. This means that verification against on-disk data will fail, thus verification should be performed in an alternate manner.
   NTFS.STREAM:SECURITY    Used to indicate that the stream contains security data.

Catalogue

The catalog contains all inode, directory structure, and hard link information. The directory structure is stored in a simple way: the inode of a directory comes first, followed by the inodes of the entries it contains, then a special end of directory entry.

Consider the following tree:

- toto
   | titi
   | tutu
   | tata
   |  | blup
   |  +---
   | boum
   | coucou
   +---

The following sequence would be generated for the catalog:

+-------+------+------+------+------+-----+------+--------+-----+
| toto  | titi | tutu | tata | blup | EOD | boum | coucou | EOD |
|       |      |      |      |      |     |      |        |     |
+-------+------+------+------+------+-----+------+--------+-----+
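
The ordering shown above is a depth-first walk in which every directory is closed by an EOD marker. A minimal sketch, using a hypothetical nested-dictionary representation of the tree (a directory maps names to children, any other entry maps to None), reproduces the sequence for the example tree. The special root directory entry is left out.

```python
from typing import Dict, List, Optional


def catalog_sequence(tree: Dict[str, Optional[dict]]) -> List[str]:
    """Flatten a directory tree into catalog order, closing each directory with EOD."""
    sequence: List[str] = []
    for name, children in tree.items():
        sequence.append(name)  # the entry itself always comes first
        if children is not None:  # a directory: dump its contents, then EOD
            sequence.extend(catalog_sequence(children))
            sequence.append("EOD")
    return sequence


example = {
    "toto": {
        "titi": None,
        "tutu": None,
        "tata": {"blup": None},
        "boum": None,
        "coucou": None,
    }
}
```

Calling `catalog_sequence(example)` yields the entries in the same order as the diagram. An empty directory yields its own entry followed directly by EOD.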

The first catalog entry in a dar archive is a special "root" directory entry. This entry serves as a marker; its presence must be verified and it must not be restored by implementations. The root directory entry's values are shown below. See the format information for directory entries for more information.

ROOT_DIRECTORY
{
   char      signature   = 'd';
   char*     name_string = "root";
   char      EA_flag     = ea_none; // 0x00
   uint16_t  UID         = 0;
   uint16_t  GID         = 0;
   uint16_t  permissions = 0;
   infinint  atime       = 0;
   infinint  mtime       = 0;
}

Entry Types

The first byte of a catalog entry indicates the entry type and its status. An entry has a status of s_saved if full file data is saved in the archive. This is the normal situation. When an archive catalog is isolated from a given archive, none of the entries in the extracted catalog has a saved status.

Saved status is indicated by unsetting the most significant bit of the entry type byte. An entry does not have a saved status when this bit is set:

+-------------------------------+
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
+---+---+---+---+---+---+---+---+
| 0 |  ASCII character          |  saved status
| 1 |  ASCII character          |  unsaved status
+---+---+---+---+---+---+---+---+

Entry formats can be arranged in a hierarchical format. Higher level entry types are contained as the header of the types that derive from them. The highest level type is the entree, which contains the entry signature.

- entree
   | EOD
   | nomme
   |  | hard link label
   +- | deleted file marker
      | inode
      |  | symbolic link
      +- | device
         |  | character special device
         |  | block special device
         |  +---
         | directory
         | ignored directory
         | file
         |  | file label
         |  +---
         | socket
         | pipe
         +---

The entry signature, while part of the entree entry type, is defined by the entry that is stored in the catalog. There are 15 valid entry types. This includes the 7 standard POSIX file types and 8 catalog specific types.

Entry Type            Signature   Saved Status   Unsaved Status
regular file          f           0x66           0xE6
symbolic link         l           0x6C           0xEC
character device      c           0x63           0xE3
block device          b           0x62           0xE2
pipe                  p           0x70           0xF0
socket                s           0x73           0xF3
directory             d           0x64           0xE4
end of directory      z           0x7A           0xFA
auxiliary stream      a           0x61           0xE1
deleted file marker   x           0x78           0xF8
hard link label       h           0x68           0xE8
regular file label    e           0x65           0xE5
symbolic link label   m           0x6D           0xED
pipe label            q           0x71           0xF1
directory label       g           0x67           0xE7
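
The relationship between the two status columns is a single bit, so the signature byte can be split and rebuilt with simple masking. The helper names below are hypothetical.

```python
from typing import Tuple

UNSAVED_BIT = 0x80  # most significant bit: set when the entry is not saved


def decode_signature(byte: int) -> Tuple[str, bool]:
    """Return (entry type character, saved status) for a signature byte."""
    return chr(byte & 0x7F), not (byte & UNSAVED_BIT)


def encode_signature(entry_type: str, saved: bool) -> int:
    """Build the signature byte for an entry type character and a status."""
    code = ord(entry_type)
    if code > 0x7F:
        raise ValueError("entry type must be a 7 bit ASCII character")
    return code if saved else code | UNSAVED_BIT
```

For example, the byte 0xE6 decodes to a regular file ('f') without saved status, which is what an isolated catalog would contain.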

Entree Format

+-----+
| sig |
|     |
+-----+
signature
Signature character encoding entry type and status.

EOD Format

+-----+
| sig |
|     |
+-----+
signature
Signature is 'z'. Indicates end of directory entry.

Nomme Format

+-----+-------------+
| sig | name_string |
|     |             |
+-----+-------------+
signature
Signature is defined by the specific entry type.
name_string
Null terminated string containing the name of the object (path not included).

Deleted File Marker Format

+--------------+------+
| NOMME        | orig |
| header dump  | sig  |
+--------------+------+

Files deleted since a given reference archive are tracked with deleted file marker entries. The original catalog entry is removed and replaced with this type of entry. The signature of the original entry is appended.

Deleted directories are stored in the same way as deleted files, with a name header and the original signature set to d. As such, previous contents of the deleted directory are not recorded.

signature
Signature is 'x'.
original_signature
The signature of the original file before it was deleted.

Inode Format

+--------------+------+-----+-----+------+-------+--------\
| NOMME        | EA   | UID | GID | perm | atime | mtime  /
| header dump  | flag |     |     |      |       |        \
+--------------+------+-----+-----+------+-------+--------/

                   \............+.........+..........+....................+
                   / *ea_offset | *ea_crc | ea_ctime | **CLOSE EA data    |
                   \            |         |          |                    |
                   /............+.........+..........+....................+
signature
Signature is defined by the specific inode entry type.
EA flag
1 byte inode flag; values are ea_none = 0x00 when no Extended Attributes are stored, ea_partial = 0x01 when partial Extended Attribute information is stored, ea_full = 0x02 when Extended Attributes are fully stored, and ea_close = 0x04 when some Extended Attributes are stored within the catalog. Extended Attributes are "partially" stored when the catalog is isolated. In such cases, only the inode change time is stored, not the attribute data.
UID
16-bit word containing POSIX user identification number.
GID
16-bit word containing POSIX group identification number.
permissions
16-bit word containing POSIX mode and permissions. See man 2 chmod for more information.
atime
Infinint containing time of last access. See man 2 stat for more information.
mtime
Infinint containing time of last modification. See man 2 stat for more information.
ea_offset
Infinint containing the offset to Extended Attribute data. This field is present only when full Extended Attribute information is stored.
ea_crc
16-bit word containing CRC sum of Extended Attribute data. This field is present only when full Extended Attribute information is stored.
ea_ctime
Infinint containing the time of last Extended Attribute modification. On Linux, this is the same as the inode's ctime. On other systems this may be set to the file modification time. This field is always present when Extended Attributes are saved.
CLOSE EA data
A collection of Extended Attributes that are stored in the catalog. These are usually encapsulated attributes that need to be compared to the filesystem to determine if they have changed. For example, there is no ctime value that can hint that FAT and NTFS attributes have changed, so they have to be compared manually. The entries are stored in the same format as those stored in the data stream, i.e. an infinint indicating how many entries are present followed by that number of entries. These entries are always present when close Extended Attributes are saved.

Device Format

+---------------------\.......+.......+
| INODE               / major | minor |
| header dump         \       |       |
+---------------------/.......+.......+
signature
Signature is 'c' when saving a character special device file, 'b' for block special device files.
major
16-bit word containing major device number. This ensures compatibility when systems move to 16-bit major and minor device numbers. This field is not present when device data is not saved, which occurs when the catalog is isolated.
minor
16-bit word containing minor device number. This ensures compatibility when systems move to 16-bit major and minor device numbers. This field is not present when device data is not saved, which occurs when the catalog is isolated.

Directory Format

+---------------------\...................\-----+
| INODE               /   catalog         / EOD |
| header dump         \   entries         \     |
+---------------------/.................../-----+

Directories are stored as inode entries with a signature 'd'. The name of the directory, without path information, is stored in the nomme header. The directory contents are dumped sequentially after the directory entry, after which an end of directory entry must be inserted. For an empty directory an end of directory entry is inserted directly after the directory entry.

signature
Signature is 'd'.

Directory Label Format

+---------------------+-----------+----------\..............+.....\
| DIRECTORY           | etiquette | attached /  attachments | EOD /
| header dump         |           | entries  \              |     \
+---------------------+-----------+----------/..............+...../

                                        \...................\-----+
                                        /   catalog         / EOD |
                                        \   entries         \     |
                                        /.................../-----+
signature
Signature is 'g'.
etiquette
Infinint containing a number that is unique for each inode saved in the archive. This number is not the same as the filesystem inode number. Implementations must keep an inode table to match label numbers during save and restore operations.
attached_entries
1 byte attachment flag; values are attachments_none = 0x00 when no attachments are present (there will also be no terminal EOD entry), or attachments_present = 0x01 when attachments are present (there will also be a terminal EOD entry).

Ignored Directory Format

+---------------------+-----+
| INODE               | EOD |
| header dump         |     |
+---------------------+-----+

The inclusion of stub entries for pruned directories is not required for implementations and can be implemented as an optional parameter. If stubs are included, they are treated like empty directories.

signature
Signature is 'd'.

File Format

+---------------------+-------\.........+..............\------+
| INODE               | size  / offset  | storage_size /  crc |
| header dump         |       \         |              \      |
+---------------------+-------/.........+............../------+
signature
Signature is 'f'.
size
Infinint containing the original size of the file.
offset
Infinint containing the offset to the file data. This field is not present when the file data is not saved, which occurs when the catalog is isolated.
storage_size
Infinint containing the new size of the file saved in the archive. When no compression is used, this is set to zero, indicating that the storage size is equal to the original size. This field is not present when the file data is not saved, which occurs when the catalog is isolated.
crc
16-bit word containing CRC sum of the original file data.
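
The storage_size convention can be illustrated with a minimal C sketch. It assumes the infinint fields have already been decoded into 64-bit integers; the struct and function names are illustrative and not part of the format.

```c
#include <stdint.h>

/* Hypothetical in-memory view of a decoded File entry. */
struct file_entry {
    uint64_t size;          /* original size of the file */
    uint64_t storage_size;  /* 0 means "same as size" (no compression) */
};

/* Number of bytes the file data occupies in the archive. */
uint64_t file_stored_bytes(const struct file_entry *e)
{
    return e->storage_size == 0 ? e->size : e->storage_size;
}
```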

File Label Format

+--------------------------------------+-----------+----------\
| FILE                                 | etiquette | attached /
| header dump                          |           | entries  \
+--------------------------------------+-----------+----------/

                                        \..............+.....+
                                        /  attachments | EOD |
                                        \              |     |
                                        /..............+.....+

When more than one hard link for a given regular file is saved in an archive the first occurrence of the file will be appended with a label indicating an archive wide unique number. Later occurrences will be saved as a hard link label entry with the matching label number.

Labels are also used to save files that contain auxiliary streams. The main data stream is stored in the File entry, and the auxiliary streams are handled as "attachments". The attached_entries flag indicates whether attachments are present. The attachments are then dumped into the catalog as if they were files in a directory. The end of the attachments is indicated by an EOD entry. Each attachment also contains an etiquette field with the same identification number as the entry it is attached to.

signature
Signature is 'e'.
etiquette
Infinint containing a number that is unique for each inode saved in the archive. This number is not the same as the filesystem inode number. Implementations must keep an inode table to match label numbers during save and restore operations.
attached_entries
1 byte attachment flag; values are attachments_none = 0x00 when no attachments are present (there will also be no terminal EOD entry), or attachments_present = 0x01 when attachments are present (there will also be a terminal EOD entry).
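
The inode table mentioned for the etiquette field might look like the following C sketch for the save direction. It is only an illustration: the fixed capacity, the linear search, the starting label value, and all names are assumptions, and a real implementation would likely use a hash table.

```c
#include <stddef.h>
#include <stdint.h>

#define LABEL_TABLE_MAX 1024   /* arbitrary capacity for the sketch */

/* One row: a filesystem inode identity and the archive label given to it. */
struct label_row {
    uint64_t dev;        /* st_dev of the file */
    uint64_t ino;        /* st_ino of the file */
    uint64_t etiquette;  /* archive-wide unique label number */
};

struct label_table {
    struct label_row rows[LABEL_TABLE_MAX];
    size_t count;
    uint64_t next;       /* next unused etiquette value */
};

/*
 * Look up (dev, ino). Returns 1 and sets *etiquette if the inode was seen
 * before (write a hard link label entry), 0 if it was just added (write the
 * full label entry), or -1 if the table is full.
 */
int label_lookup_or_add(struct label_table *t, uint64_t dev, uint64_t ino,
                        uint64_t *etiquette)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->rows[i].dev == dev && t->rows[i].ino == ino) {
            *etiquette = t->rows[i].etiquette;
            return 1;
        }
    }
    if (t->count == LABEL_TABLE_MAX)
        return -1;
    t->rows[t->count].dev = dev;
    t->rows[t->count].ino = ino;
    t->rows[t->count].etiquette = t->next;
    *etiquette = t->next++;
    t->count++;
    return 0;
}
```

On restore the same idea applies in reverse: the table maps each etiquette value to the path of the first restored entry so that later hard link labels can be linked to it.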

Socket Format

+---------------------+
| INODE               |
| header dump         |
+---------------------+
signature
Signature is 's'.

Pipe (FIFO) Format

+---------------------+
| INODE               |
| header dump         |
+---------------------+
signature
Signature is 'p'.

Pipe (FIFO) Label Format

+---------------------+-----------+----------\..............+.....+
| INODE               | etiquette | attached /  attachments | EOD |
| header dump         |           | entries  \              |     |
+---------------------+-----------+----------/..............+.....+
signature
Signature is 'q'.
etiquette
Infinint containing a number that is unique for each inode saved in the archive. This number is not the same as the filesystem inode number. Implementations must keep an inode table to match label numbers during save and restore operations.
attached_entries
1 byte attachment flag; values are attachments_none = 0x00 when no attachments are present (there will also be no terminal EOD entry), or attachments_present = 0x01 when attachments are present (there will also be a terminal EOD entry).

Stream Format

+-------------------+------\............+.........+..........+..............\
| HARD LINK LABEL   | EA   / *ea_offset | *ea_crc | ea_ctime | **CLOSE EA   /
| header dump       | flag \            |         |          | data         \
+-------------------+------/............+.........+..........+............../

                                 \-------\.........+..............\------+
                                 / size  / offset  | storage_size /  crc |
                                 \       \         |              \      |
                                 /-------/.........+............../------+

When auxiliary streams are attached to a file, the file is stored with a label entry. Stream entries are created for each auxiliary stream and dumped as attachments to the label entry. An EOD entry will indicate the end of a list of attachments.

signature
Signature is 'a'.
name_string
Null terminated string containing the name of the stream.
etiquette
Infinint containing a number that is unique for each inode saved in an archive.
EA flag
1 byte inode flag; values are ea_none = 0x00 when no Extended Attributes are stored, ea_partial = 0x01 when partial Extended Attribute information is stored, ea_full = 0x02 when Extended Attributes are fully stored, and ea_close = 0x04 when some Extended Attributes are stored within the catalog. Extended Attributes are "partially" stored when the catalog is isolated. In such cases, only the inode change time is stored, not the attribute data.
ea_offset
Infinint containing the offset to Extended Attribute data. This field is present only when full Extended Attribute information is stored.
ea_crc
16-bit word containing CRC sum of Extended Attribute data. This field is present only when full Extended Attribute information is stored.
ea_ctime
Infinint containing the time of last Extended Attribute modification. On Linux, this is the same as the inode's ctime. On other systems this may be set to the file modification time. This field is always present when Extended Attributes are saved.
size
Infinint containing the original size of the file.
offset
Infinint containing the offset to the file data. This field is not present when the file data is not saved, which occurs when the catalog is isolated.
storage_size
Infinint containing the new size of the file saved in the archive. This field is not present when the file data is not saved, which occurs when the catalog is isolated.
crc
16-bit word containing CRC sum of the original file data.
CLOSE EA data
A collection of Extended Attributes that are stored in the catalog. These are usually encapsulated attributes that need to be compared to the filesystem to determine if they have changed. For example, there is no ctime value that can hint that FAT and NTFS attributes have changed, so they have to be compared manually. The entries are stored in the same format as those stored in the data stream, i.e. an infinint indicating how many entries are present followed by that number of entries. These entries are always present when close Extended Attributes are saved.
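
The rules for which Extended Attribute fields follow the EA flag can be sketched in C as shown here. The enum and struct names are illustrative. The sketch also assumes that ea_close is a bit that may be combined with ea_partial or ea_full; the value 0x04 suggests this, but the text does not state it outright.

```c
#include <stdint.h>

/* EA flag values as listed above. */
enum ea_flag {
    EA_NONE    = 0x00,
    EA_PARTIAL = 0x01,
    EA_FULL    = 0x02,
    EA_CLOSE   = 0x04
};

/* Which optional EA fields follow the flag byte. */
struct ea_fields {
    int has_offset;    /* ea_offset present */
    int has_crc;       /* ea_crc present */
    int has_ctime;     /* ea_ctime present */
    int has_close_ea;  /* CLOSE EA data present */
};

/*
 * Assumption: EA_CLOSE is a bit that may be OR-ed with EA_PARTIAL or
 * EA_FULL. The text gives the values but does not state this outright.
 */
struct ea_fields ea_fields_for_flag(uint8_t flag)
{
    struct ea_fields f;
    f.has_offset   = (flag & EA_FULL) != 0;   /* only with full EA data */
    f.has_crc      = (flag & EA_FULL) != 0;   /* only with full EA data */
    f.has_ctime    = flag != EA_NONE;         /* whenever EA are saved */
    f.has_close_ea = (flag & EA_CLOSE) != 0;  /* only with close EA */
    return f;
}
```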

Terminator

+-------------------+----------+----------+----------+
| infinint          | 0x00     | bitfield | 0xFF     |
| catalog position  | padding  | term cap | term cap |
+-------------------+----------+----------+----------+
  3.                  4.         2.         1.
 
(2)-------------------------->  <------------------(1)

The terminator stores the position of the beginning of the catalog. It is the last object written during archive creation. After processing the slice and archive headers the reader must seek to the end and read the terminator, then the catalog.

The terminator is read in two stages. First, the terminator cap is read backward from EOF up to and including the bitfield. The cap is used to determine the size of the infinint and to calculate its offset. The infinint stores the archive level offset of the catalog.

  1. Terminator cap stage

    1. N 0xFF bytes. Each byte adds 8 blocks to the offset, where the block size is 4 bytes.

    2. 1 bitfield indicating 0 to 7 additional blocks in excess of 8 block groups:

      +-------------------------------+
      | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
      +---+---+---+---+---+---+---+---+
      | 1 | - | - | - | - | - | - | - |  1 extra block
      | 1 | 1 | - | - | - | - | - | - |  2 extra blocks
      | 1 | 1 | 1 | - | - | - | - | - |  3 extra blocks ...
      |   |   |   |   |   |   |   |   |
      | 1 | 1 | 1 | 1 | 1 | 1 | 1 | - |  7 extra blocks
      +-------------------------------+
      

      From position 2. an additional seek equal to the determined size of the infinint will be performed. The infinint will then be read in the forward direction.

  2. Catalog position indicator stage

    1. Infinint containing the offset of the beginning of the catalog. Offset zero is the start of the archive level, the position after the end of the first slice header:

      +---------+-------------------------------+-----------+------+
      | archive |     data                      | catalogue | term |
      | header  |                               |           |      |
      +---------+-------------------------------+-----------+------+
                ^                               ^
                zero offset                     stored cat. offset
      
    2. N 0x00 bytes. Aligns end of infinint to 4 byte block boundary.
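
The terminator cap stage can be sketched in C as follows. This is a minimal illustration under stated assumptions: the tail of the archive is already in memory, the bitfield always has its set bits contiguous from bit 7, and the function name and return convention are hypothetical. It yields the distance to seek back from the bitfield before reading the infinint forward.

```c
#include <stddef.h>
#include <stdint.h>

#define TERM_BLOCK_SIZE 4u   /* block size in bytes, per the text above */

/*
 * Parse the terminator cap at the end of buf (the last len bytes of the
 * archive). On success returns 0 and stores:
 *   *cap_len  - bytes used by the cap (0xFF run plus the bitfield byte)
 *   *span_len - bytes occupied by the infinint and its 0x00 padding
 * Returns -1 if the cap is malformed or does not fit in buf.
 */
int terminator_cap(const uint8_t *buf, size_t len,
                   size_t *cap_len, size_t *span_len)
{
    size_t pos = len;
    size_t ff_bytes = 0;
    unsigned extra = 0;
    uint8_t bits;

    /* Stage 1.1: count 0xFF bytes backward from EOF. */
    while (pos > 0 && buf[pos - 1] == 0xFF) {
        ff_bytes++;
        pos--;
    }
    if (pos == 0)
        return -1;            /* no bitfield byte found */

    /* Stage 1.2: the bitfield holds 0 to 7 leading one bits. */
    bits = buf[pos - 1];
    while (bits & 0x80) {
        extra++;
        bits = (uint8_t)(bits << 1);
    }
    if (bits != 0)
        return -1;            /* set bits must be contiguous from bit 7 */

    *cap_len = ff_bytes + 1;
    *span_len = (ff_bytes * 8 + extra) * TERM_BLOCK_SIZE;
    return (*span_len <= pos - 1) ? 0 : -1;
}
```

A reader would then decode the infinint starting at offset len - cap_len - span_len within the buffer.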

File System Specific Features

The dar format is aimed towards archiving files stored on filesystems which store POSIX compatible metadata. Metadata, Extended Attributes, and many Access Control List implementations are directly recorded in the dar archive catalog. Metadata from other file systems must be stored in the mechanism used to store POSIX Extended Attributes.

POSIX Metadata

The traditional POSIX metadata set is fully supported as defined in POSIX.1. Device specific metadata which is not applicable to archival is not stored in the archive. Metadata value storage is detailed below.

struct stat {
    dev_t         st_dev;      /* Not stored in archive. */
    ino_t         st_ino;      /* Not explicitly stored, but each entry
                                * except a label is an individual inode.
                                */
    mode_t        st_mode;     /* Lower 16 bits (permissions) stored in
                                * Inode header. File type bits stored
                                * implicitly as entry type.
                                */
    nlink_t       st_nlink;    /* Stored implicitly as number of label
                                * entries.
                                */
    uid_t         st_uid;      /* Stored in Inode header. */
    gid_t         st_gid;      /* Stored in Inode header. */
    dev_t         st_rdev;     /* Stored in Device header. */
    off_t         st_size;     /* Stored in File entry. */
    time_t        st_atime;    /* Stored in Inode header. */
    time_t        st_mtime;    /* Stored in Inode header. */
    time_t        st_ctime;    /* Stored in Inode header only for entries with
                                * Extended Attribute information.
                                */
    blksize_t     st_blksize;  /* Not stored in archive. */
    blkcnt_t      st_blocks;   /* Not stored in archive. */    
};
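
Because the file type bits of st_mode are stored implicitly as the entry type, an implementation needs a mapping from mode to signature. The C sketch below covers only the entry types described in this section; the function name is illustrative, and other types (symbolic links, for example) are outside its scope and return 0.

```c
/* Request POSIX/XSI declarations such as S_ISSOCK on strict compilers. */
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/*
 * Map the file type bits of st_mode to the entry signature used in the
 * catalog. Only the types described in this section are covered; any
 * other type returns 0 here.
 */
char signature_for_mode(mode_t mode)
{
    if (S_ISREG(mode))  return 'f';   /* regular file */
    if (S_ISDIR(mode))  return 'd';   /* directory */
    if (S_ISCHR(mode))  return 'c';   /* character special device */
    if (S_ISBLK(mode))  return 'b';   /* block special device */
    if (S_ISFIFO(mode)) return 'p';   /* pipe (FIFO) */
    if (S_ISSOCK(mode)) return 's';   /* socket */
    return 0;
}
```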

Dos/FAT Metadata

FAT metadata that cannot be directly translated to POSIX metadata can be stored as encapsulated values in the Extended Attributes mechanism. Other values are stored in the appropriate POSIX metadata fields. For reference, the basic FAT file metadata is shown below, after which encapsulation notes are provided.

struct CFileStatus {
    CTime         m_ctime;
    CTime         m_mtime;
    CTime         m_atime;
    LONG          m_size;
    BYTE          m_attribute;
    TCHAR         m_szFullName[_MAX_PATH];
};

enum Attribute {
    normal     = 0x00,
    readOnly   = 0x01,
    hidden     = 0x02,
    system     = 0x04,
    volume     = 0x08,
    directory  = 0x10,
    archive    = 0x20
};

Dos metadata are translated and encapsulated in the following manner:

m_ctime Encapsulated in FAT.TIME:C. Stored as an infinint containing time_t value.
m_mtime Translated to mtime in Inode header. Stored as an infinint containing time_t value.
m_atime Translated to atime in Inode header. Stored as an infinint containing time_t value.
m_size Translated to size in File header. Stored as an infinint containing byte count.
m_attribute Encapsulated in FAT.ATTRIB:BYTE. Stored as a single byte field.

Win32/NTFS Metadata

Like FAT metadata, much of the NTFS metadata that cannot be translated directly to POSIX metadata can be stored as encapsulated values in the Extended Attributes mechanism. However, some metadata, like encrypted file information, must be stored by copying stream information from NTFS files. These pieces of data are stored in Stream entries. Other values are stored in the appropriate POSIX metadata fields. For reference, several classes of NTFS file metadata are shown below, after which encapsulation notes are provided.

As a special note, sparse files are not supported.

struct WIN32_FILE_ATTRIBUTE_DATA {
    DWORD dwFileAttributes;
    FILETIME ftCreationTime;
    FILETIME ftLastAccessTime;
    FILETIME ftLastWriteTime;
    DWORD nFileSizeHigh;
    DWORD nFileSizeLow;
}

enum dwFileAttributes {
    FILE_ATTRIBUTE_ARCHIVE;
    FILE_ATTRIBUTE_COMPRESSED;
    FILE_ATTRIBUTE_DIRECTORY;
    FILE_ATTRIBUTE_ENCRYPTED;
    FILE_ATTRIBUTE_HIDDEN;
    FILE_ATTRIBUTE_NORMAL;
    FILE_ATTRIBUTE_NOT_CONTENT_INDEXED;
    FILE_ATTRIBUTE_OFFLINE;
    FILE_ATTRIBUTE_READONLY;
    FILE_ATTRIBUTE_REPARSE_POINT;
    FILE_ATTRIBUTE_SPARSE_FILE;
    FILE_ATTRIBUTE_SYSTEM;
    FILE_ATTRIBUTE_TEMPORARY;
}

struct WIN32_STREAM_ID {
    DWORD dwStreamId;
    DWORD dwStreamAttributes;
    LARGE_INTEGER Size;
    DWORD dwStreamNameSize;
    WCHAR cStreamName[ANYSIZE_ARRAY];
}

enum dwStreamId {
    BACKUP_DATA;
    BACKUP_EA_DATA;
    BACKUP_SECURITY_DATA;
    BACKUP_ALTERNATE_DATA;
    BACKUP_LINK;
    BACKUP_PROPERTY_DATA;
    BACKUP_OBJECT_ID;
    BACKUP_REPARSE_DATA;
    BACKUP_SPARSE_BLOCK;
}

enum dwStreamAttributes {
    STREAM_MODIFIED_WHEN_READ;
    STREAM_CONTAINS_SECURITY;
}

NTFS metadata are translated and encapsulated in the following manner:

FILE_ATTRIBUTE_ARCHIVE Encapsulated in FAT.ATTRIB:BYTE. Set in archive bit of FAT attributes.
FILE_ATTRIBUTE_COMPRESSED Encapsulated in NTFS.ATTRIB:COMPRESSED. Stored as a single byte set to 0 (false) or 1 (true). File will be recompressed when restored.
FILE_ATTRIBUTE_DIRECTORY Encapsulated in FAT.ATTRIB:BYTE. Set in directory bit of FAT attributes.
FILE_ATTRIBUTE_ENCRYPTED Encapsulated in NTFS.ATTRIB:ENCRYPTED. Stored as a single byte set to 0 (false) or 1 (true). Implies the $EFS stream will also be saved and restored.
FILE_ATTRIBUTE_HIDDEN Encapsulated in FAT.ATTRIB:BYTE. Set in hidden bit of FAT attributes.
FILE_ATTRIBUTE_NORMAL Encapsulated in FAT.ATTRIB:BYTE. Set in normal bit of FAT attributes.
FILE_ATTRIBUTE_NOT_CONTENT_INDEXED Encapsulated in NTFS.ATTRIB:NO_CONT_INDEX. Stored as a single byte set to 0 (false) or 1 (true).
FILE_ATTRIBUTE_OFFLINE Not stored.  
FILE_ATTRIBUTE_READONLY Encapsulated in FAT.ATTRIB:BYTE. Set in readOnly bit of FAT attributes.
FILE_ATTRIBUTE_REPARSE_POINT Encapsulated in NTFS.ATTRIB:REPARSE_POINT. Stored as a single byte set to 0 (false) or 1 (true). Implies the BACKUP_REPARSE_DATA stream will also be saved and restored.
FILE_ATTRIBUTE_SPARSE_FILE Encapsulated in NTFS.ATTRIB:SPARSE_FILE. Stored as a single byte set to 0 (false) or 1 (true). Implies the BACKUP_SPARSE_BLOCK stream will also be saved and restored.
FILE_ATTRIBUTE_SYSTEM Encapsulated in FAT.ATTRIB:BYTE. Set in system bit of FAT attributes.
FILE_ATTRIBUTE_TEMPORARY Not stored.  
ftCreationTime Encapsulated in NTFS.TIME:C. Stored as an infinint containing FILETIME value.
ftLastAccessTime Encapsulated in NTFS.TIME:A and translated to atime in Inode header. Stored as an infinint containing FILETIME value.
ftLastWriteTime Encapsulated in NTFS.TIME:M and translated to mtime in Inode header. Stored as an infinint containing FILETIME value.
nFileSizeHigh, nFileSizeLow Translated to size in File header. Stored as an infinint containing byte count.
BACKUP_DATA Saved in data area. Stream saved in principal entry.
BACKUP_EA_DATA Saved in data area. Stream saved in stream entry named BACKUP_EA_DATA.
BACKUP_SECURITY_DATA Saved in data area. Stream saved in stream entry named BACKUP_SECURITY_DATA.
BACKUP_ALTERNATE_DATA Saved in data area. Each alternate data stream saved in stream entry with the name set to the value of cStreamName prepended by BACKUP_ALTERNATE_DATA:.
BACKUP_LINK Saved in data area. Stream saved in stream entry named BACKUP_LINK.
BACKUP_PROPERTY_DATA Saved in data area. Stream saved in stream entry named BACKUP_PROPERTY_DATA.
BACKUP_OBJECT_ID Saved in data area. Stream saved in stream entry named BACKUP_OBJECT_ID.
BACKUP_REPARSE_DATA Saved in data area. Stream saved in stream entry named BACKUP_REPARSE_DATA.
BACKUP_SPARSE_BLOCK Saved in data area. Stream saved in stream entry named BACKUP_SPARSE_BLOCK.
$EFS Saved in data area. Stream saved in stream entry named $EFS.
WIN32_STREAM_ID.Size Translated to size in Stream header. Stored as an infinint containing byte count.
cStreamName Translated to name_string in Stream header. Stored as null terminated string prepended by stream type. For example, an alternate data stream named "example" would be stored as BACKUP_ALTERNATE_DATA:example.
STREAM_MODIFIED_WHEN_READ Encapsulated in NTFS.STREAM:MOD_READ. Stored as a single byte set to 0 (false) or 1 (true).
STREAM_CONTAINS_SECURITY Encapsulated in NTFS.STREAM:SECURITY. Stored as a single byte set to 0 (false) or 1 (true).
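
Two of the translations above lend themselves to a short C sketch: folding Win32 attributes into the FAT.ATTRIB:BYTE value, and building the name_string of an alternate data stream. The function names are illustrative. The sketch assumes the Win32 attribute constants share their bit values with the FAT attribute byte, which is how winnt.h defines them but is not stated in this document, and that the stream name has already been converted to UTF-8.

```c
#include <stdint.h>
#include <stdio.h>

/* Bits kept in FAT.ATTRIB:BYTE: readOnly, hidden, system, directory, archive. */
#define FAT_ATTRIB_MASK 0x37u

/*
 * Assumption: the Win32 FILE_ATTRIBUTE_READONLY, _HIDDEN, _SYSTEM,
 * _DIRECTORY and _ARCHIVE constants use the same bit values as the FAT
 * attribute byte (0x01, 0x02, 0x04, 0x10, 0x20), as defined in winnt.h.
 * All other attribute bits are dropped, so a file with none of these
 * bits set maps to normal (0x00).
 */
uint8_t fat_attrib_byte(uint32_t dw_file_attributes)
{
    return (uint8_t)(dw_file_attributes & FAT_ATTRIB_MASK);
}

/*
 * Build the name_string for an alternate data stream: the stream type,
 * a colon, then the stream name. Returns 0 on success, -1 if out is too
 * small. The stream name is taken as already converted to UTF-8.
 */
int alternate_stream_name(char *out, size_t out_len, const char *stream_name)
{
    int n = snprintf(out, out_len, "BACKUP_ALTERNATE_DATA:%s", stream_name);
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```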

Extended Attributes

POSIX Extended Attributes are fully supported. POSIX Access Control Lists that are implemented using Extended Attributes are automatically saved when saving Extended Attributes.

Extended Attributes from the user and system classes are currently implemented in libdar. The trusted class, while supported, is not implemented.

In general, NTFS Extended Attributes are saved as an auxiliary stream typed BACKUP_EA_DATA. Some NTFS attributes can be translated to POSIX attributes; other attributes are encapsulated in the Extended Attributes mechanism.

For reference, some common NTFS Extended Attributes are Read, Write, eXecute, Append, ReadEa, WriteEa, ReadAttr, WriteAttr, Delete, ReadControl, WriteDac, takeOwnership, and Synchronize.

Access Control Lists

Not all POSIX Access Control Lists are necessarily supported. Only ACLs that are implemented using Extended Attributes may be saved.

On Linux, in some cases it is desirable to use the libacl library to save and restore Access Control Lists instead of directly copying the Extended Attributes. Implementations must decide what method to use. See acl/libacl.h for functions related to Access Control Lists on Linux.

NTFS Access Control Lists are saved as an auxiliary stream typed BACKUP_SECURITY_DATA.

Binary Format

When not otherwise specified, all binary information is stored as described below. Formats and encodings follow the appropriate standards with the exception of the variable length integer format infinint and the weak encryption algorithm scramble.

Endian Encoding

Multiple byte integer values are always written in network byte order (big-endian, most significant byte first). All values are unsigned unless otherwise indicated.
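
For the fixed-size 16-bit words used throughout the catalog, network byte order can be handled with two small helpers. This is a minimal C sketch; the function names are illustrative.

```c
#include <stdint.h>

/* Store a 16-bit value most significant byte first. */
void put_be16(uint8_t out[2], uint16_t value)
{
    out[0] = (uint8_t)(value >> 8);
    out[1] = (uint8_t)(value & 0xFF);
}

/* Read a 16-bit value stored most significant byte first. */
uint16_t get_be16(const uint8_t in[2])
{
    return (uint16_t)(((uint16_t)in[0] << 8) | in[1]);
}
```

Working byte by byte keeps the helpers independent of the host's own byte order.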

Time Format

All time values are Seconds Since the Epoch as defined by POSIX.1, encoded in variable length infinint form. UTC should be used whenever possible, but implementations should defer to the value returned by the time system call. See man 2 time for more information.

Character Encoding

All characters are stored in UTF-8. Strings are null-terminated arrays.

Infinint Format

+....+....+....+----+....+....+....+....+
| 00 | 00 | 00 | BB | XX | XX | XX | XX |
+....+....+....+----+....+....+....+....+
  1.             2.   3.
(1)-----------------| (2)---------------|

Many of the numbers stored in dar archives have specific limits. For example, because UID and GID are 16-bit words on most POSIX systems, those values are stored as 16-bit words in the archive. Other integer values are stored in infinint form, in which the integer data is placed after a length preamble.

  1. Length preamble

    1. N 0x00 bytes. One byte for each group of 8 blocks in payload where the blocksize is 4 bytes (one integer).

    2. 1 bitfield indicating the number of integers beyond the 8 integer group boundary. Indicates 0 to 7 additional integers. One bit must be set:

      +-------------------------------+
      | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
      +---+---+---+---+---+---+---+---+
      | 1 | - | - | - | - | - | - | - |  1 extra integer
      | - | 1 | - | - | - | - | - | - |  2 extra integers
      | - | - | 1 | - | - | - | - | - |  3 extra integers ...
      |   |   |   |   |   |   |   |   |
      | - | - | - | - | - | - | - | 1 |  0 extra integers
      +-------------------------------+
      
  2. Payload

    1. N bytes containing data. Stored in network byte order (MSB).
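
A decoder for this layout might look like the C sketch below. It follows the preamble table literally, including the last row (bit 0 read as 0 extra integers), and that row in particular should be verified against an actual implementation before relying on it. Limiting the result to 64 bits, rejecting an empty payload, and the function name are assumptions made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define INFININT_BLOCK 4u   /* bytes per integer, per the text above */

/*
 * Decode an infinint from buf into *value. Returns the number of bytes
 * consumed, or 0 on error (truncated input, malformed bitfield, empty
 * payload, or a value that does not fit in 64 bits).
 */
size_t infinint_decode(const uint8_t *buf, size_t len, uint64_t *value)
{
    size_t pos = 0, groups = 0, extra, payload;
    unsigned bit = 0;
    uint64_t v = 0;
    uint8_t bits;

    /* Preamble part 1: each 0x00 byte stands for 8 integers. */
    while (pos < len && buf[pos] == 0x00) {
        groups++;
        pos++;
    }
    if (pos == len || groups > len / (8 * INFININT_BLOCK))
        return 0;

    /* Preamble part 2: exactly one bit set. Bit 7 means 1 extra integer,
     * bit 6 means 2, ... bit 1 means 7; bit 0 is read as 0 extra integers,
     * following the last table row literally. */
    bits = buf[pos++];
    if ((bits & (bits - 1)) != 0)
        return 0;
    while ((bits >> bit) != 1)
        bit++;
    extra = (8 - bit) % 8;

    payload = (groups * 8 + extra) * INFININT_BLOCK;
    if (payload == 0 || payload > len - pos)
        return 0;

    /* Payload: big-endian bytes, most significant first. */
    for (size_t i = 0; i < payload; i++) {
        if (v >> 56)
            return 0;         /* next shift would drop significant bits */
        v = (v << 8) | buf[pos + i];
    }
    *value = v;
    return pos + payload;
}
```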

Scrambling Encryption

Scrambling is a weak block cipher algorithm that uses a passphrase to form a variable length key. The bytes of the UTF-8 passphrase are used directly as the symmetric key, which is repeated cyclically so that each byte of the plain text is matched with one byte of the key. The cipher text is formed by adding the matched key byte to the plain text byte, modulo 256: (plain + key) % 256.

For example, take "example" as a passphrase and "source plain text data" as the plain text. A 56-bit symmetric key is formed directly from the passphrase and the plain text is divided into four blocks:

blocks: -- 1 --------------- | -- 2 --------------- | -- 3 --------------- | --
data:   73 6F 75 72 63 65 20   70 6C 61 69 6E 20 74   65 78 74 20 64 61 74   61
key:    65 78 61 6D 70 6C 65   65 78 61 6D 70 6C 65   65 78 61 6D 70 6C 65   65

result: D8 E7 D6 DF D3 D1 85   D5 E4 C2 D6 DE 8C D9   CA F0 D5 8D D4 CD D9   C6

Decryption is performed by subtracting the key from the cipher text in the same manner.
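
The scramble operation and its inverse can be sketched in C as follows. The function names are illustrative, and the sketch treats the input as one continuous stream; where the key position restarts within a real archive is not covered here.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the cyclically repeated key to the data, modulo 256, in place. */
void scramble(uint8_t *data, size_t len, const uint8_t *key, size_t key_len)
{
    if (key_len == 0)
        return;               /* an empty passphrase leaves data unchanged */
    for (size_t i = 0; i < len; i++)
        data[i] = (uint8_t)(data[i] + key[i % key_len]);
}

/* Subtract the cyclically repeated key, modulo 256, in place. */
void unscramble(uint8_t *data, size_t len, const uint8_t *key, size_t key_len)
{
    if (key_len == 0)
        return;
    for (size_t i = 0; i < len; i++)
        data[i] = (uint8_t)(data[i] - key[i % key_len]);
}
```

Applied to the plain text "source plain text data" with the passphrase "example", scramble produces the result row of the table above, and unscramble restores the original bytes.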

This scramble algorithm was implemented before support for strong encryption was available. Scramble is not secure and its use is not recommended.

Standard Encryption Algorithms

The dar format includes definitions for blowfish strong encryption. libcrypto may be used to encrypt archive data.

While data is technically encrypted using CBC (Cipher Block Chaining) mode, the chain is not allowed to propagate across block boundaries. Instead, the chain is reinitialized with a new IV at each block. The IV is generated from the block number, which starts at zero at the beginning of the encrypted data stream. The block number is generally calculated by the following equation:

(offset - header_length) / block_size

When using blowfish encryption, the IV is generally calculated by the following algorithm:

infinint upper = ref >> 32;
U_32 high = 0, low = 0;

high = upper % (U_32)(0xFFFF); // for bytes (high weight)
low = ref % (U_32)(0xFFFF); // for bytes (lowest weight)

ivec[0] = low % 8;
ivec[1] = (low >> 8) % 8;
ivec[2] = (low >> 16) % 8;
ivec[3] = (low >> 24) % 8;
ivec[4] = high % 8;
ivec[5] = (high >> 8) % 8;
ivec[6] = (high >> 16) % 8;
ivec[7] = (high >> 24) % 8;
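
For block numbers that fit in 64 bits, the block number equation and the IV steps above can be written in plain C without the infinint type. This is a sketch that mirrors the listed steps one for one; the function names are illustrative, and the caller is assumed to pass offset >= header_length and a nonzero block_size.

```c
#include <stdint.h>

/* Block number of the encryption block that contains the given offset. */
uint64_t crypto_block_number(uint64_t offset, uint64_t header_length,
                             uint64_t block_size)
{
    return (offset - header_length) / block_size;
}

/* Derive the 8-byte IV from a block number, mirroring the steps above. */
void blowfish_make_ivec(uint64_t ref, uint8_t ivec[8])
{
    uint32_t high = (uint32_t)((ref >> 32) % 0xFFFFu);  /* from upper 32 bits */
    uint32_t low  = (uint32_t)(ref % 0xFFFFu);          /* from whole value */

    for (int i = 0; i < 4; i++) {
        ivec[i]     = (uint8_t)((low  >> (8 * i)) % 8);
        ivec[i + 4] = (uint8_t)((high >> (8 * i)) % 8);
    }
}
```

Because every byte is reduced modulo 8 and both inputs are first reduced modulo 0xFFFF, only the first two bytes of each half can be nonzero, and each of those takes a value from 0 to 7.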

Standard Compression Algorithms

The dar format includes definitions for gzip and bzip2 compression. zlib and libbzip2 may be used to compress archive data.