Author: Wesley Leggette
Revision: $Revision: 1.1 $
Copyright: GNU Free Documentation License
This specification is a pre-release draft. Any corrections should be directed to <wleggette@kaylix.net>.
Dar was designed by Denis Corbin as part of the Dar Archive application (DAR). All materials related to DAR are copyright Denis Corbin and are available under the terms of the GNU General Public License.
Dar is implemented by the Dar Library (libdar), which is available only under the terms of the GNU General Public License.
The dar format is a POSIX-oriented archival format meant as a full-featured replacement for tar, cpio, or dump when used to back up file systems. The format has explicit support for compression, encryption, disk spanning, and random file access. In addition, the dar format supports an extractable catalog that can be used to create and examine differential backups. Dar archives have full support for traditional POSIX metadata and POSIX Extended Attributes.
Dar archives contain cyclic redundancy check information but cannot recover corrupted files (non-corrupted files can often be restored from corrupted archives). Parity files must be created separately with a tool like Parchive. Also, dar archives cannot be implicitly modified once created. Further changes are generally recorded by creating new differential archives using an older archive or catalog as a reference.
|-- first slice size ----------------------------|
+-------------+---------+------------------------+
| slice + ext | archive | file data + EA         |
| header      | header  |                        |
+-------------+---------+------------------------+
              ^                                  ^
              zero offset at                     offset (A)
              archive level

|-- slice size ---------------------------------------|
+--------+--------------------------------------------+
| slice  | file data + EA                             |
| header |                                            |
+--------+--------------------------------------------+
         ^
         offset (A)

+--------+--------------------------------------------+
| slice  | file data + EA                             |
| header |                                            |
+--------+--------------------------------------------+

|-- final slice size -----------------------------|
+--------+---------------------+-----------+------+
| slice  | file data + EA      | catalogue | term |
| header |                     |           |      |
+--------+---------------------+-----------+------+
                               ^
                               catalog offset
Dar archives are written sequentially, with one exception: during writing, the last flag byte is initially set to FLAG_TERMINAL and is reset to FLAG_NON_TERMINAL on transition to a new slice.
When reading, implementations should read the first slice header, the archive header, then the terminator on the last slice and finally the catalog. The implementation may then proceed with the desired operation. Besides the mandatory access of the first and last slices, other slices may be requested only as needed.
All offsets stored at the archive level are relative to a zero offset located at the end of the first slice header, immediately before the archive header.
The abstracted archive level offset is resolved using several values. FIRST_SLICE_SIZE is queried from the file descriptor of the first slice. SLICE_SIZE is set to the value stored in the slice header's size extension (if present) or else set equal to FIRST_SLICE_SIZE. LAST_SLICE_SIZE and LAST_SLICE_NUMBER are queried from the file descriptor and the file name of the last slice file, respectively. FIRST_SLICE_OFFSET is set to the current position after reading the first slice header. SLICE_OFFSET is set equal to SLICE_HEADER_SIZE (the size of a slice header without extensions, currently 16 bytes). With these values, any archive level offset can be translated to a real file offset.
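As a sketch, the translation from an archive level offset to a slice number and an offset within that slice file could look like the following. The function and parameter names are illustrative, not part of the format; the sketch assumes FIRST_SLICE_OFFSET is the position after the first slice header (including any extension) and that SLICE_SIZE applies to all slices after the first.

```c
#include <assert.h>
#include <stdint.h>

#define SLICE_HEADER_SIZE 16 /* slice header without extensions */

/* Hypothetical helper: translate an archive level offset into a
 * 1-based slice number and an offset within that slice file. */
static void archive_to_file_offset(uint64_t archive_offset,
                                   uint64_t first_slice_size,
                                   uint64_t first_slice_offset,
                                   uint64_t slice_size,
                                   uint64_t *slice_num,
                                   uint64_t *file_offset)
{
    /* payload capacity of the first slice and of later slices */
    uint64_t first_payload = first_slice_size - first_slice_offset;
    uint64_t payload = slice_size - SLICE_HEADER_SIZE;

    if (archive_offset < first_payload) {
        *slice_num = 1;
        *file_offset = first_slice_offset + archive_offset;
    } else {
        uint64_t rest = archive_offset - first_payload;
        *slice_num = 2 + rest / payload;
        *file_offset = SLICE_HEADER_SIZE + rest % payload;
    }
}
```

For example, with a 1000 byte first slice whose header (with extension) occupies 20 bytes, archive offset 980 falls at the first data byte of the second slice, just past its 16 byte header.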
+--------+-------------------------------------------+
| header | data                                      |
|        |                                           |
+--------+-------------------------------------------+
Dar archives are placed in a sort of logical envelope called a slice, allowing the archive to be spanned across one or more files. Each slice consists of a slice header followed by data.
+-------+----------+------+-----------+................+
| magic | internal | last | extension | extension      |
| num.  | name     | flag | flag      | data           |
+-------+----------+------+-----------+................+
The slice header, located at the beginning of every slice file, identifies the file as a dar archive, matches slices together, and indicates other basic file information.
Field | Size | Values |
---|---|---|
magic number | 4 byte field, MSB order | SAUV_MAGIC_NUMBER = 123 0x{ 00 00 00 7B } |
internal name | 10 byte field | In libdar, the internal name is generated from the system time appended with the process id. |
last flag | 1 byte field | FLAG_NON_TERMINAL = 'N' 0x{ 4E }, FLAG_TERMINAL = 'T' 0x{ 54 } |
extension flag | 1 byte field | EXTENSION_NO = 'N' 0x{ 4E }, EXTENSION_SIZE = 'S' 0x{ 53 } |
extension data | variable size field | Extension specified data. |
+-----------+----------------+
| extension | extension      |
| flag      | data           |
+-----------+----------------+
Currently the only valid slice level header extension is the EXTENSION_SIZE header, which indicates that the first slice has a different size than the other slices. The size of the first slice is determined from the file itself, and the size of the other slices is indicated in the extension data. The size of the last slice is likewise determined from that slice file itself.
EXTENSION_SIZE {
    char flag = 'S';
    infinint size; // size of following slices
}
+---------+-----------------------------------------+-----------+------+
| archive | data                                    | catalogue | term |
| header  |                                         |           |      |
+---------+-----------------------------------------+-----------+------+
An archive may be split into multiple slices. The slices are handled exclusively at the slice level. The archive level is accessed by logically concatenating the slice level payloads. The zero offset starts after the first slice header and continues, jumping over each slice header.
+---------+-------+---------------+------+
| format  | comp. | command line  | flag |
| version | algo  |               |      |
+---------+-------+---------------+------+
Field | Size | Values |
---|---|---|
format version | variable length field, null terminated string | SUPPORTED_VERSION = "05" 0x{ 30 35 00 } |
compress algorithm | 1 byte field | none = 'n' 0x{ 6E }, zip = 'p' 0x{ 70 }, gzip = 'z' 0x{ 7A }, bzip2 = 'y' 0x{ 79 } |
command line | variable length field, null terminated string | default = "N/A" 0x{ 4E 2F 41 00 } |
flag | 1 byte field | SAVED_EA_ROOT = 0x80, SAVED_EA_USER = 0x40, SCRAMBLED = 0x20 |
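As an illustration, the flag byte can be tested bit by bit. The helper names below are invented for this sketch; only the three bit values come from the format.

```c
#include <assert.h>

/* Flag bit values from the archive header, as defined by this spec. */
#define SAVED_EA_ROOT 0x80
#define SAVED_EA_USER 0x40
#define SCRAMBLED     0x20

/* Hypothetical helpers: interpret the archive header flag byte. */
static int has_root_ea(unsigned char flag)  { return (flag & SAVED_EA_ROOT) != 0; }
static int has_user_ea(unsigned char flag)  { return (flag & SAVED_EA_USER) != 0; }
static int is_scrambled(unsigned char flag) { return (flag & SCRAMBLED) != 0; }
```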
....--+---------------------+----+------------+-----------+----+---....
      | file data           | EA | file data  | file data | EA |
      | (may be compressed) |    | (no EA)    |           |    |
....--+---------------------+----+------------+-----------+----+---....
Roughly speaking, data is stored as a stream of file contents followed by Extended Attributes if present. Offsets to file data and Extended Attributes are stored in the catalog at the end of the archive. They are zeroed at the beginning of the archive header.
Encryption and compression are implemented as "layers" on top of the archive level. File data and Extended Attributes are written to the topmost layer. Data is not stored in a solid format, meaning that the topmost layer is flushed after each file and set of Extended Attributes is written.
Not all layers are required in a given archive. The possible layers, in order, are compression, encryption, and archive level (below which the slice layer is located). The following set of diagrams details several possible configurations. Note that they are not drawn to scale (e.g. raw data is often larger than the compressed output).
FIXME: Are elastic buffers added to the end of each file data and EA section by the encryption layer?
When encryption is used, "elastic buffers" are added to the encryption layer before and after the data stream. These are indicated by EE in the following diagrams.
Case 1: Encrypted, compressed archive
+------+----+------+----+..............+-----------+
| file | EA | file | EA |              | catalogue |
| data |    | data |    |              |           |
+------+----+------+----+..............+-----------+

+----+--------------------------------------------------+------+----+
| EE | Compressed data                                   | term | EE |
|    |                                                   |      |    |
+----+--------------------------------------------------+------+----+

+---------+-------------------------------------------------------------------+
| archive | Encrypted data                                                    |
| header  |                                                                   |
+---------+-------------------------------------------------------------------+
Case 2: Encrypted archive
+----+------+----+------+----+..............+-----------+------+----+
| EE | file | EA | file | EA |              | catalogue | term | EE |
|    | data |    | data |    |              |           |      |    |
+----+------+----+------+----+..............+-----------+------+----+

+---------+-------------------------------------------------------------------+
| archive | Encrypted data                                                    |
| header  |                                                                   |
+---------+-------------------------------------------------------------------+
Case 3: Compressed archive
+------+----+------+----+........................+-----------+
| file | EA | file | EA |                        | catalogue |
| data |    | data |    |                        |           |
+------+----+------+----+........................+-----------+

+---------+------------------------------------------------------------+------+
| archive | Compressed data                                             | term |
| header  |                                                             |      |
+---------+------------------------------------------------------------+------+
Case 4: Plain archive
+---------+------+----+------+----+........................+-----------+------+
| archive | file | EA | file | EA |                        | catalogue | term |
| header  | data |    | data |    |                        |           |      |
+---------+------+----+------+----+........................+-----------+------+
When compression is used in an archive, the algorithm is specified in the archive header. Compression may be selectively disabled per catalog file entry. Compression of EA entries stored in the data stream cannot be disabled.
When data compression is disabled, storage_size must be set to zero. The original size must be stored as size.
When weak encryption is used in an archive, the variable length key is matched starting at the first byte of the encryption layer. Decryption can generally be performed by applying the following equation at the encryption level:
archive_level[offset] - key[(offset - header_length) % key_length]
To protect against plain text attacks when encryption is used, a special buffer of random data is added at the beginning and end of the archive. The size of the buffer is randomly chosen when the archive is created. The buffer contains self-describing size information.
FIXME: Is the following transparent offset information true?
The encryption layer transparently handles the elastic buffers. The space occupied by the elastic buffers is beyond the archive level offset count.
+----+--------------------------------------------------+------+----+
| EE | Compressed data                                   | term | EE |
|    |                                                   |      |    |
+----+--------------------------------------------------+------+----+

+---------+-------------------------------------------------------------------+
| archive | Encrypted data                                                    |
| header  |                                                                   |
+---------+-------------------------------------------------------------------+
          ^                                                    ^      ^       ^
          zero offset                                          A      A + 1   EOF
Elastic buffers have a minimum size of 1 byte. There is no inherent maximum size; however, the maximum size is defined to be 4 GiB, and in libdar the buffer size is stored in a 32 bit integer.
The format of an elastic buffer can be demonstrated by four cases. In these examples the character '.' indicates a random byte and '0' indicates part of a variable length unsigned integer. This integer is stored in big endian form.
Case 1: 1 byte buffer
Size indicated by the special character 'X':

X
Case 2: 2 byte buffer
Size indicated by special end characters '>' and '<':

><
Case 3: 3 byte buffer
Size indicated by 1 byte integer 0x3:

>0<
Case 4: 20 byte buffer
Size indicated by 1 byte integer 0x14:

.....>0<............
The size indicator structure is randomly placed in the buffer. Its position is chosen when the buffer is created. Each random byte is any value from 0x00 to 0xFF with the exception of the characters 'X', '>', and '<'.
When implementing an elastic buffer interpreter, care must be taken to allow for variable size integers. When writing a buffer, the integer should be, and in some cases must be, reduced to the smallest possible size.
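A size decoder for the four cases above might be sketched as follows. This is illustrative only: it assumes a well-formed buffer (random bytes never contain 'X', '>', or '<', as the format guarantees) and does minimal validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: recover the declared size of an elastic buffer from its bytes.
 * Returns 0 if no valid size indicator is found. */
static uint64_t elastic_size(const unsigned char *buf, size_t len)
{
    if (len == 1)
        return buf[0] == 'X' ? 1 : 0;   /* case 1: 1 byte buffer */

    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] != '>')
            continue;
        for (size_t j = i + 1; j < len; j++) {
            if (buf[j] != '<')
                continue;
            if (j == i + 1)             /* case 2: "><" means size 2 */
                return 2;
            /* cases 3 and 4: big endian integer between '>' and '<' */
            uint64_t size = 0;
            for (size_t k = i + 1; k < j; k++)
                size = (size << 8) | buf[k];
            return size;
        }
    }
    return 0;
}
```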
Extended Attributes are stored as an array of entries. Each entry contains key and value strings.
The Extended Attribute key is copied directly from the filesystem, meaning the domain is included within the key string. For example, an attribute from the user domain would look like user.key.
+------------------+-------------+-----------------------+
| key              | value_size  | value                 |
|                  |             |                       |
+------------------+-------------+-----------------------+
The array of Extended Attribute entries is stored in the data section directly after the file data it concerns. The array is prepended by an infinint indicating the number of entries present.
+---------\....................................+
| size    /              EA entries            |
|         \                                    |
+---------/....................................+
The catalog contains all inode, directory structure, and hard link information. The directory structure is stored in a simple way: the inode of a directory comes first, followed by the inodes of the files it contains, and then a special end of directory entry.
Consider the following tree:
- toto
   | titi
   | tutu
   | tata
   |   | blup
   |   +---
   | boum
   | coucou
   +---
The following sequence would be generated for the catalog:
+------+------+------+------+------+-----+------+--------+-----+
| toto | titi | tutu | tata | blup | EOD | boum | coucou | EOD |
|      |      |      |      |      |     |      |        |     |
+------+------+------+------+------+-----+------+--------+-----+
The first catalog entry in a dar archive is a special "root" directory entry. This entry serves as a marker; its presence must be verified and it must not be restored by implementations. The root directory entry's values are shown below. See the format information for directory entries for more information.
ROOT_DIRECTORY {
    char signature = 'd';
    char* name_string = "root";
    char EA_flag = ea_none; // 0x00
    uint16_t UID = 0;
    uint16_t GID = 0;
    uint16_t permissions = 0;
    infinint atime = 0;
    infinint mtime = 0;
}
The first byte of catalog entries indicates the entry type and its status. An entry has a status of s_saved if full file data is saved in the archive. This is the normal situation. When an archive catalog is isolated from a given archive, none of the entries in the extracted catalog have saved status.
Saved status is indicated by unsetting the most significant bit of the entry type byte. An entry does not have a saved status when this bit is set:
+-------------------------------+
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
+---+---+---+---+---+---+---+---+
| 0 | ASCII character           |  saved status
| 1 | ASCII character           |  unsaved status
+---+---+---+---+---+---+---+---+
Entry formats can be arranged in a hierarchical fashion. Higher level entry types are contained as the header of deriving types. The highest level type is the entree, which contains the entry signature.
- entree
    | EOD
    | nomme
    |   | hard link label
    |   | deleted file marker
    |   | inode
    |   |   | symbolic link
    |   |   | device
    |   |   |   | character special device
    |   |   |   | block special device
    |   |   |   +---
    |   |   | directory
    |   |   | ignored directory
    |   |   | file
    |   |   |   | file label
    |   |   |   +---
    |   |   | socket
    |   |   | pipe
    |   |   +---
    |   +---
    +---
The entry signature, while part of the entree entry type, is defined by the entry that is stored in the catalog. There are 11 valid entry types. This includes the 7 standard POSIX file types and 4 catalog specific types.
Entry Type | Signature | Saved Status | Unsaved Status |
---|---|---|---|
regular file | f | 0x66 | 0xE6 |
symbolic link | l | 0x6C | 0xEC |
character device | c | 0x63 | 0xE3 |
block device | b | 0x62 | 0xE2 |
pipe | p | 0x70 | 0xF0 |
socket | s | 0x73 | 0xF3 |
directory | d | 0x64 | 0xE4 |
end of directory | z | 0x7A | 0xFA |
deleted file marker | x | 0x78 | 0xF8 |
hard link label | h | 0x68 | 0xE8 |
regular file label | e | 0x65 | 0xE5 |
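The two status columns differ only in the most significant bit, which a reader can split off directly. A minimal sketch (the helper names are invented here):

```c
#include <assert.h>

/* Sketch: split a catalog entry's first byte into its type character
 * and saved status (most significant bit clear means saved). */
static char entry_type(unsigned char sig)  { return (char)(sig & 0x7F); }
static int  entry_saved(unsigned char sig) { return (sig & 0x80) == 0; }
```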
+-----+
| sig |
|     |
+-----+
+-----+
| sig |
|     |
+-----+
+-----+-------------+
| sig | name_string |
|     |             |
+-----+-------------+
+--------------+-----------+
| NOMME        | etiquette |
| header dump  |           |
+--------------+-----------+
When more than one hard link for a given file is saved in an archive the first occurrence of the file will be saved as a special "label" entry matching the file type. For example, regular files with multiple hard links will first be saved as a file label entry. Later occurrences, regardless of the file type, will be saved as a hard link label entry with a matching label number.
Currently, regular files are the sole file type for which multiple hard links can be saved.
+--------------+------+
| NOMME        | orig |
| header dump  | sig  |
+--------------+------+
Files deleted since a given reference archive are tracked with deleted file marker entries. The original catalog entry is removed and replaced with this type of entry. The signature of the original entry is appended.
Deleted directories are stored in the same way as deleted files, with a name header and the original signature set to 'd'. As such, the previous contents of the deleted directory are not recorded.
+--------------+------+-----+-----+------+-------+--------\
| NOMME        | EA   | UID | GID | perm | atime | mtime  /
| header dump  | flag |     |     |      |       |        \
+--------------+------+-----+-----+------+-------+--------/

\.............+.........+..........+
/ *ea_offset  | *ea_crc | ea_ctime |
\             |         |          |
/.............+.........+..........+
+---------------------+----------------------------------------+
| INODE               | target_string                          |
| header dump         |                                        |
+---------------------+----------------------------------------+
+---------------------\.......+.......+
| INODE               / major | minor |
| header dump         \       |       |
+---------------------/.......+.......+
+---------------------\...................\-----+
| INODE               / catalog           / EOD |
| header dump         \ entries           \     |
+---------------------/.................../-----+
Directories are stored as inode entries with a signature 'd'. The name of the directory, without path information, is stored in the nomme header. The directory contents are dumped sequentially after the directory entry, after which an end of directory entry must be inserted. For an empty directory an end of directory entry is inserted directly after the directory entry.
+---------------------+-----+
| INODE               | EOD |
| header dump         |     |
+---------------------+-----+
The inclusion of stub entries for pruned directories is not required for implementations and can be implemented as an optional parameter. If stubs are included, they are treated like empty directories.
+---------------------+-------\.........+..............\------+
| INODE               | size  / offset  | storage_size  / crc |
| header dump         |       \         |               \     |
+---------------------+-------/.........+............../------+
+--------------------------------------+-----------+
| FILE                                 | etiquette |
| header dump                          |           |
+--------------------------------------+-----------+
When more than one hard link for a given regular file is saved in an archive the first occurrence of the file will be appended with a label indicating an archive wide unique number. Later occurrences will be saved as a hard link label entry with the matching label number.
+---------------------+
| INODE               |
| header dump         |
+---------------------+
+---------------------+
| INODE               |
| header dump         |
+---------------------+
+-------------------+----------+----------+----------+
| infinint          | 0x00     | bitfield | 0xFF     |
| catalog position  | padding  | term cap | term cap |
+-------------------+----------+----------+----------+
        3.              4.         2.          1.
(2)-------------------------->
                              <----------------------(1)
The terminator stores the position of the beginning of the catalog. It is the last object written during archive creation. After processing the slice and archive headers the reader must seek to the end and read the terminator, then the catalog.
The terminator is read in two stages. First, the terminator cap is read starting from EOF to the bitfield. The cap is used to determine the size and calculate the offset of the infinint. The infinint stores the archive level offset of the catalog.
Terminator cap stage
N 0xFF bytes. Each byte indicates 8 blocks added to the offset, where the block size is 4 bytes.
1 bitfield indicating 0 to 7 additional blocks in excess of 8 block groups:
+-------------------------------+
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
+---+---+---+---+---+---+---+---+
| 1 | - | - | - | - | - | - | - |  1 extra block
| 1 | 1 | - | - | - | - | - | - |  2 extra blocks
| 1 | 1 | 1 | - | - | - | - | - |  3 extra blocks
...
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | - |  7 extra blocks
+-------------------------------+

From position 2. an additional seek equal to the determined size of the infinint is performed. The infinint is then read in the forward direction.
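Assuming each set bit in the cap bitfield counts one block in excess of the 8 block groups, as the rows above suggest, the size of the catalog position infinint can be computed from the cap like this (sketch; the function name is illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: compute the size in bytes of the catalog position infinint
 * from the terminator cap. n_ff is the count of 0xFF bytes read back
 * from EOF; bitfield is the byte immediately before them. Assumes each
 * set bit in the bitfield marks one extra 4 byte block. */
static size_t infinint_size_from_cap(size_t n_ff, unsigned char bitfield)
{
    size_t extra = 0;
    for (unsigned char b = bitfield; b; b >>= 1)
        extra += b & 1;                 /* popcount of the bitfield */
    return 4 * (8 * n_ff + extra);      /* block size is 4 bytes */
}
```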
Catalog position indicator stage
Infinint containing the offset of the beginning of the catalog. Offset zero is the start of the archive level, the position after the end of the first slice header:
+---------+-------------------------------+-----------+------+
| archive | data                          | catalogue | term |
| header  |                               |           |      |
+---------+-------------------------------+-----------+------+
^                                         ^
zero offset                               stored catalog offset

N 0x00 bytes. Aligns the end of the infinint to a 4 byte block boundary.
The dar format is aimed towards archiving files stored on filesystems which store POSIX compatible metadata. Metadata, Extended Attributes, and many Access Control List implementations are directly recorded in the dar archive catalog. Metadata from other file systems must be stored in the mechanism used to store POSIX Extended Attributes.
The traditional POSIX metadata set is fully supported as defined in POSIX.1. Device specific metadata which is not applicable to archival is not stored in the archive. Metadata value storage is detailed below.
struct stat {
    dev_t     st_dev;     /* Not stored in archive. */
    ino_t     st_ino;     /* Not explicitly stored, but each entry
                           * except labels is an individual inode. */
    mode_t    st_mode;    /* Lower 16 bits (permissions) stored in
                           * Inode header. File type bits stored
                           * implicitly as entry type. */
    nlink_t   st_nlink;   /* Stored implicitly as number of label
                           * entries. */
    uid_t     st_uid;     /* Stored in Inode header. */
    gid_t     st_gid;     /* Stored in Inode header. */
    dev_t     st_rdev;    /* Stored in Device header. */
    off_t     st_size;    /* Stored in File entry. */
    time_t    st_atime;   /* Stored in Inode header. */
    time_t    st_mtime;   /* Stored in Inode header. */
    time_t    st_ctime;   /* Stored in Inode header only for entries with
                           * Extended Attribute information. */
    blksize_t st_blksize; /* Not stored in archive. */
    blkcnt_t  st_blocks;  /* Not stored in archive. */
};
DOS metadata is not explicitly supported in the dar format. DOS attributes may be encapsulated as Extended Attributes. For reference, the basic DOS file metadata is shown below.
struct CFileStatus {
    CTime m_ctime;
    CTime m_mtime;
    CTime m_atime;
    LONG  m_size;
    BYTE  m_attribute;
    TCHAR m_szFullName[_MAX_PATH];
};

enum Attribute {
    normal    = 0x00,
    readOnly  = 0x01,
    hidden    = 0x02,
    system    = 0x04,
    volume    = 0x08,
    directory = 0x10,
    archive   = 0x20,
};
POSIX Extended Attributes are fully supported. POSIX Access Control Lists that are implemented using Extended Attributes are automatically saved when saving Extended Attributes.
Extended Attributes from the user and system classes are currently supported. The trusted class is not supported.
NTFS Extended Attributes are not explicitly supported but may be stored using the same mechanism used to store POSIX Extended Attributes. NTFS EAs have to be classified into the user or system class to be stored in the Extended Attribute storage mechanism.
For reference, some common NTFS Extended Attributes are Read, Write, eXecute, Append, ReadEa, WriteEa, ReadAttr, WriteAttr, Delete, ReadControl, WriteDac, takeOwnership, and Synchronize.
Not all POSIX Access Control Lists are supported. Only ACLs that are implemented using Extended Attributes may be saved.
On Linux, in some cases it is desirable to use the libacl library to save and restore Access Control Lists instead of directly copying the Extended Attributes. Implementations must decide what method to use. See acl/libacl.h for functions related to Access Control Lists on Linux.
NTFS Access Control Lists are not explicitly supported, but may also be stored using the same mechanism used to store Extended Attributes.
When not otherwise specified, all binary information is stored as described below. Formats and encodings follow the appropriate standards with the exception of the variable length integer format infinint and the weak encryption algorithm scramble.
Multiple byte integer values are always written in network byte order (MSB). All values are unsigned unless otherwise indicated.
All time values use the definition of Seconds Since the Epoch as per POSIX.1. Values are encoded in variable length infinint form. UTC should be used whenever possible but the value returned from the time system call should be deferred to. See man 2 time for more information.
All characters are stored in UTF-8. Strings are null-terminated arrays.
+....+....+....+----+....+....+....+....+
| 00 | 00 | 00 | BB | XX | XX | XX | XX |
+....+....+....+----+....+....+....+....+
  1.             2.   3.
(1)----------------|
                   (2)------------------|
Many of the numbers stored in dar archives have specific limits. For example, because UID and GID are 16-bit words on most POSIX systems, those values are stored that way in the archive. For other values, integers are stored in infinint form: the integer data is placed after a length preamble.
Length preamble
N 0x00 bytes. One byte for each group of 8 blocks in the payload, where the block size is 4 bytes (one integer).
1 bitfield indicating the number of integers beyond the 8 integer group boundary. Indicates 0 to 7 additional integers. One bit must be set:
+-------------------------------+
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
+---+---+---+---+---+---+---+---+
| 1 | - | - | - | - | - | - | - |  1 extra integer
| - | 1 | - | - | - | - | - | - |  2 extra integers
| - | - | 1 | - | - | - | - | - |  3 extra integers
...
| - | - | - | - | - | - | - | 1 |  0 extra integers
+-------------------------------+
Payload
- N bytes containing data. Stored in network byte order (MSB).
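Under one reading of the rules above (each 0x00 preamble byte adds a group of 8 blocks, the single set bit adds 0 to 7 blocks per the table, block size 4 bytes), a decoder for values small enough to fit in 64 bits might look like this sketch; the function name and error conventions are invented here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: decode an infinint that fits in 64 bits. Returns the number
 * of bytes consumed, or 0 on error. Assumes bit 7 set = 1 extra block,
 * ..., bit 0 set = 0 extra blocks, matching the table above. */
static size_t infinint_read(const unsigned char *buf, size_t len,
                            uint64_t *value)
{
    size_t groups = 0, pos = 0;

    while (pos < len && buf[pos] == 0x00) {   /* preamble 0x00 bytes */
        groups++;
        pos++;
    }
    if (pos >= len)
        return 0;

    unsigned char bf = buf[pos++];
    size_t extra = 0;
    int found = 0;
    for (int p = 7; p >= 0; p--) {
        if (bf == (unsigned char)(1u << p)) { /* exactly one bit set */
            extra = (size_t)((8 - p) % 8);
            found = 1;
            break;
        }
    }
    if (!found)
        return 0;

    size_t payload = 4 * (8 * groups + extra);
    if (pos + payload > len)
        return 0;

    *value = 0;
    for (size_t i = 0; i < payload; i++)      /* network byte order */
        *value = (*value << 8) | buf[pos + i];
    return pos + payload;
}
```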
Scrambling is a weak block cipher algorithm that uses a passphrase to form a variable length key. The UTF-8 passphrase is translated directly into the symmetric key, which is repeated cyclically so that each byte of the plain text is matched with a byte of the key. The cipher text is formed by adding the matched key byte to each plain text byte ((plain + key) mod 256).
For example, take "example" as a passphrase and "source plain text data" as the plain text. A 56-bit symmetric key is formed directly from the passphrase and the plain text is divided into four blocks:
blocks: -- 1 --------------- | -- 2 --------------- | -- 3 --------------- | --
data:   73 6F 75 72 63 65 20   70 6C 61 69 6E 20 74   65 78 74 20 64 61 74   61
key:    65 78 61 6D 70 6C 65   65 78 61 6D 70 6C 65   65 78 61 6D 70 6C 65   65
result: D8 E7 D6 DF D3 D1 85   D5 E4 C2 D6 DE 8C D9   CA F0 D5 8D D4 CD D9   C6
Decryption is performed by subtracting the key from the cipher text in the same manner.
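A minimal sketch of the cipher, assuming the key simply repeats cyclically over the data (the function names are invented here):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of the scramble cipher: each output byte is
 * (plain + key) mod 256, with the key repeated cyclically. */
static void scramble(const unsigned char *key, size_t key_len,
                     const unsigned char *in, unsigned char *out,
                     size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = (unsigned char)(in[i] + key[i % key_len]);
}

/* Decryption subtracts the same cyclic key. */
static void unscramble(const unsigned char *key, size_t key_len,
                       const unsigned char *in, unsigned char *out,
                       size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = (unsigned char)(in[i] - key[i % key_len]);
}
```

Running this with the passphrase "example" over "source plain text data" reproduces the result row of the table above.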
This scramble algorithm was implemented before support for strong encryption was available. Scramble is not secure and its use is not recommended.
The dar format includes definitions for blowfish strong encryption. libcrypto may be used to encrypt archive data.
While data is technically encrypted using CBC (Cipher Block Chaining) mode, the chain is not allowed to propagate beyond the first block. Instead, the chain is reinitialized with a new IV at each block. The IV is generated from the block number, which is zeroed at the beginning of the encryption data stream. The block number is generally calculated by the following equation:
(offset - header_length) / block_size
When using blowfish encryption, the IV is generally calculated by the following algorithm:
infinint upper = ref >> 32;
U_32 high = 0, low = 0;

high = upper % (U_32)(0xFFFF); // for bytes (high weight)
low = ref % (U_32)(0xFFFF);    // for bytes (lowest weight)

ivec[0] = low % 8;
ivec[1] = (low >> 8) % 8;
ivec[2] = (low >> 16) % 8;
ivec[3] = (low >> 24) % 8;
ivec[4] = high % 8;
ivec[5] = (high >> 8) % 8;
ivec[6] = (high >> 16) % 8;
ivec[7] = (high >> 24) % 8;
The dar format includes definitions for gzip and bzip2 compression. zlib and libbzip2 may be used to compress archive data.