THIS IS A DRAFT DOCUMENT

What's the difference between DAR, dar, and Dar?

While I make no warranty as to grammatical consistency, "DAR" indicates the DAR application suite and "dar" indicates the dar file format. "Dar" has no special meaning. It is most often the same as "dar" (at the beginning of a sentence, for example) but may also be a lazy representation of "DAR" where the context makes that clear.

What is DAR?

DAR stands for Disk Archive and is the name of a suite of command-line applications by Denis Corbin, used to back up directory trees and files. dar, the main archival tool, is supported by dar_xform, dar_slave, dar_manager, and dar_cp. The DAR Suite uses libdar, a C++ library, to read and write the dar file format. For more information about DAR, go to http://dar.linux.free.fr.

What is Libdar?

Libdar is the library behind the DAR Suite. Full API documentation can be found here and a tutorial here. The library is written in C++. It handles a fairly complete set of dar archive operations and is the standard implementation of the dar file format. Libdar can be compiled in several configurations, with or without many of the optional capabilities of the archive format.

Libdar has no required dependencies (!) but needs some or all of the following to implement optional features:

zlib for gzip compression
libbzip2 for bzip2 compression
libcrypto for blowfish encryption

The DAR suite also uses the following applications for other optional features:

upx for compressed executables
doxygen for documentation generation from source

The above two applications are not required to use DAR and are not used by Libdar.

Libdar can be compiled to support Extended Attributes if they are available on the system.

What is dar?

The dar format is an archival format with internal support for compression, encryption, and disk spanning. Further, the format provides explicit support for differential backup and management. The archives support all Posix file metadata including file type, hard linking, standard permissions and file times, and extended attributes. However, at this time dar archives do not support DOS or NTFS file attributes.

Dar archives are created with a catalog at the end which can be extracted individually and used for reference when creating differential backups.

The archives are organized for non-linear access and cannot be read in a purely consecutive manner.

While error detection is included, dar archives do not contain embedded recovery information. Parchive must be used for this purpose.

The Libdar library provides many other features, such as flat restoration and nodump flag recognition, that are not implicit in the dar format.

For Unix systems, the dar file format can be used as a replacement and enhancement of formats such as tar and cpio. At this time other formats are more suitable for non-Posix operating systems, but cross-platform support is always being improved.

The dar format specification has changed over time. The specification is available for several versions.

The kaylix series is a fork format derived from the standard dar format specification. The changes in the kaylix series are aimed towards eventual inclusion into the standard dar format. The main benefits of the kaylix series currently consist of several cross-platform portability enhancements.

A detailed comparison of dar and several common file formats is provided below. Dar format version 4 is used as the comparison point.

At this time, this chart is a draft document. Please report any omissions or mistakes as a bug against the website.

dar

tar [1]

rar

zip

ace

arj

jar [2]

cpio

rpm

archive
limitations

file entries [3]

unlimited

unlimited [4]

unlimited ??

64 KiB

16 Ei

unlimited

64 Ki

unlimited [4]

unlimited

entry size

unlimited [5]

8 GiB (unlimited)

4 GiB

16 EiB

4 GiB

(4 GiB)

4 GiB

(4 GiB)

entry name length

unlimited

99 characters (unlimited)

64 KiB

16 Ei characters (32 EiB)

64 KiB

unlimited

(64 KiB)

64 KiB

(64 KiB)

compression

yes

no [6]

yes

formats

gzip, bzip2

?? LZ77 [7]

deflate, others [8]

LZMA, deflate, bzip2, others

LZ77

(LZ77)

gzip

selective compression

yes??

encryption

yes

yes [9]

yes

formats

weak [10], blowfish

AES-128

varies

AES-256

modes

ECB

CBC

authentication

yes

disk spanning

yes

yes [11]

yes

implicit archive modification [12]

add

yes

(yes)

differential
backup support [13]

added, changed

yes

deleted marker

yes

archive file
access

true consecutive [14]

yes

non-consecutive

yes

special files

hard-links

yes

extension [15]

yes

posix file types [16]

yes

extension

yes

metadata

time
attributes [17]

posix

posix [18]

archive [19], msdos

archive, msdos, posix extension

msdos [20]

archive

posix, msdos [21]

posix [18]

posix

posix user data [22]

yes

extension

yes

posix permissions [23]

yes

dos
attributes [24]

yes

extended attributes [25]

posix

os/2

ntfs, vms, os/2

ntfs

java

archive comments

yes

comment per entry

yes

multi-fork/stream data [26]

time storage
format [27]

unix [28]

unix [29]

ms-dos, win32

ms-dos, unix, win32

win32

ms-dos

string [30]

unix [29]

unix

character encoding

UTF-8

UTF-16 [31]

UTF-8

endian encoding

network

little

archive
corruption

detection

yes

recovery [32]

yes [33]

yes

catalog
isolation

yes [34]

yes [35]

catalog position

end

none

end

none

solid file
archives

n/a

yes

n/a

archive
"lock" [36]

n/a

yes

n/a

specification

link

text

link

[1] The tar format is worth special note in that it is a series of file headers and file contents, making true consecutive access possible but random access impossible. Older tar formats have significant length restrictions. The newer GNU extensions remove many of these restrictions. Tar v7 values appear in plain text, Posix and GNU extension limits in parentheses.

[2] Jar archives consist of zip archives with a special directory called META-INF containing extra metadata. Values in parentheses indicate archive format features and limitations inherited from the zip file format.

[3] Most archives achieve unlimited entry capacity by arranging entries as a list terminated by an end of catalog marker.

[4] Neither tar nor cpio has a centralized catalog. An end of file marker serves as a catalog end marker.

[5] To eliminate size limits, dar archives use a variable-sized integer format, the infinint, which carries its own size information in the binary format. infinint values are written in 32-bit blocks in network byte order. The size determiner is located at the beginning of the dump. For 32- to 256-bit values, the overhead is one additional byte.
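The idea in this note can be illustrated with a small Python sketch. It is not taken from libdar. The header layout used here (a run of zero bytes followed by one byte with a single set bit, together counting the number of 32-bit blocks) is an assumption chosen only to agree with the description above, and the function names are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a size header plus 32-bit blocks."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    # Number of 32-bit blocks needed (at least one, so that 0 is encodable).
    blocks = max(1, (value.bit_length() + 31) // 32)
    # ASSUMED header layout: each leading zero byte counts 8 blocks; the
    # final header byte has one bit set, and its position counts the rest.
    zero_bytes, position = divmod(blocks - 1, 8)
    header = bytes(zero_bytes) + bytes([0x80 >> position])
    # Payload is written in network (big-endian) byte order.
    return header + value.to_bytes(blocks * 4, "big")


def decode_varint(data: bytes) -> int:
    """Inverse of encode_varint; expects exactly one encoded value."""
    zero_bytes = 0
    while data[zero_bytes] == 0:
        zero_bytes += 1
    marker = data[zero_bytes]
    position = 8 - marker.bit_length()  # 0 for 0x80, 7 for 0x01
    blocks = zero_bytes * 8 + position + 1
    start = zero_bytes + 1
    return int.from_bytes(data[start:start + blocks * 4], "big")
```

Under this assumed layout, any value that fits in one to eight 32-bit blocks (32 to 256 bits) costs a single header byte, and larger values simply grow the header instead of hitting a fixed limit.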

[6] Tar archives are traditionally compressed or encrypted by external means. Implementations using tar generally incorporate this feature.

[7] Rar uses a slight variant of the LZ77 algorithm.

[8] deflate is the standard compression format but several non-standard and semi-standard extensions exist.

[9] Encryption extensions are largely non-standard across zip implementations. At present, most implementations are incompatible. There are known plaintext attacks on several algorithms.

[10] Dar implements a weak algorithm called scramble. This is a simple ECB algorithm that mods the data against a passphrase, which is used directly as the key.
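To show why such a scheme is weak, here is a minimal Python sketch of one plausible reading of this note: each data byte is added to the corresponding passphrase byte modulo 256, with the passphrase repeated as needed. This is an assumption for illustration, the real scramble implementation may differ, and the function names are hypothetical.

```python
from itertools import cycle


def scramble(data: bytes, passphrase: bytes) -> bytes:
    """Add each passphrase byte to the data, modulo 256 (assumed scheme)."""
    if not passphrase:
        raise ValueError("passphrase must not be empty")
    return bytes((b + k) % 256 for b, k in zip(data, cycle(passphrase)))


def unscramble(data: bytes, passphrase: bytes) -> bytes:
    """Reverse scramble() by subtracting the passphrase bytes, modulo 256."""
    if not passphrase:
        raise ValueError("passphrase must not be empty")
    return bytes((b - k) % 256 for b, k in zip(data, cycle(passphrase)))
```

With a scheme like this, a run of zero bytes in the input exposes the passphrase directly in the output, and identical input blocks aligned to the passphrase length produce identical output blocks.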

[11] Each zip span must have the same filename, meaning multiple slices cannot be placed in the same directory.

[12] Here, "implicit archive modification" should be taken to mean the file format provides an inherently easy way to implement archive modifications. Archive formats are designed such that they can generally be modified in the indicated ways without having to rewrite the entire archive. This is, however, a subjective metric as archive modification is inevitably based on library implementation.

[13] Archives that "support differential archives" contain in-archive methods of tracking file changes.

[14] If the archive format is designed to be read without tracking backwards, it supports "true consecutive" file access. These archive formats can be read through a pipe without additional software (such as the dar_slave socket provider).

[15] The zip format stores dos-style data internally but stores unix metadata in a special extension. Hard links, posix file types, and posix time and user data are stored in this extension.

[16] Here, "posix file types" refers to the seven primary file types: file, directory, character special, block special, pipe (fifo), socket, and symlink.

[17] Time attributes are posix (atime, mtime, and ctime, although ctime is not saved, as that is impractical), msdos (time and date fields for modified, created, and accessed), string (usually similar to ISO 8601), or archive (meaning the time the file is added to the archive). Within each attribute category, not all attributes are stored by all formats. Modification time is the most commonly stored attribute.

[18] For tar and cpio formats only mtime is stored.

[19] Differential backup capability, while not specifically provided for in the rar format, is implemented by comparing the archive addition time with the file modification time.

[20] Modified field is stored for all entries; created, modified, and accessed fields are stored for files.

[21] File creation, modification, and last access time are stored.

[22] User and group id in numeric format.

[23] Posix file "mode" stored in 16-bit word.

[24] Basic dos file attributes: Archive, hidden, compressed, read-only, system, encrypted (A H C R S E).

[25] Support for extended attributes can consist of storage of specific values or generic name value pairs. This is often used to implement access control lists. Some common ntfs values are Read, Write, eXecute, Append, ReadEa, WriteEa, ReadAttr, WriteAttr, Delete, ReadControl, WriteDac, takeOwnership, and Synchronize.

[26] Some file systems support multiple data "streams" or "forks". ntfs indicates support for ntfs streams, hfs indicates support for hfs forks (used heavily in older Macintosh platforms).

[27] Here, "time storage format" means the binary format used for encoding time values. Specific formats have specific limitations. 32-bit unix time stores seconds since the epoch (1970-01-01T00:00:00Z) and will overflow in 2038, but unix time can be represented in any integer format. Expansion to a 64-bit integer alleviates the overflow problem for the foreseeable future. 32-bit msdos time (16-bit date/16-bit time) stores time in bits 0-15 and date in bits 16-31; its 7-bit year field counts from 1980 and will overflow after 2107. Because it stores date components separately, msdos time cannot be implicitly extended. 64-bit win32 file time format stores the number of 100 nanosecond intervals since 1601-01-01T00:00:00Z and will not expire for the foreseeable future.
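The three binary formats in this note can be compared with a short Python sketch. The function names are hypothetical, and the bit layout inside the msdos date and time words is the conventional FAT layout, which the note itself does not spell out.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
WIN32_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def last_unix32_moment() -> datetime:
    """Latest moment representable as a signed 32-bit count of seconds."""
    return UNIX_EPOCH + timedelta(seconds=2**31 - 1)


def to_win32_filetime(moment: datetime) -> int:
    """Number of 100-nanosecond intervals since 1601-01-01T00:00:00Z."""
    delta = moment - WIN32_EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 10_000_000 + delta.microseconds * 10


def to_msdos_time(moment: datetime) -> int:
    """Pack into 32 bits: time word in bits 0-15, date word in bits 16-31."""
    # Time word: hours (5 bits), minutes (6 bits), seconds / 2 (5 bits).
    time_word = (moment.hour << 11) | (moment.minute << 5) | (moment.second // 2)
    # Date word: year offset from 1980 (7 bits), month (4 bits), day (5 bits).
    date_word = ((moment.year - 1980) << 9) | (moment.month << 5) | moment.day
    return (date_word << 16) | time_word
```

For example, `last_unix32_moment()` falls on 2038-01-19, and `to_win32_filetime(UNIX_EPOCH)` gives the offset between the two epochs in 100 nanosecond units. The msdos packing shows why that format cannot simply be widened: each date component has its own fixed-width field.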

[28] As dar stores unix time in infinint format, it will not overflow. The overhead will always be one additional byte.

[29] In the tar and cpio formats, unix time is stored as a 32-bit signed integer.

[30] String date format is yyyy-mm-dd HH:MM:SS.NNNNNNN.

[31] While using UTF-16 encoding would normally waste space, the 7z format uses compressed headers, thus alleviating this potential problem.

[32] File parity can generally be accomplished by Parchive. Archive internal parity has not historically been very good.

[33] Rar parity is somewhat buggy.

[34] Dar catalogs can be extracted into a new archive containing only the catalog. These "catalog archives" can be used as a reference archive when performing a differential backup.
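How a reference catalog can drive a differential backup is sketched below in Python. This is a simplified illustration only: a real dar catalog records far more metadata, the dictionary-based catalog and all names here are hypothetical, and the rule used to decide that a file has changed (size or modification time differs) is an assumption.

```python
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    """Hypothetical catalog record for one file."""
    size: int
    mtime: int


class Diff(NamedTuple):
    added: List[str]
    changed: List[str]
    deleted: List[str]


def diff_against_catalog(reference: Dict[str, Entry],
                         current: Dict[str, Entry]) -> Diff:
    """Compare the current file listing with a reference catalog."""
    added = sorted(p for p in current if p not in reference)
    changed = sorted(p for p in current
                     if p in reference and current[p] != reference[p])
    # Deleted files would be recorded as markers rather than stored data.
    deleted = sorted(p for p in reference if p not in current)
    return Diff(added, changed, deleted)
```

Only the reference catalog is needed for the comparison, not the data of the earlier archive, which is why a small "catalog archive" is enough as a reference.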

[35] Jar catalogs can be extracted into a text file. This is essentially a copy of the text file stored in the META-INF directory within the archive.

[36] Here, archive "lock" refers to an artificial modification lock that can be placed on an archive using one or more internal flags. This setting is often inapplicable if an archive is not designed to be modifiable.