Manifest File Formats

Binary Format

The binary format produced is compatible with Scooter Software’s Beyond Compare program. The specification reproduced below was published by Scooter Software in the forums on their website.

-------------------------------------------------------------------------------
Beyond Compare Snapshot Format                                      Version 1.1
                                                               January 25, 2011
Copyright (c) 2001-2017 Scooter Software, Inc.
This document may be redistributed.
For newer versions contact support@scootersoftware.com
-------------------------------------------------------------------------------

Beyond Compare snapshots (.bcss) are binary files containing the file metadata
(names, sizes, last modified times) of a directory structure without storing
any of the file content.  They are designed to be read sequentially.  File
record sizes are variable, so there's no way to seek to arbitrary records
without reading all of the records before it.


==========
Data Types
==========

All integer values are stored in little-endian format.
 
UByte:
    Unsigned 8-bit value

UInt16:
    Unsigned 16-bit value

Int32:
    Signed 32-bit value

UInt32:
    Unsigned 32-bit value

Int64:
    Signed 64-bit value

char[]:
    Variable length single-byte character array (ANSI or UTF-8).

FileTime:
    Windows FILETIME structure.  64-bit value representing the number of
100-nanosecond intervals since January 1, 1601 UTC.  Stored in local time.
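
As a minimal sketch of the FileTime definition above, the tick count can be mapped to and from a Python datetime.  The function names are hypothetical and not part of the format.  The result is kept as a naive datetime (no time zone attached) because the value is described as stored in local time.

```python
from datetime import datetime, timedelta

# FILETIME counts 100-nanosecond ticks from this date.
FILETIME_EPOCH = datetime(1601, 1, 1)


def filetime_to_datetime(ticks):
    """Convert a FILETIME tick count to a naive datetime (hypothetical helper)."""
    # datetime only resolves to microseconds, so the last decimal digit
    # of the tick count is dropped by the integer division.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def datetime_to_filetime(moment):
    """Convert a naive datetime back to a FILETIME tick count."""
    delta = moment - FILETIME_EPOCH
    whole_seconds = delta.days * 86400 + delta.seconds
    return whole_seconds * 10_000_000 + delta.microseconds * 10
```

Integer arithmetic is used on purpose: going through floating-point seconds would lose precision at this magnitude.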

ShortString:
    Variable length single-byte character string (ANSI or UTF-8).   Not null
terminated.

    Length   : UByte
    Data     : char[Length]

FileExString:
    Variable length single-byte character string (UTF-8).  See "File Extended
Header" section for details.


===========
File Header
===========

Snapshots start with a fixed size header that contains an ID value, version
information, a creation date, and various flags, optionally followed by the
source folder's path:

 - HEADER STRUCTURE -
    [0..3]   = 'BCSS'
    [4]      = Major version (UByte) 
    [5]      = Minor version (UByte)
    [6]      = Minimum Supported Major Version (UByte)
    [7]      = Minimum Supported Minor Version (UByte)
    [8..F]   = Creation Time (FileTime)
    [10..11] = Flags         (UWord)

            Bit : Meaning
              0 : Compressed
              1 : Source Path included
              2 : Reserved
              3 : UTF-8
           4-15 : Reserved

    [12..13] = Path Length (UWord)   | Optional
    [14..N]  = Path        (char[])  |

Version Information:
    The first two version bytes represent the actual major and minor versions
of the file, and reference a specific version of this specification.  The
second pair of version bytes represent the minimum snapshot version which must
be supported in order to read the snapshot file.  Version 1.1 can be read by
Version 1.0 applications, so currently Major/Minor should be set to 1.1 and
Minimum should be 1.0.

Flags:
    Compressed: If set everything following the header is compressed as a raw
deflate stream, as defined by RFC 1951.  It is the same compression used by
.zip and .gz archives.

    Source Path included: If set the original folder's path is included
immediately after the header.  This is the only part of the file besides the
fixed header that is not compressed.

    UTF-8: If set the snapshot was compressed on a system where the default
character encoding is UTF-8 (Linux, OS X).  Filenames, paths, and link targets
will all be stored as UTF-8.  If this isn't set the paths are stored using the
original OS's ANSI codepage (Windows).  In that case any paths may be stored a
second time as UTF-8 in extended headers.
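
The header layout and flags described above could be read along the following lines.  This is only a sketch: the names `SnapshotHeader` and `parse_snapshot` are hypothetical, UWord is assumed to be the same 16-bit unsigned type as UInt16, and `cp1252` is an assumed stand-in for whatever ANSI codepage the snapshot was written with.

```python
import struct
import zlib
from typing import NamedTuple

FLAG_COMPRESSED = 0x0001   # bit 0
FLAG_SOURCE_PATH = 0x0002  # bit 1
FLAG_UTF8 = 0x0008         # bit 3

# 'BCSS', four version bytes, creation FileTime, flags: 18 bytes, little-endian.
FIXED_HEADER = "<4s4BQH"


class SnapshotHeader(NamedTuple):
    major: int
    minor: int
    min_major: int
    min_minor: int
    creation_time: int  # raw FILETIME ticks
    flags: int
    path: str


def parse_snapshot(data, ansi_codepage="cp1252"):
    """Return (header, record_bytes) for a whole .bcss file held in memory."""
    magic, major, minor, min_major, min_minor, created, flags = struct.unpack_from(
        FIXED_HEADER, data, 0
    )
    if magic != b"BCSS":
        raise ValueError("not a BCSS snapshot")
    offset = struct.calcsize(FIXED_HEADER)
    path = ""
    if flags & FLAG_SOURCE_PATH:
        (length,) = struct.unpack_from("<H", data, offset)
        offset += 2
        encoding = "utf-8" if flags & FLAG_UTF8 else ansi_codepage
        path = data[offset:offset + length].decode(encoding)
        offset += length
    body = data[offset:]
    if flags & FLAG_COMPRESSED:
        # Negative wbits selects a raw deflate stream (no zlib or gzip wrapper).
        body = zlib.decompressobj(-15).decompress(body)
    header = SnapshotHeader(major, minor, min_major, min_minor, created, flags, path)
    return header, body
```

A caller would typically pass the contents of the file read in binary mode and then walk the returned record bytes.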


===============
Content Records
===============

Immediately after the header the directory tree is stored as a series of
records.  Directories are stored recursively: each one starts with the
directory header, then any files and subdirectories (and their children), then
the directory end record.

The ID_DIRECTORY record for the outermost (source) folder is not stored, so
the content stream actually starts with the first child, and continues until
it finds an unmatched ID_DIRECTORY_END record.  Anything following that is
currently ignored.

Each record starts with a single UByte ID value and then the data defined below.

ID_DIRECTORY (0x01)
    Represents a directory on the system, or an expanded archive file.

    Name           : ShortString
    Last Modified  : FileTime
    DOS Attributes : UInt32


ID_DIRECTORY_END (0xFF)
    Represents the end of a directory listing.  No data.


ID_FILE (0x02)
    Represents a file on the system.  

    Name           : ShortString
    Last Modified  : FileTime
    DOS Attributes : UInt32
    Size           : Int32[+Int64]
       If Size > 2GB, store as Int32(-1) followed by Int64
    CRC32          : UInt32
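
As an illustration of the ID_FILE layout, including the 32-bit/64-bit size rule, a reader might look like the sketch below.  The helper names are hypothetical, `cp1252` is an assumed ANSI codepage, and `stream` is assumed to be the already decompressed record data positioned just after the record's ID byte.

```python
import io
import struct


def read_short_string(stream, encoding="cp1252"):
    """Read a ShortString: one length byte, then that many bytes of text."""
    length = stream.read(1)[0]
    return stream.read(length).decode(encoding)


def read_file_record(stream, encoding="cp1252"):
    """Read the body of an ID_FILE record (hypothetical helper)."""
    name = read_short_string(stream, encoding)
    # FileTime (64-bit), DOS attributes (UInt32), size (Int32), little-endian.
    modified, attributes, size = struct.unpack("<QIi", stream.read(16))
    if size == -1:
        # Int32(-1) is the marker for a following Int64 holding the real size.
        (size,) = struct.unpack("<q", stream.read(8))
    (crc32,) = struct.unpack("<I", stream.read(4))
    return {
        "name": name,
        "modified": modified,
        "dos_attr": attributes,
        "filesize": size,
        "crc": crc32,
    }
```

Wrapping the record bytes in `io.BytesIO` gives a suitable `stream` for trying this out.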
        

ID_FILE_EX (0x03)
    Represents a file on the system, with extended headers.
    
    Name..CRC32 is the same as ID_FILE
    ExtraLen       : UInt16
    ExtraData      : Byte[ExtraLen]


ID_EXTENDED (0x04)
    Extended headers

    SubType        : UByte
    Length         : UWord
    Data           : Byte[Length]


========================
Extended Header Subtypes
========================

Extended headers should be written in ascending numeric order.  Once BC sees
an extended subtype that it doesn't understand it stops processing ID_EXTENDED
headers until it finds one of ID_DIRECTORY/ID_DIRECTORY_END/ID_FILE/ID_FILE_EX.


EX_UTF8 (0x01)
    UTF-8 encoded filename for the ID_DIRECTORY that immediately preceded
this header.  The length is given in the ID_EXTENDED header and the data is a
char[].
    If the .bcss header flags indicate that the data is not UTF-8 and the
source path is included this can be included as the first record in the file
in order to give a UTF-8 version of the source path.


EX_DIRECTORY_EX (0x02)
    Extended directory header for the ID_DIRECTORY that immediately preceded
this header.  Data is the record below, but Length may be larger to support
future expansion.

    Flags         : UByte
      Bit : Meaning
        0 : Error - Contents not available.  Flag as a load error in BC.
   

EX_RESYNC (0x03)
    Works around a bug in Beyond Compare's parser in versions prior to 3.2.2.
If an ID_DIRECTORY is followed by any ID_EXTENDED headers besides EX_UTF8 or
EX_DIRECTORY_EX include one copy of this header before them.

    Length : UWord   = 0x0001
    Data   : Byte[1] = 0


EX_LINK_PATH (0x04)
    UTF-8 encoded symbolic link path for the ID_DIRECTORY that immediately
preceded this header.  The length is given in the ID_EXTENDED header and the
data is a char[].


=====================
File Extended Headers
=====================

Like extended headers, file extended headers should be written in ascending
numeric order.  Multiple headers may occur within a single ID_FILE_EX record,
and compliant parsers should break once they read a type they don't recognize.

FILE_EX_VERSION (0x01)
    String representation of an executable file's Major/Minor/Maint/Build
version (e.g., "2.11.28.3542").

    Length : UByte
    Data   : char[Length]


FILE_EX_UTF8 (0x02)
    UTF-8 encoded filename.  Stored as a FileExString.  Only used if the UTF-8
name doesn't match the ANSI encoded one or if the filename is longer than 255
characters.


FILE_EX_LINK_PATH (0x03)
    UTF-8 encoded symbolic link path.  Stored as a FileExString.


FileExString
------------
Beyond Compare v2.4.1 and earlier will produce incorrect results if it
encounters a raw 0x01 byte in a file extended header.  To prevent that most
strings in ID_FILE_EX extended headers are written like so:

    Length : UByte[+UByte]
    Data   : char[Length]

    If (Length <> 1) and (Length <= 127) then Length is 1 byte
    Otherwise the Length is written as
      Low  : UByte(Length) OR 0x80
      High : UByte(Length shr 7) OR 0x80

If an extended header must have a 0x01 in it (other than FILE_EX_VERSION),
increase the .bcss header's Minimum Supported Version to 1.1.
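
The FileExString length rule can be sketched as follows.  The function names are hypothetical.  The encoder follows the rule as written above; the decoder is an inference from it (the text only describes the writing side), and the 16383 upper limit is an assumption that follows from two bytes carrying 7 bits of length each.

```python
MAX_FILEEX_LENGTH = 0x3FFF  # 14 bits: 7 in the low byte, 7 in the high byte


def encode_fileex_length(length):
    """Return the 1- or 2-byte length prefix of a FileExString."""
    if not 0 <= length <= MAX_FILEEX_LENGTH:
        raise ValueError("length does not fit in a FileExString prefix")
    if length != 1 and length <= 127:
        return bytes([length])
    # Two-byte form: bit 7 is set on both bytes, so a raw 0x01 never appears.
    low = (length & 0xFF) | 0x80
    high = ((length >> 7) & 0xFF) | 0x80
    return bytes([low, high])


def decode_fileex_length(data, offset=0):
    """Return (length, offset_of_string_data) for a prefix at `offset`."""
    low = data[offset]
    if not low & 0x80:
        return low, offset + 1
    high = data[offset + 1]
    return (low & 0x7F) | ((high & 0x7F) << 7), offset + 2
```

Note that a length of exactly 1 takes the two-byte form, which is what keeps a bare 0x01 byte out of the stream.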

XML Format

The XML format is derived from the binary format above, so much of the naming matches. The XML file is encoded in UTF-8. There is both a name and a utf8 attribute: utf8 holds the full filename correctly, while name retains the value in another encoding when one is desired (or was written by Beyond Compare). All directories are stored as “DirExtended” tags; a “Directory” tag is also accepted, but it is just a subset of DirExtended. Directory links (symlinks or junctions) are stored as “FileExtended” tags, as that is how Beyond Compare handles them.

The times stored in the “modified” attribute have an extra digit of precision compared to what Python would normally produce: the FILETIME structure resolves to 100 nanoseconds (seven fractional digits), while Python datetimes resolve only to microseconds (six fractional digits).
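
One way to keep that extra digit, sketched here under the assumption that the raw FILETIME tick count is still available, is to split off the fractional ticks before building the datetime. The function name is hypothetical and this is not necessarily how the project's own writer does it.

```python
from datetime import datetime, timedelta


def filetime_to_text(ticks):
    """Format FILETIME ticks as 'YYYY-MM-DD HH:MM:SS.fffffff' (seven digits)."""
    # Whole seconds go through datetime; the sub-second ticks are
    # formatted separately so the seventh fractional digit survives.
    seconds, fraction = divmod(ticks, 10_000_000)
    moment = datetime(1601, 1, 1) + timedelta(seconds=seconds)
    return f"{moment:%Y-%m-%d %H:%M:%S}.{fraction:07d}"
```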

Here is an example dataset (the matching binary dataset is under the test directory of the Python3/HSTB/datatransfer code):

<BCSSHeader compressed="false" creation_time="2017-01-20 09:38:52.4080000" major="1" min_major="1" min_minor="0" minor="1" path="D:\BCSS Sample" path_included="true" reserved="false" reserved2="0" str_id="BCSS" utf8="false">
	<DirExtended dos_attr="16" flags="0" link="" modified="2017-01-20 09:33:01.3408341" name="Archive Contents" utf8="">
		<DirExtended dos_attr="32" flags="0" link="" modified="2011-02-22 13:16:15.0000000" name="Deflate.zip" utf8="">
			<File crc="1711308218" dos_attr="33" filesize="152089" modified="1996-09-26 16:51:00.0000000" name="alice29.txt" />
			<File crc="22960486" dos_attr="33" filesize="125179" modified="1996-09-26 14:33:00.0000000" name="asyoulik.txt" />
			<File crc="2833299507" dos_attr="32" filesize="24603" modified="1996-06-12 16:44:00.0000000" name="cp.htm" />
			<File crc="1331791460" dos_attr="33" filesize="11150" modified="1996-09-26 15:02:00.0000000" name="fields.c" />
			<File crc="3541276541" dos_attr="33" filesize="3721" modified="1996-09-26 17:16:00.0000000" name="grammar.lsp" />
			<File crc="1139203212" dos_attr="32" filesize="1029744" modified="1996-11-12 17:16:00.0000000" name="kennedy.xls" />
			<File crc="1295196079" dos_attr="33" filesize="426754" modified="1996-09-26 14:51:00.0000000" name="lcet10.txt" />
			<File crc="2737076971" dos_attr="33" filesize="481861" modified="1996-09-26 14:39:00.0000000" name="plrabn12.txt" />
			<File crc="1259857308" dos_attr="32" filesize="513216" modified="1996-11-06 14:13:00.0000000" name="ptt5" />
			<File crc="933891259" dos_attr="32" filesize="38240" modified="1996-11-12 17:12:00.0000000" name="sum" />
			<File crc="3737924087" dos_attr="32" filesize="4227" modified="1996-11-06 13:15:00.0000000" name="xargs.1" />
		</DirExtended>
		<DirExtended dos_attr="16" flags="0" link="" modified="2017-01-20 09:32:48.0990503" name="source" utf8="">
			<File crc="1711308218" dos_attr="32" filesize="152089" modified="2011-02-22 13:16:15.0000000" name="alice29.txt" />
			<File crc="22960486" dos_attr="32" filesize="125179" modified="1996-09-26 14:33:00.0000000" name="asyoulik.txt" />
			<File crc="2833299507" dos_attr="32" filesize="24603" modified="1996-06-12 16:44:00.0000000" name="cp.htm" />
			<File crc="1331791460" dos_attr="32" filesize="11150" modified="1996-09-26 15:02:00.0000000" name="fields.c" />
			<File crc="3541276541" dos_attr="32" filesize="3721" modified="1996-09-26 17:16:00.0000000" name="grammar.lsp" />
			<File crc="1139203212" dos_attr="32" filesize="1029744" modified="2011-02-22 13:16:15.0000000" name="kennedy.xls" />
			<File crc="1295196079" dos_attr="32" filesize="426754" modified="2011-02-22 13:16:15.0000000" name="lcet10.txt" />
			<File crc="2737076971" dos_attr="32" filesize="481861" modified="2011-02-22 13:16:15.0000000" name="plrabn12.txt" />
			<File crc="1259857308" dos_attr="32" filesize="513216" modified="2011-02-22 13:16:15.0000000" name="ptt5" />
			<File crc="933891259" dos_attr="32" filesize="38240" modified="2011-02-22 13:16:15.0000000" name="sum" />
			<File crc="3737924087" dos_attr="32" filesize="4227" modified="1996-11-06 13:15:00.0000000" name="xargs.1" />
		</DirExtended>
	</DirExtended>
	<DirExtended dos_attr="16" flags="0" link="" modified="2017-01-20 09:32:18.7192669" name="Empty Folder" utf8="" />
	<DirExtended dos_attr="16" flags="0" link="" modified="2017-01-20 09:37:56.5607341" name="Symlinks" utf8="">
		<DirExtended dos_attr="16" flags="0" link="" modified="2017-01-20 09:36:50.9173649" name="Dir" utf8="" />
		<FileExtended crc="0" dos_attr="1024" filesize="0" link="D:\BCSS Sample\Symlinks\Dir" modified="2017-01-20 09:37:56.5596676" name="Dir Junction" utf8="" version="" />
		<FileExtended crc="0" dos_attr="1024" filesize="0" link="Dir" modified="2017-01-20 09:37:42.4955032" name="Dir Symlink" utf8="" version="" />
		<FileExtended crc="0" dos_attr="1056" filesize="0" link="File.txt" modified="2017-01-20 09:37:30.1105412" name="File Symlink" utf8="" version="" />
		<File crc="2580470043" dos_attr="32" filesize="13" modified="2017-01-20 09:35:58.6446888" name="File.txt" />
	</DirExtended>
	<DirExtended dos_attr="16" flags="0" link="" modified="2017-01-20 09:32:12.5327676" name="Unicode Filenames" utf8="">
		<File crc="1925615395" dos_attr="32" filesize="10872" modified="2001-07-05 13:44:18.0000000" name="English - Universal Declaration of Human Rights.txt" />
		<FileExtended crc="1267793261" dos_attr="32" filesize="12239" link="" modified="2011-01-31 11:20:47.5797717" name="French - Déclaration universelle des droits de l'homme.txt" utf8="French - Déclaration universelle des droits de l'homme.txt" version="" />
		<FileExtended crc="844031706" dos_attr="32" filesize="13002" link="" modified="2001-07-05 13:25:20.0000000" name="Japanese - ??????.txt" utf8="Japanese - 世界人権宣言.txt" version="" />
		<FileExtended crc="3513325372" dos_attr="32" filesize="11721" link="" modified="2001-07-05 13:37:42.0000000" name="Korean - ? ? ? ? ? ?.txt" utf8="Korean - 세 계 인 권 선 언.txt" version="" />
		<FileExtended crc="2986264585" dos_attr="32" filesize="6" link="" modified="2001-07-05 14:34:00.0000000" name="????Piraten!!!!????.txt" utf8="☠☠☠☠Piraten!!!!☠☠☠☠.txt" version="" />
	</DirExtended>
	<DirExtended dos_attr="16" flags="0" link="" modified="2017-01-20 09:34:07.7326965" name="Version Info" utf8="">
		<FileExtended crc="1078314189" dos_attr="32" filesize="3043840" link="" modified="2003-05-22 19:28:10.9565000" name="BC2.exe" utf8="" version="2.1.0.200" />
		<FileExtended crc="123075898" dos_attr="32" filesize="9567240" link="" modified="2015-04-06 16:19:46.0000000" name="BCompare.exe" utf8="" version="3.3.14.20002" />
		<FileExtended crc="2749794895" dos_attr="32" filesize="138344" link="" modified="2014-03-10 16:17:28.0000000" name="BCShellEx.dll" utf8="" version="3.0.0.15" />
		<File crc="1564415502" dos_attr="32" filesize="733696" modified="2013-09-17 14:03:10.5198851" name="BEYOND16.EXE" />
	</DirExtended>
</BCSSHeader>
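
A manifest in this XML form can be walked with Python's standard library. The sketch below is illustrative only: `iter_entries` is a hypothetical helper, the "/" separator is an arbitrary choice, and `SAMPLE` is a trimmed-down fragment of the dataset above rather than a complete manifest.

```python
import xml.etree.ElementTree as ET

DIRECTORY_TAGS = {"DirExtended", "Directory"}


def iter_entries(element, prefix=""):
    """Yield (path, element) for every entry below `element`, depth first."""
    for child in element:
        # Prefer the utf8 attribute when it is present and non-empty.
        name = child.get("utf8") or child.get("name")
        path = f"{prefix}/{name}" if prefix else name
        yield path, child
        if child.tag in DIRECTORY_TAGS:
            yield from iter_entries(child, path)


# Trimmed-down fragment of the example dataset, for illustration only.
SAMPLE = """<BCSSHeader major="1" minor="1" utf8="false">
  <DirExtended dos_attr="16" flags="0" link="" name="Symlinks" utf8="">
    <FileExtended crc="0" dos_attr="1056" filesize="0" link="File.txt" name="File Symlink" utf8="" version="" />
    <File crc="2580470043" dos_attr="32" filesize="13" name="File.txt" />
  </DirExtended>
</BCSSHeader>"""

root = ET.fromstring(SAMPLE)
for path, entry in iter_entries(root):
    # Links show up as FileExtended entries with a non-empty link attribute.
    print(entry.tag, path, entry.get("link", ""))
```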

Resources

Online Resources:

Contact

If you find errors in the documentation or want to contribute, you are encouraged to email any of the following people (all addresses are @NOAA.GOV):

  • barry.gallagher
  • jack.riley
  • chen.zhang
  • eric.younkin