
Article ID: 124613

Article Last Modified on 8/16/2005



APPLIES TO

  • Microsoft Office 4.2c
  • Microsoft Office 4.3c
  • Microsoft Office 95 Standard Edition
  • Microsoft Word 6.0c
  • Microsoft Word 95 Standard Edition
  • Microsoft Excel 5.0c
  • Microsoft Excel 95 Standard Edition



This article was previously published under Q124613

SUMMARY

The Setup program for the applications listed above uses the following technologies, which earlier versions of the Setup program did not use:

  • Diamond: a lossless data compression tool
  • Quantum: a new core compressor
  • DMF (Distribution Media Format): a new read-only format for 3.5-inch floppy disks

The following information describes each of these technologies.

DIAMOND

Diamond is a lossless data compression tool that can be used for a wide variety of purposes. Although it was originally designed for use by Setup programs, it can be used in almost any situation that calls for lossless compression and can tolerate slow compression in exchange for a better compression ratio.

Diamond has three key features: (1) storing multiple files together in a single cabinet file, (2) compressing across file boundaries, and (3) permitting files to span across cabinets. Existing products such as PKZIP, LHARC, and ARJ support some of these features, but combining all three does not appear to be common.

Depending on how many files are to be compressed and what access patterns are expected (sequential versus random access; reading most of the files versus only a few of them), you will make different choices about how you tell Diamond to build your cabinet files. A key concept in Diamond is the folder: a collection of one or more files that are compressed together as a single entity. The most important property of a folder is that to access a particular file in it, all preceding files in the folder must be read and decompressed. For example, if a folder holds 100 files that compress from 3 MB down to 1 MB, and you want to extract the last file in the folder, you must read and decompress the entire folder to do so.
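
The folder tradeoff can be sketched with Python's zlib module standing in for Diamond's compressors. This is an assumption for illustration only; it does not produce the cabinet format. Compressing a set of similar files as one stream, the way a single folder does, lets the compressor reuse patterns across file boundaries, but reaching the last file means decompressing everything before it. The file contents and the helper names (compress_separately, build_folder, extract) are hypothetical.

```python
import zlib

def compress_separately(files):
    """Total size when each file is compressed on its own (no shared context)."""
    return sum(len(zlib.compress(data, 9)) for data in files)

def build_folder(files):
    """Compress all files as one stream, like a single Diamond folder."""
    sizes = [len(data) for data in files]
    return zlib.compress(b"".join(files), 9), sizes

def extract(folder, sizes, index):
    """Return file `index` and how many bytes had to be decompressed to reach it.

    The folder is one compressed stream, so every byte before the requested
    file must be decompressed first; only the bytes after it can be skipped.
    """
    start = sum(sizes[:index])
    end = start + sizes[index]
    stream = zlib.decompressobj().decompress(folder, max_length=end)
    return stream[start:end], len(stream)

# Twenty small, similar files (hypothetical contents).
files = [
    f"File {i}: the quick brown fox jumps over the lazy dog.\n".encode() * 5
    for i in range(20)
]
folder, sizes = build_folder(files)
print("compressed separately:", compress_separately(files), "bytes")
print("compressed as a folder:", len(folder), "bytes")
data, work = extract(folder, sizes, len(files) - 1)
print("extracting the last file decompressed", work, "of", sum(sizes), "bytes")
```

Larger folders raise the cross-file savings but also raise the amount of work needed for random access, which is the choice the paragraph above describes.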

Diamond Concepts

The key feature of Diamond is that it takes a set of files and produces a disk layout while attempting to minimize the number of disks required. To understand how Diamond does this, you need to understand the following terms: cabinet, folder, and file. Essentially, Diamond takes all of your files, lays the bytes down as one continuous byte stream, compresses the entire stream, divides it into folders as appropriate, and then fills one or more cabinets with the folders.

Cabinet: A normal file that contains pieces of one or more files, usually compressed.

Folder: A decompression boundary. Large folders enable higher compression, because the compressor can refer back to more data when finding patterns. However, to retrieve a file at the end of a folder, the entire folder must be decompressed, so there is a tradeoff between the compression achieved and the speed of random access to individual files.

File: A file to be placed in the layout.
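
The step of filling cabinets from one continuous stream can be modeled as pouring bytes into fixed-size containers, splitting a piece whenever it crosses a cabinet boundary. This is a simplified, hypothetical model: it ignores cabinet headers, how Diamond picks folder boundaries, and any per-disk files such as SETUP.EXE. The Span type, lay_out function, file names, and sizes are all illustrative.

```python
from dataclasses import dataclass

@dataclass
class Span:
    name: str      # which file (or folder) these bytes belong to
    cabinet: int   # 1-based cabinet (and disk) number
    offset: int    # byte offset inside that cabinet
    length: int    # number of bytes stored there

def lay_out(pieces, cabinet_capacity):
    """Pour a continuous compressed stream into fixed-size cabinets.

    `pieces` is a list of (name, compressed_size) in stream order. A piece
    that does not fit in the space left in the current cabinet is split and
    continues in the next cabinet, so no space is left unused.
    """
    spans = []
    cabinet, used = 1, 0
    for name, size in pieces:
        remaining = size
        while remaining > 0:
            if used == cabinet_capacity:  # current cabinet is full
                cabinet, used = cabinet + 1, 0
            take = min(cabinet_capacity - used, remaining)
            spans.append(Span(name, cabinet, used, take))
            used += take
            remaining -= take
    return spans

for span in lay_out([("a.dll", 600), ("b.exe", 700), ("c.hlp", 300)], 1000):
    print(span)
```

Here b.exe starts in cabinet 1 and finishes in cabinet 2, which is the "files span across cabinets" feature: without it, the 400 bytes left in cabinet 1 would be wasted.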

Diamond Application Disk Layout

The distribution disks that Diamond produces for a typical application, such as Microsoft Word for Windows, look like this:

   Disk1 -- WORD1.CAB
            SETUP.EXE
            WDREADME.HLP
            ...

   Disk2 -- WORD2.CAB

   Disk3 -- WORD3.CAB
                

QUANTUM

Quantum is a new compression technology to which Microsoft obtained an unrestricted license in early May 1994. It produces compressed files 10-15% smaller than MSZIP, and Quantum will be the preferred compressor (possibly the only one) supported by Diamond. To achieve these results, Quantum can require a fair amount of memory at compression time (up to 12 MB) and also at decompression time (configurable from 1K to 2 MB), and it gets its best results on large data streams. For this reason, cabinet files and Quantum are a great fit: cabinet files with large folders ensure that Quantum is always compressing big blocks of data. The decompression memory requirement for Quantum is tunable in the Diamond directive file.
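
Quantum's internals are not described in this article, so the following sketch uses zlib's deflate as a stand-in to show the general principle behind the memory tradeoff: a compressor can only exploit repetition that falls inside its history window, and a larger window costs more memory. The test data (a 4,000-byte random block repeated four times) and the compressed_size helper are hypothetical.

```python
import random
import zlib

def compressed_size(data, window_bits):
    """Size of `data` after deflate with a history window of 2**window_bits bytes."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, window_bits)
    return len(compressor.compress(data) + compressor.flush())

# 4,000 random (incompressible) bytes repeated four times: the only redundancy
# lies 4,000 bytes back, farther than a 512-byte window can see.
rng = random.Random(0)
block = bytes(rng.randrange(256) for _ in range(4000))
data = block * 4

print("512-byte window:", compressed_size(data, 9))
print("32 KB window:   ", compressed_size(data, 15))
```

With the small window the output is roughly as large as the input; with the large window the three repeats shrink to back-references. Large folders give a large-window compressor long streams in which such distant repetition can be found.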

DISTRIBUTION MEDIA FORMAT (DMF)

DMF is a special read-only format for 3.5-inch floppy disks that stores 1.7 MB of data (about a 16.7% increase over the standard 1.44 MB format). This is achieved by reducing the inter-sector gap and adding 3 sectors per track. This does not affect the ability of standard floppy drives to read the disk, because we have not changed the magnetic recording density. With the reduced inter-sector gap, however, there is not enough room between sectors for a floppy drive to write reliably to a DMF disk. Tools exist to create DMF disk images, and we have verified that the disk duplicating machines (Trace and Rimage) used by Microsoft and our key duplicators correctly and efficiently duplicate these disks.
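
The capacity gain from the extra sectors can be checked with simple arithmetic. This sketch assumes the standard high-density 3.5-inch geometry of 2 sides, 80 tracks per side, 18 sectors per track, and 512-byte sectors, with DMF adding 3 sectors per track as described above. It counts raw sectors only; the boot sector, FAT, and root directory take some of that space.

```python
SECTOR_BYTES = 512     # assumed sector size
TRACKS_PER_SIDE = 80   # assumed standard 3.5-inch geometry
SIDES = 2

def capacity(sectors_per_track):
    """Raw capacity in bytes of a 3.5-inch disk with this many sectors per track."""
    return SIDES * TRACKS_PER_SIDE * sectors_per_track * SECTOR_BYTES

standard = capacity(18)   # the standard "1.44 MB" format
dmf = capacity(18 + 3)    # DMF: three extra sectors per track
increase = (dmf - standard) / standard

print(standard, dmf, f"{increase:.1%}")  # 1474560 1720320 16.7%
```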

One limitation of the DMF format is that the root directory holds only 16 entries, and the cluster size is 2 KB. For this reason, cabinet files are ideal on DMF: the root directory limit is never exceeded, and with only one cabinet file per DMF disk, the 2 KB cluster allocation granularity does not cause any wasted space.
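
The effect of the 2 KB cluster size and the 16-entry root directory can be illustrated with a hypothetical set of 100 uncompressed 1,000-byte files, first copied to the disk as loose files and then packed into a single cabinet. The slack and fits_in_root helpers are illustrative; the sketch ignores subdirectories and compression, and simply counts the unused bytes in each file's last cluster.

```python
CLUSTER_BYTES = 2048     # DMF allocation unit (2 KB)
ROOT_DIR_ENTRIES = 16    # DMF root directory limit

def slack(file_sizes, cluster=CLUSTER_BYTES):
    """Bytes wasted in the partially filled last cluster of each file."""
    return sum(-size % cluster for size in file_sizes)

def fits_in_root(file_count):
    """True if this many files fit in the DMF root directory."""
    return file_count <= ROOT_DIR_ENTRIES

loose = [1000] * 100     # 100 small files copied as-is
cabinet = [sum(loose)]   # the same bytes packed into one cabinet file

print("loose files:", slack(loose), "bytes wasted; fits in root:", fits_in_root(len(loose)))
print("one cabinet:", slack(cabinet), "bytes wasted; fits in root:", fits_in_root(len(cabinet)))
# loose files: 104800 bytes wasted; fits in root: False
# one cabinet: 352 bytes wasted; fits in root: True
```

A single cabinet wastes at most part of one cluster, and nothing at all when its size is a multiple of 2 KB.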

The combination of Diamond, Quantum, and DMF should yield a 20-30% reduction in the number of disks in a product, compared to previous Setup programs that used Microsoft Setup version 1.0 and MSZIP. Actual results may vary, but in measurements using Microsoft Office version 4.2 for Windows, the 25 3.5-inch disks required with Microsoft Setup version 1.0 and MSZIP were reduced to 18 3.5-inch disks with Diamond + Quantum + DMF, a 28% savings.
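
The savings figure for Microsoft Office 4.2 follows directly from the two disk counts given in the paragraph above; disk_savings is a hypothetical helper name.

```python
def disk_savings(before, after):
    """Fractional reduction in the number of distribution disks."""
    return (before - after) / before

# 25 disks with Microsoft Setup 1.0 + MSZIP, 18 with Diamond + Quantum + DMF.
print(f"{disk_savings(25, 18):.0%}")  # 28%
```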

For additional information, please see the following article in the Microsoft Knowledge Base:

120006 XL5C: How to Copy Files from Cabinets on DMF-Formatted Disks



Additional query words: 1.10 4.20c 4.30c 5.00c 6.00c

Keywords: kbinfo kbsetup KB124613