
= BIFF8 BOUNDSHEET record data for uncompressed Unicode in Excel =

Article ID: 187919

Article Last Modified on 5/25/2007


APPLIES TO


 * Microsoft Office Excel 2007
 * Microsoft Office Excel 2003
 * Microsoft Excel 2002 Standard Edition
 * MSPRESS Microsoft Excel 97 Developer's Kit ISBN 1-57231-498-2
 * Microsoft Visual C++ 5.0 Standard Edition




This article was previously published under Q187919



SUMMARY
Note: The information in this article applies to a workbook in Microsoft Office Excel 2007 only if the workbook was saved in the "Excel 97-2003 Workbook (*.xls)" file format.

The Binary Interchange File Format version 8.0 (BIFF8) record data information in the Microsoft Developer Network (MSDN) and in the "Microsoft Excel 97 Developer's Kit" book does not mention a new flag that specifies whether the name of the worksheet is represented in uncompressed Unicode. Without this information, a developer might interpret the name field of the BOUNDSHEET record incorrectly if the name is stored in uncompressed Unicode.

The "BIFF8 Record Data" table at the top of page 291 of the "Microsoft Excel 97 Developer's Kit" book states that the cch (count of characters) field beginning at offset 10 is two bytes in size. This is incorrect. The cch field is one byte. It is followed at offset 11 by a one-byte flag field that indicates whether the name field is stored in compressed Unicode (one byte per character) or uncompressed Unicode (two bytes per character).

Note: The Microsoft Biffview utility program labels the BOUNDSHEET record "BUNDLESHEET".



MORE INFORMATION
By default, a sheet name is stored as compressed Unicode. Compressed Unicode uses one byte to represent each two-byte Unicode character: the high-order byte is assumed to be zero, and only the low-order byte of the character is stored.

If the sheet name contains any character whose high-order byte is not zero (for example, a double-byte character), the name is stored as uncompressed Unicode, with two bytes per character. An uncompressed name therefore takes twice as much space as a compressed name of the same length.
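The choice between the two representations can be sketched in Python. This is a minimal illustration, not Excel's own code: it assumes that a writer uses compressed Unicode only when every character in the name has a zero high-order byte, and uncompressed Unicode (two bytes per character, low-order byte first) otherwise. The function name encode_sheet_name is hypothetical, and characters above U+FFFF are simply rejected.

```python
from typing import Tuple


def encode_sheet_name(name: str) -> Tuple[int, int, bytes]:
    """Return (cch, grbitChr, rgch) for a sheet name (hypothetical helper)."""
    if any(ord(ch) > 0xFFFF for ch in name):
        # Simplification: only characters that fit in one 16-bit unit.
        raise ValueError("characters above U+FFFF are not handled here")
    if len(name) > 0xFF:
        # cch is a single byte, so it cannot exceed 255.
        raise ValueError("name is too long for a one-byte cch field")
    if all(ord(ch) <= 0xFF for ch in name):
        # Compressed Unicode: drop the zero high-order byte of each character.
        return len(name), 0, bytes(ord(ch) for ch in name)
    # Uncompressed Unicode: two bytes per character, low-order byte first.
    return len(name), 1, name.encode("utf-16-le")


if __name__ == "__main__":
    for sheet in ("Sheet2", "\u5de5\u4f5c\u8868" + "1"):
        cch, grbit_chr, rgch = encode_sheet_name(sheet)
        print(cch, grbit_chr, rgch.hex(" "))
    # 6 0 53 68 65 65 74 32
    # 4 1 e5 5d 5c 4f 68 88 31 00
```

Because the flag covers the whole name, a single double-byte character forces every character in the name into two bytes.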

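The BOUNDSHEET record uses the single byte at offset 11 (grbitChr) as a flag that indicates uncompressed Unicode. If the flag is 1, the cch value at offset 10 is the number of two-byte characters that begin at offset 12, so the name occupies 2 × cch bytes. If the flag is 0, each character occupies one byte and the name occupies cch bytes.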
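The following Python sketch applies this rule when reading a name. The helper read_sheet_name is hypothetical; pos is assumed to be the index of the cch byte in whatever buffer is passed in. Compressed names are decoded with the Latin-1 codec because it maps each byte to the character whose code has the same value, which is exactly a character with a zero high-order byte.

```python
from typing import Tuple


def read_sheet_name(data: bytes, pos: int) -> Tuple[str, int]:
    """Decode the cch, grbitChr, and rgch fields that start at data[pos].

    Returns the sheet name and the index just past the end of rgch.
    """
    cch = data[pos]            # count of characters, not bytes
    grbit_chr = data[pos + 1]  # 0 = compressed, 1 = uncompressed Unicode
    if grbit_chr not in (0, 1):
        raise ValueError(f"unexpected grbitChr value {grbit_chr:#04x}")
    width = 2 if grbit_chr == 1 else 1  # bytes per character
    start = pos + 2
    end = start + cch * width
    if end > len(data):
        raise ValueError("rgch field runs past the end of the data")
    # Latin-1 maps byte N to code N, i.e. a character with a zero high byte.
    codec = "utf-16-le" if width == 2 else "latin-1"
    return data[start:end].decode(codec), end


# Record data from the uncompressed example, starting at offset 4, so the
# cch byte (offset 10 in the record) is at index 6 in this buffer.
body = bytes.fromhex("200b0000 0000 04 01 e55d5c4f68883100")
name, end = read_sheet_name(body, 10 - 4)
print(len(name), end)  # 4 16
```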

The BIFF8 Record Data table at the top of page 291 should read as follows:

  OFFSET  NAME      SIZE  CONTENTS
  ------  --------  ----  ----------------------------------------------
  4       lbPlyPos  4     Stream position of the start of the BOF record
                          for the sheet.
  8       grbit     2     Option flags.
  10      cch       1     Length of sheet name in characters, not bytes.
  11      grbitChr  1     Compressed/uncompressed Unicode flag
                          (0 = compressed, 1 = uncompressed).
  12      rgch      var   Sheet name.
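A whole record can be parsed against this table with Python's struct module. The sketch below is hypothetical (BoundSheet and parse_boundsheet are illustrative names). It assumes that offsets 0 through 3 hold the standard BIFF record header, a 2-byte record number followed by a 2-byte data length, so that the table's offsets index the record bytes directly. The record number 0x0085 in the usage line is the value commonly documented for BOUNDSHEET; the parser does not check it.

```python
import struct
from typing import NamedTuple


class BoundSheet(NamedTuple):
    lb_ply_pos: int  # stream position of the sheet's BOF record
    grbit: int       # option flags
    name: str        # decoded sheet name


def parse_boundsheet(record: bytes) -> BoundSheet:
    """Parse a whole BOUNDSHEET record, including its 4-byte header."""
    # "<" = little-endian, no padding: I = lbPlyPos (4 bytes),
    # H = grbit (2 bytes), B = cch (1 byte), B = grbitChr (1 byte).
    lb_ply_pos, grbit, cch, grbit_chr = struct.unpack_from("<IHBB", record, 4)
    if grbit_chr not in (0, 1):
        raise ValueError(f"unexpected grbitChr value {grbit_chr:#04x}")
    width = 2 if grbit_chr == 1 else 1   # bytes per character in rgch
    rgch = record[12:12 + cch * width]   # rgch starts at offset 12
    if len(rgch) != cch * width:
        raise ValueError("record is too short for the declared sheet name")
    name = rgch.decode("utf-16-le" if width == 2 else "latin-1")
    return BoundSheet(lb_ply_pos, grbit, name)


# Compressed example with a header prepended: 85 00 is the record number
# (0x0085), 0e 00 is the data length (14). The parser checks neither.
record = bytes.fromhex("8500 0e00 170d0000 0000 06 00 536865657432")
print(parse_boundsheet(record))
# BoundSheet(lb_ply_pos=3351, grbit=0, name='Sheet2')
```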

The following examples compare values with and without Unicode compression:

Uncompressed: record data beginning at offset 4 (16 bytes)
  20 0b 00 00   00 00   04   01   e5 5d  5c 4f  68 88  31 00

The BOF record for this sheet starts at stream position 00 00 0B 20 hex. Note the byte swapping: multibyte values are stored with the least significant byte first, as explained on page 268 of the printed edition of the Excel SDK.
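In Python, the byte swapping can be undone with the little-endian "<" prefix of the struct module. This is a minimal sketch; read_u32_le is a hypothetical helper name.

```python
import struct


def read_u32_le(data: bytes, offset: int = 0) -> int:
    """Read a 4-byte unsigned integer stored least significant byte first."""
    (value,) = struct.unpack_from("<I", data, offset)
    return value


# lbPlyPos bytes as they appear in the two examples in this article.
print(hex(read_u32_le(bytes.fromhex("200b0000"))))  # 0xb20
print(hex(read_u32_le(bytes.fromhex("170d0000"))))  # 0xd17
```

int.from_bytes(data, "little") gives the same result without struct.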

The option flags 00 00 tell you that this BOUNDSHEET record applies to a visible worksheet.

The cch value of 04 says the sheet name is 4 characters long.

The grbitChr value of 01 means that the sheet name is uncompressed Unicode, so each character in the rgch field is stored in 2 bytes.
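Because cch counts characters rather than bytes, the size of the record data follows from cch and grbitChr together. The sketch below (boundsheet_data_length is a hypothetical name) assumes the fixed fields in the table, 4 + 2 + 1 + 1 = 8 bytes before rgch, and reproduces the 16-byte and 14-byte totals given for the two examples in this article.

```python
def boundsheet_data_length(cch: int, grbit_chr: int) -> int:
    """Bytes of BOUNDSHEET data from offset 4 through the end of rgch."""
    if grbit_chr not in (0, 1):
        raise ValueError("grbitChr is expected to be 0 or 1")
    fixed = 4 + 2 + 1 + 1                  # lbPlyPos, grbit, cch, grbitChr
    per_char = 2 if grbit_chr == 1 else 1  # bytes per character in rgch
    return fixed + cch * per_char


print(boundsheet_data_length(4, 1))  # 16 (uncompressed example)
print(boundsheet_data_length(6, 0))  # 14 (compressed example)
```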

The rgch field occupies the next 8 bytes. Read as byte-swapped 16-bit values, the four characters are 5DE5, 4F5C, 8868, and 0031 hex.
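The byte swap inside rgch can be made explicit with a few lines of Python. This sketch is for illustration only: it combines each pair of bytes low-order byte first and then checks the result against Python's built-in UTF-16-LE codec. Decoded, the name is 工作表1.

```python
# rgch bytes from the uncompressed example, in stream order.
rgch = bytes.fromhex("e55d5c4f68883100")

# Combine each byte pair low-order byte first to get the 16-bit character codes.
code_units = [rgch[i] | (rgch[i + 1] << 8) for i in range(0, len(rgch), 2)]
print(" ".join(f"U+{u:04X}" for u in code_units))
# U+5DE5 U+4F5C U+8868 U+0031

# Python's UTF-16-LE codec performs the same byte swap.
name = rgch.decode("utf-16-le")
assert [ord(ch) for ch in name] == code_units
```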

Compressed: record data beginning at offset 4 (14 bytes)
  17 0d 00 00   00 00   06   00   53 68 65 65 74 32

The BOF record for this sheet starts at stream position 00 00 0D 17 hex.

The option flags, 00 00, tell you that the BOUNDSHEET record applies to a worksheet that is visible.

The cch value of 06 says the sheet name is 6 characters long.

The grbitChr value of 00 means that the sheet name is compressed Unicode: each character in the name is stored in one byte, and the missing high-order byte of each character is assumed to be 00 hex.

Hence, the 6-character sheet name is stored in 6 bytes as 53 68 65 65 74 32, which reads "Sheet2".
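The assumed 00 high-order byte can be shown by expanding the compressed name back to two bytes per character. A minimal Python sketch, for illustration only:

```python
# rgch bytes from the compressed example.
compressed = bytes.fromhex("536865657432")

# Re-insert the assumed 00 high-order byte after each stored low-order byte.
expanded = b"".join(bytes([b, 0x00]) for b in compressed)
print(expanded.hex(" "))  # 53 00 68 00 65 00 65 00 74 00 32 00

# Both forms decode to the same name.
assert expanded.decode("utf-16-le") == compressed.decode("latin-1") == "Sheet2"
```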

