Microsoft KB Archive/867626

= New error correcting code is included in Exchange Server 2003 SP1 =

Article ID: 867626

Article Last Modified on 10/30/2007

APPLIES TO


 * Microsoft Exchange Server 2003 Service Pack 1


SUMMARY
''Microsoft Exchange Server 2003 Service Pack 1 (SP1) introduces a new error-correcting code (ECC) algorithm to help resolve -1018 errors that may occur in your Exchange database.

-1018 errors are not caused by a problem in Exchange 2003. A -1018 error indicates that a problem has occurred in the computer's hard disk subsystem, and that this problem has affected an Exchange database file.

A typical cause of a -1018 error is a "flipped" single bit in a database page. In this scenario, a zero bit is changed to one, or a one bit is changed to zero. The ECC algorithm that is contained in Exchange 2003 SP1 is designed to help resolve this specific problem.

While this ECC algorithm helps automatically repair single-bit errors in your Exchange database, there are certain issues to consider when you back up or restore your Exchange 2003 SP1 database files:''


 * If you back up a file that contains a single-bit error, the error is automatically fixed on the backup media, but it remains on the hard disk.
 * You cannot restore an Exchange 2003 SP1 database file to a computer that is running the original release version of Exchange 2003.

''Single-bit errors are only repaired during a write operation to the database file. If a read operation is performed from a file that contains a single-bit error, the original file on the hard disk is not repaired. In Exchange 2003 SP1, two new events are logged to record the correction of single-bit errors.''



INTRODUCTION
This article discusses an error-correcting code (ECC) algorithm that is introduced in Exchange Server 2003 Service Pack 1 (SP1). The Extensible Storage Engine (ESE) in Exchange 2003 SP1 uses this algorithm to help resolve occurrences of error -1018 JET_errReadVerifyFailure. For additional information about -1018 errors, click the following article number to view the article in the Microsoft Knowledge Base:

314917 Understanding and analyzing -1018, -1019, and -1022 Exchange database errors



MORE INFORMATION
Error -1018 is generated if the built-in integrity verification component in Exchange determines that Exchange could not correctly store or could not correctly retrieve Exchange database file data from the hard disk. When this problem occurs, you must repair the Exchange database file or restore the database file from a recent backup.

Our research has concluded that approximately 40 percent of -1018 errors occur because of database corruption that is caused by a single-bit error. A single-bit error is also known as a "bit flip" error. A single-bit or bit flip error is a hardware-level occurrence where a single bit of data is changed from a zero to a one or from a one to a zero. A parity bit can be added to computer data to detect when a bit flip problem occurs. However, parity systems can only detect this problem; they cannot repair it. ECC algorithms can automatically detect and repair a single-bit error. Exchange 2003 SP1 implements an ECC algorithm in its Extensible Storage Engine (ESE) database to detect and to automatically correct single-bit errors.
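
The difference between detecting and repairing a bit flip can be illustrated with a small sketch. This is not the algorithm that ESE implements, which this article does not describe. It is a generic Hamming-style locator, and the function names are hypothetical. A parity bit only reveals that something changed, whereas the XOR of the positions of all set bits also reveals which bit changed.

```python
# Illustrative only: a generic Hamming-style locator, NOT the ECC used by ESE.

def parity(bits):
    """Single parity bit: 1 if the number of set bits is odd."""
    return sum(bits) % 2

def locator(bits):
    """XOR of the 1-based positions of all set bits."""
    code = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            code ^= position
    return code

def correct_single_bit(bits, stored_locator):
    """Return a repaired copy, assuming at most one bit was flipped."""
    syndrome = locator(bits) ^ stored_locator
    repaired = list(bits)
    if syndrome:                      # a nonzero syndrome is the flipped position
        repaired[syndrome - 1] ^= 1
    return repaired

original = [1, 0, 1, 1, 0, 0, 1, 0]
stored_parity = parity(original)      # saved alongside the data
stored_locator = locator(original)    # saved alongside the data

damaged = list(original)
damaged[5] ^= 1                       # simulate a bit flip on disk

parity_detects = parity(damaged) != stored_parity         # detection only
repaired = correct_single_bit(damaged, stored_locator)    # detection and repair
```

Flipping one bit changes the locator by exactly that bit's position, regardless of the direction of the flip, which is why a single flip can be undone. Two or more flips in the same block break that assumption.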

Exchange database files are divided into 4 kilobyte (KB) blocks (also known as pages). Each page has its own ECC data. Exchange 2003 SP1 can correct single-bit errors on each page. Therefore, if multiple pages in a database are corrupted by single-bit errors, Exchange 2003 SP1 can correct each page. However, if a single database page contains multiple errors, Exchange 2003 SP1 cannot correct it. In this scenario, you must repair the database file or restore the database file from a recent backup.
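
The per-page rule above can be sketched as follows. The comparison against a known-good copy is an assumption made only for illustration, because ESE relies on the ECC data stored with each page and has no reference copy. The status labels and function names are hypothetical.

```python
PAGE_SIZE = 4096  # bytes per database page, as stated in this article

def flipped_bits(good: bytes, suspect: bytes) -> int:
    """Count the bits that differ between two equally sized buffers."""
    return sum(bin(a ^ b).count("1") for a, b in zip(good, suspect))

def classify_pages(good: bytes, suspect: bytes):
    """Classify every 4 KB page independently of the other pages."""
    statuses = []
    for start in range(0, len(good), PAGE_SIZE):
        count = flipped_bits(good[start:start + PAGE_SIZE],
                             suspect[start:start + PAGE_SIZE])
        if count == 0:
            statuses.append("clean")
        elif count == 1:
            statuses.append("correctable")
        else:
            statuses.append("repair or restore")
    return statuses

good = bytes(3 * PAGE_SIZE)
suspect = bytearray(good)
suspect[10] ^= 0x01                  # page 0: one flipped bit
suspect[PAGE_SIZE + 5] ^= 0x80       # page 1: one flipped bit
suspect[2 * PAGE_SIZE] ^= 0x03       # page 2: two flipped bits

statuses = classify_pages(good, bytes(suspect))
```

Pages 0 and 1 each carry one error and are both correctable, while page 2 carries two errors and is not.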

By automatically repairing single-bit errors, Exchange 2003 SP1 can recover from the most frequent type of database corruption. The typical -1018 error is now "self-healing" and no longer requires that you repair the database file or that you restore the database file from a recent backup.

Note Although Exchange 2003 SP1 automatically repairs typical single-bit errors, we recommend that you do not ignore the occurrence of -1018 errors. A -1018 error indicates that a hardware component is failing or is corrupted. The repair of a single-bit -1018 error does not resolve the hardware problem that caused the error. This hardware problem may affect other files on your computer in addition to the Exchange database files. Additionally, single-bit errors only account for approximately 40 percent of -1018 errors. Other -1018 errors that you may experience require that you repair or restore your Exchange database file.

Database upgrade issues
When you upgrade the original release version of Exchange 2003 to Exchange 2003 SP1, the database files are not immediately upgraded to the new ECC format. This means that if an existing database experiences a single-bit -1018 error, the error is not automatically repaired by Exchange 2003 SP1. Database pages are upgraded to the new ECC format only when the data in that page is modified. If a database page is only read from the database, and is not modified, that database page remains in the original database format. That page is not upgraded to the new ECC format.

Over a period of several weeks, most or all of the pages in the database are rewritten and automatically upgraded during typical Exchange operation. If you upgrade all the database pages at the same time, you may cause a significant and unexpected slowdown in service from your Exchange computer.
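
The gradual upgrade described above can be modeled with a minimal sketch. The `Page` class and its `has_ecc` flag are hypothetical names for illustration and do not reflect the actual ESE page layout.

```python
from dataclasses import dataclass

@dataclass
class Page:
    data: str
    has_ecc: bool = False   # False: page is still in the original format

def read_page(page: Page) -> str:
    # Reading a page never changes its format.
    return page.data

def write_page(page: Page, data: str) -> None:
    # A modified page is rewritten in the new ECC format.
    page.data = data
    page.has_ecc = True

pages = [Page("a"), Page("b"), Page("c")]
read_page(pages[0])            # read only: stays in the original format
write_page(pages[1], "b2")     # modified: upgraded to the ECC format
upgraded = [page.has_ecc for page in pages]
```

Only the page that was modified is protected by ECC. The pages that were read, or not touched at all, remain in the original format.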

To upgrade all the database pages at the same time, install Exchange 2003 SP1, take the database offline, and then defragment the database file by running the following command:

eseutil /D

For additional information about how to defragment an Exchange database, click the following article number to view the article in the Microsoft Knowledge Base:

328804 How to defragment Exchange databases

Important If you defragment the Exchange database files, this affects your ability to play transaction log files forward. In this scenario, if you have a previous backup, you can only play log files forward up to the point where you defragmented the database. Therefore, if you must later restore your database file from a backup that was taken before you defragmented the database, you lose all the data that was added after you defragmented the database.

After you defragment your database, we recommend that you immediately back up your Exchange database files. We also recommend that you consider your earlier backups as unusable for rolling forward from transaction log files.

Database backup and database restore issues
If a -1018 error occurs in the database file in the original release version of Exchange 2003, you cannot back up that database by using an online backup operation. The online backup operation helps prevent corruption in the database backup. Therefore, if an online database backup operation is completed successfully, no corrupted pages exist in the database backup. This means that you can restore that backup, roll the database forward by using transaction log files that were created after your database was backed up, and remove any -1018 errors that occurred in your database after the database backup was completed.

In Exchange 2003 SP1, if a single-bit error occurs in the database, the online backup operation reports this error, but the database backup still succeeds. In this scenario, the single-bit error is corrected in the backup set. However, the single-bit error is not corrected in the database that exists on the hard disk. The single-bit error in the database page that exists on the hard disk is not corrected until that page is re-written during typical database operations.

Note If a multiple-bit -1018 error occurs in the database page, the error is not correctable by Exchange 2003 SP1, and the backup is unsuccessful.
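
The online backup behavior described in this section can be summarized as a small decision function. The function name and the string labels are hypothetical and serve only to restate the rules above.

```python
def online_backup_outcome(has_sp1: bool, worst_page_error: str) -> str:
    """Summarize the online backup result.

    worst_page_error is one of the hypothetical labels
    'none', 'single-bit', or 'multiple-bit'.
    """
    if worst_page_error not in ("none", "single-bit", "multiple-bit"):
        raise ValueError(f"unknown error label: {worst_page_error}")
    if worst_page_error == "none":
        return "succeeds"
    if has_sp1 and worst_page_error == "single-bit":
        # The error is reported and fixed in the backup set only.
        # The page on the hard disk stays damaged until it is rewritten.
        return "succeeds; error corrected in backup set only"
    # Original release with any -1018 error, or SP1 with a multiple-bit error.
    return "fails"

outcomes = {
    (sp1, error): online_backup_outcome(sp1, error)
    for sp1 in (False, True)
    for error in ("none", "single-bit", "multiple-bit")
}
```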

Backup set issues between Exchange 2003 and Exchange 2003 SP1
If you have to restore an Exchange 2003 database, consider the following factors:
 * You can restore a backup set from an original release version of Exchange 2003 to an Exchange 2003 SP1 computer.

Exchange 2003 SP1 correctly recognizes database backups that you created from a computer that is running the original release version of Exchange 2003.
 * You cannot restore a backup set from Exchange 2003 SP1 to a computer that is running the original release version of Exchange 2003.

The original release version of Exchange 2003 does not recognize the ECC data that is contained in the database page. Therefore, Exchange 2003 determines that the database page is corrupted.

For these reasons, we recommend that you create a full backup of your Exchange 2003 database files immediately after you upgrade your Exchange computers to Exchange 2003 SP1.

ESE events
After you install Exchange 2003 SP1, the following two new application log event ID numbers may appear from the source ESE.

Note These two events do not appear in the original release version of Exchange 2003.

Event ID 398

This event occurs very rarely. It is logged only if Exchange 2003 SP1 repairs a single-bit error, but the page where the error is fixed subsequently fails a test for logical validity. Because this event is so rare, if you experience this issue, we request that you report the problem to Microsoft Product Support Services (PSS), and that you preserve the database where this error occurred. For additional information about how to contact PSS, visit the following Microsoft Web site:

http://support.microsoft.com

Event ID 399

This event indicates that a single-bit error has been detected, and that this error has been successfully corrected in memory. In this scenario, the page where this error occurred may or may not have been corrected on the physical hard disk. The single-bit error is not corrected on the physical hard disk unless the page has been written to. Therefore, if the database page is only read, the single-bit error is corrected in memory, but the single-bit error is not corrected on the physical hard disk.

Event ID: 399
Typically, event ID 399 appears similar to the following:

Event type: Warning

Event source: ESE

Event category: Database Page Cache

Event ID: 399

Date:

Time:

User: N/A

Computer:

Description: Information Store (1532) Storage Group 1: The database page read from the file "C:\Program Files\Exchsrvr\MDBDATA\Storage Group 1\MDB2.edb" at offset 102400 (0x0000000000019000) for 4096 (0x00001000) bytes failed verification. Bit 128 was corrupted and has been corrected. This problem is likely due to faulty hardware and may continue. Transient failures such as these can be a precursor to a catastrophic failure in the storage subsystem containing this file. Please contact your hardware vendor for further assistance diagnosing the problem.
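
If you collect these events for trend analysis, the key fields can be extracted from the description text. The following sketch assumes that the description matches the sample wording above, and the regular expression is an illustration rather than a documented message format. The computed block index is the zero-based 4 KB block within the file. It may not match the page numbers that ESE tools report.

```python
import re

# Sample description text, copied from the event ID 399 example above.
description = (
    r'Information Store (1532) Storage Group 1: The database page read from '
    r'the file "C:\Program Files\Exchsrvr\MDBDATA\Storage Group 1\MDB2.edb" '
    r'at offset 102400 (0x0000000000019000) for 4096 (0x00001000) bytes '
    r'failed verification. Bit 128 was corrupted and has been corrected.'
)

# Assumed pattern, based only on the sample wording.
pattern = re.compile(
    r'file "([^"]+)" at offset (\d+) \(0x[0-9a-fA-F]+\) '
    r'for (\d+) \(0x[0-9a-fA-F]+\) bytes failed verification\. '
    r'Bit (\d+) was corrupted'
)

match = pattern.search(description)
path = match.group(1)
offset = int(match.group(2))
length = int(match.group(3))
bit = int(match.group(4))

# Zero-based index of the 4 KB block that contains the corrected bit.
block_index = offset // length
```

For the sample event, the offset 102400 divided by the 4096-byte page length gives block 25.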

Event ID: 474
An unrecoverable (or multiple-bit) error is still reported as event 474 in Exchange 2003 SP1. Typically, event ID 474 appears similar to the following:

Event type: Error

Event source: ESE

Event category: Logging/Recovery

Event ID: 474

Date:

Time:

User: N/A

Computer:

Description: Information Store (1532) Storage Group 1: The database page read from the file "C:\Program Files\Exchsrvr\MDBDATA\Storage Group 1\MDB2.edb" at offset 12611584 (0x0000000000c07000) for 4096 (0x00001000) bytes failed verification due to a page checksum mismatch. The expected checksum was 8700524288068713684 (0x78be78be1dfe7cd4) and the actual checksum was 564489450306895060 (0x07d5782a0cff7cd4). The read operation will fail with error -1018 (0xfffffc06). If this condition persists then please restore the database from a previous backup. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.

In earlier versions of Exchange, event ID 475 is also used to report the occurrence of a -1018 error. Exchange 2003 SP1 does not use event ID 475. Exchange 2003 SP1 uses event ID 474 to report the occurrence of an unrecoverable -1018 error, and event ID 399 to report the occurrence of a recoverable -1018 error.
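
The event IDs discussed in this article can be collected in a small lookup table, for example in a monitoring script. The table and helper names are hypothetical. The sketch also shows how the decimal error code -1018 corresponds to the hexadecimal value 0xfffffc06 in the event ID 474 description, assuming a 32-bit two's complement representation.

```python
# Summary of the ESE events described in this article for Exchange 2003 SP1.
EVENT_MEANING = {
    398: "single-bit error repaired, but the page failed a logical validity test",
    399: "single-bit error detected and corrected in memory (recoverable)",
    474: "unrecoverable -1018 error; repair or restore the database",
}

def needs_repair_or_restore(event_id: int) -> bool:
    """True for the event that reports an unrecoverable -1018 error."""
    return event_id == 474

def as_uint32_hex(value: int) -> str:
    """Format a signed value as 32-bit two's complement hexadecimal."""
    return f"0x{value & 0xFFFFFFFF:08x}"

error_hex = as_uint32_hex(-1018)
offset_474 = int("0x0000000000c07000", 16)   # offset from the event 474 sample
```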

