Microsoft KB Archive/314917


Article ID: 314917

Article Last Modified on 10/25/2007



APPLIES TO

  • Microsoft Exchange Server 2003 Standard Edition
  • Microsoft Exchange Server 2003 Enterprise Edition
  • Microsoft Exchange 2000 Server Standard Edition
  • Microsoft Exchange Server 5.5 Standard Edition
  • Microsoft Exchange Server 5.0 Standard Edition
  • Microsoft Exchange Server 4.0 Standard Edition



This article was previously published under Q314917

SUMMARY

This article provides information to help you understand and analyze -1018, -1019, and -1022 Exchange database errors. This article describes the differences among these three errors and the types of issues in the database that cause each of these three errors to be reported.

MORE INFORMATION

Exchange includes the functionality to detect file-level damage to pages in its databases. The three most common errors that are associated with file-level damage to an Exchange database are as follows:

  • -1018 JET_errReadVerifyFailure
  • -1019 JET_errPageNotInitialized
  • -1022 JET_errDiskIO

The following three levels of damage can occur in an Exchange database:

  • Page (file system) level
  • Database (JET database engine) level
  • Application (Exchange information store) level

The Esefile.exe utility can detect errors in databases at the page level. The Eseutil.exe utility can detect and repair problems at both the page level and the database level. The Isinteg.exe utility detects and repairs problems at the application level.

Damage at a lower level (the page level) almost always results in problems at the higher levels (the database or application level). Therefore, after you repair a database with Eseutil, you almost always need to use Isinteg afterward.

Damage at the database and application level is related to issues in Exchange code or in third-party programs that integrate with Exchange. Damage at the page level is typically caused by driver, firmware, or hardware issues, although page-level damage might also be caused by problems in Exchange.

You almost always find the root cause for a -1018 error in one of the underlying systems that Exchange depends on, not in Exchange code itself. There are very few exceptions to this rule, and the exceptions to date have concerned how Exchange reports a -1018 condition, not Exchange itself causing the damage. For more information, click the following article numbers to view the articles in the Microsoft Knowledge Base:

237953 Erroneous -1018 error returned during online backup


230215 Backup checksuming not performed on single processor computers


Although the majority of -1019 and -1022 errors are also caused by a fault in an underlying system, you cannot rule out an error in Exchange code as quickly for these two errors as you can for a -1018 error.

Error -1018 is the error that most commonly indicates that an Exchange database has suffered damage at the file system level. Therefore, most of this article focuses on error -1018.

There are three fundamental ways that data on disk can become damaged:

  • The wrong data is written to the storage media.
  • Data is written to the wrong place on the storage media.
  • Data is damaged or changed after being stored.

Although it is very difficult to prevent or correct 100 percent of all damage, it is relatively easy to detect a problem that has occurred. Exchange detects both incorrect and misplaced data in its database files and reports a -1018 error or -1019 error. If a file is severely damaged and parts of it are missing entirely or are otherwise inaccessible when Exchange tries to read the file, a -1022 error is reported.

How Exchange Calculates Checksums and Numbers Database Pages

To understand how the mechanisms that trigger -1018 and -1019 errors work, you must understand how an Exchange database stores data pages.

At the lowest logical level, you can view an Exchange database file as a set of 4-kilobyte (KB) pages, numbered in sequential order. Data is read and written to an Exchange database one page at a time.

Each page that contains data stores its own page number, along with a checksum that is calculated from all of the data on the page. The checksum value itself is the only part of the page that is not included in this calculation.

Checksum algorithms, including the checksum algorithm that Exchange uses, are well understood and relatively simple. They are designed so that the chances of the same checksum resulting from any two different pages are low, even if the difference between the pages is only a single bit.

Although a checksum test is sufficient to determine whether or not the page has been altered since it was written, a checksum test is not enough to make sure that the page is in the right place. Because of this, Exchange stamps each page with its own page number as well as a checksum.

The first two 4-KB pages in the database are reserved for the database "header." When the database is stopped, you can use the Eseutil utility's /MH switch to view this header. The header contains identifying information about the database as a whole.

After these first two header pages, all of the other pages in the database are data. The data pages all share a common structure. Each page has its own page header, which contains identifying information about the particular page, followed by actual data.

Because the first data page in an Exchange database is located after the two header pages, physical page 3 in the database is logical page 1, physical page 4 is logical page 2, and so on.

Logical page numbers in the database map directly to physical page numbers by the following formula:

logical page number = physical page number - 2


Because the logical and physical page structures of the database file are so closely related, Exchange can easily determine whether each logical page is in the correct physical location in the file.
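The mapping above can be sketched in a few lines of Python. This is only an illustration of the formula in this article, not Exchange code; the function names are hypothetical, and physical pages are assumed to be counted from 1, as in the text.

```python
HEADER_PAGES = 2  # the database header occupies the first two physical pages


def logical_from_physical(physical_page: int) -> int:
    """Apply: logical page number = physical page number - 2."""
    if physical_page <= HEADER_PAGES:
        raise ValueError("physical pages 1 and 2 hold the database header")
    return physical_page - HEADER_PAGES


def is_in_place(stamped_logical_page: int, physical_page: int) -> bool:
    """True when the page number stamped on a page matches its location."""
    return stamped_logical_page == logical_from_physical(physical_page)


print(logical_from_physical(3))  # physical page 3 is logical page 1
print(is_in_place(2, 4))         # a page stamped 2 belongs at physical page 4
```

A page whose stamped number fails this comparison is in the wrong place, regardless of whether its checksum is valid.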

The only pages in the database for which a checksum is not calculated are "uninitialized pages." These are blocks of pages that are created when the database size is extended to make room for more data. An uninitialized page is one that has a zero checksum and a zero page number. Typically, every byte of an uninitialized page is filled with character 0x00, but this may not be true for databases that have been upgraded from Exchange Server 4.0 or Exchange Server 5.0.

After an uninitialized page is used for the first time, it does not return to the uninitialized state, even if it is emptied. Instead, a flag is set on the emptied page to mark it available for re-use. The page still carries a page number and checksum, even when it is empty.

Exchange Server 2003 Service Pack 1 (SP1) changed the checksum algorithm and the page format that are used. Also, Exchange Server 2003 SP1 introduced an error correcting code (ECC) algorithm to detect and to automatically correct single-bit errors. For more information about this new functionality, click the following article number to view the article in the Microsoft Knowledge Base:

867626 New error correcting code is included in Exchange Server 2003 Service Pack 1


What Causes a -1018 Error

Exchange reports a -1018 error when an initialized page in the database file is found with either of the following conditions:

  • The checksum that is stored on the page does not match the result of the checksum recalculation that is performed as the page is read.
  • The page number that is stored on the page does not match the page number that should be on the page, given the page's physical location in the database file.
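The two conditions above can be sketched as follows. This is a minimal illustration, not the Exchange implementation. It assumes a 4-byte checksum at the start of the page followed by a 4-byte page number (the field order that the Eseutil page dump in this article shows), little-endian byte order, and it uses zlib.crc32 purely as a stand-in checksum. All function names are hypothetical.

```python
import struct
import zlib

PAGE_SIZE = 4096


def stand_in_checksum(page: bytes) -> int:
    """Stand-in checksum over everything except the 4-byte checksum field."""
    return zlib.crc32(page[4:])


def build_page(page_number: int, payload: bytes) -> bytes:
    """Stamp the page number, pad to a full page, then add the checksum last."""
    if len(payload) > PAGE_SIZE - 8:
        raise ValueError("payload does not fit in one page")
    body = struct.pack("<I", page_number) + payload.ljust(PAGE_SIZE - 8, b"\x00")
    checksum = stand_in_checksum(b"\x00" * 4 + body)
    return struct.pack("<I", checksum) + body


def read_verify_failures(page: bytes, expected_page_number: int) -> list:
    """Return the -1018 conditions that an initialized page triggers, if any."""
    stored_checksum, stored_page_number = struct.unpack_from("<II", page, 0)
    failures = []
    if stored_checksum != stand_in_checksum(page):
        failures.append("checksum mismatch")
    if stored_page_number != expected_page_number:
        failures.append("page number mismatch")
    return failures


good = build_page(70, b"message data")
print(read_verify_failures(good, 70))  # no failures
print(read_verify_failures(good, 6))   # intact page found at the wrong location
```

Note that an intact page in the wrong location fails only the page number test, which is why the checksum alone is not enough.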

Exchange might be responsible for self-generating a -1018 error if Exchange does one of the following:

  • Constructs a page that has the wrong checksum.
  • Constructs a page correctly, but tells the operating system to write the page in the wrong location.

After a system administrator encounters a -1018 error, if the administrator runs diagnostic hardware tests against the server and these tests report no issues, the administrator might conclude that Exchange must be responsible for the issue because the hardware passed the initial analysis.

However, in case after case, further investigation by Microsoft or hardware vendors uncovered subtle issues in hardware, firmware, or device drivers that are actually responsible for damaging the database file.

Ordinary diagnostic tests might not detect all of the transient faults for several reasons. Issues in firmware or driver software might fall outside the capabilities of diagnostic programs. Diagnostic tests might be unable to adequately simulate long run times or complex loads. Also, the addition of diagnostic monitoring or debug logging might change the system enough to prevent the issue from appearing again.

The simplicity and stability of the Exchange mechanisms that generate checksums and write pages to the database file are another reason why an Exchange issue is unlikely to be the root cause of a -1018 error. The checksum and incorrect-page detection mechanisms are simple and reliable, and they have remained fundamentally the same since the first Exchange release, except for minor changes to adapt to page format changes between database versions.

To explain further, a checksum is generated for a page that is about to be written to disk after all of the other data has been written to the page, including the page number itself. After Exchange adds the checksum to the page, Exchange instructs the Microsoft Windows operating system to write the page to disk by using standard, published Windows application programming interfaces (APIs).

The checksum might be generated correctly for a page, but then the page might be written to the wrong location on the hard disk. This may be caused by a transient memory error, such as a "bit flip." For example, suppose Exchange constructs a new version of page 70. The page itself does not experience an error, but the copy of the page number that is used by the disk controller or by the operating system is randomly altered. This problem may occur if 70 (binary 1000110) has been changed to 6 (binary 0000110) by an unstable memory cell. The page's checksum is still correct, but the location of the page in the database is now wrong. Exchange reports a -1018 error for the page when it detects that the logical page number does not match the physical location of the page.

Another kind of page numbering error (caused by Exchange) might occur if Exchange writes the wrong page number on the page itself. But this causes other errors, not the -1018 error. If Exchange writes 71 on page 70 and then calculates the checksum on the page correctly, the page is written to location 71 and passes both the page number and checksum tests.
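The bit-flip arithmetic in the page 70 example can be checked with a short Python sketch. The helper name is hypothetical; the only assumption is that a single bit of the page number changes.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return value with one bit inverted, as an unstable memory cell might do."""
    return value ^ (1 << bit)


intended = 70
misdirected = flip_bit(intended, 6)  # bit 6 carries the weight 64

print(f"{intended:07b} -> {misdirected:07b}")  # 1000110 -> 0000110
print(misdirected)                             # 6
```

The page is still stamped with 70 and its checksum is still valid, but it now sits where page 6 belongs, so the page number comparison fails.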

Frequently, a single -1018 error that is reported in an Exchange database does not cause the database to stop or result in a symptom other than the presence of the -1018 error itself. The page might be in a folder that is infrequently accessed (for example, the Sent Items or Deleted Items folder) or in an attachment that is seldom opened, or the page might even be empty.

Even though a single -1018 error is unlikely to cause extensive data loss, -1018 errors are still cause for concern because a -1018 error is proof that your storage system failed to reliably store or retrieve data at least once. Although the -1018 error might be a transient issue that will never occur again, it is more likely that this error is an early warning of an issue that will become progressively worse. Even if the first -1018 error is on an empty page in the database, you cannot know which page might be damaged next. If a critical global table is damaged, the database might become unstartable, and database repair might be unsuccessful or only partly successful.

After a -1018 error is logged, you must consider and plan for the possibility of imminent failure or further random damage to the database until you find and eliminate the root cause.

Recovering from -1018 Errors

Exchange treats a page that fails with a -1018 error as completely unreadable to prevent action on random data from causing further problems in the database.

A page that fails with a -1018 error cannot be repaired or salvaged. It must be expunged from the database. There are three methods that you can use to expunge the page from the database:

  • Restore the database from an online backup.
  • Use the Eseutil.exe /D switch to do an offline defragmentation of the database.
  • Use the Eseutil.exe /P switch to repair the database.

Restore the Database from an Online Backup

If a -1018 error is found during online backup, the backup stops. This ensures that the damaged page does not exist in the last successful backup. If circular logging is disabled, you can restore the most recent available full backup, and then roll the database forward from the succeeding transaction logs.

Use the Eseutil.exe "/D" Switch to Do an Offline Defragmentation of the Database

This method is effective if the -1018 error is reported on an empty page. If the -1018 error occurs only during online backup or nightly online maintenance, this indicates that the page is seldom accessed or that it may even be empty. Offline defragmentation discards all empty pages and secondary indexes in a database.

Use the Eseutil.exe "/P" Switch to Repair the Database

A bad page is not repaired if you use this method, but the bad page is discarded. If the page that is involved is a "leaf page," some data loss occurs. A leaf page in the database is a page that carries actual data. Interior pages carry only structural and logical information. In most cases, Eseutil can completely reconstruct a table if an interior page is lost. However, the majority of pages in a database are leaf pages.

Eseutil's repair functionality works well, and in most cases can restore a database to operation with minimal data loss. However, if many pages are damaged, or critical system tables are lost, the data loss may be catastrophic or the database may be unrepairable.

Repairing a database is usually an inferior strategy compared to restoring from a backup and rolling the database forward because repair usually takes longer than restoration and is riskier. Choose repair only if:

  • You do not have a backup.
  • You cannot roll forward completely from your backup.

Before you repair or restore a database, always make a backup copy of the current database files. If restoration does not work, you can repair the existing database. If repair does not work, but the previous copy of the database is still startable, you might be able to salvage data that would otherwise be lost.

Important After you repair a database, you must check the repair count in the database header. If the count is greater than zero, you must perform an offline defragmentation by using Eseutil, and then you must repair the database at the information store level with the Isinteg utility. If you do not do so, users may encounter issues, such as an inability to open messages or attachments, or references in their mailboxes to items that no longer exist.

To check the repair count, examine the screen output that is generated when you run the following command:

ESEUTIL /MH [database_file_name]


To perform an offline defragmentation of the repaired database, run the following command:

ESEUTIL /D [database_file_name]


To do a comprehensive Isinteg fix after a repair in Exchange 2000, the Information Store service must be running, but the database that you want to repair must be dismounted. Run the following command for the database fix:

ISINTEG -S [server_name] -FIX -TEST ALLTESTS


To do a comprehensive Isinteg fix after a repair in Exchange Server 5.5, the Information Store service must be stopped. Run the following command, using the appropriate switch (-PRI or -PUB), depending on whether you are running repair against a private or public database:

ISINTEG -PRI|-PUB -FIX -TEST ALLTESTS


Note You can run Eseutil and Esefile against raw database files regardless of their file system locations. The database files do not even need to be on an Exchange server. But you must run Isinteg while the database is in place on a fully configured Exchange server because Isinteg operates at the information store level and uses the Information Store service to access the database.

For more information, click the following article number to view the article in the Microsoft Knowledge Base:

244525 How to run Eseutil on a computer without Exchange Server


Recovering from a -1019 Error

A -1019 error (JET_errPageNotInitialized) is reported when a page that is expected to be in use is uninitialized or empty. If the page number field on a page in use is 0x00000000, a -1019 error is reported instead of a -1018 error, even though the page might also fail its checksum test.
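The precedence described above can be sketched as a small decision function. This is an illustration of the rule as this article states it, not Exchange code; the function name is hypothetical, and the checksum result is passed in as a flag so that no particular checksum algorithm is assumed.

```python
JET_errSuccess = 0
JET_errReadVerifyFailure = -1018
JET_errPageNotInitialized = -1019


def classify_page_in_use(stored_page_number: int,
                         expected_page_number: int,
                         checksum_ok: bool) -> int:
    """Classify a page that the database expects to be in use."""
    # A zero page number wins over any checksum failure.
    if stored_page_number == 0:
        return JET_errPageNotInitialized
    if not checksum_ok or stored_page_number != expected_page_number:
        return JET_errReadVerifyFailure
    return JET_errSuccess


print(classify_page_in_use(0, 408, checksum_ok=False))     # -1019
print(classify_page_in_use(3106, 3106, checksum_ok=False))  # -1018
```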

The methods to correct a -1019 error are the same as those to correct a -1018 error. Note that a -1019 issue may go undetected longer than a -1018 issue because -1019 issues are not detected by online backup.

Although the root cause of a -1018 error is very likely to be outside of Exchange, a -1019 error might be caused by Exchange if logical pointers or links between pages are invalid.

However, a -1019 error is more commonly caused by file system corruption, or by the file system mapping pages into the database file that do not belong there.

Recovering from a -1022 Error

If Exchange asks the operating system for a page in the database, and an error occurs instead of the page data being returned, a -1022 error (JET_errDiskIO) results. The -1022 error is a generic error that appears whenever a disk input/output (I/O) problem prevents Exchange from gaining access to a requested page in the database.

The most common reason for a -1022 error is a database file that was severely damaged or truncated. If this issue occurs, Exchange requests a page number that is larger than the number of pages in the database file, and a -1022 error results. This issue can occur because of issues in the file system or because of improper transaction log replay.

Exchange 2000 contains extensive safeguards to prevent transaction log replay that might harm the database, but in Exchange Server 5.5 it was possible to play an incomplete set of log files and damage the database. For example, this issue might occur if replay should start from log 9, but instead replay is forced to start from log 10. Replay might be forced if an administrator deletes the checkpoint file and log 9. If a transaction in log 9 extends the size of the database, but log 9 is not played into the database, a reference in log 10 to the new pages that are added to the database causes a -1022 error. Sudden crashes, stops (hanging), and access violations are also common symptoms of replaying an incomplete transaction log set into a database.

Understanding and troubleshooting the root cause of a -1022 error is more complex than troubleshooting for a -1018 or -1019 error. If the error is caused by database damage in the file system, you need to verify or repair the file system, and then restore Exchange from a backup. Although repairing the database is still an option, repair is less likely to be successful than with the other errors because a -1022 error often signals extensive damage.

By far the most common reason for a -1022 error with an undamaged database is another application holding files open and preventing the Information Store service from accessing them. In such cases, you might also see -1032 errors (JET_errFileAccessDenied). Restarting all of the Exchange services or restarting the server might remove the lock.

Third-party programs, such as virus scanners, might block Exchange access to Exchange data. Always configure file-level virus scanners to exclude Exchange data files from the file scanning operation. Several virus scanners are available that take advantage of the Exchange virus scanning application programming interface (API) to scan messages and attachments in the information store.

Analyzing -1018 and -1019 Errors

The information in this section is intended primarily for technical support and vendor personnel who are involved in root cause analysis.

After an administrator finds a -1018 or -1019 error, the administrator needs to know at least three things:

  • What was on the damaged page
  • What the likelihood of successful repair is
  • What caused the damage in the first place

-1018 and -1019 errors might be reported at the command line when you start the service, in the Application event log, or in the output of Exchange utilities such as Eseutil. A -1018 error in the Application event log might not be reported when you run a database integrity check with the Eseutil /G command. In this situation, it is likely that the bad page is empty.

In most cases, the errors are reported in a form that allows you to identify the page that is reporting the problem. You can also scan the entire database with Esefile to identify bad pages. For more information about Esefile, click the following article number to view the article in the Microsoft Knowledge Base:

248406 Esefile support utility for Exchange Server 5.5 and Exchange 2000


The following examples are typical -1018 error descriptions from the Application event log for various versions of Exchange, along with analysis of the details in each error.

MSExchangeIS (248) Synchronous read page checksum error -1018
((1:3106 1:3106)(0-310013)(0-312215)) occurred.
Please restore the databases from a previous backup.
                    

In the preceding example, you can interpret the numbers in parentheses as follows:

  • (1:3106 1:3106) represents the page in the database that was requested (page 3106), and the page number that was actually found written on the page (page 3106). The 1: indicates that this is database 1, which is Priv.edb for Exchange Server 5.5. Database 2 is Pub.edb.
  • (0-310013) represents the dbtime value that is currently written on the page. The dbtime value is a 64-bit value that is written on each page that roughly correlates with how long it has been since the page was altered.
  • (0-312215) represents the current dbtime value for the database as a whole: likely the dbtime value that would be written on this page if the page was altered now. The dbtime value already on the page should always be smaller than the current dbtime value.

Given that the page number was read correctly from the page, and the dbtime values are reasonable (with the first dbtime value lower than the second), this page was not entirely replaced with a page from outside the database or a different page.
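The reasoning above can be sketched as a small parser for the parenthesized detail string. This is a hypothetical helper, not a Microsoft tool. It assumes that each dbtime is printed as two 32-bit halves in the form high-low, which is consistent with the 64-bit dbtime described above but is not stated in the event text itself.

```python
import re

# Matches ((db:requested db:found)(hi-lo)(hi-lo)), with or without spaces.
DETAIL = re.compile(
    r"\(\((\d+):(\d+) (\d+):(\d+)\)\s*\((\d+)-(\d+)\)\s*\((\d+)-(\d+)\)\)"
)


def parse_1018_detail(text: str) -> dict:
    """Extract page numbers and dbtime values from a -1018 event description."""
    match = DETAIL.search(text)
    if match is None:
        raise ValueError("no -1018 detail block found")
    (_db_req, page_req, _db_found, page_found,
     page_hi, page_lo, db_hi, db_lo) = (int(g) for g in match.groups())
    return {
        "requested_page": page_req,
        "found_page": page_found,
        "page_dbtime": (page_hi << 32) | page_lo,      # assumed high-low order
        "database_dbtime": (db_hi << 32) | db_lo,
    }


def header_looks_intact(detail: dict) -> bool:
    """Page number matches and the page dbtime does not exceed the database dbtime."""
    return (detail["requested_page"] == detail["found_page"]
            and detail["page_dbtime"] <= detail["database_dbtime"])


event = "error -1018 ((1:3106 1:3106)(0-310013)(0-312215)) occurred."
detail = parse_1018_detail(event)
print(detail["found_page"], detail["page_dbtime"], header_looks_intact(detail))
```

A result of True only suggests that the page header survived; the page still failed its checksum and must be treated as unreadable.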

You can use Esefile to output the page itself with a command similar to the following:

Esefile /d database.edb 3106 > 3106.txt


Because this page appears to be substantially intact with regard to structure, you might also be able to use Eseutil to view more logical information about the page. You can use the Exchange 2000 version of Eseutil to view page structure information from both Exchange 2000 and Exchange Server 5.5 databases.

Warning Do not use the Exchange 2000 version of Eseutil against an Exchange Server 5.5 database in any mode that writes to the database. To be safe, use only the /M switches, and never use the /P, /G, or /R switches. Also, do not copy Exchange 2000 versions of Eseutil.exe and Ese.dll to an Exchange Server 5.5 computer. Instead, copy these files to a remote server and provide an explicit command line Universal Naming Convention (UNC) path to the database that you are examining.

A command similar to the following command outputs logical information for a page to a text file:

Eseutil /M \\exchange1\d$\exchsrvr\mdbdata\priv.edb /p3106 > 3106.txt

Initiating FILE DUMP mode...
      Database: priv.edb
          Page: 3106

                        pgnoThis <0x02360004,  4>:  3106 (0x00000c22)
                        objidFDP <0x02360018,  4>:  19 (0x00000013)
                ulChecksumParity <0x02360000,  4>:  4269350574 (0xfe791eae)
        ** computed checksum: 157180847 (0x095e63af)
                   dbtimeDirtied <0x02360008,  8>:  310013 (0x000000000004bafd)
                          cbFree <0x0236001c,  2>:  436 (0x01b4)
                       ibMicFree <0x02360020,  2>:  3608 (0x0e18)
                     itagMicFree <0x02360022,  2>:  3 (0x0003)
               cbUncommittedFree <0x0236001e,  2>:  0 (0x0000)
                        pgnoNext <0x02360014,  4>:  3108 (0x00000c24)
                        pgnoPrev <0x02360010,  4>:  3088 (0x00000c10)
                          fFlags <0x02360024,  4>:  2050 (0x00000802)
                Leaf page
                Primary page

From this output, you can see that the page is a leaf page, which means that it has actual data on it. If you repair this database, the repair will result in the loss of at least this data. For more information about how to find out which table or mailbox the page belongs to, click the following article number to view the article in the Microsoft Knowledge Base:

262196 How to determine which mailbox owns a particular page in a database


If the Eseutil output does not list Leaf page for the page, the chances that repair will work completely are high. Most interior or structural pages can be completely reconstructed by the repair process.

The output might also show this as an "Empty Page." In this case, an offline defragmentation will discard the bad page from the database.

Remember that if a page has been completely replaced by a block of data that does not belong in the database file, the Eseutil output may be meaningless.

The following error is another example:

MSExchangeIS ((247) ) Synchronous read page checksum error -1018
((1:1057816 1:3688618971) (3688618971-3688618971) (0-16815256)) occurred.
Please restore the databases from a previous backup.

In this example, the page number that is actually read from the page (3688618971) does not match the page that was requested, which means that the page header area where the page number is stored is damaged. It is probable that the page number does not even exist in the database. To determine whether this is the case, multiply the page number by 4,096, and then compare that number to the byte size of the database file. In this case, the page number is unlikely to be one that Exchange originally wrote, unless the database is 15 terabytes in size (3,688,618,971 x 4,096 = 15,108,583,305,216).

Also notice that the first dbtime value repeats the page number pattern exactly. If you convert 3688618971 to hexadecimal (use Calc.exe in its Scientific mode), it becomes 0xDBDBDBDB. In Exchange 2000 and Exchange Server 5.5, the 8-byte dbtime value is stored immediately after the 4-byte page number value. Because of this, you know that at least twelve contiguous bytes for two different fields were overwritten with a specific pattern. If you use Esefile to look at this page directly, you will probably discover that the entire page was overwritten with the pattern 0xDB. Another frequently seen invalid byte pattern is 0xFF. If this was the case for the error above, the dbtime value would be 4294967295.
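The two checks above (the size test and the search for a repeated byte pattern) can be sketched in Python. The helper names are hypothetical, and the 4,096-byte page size is taken from this article.

```python
PAGE_SIZE = 4096


def min_file_size_for_page(page_number: int) -> int:
    """Bytes a database file needs before this page number could exist in it."""
    return page_number * PAGE_SIZE


def repeated_byte(value: int, width: int = 4):
    """Return the byte if every byte of the field is identical, else None."""
    raw = value.to_bytes(width, "big")
    return raw[0] if len(set(raw)) == 1 else None


suspect = 3688618971
print(min_file_size_for_page(suspect))  # 15108583305216 bytes
print(hex(suspect))                     # 0xdbdbdbdb
print(repeated_byte(suspect) == 0xDB)   # True
```

Compare the first result with the actual size of the database file. If the file is far smaller, the stored page number cannot be one that Exchange wrote.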

The following error provides page information as a byte offset into the file, not as a page number:

Information Store (2160) The database page read from the file
"d:\exchsrvr\MDBDATA\PRIV.EDB" at offset 897024 (0x00000000000db000)
for 4096 (0x00001000) bytes failed verification due to a page checksum
mismatch. The expected checksum was 2651583211 (0x9e0bf2eb) and the
actual checksum was 2651582996 (0x9e0bf214). The read operation will
fail with error -1018 (0xfffffc06). If this condition persists then
please restore the database from a previous backup.
                    

You can convert the first offset to the page number by removing the three trailing zeroes, subtracting 1 and converting the result to decimal. In this example, 0x00000000000db - 1 = 0xda = 218 decimal. You can use this decimal page number with Esefile or Eseutil.

Note You subtract only 1, instead of 2, to account for the two header pages in the database because offsets begin counting at 0x0 instead of 0x1. If you want to check the header pages with Esefile or Eseutil, reference page -1 and page 0.
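The conversion above is equivalent to dividing the byte offset by the 4,096-byte page size and subtracting 1. A minimal Python sketch, with a hypothetical function name:

```python
PAGE_SIZE = 4096  # 0x1000, which is why three trailing hex zeroes are dropped


def page_number_from_offset(offset: int) -> int:
    """Convert a byte offset in the database file to an Esefile/Eseutil page number."""
    if offset % PAGE_SIZE != 0:
        raise ValueError("offset is not aligned to a page boundary")
    return offset // PAGE_SIZE - 1


print(page_number_from_offset(0xDB000))  # 218
print(page_number_from_offset(0))        # -1, the first header page
print(page_number_from_offset(0x1000))   # 0, the second header page
```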

An Exchange database header actually requires only a single page. The second page is a "shadow" copy of the header. The checksums that are reported when you use the Esefile /D page dump function should always be the same for pages -1 and 0 after the database is shut down cleanly. If the header is rewritten during a crash, Exchange uses the header copy with a clean checksum when Exchange restarts.

Continuing with the preceding example, the checksums are actually very close to each other, differing in only two characters. When checksums are close, this indicates that the changes on the page were minimal: perhaps only a single bit error. It is very likely that this page still contains enough of its logical structure to make it worth analyzing with Eseutil /M /P.

The expected checksum in the error message is the checksum that is actually read from the page as it exists now in the database. The actual checksum in the error message is the checksum that Exchange dynamically re-calculates as it reads the page.

If the actual checksum on a page is 0x89abcdef, the page contains all 0x00 characters. If the actual checksum is 0x76543210, the page contains all 0xFF characters.
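These two constants are what a simple XOR checksum would produce if it started from the seed 0x89abcdef and skipped the 4-byte checksum field. The sketch below rests on that assumption. This article does not specify the algorithm, and the sketch does not apply to the format that was introduced in Exchange Server 2003 SP1. The function name is hypothetical.

```python
import struct

PAGE_SIZE = 4096
SEED = 0x89ABCDEF  # assumed seed; chosen because it reproduces the values above


def xor_checksum(page: bytes) -> int:
    """XOR every 32-bit word after the checksum field into the seed."""
    checksum = SEED
    for (dword,) in struct.iter_unpack("<I", page[4:]):
        checksum ^= dword
    return checksum


print(hex(xor_checksum(b"\x00" * PAGE_SIZE)))  # 0x89abcdef
print(hex(xor_checksum(b"\xff" * PAGE_SIZE)))  # 0x76543210
```

The all-0xFF result follows because the 1,023 remaining words (an odd count) XOR together to 0xFFFFFFFF, which inverts every bit of the seed.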

The following example is a -1019 error:

Information Store (3928) The database page read from the file
"d:\exchsrvr\MDBDATA\PRIV.EDB" at offset 1675264 (0x0000000000199000)
for 4096 (0x00001000) bytes failed verification because it contains
no page data.  The read operation will fail with error -1019 (0xfffffc05).
If this condition persists then please restore the database from a
previous backup.
                    

During typical operation, if a page can report either a -1019 error or a -1018 error, the -1019 error takes precedence and is reported. Remember that a -1019 error occurs whenever the page number that is written on a page is 0x00000000, but Exchange expects the page to be in use. It can be difficult to prove whether a -1019 error is caused because the file system mapped a block of zeroes into the database file, or because Exchange made a mistake and referenced an unused page as "in use."

You cannot tell from the preceding error whether the page is uninitialized or in some other state. You must use Esefile and Eseutil to further examine the page. In this example, the page number is 408 decimal (0x199 - 1 = 0x198).

You can use Eseutil to further examine the page. The pgnoThis value should match the page number that is queried, and the ulChecksumParity value reports an additional ** computed checksum value if the checksum on the page is wrong. You can use the Esefile /D switch to look at the raw page to determine whether it is uninitialized (all 0x00 characters).

False -1018 Errors

A "false" -1018 error occurs when the page on disk is correct, but the I/O system retrieves the data incorrectly. Such errors are usually transient and difficult to isolate. But even a "false" -1018 error deserves serious attention. The reliability of the storage system is still compromised, and the system might be in danger of additional issues or failure.

If you suspect transient read errors in your system, use the Esefile /D switch or Eseutil /M /P to verify the individual pages that are involved. If you use either utility to scan the entire database, you put strain on the I/O system that might result in more false positives.

Exchange Server 5.5 Service Pack 2 (SP2) added functionality to help identify transient read errors. Exchange re-reads a page 16 times after a read verification failure. If the page read eventually succeeds after several tries, it indicates that there is a system issue in reading reliably from the disk. Even if all 16 reads fail, it does not conclusively prove that the page is bad. Perform a secondary test with Esefile or Eseutil.
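The re-read logic can be sketched as follows. This is an illustration of the idea only. The read_page and verify callables are hypothetical placeholders for a real page read and a real checksum and page number test, and the default of 16 retries comes from this article.

```python
from typing import Callable


def classify_read(read_page: Callable[[], bytes],
                  verify: Callable[[bytes], bool],
                  retries: int = 16) -> str:
    """Read a page once, then re-read up to `retries` times if verification fails."""
    if verify(read_page()):
        return "ok"
    for attempt in range(1, retries + 1):
        if verify(read_page()):
            # The data on disk is good; the I/O path returned bad data earlier.
            return f"transient: succeeded on re-read {attempt}"
    # Not conclusive proof of a bad page; confirm with a secondary test.
    return "persistent: all re-reads failed"


# Simulated device that returns bad data twice before returning the real page.
responses = iter([b"bad", b"bad", b"good"])
print(classify_read(lambda: next(responses), lambda page: page == b"good"))
```

A "transient" result is itself a warning sign, because it shows that the storage system did not return the same data for the same request.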

Database Zeroing

Database zeroing is intended to obscure deleted information in an Exchange database so that it cannot be recovered or read by direct examination of the database file. For more information about database zeroing, click the following article number to view the article in the Microsoft Knowledge Base:

223161 Information on ESE zeroing


If database zeroing is enabled, sections of empty or partially empty pages might be overwritten with specific character patterns, but the page is still not returned to the uninitialized state.

