Microsoft KB Archive/51170

ASCII 0 - 26 Must Be Preceded by Null in LINE SEQUENTIAL

PSS ID Number: Q51170
Article last modified on 04-23-1990

3.00 3.00a | 3.00 3.00a
MS-DOS     | OS/2

Summary: Microsoft COBOL Version 3.00 or 3.00a LINE SEQUENTIAL files should not contain lower order ASCII characters unless each of these characters is preceded by a null byte (ASCII 0). Preceding a lower order ASCII character with a null byte is called “escaping” the character. In the following text, “lower order ASCII” refers to the ASCII characters in the range 0 hex to 1A hex (0 decimal through 26 decimal), that is, the null character and the control characters CTRL+A through CTRL+Z (^A through ^Z).

READing a record that contains lower order ASCII characters not preceded by a null character and then REWRITEing that record may corrupt the file. Any further attempt to READ such a file may cause an error. This is not a problem with the COBOL compiler but a side effect of using the REWRITE statement. The information below explains why this can occur and how you can work around it.

This information applies to Microsoft COBOL 3.00 and 3.00a for MS-DOS and MS OS/2.

More Information: The lower order ASCII characters have special meaning and are not normally part of a strictly ASCII text file, or LINE SEQUENTIAL file. When a COBOL 3.00 or 3.00a program WRITEs a record that contains lower order ASCII characters to a LINE SEQUENTIAL file, it automatically precedes each one with a null character. In this way, when the LINE SEQUENTIAL file is READ back into a program and a null character is encountered, the COBOL program knows that the next character is data and is not to be interpreted as an actual control character. Therefore, a COBOL program should not read a LINE SEQUENTIAL file that contains lower order ASCII characters that are not preceded by a null character. This information is documented on Page 5-5 of the “Microsoft COBOL Compiler 3.0: Operating Guide.”

[Note: Although the null switch parameter (N) allows you to suppress writing out the ASCII null characters, it does not allow a file containing the lower order ASCII characters to be read in without leading null bytes.]

Using the REWRITE statement on files containing lower order ASCII characters not preceded by null characters can cause problems, as explained below.

The REWRITE verb can only write a record that is exactly the same length as the record that was just READ. This is stated in General Rule 2 of the description of the REWRITE statement on Page 5-89 of the “Microsoft COBOL Compiler 3.0: Language Reference Manual.”

To demonstrate this, consider the following LINE SEQUENTIAL file. It contains a single record that is five characters long, and that record is followed by a carriage return and a linefeed (as all LINE SEQUENTIAL records are). Graphically, it looks like the following:

   Character:  A   B   ^E  C   D   <carriage return>  <linefeed>
   Hex code:   41  42  05  43  44  0D                 0A

(Note: The carriage return and linefeed are not actually read in.) This is the total contents of the file.
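The escaping convention described above can be modelled in a few lines. The following Python sketch only illustrates the byte layout; it is not the COBOL run-time code. The names escape_record and unescape_record are hypothetical, and the reader model (a null byte means that the next byte is data) is an assumption based on the description above.

```python
# Illustrative model of null-escaping in a LINE SEQUENTIAL record.
# Assumption: "lower order ASCII" means byte values 0x00 through 0x1A (0-26).
NUL = 0x00
LOW_MAX = 0x1A  # 26 decimal, CTRL+Z


def escape_record(data: bytes) -> bytes:
    """Insert a null byte before every byte in the range 0x00-0x1A."""
    out = bytearray()
    for b in data:
        if b <= LOW_MAX:
            out.append(NUL)
        out.append(b)
    return bytes(out)


def unescape_record(stored: bytes) -> bytes:
    """Reverse escape_record: a null byte marks the next byte as data."""
    out = bytearray()
    i = 0
    while i < len(stored):
        if stored[i] == NUL and i + 1 < len(stored):
            i += 1  # skip the escape byte and keep the byte that follows it
        out.append(stored[i])
        i += 1
    return bytes(out)


record = b"AB\x05CD"              # A B ^E C D, as in the example file
stored = escape_record(record)
print(stored.hex(" "))            # 41 42 00 05 43 44
print(unescape_record(stored) == record)  # True
```

Note that the escaped form is one byte longer than the record for every lower order ASCII character it contains.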
When the COBOL program below READs this record, it correctly interprets the ^E and reads it in as a 1-byte value. However, when the program REWRITEs the record, it attempts to put a null character before the ^E. This is because a WRITE or REWRITE inserts a null byte (00 hex) in front of every lower order ASCII character. This means that the REWRITE statement actually attempts to write the following record:

   Character:  A   B   <null char>  ^E  C   D
   Hex code:   41  42  00           05  43  44

The original record was 5 bytes long (not counting the carriage return and linefeed), but this record is 6 bytes long. When the REWRITE is performed, the new 6-byte line is written out, causing the carriage return to be overwritten. After this, any attempt to READ the file will fail at that record. The READ error occurs because each record in a LINE SEQUENTIAL file must be terminated by both a carriage return and a linefeed. This is not a problem with the compiler: once the terminator has been overwritten, the file is corrupted, and a corrupted file cannot be read correctly.

If a file with embedded lower order ASCII characters must be read, it should be converted to the proper format by writing a program that READs each record in the file and then WRITEs it to a completely new file. This will correctly WRITE each record with a null character added before each lower order ASCII character and both a carriage return and a linefeed at the end of each record.
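The size mismatch can be reproduced outside COBOL. The following Python sketch simulates the effect on the file image only. It assumes that REWRITE writes the escaped record bytes at the record's original offset without resizing the file, as the description above implies. The helper names escape_record and rewrite_in_place are hypothetical.

```python
# Simulation of an in-place rewrite that grows a record by one escape byte.
# Assumption: lower order ASCII is 0x00-0x1A and each such byte gains a
# leading null on output; only the record bytes themselves are rewritten.
LOW_MAX = 0x1A


def escape_record(data: bytes) -> bytes:
    """Insert a null byte before every byte in the range 0x00-0x1A."""
    out = bytearray()
    for b in data:
        if b <= LOW_MAX:
            out.append(0x00)
        out.append(b)
    return bytes(out)


def rewrite_in_place(file_bytes: bytes, offset: int, record: bytes) -> bytes:
    """Overwrite file_bytes at offset with the escaped form of record."""
    escaped = escape_record(record)
    buf = bytearray(file_bytes)
    buf[offset:offset + len(escaped)] = escaped  # same-length slice: no resize
    return bytes(buf)


before = b"AB\x05CD\r\n"                 # 5-byte record + CR + LF
after = rewrite_in_place(before, 0, b"AB\x05CD")

print(before.hex(" "))   # 41 42 05 43 44 0d 0a
print(after.hex(" "))    # 41 42 00 05 43 44 0a
print(b"\r\n" in after)  # False: the carriage return has been overwritten
```

In this model the file is still 7 bytes long, but the sixth byte now holds the letter D instead of the carriage return, so the record no longer ends with the required carriage return and linefeed pair.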

Code Example

The following code example can be used with the record described above to duplicate the problem:

Compile line: COBOL rewr.cob;
Link line:    LINK rewr;

      $set MS(2) ERRQ
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REWR.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT TESTFILE ASSIGN TO DISK
               ORGANIZATION LINE SEQUENTIAL
               ACCESS SEQUENTIAL
               FILE STATUS IS STATUSBYTES.
       DATA DIVISION.
       FILE SECTION.
       FD  TESTFILE
           VALUE OF FILE-ID IS "TESTFILE".
       01  REC PIC X(10).
       WORKING-STORAGE SECTION.
       01  STATUSBYTES PIC XX.
       PROCEDURE DIVISION.
       MAIN SECTION.
       BEGIN.
           DISPLAY (1, 1) ERASE.
      * Open for update so that the record just read can be rewritten.
           OPEN I-O TESTFILE.
           IF STATUSBYTES NOT = ZERO
               DISPLAY (1, 1) "STATUS", STATUSBYTES
               STOP RUN.
      * Read the record that contains the unescaped ^E.
           READ TESTFILE.
           IF STATUSBYTES NOT = ZERO
               DISPLAY (1, 1) "STATUS", STATUSBYTES
               STOP RUN.
      * Rewrite it; the added null byte makes it one byte longer.
           REWRITE REC.
           IF STATUSBYTES NOT = ZERO
               DISPLAY (1, 1) "STATUS", STATUSBYTES
               STOP RUN.
           CLOSE TESTFILE.
           STOP RUN.
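To run the example, TESTFILE must contain the raw ^E byte, which is awkward to type in most editors. As one possible way to prepare it, the following Python sketch writes the seven bytes of the record described above. The helper name make_testfile is hypothetical, and the file name is taken from the FILE-ID in the program.

```python
# Create the one-record test file: A B ^E C D followed by CR LF.
from pathlib import Path


def make_testfile(path: str = "TESTFILE") -> int:
    """Write the 7-byte example file and return the number of bytes written."""
    data = bytes([0x41, 0x42, 0x05, 0x43, 0x44, 0x0D, 0x0A])
    return Path(path).write_bytes(data)


if __name__ == "__main__":
    print(make_testfile())  # 7
```

The file is written in binary mode so that no newline translation changes the carriage return and linefeed bytes.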

Copyright Microsoft Corporation 1990.