
= FIX: Some Character Encoders Ignore Null Bytes =

Article ID: 260823

Article Last Modified on 6/14/2006


APPLIES TO


 * Microsoft Java Virtual Machine, when used with:
 * Microsoft Windows 2000 Standard Edition
 * Microsoft Internet Explorer 3.0
 * Microsoft Internet Explorer 3.01
 * Microsoft Internet Explorer 3.02
 * Microsoft Internet Explorer 4.0 128-Bit Edition
 * Microsoft Internet Explorer 4.01 Service Pack 2
 * Microsoft Internet Explorer 4.01 Service Pack 1
 * Microsoft Internet Explorer 5.0
 * Microsoft Internet Explorer 5.01
 * Microsoft Software Development Kit for Java 2.02
 * Microsoft Software Development Kit for Java 3.0
 * Microsoft Software Development Kit for Java 3.1
 * Microsoft Software Development Kit for Java 3.2
 * Microsoft Software Development Kit for Java 4.0
 * Microsoft Visual J++ 6.0 Standard Edition




This article was previously published under Q260823



SYMPTOMS
When you use certain character encoders that are included with the Microsoft virtual machine (Microsoft VM) for Java, null bytes in data streams are ignored.



CAUSE
This behavior is caused by a code defect in the byte-to-char converter classes for the affected encodings.



STATUS
Microsoft has confirmed that this is a bug in the Microsoft products that are listed at the beginning of this article.

This bug has been verified to occur in the 2400, 3100, and 3200 series of the Microsoft VM.

This problem was corrected in Windows 2000 Service Pack 1.



MORE INFORMATION
In the 3200 series and earlier versions of the Microsoft VM, three encoders mistakenly ignore null bytes ("\0") in data streams. The encodings that ignore null bytes are:
 * Big5
 * GB2312
 * KSC5601
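
To make "ignoring a null byte" concrete, the following is a minimal sketch of a runtime check. The class name NullByteCheck and the method preservesNullBytes are hypothetical names chosen for illustration and are not part of the Microsoft VM or of any Java API. The check decodes a three-byte probe that contains an embedded null byte and reports whether the null character survives. On a VM without the defect, each available encoding should report that null bytes are preserved.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper for illustration only; not part of any Microsoft or Java API.
class NullByteCheck {

    // Returns true if the byte-to-char converter for the named encoding
    // keeps an embedded null byte when decoding.
    static boolean preservesNullBytes(String encoding)
            throws UnsupportedEncodingException {
        // 'a', null byte, 'b': all three bytes are single-byte characters
        // in the encodings discussed in this article.
        byte[] probe = {(byte) 'a', 0, (byte) 'b'};
        String decoded = new String(probe, encoding);
        return decoded.length() == 3 && decoded.charAt(1) == '\0';
    }

    public static void main(String[] args) {
        String[] encodings = {"Big5", "GB2312", "KSC5601"};
        for (int i = 0; i < encodings.length; i++) {
            try {
                String result = preservesNullBytes(encodings[i])
                        ? "null bytes preserved"
                        : "null bytes dropped";
                System.out.println(encodings[i] + ": " + result);
            } catch (UnsupportedEncodingException ex) {
                // The encoding is not installed on this VM.
                System.out.println(encodings[i] + ": encoding not available");
            }
        }
    }
}
```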

Steps to Reproduce Behavior

 * 1) Compile and run the sample code that is provided in this section.
 * 2) Notice the null byte at the end of the array that is used to construct the Java String object. On a 3200 series or earlier Microsoft VM, the output reports a string length of "6" for the three defective encodings and "7" for the other two (ASCII and UTF8).

import java.io.UnsupportedEncodingException;

public class EncodingTest2 {
    public static void main(String[] args) throws Exception {
        try {
            // The first three encodings are affected; the last two are not.
            String[] enc = {"Big5", "GB2312", "KSC5601", "ASCII", "UTF8"};
            // Seven bytes: "string" followed by a trailing null byte.
            byte[] data = {(byte)'s', (byte)'t', (byte)'r', (byte)'i', (byte)'n', (byte)'g', (byte)'\0'};
            for (int i = 0; i < enc.length; i++) {
                String tmpStr = new String(data, enc[i]);
                System.out.println("Encoding = " + enc[i]);
                System.out.println("\t\t\tString length = " + tmpStr.length());
                System.out.println();
            }
            // Wait for a key press so that the console window stays open.
            System.in.read();
        } catch (UnsupportedEncodingException ex) {
            ex.printStackTrace();
            System.in.read();
        }
    }
}
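
For code that must run on a VM that cannot be updated, one possible approach is to keep null bytes away from the converter altogether. The following is a sketch only and is not a workaround documented by Microsoft. The class NullSafeDecoder and its decode method are hypothetical names. The sketch assumes that a zero byte never forms part of a multibyte sequence in the encoding in use. That assumption holds for the usual Big5 and EUC byte layouts, but verify it for the converter that you actually use. The method splits the input at each null byte, decodes the segments in between, and inserts the null characters itself.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper for illustration only; not a Microsoft-documented workaround.
class NullSafeDecoder {

    // Decodes data with the named encoding without ever passing a null byte
    // to the converter. Assumes that 0x00 is never a lead or trail byte of a
    // multibyte character in that encoding.
    static String decode(byte[] data, String encoding)
            throws UnsupportedEncodingException {
        // StringBuffer is used so that the sketch also compiles on old VMs.
        StringBuffer out = new StringBuffer();
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            boolean atEnd = (i == data.length);
            if (atEnd || data[i] == 0) {
                if (i > start) {
                    // Decode the null-free segment [start, i).
                    out.append(new String(data, start, i - start, encoding));
                }
                if (!atEnd) {
                    // Re-insert the null character that was skipped.
                    out.append('\0');
                }
                start = i + 1;
            }
        }
        return out.toString();
    }
}
```

With the seven-byte array from the sample code, NullSafeDecoder.decode(data, "Big5") never hands the trailing null byte to the converter, so the returned string keeps it.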

