How RandomAccessFile Writes to Files

import java.io.*;

public class RandomAFTest {

    // Convert the string to bytes in the given charset and print them as hex.
    public static void printBytes(String str, String charsetName) {
        try {
            byte[] strBytes = str.getBytes(charsetName);
            StringBuilder strBytesContent = new StringBuilder();
            for (int i = 0; i < strBytes.length; i++) {
                // A byte is widened to int with sign extension, so 0xD6
                // prints as ffffffd6; mask with & 0xFF to print d6 instead.
                strBytesContent.append(Integer.toHexString(strBytes[i])).append(",");
            }
            System.out.println("The Bytes of String " + str +
                    " within charset " + charsetName + " are: " +
                    strBytesContent);
        } catch (UnsupportedEncodingException e) {
            // The named charset is not available on this JVM.
            e.printStackTrace();
        }
    }

    // Print the string's chars (UTF-16 code units) as hex.
    public static void printChars(String str) {
        int strlen = str.length();
        char[] strChars = new char[strlen];
        str.getChars(0, strlen, strChars, 0);
        StringBuilder strCharsContent = new StringBuilder();
        for (int i = 0; i < strlen; i++) {
            strCharsContent.append(Integer.toHexString(strChars[i])).append(",");
        }
        System.out.println("The chars of String " + str + " are: " +
                strCharsContent);
    }

    public static void main(String[] args) {
        String chStr = "中";
        // Print the string's bytes in GB2312.
        printBytes(chStr, "GB2312");
        // Print the string's bytes in UTF-8.
        printBytes(chStr, "UTF-8");
        // Print the string's Unicode chars.
        printChars(chStr);

        // try-with-resources closes every file even if a write fails.
        try (RandomAccessFile rfWrite =
                     new RandomAccessFile("c:\\testWrite.dat", "rw");
             RandomAccessFile rfWriteBytes =
                     new RandomAccessFile("c:\\testWriteBytes.dat", "rw");
             RandomAccessFile rfWriteChars =
                     new RandomAccessFile("c:\\testWriteChars.dat", "rw");
             RandomAccessFile rfWriteUTF =
                     new RandomAccessFile("c:\\testWriteUTF.dat", "rw")) {
            // getBytes() with no argument uses the platform default charset.
            rfWrite.write(chStr.getBytes());
            System.out.println("Done write!");
            rfWriteBytes.writeBytes(chStr);
            System.out.println("Done writeBytes!");
            rfWriteChars.writeChars(chStr);
            System.out.println("Done writeChars!");
            rfWriteUTF.writeUTF(chStr);
            System.out.println("Done writeUTF!");
        } catch (IOException e) {
            // Also covers FileNotFoundException, a subclass of IOException.
            e.printStackTrace();
        }
    }
}
---------------------------------------------------------------------------
Partial output of the program:
The Bytes of String 中 within charset GB2312 are: ffffffd6,ffffffd0,
The Bytes of String 中 within charset UTF-8 are: ffffffe4,ffffffb8,ffffffad,
The chars of String 中 are: 4e2d,

From this output we can see that "中" is encoded as follows:
* GB2312: D6 D0
* UTF-8: E4 B8 AD
* Unicode (UTF-16 code unit): 4E 2D
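
The ffffff prefix in the program output is not part of either encoding. It appears because a Java byte is signed, and Integer.toHexString receives the sign-extended int. The following is a minimal sketch of a hex formatter that masks each byte first. The class name HexDump and the method toHex are hypothetical names chosen for illustration, not part of the original program.

```java
import java.nio.charset.StandardCharsets;

class HexDump {
    // Formats each byte as two uppercase hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // b & 0xFF maps the signed byte (-128..127) to 0..255,
            // so 0xD6 is not sign-extended to ffffffd6.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+4E2D is written as an escape so the source-file encoding does not matter.
        String s = "\u4e2d";
        System.out.println(toHex(s.getBytes(StandardCharsets.UTF_8)));    // E4 B8 AD
        System.out.println(toHex(s.getBytes(StandardCharsets.UTF_16BE))); // 4E 2D
    }
}
```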

So what actually ends up in each file? Here is the hexadecimal content of each one:
File testWrite.dat:
D6 D0
File testWriteBytes.dat:
2D
File testWriteChars.dat:
4E 2D
File testWriteUTF.dat:
00 03 E4 B8 AD

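Combining the two results above (the program output and the file contents), it is easy to see that:
1. String.getBytes() returns the string's bytes in the platform's default encoding, and RandomAccessFile.write(byte[]) writes that byte array to the file unchanged. Because the bytes written are in the Windows platform's native encoding (GB2312-compatible on the test machine, hence D6 D0), the file can still be read correctly.
2. RandomAccessFile.writeBytes(String) discards the high 8 bits of each character (which is held in Unicode) and writes only the low 8 bits. For "中" (4E 2D), only 2D reaches the file.
3. RandomAccessFile.writeChars(String) writes each character's Unicode code unit to the file in big-endian order. Windows assumes little-endian order for Unicode files by default, so WordPad shows garbage when it opens the file. If we open the file (testWriteChars.dat) in a browser and set the encoding to Unicode Big-endian, the character "中" is displayed correctly.
4. RandomAccessFile.writeUTF(String) first writes 00 03 to indicate that 3 bytes of data follow, then writes the UTF-8 encoding of "中": E4 B8 AD.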
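
The four observations above can be reproduced with a short sketch like the one below. It writes to temporary files rather than c:\ and uses an explicit UTF-8 charset for write(byte[]), so its first line differs from the D6 D0 in testWrite.dat, which came from the platform default encoding. The names WriteMethodsDemo, Writer, bytesWrittenBy and hex are hypothetical helpers introduced for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class WriteMethodsDemo {
    // A small callback type so each write method can be passed as a lambda.
    interface Writer {
        void writeTo(RandomAccessFile raf) throws IOException;
    }

    // Runs one write against a fresh temporary file and returns the file's bytes.
    static byte[] bytesWrittenBy(Writer writer) throws IOException {
        File tmp = File.createTempFile("raf-demo", ".dat");
        try {
            try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
                writer.writeTo(raf);
            }
            return Files.readAllBytes(tmp.toPath());
        } finally {
            tmp.delete();
        }
    }

    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String s = "\u4e2d"; // U+4E2D, escaped to avoid source-encoding issues
        System.out.println("write(UTF-8): "
                + hex(bytesWrittenBy(raf -> raf.write(s.getBytes(StandardCharsets.UTF_8)))));
        System.out.println("writeBytes  : " + hex(bytesWrittenBy(raf -> raf.writeBytes(s))));
        System.out.println("writeChars  : " + hex(bytesWrittenBy(raf -> raf.writeChars(s))));
        System.out.println("writeUTF    : " + hex(bytesWrittenBy(raf -> raf.writeUTF(s))));
    }
}
```

On a typical JVM this should print E4 B8 AD, 2D, 4E 2D and 00 03 E4 B8 AD respectively.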

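Based on the analysis above, my recommendation for writing Chinese text with RandomAccessFile is to use RandomAccessFile.write(String.getBytes()). To be safe, you can go one step further and name the platform's default native encoding explicitly, for example: RandomAccessFile.write(String.getBytes("gb2312"))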
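
As a rough illustration of why naming the charset matters, the sketch below writes text with one charset and decodes it with either the same or a different one. The names ExplicitCharsetDemo and roundTrip are hypothetical. The demo prefers GB2312 when the JVM supports it and otherwise falls back to UTF-8, since GB2312 is not one of the charsets every Java platform is required to provide.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {
    // Writes text encoded with writeCharset, then reads the bytes back
    // and decodes them with readCharset.
    static String roundTrip(String text, Charset writeCharset, Charset readCharset)
            throws IOException {
        File tmp = File.createTempFile("raf-charset", ".dat");
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            raf.write(text.getBytes(writeCharset));
            byte[] buf = new byte[(int) raf.length()];
            raf.seek(0);
            raf.readFully(buf);
            return new String(buf, readCharset);
        } finally {
            tmp.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        String s = "\u4e2d\u6587"; // two Chinese characters, U+4E2D U+6587
        // Assumption: fall back to UTF-8 when GB2312 is unavailable on this JVM.
        Charset cs = Charset.isSupported("GB2312")
                ? Charset.forName("GB2312")
                : StandardCharsets.UTF_8;
        // Same charset on both sides: the text survives.
        System.out.println(cs + " round trip ok: " + roundTrip(s, cs, cs).equals(s));
        // Mismatched charsets: the bytes are intact but decode to different text.
        System.out.println("mismatch ok: "
                + roundTrip(s, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(s));
    }
}
```

The second line prints false: the file holds valid UTF-8 bytes, but a reader that assumes another encoding sees garbage, which is the same effect as opening the file in a tool with the wrong default encoding.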

The RandomAccessFile Javadoc defines each of these write methods differently.

* public void write(byte[] b): Writes b.length bytes from the specified byte array to this file, starting at the current file pointer.

* public final void writeBytes(String s) throws IOException
Writes the string to the file as a sequence of bytes. Each character in the string is written out, in sequence, by discarding its high eight bits. The write starts at the current position of the file pointer. (Note that the high 8 bits of every character are discarded.)

* public final void writeChar(int v) throws IOException
Writes a char to the file as a two-byte value, high byte first. The write starts at the current position of the file pointer. (This is big-endian storage. Note that, because of the x86 architecture, Windows uses little-endian by default.)

* public final void writeChars(String s) throws IOException
Writes a string to the file as a sequence of characters. Each character is written to the data output stream as if by the writeChar method. The write starts at the current position of the file pointer. (Note that writeChars writes each character the same way writeChar does.)

* public final void writeUTF(String str) throws IOException
Writes a string to the file using modified UTF-8 encoding in a machine-independent manner.
First, two bytes are written to the file, starting at the current file pointer, as if by the writeShort method giving the number of bytes to follow. This value is the number of bytes actually written out, not the length of the string. Following the length, each character of the string is output, in sequence, using the modified UTF-8 encoding for each character. (Note that writeUTF first writes two bytes giving the number of bytes that follow, and only then the modified UTF-8 encoding of the string.)
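
The Javadoc definitions above also tell us how to read the data back. The sketch below is one possible illustration: it decodes the output of writeChars as UTF-16 big-endian, reads the two-byte length prefix of writeUTF with readUnsignedShort, and recovers the string with readUTF. The names ReadBackDemo, charsRoundTrip, utfPrefix and utfRoundTrip are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class ReadBackDemo {
    // writeChars stores each char high byte first, so the raw bytes
    // decode correctly as UTF-16BE.
    static String charsRoundTrip(String text) throws IOException {
        File tmp = File.createTempFile("raf-chars", ".dat");
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            raf.writeChars(text);
            byte[] buf = new byte[(int) raf.length()];
            raf.seek(0);
            raf.readFully(buf);
            return new String(buf, StandardCharsets.UTF_16BE);
        } finally {
            tmp.delete();
        }
    }

    // Returns the two-byte length prefix that writeUTF puts before the data.
    static int utfPrefix(String text) throws IOException {
        File tmp = File.createTempFile("raf-utf", ".dat");
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            raf.writeUTF(text);
            raf.seek(0);
            return raf.readUnsignedShort();
        } finally {
            tmp.delete();
        }
    }

    // readUTF consumes the prefix and decodes the modified UTF-8 payload.
    static String utfRoundTrip(String text) throws IOException {
        File tmp = File.createTempFile("raf-utf", ".dat");
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            raf.writeUTF(text);
            raf.seek(0);
            return raf.readUTF();
        } finally {
            tmp.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        String s = "\u4e2d"; // U+4E2D
        System.out.println(charsRoundTrip(s).equals(s)); // true
        System.out.println(utfPrefix(s));                // 3 bytes follow the prefix
        // Modified UTF-8 encodes U+0000 in two bytes (C0 80), not one.
        System.out.println(utfPrefix("\u0000"));         // 2
        System.out.println(utfRoundTrip(s).equals(s));   // true
    }
}
```

Because the prefix is a 16-bit value and the encoding is modified UTF-8, a file written by writeUTF is best read back with readUTF rather than with a general-purpose UTF-8 decoder.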
