Detecting the encoding of an XML document with SAX and XNI

2009-6-21 11:17:01  By: Anonymous  Source: IBM

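This approach works about 90% of the time, possibly a little more. However, SAX parsers are not required to support the Locator interface, let alone Locator2 or other extension interfaces, so it can fail with some parsers. If you know you are using Xerces, a second option is to use XNI.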
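Because locator support is optional, a SAX handler can check for Locator2 before relying on it. The following is a minimal sketch of that defensive check, not the article's own listing. The class name LocatorEncodingProbe and the method detect() are hypothetical, the parser is whatever JAXP's SAXParserFactory returns by default, and reading the encoding in startElement() (after any XML declaration has been processed) is an assumption made for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original listings.
class LocatorEncodingProbe extends DefaultHandler {

    private Locator locator;             // stays null if the parser supplies none
    private String encoding = "unknown"; // fallback when the parser cannot say
    private boolean stopped = false;     // true once we abort the parse on purpose

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        // Locator2 is an optional extension, so test for it before casting.
        if (locator instanceof Locator2) {
            String reported = ((Locator2) locator).getEncoding();
            if (reported != null) {
                encoding = reported;
            }
        }
        stopped = true;
        throw new SAXException("Early termination");
    }

    // Returns the reported encoding, or "unknown" if the parser cannot say.
    static String detect(InputSource source) {
        LocatorEncodingProbe probe = new LocatorEncodingProbe();
        try {
            XMLReader reader =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(probe);
            reader.parse(source);
        } catch (SAXException ex) {
            // Only the deliberate early termination is swallowed.
            if (!probe.stopped) {
                throw new IllegalStateException("Parse failed before the root element", ex);
            }
        } catch (ParserConfigurationException | IOException ex) {
            throw new IllegalStateException("Could not read the document", ex);
        }
        return probe.encoding;
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(detect(new InputSource(new ByteArrayInputStream(xml))));
    }
}
```

With a parser whose locator does not implement Locator2, detect() simply returns "unknown" instead of failing with a ClassCastException.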

Xerces Native Interface

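Using XNI is very similar to using SAX (in fact, in Xerces the SAX parser is a thin layer on top of the native XNI parser). If anything, this approach is a little easier, because the encoding is passed directly to startDocument() as an argument. You only need to read it, as shown in Listing 2.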

Listing 2. Using XNI to determine a document's encoding

import java.io.IOException;
import org.apache.xerces.parsers.*;
import org.apache.xerces.xni.*;
import org.apache.xerces.xni.parser.*;

public class XNIEncodingDetector extends XMLDocumentParser {

    public static void main(String[] args) throws XNIException, IOException {
        XNIEncodingDetector parser = new XNIEncodingDetector();
        for (int i = 0; i < args.length; i++) {
            try {
                XMLInputSource document = new XMLInputSource("", args[i], "");
                parser.parse(document);
            }
            catch (XNIException ex) {
                System.out.println(parser.encoding);
            }
        }
    }

    private String encoding = "unknown";

    @Override
    public void startDocument(XMLLocator locator, String encoding,
            NamespaceContext context, Augmentations augs)
            throws XNIException {
        this.encoding = encoding;
        throw new XNIException("Early termination");
    }
}

Note that, for reasons unknown, this technique works only with the actual Xerces classes in org.apache.xerces, not with the repackaged Xerces classes in com.sun.org.apache.xerces.internal that are bundled with Sun's JDK 6.


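XNI offers one more capability that SAX lacks. In rare cases, the encoding declared in the XML declaration is not the actual encoding. SAX reports only the actual encoding, but XNI can also tell you the declared encoding, through the xmlDecl() method, as shown in Listing 3.

Listing 3. Using XNI to determine a document's declared and actual encodings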
import java.io.IOException;
import org.apache.xerces.parsers.*;
import org.apache.xerces.xni.*;
import org.apache.xerces.xni.parser.*;

public class AdvancedXNIEncodingDetector extends XMLDocumentParser {

    public static void main(String[] args) throws XNIException, IOException {
        AdvancedXNIEncodingDetector parser = new AdvancedXNIEncodingDetector();
        for (int i = 0; i < args.length; i++) {
            try {
                XMLInputSource document = new XMLInputSource("", args[i], "");
                parser.parse(document);
            }
            catch (XNIException ex) {
                System.out.println("Actual: " + parser.actualEncoding);
                System.out.println("Declared: " + parser.declaredEncoding);
            }
        }
    }

    private String actualEncoding = "unknown";
    private String declaredEncoding = "none";

    @Override
    public void startDocument(XMLLocator locator, String encoding,
            NamespaceContext namespaceContext, Augmentations augs)
            throws XNIException {
        this.actualEncoding = encoding;
        this.declaredEncoding = "none"; // reset
    }

    // this method is not called if there's no XML declaration
    @Override
    public void xmlDecl(String version, String encoding,
            String standalone, Augmentations augs) throws XNIException {
        this.declaredEncoding = encoding;
    }

    @Override
    public void startElement(QName element, XMLAttributes attributes,
            Augmentations augs) throws XNIException {
        throw new XNIException("Early termination");
    }

}

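Usually, a mismatch between the declared encoding and the actual encoding indicates a bug in the server. The most common cause is an HTTP Content-type header that specifies a different encoding from the one declared in the XML declaration. In this case, strict adherence to the specifications requires the value in the HTTP header to take precedence. In practice, though, the value in the XML declaration is quite likely to be the correct one.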
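The mismatch described above can be detected by comparing the charset parameter of the Content-type header with the declared encoding. The sketch below assumes you already hold both values as strings. The class EncodingMismatchCheck and its methods are hypothetical names, and the header parsing is deliberately simplified (it splits on semicolons and does not handle every quoting rule of the HTTP grammar).

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of the original listings.
final class EncodingMismatchCheck {

    // Extracts the charset parameter from a Content-type value, if present.
    // Simplification: parameters are assumed to be separated by plain ';'.
    static Optional<String> charsetOf(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip one pair of surrounding double quotes, if any.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? Optional.empty() : Optional.of(value);
            }
        }
        return Optional.empty();
    }

    // Compares two encoding names, treating known aliases as equal.
    static boolean sameCharset(String a, String b) {
        try {
            return Charset.forName(a).equals(Charset.forName(b));
        } catch (IllegalArgumentException ex) {
            // Unknown or illegal names: fall back to a case-insensitive match.
            return a.equalsIgnoreCase(b);
        }
    }

    // True only when both encodings are known and they disagree.
    static boolean conflicts(String contentType, String declaredEncoding) {
        Optional<String> header = charsetOf(contentType);
        if (!header.isPresent() || declaredEncoding == null) {
            return false;
        }
        return !sameCharset(header.get(), declaredEncoding);
    }

    public static void main(String[] args) {
        System.out.println(conflicts("text/xml; charset=ISO-8859-1", "UTF-8"));
        System.out.println(conflicts("application/xml; charset=utf-8", "UTF-8"));
    }
}
```

Whether to trust the header or the declaration when conflicts() returns true remains a policy decision, as the paragraph above explains; the sketch only reports the disagreement.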
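Conclusion

Usually, you do not need to know the encoding of an input document. Simply let the parser handle the input document and write the results out in UTF-8. Sometimes, however, you do need to know the input encoding, and SAX and XNI offer quick and efficient ways to find it.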
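The usual case, where the parser handles the input encoding and the output is written as UTF-8, can be sketched with the identity transform from the standard javax.xml.transform API. The class Utf8Rewriter and the method toUtf8() are hypothetical names chosen for this illustration; the article itself does not prescribe a particular API for this step.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of the original listings.
final class Utf8Rewriter {

    // Parses XML in whatever encoding it uses and re-serializes it as UTF-8.
    static byte[] toUtf8(byte[] xml) {
        try {
            // A Transformer created without a stylesheet copies input to output.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            identity.transform(
                    new StreamSource(new ByteArrayInputStream(xml)),
                    new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException ex) {
            throw new IllegalArgumentException("Input could not be parsed as XML", ex);
        }
    }

    public static void main(String[] args) {
        // The input is encoded and declared as ISO-8859-1.
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><p>caf\u00e9</p>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(toUtf8(latin1), StandardCharsets.UTF_8));
    }
}
```

The caller never inspects the input encoding here; the parser detects it, which is the point of the paragraph above.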
