E-iceblue (冰蓝科技)

sales@e-iceblue.com

028-81705109

2790765778

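Spire.Office for Java 10.10.0 Released

2025-10-31

Spire.Office for Java 10.10.0 has been officially released. In this version, Spire.Doc for Java supports retrieving style-change revisions, Spire.PDF for Java supports setting the column widths of a PdfTable, and Spire.Presentation for Java supports converting Markdown to PPTX files. In addition, a number of issues that occurred when converting and processing Word, Excel, PDF, and PPT documents have been fixed. The new features and fixes are detailed below.

To get Spire.Office for Java 10.10.0, click: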

https://www.e-iceblue.cn/Downloads/Spire-Office-JAVA.html

Spire.Doc for Java

New features:

Supports accepting or rejecting selected revisions.

Document document = new Document();
document.loadFromFile(inputFile);
RevisionInfoCollection revisionInfoCollection = document.getRevisionInfos();
for (int i = 0; i < revisionInfoCollection.getCount(); i++) {
    RevisionInfo revisionInfo = revisionInfoCollection.get(i);
    if (revisionInfo.getRevisionType() == RevisionType.Format_Change) {
        // Accept this revision; to reject it instead, call revisionInfo.reject()
        revisionInfo.accept();
        // Step back one index, as the original sample does after handling a revision
        i--;
    }
}
document.saveToFile(outputFile, FileFormat.Docx);
document.close();
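
The `i--` in the loop above suggests that a handled revision drops out of the collection, so the next item slides into the current index. That behaviour is an assumption here; the release note does not state it. A minimal Python sketch of the same index-based pattern, with a plain list standing in for the revision collection (all names are hypothetical):

```python
def accept_format_changes(revisions):
    """Remove every 'format' entry from the list in place, mimicking accept()."""
    accepted = []
    i = 0
    while i < len(revisions):
        if revisions[i]["type"] == "format":
            # The handled item leaves the list, so the next one now sits at index i
            accepted.append(revisions.pop(i))
        else:
            i += 1
    return accepted


revisions = [
    {"type": "format", "text": "A"},
    {"type": "format", "text": "B"},
    {"type": "insert", "text": "C"},
]
accepted = accept_format_changes(revisions)
print([r["text"] for r in accepted])   # ['A', 'B']
print([r["text"] for r in revisions])  # ['C']
```

Advancing the index only when nothing was removed has the same effect as the `i--` step-back in the Java sample.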

Supports retrieving style-change revisions.

Document document = ConvertUtil.GetNewEngineDocument();
document.loadFromFile(inputFile);
RevisionInfoCollection revisionInfoCollection= document.getRevisionInfos();
for (RevisionInfo revisionInfo : (java.lang.Iterable<RevisionInfo>) revisionInfoCollection) {
    if(revisionInfo.getRevisionType()==RevisionType.Format_Change){
        if(revisionInfo.getOwnerObject() instanceof TextRange){
            TextRange range = (TextRange)revisionInfo.getOwnerObject();
            TestUtil.writeAllText(outputFile,"TextRange:"+range.getText()+"\r\n");
            document.setRevisionsView(RevisionsView.Original);
            TestUtil.writeAllText(outputFile,"Original bold："+range.getCharacterFormat().getBold()+"\r\n");
            document.setRevisionsView(RevisionsView.Final);
            TestUtil.writeAllText(outputFile,"Final bold："+range.getCharacterFormat().getBold()+"\r\n");
        }
    }
}
document.close();

Supports tracking revisions for style changes.

Document document = new Document();
document.loadFromFile("http://cdn.e-iceblue.cn/test.docx");
// Start tracking changes under the author name "e-iceblue"
document.startTrackRevisions("e-iceblue");
Paragraph firstParagraph = document.getSections().get(0).getParagraphs().get(0);
for (int i = 0; i < firstParagraph.getChildObjects().getCount(); i++) {
    if (firstParagraph.getChildObjects().get(i).getDocumentObjectType() == DocumentObjectType.Text_Range) {
        // Formatting changes made while tracking is on are recorded as revisions
        TextRange tr = (TextRange) firstParagraph.getChildObjects().get(i);
        tr.getCharacterFormat().setTextColor(Color.RED);
        tr.getCharacterFormat().setFontSize(28);
        tr.getCharacterFormat().setBold(true);
    }
}
document.getSections().get(0).getParagraphs().get(1).appendText("aaa");
document.stopTrackRevisions();
document.saveToFile("test-out.docx");

Supports setting the number of characters per line for the document grid.

sec.getPageSetup().setGridType(GridPitchType.Chars_And_Line);
sec.getPageSetup().setCharactersPerLine(30);

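Bug fixes:

Fixed an issue where accepting revisions produced incorrect results.
Fixed an issue where Word to PDF conversion produced incorrect results.
Fixed an issue where the highlight color could not be retrieved unless AcceptChanges() was called.
Fixed an issue where setting a table style threw an error.
Fixed an issue where the table of contents was updated incorrectly.
Fixed an issue where HTML to Word conversion produced incorrect results.
Fixed an issue where converting MHT files to docx produced garbled text.
Fixed an issue where loading a document threw an "IllegalArgumentException".
Fixed an issue where loading a document threw a "NullPointerException".
Fixed an issue where removing content controls produced incorrect results.
Fixed an issue where retrieved bookmarks were empty.
Fixed an issue where checking a checkbox failed.
Fixed an issue where editable regions could not be edited after revisions were accepted.
Fixed an issue where image content was lost when saving a document to the wps format.
Fixed an issue where Word to PDF conversion threw an exception when useHarfBuzzTextShaper(true) was used.
Fixed an issue where replaceBookmarkContent threw a "NullPointerException".
Fixed an issue where replacement produced incorrect results.
Fixed an issue where StructureDocumentTagCell.removeSelfOnly threw the exception "Cannot remove because there is no parent."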

Spire.XLS for Java

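Optimization:

Reduced memory consumption when loading Excel documents.

Bug fixes:

Fixed an issue where retrieving checkboxes failed.
Fixed an issue where the program hung when loading an Excel document with a JVM maximum memory setting configured.
Fixed an out-of-memory issue when converting Excel to PDF.
Fixed an issue where formatting was inconsistent when copying worksheets.
Fixed an issue where saving an Excel document threw the exception "error in set print area".
Fixed an issue where loading an Excel document threw the exception "Input string was not in the correct format".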

Spire.PDF for Java

New features:

Supports setting PdfTable column widths.

// Create PDF document
PdfDocument doc = new PdfDocument();
// Set margins
PdfUnitConvertor unitCvtr = new PdfUnitConvertor();
PdfMargins margin = new PdfMargins();
margin.setTop(unitCvtr.convertUnits(2.54f, PdfGraphicsUnit.Centimeter, PdfGraphicsUnit.Point));
margin.setBottom(margin.getTop());
margin.setLeft(unitCvtr.convertUnits(3.17f, PdfGraphicsUnit.Centimeter, PdfGraphicsUnit.Point));
margin.setRight(margin.getLeft());

// Add a page
PdfPageBase page = doc.getPages().add(PdfPageSize.A4, margin);

// Add table
PdfTable table = new PdfTable();
PdfSolidBrush brush = new PdfSolidBrush(new PdfRGBColor(Color.black));
table.getStyle().setBorderPen(new PdfPen(brush, 0.5f));
table.getStyle().getHeaderStyle().setStringFormat(new PdfStringFormat(PdfTextAlignment.Center));
table.getStyle().setHeaderSource(PdfHeaderSource.Rows);
table.getStyle().setHeaderRowCount(1);
table.getStyle().setShowHeader(true);
table.getStyle().setCellPadding(2);

// Set header font and style
PdfTrueTypeFont font = new PdfTrueTypeFont(new Font("SimSun", Font.PLAIN, 12));
table.getStyle().getHeaderStyle().setFont(font);
table.getStyle().getHeaderStyle().setBackgroundBrush(PdfBrushes.getCadetBlue());
PdfTrueTypeFont fontBody = new PdfTrueTypeFont(new Font("SimSun", Font.PLAIN, 10));
// Set even row font
table.getStyle().getDefaultStyle().setFont(fontBody);
// Set odd row font
table.getStyle().getAlternateStyle().setFont(fontBody);
// false: distribute by total width proportion, true: use set column width
table.getStyle().isFixWidth(true);

// Define data
String[] data = {"1;2;3;4;5",
        "A1;B1;1,391,190,000;18.2%; ",
        "A1;B1;126,490,000;1.66%; ",
        "A1;B1;65,648,054;0.86%; ",
        "A1;B1;82,665,600;1.08%; ",
        "A1;B1;37,119,000;0.49%; ",
        "A1;B1;327,216,000;4.29%; "
};
String[][] dataSource = new String[data.length][];
for (int i = 0; i < data.length; i++) {
    dataSource[i] = data[i].split("[;]", -1);
}

table.setDataSource(dataSource);
for(int i = 0; i < table.getColumns().getCount(); i++)
{
    PdfColumn column = table.getColumns().get(i);
    column.setWidth(50);
    column.setStringFormat(new PdfStringFormat(PdfTextAlignment.Center, PdfVerticalAlignment.Middle));
}

// Add table to page
table.draw(page, new Point2D.Float(0, 50));

// Save document
doc.saveToFile("addTable.pdf", FileFormat.PDF);
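
The margin setup in the sample converts centimetres to points before laying out the table. As a rough check of that arithmetic, the sketch below recomputes the margins and compares the usable page width with the table width. It assumes that column widths are given in points and that A4 is 595 pt wide (a rounded value, not read from the library); the helper name is hypothetical.

```python
POINTS_PER_INCH = 72.0
CM_PER_INCH = 2.54


def cm_to_points(cm):
    """Convert centimetres to PDF points (1 pt = 1/72 inch)."""
    return cm / CM_PER_INCH * POINTS_PER_INCH


# Assumed A4 width in points (rounded), not taken from Spire.PDF
A4_WIDTH_PT = 595.0

top = cm_to_points(2.54)
left = cm_to_points(3.17)
usable_width = A4_WIDTH_PT - 2 * left
table_width = 5 * 50  # five columns at the fixed width of 50 used in the sample

print(round(top, 2))           # 72.0
print(round(left, 2))          # 89.86
print(round(usable_width, 2))  # 415.28
print(table_width)             # 250
```

Under these assumptions the five fixed-width columns take up 250 pt of roughly 415 pt available, so the table fits between the left and right margins.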

Spire.Presentation for Java

New features:

Supports converting Markdown to PPTX files.

Presentation pt = new Presentation();
pt.loadFromFile("input.md", FileFormat.Markdown);
pt.saveToFile("output.pptx", FileFormat.PPTX_2013);
pt.dispose();

Added the AddFromSVGAsShape method (called as addFromSVGAsShapes in the sample) for converting SVG to shapes.

Presentation ppt = new Presentation();
ppt.loadFromFile("input.pptx");
for (int i = 0; i < ppt.getSlides().getCount(); i++)
{
    ppt.getSlides().get(i).getShapes().addFromSVGAsShapes("in.svg");
}
ppt.saveToFile("output.pptx", FileFormat.PPTX_2013);
ppt.dispose();

Bug fixes:

Fixed an issue where scanning an image threw "java.lang.OutOfMemoryError".
Fixed an issue where scanning an image returned incorrect data.

Spire.Barcode for Java

Bug fixes:

Fixed an issue where scanning an image threw "java.lang.OutOfMemoryError".
Fixed an issue where scanning an image returned incorrect data.

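Spire.Presentation 10.10.7 Reduces Save Time When Creating PPT Files from Templates

2025-10-29

Spire.Presentation 10.10.7 has been officially released. This version reduces the save time when creating PPT files from templates. It also adds support for setting table transparency, adjusts the usage of the AddDigitalSignature method, and fixes several issues with incorrect content when converting PPT to PDF. Details are as follows:

Adjustments:

Adjusted the usage of the AddDigitalSignature method.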

Presentation ppt = new Presentation();
ppt.LoadFromFile("in.pptx");
// Add a digital signature. Parameters: string certificatePath, string certificatePassword, string comments, DateTime signTime
ppt.AddDigitalSignature("test.pfx", "e-iceblue", "111", DateTime.Now);
ppt.SaveToFile("result.pptx", Spire.Presentation.FileFormat.Pptx2016);
ppt.Dispose();

New features:

Supports setting table transparency.

table.Fill.Transparency = 0.5f; // Value range is 0 to 1; the default table color is black
// To use a specific table color, set the cell fill as follows:
table[0, 0].FillFormat.FillType = Spire.Presentation.Drawing.FillFormatType.Solid;
table[0, 0].FillFormat.SolidColor.Color = Color.Orange;

Bug fixes:

Fixed an issue where content was incorrect when converting PPT to PDF.
Reduced the save time when creating PPT files from templates.

To get Spire.Presentation 10.10.7, click:

https://www.e-iceblue.cn/Downloads/Spire-Presentation-NET.html

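Spire.PDF for Java 11.10.3 Supports Setting PdfTable Column Widths

2025-10-29

Spire.PDF for Java 11.10.3 has been officially released. This version adds support for setting PdfTable column widths. Details are below.

New features:

Supports setting PdfTable column widths.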

// Create PDF document
PdfDocument doc = new PdfDocument();
// Set margins
PdfUnitConvertor unitCvtr = new PdfUnitConvertor();
PdfMargins margin = new PdfMargins();
margin.setTop(unitCvtr.convertUnits(2.54f, PdfGraphicsUnit.Centimeter, PdfGraphicsUnit.Point));
margin.setBottom(margin.getTop());
margin.setLeft(unitCvtr.convertUnits(3.17f, PdfGraphicsUnit.Centimeter, PdfGraphicsUnit.Point));
margin.setRight(margin.getLeft());

// Add a page
PdfPageBase page = doc.getPages().add(PdfPageSize.A4, margin);

// Add table
PdfTable table = new PdfTable();
PdfSolidBrush brush = new PdfSolidBrush(new PdfRGBColor(Color.black));
table.getStyle().setBorderPen(new PdfPen(brush, 0.5f));
table.getStyle().getHeaderStyle().setStringFormat(new PdfStringFormat(PdfTextAlignment.Center));
table.getStyle().setHeaderSource(PdfHeaderSource.Rows);
table.getStyle().setHeaderRowCount(1);
table.getStyle().setShowHeader(true);
table.getStyle().setCellPadding(2);

// Set header font and style
PdfTrueTypeFont font = new PdfTrueTypeFont(new Font("SimSun", Font.PLAIN, 12));
table.getStyle().getHeaderStyle().setFont(font);
table.getStyle().getHeaderStyle().setBackgroundBrush(PdfBrushes.getCadetBlue());
PdfTrueTypeFont fontBody = new PdfTrueTypeFont(new Font("SimSun", Font.PLAIN, 10));
// Set even row font
table.getStyle().getDefaultStyle().setFont(fontBody);
// Set odd row font
table.getStyle().getAlternateStyle().setFont(fontBody);
// false: distribute by total width proportion, true: use set column width
table.getStyle().isFixWidth(true);

// Define data
String[] data = {"1;2;3;4;5",
        "A1;B1;1,391,190,000;18.2%; ",
        "A1;B1;126,490,000;1.66%; ",
        "A1;B1;65,648,054;0.86%; ",
        "A1;B1;82,665,600;1.08%; ",
        "A1;B1;37,119,000;0.49%; ",
        "A1;B1;327,216,000;4.29%; "
};
String[][] dataSource = new String[data.length][];
for (int i = 0; i < data.length; i++) {
    dataSource[i] = data[i].split("[;]", -1);
}

table.setDataSource(dataSource);
for(int i = 0; i < table.getColumns().getCount(); i++)
{
    PdfColumn column = table.getColumns().get(i);
    column.setWidth(50);
    column.setStringFormat(new PdfStringFormat(PdfTextAlignment.Center, PdfVerticalAlignment.Middle));
}

// Add table to page
table.draw(page, new Point2D.Float(0, 50));

// Save document
doc.saveToFile("addTable.pdf", FileFormat.PDF);

To get Spire.PDF for Java 11.10.3, click:

https://www.e-iceblue.cn/Downloads/Spire-PDF-JAVA.html

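Spire.Doc for Java 13.10.6 Supports Retrieving Style-Change Revisions

2025-10-28

Spire.Doc for Java 13.10.6 has been officially released. This version supports retrieving style-change revisions, accepting or rejecting selected revisions, and tracking revisions for style changes. In addition, a number of issues that occurred when converting Word to PDF and when loading and saving Word documents have been fixed. Details are below.

New features:

Supports accepting or rejecting selected revisions.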

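Document document = new Document();
document.loadFromFile(inputFile);
RevisionInfoCollection revisionInfoCollection = document.getRevisionInfos();
for (int i = 0; i < revisionInfoCollection.getCount(); i++) {
    RevisionInfo revisionInfo = revisionInfoCollection.get(i);
    if (revisionInfo.getRevisionType() == RevisionType.Format_Change) {
        // Accept this revision; to reject it instead, call revisionInfo.reject()
        revisionInfo.accept();
        // Step back one index, as the original sample does after handling a revision
        i--;
    }
}
document.saveToFile(outputFile, FileFormat.Docx);
document.close();

Supports retrieving style-change revisions.
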
Document document = ConvertUtil.GetNewEngineDocument();
document.loadFromFile(inputFile);
RevisionInfoCollection revisionInfoCollection= document.getRevisionInfos();
for (RevisionInfo revisionInfo : (java.lang.Iterable<RevisionInfo>) revisionInfoCollection) {
    if(revisionInfo.getRevisionType()==RevisionType.Format_Change){
        if(revisionInfo.getOwnerObject() instanceof TextRange){
            TextRange range = (TextRange)revisionInfo.getOwnerObject();
            TestUtil.writeAllText(outputFile,"TextRange:"+range.getText()+"\r\n");
            document.setRevisionsView(RevisionsView.Original);
            TestUtil.writeAllText(outputFile,"Original bold："+range.getCharacterFormat().getBold()+"\r\n");
            document.setRevisionsView(RevisionsView.Final);
            TestUtil.writeAllText(outputFile,"Final bold："+range.getCharacterFormat().getBold()+"\r\n");
        }
    }
}
document.close();

Supports tracking revisions for style changes.

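Document document = new Document();
document.loadFromFile("http://cdn.e-iceblue.cn/test.docx");
// Start tracking changes under the author name "e-iceblue"
document.startTrackRevisions("e-iceblue");
Paragraph firstParagraph = document.getSections().get(0).getParagraphs().get(0);
for (int i = 0; i < firstParagraph.getChildObjects().getCount(); i++) {
    if (firstParagraph.getChildObjects().get(i).getDocumentObjectType() == DocumentObjectType.Text_Range) {
        // Formatting changes made while tracking is on are recorded as revisions
        TextRange tr = (TextRange) firstParagraph.getChildObjects().get(i);
        tr.getCharacterFormat().setTextColor(Color.RED);
        tr.getCharacterFormat().setFontSize(28);
        tr.getCharacterFormat().setBold(true);
    }
}
document.getSections().get(0).getParagraphs().get(1).appendText("aaa");
document.stopTrackRevisions();
document.saveToFile("test-out.docx");

Supports setting the number of characters per line for the document grid.
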
sec.getPageSetup().setGridType(GridPitchType.Chars_And_Line);
sec.getPageSetup().setCharactersPerLine(30);  



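Bug fixes:

Fixed an issue where accepting revisions produced incorrect results.
Fixed an issue where Word to PDF conversion produced incorrect results.
Fixed an issue where the highlight color could not be retrieved unless AcceptChanges() was called.
Fixed an issue where setting a table style threw an error.
Fixed an issue where the table of contents was updated incorrectly.
Fixed an issue where HTML to Word conversion produced incorrect results.
Fixed an issue where converting MHT files to docx produced garbled text.
Fixed an issue where loading a document threw an "IllegalArgumentException".
Fixed an issue where loading a document threw a "NullPointerException".
Fixed an issue where removing content controls produced incorrect results.
Fixed an issue where retrieved bookmarks were empty.
Fixed an issue where checking a checkbox failed.
Fixed an issue where editable regions could not be edited after revisions were accepted.
Fixed an issue where image content was lost when saving a document to the wps format.
Fixed an issue where Word to PDF conversion threw an exception when useHarfBuzzTextShaper(true) was used.
Fixed an issue where replaceBookmarkContent threw a "NullPointerException".
Fixed an issue where replacement produced incorrect results.
Fixed an issue where StructureDocumentTagCell.removeSelfOnly threw the exception "Cannot remove because there is no parent."

To get Spire.Doc for Java 13.10.6, click:
https://www.e-iceblue.cn/Downloads/Spire-Doc-JAVA.html
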
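Spire.XLS for Java 15.10.5 Reduces Memory Consumption When Loading Excel Documents

2025-10-27

Spire.XLS for Java 15.10.5 has been officially released. This version reduces memory consumption when loading Excel documents. It also fixes several issues that occurred when retrieving checkboxes, loading large files, copying worksheet formatting, and saving files. Details are as follows:

Optimization:

Reduced memory consumption when loading Excel documents.

Bug fixes:

Fixed an issue where retrieving checkboxes failed.
Fixed an issue where the program hung when loading an Excel document with a JVM maximum memory setting configured.
Fixed an out-of-memory issue when converting Excel to PDF.
Fixed an issue where formatting was inconsistent when copying worksheets.
Fixed an issue where saving an Excel document threw the exception "error in set print area".
Fixed an issue where loading an Excel document threw the exception "Input string was not in the correct format".

To get Spire.XLS for Java 15.10.5, click:
https://www.e-iceblue.cn/Downloads/Spire-XLS-JAVA.html
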
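Spire.XLS 15.10.3 Supports Reading Office-Cached Cloud Fonts by Default

2025-10-27

Spire.XLS 15.10.3 has been released. This update adds default support for reading cloud fonts cached by Office, so that cloud fonts are detected and loaded automatically in addition to the system font directory and in-memory fonts. This version also changes the parameter definitions of the AddDigitalSignature() and IDigitalSignatures.Add() methods and fixes several known issues, improving overall stability and compatibility. Details are as follows:

New features:

Added default support for reading cloud fonts cached by Office.

Adjustments:

Changed the parameter definitions of the AddDigitalSignature() and IDigitalSignatures.Add() methods.

Old method:
AddDigitalSignature(X509Certificate2 certificate, string comments, DateTime signTime)
New method:
AddDigitalSignature(string certificatePath, string certificatePassword, string comments, DateTime signTime)

Bug fixes:

Fixed an issue where extra blank pages were generated when converting Excel to PDF.
Fixed an issue where loading an XLSB file threw an "ArgumentOutOfRangeException".
Fixed an issue where adding an HTML string threw a "FormatException".
Fixed an issue where files containing the FILTER formula reported an error when opened in Microsoft Excel.

To download Spire.XLS 15.10.3, click:
https://www.e-iceblue.cn/Downloads/Spire-XLS-NET.html
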
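Spire.Doc 13.10.3 Supports Extracting a Range of Pages from a Word File into a Separate File

2025-10-24

Spire.Doc 13.10.3 has been released. This version supports extracting a specified range of pages from a Word file and saving them as another document. It also fixes a series of issues related to Word to PDF conversion. Details are below.

New features:

Added the ExtractPages(int index, int count) method for extracting a specified range of pages from a document. Note: pages are extracted according to the final layout, so the content matches the pages of an exported PDF.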


Document doc = new Document();
doc.LoadFromFile("http://cdn.e-iceblue.cn/sample.docx");
Document extractPage = doc.ExtractPages(0, 1);
extractPage.SaveToFile("result.docx");



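Bug fixes:

Fixed an issue where content layout was incorrect when converting Word to PDF.
Fixed an issue where table styles were incorrect when converting Word to PDF.
Fixed an issue where fonts were incorrect when converting Word to PDF.
Fixed an issue where loading a Markdown stream threw a "System.NotSupportedException".
Fixed an issue where text positions shifted when converting Word to PDF.
Fixed an issue where getting the page count threw "System.ArgumentException: Parent cannot be null."
Fixed an issue where restricted-editing regions inside tables were incorrect when saving Word documents.

To get Spire.Doc 13.10.3, click:
https://www.e-iceblue.cn/Downloads/Spire-Doc-NET.html
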
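Spire.PDF 11.10.4 Supports Validating Timestamp Service URLs

2025-10-23

Spire.PDF 11.10.4 has been officially released. This version supports validating the URL of a timestamp service and fixes several issues that occurred when converting and comparing PDF documents. Details are below.

New features:

Supports validating the URL of a timestamp service.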


TSAHttpService timestampService = new TSAHttpService("http://time2.certum.pl");
TSAResponse response = timestampService.Check();

// Use the timestamp service only if a TSA token was received successfully
if (response.Success)
{
    formatter.TimestampService = timestampService;
}



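Bug fixes:

Improved the results of PDF to Word conversion.
Fixed an issue where some content was lost after converting PDF to images.
Fixed an issue where content was lost when converting XPS to PDF.
Fixed an issue where highlighting vertical text produced incorrect results.
Fixed an issue where merging PDF documents threw an exception.
Improved font rendering for replaced content.
Fixed an error that occurred when adding a timestamp.
Fixed an issue where PDF to TIFF conversion produced inconsistent results.
Fixed an issue where loading a PDF document threw a "NullReferenceException".
Fixed an issue where comparing PDFs with the CreateTemplate() method threw an "IndexOutOfRangeException".

To get Spire.PDF 11.10.4, click:
https://www.e-iceblue.cn/Downloads/Spire-PDF-NET.html
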
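Convert CSV to XML with Python (Handling Real-World Data Issues)

CSV is one of the most common formats for exchanging tabular data because it is simple and widely supported across platforms. However, when you work with structured applications, configuration files, or hierarchical data, XML is often preferred because it can represent nested relationships and supports stricter data validation.
In this guide, we look at how to convert CSV files to XML with Spire.XLS for Python. You will learn how to convert CSV to both the Excel XML format and standard XML. We also cover how to clean and preprocess real-world CSV files, handling invalid headers, missing values, special characters, and nested fields, so that the generated XML is valid and well structured.
Contents

Why Convert CSV to XML
Prerequisites
Convert CSV to the Excel XML Format
Convert CSV to Standard XML
Handling Real-World CSV Data Issues
Automatic Cleanup with clean_csv
Summary
Frequently Asked Questions (FAQs)
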

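Why Convert CSV to XML
Why would developers need to convert CSV to XML? Here are some practical scenarios:

Enterprise data migration: Many enterprise applications (such as ERP or CRM systems) require XML for bulk data imports.
Configuration and metadata: XML is often used to store structured metadata, while the raw data may be supplied as CSV.
Interoperability: Some industries (such as finance, healthcare, and government) still rely heavily on XML for data exchange.
Readable reports: XML can represent hierarchical data, which makes it more descriptive than a flat CSV file.
Data validation: XML can be validated against an XSD schema to check data integrity, which CSV cannot do directly.

CSV wins on simplicity, XML on structure. Converting between them lets you use the strengths of both.

Prerequisites
Before writing any code, make sure the following are in place:

Python 3.7 or later
Spire.XLS for Python → a professional library for working with Excel files
Standard Python libraries → xml.etree.ElementTree, csv, and re

Install Spire.XLS with pip (assuming Python and pip are already installed):
pip install spire.xls

Also prepare a test CSV file, for example: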
员工ID,姓名,部门,职位,入职日期,薪资
1001,张三,技术部,软件工程师,2021-03-15,15000
1002,李四,市场部,市场专员,2022-07-01,12000
1003,王五,技术部,产品经理,2020-11-10,18000
1004,赵六,人力资源部,招聘经理,2019-05-22,14000
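
If you would rather generate this test file than type it by hand, the standard csv module can write it. This is a small sketch; the file name input.csv and the UTF-8 encoding are assumptions chosen to match the later examples.

```python
import csv

# The sample rows shown above, header first
rows = [
    ["员工ID", "姓名", "部门", "职位", "入职日期", "薪资"],
    ["1001", "张三", "技术部", "软件工程师", "2021-03-15", "15000"],
    ["1002", "李四", "市场部", "市场专员", "2022-07-01", "12000"],
    ["1003", "王五", "技术部", "产品经理", "2020-11-10", "18000"],
    ["1004", "赵六", "人力资源部", "招聘经理", "2019-05-22", "14000"],
]


def write_sample_csv(path):
    """Write the sample rows to path as a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


write_sample_csv("input.csv")
```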


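Convert CSV to the Excel XML Format
The first approach converts the CSV to an Excel-compatible XML format, also known as SpreadsheetML (introduced with Excel 2003). Excel can open this format directly.
With Spire.XLS, the process is straightforward: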
from spire.xls import *

# Create a Workbook
workbook = Workbook()

# Load the CSV file
workbook.LoadFromFile("input.csv", ",", 1, 1)

# Save in the Excel XML format
workbook.SaveAsXml("output.xml")

# Release resources
workbook.Dispose()

How it works

Read the CSV file: LoadFromFile() reads the CSV file into the workbook.
Save as Excel XML: SaveAsXml() saves the workbook in the Excel XML format.


Convert CSV to Standard XML
More often, you need a standard XML structure like the one below rather than the Excel-compatible format:
<Employee>
  <employee_id>1001</employee_id>
  <name>张三</name>
  <department>技术部</department>
  <position>软件工程师</position>
  <hire_date>2021-03-15</hire_date>
  <salary>15000</salary>
</Employee>

One way to implement this:
import os
from spire.xls import *
import xml.etree.ElementTree as ET
from xml.dom import minidom

def chinese_to_english_tag(chinese_header):
    """
    Map specific Chinese column names to English XML tags.
    """
    mapping = {
        '员工ID': 'employee_id',
        '姓名': 'name', 
        '部门': 'department',
        '职位': 'position',
        '入职日期': 'hire_date',
        '薪资': 'salary'
    }
    # Strip surrounding whitespace, then look up the mapping
    cleaned_header = chinese_header.strip()
    return mapping.get(cleaned_header, cleaned_header)

# Step 1: Load the CSV file
workbook = Workbook()
workbook.LoadFromFile(r"C:\Users\Administrator\Desktop\input.csv", ",", 1, 1)
sheet = workbook.Worksheets[0]

# Step 2: Create the root node
root = ET.Element("Employees")

# Step 3: Process the headers, converting Chinese column names to English
headers = []
for col in range(1, sheet.Columns.Count + 1):
    cell_value = sheet.Range[1, col].Value
    if not cell_value:
        break
    english_tag = chinese_to_english_tag(str(cell_value))
    headers.append(english_tag)

# Step 4: Add the data rows
for row in range(2, sheet.Rows.Count + 1):
    if not sheet.Range[row, 1].Value:
        break
    employee = ET.SubElement(root, "Employee")
    for col, english_header in enumerate(headers, start=1):
        cell_value = sheet.Range[row, col].Value
        field = ET.SubElement(employee, english_header)
        field.text = str(cell_value) if cell_value is not None else ""

# Step 5: Save as a pretty-printed XML file
xml_str = ET.tostring(root, encoding='utf-8')
pretty_xml = minidom.parseString(xml_str).toprettyxml(indent="  ")

# Make sure the output directory exists before writing
os.makedirs("output", exist_ok=True)
with open("output/standard.xml", 'w', encoding='utf-8') as f:
    f.write(pretty_xml)

# Release resources
workbook.Dispose()

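How it works

Read the CSV file: LoadFromFile() imports the CSV data into a worksheet.
Create the XML root node: The root node <Employees> holds all employee records.
Convert the headers: The first row is read and a mapping function converts each Chinese column name to its English tag, for example "员工ID" → employee_id, so that the generated XML uses English names.
Generate data nodes: Starting from the second row, each row becomes an <Employee> element with one child tag per header.
Format and save: The generated XML is indented and saved as standard.xml.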
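
For comparison, the same steps can be sketched with only the Python standard library, swapping csv.DictReader in for the Spire.XLS worksheet. This is an alternative illustration rather than the article's method; the function name csv_text_to_xml is hypothetical, and ET.indent requires Python 3.9 or later.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Same Chinese-to-English header mapping as in the example above
HEADER_MAP = {
    "员工ID": "employee_id",
    "姓名": "name",
    "部门": "department",
    "职位": "position",
    "入职日期": "hire_date",
    "薪资": "salary",
}


def csv_text_to_xml(csv_text):
    """Build an <Employees> document from CSV text and return it as a string."""
    root = ET.Element("Employees")
    for record in csv.DictReader(io.StringIO(csv_text)):
        employee = ET.SubElement(root, "Employee")
        for header, value in record.items():
            if header is None:
                continue  # extra cells beyond the header row are ignored
            tag = HEADER_MAP.get(header.strip(), header.strip())
            # ElementTree escapes &, < and > in text when serializing
            ET.SubElement(employee, tag).text = value or ""
    ET.indent(root, space="  ")  # Python 3.9+
    return ET.tostring(root, encoding="unicode")


sample = (
    "员工ID,姓名,部门,职位,入职日期,薪资\n"
    "1001,张三,技术部,软件工程师,2021-03-15,15000\n"
)
xml_text = csv_text_to_xml(sample)
print(xml_text)
```

Reading rows one at a time from the reader keeps the structure close to the worksheet loop above while avoiding any dependency beyond the standard library.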


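Handling Real-World CSV Data Issues
Converting a "perfect" CSV to XML is easy, but real CSV files are rarely ideal. Common problems and their fixes:

Invalid header names

Problem: Headers such as "Employee ID" or "123Name" are not valid XML element names.
Fix: Replace spaces with underscores (_), or add a prefix to column names that start with a digit.

Empty or missing values

Problem: Missing values can produce an inconsistent XML structure.
Fix: Replace empty values with a placeholder (such as NULL, Unknown, or 0).

Special characters

Problem: Characters such as <, >, and & break XML.
Fix: Escape them as &lt;, &gt;, and &amp;.

Nested data in CSV

Problem: Some cells contain multiple values, for example:

OrderID,Customer,Products
1001,张三,"电脑;鼠标;键盘"

If converted directly, the hierarchy is lost.

Fix: Detect and split nested fields to produce hierarchical XML:

<Products>
  <Product>电脑</Product>
  <Product>鼠标</Product>
  <Product>键盘</Product>
</Products>

Converting Chinese column names to English

Problem: XML itself permits non-ASCII element names, but many consuming systems and schemas expect English tags, so a CSV file with Chinese column names (such as "姓名" or "部门") produces tags that may not fit those conventions.
Fix: Before generating the XML, map the Chinese column names to English tags, for example "姓名" → "name" and "部门" → "department" (as in the code in the "Convert CSV to Standard XML" section).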
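
Two of the fixes above, header sanitizing and nested-field splitting, can be sketched as small helpers. The names to_xml_tag and order_to_element are hypothetical, and the regular expressions only approximate the XML naming rules.

```python
import re
import xml.etree.ElementTree as ET


def to_xml_tag(header, prefix="Field_"):
    """Turn a CSV header into a usable XML element name (approximate rules)."""
    tag = re.sub(r"\s+", "_", header.strip())
    # Keep word characters, dots and hyphens; \w also matches non-ASCII letters
    tag = re.sub(r"[^\w.-]", "", tag)
    # A name must start with a letter or underscore, so prefix anything else
    if not tag or not re.match(r"[^\W\d]", tag):
        tag = prefix + tag
    return tag


def order_to_element(order_id, customer, products, delimiter=";"):
    """Build an <Order> element whose <Products> child holds one <Product> per item."""
    order = ET.Element("Order")
    ET.SubElement(order, "OrderID").text = order_id
    ET.SubElement(order, "Customer").text = customer
    products_el = ET.SubElement(order, "Products")
    for item in products.split(delimiter):
        item = item.strip()
        if item:
            ET.SubElement(products_el, "Product").text = item
    return order


print(to_xml_tag("Employee ID"))  # Employee_ID
print(to_xml_tag("123Name"))      # Field_123Name

order = order_to_element("1001", "张三", "电脑;鼠标;键盘")
print(ET.tostring(order, encoding="unicode"))
```

Building the nested structure directly with ElementTree keeps the hierarchy, and the text of each element is escaped automatically when the tree is serialized.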


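Automatic Cleanup with clean_csv
The following helper function preprocesses a CSV file automatically. It does not convert Chinese column names to English; because the header cleanup keeps only ASCII letters, digits, and underscores, Chinese headers should be mapped to English names first.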
import csv
import re

def clean_csv(input_file, output_file, nested_columns=None, nested_delimiter=";"):
    if nested_columns is None:
        nested_columns = []

    cleaned_rows = []

    # Escape XML special characters
    def escape_xml(text):
        return (text.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace('"', "&quot;")
                    .replace("'", "&apos;"))

    with open(input_file, "r", encoding="utf-8") as infile:
        reader = csv.reader(infile)
        headers = next(reader)

        # Clean the headers
        cleaned_headers = []
        for h in headers:
            h = h.strip()                              # Trim leading and trailing whitespace
            h = re.sub(r"\s+", "_", h)                 # Replace whitespace with underscores
            h = re.sub(r"[^a-zA-Z0-9_]", "", h)        # Keep only ASCII letters, digits, underscores
            if re.match(r"^\d", h):                    # Prefix headers that start with a digit
                h = "Field_" + h
            cleaned_headers.append(h)

        cleaned_rows.append(cleaned_headers)

        # Read all data rows
        raw_rows = []
        for row in reader:
            # Replace empty cells with "NULL"
            row = [cell if cell.strip() != "" else "NULL" for cell in row]
            raw_rows.append(row)

    # Expand nested (multi-value) columns into one row per value
    if nested_columns:
        expanded_rows = [cleaned_headers]  # Keep the header row
        for row in raw_rows:
            row_variants = [row]
            for col_name in nested_columns:
                if col_name not in cleaned_headers:
                    continue
                col_index = cleaned_headers.index(col_name)
                temp = []
                for variant in row_variants:
                    cell_value = variant[col_index]
                    # Split on the nested delimiter only; XML special characters are untouched here
                    if nested_delimiter in cell_value:
                        items = [item.strip() for item in cell_value.split(nested_delimiter)]
                        for item in items:
                            new_variant = variant.copy()
                            new_variant[col_index] = item
                            temp.append(new_variant)
                    else:
                        temp.append(variant)
                row_variants = temp
            expanded_rows.extend(row_variants)
        cleaned_rows = expanded_rows
    else:
        cleaned_rows.extend(raw_rows)

    # Escape special characters after expansion
    final_rows = [cleaned_rows[0]]  # Keep the header row
    for row in cleaned_rows[1:]:
        final_row = [escape_xml(cell) for cell in row]
        final_rows.append(final_row)

    # Write the cleaned CSV file
    with open(output_file, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        writer.writerows(final_rows)

    print(f"Cleaned CSV saved to {output_file}")

Call clean_csv with the input and output CSV paths, and optionally name the columns whose nested values should be expanded.
# File paths
input_file = r"C:\Users\Administrator\Desktop\input.csv"
output_file = r"C:\Users\Administrator\Desktop\cleaned_output.csv"

# Columns that may contain nested values
nested_columns = ["Products"]  # Add more if needed, for example ["Products", "Reviews"]

# Call clean_csv
clean_csv(input_file, output_file, nested_columns=nested_columns, nested_delimiter=";")

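The function helps ensure the CSV is clean and valid before it is converted to XML. It:

Cleans headers (so they follow XML naming rules)
Handles empty cells
Splits nested column values
Escapes special characters
Writes a clean UTF-8 encoded CSV file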
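
One caveat when combining pre-escaped CSV values with xml.etree.ElementTree: ElementTree escapes text itself when serializing, so values that were already escaped end up escaped twice. The sketch below shows the difference with a made-up value. Whether SaveAsXml() in Spire.XLS behaves the same way is not covered by this article, so treat that as something to check.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "R&D <Core>"

# Letting ElementTree handle escaping: the text is escaped once on output
once = ET.Element("department")
once.text = raw
once_xml = ET.tostring(once, encoding="unicode")
print(once_xml)   # <department>R&amp;D &lt;Core&gt;</department>

# Pre-escaping and then serializing escapes the ampersands a second time
twice = ET.Element("department")
twice.text = escape(raw)
twice_xml = ET.tostring(twice, encoding="unicode")
print(twice_xml)  # <department>R&amp;amp;D &amp;lt;Core&amp;gt;</department>

# Parsing the results back shows which one round-trips to the original value
print(ET.fromstring(once_xml).text == raw)   # True
print(ET.fromstring(twice_xml).text == raw)  # False
```

If the XML is built with ElementTree, it may therefore be simpler to skip the escaping step in the CSV cleanup and pass the raw cell values through.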


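Summary
Converting CSV to XML with Spire.XLS for Python is efficient and flexible, whether you need a quick export, a structured integration, or a more complex business data conversion.

Quick export: If the file only needs to be readable by Excel, the Excel XML format is the fastest route.
Custom structure: To produce XML with specific tags or hierarchy, build a standard XML document with xml.etree.ElementTree for full control.
Data cleaning and enrichment: For real-world CSV files with irregular formatting or nested data, run clean_csv() first to normalize field names, expand nested columns, and escape special characters, so that the generated XML is well structured and parseable.

From enterprise system integration and report archiving to legacy data migration, this workflow combines the simplicity of CSV with the structure of XML for data exchange and automated processing.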

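Frequently Asked Questions (FAQs)
Q1. Can very large CSV files be converted?
Yes, but stream the data (process it row by row) to avoid memory problems.
Q2. Does Spire.XLS support converting CSV to standard XML?
Yes, with some code. Saving as Excel XML is built in, but custom XML has to be produced in code.
Q3. How can special characters be handled automatically?
Use the escape_xml helper or xml.sax.saxutils.escape() from the Python standard library.
Q4. What if the CSV has several nested columns?
Pass multiple column names in the nested_columns parameter when calling clean_csv.
Q5. Can the generated XML be validated?
Yes. After generating the XML, validate it against an XSD schema.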
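
As an illustration of the row-by-row approach from Q1, the sketch below writes each row to the output as soon as it is read instead of building the whole tree in memory. It uses only the standard library and assumes that the headers are already valid XML names; the function name and the default tag names are hypothetical.

```python
import csv
import io
from xml.sax.saxutils import escape


def stream_csv_to_xml(csv_file, xml_file, root_tag="Rows", row_tag="Row"):
    """Write one XML element per CSV row without holding all rows in memory."""
    reader = csv.reader(csv_file)
    headers = next(reader)  # assumed to be valid XML element names
    xml_file.write(f"<{root_tag}>\n")
    count = 0
    for row in reader:
        xml_file.write(f"  <{row_tag}>")
        for tag, value in zip(headers, row):
            # Escape &, < and > by hand because no XML library is serializing here
            xml_file.write(f"<{tag}>{escape(value)}</{tag}>")
        xml_file.write(f"</{row_tag}>\n")
        count += 1
    xml_file.write(f"</{root_tag}>\n")
    return count


# In-memory streams stand in for real files opened with open(...)
source = io.StringIO("id,name\n1,A&B\n2,C\n")
target = io.StringIO()
rows_written = stream_csv_to_xml(source, target)
print(rows_written)  # 2
print(target.getvalue())
```

With real files, pass handles opened with open(path, newline="", encoding="utf-8") for the input and open(path, "w", encoding="utf-8") for the output.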
Request a Temporary License
If you need to remove the evaluation notice from generated documents or lift the feature limitations, please request a temporary license valid for 30 days.

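Spire.PDF for C++ 11.10.0 Fixes an Exception When Extracting Text from PDF Pages

2025-10-23

Spire.PDF for C++ 11.10.0 has been officially released. This version fixes an issue where extracting text from PDF pages threw an exception. Details are below.

Bug fixes:

Fixed an issue where extracting text from PDF pages threw an exception.

To get Spire.PDF for C++ 11.10.0, click:
Spire.PDF for C++ download page
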
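Contact us

Phone: 86-028-81705109
QQ: 3312989436 (sales), 2100065966 (technical support)
Email: sales@e-iceblue.com, support@e-iceblue.com
Address: Floor 9, Building 3, Kaile International, No. 14 Jiuxing Avenue, Wuhou District, Chengdu, Sichuan, China

Copyright © 2024 Chengdu E-iceblue Co., Ltd. (成都冰蓝科技有限公司). 蜀ICP备17015896号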