Merged
26 changes: 19 additions & 7 deletions .github/workflows/dotnet.yml
@@ -12,25 +12,37 @@ on:
pull_request:

jobs:
net8:
net5_above:
runs-on: ubuntu-latest
defaults:
run:
working-directory: ${{ github.workspace }}
strategy:
matrix:
dotnet-version: [ '8.0', '10.0' ]
steps:
- uses: actions/checkout@v4
- name: Setup .NET 8.
uses: actions/setup-dotnet@v4
- uses: actions/checkout@v5
- name: Setup .NET ${{ matrix.dotnet-version }}.
uses: actions/setup-dotnet@v5
with:
dotnet-version: '8.0.x'
dotnet-version: ${{ matrix.dotnet-version }}.x
- uses: actions/cache@v4
with:
path: ~/.nuget/packages
key: ${{ runner.os }}-nuget-${{ hashFiles('**/*.csproj', '**/*.sln') }}
restore-keys: |
${{ runner.os }}-nuget-
- name: Restore dependencies
run: dotnet restore
- name: Build
run: dotnet build --configuration Release --no-restore
- name: Run tests
run: dotnet test --framework net8.0 --configuration Release --no-build --verbosity normal
run: dotnet test --framework net${{ matrix.dotnet-version }} --configuration Release --no-build --verbosity normal

net462:
runs-on: windows-latest
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v5
- name: Setup .NET Framework
uses: microsoft/setup-msbuild@v2
- name: Build
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,14 @@
# Changelog

## 3.3.0

- Rewrite parsing to use `Span<char>` instead of Regex, for a 25% performance gain 🚀
- Set a timeout on the remaining Regex to prevent DoS attacks
- Remove extra border space in table #156
- Add .NET 10 as an explicit target
- Fix page break. Add support for the `break-before` and `break-after` CSS styles #220
- Support registering a custom bookmark with `data-bookmark` #219
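
The first two entries can be illustrated with a short sketch. It is not the library's actual parser: `ParsingSketch`, `ParseStyle`, `IsHexColor`, the pattern and the 100 ms timeout are all hypothetical, chosen only to show span slicing in place of Regex, and a match timeout on a Regex that remains.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParsingSketch
{
    // Hypothetical helper: splits "name: value; name: value" by slicing the
    // input span, so no Regex and no intermediate substrings are needed
    // until a name/value pair is actually stored.
    public static Dictionary<string, string> ParseStyle(ReadOnlySpan<char> style)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        while (!style.IsEmpty)
        {
            // Cut the next declaration off the front of the span.
            int end = style.IndexOf(';');
            ReadOnlySpan<char> declaration = end < 0 ? style : style.Slice(0, end);
            style = end < 0 ? ReadOnlySpan<char>.Empty : style.Slice(end + 1);

            int colon = declaration.IndexOf(':');
            if (colon <= 0) continue; // empty or malformed declaration

            string name = declaration.Slice(0, colon).Trim().ToString();
            string value = declaration.Slice(colon + 1).Trim().ToString();
            if (name.Length > 0 && value.Length > 0) result[name] = value;
        }
        return result;
    }

    // Hypothetical pattern and timeout: the third constructor argument bounds
    // how long a single match may run.
    private static readonly Regex HexColor = new Regex(
        "^#(?:[0-9a-fA-F]{3}){1,2}$",
        RegexOptions.CultureInvariant,
        TimeSpan.FromMilliseconds(100));

    public static bool IsHexColor(string input)
    {
        try
        {
            return HexColor.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            // Treat an input that takes too long to match as "no match".
            return false;
        }
    }
}
```

With this sketch, `ParsingSketch.ParseStyle("color: red; font-size: 14px;")` yields two entries, `color` and `font-size`. Splitting on the first `;` is deliberately naive and would mis-handle values that themselves contain a semicolon.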

## 3.2.8

- Fix a fatal crash when trying to convert multiple images #215
12 changes: 12 additions & 0 deletions Directory.Build.props
@@ -0,0 +1,12 @@
<Project>
  <PropertyGroup>
    <Copyright>Copyright 2009-$([System.DateTime]::Today.Year) Olivier Nizet</Copyright>
    <Nullable>enable</Nullable>
    <LangVersion>latest</LangVersion>
    <ImplicitUsings>enable</ImplicitUsings>
  </PropertyGroup>

  <PropertyGroup>
    <DocumentFormatOpenXmlPackageVersion>3.4.1</DocumentFormatOpenXmlPackageVersion>
  </PropertyGroup>
</Project>
66 changes: 37 additions & 29 deletions HtmlToOpenXml.sln
@@ -13,34 +13,42 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Demo", "examples\Demo\Demo.
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "HtmlToOpenXml.Tests", "test\HtmlToOpenXml.Tests\HtmlToOpenXml.Tests.csproj", "{CA0A68E0-45A0-4A01-A061-F951D93D6906}"
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Benchmark", "examples\Benchmark\Benchmark.csproj", "{143A3684-FAEB-43D0-A895-09BE5FDF85F6}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Debug|Any CPU.Build.0 = Debug|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Release|Any CPU.ActiveCfg = Release|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Release|Any CPU.Build.0 = Release|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Debug|Any CPU.Build.0 = Debug|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Release|Any CPU.ActiveCfg = Release|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Release|Any CPU.Build.0 = Release|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Debug|Any CPU.Build.0 = Debug|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Release|Any CPU.ActiveCfg = Release|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(NestedProjects) = preSolution
{EF700F30-C9BB-49A6-912C-E3B77857B514} = {58520A98-BA53-4BA4-AAE3-786AA21331D6}
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F} = {84EA02ED-2E97-47D2-992E-32CC104A3A7A}
{CA0A68E0-45A0-4A01-A061-F951D93D6906} = {84EA02ED-2E97-47D2-992E-32CC104A3A7A}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {14EE1026-6507-4295-9FEE-67A55C3849CE}
EndGlobalSection
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Debug|Any CPU.Build.0 = Debug|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Release|Any CPU.ActiveCfg = Release|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Release|Any CPU.Build.0 = Release|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Debug|Any CPU.Build.0 = Debug|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Release|Any CPU.ActiveCfg = Release|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Release|Any CPU.Build.0 = Release|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Debug|Any CPU.Build.0 = Debug|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Release|Any CPU.ActiveCfg = Release|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Release|Any CPU.Build.0 = Release|Any CPU
{143A3684-FAEB-43D0-A895-09BE5FDF85F6}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{143A3684-FAEB-43D0-A895-09BE5FDF85F6}.Debug|Any CPU.Build.0 = Debug|Any CPU
{143A3684-FAEB-43D0-A895-09BE5FDF85F6}.Release|Any CPU.ActiveCfg = Release|Any CPU
{143A3684-FAEB-43D0-A895-09BE5FDF85F6}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(NestedProjects) = preSolution
{EF700F30-C9BB-49A6-912C-E3B77857B514} = {58520A98-BA53-4BA4-AAE3-786AA21331D6}
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F} = {84EA02ED-2E97-47D2-992E-32CC104A3A7A}
{CA0A68E0-45A0-4A01-A061-F951D93D6906} = {84EA02ED-2E97-47D2-992E-32CC104A3A7A}
{143A3684-FAEB-43D0-A895-09BE5FDF85F6} = {84EA02ED-2E97-47D2-992E-32CC104A3A7A}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {14EE1026-6507-4295-9FEE-67A55C3849CE}
SolutionGuid = {194D4CBE-A20A-4E32-967B-E1BBD3922C29}
EndGlobalSection
EndGlobal
19 changes: 19 additions & 0 deletions examples/Benchmark/Benchmark.csproj
@@ -0,0 +1,19 @@
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFrameworks>net48;net8.0;net10.0</TargetFrameworks>
    <SonarQubeExclude>true</SonarQubeExclude>
    <SatelliteResourceLanguages>en</SatelliteResourceLanguages>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="BenchmarkDotNet" Version="0.15.6" />
    <ProjectReference Include="..\..\src\Html2OpenXml\HtmlToOpenXml.csproj" />
  </ItemGroup>

  <ItemGroup>
    <EmbeddedResource Include="*.html" />
  </ItemGroup>

</Project>
37 changes: 37 additions & 0 deletions examples/Benchmark/Benchmarks.cs
@@ -0,0 +1,37 @@
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using HtmlToOpenXml;

// Run the same conversion on three runtimes, with .NET 8 as the baseline.
[MemoryDiagnoser(displayGenColumns: false)]
[SimpleJob(runtimeMoniker: RuntimeMoniker.Net48)]
[SimpleJob(runtimeMoniker: RuntimeMoniker.Net80, baseline: true)]
[SimpleJob(runtimeMoniker: RuntimeMoniker.Net10_0)]
public class Benchmarks
{
    [Benchmark]
    public async Task ParseWithSpan()
    {
        // Load the embedded HTML sample (this read is part of the measured time).
        string html = ResourceHelper.GetString("benchmark.html");

        // Build the document in memory so that no disk I/O is measured.
        using (MemoryStream generatedDocument = new MemoryStream())
        using (WordprocessingDocument package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
        {
            // Create the main part with an empty body if the package has none.
            MainDocumentPart? mainPart = package.MainDocumentPart;
            if (mainPart == null)
            {
                mainPart = package.AddMainDocumentPart();
                new Document(new Body()).Save(mainPart);
            }

            HtmlConverter converter = new HtmlConverter(mainPart);
            converter.RenderPreAsTable = true;
            // Link external images instead of downloading them.
            converter.ImageProcessing = ImageProcessingMode.LinkExternal;

            await converter.ParseBody(html);
            mainPart.Document!.Save();
        }
    }
}
3 changes: 3 additions & 0 deletions examples/Benchmark/Program.cs
@@ -0,0 +1,3 @@
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Benchmarks>();
7 changes: 7 additions & 0 deletions examples/Benchmark/README.md
@@ -0,0 +1,7 @@
# Benchmarks

To run the benchmark tool, first build the project in Release mode:
`dotnet build -c Release`

Then run the performance test, targeting multiple runtimes:
`dotnet run -c Release -f net8.0 --runtimes net48 net8.0`
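
The `--runtimes` option is parsed by BenchmarkDotNet, which only sees it when the program hands its command-line arguments to the runner. A minimal sketch of an entry point that does so, assuming a hypothetical `SampleBenchmarks` class in place of the real benchmarks (an illustration, not necessarily how this project wires it up):

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Forward the command-line arguments so that options such as
// --runtimes or --filter are parsed by BenchmarkDotNet.
BenchmarkSwitcher.FromAssembly(typeof(SampleBenchmarks).Assembly).Run(args);

// Hypothetical benchmark class, present only to keep the sketch self-contained.
public class SampleBenchmarks
{
    [Benchmark]
    public int SumToHundred()
    {
        int total = 0;
        for (int i = 1; i <= 100; i++) total += i;
        return total;
    }
}
```

When no arguments are forwarded, the jobs declared through attributes such as `[SimpleJob]` still apply.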
40 changes: 40 additions & 0 deletions examples/Benchmark/ResourceHelper.cs
@@ -0,0 +1,40 @@
/*
 * Copyright (c) 2017 Deal Stream sàrl. All rights reserved
 */
using System.IO;
using System.Reflection;
using System.Resources;

/// <summary>
/// Helper class to read embedded resources.
/// </summary>
public static class ResourceHelper
{
    /// <summary>Reads an embedded resource of this assembly as text.</summary>
    public static string GetString(string resourceName)
    {
        return GetString(typeof(ResourceHelper).GetTypeInfo().Assembly, resourceName);
    }

    /// <summary>Reads an embedded resource of the given assembly as text.</summary>
    public static string GetString(Assembly assembly, string resourceName)
    {
        using (var stream = GetStream(assembly, resourceName))
        {
            using (var reader = new StreamReader(stream))
                return reader.ReadToEnd();
        }
    }

    /// <summary>Opens an embedded resource of this assembly as a stream.</summary>
    public static Stream GetStream(string resourceName)
    {
        return GetStream(typeof(ResourceHelper).GetTypeInfo().Assembly, resourceName);
    }

    /// <summary>Opens an embedded resource of the given assembly as a stream.</summary>
    public static Stream GetStream(Assembly assembly, string resourceName)
    {
        // The manifest name is assumed to be the assembly name followed by the file name.
        var stream = assembly.GetManifestResourceStream(assembly.GetName().Name + "." + resourceName);
        if (stream == null)
            throw new MissingManifestResourceException($"Requested resource `{resourceName}` was not found in the assembly `{assembly}`.");

        return stream;
    }
}
124 changes: 124 additions & 0 deletions examples/Benchmark/benchmark.html
@@ -0,0 +1,124 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Sample HTML Page</title>
</head>
<body>

<!-- Header Section -->
<h1 style="color: blue; text-align: center;">Welcome to My Sample Page</h1>

<!-- Paragraph with styling -->
<p style="font-family: Arial, 'Times New Roman', sans-serif; font-size: 14px; color: grey;">
This is a sample paragraph. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
</p>

<!-- Link with styling -->
<a href="https://www.example.com" style="text-decoration: none; color: green;">Visit Example</a>

<!-- Image with styling -->
<img src="https://via.placeholder.com/150" alt="Placeholder Image" style="border: 2px solid black;">

<!-- List with styling -->
<ul style="list-style-type: square;">
<li style="color: red;">Item 1</li>
<li style="color: orange;">Item 2</li>
<li style="color: yellow;">Item 3</li>
</ul>

<!-- Table with styling -->
<table style="width: 100%; border-collapse: collapse;">
<tr style="background-color: #f2f2f2;">
<th style="border: 1px solid black;">Header 1</th>
<th style="border: 1px solid black;">Header 2</th>
</tr>
<tr>
<td style="border: 1px solid black;">Cell 1</td>
<td style="border: 1px solid black;">Cell 2</td>
</tr>
<tr style="background-color: #f2f2f2;">
<td style="border: 1px solid black;">Cell 3</td>
<td style="border: 1px solid black;">Cell 4</td>
</tr>
</table>

<!-- Form with styling -->
<form style="background-color: #f9f9f9; padding: 20px; border: 1px solid #ccc;">
<label for="name" style="font-weight: bold;">Name:</label>
<input type="text" id="name" name="name" style="width: 100%; padding: 10px; margin-top: 5px;">

<label for="email" style="font-weight: bold; margin-top: 10px; display: block;">Email:</label>
<input type="email" id="email" name="email" style="width: 100%; padding: 10px; margin-top: 5px;">

<button type="submit" style="background-color: blue; color: white; padding: 10px 20px; margin-top: 10px;">Submit</button>
</form>

<!-- Additional Content to Reach 300 Lines -->

<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Curabitur pretium tincidunt lacus. Nulla gravida orci a odio. Nullam varius, turpis et commodo pharetra, est eros bibendum elit, nec luctus magna felis sollicitudin mauris.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Integer in mauris eu nibh euismod gravida. Duis ac tellus et risus vulputate vehicula. Donec lobortis risus a elit. Etiam tempor. Ut ullamcorper, ligula eu tempor congue, eros est euismod turpis, id tincidunt sapien risus a quam.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Maecenas fermentum consequat mi. Donec fermentum. Pellentesque malesuada nulla a mi. Duis sapien sem, aliquet nec, commodo eget, consequat quis, neque. Aliquam faucibus, elit ut dictum aliquet, felis nisl adipiscing sapien, sed malesuada diam lacus eget erat.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Cras mollis scelerisque nunc. Nullam arcu. Aliquam consequat. Curabitur augue lorem, dapibus quis, laoreet et, pretium ac, nisi. Aenean magna nisl, mollis quis, molestie eu, feugiat in, orci. In hac habitasse platea dictumst.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Fusce convallis, mauris imperdiet gravida bibendum, nisl turpis suscipit mauris, sed placerat ipsum ligula sed magna. Maecenas nisl est, ultrices nec, congue eget, auctor vitae, massa.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Fusce luctus vestibulum augue ut aliquet. Nunc sagittis dictum nisi. Sed id blandit purus. Proin quis orci. Quisque convallis libero in sapien pharetra tincidunt.
</p>

<!-- Additional lines of paragraph to reach 300 lines -->

<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Curabitur pretium tincidunt lacus. Nulla gravida orci a odio. Nullam varius, turpis et commodo pharetra, est eros bibendum elit, nec luctus magna felis sollicitudin mauris.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Integer in mauris eu nibh euismod gravida. Duis ac tellus et risus vulputate vehicula. Donec lobortis risus a elit. Etiam tempor. Ut ullamcorper, ligula eu tempor congue, eros est euismod turpis, id tincidunt sapien risus a quam.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Maecenas fermentum consequat mi. Donec fermentum. Pellentesque malesuada nulla a mi. Duis sapien sem, aliquet nec, commodo eget, consequat quis, neque. Aliquam faucibus, elit ut dictum aliquet, felis nisl adipiscing sapien, sed malesuada diam lacus eget erat.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Cras mollis scelerisque nunc. Nullam arcu. Aliquam consequat. Curabitur augue lorem, dapibus quis, laoreet et, pretium ac, nisi. Aenean magna nisl, mollis quis, molestie eu, feugiat in, orci. In hac habitasse platea dictumst.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Fusce convallis, mauris imperdiet gravida bibendum, nisl turpis suscipit mauris, sed placerat ipsum ligula sed magna. Maecenas nisl est, ultrices nec, congue eget, auctor vitae, massa.
</p>
<p style="font-family: Arial, 'Times New Roman', sans-serif; font-size: 14px; color: grey;">
Fusce luctus vestibulum augue ut aliquet. Nunc sagittis dictum nisi. Sed id blandit purus. Proin quis orci. Quisque convallis libero in sapien pharetra tincidunt.
</p>

</body>
</html>
3 changes: 2 additions & 1 deletion examples/Demo/Demo.csproj
@@ -3,10 +3,11 @@
<PropertyGroup>
<TargetFramework>net8.0</TargetFramework>
<OutputType>Exe</OutputType>
<SonarQubeExclude>true</SonarQubeExclude>
</PropertyGroup>

<ItemGroup>
<PackageReference Include="DocumentFormat.OpenXml" Version="3.1.0" />
<PackageReference Include="DocumentFormat.OpenXml" Version="$(DocumentFormatOpenXmlPackageVersion)" />
<PackageReference Include="System.Diagnostics.Process" Version="4.3.0" />
</ItemGroup>
