I want to parse according to CommonMark specification #287

Open
pengqian089 opened this issue May 9, 2022 · 4 comments

Comments

@pengqian089

My server side parses Markdown strictly according to the CommonMark specification, so I would like the HTML-to-Markdown conversion to have an option that produces output conforming to the CommonMark specification.

he<strong>ll</strong>o

output:

he**ll**o

expected output:

he **ll** o
@mysticmind
Owner

Currently there is no explicit setting to adhere to the CommonMark spec. I will have a look at it.

@mysticmind
Owner

With regard to your example of he<strong>ll</strong>o being converted to he**ll**o: he**ll**o is correct and he **ll** o is incorrect, even if you take CommonMark into account. Am I missing something here?

@pengqian089
Author

pengqian089 commented May 10, 2022

With regard to your example of he<strong>ll</strong>o being converted to he**ll**o: he**ll**o is correct and he **ll** o is incorrect, even if you take CommonMark into account. Am I missing something here?

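My content is Chinese, so the emphasis markers are usually adjacent to Unicode punctuation rather than to spaces. In that case CommonMark only recognizes the delimiter run as emphasis if there is whitespace (or another punctuation character) on its other side.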

left-flanking-delimiter-run

A left-flanking delimiter run is a delimiter run that is (1) not followed by Unicode whitespace, and either (2a) not followed by a Unicode punctuation character, or (2b) followed by a Unicode punctuation character and preceded by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.
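The definition quoted above, together with its mirror image for right-flanking runs from the same section of the spec, can be sketched in C# as follows. This is a minimal illustration, not code from ReverseMarkdown: the class name `DelimiterRun` and its methods are hypothetical, the punctuation test follows the CommonMark 0.30 wording (ASCII punctuation plus the Unicode P categories), and it only looks at single UTF-16 `char` values.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of ReverseMarkdown.
public static class DelimiterRun
{
    // CommonMark "Unicode whitespace": general category Zs, or tab, LF, FF, CR.
    private static bool IsUnicodeWhitespace(char c) =>
        char.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator
        || c == '\t' || c == '\n' || c == '\f' || c == '\r';

    // ASCII punctuation (which includes ASCII symbols such as '$' and '+')
    // plus any character in the Unicode P categories.
    private static bool IsUnicodePunctuation(char c) =>
        char.IsPunctuation(c) || (c < 128 && char.IsSymbol(c));

    // The beginning and the end of the line count as whitespace.
    private static char CharBefore(string line, int start) =>
        start > 0 ? line[start - 1] : ' ';

    private static char CharAfter(string line, int start, int length) =>
        start + length < line.Length ? line[start + length] : ' ';

    // start: index of the first delimiter character; length: length of the run.
    public static bool IsLeftFlanking(string line, int start, int length)
    {
        char before = CharBefore(line, start);
        char after = CharAfter(line, start, length);
        if (IsUnicodeWhitespace(after)) return false;              // rule (1)
        if (!IsUnicodePunctuation(after)) return true;             // rule (2a)
        return IsUnicodeWhitespace(before) || IsUnicodePunctuation(before); // rule (2b)
    }

    // Same rules with "before" and "after" swapped.
    public static bool IsRightFlanking(string line, int start, int length)
    {
        char before = CharBefore(line, start);
        char after = CharAfter(line, start, length);
        if (IsUnicodeWhitespace(before)) return false;
        if (!IsUnicodePunctuation(before)) return true;
        return IsUnicodeWhitespace(after) || IsUnicodePunctuation(after);
    }
}
```

Under these rules, in he**ll**o the first run (index 2) is left-flanking and the second run (index 6) is right-flanking, so the pair can open and close strong emphasis. In a shortened form of the Chinese sentence from this report, `**称,**在`, the second run (index 4) follows a comma and precedes a non-punctuation character, so it is left-flanking but not right-flanking and cannot close the emphasis.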

C# code:

public void HtmlToMarkdown()
{
	var html = "<p><strong>4月19日,特斯拉中国方面发布消息称,</strong>在上海市各级政府部署协调下,4月17日和4月18日,特斯拉8000名员工陆续返厂。其中,工厂电池、电机车间于4月19日早晨恢复生产。“特斯拉会在接下来的3、4天内进行产能逐步爬坡,到整体单班满产。”特斯拉超级工厂生产制造高级总监宋钢表示。</p>";
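	var config = new ReverseMarkdown.Config
	{
		// Leave unknown tags in the output unchanged.
		UnknownTags = ReverseMarkdown.Config.UnknownTagsOption.PassThrough,
		// Produce GitHub-flavored Markdown.
		GithubFlavored = true,
		// No default language for code blocks.
		DefaultCodeBlockLanguage = ""
	};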
	var converter = new ReverseMarkdown.Converter(config);
	var markdown = converter.Convert(html);

	Console.WriteLine(markdown);
}

output:

**4月19日,特斯拉中国方面发布消息称,**在上海市各级政府部署协调下,4月17日和4月18日,特斯拉8000名员工陆续返厂。其中,工厂电池、电机车间于4月19日早晨恢复生产。“特斯拉会在接下来的3、4天内进行产能逐步爬坡,到整体单班满产。”特斯拉超级工厂生产制造高级总监宋钢表示。
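In the output above, the closing ** comes directly after a comma and directly before 在, so under CommonMark it is not a right-flanking delimiter run and cannot close the strong emphasis. One possible workaround, until the library offers an option, is to move punctuation at the edges of emphasis tags outside the tag before conversion. The following is a sketch only: `EmphasisPunctuation.MoveOutside` is a hypothetical helper (not part of ReverseMarkdown), the regex-based rewriting ignores opening tags with attributes, and it changes which characters end up emphasized.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical pre-processing step; not part of ReverseMarkdown.
public static class EmphasisPunctuation
{
    // Punctuation directly before a closing emphasis tag.
    // ';' and '&' are excluded so HTML entities such as &amp; stay intact.
    private static readonly Regex Trailing = new Regex(
        @"(?<punct>[\p{P}-[;&]]+)</(?<tag>strong|b|em|i)>",
        RegexOptions.IgnoreCase);

    // Punctuation directly after an attribute-free opening emphasis tag.
    private static readonly Regex Leading = new Regex(
        @"<(?<tag>strong|b|em|i)>(?<punct>[\p{P}-[;&]]+)",
        RegexOptions.IgnoreCase);

    public static string MoveOutside(string html)
    {
        // "<strong>text,</strong>" becomes "<strong>text</strong>,"
        html = Trailing.Replace(html, "</${tag}>${punct}");
        // "<strong>“text</strong>" becomes "“<strong>text</strong>"
        html = Leading.Replace(html, "${punct}<${tag}>");
        return html;
    }
}
```

Calling `EmphasisPunctuation.MoveOutside(html)` before `converter.Convert(...)` would leave the comma after the closing tag, so a converter that wraps the tag content in ** would place the closing run right after 称, where it is right-flanking.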

@mysticmind
Owner

This is going to be a huge set of changes to deal with. I will look at how best to address it.
