Include P tags within imported text #26
Comments
The lack of paragraph tags is also causing text to run together. e.g.: of Health, willfully fail to comply with a confinement, isolation, or quarantine order.</p>
<p>Any person violating the provisions of this section shall be guilty of a Class 2 misdemeanor.</p> is rendered as:
This is also happening with references, where we get the section number
Argh. I think this is some fundamental misunderstanding of XSLT for me.
Here's my proposed change to the XSLT and a couple of results, I'm explicitly including P tags, dropping https://gist.github.com/krues8dr/7d6c40f944e3b6abe07edfd1dc73f72d
Ah. Looks like
Oh no, they're misusing the pre tag.
Wow, that's...that's really something.
So, the issue here is that in the export they're translating a nicely-formatted table into a garbage pile of Since they didn't bother to preserve the top-left blank spot, I have no reasonable way of reassembling this sanely. This "works":
<xsl:template match="p">
  <p>
    <xsl:choose>
      <!--We must handle preformatted tables specially-->
      <xsl:when test="pre">
        <xsl:for-each select="node()">
          <xsl:choose>
            <!--If this is a text node, wrap it in pre tags-->
            <xsl:when test="self::text()">
              <pre><xsl:value-of select="current()"/></pre>
            </xsl:when>
            <!--If it's an element, parse it as usual-->
            <xsl:when test="self::*">
              <xsl:apply-templates select="."/>
            </xsl:when>
            <!--Everything else gets removed-->
          </xsl:choose>
        </xsl:for-each>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates />
      </xsl:otherwise>
    </xsl:choose>
  </p>
</xsl:template>
However, there are other places where they use these self-closing Also noteworthy in TLDR: I give up.
I'm going to preserve the
This'll do. Also shows #29.
Add template for pre and br tags. Per #26
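For anyone following along, here is a rough sketch of what dedicated templates for pre and br could look like. This is not the actual commit, just an illustration; it assumes the source elements are literally named pre and br, and that everything else is handled by other templates or by XSLT's built-in rules.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: source elements are named "pre" and "br". -->

  <!-- Pass pre through, processing its children so nested markup
       and whitespace-bearing text nodes are kept. -->
  <xsl:template match="pre">
    <pre><xsl:apply-templates/></pre>
  </xsl:template>

  <!-- Emit an empty br for each br in the source. -->
  <xsl:template match="br">
    <br/>
  </xsl:template>

</xsl:stylesheet>
```

Because pre gets its own template, a p template can simply call xsl:apply-templates and let the processor dispatch to whichever rule matches each child.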
I'm still cleaning this up a bit, on the parser side – I'm getting duplicate text from statedecoded/statedecoded#113
I'm still amazed at how poorly Lexis handles this. They had tabular data in the SGML, and they just punted on it with their move to XML.
That should do it. There's still some seriously broken XML that Lexis is putting out, like
Lexis-Nexis' XML includes paragraph tags, but we're not using them. That's creating problems, e.g. definitions that all run together. Stop stripping these out.
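A minimal sketch of keeping the paragraph tags, assuming the source paragraphs are p elements. Whether the current stylesheet loses them this way is a guess, but it is a common cause: xsl:value-of select="." returns only the concatenated text of an element, so consecutive paragraphs run together, while re-emitting the element and applying templates to its children keeps the boundaries.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: paragraphs in the source are "p" elements. -->

  <!-- Re-emit each p and process its children, rather than using
       xsl:value-of, which would flatten the element to its text. -->
  <xsl:template match="p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```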