Fix word-capitalization XSL error #22
Ahhh, I see what the problems are. (There are two.) There are a few dozen Lexis-Nexis XML files that lack a crucial tag. In this one, the second `title`-type level has a `<desig>` but no `<title>`:

```xml
<hierarchy>
  <hierarchyLevel levelType="title">
    <heading>
      <desig>TITLE 58.1.</desig>
      <title>TAXATION</title>
    </heading>
    <hierarchyLevel levelType="title">
      <heading>
        <desig>GENERAL PROVISIONS OF TITLE 58.1</desig>
      </heading>
      <hierarchyLevel levelType="article">
        <heading>
          <desig>ARTICLE 2.</desig>
          <title>RESPONSIBILITY OF FIDUCIARIES IN TAX MATTERS</title>
        </heading>
      </hierarchyLevel>
    </hierarchyLevel>
  </hierarchyLevel>
</hierarchy>
```

At every level, we have
Ohhh, I've got it. The XML on the official site helped me figure it out. This law is in Chapter 0, which is pretty weird, because I've never seen a Chapter 0 before. Looking at the Title 58.1 page on the official site, I see that the title is divided into Subtitles I–IV, which are in turn subdivided into chapters. Subtitle I contains no Chapter 0. Chapter 0 is a fiction, a placeholder: it means that these laws are direct children of Title 58.1 (divided into Article 1 and Article 2), which the official site tucks in at the top. But Lexis-Nexis' XML is silent on Chapter 0; there's no designation at all. I gotta chew over how best to handle these.
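One way to find the affected files is to look for any `hierarchyLevel` whose `heading` has a `<desig>` but no `<title>`. Below is a minimal Python sketch, assuming the element names in the sample above and no XML namespace on them (if the real Lexis-Nexis files declare one, the `find` paths would need a prefix). `untitled_levels` is a hypothetical helper, not part of the project.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample above: the second "title" level has a <desig> but no <title>.
SAMPLE = """<hierarchy>
  <hierarchyLevel levelType="title">
    <heading><desig>TITLE 58.1.</desig><title>TAXATION</title></heading>
    <hierarchyLevel levelType="title">
      <heading><desig>GENERAL PROVISIONS OF TITLE 58.1</desig></heading>
      <hierarchyLevel levelType="article">
        <heading><desig>ARTICLE 2.</desig><title>RESPONSIBILITY OF FIDUCIARIES IN TAX MATTERS</title></heading>
      </hierarchyLevel>
    </hierarchyLevel>
  </hierarchyLevel>
</hierarchy>"""


def untitled_levels(root):
    """Return (depth, levelType, desig) for every hierarchyLevel whose heading lacks <title>."""
    found = []

    def walk(elem, depth):
        for level in elem.findall("hierarchyLevel"):
            heading = level.find("heading")
            if heading is not None and heading.find("title") is None:
                desig = heading.findtext("desig", default="").strip()
                found.append((depth, level.get("levelType"), desig))
            walk(level, depth + 1)  # recurse into nested levels

    walk(root, 1)
    return found


if __name__ == "__main__":
    for depth, kind, desig in untitled_levels(ET.fromstring(SAMPLE)):
        print(f"level {depth} ({kind}): {desig!r} has no <title>")
```

To scan a whole directory, the same function can be run over `ET.parse(path).getroot()` for each file.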
Seems pretty similar to what I ran into with statedecoded/statedecoded#574.
Given the source:

```xml
<hierarchy>
  <hierarchyLevel levelType="title">
    <heading>
      <desig>TITLE 58.1.</desig>
      <title>TAXATION</title>
    </heading>
    <hierarchyLevel levelType="title">
      <heading>
        <desig>GENERAL PROVISIONS OF TITLE 58.1</desig>
      </heading>
      <hierarchyLevel levelType="article">
        <heading>
          <desig>ARTICLE 1.</desig>
          <title>IN GENERAL</title>
        </heading>
      </hierarchyLevel>
    </hierarchyLevel>
  </hierarchyLevel>
</hierarchy>
```

For the XSLT, this output makes sense to me:

```xml
<unit label="title" level="1" identifier="58.1">Taxation</unit>
<unit label="title" level="2">General Provisions Of Title 58.1</unit>
<unit label="article" level="3" identifier="1">In General</unit>
```

And it is achieved thus:

```xml
<xsl:choose>
  <!-- A real title exists: the desig holds the identifier, the title holds the name. -->
  <xsl:when test="heading/title">
    <xsl:attribute name="identifier">
      <!-- "TITLE 58.1." becomes "58.1": strip the division name, then a literal trailing period. -->
      <xsl:value-of select="replace(replace(normalize-space(heading/desig), '^(TITLE|SUBTITLE|ARTICLE|CHAPTER|SUBCHAPTER|PART) ', ''), '\.$', '')"/>
    </xsl:attribute>
    <xsl:value-of select="fn:capitalize_phrase(heading/title)"/>
  </xsl:when>
  <!-- No title: the desig is really the heading text, so no identifier is emitted. -->
  <xsl:otherwise>
    <xsl:value-of select="fn:capitalize_phrase(heading/desig)"/>
  </xsl:otherwise>
</xsl:choose>
```

This leaves it up to the Parser to handle these "administrative" divisions. We'll still need to generate an identifier for these in the short term, since SD requires one. I've run into this a lot and have hacked around it with custom rules, but ideally we should support structures that do not have real identifiers.
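To sanity-check this mapping outside an XSLT processor, here is a rough Python equivalent of the `xsl:choose` above. It's a sketch, not the project's code: `capitalize_phrase` is a hypothetical stand-in for `fn:capitalize_phrase` (plain per-word capitalization, which happens to reproduce the three headings here), and `to_units`/`unit_xml` are illustrative names. The trailing-period pattern is escaped (`\.$`) so that only a literal period is removed, never the last digit of an identifier.

```python
import re
import xml.etree.ElementTree as ET

# Division names stripped from <desig>, mirroring the XSLT's inner replace().
PREFIX = re.compile(r"^(TITLE|SUBTITLE|ARTICLE|CHAPTER|SUBCHAPTER|PART) ")

SOURCE = """<hierarchy>
  <hierarchyLevel levelType="title">
    <heading><desig>TITLE 58.1.</desig><title>TAXATION</title></heading>
    <hierarchyLevel levelType="title">
      <heading><desig>GENERAL PROVISIONS OF TITLE 58.1</desig></heading>
      <hierarchyLevel levelType="article">
        <heading><desig>ARTICLE 1.</desig><title>IN GENERAL</title></heading>
      </hierarchyLevel>
    </hierarchyLevel>
  </hierarchyLevel>
</hierarchy>"""


def capitalize_phrase(text):
    # Hypothetical stand-in for fn:capitalize_phrase: capitalize every word.
    return " ".join(word.capitalize() for word in text.split())


def to_units(elem, level=1):
    """Flatten nested hierarchyLevel elements into unit dicts, depth-first."""
    units = []
    for node in elem.findall("hierarchyLevel"):
        heading = node.find("heading")
        # Equivalent of normalize-space(heading/desig).
        desig = " ".join(heading.findtext("desig", default="").split())
        title = heading.find("title")
        unit = {"label": node.get("levelType"), "level": level}
        if title is not None:
            # "TITLE 58.1." -> "58.1": drop the division name, then one literal trailing period.
            unit["identifier"] = re.sub(r"\.$", "", PREFIX.sub("", desig))
            unit["text"] = capitalize_phrase(title.text or "")
        else:
            # No <title>: the desig is the heading itself, so no identifier.
            unit["text"] = capitalize_phrase(desig)
        units.append(unit)
        units.extend(to_units(node, level + 1))
    return units


def unit_xml(unit):
    """Serialize a unit dict in the same shape as the desired XSLT output."""
    el = ET.Element("unit", label=unit["label"], level=str(unit["level"]))
    if "identifier" in unit:
        el.set("identifier", unit["identifier"])
    el.text = unit["text"]
    return ET.tostring(el, encoding="unicode")


if __name__ == "__main__":
    for unit in to_units(ET.fromstring(SOURCE)):
        print(unit_xml(unit))
```

Because the untitled level simply carries no `identifier` key, any placeholder identifier SD needs can be added later, in one place, instead of inside the transform.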
A very sensible solution! Legal codes are made up entirely of edge cases, I believe we've learned.
Well, moreover: trying to neatly fit legal text into abstract node-based tree models just doesn't work. Whenever this thing gets a rewrite, it needs to be waaaaaay more flexible in how things are actually composed.
This is also a massive cleanup of the structure class, to reduce the magic and sprawling code. In particular:

* Removes the magic global.
* Prefers passing arguments to functions over context-specific magic properties.
* Removes duplicate ancestor-handling code shared between `id_ancestry` and `get_current`.
* Replaces URL mangling with a consistent interface to permalinks.
* Replaces inconsistent, renamed properties with standard internal naming.
* Replaces objects abused as arrays with real arrays.
Handle structures without identifiers. Fixes #574 openva/va-decoded#22
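The commit itself changes the project's structure class, which isn't reproduced here. As a rough illustration only, this Python sketch shows the two ideas from the commit message side by side: a structure whose identifier is optional, and a single place that walks ancestors and builds permalinks. The `Structure` class, the `slug` fallback for missing identifiers, and the URL shape are all hypothetical, not the project's real API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


def slug(text):
    """Lowercase and hyphenate, e.g. 'General Provisions Of Title 58.1' -> 'general-provisions-of-title-58-1'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


@dataclass
class Structure:
    label: str
    name: str
    identifier: Optional[str] = None  # None for "administrative" divisions with no real identifier
    parent: Optional["Structure"] = None

    def ancestry(self) -> List["Structure"]:
        """Return the chain from the root structure down to this one (one shared implementation)."""
        chain = []
        node = self
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain[::-1]

    def url_part(self) -> str:
        # Hypothetical short-term fallback: derive a placeholder from the name when there is no identifier.
        return self.identifier if self.identifier else slug(self.name)

    def permalink(self) -> str:
        """Build the URL in one place instead of mangling it ad hoc."""
        return "/" + "/".join(n.url_part() for n in self.ancestry()) + "/"


if __name__ == "__main__":
    title = Structure("title", "Taxation", "58.1")
    general = Structure("title", "General Provisions Of Title 58.1", parent=title)
    article = Structure("article", "In General", "1", parent=general)
    print(article.permalink())
```

Slugging the name is just one possible placeholder; a real implementation would need to guard against two untitled divisions producing the same slug.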
In the file in question, I can't identify any actual problems. At least on reading it (as opposed to inspecting it character by character for, e.g., a zero-width Unicode character), everything seems fine.
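If it does come down to something invisible, a per-character scan is quick to write. This Python sketch flags Unicode format characters (category `Cf`, which includes U+200B ZERO WIDTH SPACE and the U+FEFF byte order mark) and control characters other than tab, LF, and CR; the C0 controls in that set are not allowed in XML 1.0 at all. It's a diagnostic aid under those assumptions, not a claim about what is actually wrong with the file.

```python
import unicodedata


def suspicious_chars(text):
    """Return (line, column, code point, name) for invisible or control characters.

    Flags format characters (category Cf) and control characters (category Cc)
    other than tab, LF, and CR. Columns are 1-based and count code points.
    """
    hits = []
    line, col = 1, 0
    for ch in text:
        if ch == "\n":
            line, col = line + 1, 0
            continue
        col += 1
        category = unicodedata.category(ch)
        if category == "Cf" or (category == "Cc" and ch not in "\t\r"):
            # Control characters have no Unicode name, so fall back to a placeholder.
            hits.append((line, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "<control>")))
    return hits
```

Usage: `suspicious_chars(open(path, encoding="utf-8").read())`. A leading BOM will show up too; that one is usually harmless.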
Note that the outcome of this error is severe: it yields a zero-length XML file, as opposed to just a file with a minor error.
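Since an empty file is easy to miss in a large batch of transforms, it may be worth having the build check each output. A minimal Python sketch; `check_output` is a hypothetical helper, where it runs in the pipeline is an assumption, and it checks only for emptiness and well-formedness, not for correct content.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def check_output(path):
    """Fail loudly if a transform produced an empty or malformed XML file.

    Returns the root element's tag so callers can log what was produced.
    """
    if os.path.getsize(path) == 0:
        raise ValueError(f"{path}: transform produced a zero-length file")
    # ET.parse raises xml.etree.ElementTree.ParseError on malformed input.
    return ET.parse(path).getroot().tag


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        empty = os.path.join(tmp, "empty.xml")
        open(empty, "w").close()  # simulate the zero-length output
        try:
            check_output(empty)
        except ValueError as err:
            print(err)
```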