
Fix word-capitalization XSL error #22

Open
waldoj opened this issue Jul 6, 2016 · 6 comments

Comments

@waldoj
Member

waldoj commented Jul 6, 2016

Error on line 68 of decoded.xsl:
  XTTE0790: An empty sequence is not allowed as the first argument of fn:capitalize_phrase()
  at xsl:apply-templates (file:/vol/vacode.org/decoded.xsl#73)
     processing /legislativeDoc/metadata[1]/hierarchy[1]/hierarchyLevel[1]/hierarchyLevel[1]
  at xsl:apply-templates (file:/vol/vacode.org/decoded.xsl#27)
     processing /legislativeDoc/metadata[1]/hierarchy[1]/hierarchyLevel[1]
  at xsl:for-each (file:/vol/vacode.org/decoded.xsl#26)
     processing /legislativeDoc/metadata[1]/hierarchy[1]
  in built-in template rule
While processing 5K0Y-PTF0-004G-K44X-00000-00.xml: Run-time errors were reported

In the file in question, I can't identify any actual problem. At least on a read-through (as opposed to checking it character by character for, e.g., a zero-width Unicode character), everything seems fine.

Note that the outcome of this error is severe: it yields a zero-length XML file, rather than just a file with a minor error.

@waldoj
Member Author

waldoj commented Jul 28, 2016

Ahhh, I see what the problems are. (There are two.) A few dozen Lexis-Nexis XML files lack a crucial tag:

<hierarchy>
   <hierarchyLevel levelType="title">
      <heading>
         <desig>TITLE 58.1.</desig>
         <title>TAXATION</title>
      </heading>
      <hierarchyLevel levelType="title">
         <heading>
            <desig>GENERAL PROVISIONS OF TITLE 58.1</desig>
         </heading>
         <hierarchyLevel levelType="article">
            <heading>
               <desig>ARTICLE 2.</desig>
               <title>RESPONSIBILITY OF FIDUCIARIES IN TAX MATTERS</title>
            </heading>
         </hierarchyLevel>
      </hierarchyLevel>
   </hierarchyLevel>
</hierarchy>

At every level, we have desig and title tags... except for "General Provisions of Title 58.1," which has a desig but no title. That missing title is what hands fn:capitalize_phrase() an empty sequence. But it gets weirder: that level's levelType is "title". So we have a title nested inside of a... title? With an article inside of that? I don't have the foggiest idea of what to do here.
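To find the affected files up front, rather than discovering them as zero-length output, a quick scan for headings without a title helps. Below is a minimal Python sketch using the standard library's ElementTree. It assumes, as in the excerpt, that the elements carry no XML namespace; untitled_levels and scan_dir are hypothetical helper names, not part of the project.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def untitled_levels(root):
    """Return (levelType, desig) for every hierarchyLevel whose <heading> lacks a <title>."""
    found = []
    # iter() visits all descendants, so arbitrarily deep nesting is covered
    for level in root.iter("hierarchyLevel"):
        heading = level.find("heading")
        if heading is not None and heading.find("title") is None:
            # Collapse runs of whitespace, like XPath normalize-space()
            desig = " ".join(heading.findtext("desig", default="").split())
            found.append((level.get("levelType"), desig))
    return found

def scan_dir(directory):
    """Map each *.xml file name in directory to its untitled levels (clean files are omitted)."""
    report = {}
    for path in sorted(Path(directory).glob("*.xml")):
        hits = untitled_levels(ET.parse(path).getroot())
        if hits:
            report[path.name] = hits
    return report
```

On the hierarchy above, untitled_levels returns [("title", "GENERAL PROVISIONS OF TITLE 58.1")]; scan_dir applies the same check to every *.xml file in a directory.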

@waldoj
Member Author

waldoj commented Jul 28, 2016

Ohhh, I've got it.

The XML on the official site helped me figure it out. This law is in Chapter 0, which is pretty weird, because I've never seen a Chapter 0 before. Looking at the Title 58.1 page on the official site, I see that the title is divided into Subtitles I–IV, which are in turn subdivided into Chapters. Subtitle I contains no Chapter 0.

Chapter 0 is a fiction. Chapter 0 is a placeholder. In fact, Chapter 0 means that these laws are children of Title 58.1 (divided into Article 1 and Article 2). The official site tucks these up top.

But the Lexis-Nexis XML is silent on Chapter 0. There's no designation at all.

I gotta chew over how to best handle these.

@krusynth
Collaborator

krusynth commented Mar 6, 2017

Seems pretty similar to what I ran into with statedecoded/statedecoded#574

@krusynth
Collaborator

krusynth commented Mar 9, 2017

Given the source:

<hierarchy>
  <hierarchyLevel levelType="title">
    <heading>
      <desig>TITLE 58.1.</desig>
      <title>TAXATION</title>
    </heading>
    <hierarchyLevel levelType="title">
      <heading>
        <desig>GENERAL PROVISIONS OF TITLE 58.1</desig>
      </heading>
      <hierarchyLevel levelType="article">
        <heading>
          <desig>ARTICLE 1.</desig>
          <title>IN GENERAL</title>
        </heading>
      </hierarchyLevel>
    </hierarchyLevel>
  </hierarchyLevel>
</hierarchy>

For the XSLT, this makes sense to me:

<unit label="title" level="1" identifier="58.1">Taxation</unit>
<unit label="title" level="2">General Provisions Of Title 58.1</unit>
<unit label="article" level="3" identifier="1">In General</unit>

And is achieved thus:

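<xsl:choose>
  <!-- Ordinary unit: the heading has both a desig (e.g. "TITLE 58.1.") and a title -->
  <xsl:when test="heading/title">
    <!-- Identifier = desig minus the unit-type prefix and one trailing period, e.g. "58.1" -->
    <!-- The period is escaped: an unescaped '.$' would drop ANY last character -->
    <xsl:attribute name="identifier">
      <xsl:value-of select="replace(replace(normalize-space(heading/desig), '^(TITLE|SUBTITLE|ARTICLE|CHAPTER|SUBCHAPTER|PART) ', ''), '\.$', '')"/>
    </xsl:attribute>
    <xsl:value-of select="fn:capitalize_phrase(heading/title)"/>
  </xsl:when>

  <!-- "Administrative" unit with no title: use the desig as the name and emit no identifier -->
  <xsl:otherwise>
    <xsl:value-of select="fn:capitalize_phrase(heading/desig)"/>
  </xsl:otherwise>
</xsl:choose>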
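To sanity-check this choose logic outside Saxon, here is a minimal Python sketch that mirrors it with the standard library's ElementTree. It assumes un-namespaced elements as in the excerpt. The names capitalize_phrase, identifier_from_desig, units and to_unit_xml are hypothetical, and capitalize_phrase is only a simple stand-in for the stylesheet's own fn:capitalize_phrase().

```python
import re
import xml.etree.ElementTree as ET

# Unit-type prefixes stripped from <desig>, as in the XSLT
PREFIX = re.compile(r"^(TITLE|SUBTITLE|ARTICLE|CHAPTER|SUBCHAPTER|PART) ")

def capitalize_phrase(text):
    # Hypothetical stand-in for fn:capitalize_phrase(): capitalize each word
    return " ".join(word.capitalize() for word in text.split())

def identifier_from_desig(desig):
    # "TITLE 58.1." -> "58.1": normalize whitespace, drop the prefix, then one trailing period
    return re.sub(r"\.$", "", PREFIX.sub("", " ".join(desig.split())))

def units(node, depth=1):
    """Yield (label, level, identifier or None, name) for each nested hierarchyLevel, top-down."""
    for level in node.findall("hierarchyLevel"):
        desig = level.findtext("heading/desig", default="")
        title = level.findtext("heading/title")  # None when there is no <title>
        if title is not None:
            yield level.get("levelType"), depth, identifier_from_desig(desig), capitalize_phrase(title)
        else:
            # "Administrative" division: no identifier, and the desig doubles as the name
            yield level.get("levelType"), depth, None, capitalize_phrase(desig)
        yield from units(level, depth + 1)

def to_unit_xml(label, level, identifier, name):
    """Serialize one row as a <unit> element; identifier is omitted when there is none."""
    attrs = {"label": label, "level": str(level)}
    if identifier is not None:
        attrs["identifier"] = identifier
    element = ET.Element("unit", attrs)
    element.text = name
    return ET.tostring(element, encoding="unicode")
```

Run against the sample hierarchy above, units() followed by to_unit_xml() yields the three <unit> lines proposed earlier, with no identifier on the level-2 title. Escaping the period in the trailing-period regex matters for a desig without one: "TITLE 58" keeps the identifier "58" instead of being trimmed to "5".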

This leaves it up to the parser to handle these "administrative" divisions. We'll still need to generate an identifier for them in the short term, since SD (The State Decoded) requires one. I've run into this a lot, so I've hacked around it with custom rules, but ideally we should support structures that do not have real identifiers.
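One way to meet that short-term requirement is to derive a stable placeholder from the unit's name. The Python sketch below is hypothetical: placeholder_identifier is not a State Decoded function, and the custom rules mentioned above may work quite differently. It simply slugifies the name and optionally prefixes the parent's identifier.

```python
import re

def placeholder_identifier(name, parent_identifier=None):
    """Hypothetical fallback identifier for a unit that has no number of its own.

    Lowercases the name and collapses each run of non-alphanumerics into a hyphen;
    prefixing the parent's identifier keeps similarly named divisions in different titles apart.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return f"{parent_identifier}-{slug}" if parent_identifier else slug
```

For the Title 58.1 example, placeholder_identifier("General Provisions Of Title 58.1", "58.1") gives "58.1-general-provisions-of-title-58-1". Unlike a counter, a slug is deterministic across imports, so permalinks stay stable as long as the name does.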

krusynth pushed a commit to krusynth/va-decoded that referenced this issue Mar 9, 2017
@waldoj
Member Author

waldoj commented Mar 10, 2017

A very sensible solution!

Legal codes are made up entirely of edge cases, I believe we've learned.

@krusynth
Collaborator

Well, more than that: trying to neatly fit legal text into abstract node-based tree models just doesn't work. Whenever this thing gets a rewrite, it needs to be waaaaaay more flexible in how things are actually composed.

krusynth pushed a commit to statedecoded/statedecoded that referenced this issue Mar 19, 2017
This also is a massive cleanup of the structure class, to reduce the magic and sprawling code.  In particular:

* Removes magic global
* Prefers passing in arguments to functions over context-specific magic properties.
* Removes duplicate code in ancestor handling between id_ancestry and get_current
* Replaces url mangling with a consistent interface to permalinks
* Replaces inconsistent, renamed properties with standard internal naming
* Eliminates abusing objects as arrays with real arrays.
waldoj added a commit to statedecoded/statedecoded that referenced this issue Mar 20, 2017
Handle structures without identifiers.  Fixes #574 openva/va-decoded#22