
ErgTreebankingGuidelines

EmilyBender edited this page Nov 24, 2014 · 8 revisions

Heuristics for efficient treebanking

Top-down

  • Choose the construction that spans the whole sentence
    • Typically SUBJH
    • Typically not one of the FRAG* rules

Bottom-up

  • Disambiguate lexical entries early, to reduce remaining ambiguity
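Both heuristics work by shrinking the set of candidate analyses, in the spirit of discriminant-based treebanking. The sketch below models each candidate parse as a hypothetical `Analysis` record holding its root rule and its lexical-type choices. Choosing a full-sentence root (top-down) or fixing one lexical entry (bottom-up) then acts as a filter. The class, the helper functions, and the `FRAG_NP` root name are illustrative assumptions, not the interface of any actual treebanking tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Analysis:
    """Hypothetical stand-in for one candidate parse of a sentence."""
    root_rule: str                                           # rule spanning the whole sentence
    lex_types: Dict[str, str] = field(default_factory=dict)  # token -> lexical type


def prefer_full_sentence(analyses: List[Analysis]) -> List[Analysis]:
    """Top-down: drop FRAG* roots whenever some non-fragment root is available."""
    full = [a for a in analyses if not a.root_rule.startswith("FRAG")]
    return full or analyses  # fall back to fragments only if nothing else exists


def fix_lexical_type(analyses: List[Analysis], token: str, lex_type: str) -> List[Analysis]:
    """Bottom-up: keep only the analyses that use the chosen entry for `token`."""
    return [a for a in analyses if a.lex_types.get(token) == lex_type]


# Toy forest for |A date hasn't been set|; FRAG_NP is a hypothetical rule name.
forest = [
    Analysis("SUBJH", {"set": "v_np*_le"}),
    Analysis("SUBJH", {"set": "aj_-_i_le"}),
    Analysis("FRAG_NP", {"set": "v_np*_le"}),
]
remaining = fix_lexical_type(prefer_full_sentence(forest), "set", "v_np*_le")
print(len(remaining))  # 1
```

Each early decision removes every analysis that disagrees with it. This is why settling lexical entries and the root construction first leaves fewer choices for the rest of the tree.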

Technical choices

Complex proper names

Titles

  • |Mr. Browne|
    • Choose NP-TITLE-CMPND, not APPOS

Capitalized words in name

  • treat as parts of name, not ordinary words
  • |Rolls-Royce Motor Cars Inc.|
    • |Motor Cars|
      • NP_NAME_CMPND, not NOUN_N_CMPND
    • |Rolls-Royce|
      • Choose multi-word entry when available
    • |Rolls-Royce Motor Cars|
      • NP_NAME_CMPND
    • Attach |Inc.| with NADJ_RR

Profession modifier

  • treat as appositive
  • |Howard Mosher, president and CEO|
    • First combine |Howard Mosher|
    • Then combine it with |president and CEO| using APPOS_NBAR

Native lexical entries for names preferred when available

  • Company names
    • |Rolls-Royce|
      • Choose n_-_pn_le, not NP_NAME_CMPND
  • Country names
    • |U.S.|
      • Choose n_-_c-nm-pd_le, not n_-_pn-gen_le

Proper names and punctuation

  • Unknown names
    • |Elianti.|
      • Choose PUNCT_PERIOD_ORULE (period is not part of name)
  • Name abbreviations containing periods
    • |U.S.|
      • Choose PUNCT_PERIOD_ORULE if word is at end of sentence

PP attachment

  • Choose highest attachment point consistent with meaning
    • |remain steady at 1,200 cars|
      • attach to VP, not to |steady|
    • |reserve a room for Browne|
      • attach to VP, not to |room|
  • In copula constructions (with forms of the verb "be"), attach the PP inside the predicate phrase, not to the copula
    • |be payable Feb. 15|
      • First combine |payable| with |Feb. 15| using HADJ_I_UNS
  • Complement vs. modifier - choose complement when available
    • |based in Los Angeles|
      • Choose HCOMP, not HADJ_I_UNS
  • PP modifier inserted between verb and its complement NP
    • |publish in statements the names of insiders|
      • First combine |publish| with |in statements| using VMOD_I

Temporal modifiers

  • When they precede the VP, attach to the subject NP
    • |the maker last year sold cars|
      • attach |last year| to |maker|
  • Treat as modifiers, pumping temporal NP to a PP
    • |last year|
      • Choose NPADV, not ADJN
    • |Feb. 15|
      • Combine with HSPECHC, then choose NPADV
  • Complex phrases
    • |early next year|
      • Combine |early| with |next year| using NADJ_RR

Complex compound nouns

  • Choose the bracketing that matches the intended sense
    • |luxury auto maker|
      • first combine |luxury| with |auto|
  • When intended bracketing is not clear, group from right to left
    • |airline ticket counter|
      • first combine |ticket| with |counter|
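The right-to-left default can be stated as a small procedure. Combine the two rightmost nouns first, then attach each remaining noun on the left. The sketch below is a hypothetical illustration that builds the bracketing as nested tuples and renders it in square brackets. The function names are made up for illustration. Remember that this default applies only when the intended sense does not decide the bracketing, so |luxury auto maker| is still bracketed by meaning.

```python
from typing import List, Tuple, Union

# A bracketing is either a single noun or a pair of sub-bracketings.
Bracketing = Union[str, Tuple["Bracketing", "Bracketing"]]


def right_to_left(nouns: List[str]) -> Bracketing:
    """Default bracketing: combine the rightmost pair first, then attach leftward."""
    if not nouns:
        raise ValueError("need at least one noun")
    tree: Bracketing = nouns[-1]
    for noun in reversed(nouns[:-1]):
        tree = (noun, tree)
    return tree


def render(tree: Bracketing) -> str:
    """Show a bracketing as nested square brackets."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    return f"[{render(left)} {render(right)}]"


print(render(right_to_left(["airline", "ticket", "counter"])))
# [airline [ticket counter]]
```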

Coordination

  • Nominal phrases
    • Choose N_COORD_TOP_2, not N_COORD_TOP_3 when given the choice
  • Sentence-initial conjunction - treat as incomplete coordination of clauses
    • |But Abrams arrived early.|
      • Combine |But| with |Abrams arrived early.| using HMARK_CL

Passive verb vs. adjective

  • Choose verb if the meaning is agentive; otherwise choose adjective
    • |A date hasn't been set|
      • For |set|, choose v_np*_le, not aj_-_i_le

Punctuation

  • Paired commas marking off a modifier: choose "paired" rule (-PR suffix)
    • |Bell, based in Los Angeles|
      • Choose NADJ_RC_PR to combine modifier phrase with |Bell|

Adverbs

  • Negation - always attach |not| to preceding auxiliary if possible
    • |did not meet|
      • First combine |did| with |not| using HCOMP
  • Other adverbs between auxiliary and main VP - attach adverb to following VP
    • |can really sing|
      • First combine |really| with |sing| using ADJH_S
  • Sentence-initial - prefer attachment without extraction when possible
    • |Apparently the commission met|
      • Choose ADJ_S, not FILLHEAD_NON_WH_IG

Measure phrases

  • Degree modifiers - combine with the number word
    • |about 25 % of them|
      • First combine |about| with |25| using HSPECHC
      • Combine |%| with |of them| using HCOMP
  • Dollar amounts - treat the symbol |$| as the head (the unit of measure)
    • |$ 80 billion|
      • Combine |$| with |80 billion| using MEAS_NP_SYMB

Quotations with explicit attribution

  • treat as extraction from 'saying' verb
    • |They arrived, Browne said.|
      • Combine |They arrived,| with |Browne said.| using FILLHEAD_NON_WH

Partitive NPs

  • First pump determiner to noun, and treat of-PP as complement
    • |some of the books|
      • Combine |some| with |of the books| using HCOMP
  • For |all|, |not all|, |both|, and |half|, treat following NP as complement
    • |not all those who wrote|
      • For |not all|, choose native entry n_np_mc-neg_le
      • Combine |not all| with |those who wrote| using HCOMP

Modification in noun phrases

  • Modifiers to the right of the head noun are always attached _before_ any modifiers to the left
    • |important changes by the SEC|
      • First combine |changes| with |by the SEC| using NADJ_RR
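The attachment order can be sketched as follows. Post-head modifiers wrap the head noun first, and pre-head modifiers are added outside them. The guideline does not say how several modifiers on the same side are ordered, so the hypothetical `attach_modifiers` helper below assumes that modifiers on each side attach nearest-first. The sketch shows only the bracketing, not which rule performs each step.

```python
from typing import List


def attach_modifiers(head: str, left_mods: List[str], right_mods: List[str]) -> str:
    """Bracket a noun phrase so that all right modifiers attach before any left ones.

    Assumption: on each side, the modifier nearest the head attaches first.
    `left_mods` and `right_mods` are given in surface (left-to-right) order.
    """
    phrase = head
    for mod in right_mods:            # post-head: nearest (leftmost) first
        phrase = f"[{phrase} {mod}]"
    for mod in reversed(left_mods):   # pre-head: nearest (rightmost) first
        phrase = f"[{mod} {phrase}]"
    return phrase


print(attach_modifiers("changes", ["important"], ["by the SEC"]))
# [important [changes by the SEC]]
```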

Notes from Tomar meeting

  1. Where lexical ambiguity is hard to resolve (e.g. even-deg vs. even-conj), choose based on frequency in Redwoods/DeepBank

  2. Disprefer modifier attachment to semantically vacuous heads, e.g. attach adverbs to |hiring ...|, not to |be hiring ...|

  3. For there-copula:

    1. Avoid double-object choice and avoid modification of there-cop
    2. Also prefer low attachment of modifier after obj NP
    3. Accept extraction of PP for there-cop as is
  4. When choosing between verb-particle and verb-modifier analyses, as in |go away|: if the 'particle' can itself be modified, as in |go far away|, it is not a verb-particle.

  5. When there is a choice between spr-hd and mod-hd for Adv-Adj, choose mod-hd

  6. Avoid adv-add except for |not|

  7. For a WH-question of the form NP-be-NP [EMB: guessing this is choose subj-head; Dan please confirm]

  8. For complement of saying, if there's a main clause option for the quoted material choose it:

    • |"Who did Kim hire" asked Mary| not |*Who Kim hired, asked Mary|
  9. No free relatives

  10. Attach three-dot punct as low as possible

  11. Reject ellipsis

  12. For ndash between clauses, use run-on

  13. For degree specifiers, when there's a choice, take the shortest lexent type name

  14. Attach subord clause high [EMB: subordinate clauses are understood as clauses with all arguments overt; do not include in+order+to purposives, etc.]
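Notes 1 and 13 are mechanical tie-breakers, so they can be written as two small selection functions. In the sketch below, `pick_by_frequency` assumes you already have a flat list of lexical-type labels extracted from Redwoods/DeepBank; how to extract that list is not shown. `pick_shortest_type` compares type names by character length. Both function names and all type labels in the usage lines are hypothetical placeholders, not real ERG types.

```python
from collections import Counter
from typing import Iterable, List


def pick_by_frequency(candidates: List[str], observed: Iterable[str]) -> str:
    """Note 1: choose the candidate type seen most often in existing treebank data.

    Ties (including all-zero counts) go to the earliest candidate,
    because max() returns the first maximal item it encounters.
    """
    counts = Counter(observed)
    return max(candidates, key=lambda t: counts[t])


def pick_shortest_type(candidates: List[str]) -> str:
    """Note 13: among degree-specifier entries, take the shortest type name."""
    return min(candidates, key=len)


# Hypothetical labels, for illustration only.
print(pick_by_frequency(["even-conj", "even-deg"], ["even-deg", "even-conj", "even-deg"]))
# even-deg
print(pick_shortest_type(["deg_x-long_le", "deg_x_le"]))
# deg_x_le
```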
