<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:mods="http://www.loc.gov/mods/v3"
version="2.0">
<xsl:output omit-xml-declaration="yes" encoding="UTF-8" method="xml" indent="yes" version="1.0"
doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
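<!--
  Illustrative sketch of the input this stylesheet appears to expect. It is
  inferred only from the match patterns and select expressions in the
  templates below; every value is a hypothetical placeholder, the nesting of
  the sections directly under results is an assumption, and real batch files
  may carry elements or attributes not shown here.

  <results batch="BATCH_NAME" artcount="1200">
    <stats>
      <totalheadlinecount>1200</totalheadlinecount>
      <sampleheadlinecount>50</sampleheadlinecount>
      <samplecharcount>2000</samplecharcount>
      <errorrate>0.5</errorrate>
      <maxerrors>10</maxerrors>
    </stats>
    <sample>
      <title page="1" date="1900-01-01">First sample headline</title>
      <title page="3" date="1900-01-02">Second sample headline</title>
    </sample>
    <mastheads>
      <masthead volume="1" issue="1" date="1900-01-01">Masthead text</masthead>
    </mastheads>
    <zones>
      <issue>Issue identifier</issue>
    </zones>
    <articleText>
      <title page="2" date="1900-01-01">Article title</title>
    </articleText>
  </results>
-->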
<xsl:template match="/">
<xsl:variable name="allheadlines" select="number(results/@artcount)"/>
<xsl:variable name="headlines" select="number(count(results/title))"/>
<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Princeton Periodicals QC</title>
<link rel="stylesheet" type="text/css" href="main.css"/>
</head>
<body>
<div id="wrapper">
<div id="introduction">
<h2>Instructions</h2>
<p>Print out this page.</p>
<p>Use the following tables to assess the quality of this
batch. Errors are counted by <em>character</em>, not by
<em>word</em>. The following are errors:</p>
<ol>
<li>Incorrect characters (e.g., <samp>c</samp> for
<samp>e</samp>).</li>
<li>Transposed characters (e.g., <samp>teh</samp> for
<samp>the</samp>).</li>
<li>Missing characters, including missing spaces (e.g.,
<samp>tht</samp> for <samp>that</samp> or <samp>isit</samp>
for <samp>is it</samp>).</li>
<li>Inserted characters, including spaces (e.g., <samp>c
at</samp> for <samp>cat</samp>).</li>
</ol>
<p>The following are not considered errors:</p>
<ol>
<li>Differences in capitalization (e.g., <samp>pRince</samp>
for <samp>Prince</samp>).</li>
<li>Extra spaces (e.g., <pre>is  it</pre> for <samp>is
it</samp>).</li>
<li>Typographical errors present in the original.</li>
</ol>
</div>
<xsl:apply-templates />
<div id="footer">
<hr/>
<p><i>Thanks to Adriane Hanson for developing the instructions and procedure.</i></p>
</div>
</div>
</body>
</html>
</xsl:template>
<xsl:template match="results">
<div id="results">
<h2>Sample Headlines
<xsl:if test="@batch">
<xsl:value-of select="concat(' from ', @batch)"/>
</xsl:if>
</h2>
<xsl:apply-templates />
</div>
</xsl:template>
<xsl:template match="stats">
<p>
<xsl:value-of select="/results/@batch"/> contains <xsl:value-of select="totalheadlinecount"/> headlines. The
following sample of <xsl:value-of select="sampleheadlinecount"/> headlines contains
<xsl:value-of select="samplecharcount"/> characters. To match an error rate of
<xsl:value-of select="errorrate"/> per cent, this sample can contain no more than
<strong><xsl:value-of select="maxerrors"/></strong> significant errors.
</p>
</xsl:template>
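<!--
  Note on the figures printed by the stats template: maxerrors is read from
  the input and is not computed in this stylesheet. As an assumption about how
  the upstream process derives it, the relation is presumably

      floor(samplecharcount * errorrate div 100)

  For example, with hypothetical values of 2000 sample characters and an
  error rate of 0.5 per cent, this gives floor(2000 * 0.5 div 100) = 10
  permitted errors.
-->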
<xsl:template match="sample">
<h3>Headlines</h3>
<p>Article titles are weighted most heavily in searching and
so must be particularly accurate.</p>
<ol>
<li>Open each article to be tested on the <a
href="http://libserv23.princeton.edu/princetonperiodicals/cgi-bin/princetonperiodicals">Princeton Periodicals</a> website. It may be faster to search for an article title than navigate to it.</li>
<li>Check the spelling of each word in the title. Circle
errors or highlight them in red. Record the number of errors in the <samp>error count</samp> box.</li>
<li>When you have checked all titles, count the number of
character errors and note the total at the bottom of the table.</li>
</ol>
<table>
<thead>
<tr>
<th>#</th>
<th>title</th>
<th>page</th>
<th class="date">date</th>
<th>error count</th>
</tr>
</thead>
<tbody>
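<!-- Select only title elements so that position() in the title template
     is not thrown off by whitespace text nodes between them. -->
<xsl:apply-templates select="title"/>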
<tr>
<td colspan="4" align="right"><b>Total Number of Errors:</b></td>
<td class="fail"><xsl:text> </xsl:text></td>
</tr>
</tbody>
</table>
</xsl:template>
<xsl:template match="title">
<tr>
<xsl:if test="position() mod 2">
<xsl:attribute name="class">shaded</xsl:attribute>
</xsl:if>
<td><xsl:value-of select="position()"/></td>
<td><xsl:value-of select="."/></td>
<td><xsl:value-of select="./@page"/></td>
<td><xsl:value-of select="./@date"/></td>
<td class="fail"><xsl:text> </xsl:text></td>
</tr>
</xsl:template>
<xsl:template match="mastheads">
<h3>Issue-Level Metadata</h3>
<ol>
<li>Open the issue you are going to test on the <a
href="http://libserv23.princeton.edu/princetonperiodicals/cgi-bin/princetonperiodicals">Princeton Periodicals</a> website.</li>
<li>Check that the title, volume number, issue number, and date
in the masthead match what is in the table below.</li>
<li>Check that the edition number is correct: if there is only one
issue that day, it should be 01. If there is more than one, the
editions are numbered consecutively. The edition number is not
printed on the issue.</li>
<li>Check that the total number of pages is correct, and that
the pages are in the correct order. Zoom in enough to read the
page number and then scroll across.</li>
</ol>
<table>
<thead>
<tr>
<th>#</th>
<th>text</th>
<th>volume</th>
<th>issue</th>
<th>date</th>
<th class="pass">pass</th>
<th class="fail">fail</th>
</tr>
</thead>
<tbody>
<xsl:for-each select="masthead">
<tr>
<xsl:if test="position() mod 2">
<xsl:attribute name="class">shaded</xsl:attribute>
</xsl:if>
<td><xsl:value-of select="position()" /></td>
<td><xsl:value-of select="." /></td>
<td><xsl:value-of select="@volume" /></td>
<td><xsl:value-of select="@issue" /></td>
<td><xsl:value-of select="@date" /></td>
<td class="pass"><xsl:text> </xsl:text></td>
<td class="fail"><xsl:text> </xsl:text></td>
</tr>
</xsl:for-each>
</tbody>
</table>
</xsl:template>
<xsl:template match="zones">
<h3>Zones</h3>
<p>Zoning is how the different sections and articles of the newspaper
are divided and identified. Articles that are printed on more than
one page are zoned together as one article. Advertisements are zoned
together as a block, by column.</p>
<ol>
<li>Open the issue you are going to test on the <a
href="http://libserv23.princeton.edu/princetonperiodicals/cgi-bin/princetonperiodicals">Newspapers</a>
website in the issue images view.</li>
<li>Mouse over each article and confirm that the grey box
covers all of the article's text and nothing more. If the article
continues on another page, both sections should be highlighted when
you mouse over either one. Common errors are linking the wrong two
sections, cutting off a few lines that belong at the end of an
article, and including the first few lines of the next article.</li>
<li>Mouse over each article and confirm that the correct article
title appears in the pop-up box. The most common error is that no
title pops up, but occasionally the wrong title pops up.</li>
</ol>
<table>
<thead>
<tr>
<th>#</th>
<th>issue</th>
<th class="pass">pass</th>
<th class="fail">fail</th>
</tr>
</thead>
<tbody>
<xsl:for-each select="issue">
<tr>
<xsl:if test="position() mod 2">
<xsl:attribute name="class">shaded</xsl:attribute>
</xsl:if>
<td><xsl:value-of select="position()" /></td>
<td><xsl:value-of select="." /></td>
<td class="pass"><xsl:text> </xsl:text></td>
<td class="fail"><xsl:text> </xsl:text></td>
</tr>
</xsl:for-each>
</tbody>
</table>
</xsl:template>
<xsl:template match="articleText">
<h3>Article Text</h3>
<p>The vendors are allowed 50 errors per article. As long as an
article's count is not approaching that limit, it is acceptable to
miss a few errors.</p>
<ol>
<li>Open the article you are going to test on the <a
href="http://libserv23.princeton.edu/princetonperiodicals/cgi-bin/princetonperiodicals">Newspapers</a>
website.</li>
<li>Click on the article and select "Text of this article" to get the OCR of the article. </li>
<li>Copy the text to a Word document. Use spell check to
identify potential errors. For each of these, check the
original text of the article to confirm whether it is an error.</li>
<li>Record the number of character errors in the spreadsheet.
Do not count errors you notice that spell check did not highlight:
once you are done checking, we will use a formula to add five
errors to your count, which is the average number of errors that
spell check misses.</li>
<li>For articles with a large number of errors (20 or more),
note what caused the problem. Examples are: incorrectly
inserted lines of text, specific characters always incorrectly
rendered (e.g., <samp>1</samp> for <samp>i</samp>), characters inserted where there is a
column border, and very long articles with a mix of errors.</li>
</ol>
<table>
<thead>
<tr>
<th>#</th>
<th>title</th>
<th>page</th>
<th class="date">date</th>
<th class="pass">pass</th>
<th class="fail">fail</th>
</tr>
</thead>
<tbody>
<xsl:for-each select="title">
<tr>
<xsl:if test="position() mod 2">
<xsl:attribute name="class">shaded</xsl:attribute>
</xsl:if>
<td><xsl:value-of select="position()"/></td>
<td><xsl:value-of select="."/></td>
<td><xsl:value-of select="./@page"/></td>
<td><xsl:value-of select="./@date"/></td>
<td class="pass"><xsl:text> </xsl:text></td>
<td class="fail"><xsl:text> </xsl:text></td>
</tr>
</xsl:for-each>
</tbody>
</table>
</xsl:template>
</xsl:stylesheet>