NEWS
====
Versioning
----------
Releases will be numbered with the following semantic versioning format:
<major>.<minor>.<patch>
And constructed with the following guidelines:
* Breaking backward compatibility bumps the major (and resets the minor
and patch)
* New additions without breaking backward compatibility bump the minor
(and reset the patch)
* Bug fixes and miscellaneous changes bump the patch
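The guidelines above can be sketched as a small base R helper. This is only an illustration: `bump_version` is a hypothetical name and is not part of **textreadr**.

```r
# Hypothetical helper (not part of textreadr) illustrating the guidelines above.
bump_version <- function(version, change = c("patch", "minor", "major")) {
  change <- match.arg(change)
  # Split "<major>.<minor>.<patch>" into three integers
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3L, !anyNA(parts))
  names(parts) <- c("major", "minor", "patch")
  if (change == "major") {
    # Breaking change: bump the major, reset the minor and patch
    parts <- c(parts[["major"]] + 1L, 0L, 0L)
  } else if (change == "minor") {
    # Backward compatible addition: bump the minor, reset the patch
    parts <- c(parts[["major"]], parts[["minor"]] + 1L, 0L)
  } else {
    # Bug fix or miscellaneous change: bump the patch only
    parts[["patch"]] <- parts[["patch"]] + 1L
  }
  paste(parts, collapse = ".")
}

bump_version("1.2.0", "patch")  # "1.2.1"
bump_version("1.2.0", "minor")  # "1.3.0"
bump_version("1.2.0", "major")  # "2.0.0"
```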
textreadr 1.2.1 -
----------------------------------------------------------------
BUG FIXES
NEW FEATURES
MINOR FEATURES
IMPROVEMENTS
CHANGES
textreadr 1.0.3 - 1.2.0
----------------------------------------------------------------
BUG FIXES
* `read_docx` would return a single word as 2 separate words if different
characters within the word had different styling (pseudocode example:
'<w:p><bold>h</bold>ello world</w:p>' returned 'h ello world').
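A minimal sketch of the idea behind this fix, assuming the **xml2** package is installed. The XML string is a hypothetical stand-in for the body of a .docx file, and the code is not the package's actual implementation: it only shows that pasting the text (w:t) nodes of each paragraph (w:p) together without a separator keeps a differently styled word intact.

```r
library(xml2)

# Hypothetical minimal .docx body: one paragraph (w:p) whose first word is
# split across two runs (w:r) because only the first letter is bold.
docx_xml <- paste0(
  '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">',
  '<w:body><w:p>',
  '<w:r><w:rPr><w:b/></w:rPr><w:t>h</w:t></w:r>',
  '<w:r><w:t>ello world</w:t></w:r>',
  '</w:p></w:body></w:document>'
)
doc <- read_xml(docx_xml)
ns <- xml_ns(doc)

# Collect the text nodes paragraph by paragraph and join them with no
# separator, so runs that belong to the same word are not split apart.
paragraphs <- xml_find_all(doc, "//w:p", ns)
paragraph_text <- vapply(seq_along(paragraphs), function(i) {
  runs <- xml_find_all(paragraphs[[i]], ".//w:t", ns)
  paste(xml_text(runs), collapse = "")
}, character(1))

paragraph_text
```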
NEW FEATURES
* `read_odt` added to read in .odt files.
textreadr 0.9.1 - 1.0.2
----------------------------------------------------------------
BUG FIXES
* `read_pdf` threw an error when `ocr = TRUE` but the **tesseract** package was
unavailable. This has been fixed.
* `read_xxx` functions failed when a URL was provided for the path. This behavior
has been corrected. Thanks to Brent Brewington for the spot in issue #18.
NEW FEATURES
* `un_zip` & `un_tar` added as convenience functions (wrappers for `?utils::unzip`
& `?utils::untar`) to make the functions more pipe-able.
* `read_pptx` added to read in .pptx files.
MINOR FEATURES
* `read_xml` basic functionality added and made part of `read_document`.
* Looping utilities `loop_counter`, `base_name`, and `try_limit` added for use
inside of loops. Makes loop reporting and error handling easier and more readable.
IMPROVEMENTS
* `read_docx` would return non-text, formatting information. Issue #19 provides
a demonstration of this issue. This behavior has been corrected to grab only the
text (w:t) tags within paragraphs (w:p).
textreadr 0.8.0 - 0.9.0
----------------------------------------------------------------
NEW FEATURES
* `peek` picks up a `strings.left` argument to align strings to the left. This
is the default because this is a text reading package that deals primarily
with strings.
* `read_pdf` picks up an `ocr` argument to properly handle image-based
.pdf files and extract their text. This task requires optical character
recognition (OCR). The **tesseract** package provides the back-end
for processing these types of .pdfs.
* `browse` added to open files and directories.
textreadr 0.6.0 - 0.7.0
----------------------------------------------------------------
BUG FIXES
* `read_dir` did not handle errored read-ins correctly, resulting in an R error.
NEW FEATURES
* `read_document` picks up explicit `skip`, `remove.empty`, and `trim`
arguments like the other `read_` functions.
* `read_rtf` added to the document forms that can be parsed. This relies on the
**striprtf** package as a back-end. `read_document` and `read_transcript` pick
up the ability to read rich text format as well.
MINOR FEATURES
* `as_transcript` added for coercion of internal strings to transcript. This
function adds the ability to call out the person variable via a regex. For
example one may split after all caps as the leading string.
* `read_dir` and `read_dir_transcript` pick up an `ignore.case` argument for `pattern`.
`pattern` becomes more powerful in that the matching was moved outside of the `dir`
command and into a `grep` call.
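The `pattern` change above can be illustrated in base R. This is a sketch of the general approach, not the package's internal code, and the file names are hypothetical: the directory is listed first and the pattern is then applied with `grep`, where `ignore.case` is available.

```r
# Create a throwaway directory with hypothetical file names
tmp <- file.path(tempdir(), "pattern_demo")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(
  tmp, c("Interview_01.txt", "interview_02.TXT", "notes.docx")
)))

# List everything first, then filter outside of dir() with grep()
all_files <- dir(tmp)
transcripts <- grep("^interview.*\\.txt$", all_files,
                    ignore.case = TRUE, value = TRUE)

transcripts
```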
textreadr 0.4.0 - 0.5.1
----------------------------------------------------------------
BUG FIXES
* The README.md called for `ex_` functions from **qdapRegex**. This was the dev
version of **qdapRegex**. This is now the CRAN version and now works for users.
NEW FEATURES
* `read_html` added for reading in the text from the body of .html documents.
`read_document` inherits this ability as well.
MINOR FEATURES
* The low level read functions all now have consistent arguments: `skip`,
`remove.empty`, & `trim` to make their use more interoperable.
IMPROVEMENTS
* **textreadr** no longer uses the **antiword** program directly. Instead, the
R **antiword** package is called for `read_doc`. This makes installation across
operating systems more standardized.
CHANGES
* The logo has been moved to tools to conform to CRAN standards.
* `read_doc`'s argument `format` is now `FALSE` by default rather than `TRUE` to
be consistent with the other read functions.
* `read_docx` no longer uses the **XML** package but now uses **xml2** as
suggested by Jeroen Ooms (see issue #7).
textreadr 0.3.1
----------------------------------------------------------------
NEW FEATURES
* `read_dir_transcript` added to complement `read_dir` aimed at a directory of
transcripts.
textreadr 0.0.1 - 0.3.0
----------------------------------------------------------------
This package is a collection of convenience tools for reading text documents
into R.