The behaviour from feedparser's point of view is this:
Python 2.7.3 (v2.7.3:70274d53c1dd, Apr 9 2012, 20:32:06)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> doc = feedparser.parse("https://115.146.87.34:8080/")
>>> if doc.bozo:
...     raise doc.bozo_exception
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
xml.sax._exceptions.SAXParseException: <unknown>:137:14: unclosed token
The feed that triggers the error above is generated dynamically by a Node.js application. If I instead save the same feed content to a static document and serve it from an Apache web server, the problem does not occur, so it may be related to a timing issue, i.e. Node.js pausing part-way through serving the Atom feed. One timing issue that could affect urllib2 is case 3 in this question: http://stackoverflow.com/questions/7174927/when-does-socket-recvrecv-size-return
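To illustrate why a response cut off part-way through would show up as a SAX parse error rather than a network error, here is a minimal sketch using only the standard library. The sample feed and the helper name is_well_formed are made up for illustration and are not part of feedparser; the exact error message depends on where the cut falls.

```python
import xml.sax


def is_well_formed(xml_bytes):
    """Return True if xml_bytes parses as well-formed XML, else False."""
    try:
        # A bare ContentHandler ignores all events; we only care about errors.
        xml.sax.parseString(xml_bytes, xml.sax.ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True


# Hypothetical stand-in for a complete Atom feed.
complete = b"<feed><entry><title>ok</title></entry></feed>"

# Simulate a body that was cut off in the middle of a tag.
truncated = complete[:17]

print(is_well_formed(complete))
print(is_well_formed(truncated))
```

A check like this only shows that the body is incomplete; it does not say whether the server or the client dropped the rest.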
I have experienced truncation of feeds retrieved by urllib2 as described here:
http://stackoverflow.com/questions/13222376/urllib2-https-truncated-response
and here:
http://bugs.python.org/issue17569
The truncation could be avoided by replacing use of the "urllib2" module in feedparser.py with use of the "requests" module, as described here:
http://stackoverflow.com/questions/13222376/urllib2-https-truncated-response
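As a rough sketch of that workaround without patching feedparser.py itself, the feed can be downloaded with requests and the raw bytes handed to feedparser.parse, which accepts a string as well as a URL. The function name parse_feed_via_requests and its get/parse parameters are hypothetical, added here only so the fetch and parse steps can be swapped out; I have not verified that this avoids the truncation in every case.

```python
def parse_feed_via_requests(url, timeout=30, get=None, parse=None):
    """Fetch url with requests and parse the body with feedparser.

    get and parse are hypothetical injection points (defaulting to
    requests.get and feedparser.parse) so the function can be exercised
    without network access.
    """
    if get is None:
        import requests  # third-party: pip install requests
        get = requests.get
    if parse is None:
        import feedparser  # third-party: pip install feedparser
        parse = feedparser.parse

    response = get(url, timeout=timeout)
    # Raise on 4xx/5xx instead of trying to parse an error page as a feed.
    response.raise_for_status()
    # response.content is the full body as bytes, read by requests,
    # so feedparser never opens the URL itself.
    return parse(response.content)


# Example usage (needs network access and both packages installed):
# doc = parse_feed_via_requests("https://115.146.87.34:8080/")
# if doc.bozo:
#     raise doc.bozo_exception
```

Passing the bytes rather than response.text leaves the character-encoding detection to feedparser.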