Description
I'm working on a simple NLP project and am trying to connect to the CoreNLP server from a Jupyter Notebook. When I pass the contents of my .txt file to the server for parsing, the request fails with this error:
HTTPError: 500 Server Error: Internal Server Error for url: http://localhost:9000/?properties=%7B%22outputFormat%22%3A+%22json%22%2C+%22annotators%22%3A+%22tokenize%2Cpos%2Clemma%2Cssplit%2Cparse%22%7D
When I follow the URL and simply paste in the text of my file, it parses just fine.
Here's my simple script:
<><><><><><><><><><><><><><><><><><><>><><
from nltk.parse.corenlp import CoreNLPParser
from pprint import pprint
import io

parser = CoreNLPParser()

with open("INPUT_FILE.txt", "r", encoding="utf-8") as file:
    text = file.read()  # Read entire content

# Parse the text
parsed_sentences = list(parser.parse_text(text))

# Write parsed trees to a file
with open("OUTPUT_FILE.txt", "w", encoding="utf-8") as output_file:
    for tree in parsed_sentences:
        output_file.write(tree.pformat() + "\n\n")
<><><><><><><><><><><><><><><><><><><>><><
I'm using stanford-corenlp-4.5.8.jar and stanford-corenlp-4.5.8-models.jar.
Here's the full error message I receive:
<><><><><><><><><><><><><><><><><><><>><><
HTTPError                                 Traceback (most recent call last)
Cell In[8], line 13
     10     text = file.read()  # Read entire content
     12 # Parse the text
---> 13 parsed_sentences = list(parser.parse_text(text))
     15 ## Write parsed trees to a file
     16 with open("HHB_c1_syntax_diagram.txt", "w", encoding="utf-8") as output_file:

File ~/Library/Python/3.11/lib/python/site-packages/nltk/parse/corenlp.py:303, in GenericCoreNLPParser.parse_text(self, text, *args, **kwargs)
    294 def parse_text(self, text, *args, **kwargs):
    295     """Parse a piece of text.
    296
    297     The text might contain several sentences which will be split by CoreNLP.
   (...)
    301
    302     """
--> 303     parsed_data = self.api_call(text, *args, **kwargs)
    305     for parse in parsed_data["sentences"]:
    306         yield self.make_tree(parse)

File ~/Library/Python/3.11/lib/python/site-packages/nltk/parse/corenlp.py:255, in GenericCoreNLPParser.api_call(self, data, properties, timeout)
    245 default_properties.update(properties or {})
    247 response = self.session.post(
    248     self.url,
    249     params={"properties": json.dumps(default_properties)},
   (...)
    252     timeout=timeout,
    253 )
--> 255 response.raise_for_status()
    257 return response.json(strict=self.strict_json)

File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests/models.py:1024, in Response.raise_for_status(self)
   1019     http_error_msg = (
   1020         f"{self.status_code} Server Error: {reason} for url: {self.url}"
   1021     )
   1023 if http_error_msg:
-> 1024     raise HTTPError(http_error_msg, response=self)

HTTPError: 500 Server Error: Internal Server Error for url: http://localhost:9000/?properties=%7B%22outputFormat%22%3A+%22json%22%2C+%22annotators%22%3A+%22tokenize%2Cpos%2Clemma%2Cssplit%2Cparse%22%7D
<><><><><><><><><><><><><><><><><><><>><><
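The traceback only reports the status code, not what the server actually said. Since `raise_for_status()` attaches the failed response to the exception (`response=self` in the last frame), the response body can be printed to see whether the server gives a reason. Below is a rough sketch of how that might look; the helper names `describe_http_error` and `parse_with_diagnostics` are made up for illustration and are not part of NLTK.

```python
def describe_http_error(exc):
    """Return the status code and body of the response attached to exc.

    requests.exceptions.HTTPError stores the failed response on its
    `response` attribute; other exceptions usually have no such attribute.
    """
    response = getattr(exc, "response", None)
    if response is None:
        return f"No response attached: {exc}"
    body = (response.text or "").strip()
    return f"HTTP {response.status_code}: {body or '(empty body)'}"


def parse_with_diagnostics(parser, text):
    """Parse text; if the request fails, print the server's reply and re-raise."""
    try:
        return list(parser.parse_text(text))
    except Exception as exc:
        # Only report errors that carry an HTTP response (hypothetical helper above).
        if getattr(exc, "response", None) is not None:
            print(describe_http_error(exc))
        raise
```

Calling `parse_with_diagnostics(parser, text)` in place of `list(parser.parse_text(text))` should print whatever message the server sent back with the 500, if any, before the same traceback appears.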
The text is one chapter of a book. I first tried uploading the whole book to the online CoreNLP version 4.5.8, but it was over 200K tokens and the server apparently only handles up to 100K. I thought cutting it down to a single chapter would work, but it did not seem to make a difference. Any ideas what's amiss? I'd appreciate any help.
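One workaround I could try, assuming the length of the input really is the problem, is to send the chapter in smaller pieces. This is only a sketch: `chunk_paragraphs` and `parse_in_chunks` are hypothetical helpers, the 5000-character default is an arbitrary guess rather than a documented limit, and it assumes paragraphs in the file are separated by blank lines.

```python
import re


def chunk_paragraphs(text, max_chars=5000):
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the limit: close the chunk.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def parse_in_chunks(parser, text, max_chars=5000):
    """Yield one tree per sentence, sending the text to the server chunk by chunk."""
    for chunk in chunk_paragraphs(text, max_chars):
        yield from parser.parse_text(chunk)
```

With this, `parsed_sentences = list(parse_in_chunks(parser, text))` would replace the single `parse_text` call. Splitting on paragraph boundaries keeps sentences intact, so the trees should match what one large request would return.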