Request 838 of 885 timed out, would you like to [A]bort or [S]kip or [R]etry and continue? #54

Open
ArieRudich opened this issue Jul 9, 2018 · 5 comments

ArieRudich commented Jul 9, 2018

Let me start by giving BIG thanks for a most useful tool! THANKS!

The attached traceback is from a timeout that occurred after a LONG all-night dump
(request 838 of 885, using the resultOffset method).

Might be a good idea to catch this and turn it into a user prompt, something like:

  • "Request 838 of 885 timed out, would you like to [A]bort or [S]kip or [R]etry and continue?"

preferably with a default TimeOutRetry=3 (or a more general FailRetry) and a flag argument to override it.
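A rough sketch of what such a retry default could look like, not how esridump actually works. The names `call_with_retries`, `max_retries`, `delay` and `retry_on` are made up for illustration, and `fetch` stands in for whatever performs one chunk request:

```python
import time


def call_with_retries(fetch, max_retries=3, delay=0.0, retry_on=(Exception,)):
    """Call fetch() and retry up to max_retries times on the given exceptions.

    Hypothetical helper: the first call plus at most max_retries retries.
    The last exception is re-raised once the retries are used up.
    """
    attempt = 0
    while True:
        try:
            return fetch()
        except retry_on:
            if attempt >= max_retries:
                raise
            attempt += 1
            time.sleep(delay)  # fixed pause between attempts


# Demo with a fake request that times out twice and then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("timed out")
    return {"features": []}


result = call_with_retries(flaky_fetch, max_retries=3, retry_on=(TimeoutError,))
```

With the requests library, `retry_on` would presumably be something like `(requests.exceptions.Timeout, requests.exceptions.ConnectionError)`. An interactive [A]bort/[S]kip/[R]etry prompt could then be shown only after the automatic retries are exhausted.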

Another helpful aid in this and similar situations (I just had a similar situation with "socket.gaierror: [Errno 11002] getaddrinfo failed") would be to expose --resultOffset, so that an aborted download can be restarted at the offset last reported by -v or, even better, by the exception handler.
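To illustrate the restart idea, here is a minimal sketch that only computes which offsets would still need to be requested. `page_offsets` and `query_args_for` are hypothetical helpers, and the total of 885000 features is an assumption chosen to match 885 requests of 1000 records; the query keys are copied from the debug log further down:

```python
def page_offsets(total_count, page_size=1000, start_offset=0):
    """Return the resultOffset values from start_offset up to total_count."""
    return range(start_offset, total_count, page_size)


def query_args_for(offset, page_size=1000):
    """Build query arguments for one page (keys taken from the debug log)."""
    return {
        "resultOffset": offset,
        "resultRecordCount": page_size,
        "where": "1=1",
        "f": "json",
    }


# Resume at the offset of the request that timed out (838000 in the log).
remaining = list(page_offsets(885_000, page_size=1000, start_offset=838_000))
first_args = query_args_for(remaining[0])
```

Here `remaining` holds the 47 offsets from 838000 to 884000, so a restarted run would skip everything that was already downloaded.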

What do you think?

Thanks!

Traceback (most recent call last):
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\urllib3\connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\urllib3\connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "c:\python37\Lib\http\client.py", line 1321, in getresponse
    response.begin()
  File "c:\python37\Lib\http\client.py", line 296, in begin
    version, status, reason = self._read_status()
  File "c:\python37\Lib\http\client.py", line 257, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "c:\python37\Lib\socket.py", line 589, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\requests\adapters.py", line 445, in send
    timeout=timeout
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\urllib3\util\retry.py", line 367, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\urllib3\packages\six.py", line 686, in reraise
    raise value
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\urllib3\connectionpool.py", line 306, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='***XXX***', port=80): Read timed out. (read timeout=30)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\esridump\dumper.py", line 418, in __iter__
    response = self._request('POST', query_url, headers=headers, data=query_args)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\esridump\dumper.py", line 43, in _request
    return requests.request(method, url, timeout=self._http_timeout, **kwargs)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\requests\sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\requests\sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\requests\adapters.py", line 526, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='www***XXX***', port=80): Read timed out. (read timeout=30)

During handling of the above exception, another exception occurred:

2018-07-09 05:22:02,786 - cli.esridump - DEBUG - POST http://www.***XXX***/MapServer/18/query, args {'resultOffset': 838000, 'resultRecordCount': 1000, 'where': '1=1', 'geometryPrecision': 7, 'returnGeometry': True, 'outSR': '4326', 'outFields': '*', 'f': 'json'}

Traceback (most recent call last):
  File "c:\python37\Lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\python37\Lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\xampp\htdocs\Data\Snippets\esridump\Scripts\esri2geojson.exe\__main__.py", line 9, in <module>
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\esridump\cli.py", line 111, in main
    feature = next(feature_iter)
  File "c:\xampp\htdocs\data\snippets\esridump\lib\site-packages\esridump\dumper.py", line 425, in __iter__
    raise EsriDownloadError("Could not connect to URL", e)
esridump.errors.EsriDownloadError: ('Could not connect to URL', ReadTimeout(ReadTimeoutError("HTTPConnectionPool(host='www.***XXX***', port=80): Read timed out. (read timeout=30)")))
@iandees
Member

iandees commented Jul 9, 2018

Thanks for the suggestion! This might be a little tricky to pull off because of the way I built the command line tool on top of the library, but I'll think through it some this week.

@ArieRudich
Author

ArieRudich commented Jul 9, 2018 via email

@andrewharvey
Contributor

👍 to automated retry. I commonly encounter errors like the one below, which go away upon simply trying again (I've hacked in a way to pass in the offset and total feature count to pick up where we left off).

Traceback (most recent call last):
  File "/usr/local/bin/esri2geojson", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/esridump/cli.py", line 116, in main
    feature = next(feature_iter)
  File "/usr/local/lib/python2.7/dist-packages/esridump/dumper.py", line 425, in __iter__
    raise EsriDownloadError("Could not connect to URL", e)
esridump.errors.EsriDownloadError: ('Could not connect to URL', EsriDownloadError("http://maps.six.nsw.gov.au/arcgis/rest/services/public/NSW_Property/MapServer/4/query: Could not retrieve this chunk of objects HTTP 504 <html><body><h1>504 Gateway Time-out</h1>\nThe server didn't respond in time.\n</body></html>\n",))

@andrewharvey
Contributor

@ArieRudich Although this ticket is about being able to restart when timeouts occur, for your original timeout issue: if the layer has an ID field, I've found that forcing esri2geojson to query by ID range avoids timeouts. You can now do this with --paginate-oid, from a4c68db. Could you try that and see if it makes a difference?
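For anyone unfamiliar with the idea, querying by ID range means replacing the offset with a `where` clause per chunk. The following is only a sketch of that general technique, not necessarily what --paginate-oid does internally; `oid_range_clauses` is a hypothetical helper and the field name `OBJECTID` is an assumption:

```python
def oid_range_clauses(oid_field, min_oid, max_oid, page_size=1000):
    """Split the inclusive ID range [min_oid, max_oid] into where clauses."""
    clauses = []
    for start in range(min_oid, max_oid + 1, page_size):
        end = min(start + page_size - 1, max_oid)  # clamp the last chunk
        clauses.append(f"{oid_field} >= {start} AND {oid_field} <= {end}")
    return clauses


clauses = oid_range_clauses("OBJECTID", 1, 2500, page_size=1000)
```

Each clause would be sent as the `where` parameter of a separate query, so the server filters on the ID field instead of skipping over an ever larger offset.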

@jayarehart

@andrewharvey I have a similar problem to the one described above. I tried --paginate-oid but had no luck stopping the timeouts. I ended up doing it manually, but that required closing the brackets each time the file timed out.
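In case it helps others, the bracket-closing step can be scripted. This is a sketch under assumptions: the partial output is taken to be a GeoJSON FeatureCollection whose last written feature is complete, possibly followed by a trailing comma. `close_feature_collection` is a hypothetical helper, not part of esridump:

```python
import json


def close_feature_collection(partial_text):
    """Append the missing closing brackets to a truncated FeatureCollection.

    Assumes the last feature in partial_text is complete. Raises ValueError
    (json.JSONDecodeError) if the result is still not valid JSON.
    """
    body = partial_text.rstrip().rstrip(",")  # drop a dangling comma
    repaired = body + "\n]}"
    json.loads(repaired)  # validate before returning
    return repaired


partial = (
    '{"type": "FeatureCollection", "features": [\n'
    '{"type": "Feature", "geometry": null, "properties": {"id": 1}},\n'
    '{"type": "Feature", "geometry": null, "properties": {"id": 2}},\n'
)
repaired = close_feature_collection(partial)
feature_count = len(json.loads(repaired)["features"])
```

If the download stopped in the middle of a feature, the validation step fails and the incomplete last line has to be removed by hand first.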
