Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataError: (psycopg2.errors.InvalidTextRepresentation) invalid input syntax for type json (1e9999) #323

Open
sentry-io bot opened this issue Jan 14, 2021 · 7 comments
Labels
bug Something isn't working

Comments

@sentry-io
Copy link

sentry-io bot commented Jan 14, 2021

Sentry Issue: KINGFISHER-PROCESS-7X

InvalidTextRepresentation: invalid input syntax for type json
LINE 1: ...data) VALUES ('fe8a9342ae1a38e6ef3c574e69a8d21f', '{"ocid": ...
                                                             ^
DETAIL:  Token "Infinity" is invalid.
CONTEXT:  JSON data, line 1: ...tDetails": {"maximumLotsBidPerSupplier": Infinity...

  File "sqlalchemy/engine/base.py", line 1236, in _execute_context
    cursor, statement, parameters, context
  File "sqlalchemy/engine/default.py", line 536, in do_execute
    cursor.execute(statement, parameters)

DataError: (psycopg2.errors.InvalidTextRepresentation) invalid input syntax for type json
LINE 1: ...data) VALUES ('fe8a9342ae1a38e6ef3c574e69a8d21f', '{"ocid": ...
                                                             ^
DETAIL:  Token "Infinity" is invalid.
CONTEXT:  JSON data, line 1: ...tDetails": {"maximumLotsBidPerSupplier": Infinity...
 [SQL: 'INSERT INTO data (hash_md5, data) VALUES (%(hash_md5)s, %(data)s) RETURNING data.id'] [parameters: {'hash_md5': 'fe8a9342ae1a38e6ef3c574e69a8d21f', 'data': '{"oc...
(18 additional frame(s) were not displayed)
...
  File "sqlalchemy/engine/base.py", line 1458, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "sqlalchemy/util/compat.py", line 296, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "sqlalchemy/util/compat.py", line 276, in reraise
    raise value.with_traceback(tb)
  File "sqlalchemy/engine/base.py", line 1236, in _execute_context
    cursor, statement, parameters, context
  File "sqlalchemy/engine/default.py", line 536, in do_execute
    cursor.execute(statement, parameters)

This happens with the UK spider, eg https://www.find-tender.service.gov.uk/api/1.0/ocdsReleasePackages?limit=100&updatedTo=2021-01-13T16:21:39&cursor=dXBkYXRlZFRvPTIwMjEtMDEtMTNUMTY6MjE6Mzl8bmV4dEN1cnNvcj0xNTAxMzM=

cc @aguilerapy @duncandewhurst

@jpmckinney
Copy link
Member

jpmckinney commented Jan 14, 2021

Ah, yes, per our guidance, unlimited maximum lots is expressed as 1e9999, which gets parsed as Infinity. However, SQLAlchemy doesn't know how to serialize Infinity to JSON.

Ideally, we would update the SQL adapter. A quick fix would instead replace occurrences of 1e9999 with a very large 32-bit integer, so that PostgreSQL can store it, without changing the meaning of the data.

FYI @jakubkrafka as an edge case for the Django version.

@duncandewhurst
Copy link

duncandewhurst commented Jan 14, 2021

A quick fix would instead replace occurrences of 1e9999 with a very large 32-bit integer, so that PostgreSQL can store it, without changing the meaning of the data.

@jpmckinney is that something we can do in the Kingfisher Collect spider? Or do you mean do a manual find and replace and then local load the data?

@jpmckinney
Copy link
Member

jpmckinney commented Jan 14, 2021

It can be done in either Kingfisher Collect or Kingfisher Process. Maybe better in Process, since 1e9999 is allowed in JSON - it's just that Process currently stumbles on it. But in a pinch an analyst can also do the manual find-and-replace.

@duncandewhurst
Copy link

Is it worth fixing in Kingfisher Process before the rewrite? The next mini-sprint is scheduled for w/c 9th February, so it may be better to apply the fix in the scraper as we may run several scrapes before then if the publisher is responsive to feedback and it would be good to save analyst time on applying the fix manually.

@jpmckinney
Copy link
Member

Sure, we can have a temporary fix in Kingfisher Collect (cc @yolile), and also open an issue in Kingfisher Collect to remove that fix once this issue is closed in the new version of Kingfisher Process.

@yolile
Copy link
Member

yolile commented Jan 15, 2021

@duncandewhurst I fixed the issue in collect and ran the spider successfully now

@jpmckinney jpmckinney added the bug Something isn't working label Feb 18, 2021
@jpmckinney jpmckinney added this to the V3 milestone Feb 18, 2021
@jpmckinney jpmckinney changed the title DataError: (psycopg2.errors.InvalidTextRepresentation) invalid input syntax for type json DataError: (psycopg2.errors.InvalidTextRepresentation) invalid input syntax for type json (1e9999) Nov 11, 2021
@jpmckinney jpmckinney removed this from the V3 milestone Jun 8, 2022
@jpmckinney
Copy link
Member

Not a priority as it only affects https://github.com/open-contracting/kingfisher-collect/blob/main/kingfisher_scrapy/spiders/united_kingdom_fts.py#L48 and the data modification doesn’t impact correct data analysis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants