
Examples of Harvest Job Errors

Fuhu Xia edited this page Feb 27, 2023 · 3 revisions

Errors that require additional attention

- Could not harvest WAF link https://data.noaa.gov/waf/NOAA/nos/ocm/iso/xml/47968.xml: HTTPSConnectionPool(host='data.noaa.gov', port=443): Read timed out.

The source URL timed out.


- Error loading json content: not enough values to unpack (expected 2, got 0).
- JSONDecodeError loading json. Expecting value: line 2 column 1 (char 1)

The source JSON file is malformed.
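Before re-running the job, you can quickly confirm that a source is malformed by parsing the raw response body yourself. The sketch below uses only Python's standard `json` module. `check_json_body` is a hypothetical helper and is not part of the harvester. One input that reproduces the second message above is a body that starts with a newline followed by something that is not JSON, such as an HTML error page.

```python
import json


def check_json_body(body: str):
    """Return (parsed, None) on success or (None, error_message) on failure.

    Hypothetical triage helper; the error text is the same JSONDecodeError
    message that shows up in the harvest log.
    """
    try:
        return json.loads(body), None
    except json.JSONDecodeError as exc:
        # str(exc) includes the position, e.g. "line 2 column 1 (char 1)"
        return None, str(exc)


if __name__ == "__main__":
    # A leading newline followed by HTML instead of JSON.
    parsed, error = check_json_body("\n<html>Service Unavailable</html>")
    print(error)  # Expecting value: line 2 column 1 (char 1)
```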


- Error loading json content: not enough values to unpack (expected 2, got 0).
- ProxyError getting json source: HTTPSConnectionPool(host='www.nrc.gov', port=443): Max retries exceeded with url: /data.json (Caused by ProxyError('Cannot connect to proxy.', RemoteDisconnected('Remote end closed connection without response'))).

Proxy error. Check the egress proxy or the remote server.


- Error loading json content: not enough values to unpack (expected 2, got 0).
- ProxyError getting json source: HTTPSConnectionPool(host='www.opendataphilly.org', port=443): Max retries exceeded with url: /data.json (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden'))).

Blocked by egress rules.


- Error loading json content: not enough values to unpack (expected 2, got 0).
- HTTPError getting json source: 503 Server Error: Service Unavailable for url: https://www.bls.gov/data.json.
- HTTPError getting json source: 404 Client Error: Not Found for url: https://data.baltimorecity.gov/data.json?version=2.
- ConnectionError getting json source: HTTPSConnectionPool(host='opendurham.nc.gov', port=443): Max retries exceeded with url: /data.json (Caused by SSLError(CertificateError("hostname 'opendurham.nc.gov' doesn't match either of '*.durhamnc.gov', 'durhamnc.gov'"))).
- HTTPError getting json source: 403 Client Error: Forbidden for url: http://www.state.gov/data.json.
- HTTPError getting json source: 504 Server Error: Gateway Time-out for url: https://data.usaid.gov/data.json.
- ConnectionError getting json source: HTTPConnectionPool(host='www.ed.gov', port=80): Max retries exceeded with url: /data.json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8883befbb0>: Failed to establish a new connection: [Errno 110] Connection timed out')).

Other connection issues.
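When you triage a batch of failed jobs, it can help to sort the raw messages above into buckets automatically. The sketch below is a hypothetical helper and is not part of the harvester. Its patterns come only from the example messages on this page, so other sources will probably need more rules.

```python
import re

# Hypothetical triage rules, checked in order; the first match wins.
# The egress rule must come before the generic proxy rule because
# egress denials are also reported as ProxyError.
RULES = [
    (re.compile(r"Tunnel connection failed: 403"), "blocked by egress rules"),
    (re.compile(r"ProxyError"), "proxy error"),
    (re.compile(r"SSLError|CertificateError"), "TLS certificate problem"),
    (re.compile(r"JSONDecodeError"), "malformed JSON"),
    (re.compile(r"HTTPError getting json source: (\d{3})"), "HTTP error"),
    (re.compile(r"timed out"), "timeout"),
]


def classify(message: str) -> str:
    """Map a raw harvest-log message to a coarse category."""
    for pattern, category in RULES:
        match = pattern.search(message)
        if match:
            # Only the HTTP rule has a capture group: append the status code.
            return f"{category} {match.group(1)}" if match.groups() else category
    return "unclassified"


if __name__ == "__main__":
    log_line = (
        "HTTPError getting json source: 404 Client Error: Not Found "
        "for url: https://data.baltimorecity.gov/data.json?version=2."
    )
    print(classify(log_line))  # HTTP error 404
```

Because the HTTP rule runs before the timeout rule, a `504 Server Error: Gateway Time-out` is reported as `HTTP error 504` and not as a client-side timeout.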

Errors that are ok

- Identifier: some-identifier; Title: some-title; 1 Error(s) Found. ### ERROR #1: 'some-field':some-error-message

Source dataset fails to validate.


- Element not found: some-xml-field
- Transformation to ISO failed

Source dataset fails to validate.


- spatial: Maximum allowed size is 32766. Actual size is #####.

The source dataset's spatial field exceeds the Solr size limit.
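The number 32766 is also Lucene's maximum length, in bytes, for a single indexed term. For that reason, the sketch below assumes the limit applies to the UTF-8 byte length of the serialized spatial value. It also assumes compact GeoJSON serialization, which may differ from what the harvester actually sends to Solr, so treat the result as a pre-flight estimate only.

```python
import json

# Limit quoted in the harvest error; assumed here to be a UTF-8 byte count.
MAX_SPATIAL_BYTES = 32766


def spatial_size(geometry: dict) -> int:
    """Byte length of the geometry serialized as compact GeoJSON (assumed format)."""
    return len(json.dumps(geometry, separators=(",", ":")).encode("utf-8"))


def fits_solr_limit(geometry: dict) -> bool:
    return spatial_size(geometry) <= MAX_SPATIAL_BYTES


if __name__ == "__main__":
    box = {"type": "Polygon",
           "coordinates": [[[-80, 30], [-70, 30], [-70, 40], [-80, 40], [-80, 30]]]}
    # A detailed, coastline-style polygon with thousands of vertices.
    ring = [[-80 + i * 1e-4, 30 + i * 1e-4] for i in range(5000)]
    ring.append(ring[0])
    detailed = {"type": "Polygon", "coordinates": [ring]}
    print(fits_solr_limit(box), fits_solr_limit(detailed))  # True False
```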


- Duplicate entry ignored for identifier: 'VA-VHA-OPP-001'.

Source dataset fails to validate.


- Element '{http://www.isotc211.org/2005/gmd}LI_Source', attribute 'id': '_none' is not a valid value of the atomic type 'xs:ID'.

Source dataset fails to validate.


- Object some-id already has this guid some-value

Duplicate guid. The guid must be globally unique.
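To find which objects share a guid, you can group records by guid. The sketch below assumes you already have (object id, guid) pairs, for example exported from the harvest objects. That input shape and the function name are hypothetical.

```python
from collections import defaultdict


def guid_conflicts(records):
    """Return {guid: [object ids]} for every guid claimed by more than one object.

    `records` is an iterable of (object_id, guid) pairs (hypothetical input shape).
    """
    owners = defaultdict(set)
    for object_id, guid in records:
        owners[guid].add(object_id)
    # A guid seen repeatedly for the *same* object is not a conflict.
    return {guid: sorted(ids) for guid, ids in owners.items() if len(ids) > 1}
```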


- Element 'some-xml-field': This element is not expected. Expected is one of ( some-xml-fields).

Source dataset fails to validate.


- Parent identifier not found: "some-id"

The data.json source references a parent identifier that does not exist in the catalog.
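To list the offending datasets before harvesting, you can check every parent reference against the identifiers in the same catalog. The sketch assumes the DCAT-US v1.1 layout, where a top-level `dataset` list holds the datasets and a child points to its parent collection through `isPartOf`. It does not reproduce the harvester's own check.

```python
def missing_parents(catalog: dict) -> list:
    """Return (identifier, parent) pairs whose parent is absent from the catalog.

    Assumes DCAT-US v1.1: datasets under "dataset", parent reference in "isPartOf".
    """
    datasets = catalog.get("dataset", [])
    known = {ds.get("identifier") for ds in datasets}
    return [
        (ds.get("identifier"), ds["isPartOf"])
        for ds in datasets
        if ds.get("isPartOf") and ds["isPartOf"] not in known
    ]
```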


- No records to change

The WAF file timestamp changed but its content did not. This can happen when the WAF server touches the file without actually modifying it.
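If you want to confirm that a WAF file is really unchanged, one approach is to compare a hash of its content with the hash recorded at the previous harvest, and ignore the timestamp. This is a sketch only. Where the previous digest is stored, and the helper names, are assumptions.

```python
import hashlib
from typing import Optional


def content_digest(content: bytes) -> str:
    """SHA-256 hex digest of the raw file bytes."""
    return hashlib.sha256(content).hexdigest()


def needs_update(previous_digest: Optional[str], content: bytes) -> bool:
    """True if there is no stored digest or the content actually changed.

    The timestamp is deliberately ignored: a server that only touches
    the file produces the same digest.
    """
    return previous_digest is None or previous_digest != content_digest(content)
```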


- Error parsing bounding box value: could not convert string to float: ''

The dataset contains invalid values (here, an empty bounding-box coordinate).
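The message comes from Python's `float()`, which raises `ValueError` on an empty string. The sketch below shows how one empty coordinate makes the whole bounding box unparseable. It assumes the four bounds arrive as strings in the ISO 19115 order (west, east, south, north). `parse_bbox` is a hypothetical helper.

```python
def parse_bbox(values):
    """Parse (west, east, south, north) strings into floats.

    Returns (bbox, None) on success or (None, error_message) on failure.
    Hypothetical helper; the input order follows ISO 19115 bounding boxes.
    """
    try:
        west, east, south, north = (float(v) for v in values)
    except ValueError as exc:
        # float('') raises "could not convert string to float: ''"
        return None, f"Error parsing bounding box value: {exc}"
    return (west, east, south, north), None


if __name__ == "__main__":
    print(parse_bbox(["-80", "-70", "30", ""]))
```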


- Validation Error: {'Name': 'That URL is already in use.'}

Known GitHub issue https://github.com/GSA/data.gov/issues/4046


- Could not parse XML file: internal error: Huge input lookup, line 185572, column 131 (<string>, line 185572)

Cause unknown. This only happens with certain XML files; it may be related to file size or validation.


- title: Search. That name cannot be used.

"Search" is a reserved word in CKAN and can't be used as a title.
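The title appears to be rejected because the dataset name derived from it collides with a reserved name. The sketch below flags such titles before harvesting. Two parts are assumptions: the reserved set (`new`, `edit`, `search`) reflects our understanding of CKAN's package-name validator, and `title_to_name` is a simplified stand-in, not CKAN's own munging function.

```python
import re

# Assumption: names rejected by CKAN's package-name validator.
RESERVED_NAMES = {"new", "edit", "search"}


def title_to_name(title: str) -> str:
    """Simplified, hypothetical stand-in for CKAN's title-to-name munging."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def name_is_reserved(title: str) -> bool:
    return title_to_name(title) in RESERVED_NAMES
```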


- Point extent defined instead of polygon

The dataset uses a point instead of a polygon for its spatial extent.
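A bounding box whose west and east bounds are equal, and whose south and north bounds are also equal, describes a point and not an area. The hypothetical check below assumes the extent is available as four numeric bounds. It also distinguishes the in-between case, where only one axis has zero width.

```python
def extent_kind(west: float, east: float, south: float, north: float) -> str:
    """Classify a bounding box by how many dimensions it actually spans."""
    spans_lon = west != east
    spans_lat = south != north
    if spans_lon and spans_lat:
        return "polygon"
    if spans_lon or spans_lat:
        return "line"
    return "point"
```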


- Validation Error: {'Name or id': 'Missing value'}

Source dataset fails to validate.
