Ignore broken datetime strings on Elasticsearch #626

Open · wants to merge 1 commit into base: main

weidenka commented on Nov 1, 2023

For me, this fixes an error caused by a single malformed timestamp on the Elasticsearch side (`0001-01-01T00:00:00+00:00`, which pandas reports as `1-01-01 00:00:00`). I don't see a disadvantage in excluding such data points from the conversion.

Stacktrace

Traceback (most recent call last):
  File "/mypath/env/lib/python3.9/site-packages/eland/common.py", line 135, in elasticsearch_date_to_pandas_date
    return pd.to_datetime(
  File "/mypath/env/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 1102, in to_datetime
    result = convert_listlike(np.array([arg]), format)[0]
  File "/mypath/env/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 393, in _convert_listlike_datetimes
    return _to_datetime_with_unit(arg, unit, name, tz, errors)
  File "/mypath/env/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 557, in _to_datetime_with_unit
    arr, tz_parsed = tslib.array_with_unit_to_datetime(arg, unit, errors=errors)
  File "pandas/_libs/tslib.pyx", line 364, in pandas._libs.tslib.array_with_unit_to_datetime
ValueError: non convertible value 0001-01-01T00:00:00+00:00 with the unit 'ms'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mypath/scripts/score_imputer.py", line 19, in <module>
    korro_data = query_data_from_elastic(use_cache=True)
  File "/mypath/daprod_health_data/korro_data.py", line 39, in query_data_from_elastic
    df = ed.eland_to_pandas(elastic_df)
  File "/mypath/env/lib/python3.9/site-packages/eland/etl.py", line 292, in eland_to_pandas
    return ed_df.to_pandas(show_progress=show_progress)
  File "/mypath/env/lib/python3.9/site-packages/eland/dataframe.py", line 1351, in to_pandas
    return self._query_compiler.to_pandas(show_progress=show_progress)
  File "/mypath/env/lib/python3.9/site-packages/eland/query_compiler.py", line 506, in to_pandas
    return self._operations.to_pandas(self, show_progress)
  File "/mypath/env/lib/python3.9/site-packages/eland/operations.py", line 1226, in to_pandas
    for df in self.search_yield_pandas_dataframes(query_compiler=query_compiler):
  File "/mypath/env/lib/python3.9/site-packages/eland/operations.py", line 1278, in search_yield_pandas_dataframes
    df = query_compiler._es_results_to_pandas(hits)
  File "/mypath/env/lib/python3.9/site-packages/eland/query_compiler.py", line 268, in _es_results_to_pandas
    rows.append(self._flatten_dict(row, field_mapping_cache))
  File "/mypath/env/lib/python3.9/site-packages/eland/query_compiler.py", line 348, in _flatten_dict
    flatten(y)
  File "/mypath/env/lib/python3.9/site-packages/eland/query_compiler.py", line 312, in flatten
    flatten(x[a], name + a + ".")
  File "/mypath/env/lib/python3.9/site-packages/eland/query_compiler.py", line 322, in flatten
    x = elasticsearch_date_to_pandas_date(
  File "/mypath/env/lib/python3.9/site-packages/eland/common.py", line 139, in elasticsearch_date_to_pandas_date
    return pd.to_datetime(value)
  File "/mypath/env/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 1102, in to_datetime
    result = convert_listlike(np.array([arg]), format)[0]
  File "/mypath/env/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 438, in _convert_listlike_datetimes
    result, tz_parsed = objects_to_datetime64ns(
  File "/mypath/env/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2177, in objects_to_datetime64ns
    result, tz_parsed = tslib.array_to_datetime(
  File "pandas/_libs/tslib.pyx", line 427, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 678, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 674, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 649, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslibs/np_datetime.pyx", line 212, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 00:00:00 present at position 0

Process finished with exit code 1
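Judging from the traceback, Eland first tries to read the value as epoch milliseconds (`unit='ms'`), which fails because it is a string, and then falls back to `pd.to_datetime(value)`, which raises `OutOfBoundsDatetime`. The date itself is valid; it simply cannot be stored at nanosecond resolution, where a timestamp is a signed 64-bit count of nanoseconds since 1970-01-01 UTC. The sketch below uses only the standard library to derive that range and check a value against it. The helper names `ns_bounds` and `fits_in_ns` are hypothetical, not Eland or pandas APIs, and the sketch assumes the nanosecond resolution used by the pandas version in this traceback.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch in UTC; nanosecond timestamps count from here.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest magnitude in a signed 64-bit integer. pandas reserves the most
# negative value (-2**63) for NaT, so the usable range is symmetric.
INT64_MAX_NS = 2**63 - 1


def ns_bounds():
    """Return (earliest, latest) datetimes representable as int64 nanoseconds.

    datetime only has microsecond precision, so the nanosecond limit is
    truncated toward zero, keeping both bounds just inside the true range.
    """
    span = timedelta(microseconds=INT64_MAX_NS // 1000)
    return EPOCH - span, EPOCH + span


def fits_in_ns(value: str) -> bool:
    """Check whether an ISO 8601 string falls inside the nanosecond range.

    Naive strings (no UTC offset) are assumed to be UTC.
    """
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    earliest, latest = ns_bounds()
    return earliest <= dt <= latest


if __name__ == "__main__":
    earliest, latest = ns_bounds()
    print(earliest.isoformat())  # 1677-09-21T00:12:43.145225+00:00
    print(latest.isoformat())    # 2262-04-11T23:47:16.854775+00:00
    print(fits_in_ns("0001-01-01T00:00:00+00:00"))  # False
    print(fits_in_ns("2023-11-01T12:00:00+00:00"))  # True
```

A year-1 date lies more than 1,600 years before the lower bound, so no parsing option at this resolution can represent it; the only choices are to fail, to replace it with a missing value, or to drop it.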

❌ Author of the following commits did not sign a Contributor Agreement:
521cf6f

Please read and sign the above-mentioned agreement if you want to contribute to this project.

pquentin (Member) left a comment

Hello! The disadvantage is that you're going to silently accept wrong dates in the DataFrame. What we want to do instead is to add an `errors` parameter to Eland itself.
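An `errors` parameter of this kind could follow the convention of `pandas.to_datetime`, whose `errors` argument accepts `"raise"` (the default) and `"coerce"` (invalid values become `NaT`). The sketch below is hypothetical: `convert_date`, `parse_strict` and the rounded bounds are illustrative names and values, not Eland APIs, and `None` stands in for `NaT` so the example needs only the standard library. It keeps failures loud by default and makes discarding bad values an explicit opt-in.

```python
from datetime import datetime, timezone
from typing import Callable, Optional

# Conservative, rounded bounds of the nanosecond timestamp range
# (the exact limits are 1677-09-21 00:12:43.145224193 and
# 2262-04-11 23:47:16.854775807 UTC).
EARLIEST = datetime(1677, 9, 22, tzinfo=timezone.utc)
LATEST = datetime(2262, 4, 11, tzinfo=timezone.utc)


def parse_strict(value: str) -> datetime:
    """Parse an ISO 8601 string, rejecting dates outside the nanosecond range."""
    dt = datetime.fromisoformat(value)  # raises ValueError on malformed input
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume UTC for naive values
    if not EARLIEST <= dt <= LATEST:
        raise ValueError(f"Out of bounds nanosecond timestamp: {value}")
    return dt


def convert_date(
    value: str,
    errors: str = "raise",
    parse: Callable[[str], datetime] = parse_strict,
) -> Optional[datetime]:
    """Hypothetical converter with a pandas-style ``errors`` switch.

    errors="raise"  -> propagate the parsing error (fail loudly)
    errors="coerce" -> return None, the stand-in for NaT, for bad values
    """
    if errors not in ("raise", "coerce"):
        raise ValueError(f"errors must be 'raise' or 'coerce', got {errors!r}")
    try:
        return parse(value)
    except ValueError:
        if errors == "coerce":
            return None
        raise


if __name__ == "__main__":
    print(convert_date("2023-11-06T09:30:00+00:00"))  # 2023-11-06 09:30:00+00:00
    print(convert_date("0001-01-01T00:00:00+00:00", errors="coerce"))  # None
    try:
        convert_date("0001-01-01T00:00:00+00:00")  # default: errors="raise"
    except ValueError as exc:
        print("raised:", exc)
```

In a pandas-based version, the coerce branch would produce `pd.NaT`, so the affected rows stay visible as missing values instead of disappearing, and users who want the behaviour proposed in this PR would have to ask for it explicitly.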

weidenka (Author) commented on Nov 6, 2023

Yes, errors in the Elasticsearch data would be silently ignored; for my use case that is acceptable. I agree, though, that adding an `errors` parameter would be better. Do you plan to add it in the near future?
