Regression - Inconsistent DBAPI serialization of pendulum datetimes #798

Open
2 tasks done
rishi-kulkarni opened this issue Jan 26, 2024 · 0 comments


@rishi-kulkarni

  • I am on the latest Pendulum version.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • OS version and name: Amazon Linux 2023
  • Pendulum version: 3.0.0

Issue

My team uses Airflow's SqliteHook to store some data on disk between tasks. We timestamp each row by adding a date_created column to a pandas DataFrame and inserting the rows into the SQLite file with the hook.

    import pendulum

    # sqlite_hook is an Airflow SqliteHook; df and failure_event_sqlite_table are defined earlier.
    data_interval_end = pendulum.now()
    df = df.rename(columns=str.lower).astype({"user_id": int}).assign(date_created=data_interval_end)
    sqlite_hook.insert_rows(
        failure_event_sqlite_table,
        list(df[["user_id", "shift_id", "successes", "failures", "date_created"]].itertuples(index=False)),
    )

This has always produced, and still produces, rows that look like this:

       user_id  shift_id  successes  failures               date_created
    0        1        11        1.0       0.1  2023-09-25T00:00:00+00:00

In later tasks, we may pull out some data with a query that looks like this:

    sqlite_hook.get_pandas_df(
        f"select * from {failure_event_sqlite_table} where date_created = ?", parameters=(data_interval_end,)
    )

This used to generate a query with the timestamp in the same format as the stored rows, but now the timestamp looks like this (note the missing T separator):

    where date_created = '2023-07-04 18:00:00+00:00'

This inconsistent behavior breaks all of these queries: pendulum timestamps should serialize and deserialize in the same format.
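To illustrate why every such query breaks: when the timestamps live in a TEXT column, SQLite's = compares the strings byte for byte, so two ISO 8601 spellings of the same instant never match. This is a minimal stdlib-only sketch; the table name failure_events is hypothetical, and a plain datetime stands in for pendulum.now() (pendulum's DateTime subclasses datetime.datetime).

```python
import sqlite3
from datetime import datetime, timezone

# Stand-in for pendulum.now(); pendulum's DateTime subclasses datetime.datetime.
ts = datetime(2023, 7, 4, 18, 0, tzinfo=timezone.utc)

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the issue's columns (trimmed for brevity).
conn.execute("CREATE TABLE failure_events (user_id INTEGER, date_created TEXT)")

# The stored rows use the ISO 8601 form with a 'T' separator.
conn.execute("INSERT INTO failure_events VALUES (?, ?)", (1, ts.isoformat()))


def count_matches(param: str) -> int:
    """Count rows whose date_created equals param (plain TEXT comparison)."""
    sql = "SELECT COUNT(*) FROM failure_events WHERE date_created = ?"
    return conn.execute(sql, (param,)).fetchone()[0]


t_form = ts.isoformat()         # '2023-07-04T18:00:00+00:00'
space_form = ts.isoformat(" ")  # '2023-07-04 18:00:00+00:00'

print(count_matches(t_form))      # 1
print(count_matches(space_form))  # 0
```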

What do you think should happen instead?

Previously, the generated statements looked like this:

    where date_created = '2023-07-04T18:00:00+00:00'

But it's fine either way, as long as it's consistent.
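Since either format is acceptable as long as both sides agree, one possible workaround is to serialize the parameter explicitly with isoformat(), which produces the same T-separated form as the stored rows, instead of relying on how the driver adapts the datetime object. This sketch assumes pendulum's DateTime.isoformat() keeps the standard library's default T separator; the helper to_db_timestamp and the table name are hypothetical. In the query above, the call would become parameters=(to_db_timestamp(data_interval_end),).

```python
import sqlite3
from datetime import datetime, timezone


def to_db_timestamp(value: datetime) -> str:
    """Hypothetical helper: render a datetime the same way the stored rows are written.

    datetime.isoformat() defaults to a 'T' separator, so the bound string no
    longer depends on whatever adapter the driver would apply to the object.
    """
    return value.isoformat()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE failure_events (user_id INTEGER, date_created TEXT)")

data_interval_end = datetime(2023, 7, 4, 18, 0, tzinfo=timezone.utc)

# Write and read through the same helper so both sides share one format.
conn.execute(
    "INSERT INTO failure_events VALUES (?, ?)",
    (1, to_db_timestamp(data_interval_end)),
)
rows = conn.execute(
    "SELECT user_id, date_created FROM failure_events WHERE date_created = ?",
    (to_db_timestamp(data_interval_end),),
).fetchall()
print(rows)  # [(1, '2023-07-04T18:00:00+00:00')]
```

Routing both the insert and the query through one helper keeps the comparison stable even if a library changes its default string conversion again.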
