chore!: replace direct download of nyc_taxi with hvsampledata#1462
chore!: replace direct download of nyc_taxi with hvsampledata#1462
Conversation
hoxbro
left a comment
There was a problem hiding this comment.
You should remove pyct + setuptools from pyproject.toml + pixi.toml
| "import hvsampledata as hvs\n", | ||
| "\n", | ||
| "df = pd.read_csv('../data/nyc_taxi.csv', usecols=['dropoff_x', 'dropoff_y'])\n", | ||
| "df = hvs.nyc_taxi(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n", |
There was a problem hiding this comment.
| "df = hvs.nyc_taxi(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n", | |
| "df = hvs.nyc_taxi_remote(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n", |
Also, mention that this is the first time this cell is run.
pixi.toml
Outdated
| setuptools = "*" # distutils for pyct | ||
| toolz = "*" | ||
| xarray = "*" | ||
| fastparquet = "*" |
There was a problem hiding this comment.
Please check that this has not indirectly pinned anything like pandas.
CodSpeed Performance ReportMerging #1462 will improve performances by 48.77%Comparing Summary
Benchmarks breakdown
|
maximlt
left a comment
There was a problem hiding this comment.
I think this should be first deprecated before being removed https://holoviz.org/about/heps/hep2.html.
Are you okay with not having pyct as a required dependency? but still having the entrypoint and explaining that:
This is also a little bit weird, because we don't break running code, but "only" the CLI. |
|
The CLI is part of public interface so I think it deserves the same treatment as the API. I don't see an urgent reason to immediately remove pyct from the dependencies. I'm in favor of simply adding deprecation warnings, and removing the whole thing in a few releases. |
datashader/__init__.py
Outdated
| from . import transfer_functions as tf # noqa (API import) | ||
| from . import data_libraries # noqa (API import) | ||
|
|
||
| with suppress(ImportError): |
There was a problem hiding this comment.
This is not the right place it should be in main.py.
| 'numpy', | ||
| 'pandas', | ||
| 'param', | ||
| 'pyct', |
There was a problem hiding this comment.
I don't think pyct should be removed yet from the dependencies.
There was a problem hiding this comment.
It's not removed totally. Just moved to optional dependencies.
There was a problem hiding this comment.
With these changes if I install datashader and run datashader example it breaks. So that's a breaking change. The goal is to deprecate it so people get a chance to stop using datashader example. Then pyct can be removed entirely from the dependencies.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1462 +/- ##
==========================================
- Coverage 88.34% 88.31% -0.03%
==========================================
Files 96 96
Lines 18932 18943 +11
==========================================
+ Hits 16725 16730 +5
- Misses 2207 2213 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I added a temporary patch to allow the CI to pass pending a new hvsampledata release but otherwise I think it's ready for review now. |
pyproject.toml
Outdated
| # 2025-09 | ||
| "ignore:Signature .* for <class 'numpy.longdouble'>.*:UserWarning", | ||
| # 2025-10 | ||
| "ignore:The 'pyct' package bundled as a datashader dependency is deprecated since version 0.19 and will be removed in version 0.20.*" # https://github.com/holoviz/datashader/pull/1462 |
There was a problem hiding this comment.
I think we should remove this filtering. Running datashader in Python (and not CLI) should never show this warning.
| "import hvsampledata as hvs\n", | ||
| "import hvsampledata as hvs, pandas as pd\n", | ||
| "\n", | ||
| "df = hvs.nyc_taxi_remote(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n", |
pixi.toml
Outdated
| setuptools = "*" # distutils for pyct | ||
| toolz = "*" | ||
| xarray = "*" | ||
| pyarrow = "*" |
There was a problem hiding this comment.
This should be added back to where it previously was.
| with suppress(ImportError): | ||
| import pyct.cmd | ||
| from datashader import _warn_pyct_deprecated | ||
|
|
||
| _warn_pyct_deprecated(stacklevel=1) | ||
| pyct.cmd.fetch_data(name="data", path="examples", datasets="datasets.yml") |
There was a problem hiding this comment.
The scripts directory is for internal use only; please remove this.
datashader/__init__.py
Outdated
|
|
||
| def _warn_pyct_deprecated(stacklevel=2): | ||
| warnings.warn( | ||
| "The 'pyct' package is deprecated since version 0.19 " |
There was a problem hiding this comment.
If a user saw this warning, do you think they would understand it?
I will also move warnings import into the function.
There was a problem hiding this comment.
If a user saw this warning, do you think they would understand it?
Reads simple enough for me, yeah.
I will also move warnings import into the function.
OK.
There was a problem hiding this comment.
Let me rephrase it: what does the pyct package do, and why is it affecting the deprecated functions? Without looking at the code, it seems very confusing.
There was a problem hiding this comment.
Hmm, OK I get it now. A user wouldn't have used pyct directly and would have been confused seeing the mention of it in the message. Will rephrase to mention the actual functions being called.
datashader/__init__.py
Outdated
| """Wrapper to add deprecation warning to pyct functions.""" | ||
| @wraps(func) | ||
| def wrapper(*args, **kwargs): | ||
| _warn_pyct_deprecated(stacklevel=2) |
There was a problem hiding this comment.
I'm not sure if these stack levels are set correctly, but they are likely still emitted because the category is a FutureWarning.
You can test this out by setting the CategoryWarning to DeprecationWarning.
There was a problem hiding this comment.
You're right. Thanks. The correct stacklevel is 3 in this instance. I tested it via
import datashader as ds
ds.fetch_data("datashader")| "The file `nyc_taxi.csv` used above is located at\n", | ||
| "[nyc_taxi.csv](https://github.com/holoviz/datashader/blob/main/examples/data/.data_stubs/nyc_taxi.csv) in the Datashader repository. When running this example, make sure the file is available locally and update the path accordingly.\n", | ||
| ":::\n" | ||
| "The first time this cell is run, it may take some time for the NYC taxi dataset to be downloaded remotely (about 260MB).\n", |
There was a problem hiding this comment.
| "The first time this cell is run, it may take some time for the NYC taxi dataset to be downloaded remotely (about 260MB).\n", | |
| "The first time this cell is run, it will download the NYC taxi dataset (about 260MB).\n", |
|
|
||
| # make pyct's example/data commands available if possible | ||
| from functools import partial | ||
| from functools import partial, wraps |
There was a problem hiding this comment.
Let's del wraps as we did with the others.
datashader/__main__.py
Outdated
| with suppress(ImportError): | ||
| import pyct # noqa: F401 | ||
|
|
||
| from . import _warn_pyct_deprecated | ||
| _warn_pyct_deprecated(stacklevel=3) | ||
| try: | ||
| import pyct.cmd |
There was a problem hiding this comment.
| with suppress(ImportError): | |
| import pyct # noqa: F401 | |
| from . import _warn_pyct_deprecated | |
| _warn_pyct_deprecated(stacklevel=3) | |
| try: | |
| import pyct.cmd | |
| try: | |
| import pyct.cmd | |
| from . import _warn_pyct_deprecated | |
| _warn_pyct_deprecated(stacklevel=3) |
pixi.toml
Outdated
| shapely = ">=2.0.0" | ||
| spatialpandas = "*" | ||
| streamz = "*" | ||
| hvsampledata = ">=0.1.4" |
There was a problem hiding this comment.
| hvsampledata = ">=0.1.4" | |
| hvsampledata = ">=0.1.5a3" |
Let's use a dev version to clearly indicate which version works here. We can remove the dev version once 0.1.5 is released.
|
|
||
| warnings.warn( | ||
| "The 'fetch_data()', 'copy_examples()', and 'examples()' functions are " | ||
| "deprecated since version 0.19 and will be removed in version 0.20. " |
There was a problem hiding this comment.
With the release candidacy of datashader, I feel like 0.20 would be released at least 6 months after 0.19.
Seems like each minor release has been a year in between (ignoring 0.17.0, which did not correctly drop a Python version).
There was a problem hiding this comment.
Ok, so do you suggest we remove it in 0.19.xx, and modify the message to that effect?
There was a problem hiding this comment.
No. I suggest 0.20, and not 0.21. We would never drop something in a patch release.
There was a problem hiding this comment.
OK, the message is correct then. It says 0.20 already.
There was a problem hiding this comment.
Yes, but still want to highlight it as we normally do two minor releases.
There was a problem hiding this comment.
FYI, as per HEP2, there's no requirement to indicate the removal version:
https://holoviz.org/about/heps/hep2.html is worth a read if you haven't done it yet.
This PR replaces the direct download of the large
nyc_taxidata with the version from hvsampledata.Testing with an editable install of hvsampledata