[BUG] Unable to access cuDF due to RuntimeError: cuDF failure : Unsupported type_id conversion to cudf #1803

Open
mtnt-2022 opened this issue Apr 24, 2023 · 8 comments
Labels
bug Something isn't working

Comments

mtnt-2022 commented Apr 24, 2023

Describe the bug
I am trying to run the example code at https://nvidia-merlin.github.io/NVTabular/main/api/ops/categorify.html

import cudf
import nvtabular as nvt

# Create toy dataset
df = cudf.DataFrame({
    'author': ['User_A', 'User_B', 'User_C', 'User_C', 'User_A', 'User_B', 'User_A'],
    'productID': [100, 101, 102, 101, 102, 103, 103],
    'label': [0, 0, 1, 1, 1, 0, 0]
})  # ERROR: RuntimeError: cuDF failure at: /opt/rapids/src/cudf/cpp/src/interop/from_arrow.cu:86: Unsupported type_id conversion to cudf
dataset = nvt.Dataset(df)

# Define pipeline
CATEGORICAL_COLUMNS = ['author', 'productID']
cat_features = CATEGORICAL_COLUMNS >> nvt.ops.Categorify(
    freq_threshold={"author": 3, "productID": 2},
    num_buckets={"author": 10, "productID": 20})


# Initialize the workflow and execute it
proc = nvt.Workflow(cat_features)
proc.fit(dataset)
ddf = proc.transform(dataset).to_ddf()

# Print results
print(ddf.compute())

Also, when following https://github.com/NVIDIA-Merlin/NVTabular/blob/main/tests/unit/examples/test_02-Advanced-NVTabular-workflow.py
I get an error for this import:

from merlin.core.compat import cudf

ImportError                               Traceback (most recent call last)
Cell In[12], line 1
----> 1 from merlin.core.compat import cudf

ImportError: cannot import name 'cudf' from 'merlin.core.compat' (/usr/local/lib/python3.8/dist-packages/merlin/core/compat.py)

Expected behavior
Both the Categorify example and the merlin.core.compat import should run without errors.

Environment details (please complete the following information):
Platform: Debian 4.19.269-1
Python version: 3.8.10
PyTorch version (GPU?): 2.0.0 (with GPU support)

  • Environment location: GCP

  • Method of NVTabular install: Docker

  • Docker image used: nvcr.io/nvidia/merlin/merlin-pytorch:23.02. All cuDF libraries were installed by GCP by default.

Additional context

cudf : 22.8.0a0+304.g6ca81bbc78.dirty
dask-cudf : 22.8.0a0+304.g6ca81bbc78.dirty

CUDA Version: 11.8
NVIDIA-SMI 510.47.03
Driver Version: 510.47.03

merlin 1.9.1
merlin-core 0.5.0
merlin-dataloader 0.0.3
merlin-models 23.2.0
merlin-systems 23.2.0

nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
nvidia-pyindex 1.0.9
nvtabular 23.2.0

GPU : Tesla T4
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

triton 2.0.0
tritonclient 2.32.0

Ubuntu 20.04.5 LTS
rmm 22.8.0a0+62.gf6bf047.dirty
torch 2.0.0

@mtnt-2022 mtnt-2022 added the bug Something isn't working label Apr 24, 2023
@karlhigley
Contributor

It looks like you have an older version of merlin-core. The latest is 23.02.01. Based on when merlin.core.compat was added, I'm fairly confident installing a newer version of merlin-core will resolve the cudf import issue you described.
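To confirm which versions are actually installed inside the container, a minimal sketch like the following may help. It uses only the standard library's `importlib.metadata` (available from Python 3.8). The helper name `installed_versions` is a hypothetical name chosen for illustration, and the package list is simply taken from the environment details in this issue.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names copied from the environment listing in this issue.
    report = installed_versions(["merlin-core", "nvtabular", "cudf", "pandas"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in the same interpreter that raises the ImportError shows whether the merlin-core seen by Python matches the one expected from the container.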

@mtnt-2022
Author

@karlhigley May I use

FROM nvcr.io/nvidia/merlin/merlin-pytorch:latest

in the Dockerfile so that I always get the latest version?

@mtnt-2022
Author

mtnt-2022 commented Apr 25, 2023

@karlhigley, I got a build error with

   FROM nvcr.io/nvidia/merlin/merlin-pytorch:23.02.01

(the same error occurs for :latest):

   "Containerize the artifact": manifest for nvcr.io/nvidia/merlin/merlin-pytorch:23.02.01 not found: manifest unknown: manifest unknown

@karlhigley
Contributor

Ah sorry, I meant the latest version of merlin-core is 23.02.01; there's no 23.02.01 container version. The latest version of the Torch container comes with merlin-core 23.2.0 pre-installed, which should be new enough to avoid the merlin.core.compat error you mentioned. Since you have merlin-core 0.5.0, I'm guessing you may have installed one of the Merlin libraries from source, some of which have overly permissive version specifiers and can cause this issue. Using the merlin-pytorch 23.02 container, it should be sufficient to pip install merlin-core after installing any of the other Merlin libraries from source.
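The version comparison implied above can be sketched as follows. This is only an illustration, assuming that comparing the leading numeric release is enough; pre-release and local suffixes such as `a0+304...dirty` are ignored. The helper names `release_tuple` and `is_older` are hypothetical, and the `23.2.0` threshold is the merlin-core version the comment above says ships in the container.

```python
import re


def release_tuple(version):
    """Parse the leading numeric release of a version string into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_older(installed, minimum):
    """Return True if `installed` is strictly older than `minimum`."""
    a, b = release_tuple(installed), release_tuple(minimum)
    # Pad the shorter tuple with zeros so "23.2" and "23.2.0" compare equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


if __name__ == "__main__":
    # Versions reported in this issue versus the assumed threshold.
    print(is_older("0.5.0", "23.2.0"))
    print(is_older("23.2.1", "23.2.0"))
```

If the installed merlin-core turns out to be older than the threshold, reinstalling it with pip after any source installs, as suggested above, is the step to try.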

@mtnt-2022
Author

mtnt-2022 commented Apr 25, 2023

@karlhigley, I am using this for the container image:

   FROM nvcr.io/nvidia/merlin/merlin-pytorch:nightly

I got:

merlin             1.10.0
merlin-core        23.2.1
merlin-dataloader  23.2.1
merlin-models      23.2.0
merlin-systems     0+untagged.1.ge94d2a9
cuda-python        11.8.1
cudf               22.8.0a0+304.g6ca81bbc78.dirty
cupy-cuda117       10.6.0

When I run

import cudf
# import pandas as pd
print('cuDF Version:', cudf.__version__)

I got:


 ---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[2], line 1
----> 1 import cudf
     2 # import pandas as pd
     3 print('cuDF Version:', cudf.__version__)

File /usr/local/lib/python3.8/dist-packages/cudf/__init__.py:12
     8 from numba import config as numba_config, cuda
    10 import rmm
---> 12 from cudf.api.types import dtype
    13 from cudf import api, core, datasets, testing
    14 from cudf._version import get_versions

File /usr/local/lib/python3.8/dist-packages/cudf/api/__init__.py:3
     1 # Copyright (c) 2021, NVIDIA CORPORATION.
----> 3 from cudf.api import extensions, types
     5 __all__ = ["extensions", "types"]

File /usr/local/lib/python3.8/dist-packages/cudf/api/types.py:18
    15 from pandas.api import types as pd_types
    17 import cudf
---> 18 from cudf.core.dtypes import (  # noqa: F401
    19     _BaseDtype,
    20     dtype,
    21     is_categorical_dtype,
    22     is_decimal32_dtype,
    23     is_decimal64_dtype,
    24     is_decimal128_dtype,
    25     is_decimal_dtype,
    26     is_interval_dtype,
    27     is_list_dtype,
    28     is_struct_dtype,
    29 )
    32 def is_numeric_dtype(obj):
    33     """Check whether the provided array or dtype is of a numeric dtype.
    34 
    35     Parameters
  (...)
    43         Whether or not the array or dtype is of a numeric dtype.
    44     """

File /usr/local/lib/python3.8/dist-packages/cudf/core/dtypes.py:13
    11 from pandas.api import types as pd_types
    12 from pandas.api.extensions import ExtensionDtype
---> 13 from pandas.core.arrays._arrow_utils import ArrowIntervalType
    14 from pandas.core.dtypes.dtypes import (
    15     CategoricalDtype as pd_CategoricalDtype,
    16     CategoricalDtypeType as pd_CategoricalDtypeType,
    17 )
    19 import cudf

ModuleNotFoundError: No module named 'pandas.core.arrays._arrow_utils'

@karlhigley
Contributor

You can build an image that way, but we don't generally guarantee the stability of the nightly images. Are you seeing the same issue building

FROM nvcr.io/nvidia/merlin/merlin-pytorch:23.02

?

@mtnt-2022
Author

@karlhigley, yes, I got the same error for

 FROM nvcr.io/nvidia/merlin/merlin-pytorch:23.02

@karlhigley
Contributor

@jperez999 Are there known version incompatibility issues between Pandas and cuDF that might explain this?
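One way to gather evidence for this question is to check whether the private pandas module named in the traceback can be located in the installed pandas. The sketch below uses only the standard library; `module_available` is a hypothetical helper name, and the module path is copied from the ModuleNotFoundError above.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be located.

    Parent packages are imported as a side effect; the target module
    itself is only located, not executed.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # A parent package is missing or failed to import.
        return False


if __name__ == "__main__":
    # Module path taken from the traceback in this issue.
    print(module_available("pandas.core.arrays._arrow_utils"))
```

If this prints `False` while `import pandas` itself succeeds, that would be consistent with the installed pandas not matching what this cuDF build expects, although nothing in this thread confirms the cause.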
