Dask .head() returns error as .compute returns ok! #11120

Closed
frbelotto opened this issue May 14, 2024 · 2 comments · Fixed by dask/dask-expr#1068
Labels
needs triage Needs a response from a contributor

Comments

frbelotto commented May 14, 2024

Hello guys!
I've noticed a weird bug when calling dataframe.head(). Since I could not tell whether the error comes from the "original" DataFrame or from the operations applied after importing it, I kept everything.

This is a sample dataframe in a CSV file:
test3.csv

These are the operations applied after loading it:

dataframe['status'] = 'cancelado'
dataframe['status'] = dataframe['status'].mask(cond=(dataframe['cd_tip_est_slct'].isin([5,9])), other='confirmado')
    
dataframe['produto'] = str()
dataframe['produto'] = 'Giftcards'
    
dataframe['parceiro'] = str()
dataframe['parceiro'] = np.nan
dataframe['parceiro'] = dataframe['parceiro'].mask(cond=(dataframe['cd_idfr_pcr'] == 516515741), other='X1')
dataframe['parceiro'] = dataframe['parceiro'].mask(cond=(dataframe['cd_idfr_pcr'] == 935454784), other='X2')
dataframe['parceiro'] = dataframe['parceiro'].fillna('novo parceiro')
    
dataframe['data_transacao'] = dataframe['tscompra']
dataframe['mci'] = dataframe['cd_cli_cprd']
    
dataframe['marca'] = dataframe['ProductLine']
dataframe['marca'] = dataframe['marca'].fillna('novo produto')
    
dataframe['marca'] = dataframe['parceiro'].mask(cond=(dataframe['marca'] == 'Sony PlayStation'), other='PlayStation')

dataframe['forma_pagamento'] = np.nan
dataframe['forma_pagamento'] = dataframe['forma_pagamento'].mask(cond=(dataframe['cd_tip_fma_pgto'] == 0), other=np.nan)
dataframe['forma_pagamento'] = dataframe['forma_pagamento'].mask(cond=(dataframe['cd_tip_fma_pgto'] == 1), other='débito em conta corrente')
dataframe['forma_pagamento'] = dataframe['forma_pagamento'].mask(cond=(dataframe['cd_tip_fma_pgto'] == 2), other='cartão de crédito')
dataframe['forma_pagamento'] = dataframe['forma_pagamento'].mask(cond=(dataframe['cd_tip_fma_pgto'] == 3), other='pix')
dataframe['forma_pagamento'] = dataframe['forma_pagamento'].mask(cond=(dataframe['cd_tip_fma_pgto'] == 4), other='pix open banking')
dataframe['forma_pagamento'] = dataframe['forma_pagamento'].mask(cond=(dataframe['cd_tip_fma_pgto'] == 5), other='débito em conta poupança')

If I call compute(), it works:
dataframe.compute()
[image: output of dataframe.compute()]

If I call head(), it does not work:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[42], line 1
----> 1 dataframe.head()

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\dask_expr\_collection.py:702, in FrameBase.head(self, n, npartitions, compute)
    700 out = new_collection(expr.Head(self, n=n, npartitions=npartitions))
    701 if compute:
--> 702     out = out.compute()
    703 return out

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\dask_expr\_collection.py:475, in FrameBase.compute(self, fuse, **kwargs)
    473 if not isinstance(out, Scalar):
    474     out = out.repartition(npartitions=1)
--> 475 out = out.optimize(fuse=fuse)
    476 return DaskMethodsMixin.compute(out, **kwargs)

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\dask_expr\_collection.py:590, in FrameBase.optimize(self, fuse)
    572 def optimize(self, fuse: bool = True):
    573     """Optimizes the DataFrame.
    574
    575     Runs the optimizer with all steps over the DataFrame and wraps the result in a
   (...)
    588         The optimized Dask Dataframe
    589     """
--> 590     return new_collection(self.expr.optimize(fuse=fuse))

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\dask_expr\_expr.py:94, in Expr.optimize(self, **kwargs)
     93 def optimize(self, **kwargs):
---> 94     return optimize(self, **kwargs)

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\dask_expr\_expr.py:3028, in optimize(expr, fuse)
   3007 """High level query optimization
   3008
   3009 This leverages three optimization passes:
   (...)
   3024 optimize_blockwise_fusion
   3025 """
   3026 stage: core.OptimizerStage = "fused" if fuse else "simplified-physical"
-> 3028 return optimize_until(expr, stage)

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\dask_expr\_expr.py:2989, in optimize_until(expr, stage)
   2986     return expr
   2988 # Lower
-> 2989 expr = expr.lower_completely()
   2990 if stage == "physical":
   2991     return expr

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\dask_expr\_core.py:436, in Expr.lower_completely(self)
    434 expr = self
    435 while True:
--> 436     new = expr.lower_once()
    437     if new._name == expr._name:
    438         break

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\dask_expr\_core.py:404, in Expr.lower_once(self)
    402 for operand in out.operands:
    403     if isinstance(operand, Expr):
--> 404         new = operand.lower_once()
    405         if new._name != operand._name:
    406             changed = True

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\dask_expr\_core.py:404, in Expr.lower_once(self)
    402 for operand in out.operands:
    403     if isinstance(operand, Expr):
--> 404         new = operand.lower_once()
    405         if new._name != operand._name:
    406             changed = True

    [... skipping similar frames: Expr.lower_once at line 404 (10 times)]

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\dask_expr\_core.py:404, in Expr.lower_once(self)
    402 for operand in out.operands:
    403     if isinstance(operand, Expr):
--> 404         new = operand.lower_once()
    405         if new._name != operand._name:
    406             changed = True

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\dask_expr\_core.py:393, in Expr.lower_once(self)
    390 expr = self
    392 # Lower this node
--> 393 out = expr._lower()
    394 if out is None:
    395     out = expr

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\dask_expr\_expr.py:2439, in Head._lower(self)
   2435     raise ValueError(
   2436         f"only {self.frame.npartitions} partitions, head received {npartitions}"
   2437     )
   2438 partitions = self._partitions
-> 2439 if is_index_like(self._meta):
   2440     return BlockwiseHeadIndex(
   2441         Partitions(self.frame, partitions), self.n, safe=False
   2442     )
   2444 safe = True if npartitions == 1 and self.frame.npartitions != 1 else False

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\functools.py:1001, in cached_property.__get__(self, instance, owner)
    999 val = cache.get(self.attrname, _NOT_FOUND)
   1000 if val is _NOT_FOUND:
-> 1001     val = self.func(instance)
   1002     try:
   1003         cache[self.attrname] = val

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\dask_expr\_expr.py:2393, in Head._meta(self)
   2391 @functools.cached_property
   2392 def _meta(self):
-> 2393     return self.frame._meta

File c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\dask_expr\_core.py:455, in Expr._meta(self)
    453 @property
    454 def _meta(self):
--> 455     raise NotImplementedError()

NotImplementedError:

Environment:

  • Dask version: dask 2024.5.0, dask-expr 1.1.0
  • Python version: 3.11.7
  • Operating System: Windows
  • Install method (conda, pip, source): pip
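For anyone comparing environments, the version details above could be collected with something like the following. This is a minimal sketch using only the standard library; the helper name `report_versions` and the default package list are illustrative assumptions, not part of Dask.

```python
from importlib import metadata
import platform
import sys


def report_versions(packages=("dask", "dask-expr", "pandas", "numpy")):
    """Map each package name to its installed version, or 'not installed'.

    `report_versions` is a hypothetical helper written for this example.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Print the interpreter version, the OS name, and the package versions.
print("Python:", sys.version.split()[0])
print("OS:", platform.system())
for name, version in report_versions().items():
    print(f"{name}: {version}")
```

Pasting this output into a report makes it easier to see whether dask-expr is installed at all and at which version.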
frbelotto (Author) commented

Note: I think this is related to dask-expr 1.1.0. As soon as I uninstalled it, everything worked.

mrocklin (Member) commented

Thank you for the example. I'm curious, are you able to reduce your example to something that is easier to try? This article might contain some helpful hints. This would be very helpful if you have the time.
