Pelican can't pickle cache: TypeError: can't pickle 'generator' object #2905

Open
2 tasks done
GiovanH opened this issue Aug 1, 2021 · 16 comments

@GiovanH
Contributor

GiovanH commented Aug 1, 2021

  • I have read the Filing Issues and subsequent “How to Get Help” sections of the documentation.
  • I have searched the issues (including closed ones) and believe that this is not a duplicate.
  • OS version and name: Windows 10
  • Python version: 3.8.5
  • Pelican version: git/master

Issue

When enabling caching on Windows, Pelican is unable to save the caches as pickles because they apparently contain generators.

Cache configuration looks like

CACHE_CONTENT = True
CHECK_MODIFIED_METHOD = 'mtime'
# LOAD_CONTENT_CACHE = True

(Using #2904 to get output)

WARNING: Could not save cache cache\ArticlesGenerator-Readers
  |  ... cannot pickle 'generator' object

What's strange about this is there don't seem to actually be any generators in the cache. If I look at the self._cache object save_cache is trying to pickle, it just looks like a map between filename strings and values like this:

dict_values([(1626146886.6559033, ('Article Text', {'title': 'article title', 'date': SafeDatetime(2021, 7, 10, 0, 0), 'category': <Category 'categoryname'>, 'status': 'published', 'tags': [<Tag 'tagname'>]}))])

This doesn't seem to be a duplicate of #2400, as this has nothing to do with threading or watchers. It's just failing to pickle a boring object.

@GiovanH GiovanH added the bug label Aug 1, 2021
@GiovanH
Contributor Author

GiovanH commented Aug 5, 2021

To be clear, there is no additional information when run with full debug tracing, and this happens without any plugins enabled:

DEBUG: Read file drafts/draft1.md -> Article
DEBUG: Read file fandom/article.md -> Article
DEBUG: Read file drafts/draft2.md -> Article
DEBUG: Read file drafts/draft3.md -> Article
WARNING: Could not save cache cache\ArticlesGenerator-Readers
  |  ... cannot pickle 'generator' object
DEBUG: Read file pages/page.md -> Page
DEBUG: Signal page_generator_preread.send(PagesGenerator)
DEBUG: Signal page_generator_context.send(PagesGenerator, <metadata>)
DEBUG: Read file pages/thanks.md -> Page
DEBUG: Read file pages/recommendations.md -> Page
DEBUG: Read file pages/markdown-extensions-test.md -> Page
DEBUG: Read file pages/test.md -> Page
WARNING: Could not save cache cache\PagesGenerator-Readers
  |  ... cannot pickle 'generator' object
DEBUG: Read file dev/attachment.png -> Static
DEBUG: Signal static_generator_preread.send(StaticGenerator)
DEBUG: Signal static_generator_context.send(StaticGenerator, <metadata>)
DEBUG: Read file drafts/attachment2.png -> Static

@avaris
Member

avaris commented Aug 8, 2021

Well, I don't see any generators there and I could not reproduce this on Win 8.1 Python 3.7. So I'm not sure how to investigate this. If you don't mind some debugging, you could throw some prints around save_cache and see what's going on.

@GiovanH
Contributor Author

GiovanH commented Aug 14, 2021

I tried debugging save_cache but it's just getting a normal dictionary. A dictionary goes into the pickler, and the pickler throws the error from a codespace I can't debug. It's very strange.

Here's some debug code:

def test_cache(obj):
    """Recursively print the type and a short preview of every leaf value."""
    if isinstance(obj, dict):
        iterator = obj.items()
    elif isinstance(obj, (list, tuple)):
        iterator = enumerate(obj)
    else:
        return  # Not a container, so there is nothing to walk into.

    for key, value in iterator:
        if isinstance(value, (dict, list, tuple)):
            test_cache(value)
        else:
            # Leaf value: objects such as Category or Tag are not inspected further.
            print(type(value), str(value)[:70])
...

class FileDataCacher:
    ...
    def save_cache(self):
        """Save the updated cache"""

        if self._cache_data_policy:
            try:
                mkdir_p(self.settings['CACHE_PATH'])
                test_cache(self._cache)
                with self._cache_open(self._cache_path, 'wb') as fhandle:
                    pickle.dump(self._cache, fhandle)
            except (OSError, pickle.PicklingError) as err:
                logger.warning('Could not save cache %s\n ... %s',
                               self._cache_path, err)

And here's the output:

<class 'float'> 1626146886.6559033
<class 'str'> body
<class 'str'> misc meta
<class 'pelican.utils.SafeDatetime'> 2021-07-10 00:00:00
<class 'pelican.urlwrappers.Category'> category
<class 'str'> published
<class 'pelican.urlwrappers.Tag'> tag name
CRITICAL: cannot pickle 'generator' object

So there has to be a generator in one of those pelican objects, right?

@avaris
Member

avaris commented Aug 14, 2021

SafeDatetime is a thin wrapper around datetime that definitely doesn't have anything inside resembling a generator. Category and Tag normally don't have any generators inside, unless put there by a plugin. Can you try clearing the existing cache?

@GiovanH
Contributor Author

GiovanH commented Aug 14, 2021

I don't think it was trying to load any caches before, but manually deleting the cache directory has the same result:

$ rm -r cache; py -m pelican ./content/test -o ./output -s ./pelicanconf.py
WARNING: Feeds generated without SITEURL set properly may not be valid
WARNING: Docutils has no localization for 'english'. Using 'en' instead.
WARNING: Watched path does not exist: .../content\test\favicon.png
WARNING: Watched path does not exist: .../content\test\robots.txt
WARNING: Watched path does not exist: .../content\test\media/
<class 'float'> 1626146886.6559033
<class 'str'> <p>test
<class 'str'> title
<class 'pelican.utils.SafeDatetime'> 2021-07-10 00:00:00
<class 'pelican.urlwrappers.Category'> catname
<class 'str'> published
<class 'pelican.urlwrappers.Tag'> tagname
<class 'str'> thumbnail.png
CRITICAL: cannot pickle 'generator' object

@avaris
Member

avaris commented Aug 14, 2021

Installed Python 3.8.5 to see if that would make any difference, but no, I still cannot reproduce this here (Win 8.1 though).

(pelican) > pelican content -s pelicanconf.py
WARNING: Docutils has no localization for 'english'. Using 'en' instead.
WARNING: Watched path does not exist: D:\workspace\pelican-test\content\images
{'D:\\workspace\\pelican-test\\content\\article.md': (1628975297.942422, ('<p>Content</p>', {'title': 'test', 'date': SafeDatetime(2019, 1, 1, 0, 0), 'tags': [<Tag 'foo'>, <Tag 'bar'>], 'category': <Category 'Baz'>}))}
{}
Done: Processed 1 article, 0 drafts, 0 hidden articles, 0 pages, 0 hidden pages and 0 draft pages in 0.33 seconds.

The dictionaries in the log are just me adding print(self._cache).

@GiovanH
Contributor Author

GiovanH commented Aug 14, 2021

I know, it's bizarre.

What other troubleshooting steps are there? I can't get pickle to tell me exactly what object it thinks is failing, either.

The only thing I can identify is that this is the only case where pickling outputs this specific TypeError:

# Python code for object.__reduce_ex__ for protocols 0 and 1

def _reduce_ex(self, proto):
    assert proto < 2
    cls = self.__class__
    for base in cls.__mro__:
        if hasattr(base, '__flags__') and not base.__flags__ & _HEAPTYPE:
            break
    else:
        base = object # not really reachable
    if base is object:
        state = None
    else:
        if base is cls:
            raise TypeError(f"cannot pickle {cls.__name__!r} object")
        state = base(self)
    args = (cls, base, state)
    try:
        getstate = self.__getstate__
    except AttributeError:
        if getattr(self, "__slots__", None):
            raise TypeError(f"cannot pickle {cls.__name__!r} object: "
                            f"a class that defines __slots__ without "
                            f"defining __getstate__ cannot be pickled "
                            f"with protocol {proto}") from None
        try:
            dict = self.__dict__
        except AttributeError:
            dict = None
    else:
        dict = getstate()
    if dict:
        return _reconstructor, args, dict
    else:
        return _reconstructor, args

If I patch in dill I get the same error. It just uses the stock pickler here, so that's not a surprise.

Yeah, I'm sorry that this doesn't make any sense, but if I could make sense of it this wouldn't be an issue.
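
On the question of which object pickle is rejecting: one option is to walk the cache and try pickling each piece separately, descending into instance attributes as well as containers. The following is only a rough sketch; `find_unpicklable` is a hypothetical helper, not part of Pelican or the standard library, and `SimpleNamespace` stands in for a wrapper object that keeps a reference to the settings.

```python
import pickle
from types import SimpleNamespace


def find_unpicklable(obj, path="root", _stack=()):
    """Return (path, value) pairs for the deepest values that pickle rejects."""
    if id(obj) in _stack:
        return []  # Already being inspected higher up; avoid infinite recursion.
    try:
        pickle.dumps(obj)
    except Exception:
        pass  # Something in this subtree is unpicklable; look deeper.
    else:
        return []  # The whole subtree pickles fine.

    if isinstance(obj, dict):
        children = [(f"{path}[{key!r}]", value) for key, value in obj.items()]
    elif isinstance(obj, (list, tuple)):
        children = [(f"{path}[{i}]", value) for i, value in enumerate(obj)]
    else:
        children = []
    # Instance attributes are where wrapper objects can hide references.
    attributes = getattr(obj, "__dict__", {})
    children += [(f"{path}.{name}", value) for name, value in attributes.items()]

    hits = []
    for child_path, child in children:
        hits.extend(find_unpicklable(child, child_path, _stack + (id(obj),)))
    # If no child is to blame, this object itself is the culprit.
    return hits or [(path, obj)]


# Assumed shape, loosely modelled on the cache entry shown earlier in the thread.
settings = {"SITENAME": "demo", "LINKS": (n for n in range(3))}
category = SimpleNamespace(name="categoryname", settings=settings)
cache = {"article.md": (1626146886.6559033, ("Article Text", {"category": category}))}

hits = find_unpicklable(cache)
for hit_path, value in hits:
    print(hit_path, type(value).__name__)
```

Unlike a walk that stops at dicts, lists, and tuples, this also looks inside object attributes, which is where the offending value was sitting in this case.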

@GiovanH
Contributor Author

GiovanH commented Aug 14, 2021

I am not getting these same errors on the blank sample project, so this has something to do with my configuration... somehow? Again, this doesn't make any sense to me, because something would have to be somehow injecting a generator into a pelican wrapper, which I don't think anything does. And I know it's not a plugin setting, because I can reproduce the issue exactly when all plugins are disabled.

@GiovanH
Contributor Author

GiovanH commented Aug 14, 2021

I found it. Buried deep in a subconfiguration file, pelicanconf was getting this in the main configuration namespace:

LINKS = (link[:1] for link in INLINKS_EX + LINKS_EX)

Now, how this got into the article generator cache I have no idea. Does SafeDatetime have a full copy of the pelican context or something?

@avaris
Member

avaris commented Aug 14, 2021

Oh... Category and Tag objects store settings.
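
A minimal sketch of that mechanism, using a plain `SimpleNamespace` as a hypothetical stand-in for a Tag (the real class is not reproduced here): because pickle follows every reference, one generator in the settings dict is enough to make the whole cache entry fail.

```python
import pickle
from types import SimpleNamespace

# Assumed settings; only the generator value matters for the demonstration.
settings = {"SITENAME": "demo", "LINKS": (n for n in range(3))}

# Stand-in for an object that stores a reference to the settings.
tag = SimpleNamespace(name="tagname", settings=settings)

try:
    pickle.dumps({"article.md": {"tags": [tag]}})
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```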

@GiovanH
Contributor Author

GiovanH commented Aug 14, 2021

Might be worth adding a note to the caching documentation about avoiding the use of generators in pelicanconf.
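
For reference, a sketch of the workaround on the configuration side: wrapping the generator expression in `tuple()` gives a value that pickles and can also be iterated more than once. The `INLINKS_EX` and `LINKS_EX` values below are made up for illustration, since their real contents are not shown in this thread.

```python
import pickle

# Hypothetical values standing in for the real link lists.
INLINKS_EX = [("Home", "/", "extra")]
LINKS_EX = [("Pelican", "https://getpelican.com/", "extra")]

# Same expression as before, but materialized instead of left as a generator.
LINKS = tuple(link[:1] for link in INLINKS_EX + LINKS_EX)

# A tuple of tuples survives a pickle round trip unchanged.
roundtrip = pickle.loads(pickle.dumps(LINKS))
print(roundtrip == LINKS)
```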

@GiovanH
Contributor Author

GiovanH commented Aug 17, 2021

Perhaps generators and functions in the configuration file should only be folded into objects if their names are all uppercase, as the documentation suggests?

@dflock

dflock commented Aug 17, 2021

Putting custom template functions into the pelicanconf tends to break the cache with pickle errors too. I ended up moving all my custom template functions out into a local plugin to get around this.

@GiovanH
Contributor Author

GiovanH commented Aug 17, 2021

> Putting custom template functions into the pelicanconf tends to break the cache with pickle errors too. I ended up moving all my custom template functions out into a local plugin to get around this.

You can get around a surprising number of pickle errors by just putting Jinja filters into a separate module and doing

from jinja_filters import *
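
A likely reason this helps, offered as an assumption rather than something confirmed in this thread: pickle stores functions by reference (module name plus qualified name), so a function that lives in an importable module can be pickled, while a lambda has no importable name to refer to. The sketch below uses `textwrap.dedent` purely as an example of an importable function; `is_picklable` is a hypothetical helper.

```python
import pickle
import textwrap


def is_picklable(obj):
    """Return True if pickle.dumps accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A named function in an importable module is pickled by reference.
named = is_picklable(textwrap.dedent)

# A lambda cannot be looked up by name, so pickle rejects it.
anonymous = is_picklable(lambda text: text.upper())

print(named, anonymous)
```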

@avaris
Member

avaris commented Aug 28, 2021

> Putting custom template functions into the pelicanconf tends to break the cache with pickle errors too. I ended up moving all my custom template functions out into a local plugin to get around this.

That's annoying and we should probably address that. I think pickle allows storing/restoring objects in a customized way. We can probably take the SETTINGS out of objects before storing and reconstruct them afterwards. That would also help with cache issues where objects "freeze" SETTINGS at the point they were cached, so changes made afterwards are not reflected.
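
A rough sketch of what that could look like, using pickle's `__getstate__` / `__setstate__` hooks. `Wrapper` is a hypothetical stand-in for a settings-holding object such as Category or Tag; this is not Pelican's implementation, and reattaching the settings by hand after loading is an assumption about how the reconstruction step might work.

```python
import pickle


class Wrapper:
    """Hypothetical stand-in for an object that keeps a reference to settings."""

    def __init__(self, name, settings):
        self.name = name
        self.settings = settings

    def __getstate__(self):
        # Leave the settings out of the pickled state.
        state = self.__dict__.copy()
        state.pop("settings", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Placeholder until the caller reattaches the current settings.
        self.settings = None


# Settings containing a generator no longer block pickling.
old_settings = {"LINKS": (n for n in range(3))}
data = pickle.dumps(Wrapper("categoryname", old_settings))

# After loading, the object picks up whatever settings are current.
restored = pickle.loads(data)
restored.settings = {"LINKS": (0, 1, 2)}
print(restored.name, restored.settings)
```

Because the settings are reattached at load time, a cached object would see the current configuration instead of the one in effect when it was cached.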

@avaris avaris self-assigned this Aug 28, 2021
@vbharadwaj-bk
Contributor

I had a similar issue where Pelican was failing to pickle a lambda function included in pelicanconf.py. Thanks for the hint, @GiovanH!
