[Sankaku] API URL change #7155
Comments
While doing a couple of long test runs, I kept getting the same error at different places for different tags. I had a hunch that posts with notes were the cause, and confirmed it. For whatever reason the API URL for notes is just Don't know if pools may have the same issue, I don't download those. |
Everything should hopefully be fixed for the time being (1254c4e), but I suspect there are going to be more site-related changes in the coming days, especially for pools. Note: extended / categorized tags now require an extra API request. edit: I am not sure if the extended tags information from just one API request is always complete. Posts with more than 50/100/? tags might require even more API requests to truly fetch all extended tag information. I've only tested this with a 40 tag post. edit2: The |
Yeah, it definitely needs additional API requests for posts with more than 40 tags. Ugh, almost every single change those guys make ends up making the experience worse. At this point I'm just glad the old Chan site is still available, the new site is terrible. Is it practical to scrape the Chan version when a post has more than 80 tags to save on API requests? All the tags are present there from the start, so it's just one more request. Though their API has a good rate limit, so maybe it's not that costly? Couple of examples I have in my archive with the most tags (SFW): https://chan.sankakucomplex.com/en/posts/XbayBEEg1rG |
That's a very good suggestion. https://chan.sankakucomplex.com/en/posts/XbayBEEg1rG with its 1460 tags would need 15 requests just to fetch tag category data, which takes ~3 seconds on my end. Then again, opening this post in a browser takes a lot longer than that. It also redirects to https://chan.sankakucomplex.com/en/posts/show_empty when not logged in and the auth tokens for the API don't appear to work. |
Yeah but it's not like we need to render the page and load its assets, we just need the HTML code so it should be faster than loading in the browser. Is handling the session cookies difficult? Feel like it's already done for a bunch of other sites, but it's an additional hurdle, yeah. Maybe it's not worth it, though. Those two examples are extremes. Got curious and took a look in my archive. I have downloaded 1,853,920 posts in total from Sankaku, and on average each of my tag files has 100.19 lines (I'm still calculating the median). Each tag file is a text file with the following structure:
There's a bit of overhead from the header line for each category (5 of them) and the empty lines used as category separators. So let's say the average is 90 tags per post (higher than I thought, tbh). That means on average 3 API calls to the tags endpoint are needed to get all the available data. The most balanced way to go about it for the average post, I think, would be to make 1 API call to figure out the total number of tags. If it's 80 or fewer, we stick with the API (1 or 2 API calls total), but if it's more than 80, we scrape the Chan page (1 API call and 1 page request), saving 1 API call per post on average at the cost of increased processing time. Is that enough justification to implement it? Probably not if we consider that their rate limit is relatively permissive, at least so far. As things are now, I think it's fine to stick to the API to get all the tags data, but that's like my opinion, man. Edit: the median ended up being 47 lines per tag file, which, with the same overhead, means as few as 38 tags. More in line with what I expected, it reinforces my conclusion that sticking to the API is fine for now. |
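For illustration, the threshold idea in this comment could be sketched in Python roughly as below. This is not gallery-dl's code: `plan_requests`, `api_only_calls`, and their parameters are hypothetical names, and the page size of 40 tags per API call is the assumption made in this comment (the thread later puts the limit at 100).

```python
import math

# Assumed values taken from the comment above, not from the Sankaku API docs.
TAGS_PER_CALL = 40       # page size assumed at this point in the thread
SCRAPE_THRESHOLD = 80    # above this, scrape the Chan page instead


def api_only_calls(tag_count, per_call=TAGS_PER_CALL):
    """API calls needed when every tag category comes from the API."""
    return max(1, math.ceil(tag_count / per_call))


def plan_requests(tag_count, per_call=TAGS_PER_CALL, threshold=SCRAPE_THRESHOLD):
    """Pick between 'API only' and 'one API call plus one Chan page request'."""
    if tag_count <= threshold:
        return {"api_calls": api_only_calls(tag_count, per_call), "page_requests": 0}
    # The first API call is still needed to learn the tag count.
    return {"api_calls": 1, "page_requests": 1}


if __name__ == "__main__":
    for count in (38, 80, 90):
        print(count, api_only_calls(count), plan_requests(count))
```

With these assumed numbers, a 90-tag post costs 3 API calls on the API-only path versus 1 API call plus 1 page request on the mixed path, which is the saving discussed above.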
It is possible to fetch 100 tags per API call, so it would need only one extra.
The total number of tags as well as the tag names themselves are known before making any extra API calls. It is only tag categories that are missing. I've tested loading the Chan page in gallery-dl and it took ~24 seconds for the 1460 tag post, so API it is. The code to fetch all tag information has already been written and committed locally, by the way, and will be available with the next |
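As a quick check of the arithmetic in this thread, the request count at 100 tags per call is a ceiling division. The helper name `category_requests` is hypothetical and only illustrates the calculation; it is not part of gallery-dl.

```python
def category_requests(tag_count, per_call=100):
    """Requests needed to fetch category data for all tags (ceiling division)."""
    return -(-tag_count // per_call)


if __name__ == "__main__":
    # 90 is the rough per-post average quoted earlier; 1460 is the extreme example.
    for count in (40, 90, 1460):
        print(count, category_requests(count))
```

For the 1460-tag post this gives the 15 requests mentioned earlier, and a single request for any post with 100 tags or fewer.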
Somehow I failed to register that the limit was 100 and not 40. Went on a bit of a tangent there, didn't I? Sorry about that.
Thanks, mikf! |
is there an option that i need to add to the command for extended tags? |
question for yall, I've been using a simple command for chan.sankakucomplex.com being gallery-dl -u "username" -p "password" "URL". |
If you don't want to dabble with building the program yourself, just wait for the update. This program gets updated quite frequently, so you shouldn't have to wait for too long (mb a week or two?). |
Directly in the command you can enable it by using the
Or the short version:
|
So in other words since im illiterate at python I should just wait for the update and then my command will work again? 😅 |
Yes. |
Thanks, but it doesn't seem to do what I thought it would; the latest version of gallery-dl won't grab all the tags from a post. Using the latest version of gallery-dl as of today: Old version of gallery-dl (version 1.27.0): |
And I can confirm that's not because of the API changes, since I can still use the old version to extract the tags just fine. Of course, I replaced the old API with the new one so it can work again. |
@ImVantexHD regular |
well that was fast, thank you @mikf |
Found two instances where the current code enters an infinite loop retrieving the tags data. Using These are the specific posts (NSFW): https://www.sankakucomplex.com/posts/8yrxO7WOvaE Edit: |
I updated to the latest version but still get [sankaku][error] Unable to download data: JSONDecodeError: Expecting value: line 1 column 1 (char 0) |
The latest version (1.29.1) was released before the issue started, that's why you still get the error. Try installing the latest dev version with Pip: https://github.com/mikf/gallery-dl?tab=readme-ov-file#pip |
Getting an error on version 1.29.2-dev when using the sankaku extractor. Disk X exists and is accessible. Other extractors like e621 and kemono work fine with the same baseDir and Archive paths. Extractor Config:
Error Message:
|
@FoxieP Your error is completely unrelated to this issue. You should open a new one instead of posting it here. |
Looks like Sankaku changed their API URL.
I started getting:
And checking in the browser showed a 403 Forbidden message. Their main site (https://www.sankakucomplex.com) was working fine, so I checked what they were using and replaced the existing API URL in the sankaku.py extractor:
for the one I found:
and at least for pagination and posts it seems to be working fine again.