[Sankaku] API URL change #7155

Closed
taskhawk opened this issue Mar 11, 2025 · 22 comments

@taskhawk

Looks like Sankaku changed their API URL.

I started getting:

[sankaku][error] Unable to download data:  JSONDecodeError: Expecting value: line 1 column 1 (char 0)

And checking in the browser showed a 403 Forbidden message. Their main site (https://www.sankakucomplex.com) was working fine so I checked what they were using and replaced the existing API URL in the sankaku.py extractor:

https://capi-v2.sankakucomplex.com

for the one I found:

https://sankakuapi.com/v2

and at least for pagination and posts it seems to be working fine again.

@mikf mikf pinned this issue Mar 11, 2025
@taskhawk
Author

During a couple of long test runs I kept getting the same error at different places for different tags. I had a hunch that posts with notes were the cause, and confirmed that they were.

For whatever reason the API URL for notes is just https://sankakuapi.com, without the v2 part.

Don't know if pools may have the same issue, I don't download those.
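For illustration, the two base URLs described above could be handled roughly like this. This is only a sketch: the `api_url` helper, the `UNVERSIONED` set, and the endpoint paths used in the example calls are hypothetical and not taken from gallery-dl's sankaku.py.

```python
from urllib.parse import urlencode

API_ROOT = "https://sankakuapi.com"   # new host reported in this thread
V2_PREFIX = "/v2"                     # used for posts and pagination

# Hypothetical set of endpoint kinds that are served without the /v2 prefix.
UNVERSIONED = {"notes"}


def api_url(path, kind="posts", **params):
    """Build a request URL; `kind` decides whether /v2 is prepended."""
    prefix = "" if kind in UNVERSIONED else V2_PREFIX
    url = f"{API_ROOT}{prefix}{path}"
    if params:
        url += "?" + urlencode(params)
    return url


# Example paths below are placeholders, not confirmed API routes.
posts_url = api_url("/posts", tags="example", limit=100)
notes_url = api_url("/posts/abc123/notes", kind="notes")
print(posts_url)
print(notes_url)
```

Keeping the prefix decision in one place means that if pools turn out to behave like notes, only the `UNVERSIONED` set would need to change.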

mikf added a commit that referenced this issue Mar 11, 2025
and fix errors due to other changes
@mikf
Owner

mikf commented Mar 11, 2025

Everything should hopefully be fixed for the time being (1254c4e), but I suspect there are going to be more site-related changes in the coming days, especially for pools.

Note: extended / categorized tags now require an extra API request.

edit: I am not sure if the extended tags information from just one API request is always complete. Posts with more than 50/100/? tags might require even more API requests to truly fetch all extended tag information. I've only tested this with a 40 tag post.

edit2: The /tags endpoint does support a limit parameter, but it only returns a max of 100 tags even when limit is set to a number greater than that.
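A rough sketch of what paging through the /tags endpoint with a 100-tag cap might look like. `fetch_page` and `fake_fetch_page` are hypothetical stand-ins (no network access, not gallery-dl functions); the fake only imitates the reported behavior of clamping `limit` to 100.

```python
def fetch_all_tags(fetch_page, limit=100):
    """Collect tags page by page until a short page signals the end."""
    tags = []
    page = 1
    while True:
        batch = fetch_page(page=page, limit=limit)
        tags.extend(batch)
        if len(batch) < limit:
            break
        page += 1
    return tags


def fake_fetch_page(page, limit, _total=250, _cap=100):
    """Stand-in for the API: never returns more than `_cap` tags per page."""
    size = min(limit, _cap)
    start = (page - 1) * size
    return [f"tag{i}" for i in range(start, min(start + size, _total))]


all_tags = fetch_all_tags(fake_fetch_page, limit=100)
print(len(all_tags))  # 250
```

Note that asking for a `limit` above 100 would make this particular loop stop after the first page, because the short-page check compares against the requested limit rather than the server's cap.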

@mikf mikf added the fixed label Mar 11, 2025
@taskhawk
Author

Note: extended / categorized tags now require an extra API request.

edit: I am not sure if the extended tags information from just one API request is always complete. Posts with more than 50/100/? tags might require even more API requests to truly fetch all extended tag information. I've only tested this with a 40 tag post.

edit2: The /tags endpoint does support a limit parameter, but it only returns a max of 100 tags even when limit is set to a number greater than that.

Yeah, it definitely needs additional API requests for posts with more than 40 tags.

Ugh, almost every single change those guys do ends up in a worse experience. At this point I'm just glad the old Chan site is still available, the new site is terrible.

Is it practical to scrape the Chan version when a post has more than 80 tags to save on API requests? All the tags are present there from the start, so just one more request. Though their API has a good rate limit, so maybe it's not that costly?

Couple of examples I have in my archive with the most tags (SFW):

https://chan.sankakucomplex.com/en/posts/XbayBEEg1rG
https://chan.sankakucomplex.com/en/posts/XEa1WoVNERq

@mikf
Owner

mikf commented Mar 11, 2025

Is it practical to scrape the Chan version when a post has more than 80 tags to save on API requests?

That's a very good suggestion.

https://chan.sankakucomplex.com/en/posts/XbayBEEg1rG with its 1460 tags would need 15 requests just to fetch tag category data, which takes ~3 seconds on my end.

Then again, opening this post in a browser takes a lot longer than that. It also redirects to https://chan.sankakucomplex.com/en/posts/show_empty when not logged in and the auth tokens for the API don't appear to work.
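The request count quoted above follows from the 100-tags-per-request cap. A small sketch of the arithmetic, with `tag_requests` being a made-up helper name:

```python
import math


def tag_requests(tag_count, per_request=100):
    """Number of /tags requests needed to cover `tag_count` tags."""
    return math.ceil(tag_count / per_request)


print(tag_requests(1460))  # 15
print(tag_requests(40))    # 1
```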

@taskhawk
Author

taskhawk commented Mar 12, 2025

Yeah but it's not like we need to render the page and load its assets, we just need the HTML code so it should be faster than loading in the browser.

Is handling the session cookies difficult? Feel like it's already done for a bunch of other sites, but it's an additional hurdle, yeah.

Maybe it's not worth it, though. Those two examples are extremes. Got curious and took a look in my archive. I have downloaded 1,853,920 posts in total from Sankaku, and on average each of my tag files has 100.19 lines (I'm still calculating the median). Each tag file is a text file with the following structure:

CATEGORY
    tag
    tag

CATEGORY
    tag
    tag

There's a bit of overhead with the lines for each category (5 of them) and the empty lines as category separators.

So let's say the average is 90 tags per post (higher than I thought, tbh). That means on average 3 API calls are needed to the tags endpoint to get all the available data.

The most balanced way to go about it for the average post, I think, would be to make 1 API call to figure out the total amount of tags, if it's 80 or less we stick with the API (1 or 2 API calls total), but if it's more than 80 we scrape the Chan page (1 API call and 1 page request), saving 1 API call per post on average, at the cost of increased processing time.

Is that enough justification to implement it? Probably not if we consider that their rate limit is relatively permissive, at least so far.

As things are now, I think it's fine to stick to the API to get all the tags data, but that's like my opinion, man.

Edit: the median ended up being 47 lines per tag file, again with the same overhead, as low as 38 tags. More in line with what I expected, it reinforces my conclusion that sticking to the API is fine for now.
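For reference, the line-count overhead can be removed by counting only the indented lines in a tag file. With all 5 categories present the overhead is 5 header lines plus 4 separators, so 47 lines works out to 38 tags. A sketch, assuming the file layout shown above; `count_tags` and the sample data are made up for illustration:

```python
from statistics import mean, median

SAMPLE = """\
ARTIST
    some_artist

GENERAL
    tag_a
    tag_b
    tag_c
"""


def count_tags(text):
    """Count indented, non-blank lines; category headers and separators are skipped."""
    return sum(
        1 for line in text.splitlines()
        if line.strip() and line[0].isspace()
    )


# The second and third values are invented counts standing in for other files.
counts = [count_tags(SAMPLE), 1, 7]
print(count_tags(SAMPLE))  # 4
print(mean(counts), median(counts))
```

Running `count_tags` over every tag file and feeding the results to `mean` and `median` would give tag counts directly instead of line counts.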

@mikf
Owner

mikf commented Mar 12, 2025

So let's say the average is 90 tags per post (higher than I thought, tbh). That means on average 3 API calls are needed to the tags endpoint to get all the available data.

It is possible to fetch 100 tags per API call, so it would need only one extra.

The most balanced way to go about it for the average post, I think, would be to make 1 API call to figure out the total amount of tags

The total number of tags as well as the tag names themselves are known before making any extra API calls. It is only tag categories that are missing.


I've tested loading the Chan page in gallery-dl and it took ~24 seconds for the 1460 tag post, so API it is.

The code to fetch all tag information has already been written and committed locally, by the way, and will be available with the next git push

@taskhawk
Author

It is possible to fetch 100 tags per API call, so it would need only one extra.

Somehow I failed to register that the limit was 100 and not 40.

Went on a bit of a tangent there, didn't I? Sorry about that.

The code to fetch all tag information has already been written and committed locally, by the way, and will be available with the next git push

Thanks, mikf!

@ImVantexHD

Note: extended / categorized tags now require an extra API request.

is there an option that i need to add to the command for extended tags?

@mikf mikf marked this as a duplicate of #7163 Mar 12, 2025
@chazz1560

Question for y'all: I've been using a simple command for chan.sankakucomplex.com, namely gallery-dl -u "username" -p "password" "URL".
Obviously this no longer works, so I'm wondering if you guys could point me in the right direction. I'm not exactly an expert when it comes to Python (it took a bit for me to even get gallery-dl running). Will this issue actually be fixed in a future update, or am I going to need to manually install the new API?
Also, if I do, how would I do that?

@Ikkoru

Ikkoru commented Mar 12, 2025

If you don't want to dabble with building the program yourself, just wait for the update. This program gets updated quite frequently, so you shouldn't have to wait for too long (mb a week or two?).

@taskhawk
Author

is there an option that i need to add to the command for extended tags?

Directly in the command you can enable it by using the --option argument:

gallery-dl --option tags=true ...

Or the short version:

gallery-dl -o tags=true

@chazz1560

If you don't want to dabble with building the program yourself, just wait for the update. This program gets updated quite frequently, so you shouldn't have to wait for too long (mb a week or two?).

So in other words, since I'm illiterate at Python, I should just wait for the update and then my command will work again? 😅

@Ikkoru

Ikkoru commented Mar 12, 2025

Yes.

@ImVantexHD

is there an option that i need to add to the command for extended tags?

Directly in the command you can enable it by using the --option argument:

gallery-dl -o tags=true

Thanks, but it doesn't seem to do what I thought it would do: the latest version of gallery-dl won't grab all the tags from a post.

using the latest version of gallery-dl as of today:
gallery-dl --write-tags -o tags=true --no-download https://chan.sankakucomplex.com/en/posts/zjrmmWK7BrD

Image

old version of gallery-dl (version 1.27.0):
gallery-dl --write-tags -o tags=true --no-download https://chan.sankakucomplex.com/en/posts/zjrmmWK7BrD

Image

@ImVantexHD

the latest version of gallery-dl won't grab all the tags from a post

And I can confirm that's not because of the API changes, since I can still use the old version to extract the tags just fine. Of course, I replaced the old API URL with the new one so it can work again.

mikf added a commit that referenced this issue Mar 12, 2025
rename 'tag_names' to 'tags'
@mikf
Owner

mikf commented Mar 12, 2025

@ImVantexHD regular tags are fixed in 898a09b
@taskhawk tag categories are fixed in 94bbbbb

@ImVantexHD

@ImVantexHD regular tags are fixed in 898a09b
@taskhawk tag categories are fixed in 94bbbbb

well that was fast, thank you @mikf

@taskhawk
Author

taskhawk commented Mar 14, 2025

Found two instances where the current code enters an infinite loop retrieving the tags data. Using --verbose shows it keeps making requests non-stop with an increasing page parameter value.

These are the specific posts (NSFW):

https://www.sankakucomplex.com/posts/8yrxO7WOvaE
https://www.sankakucomplex.com/posts/26MPkn6JRKx

Edit:
Found another one (SFW-ish?):
https://www.sankakucomplex.com/posts/8JaGEODYeRL
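The cause of the loop isn't established in this report. As a purely hypothetical illustration, a pagination loop can be bounded by the tag count that is already known from the post data, and can bail out when a page brings nothing new. `fetch_tags_guarded` and `repeating_page` are invented names, and the misbehaving endpoint is simulated:

```python
def fetch_tags_guarded(fetch_page, expected, limit=100):
    """Paginate, but stop on a page without new tags or once `expected` is reached."""
    seen = {}
    # ceil(expected / limit) pages should suffice; allow one spare page.
    max_pages = -(-expected // limit) + 1
    for page in range(1, max_pages + 1):
        batch = fetch_page(page=page, limit=limit)
        new = [tag for tag in batch if tag not in seen]
        if not new:
            break  # empty or fully repeated page: further requests are pointless
        seen.update(dict.fromkeys(new))
        if len(seen) >= expected:
            break
    return list(seen)


def repeating_page(page, limit):
    """Hypothetical misbehaving endpoint: returns the same full page forever."""
    return [f"tag{i}" for i in range(limit)]


result = fetch_tags_guarded(repeating_page, expected=250)
print(len(result))  # 100
```

With a guard like this, the worst case is a fixed number of requests per post instead of an ever-increasing page parameter.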

@ForxBase

ForxBase commented Mar 15, 2025

I updated to the latest version but still get

[sankaku][error] Unable to download data: JSONDecodeError: Expecting value: line 1 column 1 (char 0)

@taskhawk
Author

The latest version (1.29.1) was released before the issue started, that's why you still get the error.

Try installing the latest dev version with Pip: https://github.com/mikf/gallery-dl?tab=readme-ov-file#pip

@FoxieP

FoxieP commented Mar 15, 2025

I get an error on version 1.29.2-dev when using the sankaku extractor. Disk X exists and is accessible. Other extractors like e621 and kemono work pretty well with the same base-directory and archive paths.

Extractor Config:

    "sankaku":
    {
        "username": "*******",
        "password": "*******",
        "base-directory": "X:/Sankaku",
        "archive": ["X:/Sankaku", "{search_tags}", "archive.db"],
        "metadata": false,
        "directory": ["{search_tags}"],
        "filename": "{md5}.{extension}",
    },

Error Message:

C:\WINDOWS\system32>gallery-dl -v "https://chan.sankakucomplex.com/?tags=naytlayt"
gallery-dl: Version 1.29.2-dev
gallery-dl: Python 3.12.1 - Windows-10-10.0.19045-SP0
gallery-dl: requests 2.31.0 - urllib3 2.1.0
gallery-dl: Configuration Files ['%APPDATA%\gallery-dl\config.json']
gallery-dl: Starting DownloadJob for 'https://chan.sankakucomplex.com/?tags=naytlayt'
sankaku: Using SankakuTagExtractor for 'https://chan.sankakucomplex.com/?tags=naytlayt'
urllib3.connectionpool: Starting new HTTPS connection (1): sankakuapi.com:443
urllib3.connectionpool: https://sankakuapi.com:443 "GET /v2/posts/keyset?tags=naytlayt&lang=en&limit=100 HTTP/1.1" 200 None
urllib3.connectionpool: https://sankakuapi.com:443 "GET /posts/6ea4l7wp8a3/tags?lang=en&page=1&limit=100 HTTP/1.1" 200 None
sankaku: Failed to open download archive at '['X:/Sankaku', '{search_tags}', 'archive.db']' (FileNotFoundError: [WinError 3] System cannot find specified path: 'X:/')
urllib3.connectionpool: Starting new HTTPS connection (1): s.sankakucomplex.com:443
urllib3.connectionpool: https://s.sankakucomplex.com:443 "GET /data/d3/c3/d3c37909d4f42e1de3effddae08402be.png?e=1742029185&expires=1742029185&m=oYQammQGmRahnmZ6F8S1pg&token=c9OkPqj2uinHoq2YjbRqVc2a8HwwrAYiIP2Gx3eLl7c HTTP/1.1" 416 592
sankaku: Unable to download data: FileNotFoundError: [WinError 3] System cannot find specified path: '\\?\X:\'
sankaku:
Traceback (most recent call last):
  File "C:\Python312\Lib\site-packages\gallery_dl\path.py", line 343, in finalize
    os.replace(self.temppath, self.realpath)
FileNotFoundError: [WinError 3] System cannot find specified path: '/tmp/.download/1740768478 d3c37909d4f42e1de3effddae08402be.png.part' -> '\\?\X:\Sankaku\naytlayt\1740768478 d3c37909d4f42e1de3effddae08402be.png'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python312\Lib\site-packages\gallery_dl\job.py", line 153, in run
    self.dispatch(msg)
  File "C:\Python312\Lib\site-packages\gallery_dl\job.py", line 197, in dispatch
    self.handle_url(url, kwdict)
  File "C:\Python312\Lib\site-packages\gallery_dl\job.py", line 368, in handle_url
    pathfmt.finalize()
  File "C:\Python312\Lib\site-packages\gallery_dl\path.py", line 347, in finalize
    os.makedirs(self.realdirectory)
  File "<frozen os>", line 215, in makedirs
  File "<frozen os>", line 215, in makedirs
  File "<frozen os>", line 225, in makedirs
FileNotFoundError: [WinError 3] System cannot find specified path: '\\?\X:\'

@mikf mikf closed this as completed Mar 15, 2025
@mikf
Owner

mikf commented Mar 15, 2025

@FoxieP Your error is completely unrelated to this issue. You should open a new one instead of posting it here.

@mikf mikf unpinned this issue Mar 23, 2025