Unable to download all related illustrations with illust_related() #247

Open
towakuka opened this issue Aug 9, 2022 · 8 comments

@towakuka

towakuka commented Aug 9, 2022

I would like to use illust_related() to download illustrations related to a certain illustration. However, when I execute the following code, the next_url retrieved the first time and the next_url retrieved the second time are almost the same, and I can only download a small portion of the related illustrations. How can I download all the related illustrations?

  • OS: Windows 10 Home 21H2
  • Python version: 3.7.9
  • pixivpy3 version: 3.7.1

from pixivpy3 import *
import urllib.parse as up

REFRESH_TOKEN = 'xxxxx'

aapi = AppPixivAPI()
aapi.auth(refresh_token=REFRESH_TOKEN)

# https://www.pixiv.net/artworks/98699730
res = aapi.illust_related('98699730')
print(up.unquote(res.next_url))

next_qs = aapi.parse_qs(res.next_url)
print(next_qs)

res = aapi.illust_related(**next_qs)
print(up.unquote(res.next_url))

The results are as follows.

https://app-api.pixiv.net/v2/illust/related?illust_id=98699730&filter=for_ios&seed_illust_ids[0]=98699730
&viewed[0]=66304207&viewed[1]=78519399&viewed[2]=90635714&viewed[3]=100141684&viewed[4]=97112634
&viewed[5]=97789918&viewed[6]=100162560&viewed[7]=96163314&viewed[8]=97490930&viewed[9]=95632136
&viewed[10]=62759610&viewed[11]=98840284&viewed[12]=98876790&viewed[13]=80461026&viewed[14]=99193150
&viewed[15]=76880757&viewed[16]=98376241&viewed[17]=61546752&viewed[18]=93903011&viewed[19]=99171188
&viewed[20]=94524475&viewed[21]=99370255&viewed[22]=87511157&viewed[23]=64346020&viewed[24]=94084919
&viewed[25]=99507076&viewed[26]=60600324&viewed[27]=98921186

{'illust_id': '98699730', 'filter': 'for_ios', 'seed_illust_ids': ['98699730'], 'viewed': ['98921186']}

https://app-api.pixiv.net/v2/illust/related?illust_id=98699730&filter=for_ios&seed_illust_ids[0]=98699730
&viewed[0]=66304207&viewed[1]=100162560&viewed[2]=100104444&viewed[3]=78519399&viewed[4]=97112634
&viewed[5]=97789918&viewed[6]=100141684&viewed[7]=95632136&viewed[8]=97490930&viewed[9]=62759610
&viewed[10]=94745733&viewed[11]=98840284&viewed[12]=98876790&viewed[13]=80461026&viewed[14]=99193150
&viewed[15]=76880757&viewed[16]=98376241&viewed[17]=61546752&viewed[18]=93903011&viewed[19]=99171188
&viewed[20]=94524475&viewed[21]=99370255&viewed[22]=87511157&viewed[23]=64346020&viewed[24]=94084919
&viewed[25]=92555703&viewed[26]=60600324&viewed[27]=98921186

If I extract the values of viewed[num] in next_url (and sort them), most of the illust_ids are duplicates.

# the values of viewed[num] of the first next_url
100141684
100162560
60600324
61546752
62759610
64346020
66304207
76880757
78519399
80461026
87511157
93903011
94084919
94524475
94745733
95632136
97112634
97490930
97789918
98376241
98840284
98876790
98921186
99171188
99193150
99370255
99507076
99928488

# the values of viewed[num] of the second next_url
100141684
100162560
60600324
61546752
62759610
64346020
66304207
76880757
78519399
80461026
87511157
88928934
93717572
93903011
94084919
94524475
95632136
97112634
97490930
98376241
98440469
98482730
98785502
98876790
98921186
99223979
99370255
99507076
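
The comparison described above (extracting the viewed[num] values from two next_url strings and checking how many repeat) can be sketched as follows. This is a minimal illustration using only the standard library; the helper names viewed_ids and overlap are hypothetical and are not part of pixivpy3.

```python
from __future__ import annotations

from urllib.parse import parse_qsl, urlsplit


def viewed_ids(next_url: str) -> list[str]:
    """Return every viewed[n] value from the query string, in URL order."""
    query = urlsplit(next_url).query
    # parse_qsl decodes percent-encoded keys, so viewed%5B0%5D becomes viewed[0]
    return [value for key, value in parse_qsl(query) if key.startswith("viewed[")]


def overlap(first_url: str, second_url: str) -> set[str]:
    """Return the ids that appear in the viewed lists of both URLs."""
    return set(viewed_ids(first_url)) & set(viewed_ids(second_url))
```

Calling len(overlap(first_next_url, second_next_url)) on two consecutive next_url values gives the number of duplicated ids.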
@eggplants
Contributor

eggplants commented Aug 12, 2022

Currently, parse_qs parses array parameters such as viewed[num] into a list that contains only the last value.

That looks like problematic behavior:

result_qs[key.split("[")[0]] = value

This probably needs to be fixed.

@eggplants
Contributor

for key, value in up.parse_qs(query).items():
    # merge PHP-style array params such as seed_illust_ids[] into one list
    if "[" in key and key.endswith("]"):
        # keep the original sequence, just ignore the array length
        key_, *_ = key.split("[")
        if key_ not in result_qs:
            result_qs[key_] = value
        elif isinstance(result_qs[key_], list):
            result_qs[key_].extend(value)
        else:
            # a scalar value was already stored under the same name
            raise ValueError(f"conflicting scalar and array values for {key_!r}")
    else:
        result_qs[key] = value[-1]
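
For reference, the same merging idea can be written as a self-contained function and tried without a Pixiv account. This is only a sketch under the assumption that array parameters always arrive as name[index]=value pairs; the function name parse_next_url is hypothetical and is not the library's parse_qs.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlsplit


def parse_next_url(next_url: str) -> dict[str, str | list[str]]:
    """Parse a next_url, merging name[0], name[1], ... into one list per name."""
    result: dict[str, str | list[str]] = {}
    query = urlsplit(next_url).query
    for key, values in parse_qs(query).items():
        if "[" in key and key.endswith("]"):
            base = key.split("[", 1)[0]
            merged = result.setdefault(base, [])
            if not isinstance(merged, list):
                # a scalar parameter with the same name was seen earlier
                raise ValueError(f"conflicting scalar and array values for {base!r}")
            merged.extend(values)
        else:
            # scalar parameter: keep the last value if it is repeated
            result[key] = values[-1]
    return result
```

With this version, every viewed[num] value from the URL ends up in the 'viewed' list instead of only the last one.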

@towakuka
Author

Thank you for your prompt reply. I replaced the relevant part of aapi.py and ran the code in the first question again, and now 'viewed' in next_qs has multiple values, but 22 of the 28 viewed[num]'s were duplicates.

# first next_url
https://app-api.pixiv.net/v2/illust/related?illust_id=98699730&filter=for_ios&seed_illust_ids[0]=98699730
&viewed[0]=80691434&viewed[1]=97043375&viewed[2]=97974102&viewed[3]=100336008&viewed[4]=97112634
&viewed[5]=97490930&viewed[6]=93903011&viewed[7]=100274468&viewed[8]=97789918&viewed[9]=95908114
&viewed[10]=91462133&viewed[11]=98840284&viewed[12]=69925083&viewed[13]=98376241&viewed[14]=99193150
&viewed[15]=62759610&viewed[16]=95943995&viewed[17]=85294754&viewed[18]=99171188&viewed[19]=64346020
&viewed[20]=73188428&viewed[21]=99370255&viewed[22]=78552357&viewed[23]=81176933&viewed[24]=98921186
&viewed[25]=66304207&viewed[26]=84590312&viewed[27]=65293346

# next_qs
{'illust_id': '98699730', 'filter': 'for_ios', 'seed_illust_ids': ['98699730'], 'viewed': ['80691434',
'97043375', '97974102', '100336008', '97112634', '97490930', '93903011', '100274468', '97789918',
'95908114', '91462133', '98840284', '69925083', '98376241', '99193150', '62759610', '95943995',
'85294754', '99171188', '64346020', '73188428', '99370255', '78552357', '81176933', '98921186',
'66304207', '84590312', '65293346']}

# second next_url (22 of the 28 viewed[num]'s were duplicates)
https://app-api.pixiv.net/v2/illust/related?illust_id=98699730&filter=for_ios&seed_illust_ids[0]=98699730
&viewed[0]=80691434&viewed[1]=97043375&viewed[2]=98365209&viewed[3]=100336008&viewed[4]=68233822
&viewed[5]=97490930&viewed[6]=100274468&viewed[7]=99370255&viewed[8]=93903011&viewed[9]=91462133
&viewed[10]=98921186&viewed[11]=69925083&viewed[12]=98376241&viewed[13]=62759610&viewed[14]=98440469
&viewed[15]=95943995&viewed[16]=85294754&viewed[17]=98785502&viewed[18]=64346020&viewed[19]=73188428
&viewed[20]=98482730&viewed[21]=78552357&viewed[22]=81176933&viewed[23]=99223979&viewed[24]=66304207
&viewed[25]=84590312&viewed[26]=97112634&viewed[27]=65293346

@eggplants
Contributor

The related works shown on the browser's artwork page do not appear to be designed for paging, so returning duplicates for requests made with next_url may simply be the intended behavior.

@upbit How about this?

@upbit
Owner

upbit commented Aug 23, 2022

    def illust_related(
        self,
        illust_id: int | str,
        filter: _FILTER = "for_ios",
        seed_illust_ids: int | str | list[str] | None = None,
        offset: int | str | None = None,
        viewed: list[str] | None = None,
        req_auth: bool = True,
    ) -> ParsedJson:
        url = "%s/v2/illust/related" % self.hosts
        params: dict[str, Any] = {
            "illust_id": illust_id,
            "filter": filter,
            "offset": offset,
        }
        if isinstance(seed_illust_ids, str):
            params["seed_illust_ids[]"] = [seed_illust_ids]
        elif isinstance(seed_illust_ids, list):
            params["seed_illust_ids[]"] = seed_illust_ids
        r = self.no_auth_requests_call("GET", url, params=params, req_auth=req_auth)
        return self.parse_result(r)

Sorry, this looks like a bug: viewed is not added to params the way seed_illust_ids[] is. Try adding:

        elif isinstance(seed_illust_ids, list):
            params["seed_illust_ids[]"] = seed_illust_ids
+       if isinstance(viewed, list):
+           params["viewed[]"] = viewed
        r = self.no_auth_requests_call("GET", url, params=params, req_auth=req_auth)
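
To see what a list stored under a "viewed[]" key turns into on the wire, the query string can be built offline. The sketch below uses urllib.parse.urlencode with doseq=True, which repeats the key once per list element; as far as I know, requests encodes list values in params in the same repeated-key style, but that is an assumption here and the sketch does not call requests. The function name build_related_query is hypothetical.

```python
from __future__ import annotations

from typing import Any
from urllib.parse import unquote, urlencode


def build_related_query(
    illust_id: int | str,
    seed_illust_ids: list[str] | None = None,
    viewed: list[str] | None = None,
) -> str:
    """Build a query string in which list values become repeated name[] keys."""
    params: dict[str, Any] = {"illust_id": illust_id, "filter": "for_ios"}
    if seed_illust_ids:
        params["seed_illust_ids[]"] = list(seed_illust_ids)
    if viewed:
        params["viewed[]"] = list(viewed)
    # doseq=True emits one key=value pair per element of each list value
    return urlencode(params, doseq=True)


print(unquote(build_related_query("98699730", viewed=["66304207", "78519399"])))
# illust_id=98699730&filter=for_ios&viewed[]=66304207&viewed[]=78519399
```

Without the added lines in the patch, the viewed list never reaches params, so no viewed[] keys are sent at all.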

@towakuka
Author

Thank you for your reply. I applied your correction and was able to download a large number of related illustrations. However, there still seem to be a lot of duplicates among the viewed values in next_url.

The bar graph below shows the number of related illustrations at each step of the next_url retrieval process, which was repeated about 100 times (a post consisting of multiple images is counted as one).

If all duplicate illustrations are removed from here, the bar graph is as follows.

I have found that I can download a sufficient number of illustrations if I set the number of repetitions to about 20, so I will continue to operate with the number of repetitions set to 20 from now on.

Thank you very much for your time.

@upbit
Owner

upbit commented Aug 25, 2022

I'm not sure whether the client side merges the viewed parameter. Did you try passing all of the previously returned viewed ids when paginating?

Also, does the number of repetitions refer to the offset=20 parameter?

@towakuka
Author

The first source code is reproduced below.

from pixivpy3 import *
import urllib.parse as up

REFRESH_TOKEN = 'xxxxx'

aapi = AppPixivAPI()
aapi.auth(refresh_token=REFRESH_TOKEN)

# https://www.pixiv.net/artworks/98699730
res = aapi.illust_related('98699730')
print(up.unquote(res.next_url))

next_qs = aapi.parse_qs(res.next_url)
print(next_qs)

res = aapi.illust_related(**next_qs)
print(up.unquote(res.next_url))

I repeated getting the next_url and running aapi.illust_related(**next_qs) about 100 times, so I'm not passing all the previously returned viewed ids. Only the viewed ids returned in each step are passed.

The number of repetitions was chosen only by looking at the bar chart above; it does not refer to the offset=20 parameter.
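
The variant suggested earlier in the thread, passing every previously returned viewed id on each request and dropping duplicate illustrations on the client, could look roughly like this. It is a sketch, not a tested recipe: whether the server honors a growing viewed list is not confirmed anywhere in this thread, and a very long list may run into URL length limits. The function collect_related is hypothetical, and it assumes each response behaves like a dict with "illusts" and "next_url" keys.

```python
from __future__ import annotations

from typing import Any, Callable, Optional


def collect_related(
    fetch_page: Callable[..., Any],
    parse_qs: Callable[[str], Optional[dict]],
    illust_id: str,
    max_pages: int = 20,
) -> list[Any]:
    """Page through related illustrations, accumulating viewed ids and de-duplicating."""
    seen_ids: set[Any] = set()   # illustration ids already collected
    viewed: list[str] = []       # every viewed id returned so far, in order
    unique: list[Any] = []
    params: dict[str, Any] = {"illust_id": illust_id}
    for _ in range(max_pages):
        res = fetch_page(**params)
        for illust in res.get("illusts") or []:
            if illust["id"] not in seen_ids:
                seen_ids.add(illust["id"])
                unique.append(illust)
        next_url = res.get("next_url")
        if not next_url:
            break
        params = dict(parse_qs(next_url) or {})
        # merge the new viewed ids into the accumulated list, keeping order
        for vid in params.get("viewed") or []:
            if vid not in viewed:
                viewed.append(vid)
        params["viewed"] = list(viewed)
    return unique
```

With a patched pixivpy3 this would be called as collect_related(aapi.illust_related, aapi.parse_qs, '98699730'); max_pages=20 mirrors the repetition count chosen above.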

nautics889 added a commit to nautics889/pixivpy that referenced this issue Mar 17, 2023
Updated methods `illust_related()`, `illust_recommended()` in
`AppPixivAPI` class. Both of them have had `viewed` parameter in their
signatures but the parameter was unused.
nautics889 added a commit to nautics889/pixivpy that referenced this issue Mar 18, 2023
Updated methods `illust_related()`, `illust_recommended()` in
`AppPixivAPI` class. Both of them have had `viewed` parameter in their
signatures but the parameter was unused.
upbit pushed a commit that referenced this issue Mar 18, 2023
Updated methods `illust_related()`, `illust_recommended()` in
`AppPixivAPI` class. Both of them have had `viewed` parameter in their
signatures but the parameter was unused.