Add WebAPI for fetching torrent metadata #21015

Open
wants to merge 3 commits into
base: master

Piccirello
Member

@Piccirello Piccirello commented Jul 1, 2024

This PR implements a new API for fetching a torrent's metadata. The API accepts a magnet URI, torrent hash, .torrent URL, or uploaded .torrent file, and returns the torrent's associated metadata. This PR also modifies the /torrents/add API to support downloading a torrent whose metadata has been previously fetched. The ultimate goal is for the WebUI to provide an Add Torrent experience equivalent to that of the GUI, where content can be reprioritized/unchecked before the torrent is added.

/metadata API

HTTP request

To request metadata for a torrent, specify the torrent in the sources query parameter of a GET request to /api/v2/torrents/metadata. Each source must be URL-encoded, and multiple sources can be separated by a (non-URL-encoded) comma. The sources parameter supports torrents in the following formats:

  • magnet URI (e.g. magnet:?xt=urn:btih:a8eeefc8a0dc402b24686ddfd775a409fe4b00e0&dn=example)
  • hash (e.g. a8eeefc8a0dc402b24686ddfd775a409fe4b00e0)
  • .torrent URL (e.g. https://example.com/example.torrent)
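
As a rough illustration of the encoding rule above, here is a minimal C++ sketch that percent-encodes each source on its own and then joins them with literal commas. The helper names percentEncode and buildMetadataPath are hypothetical and not part of this PR; only the endpoint path and the sources parameter come from the description.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Percent-encode every byte except the RFC 3986 "unreserved" characters.
// (Hypothetical helper, shown only to illustrate the encoding rule.)
std::string percentEncode(const std::string &input)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (const unsigned char c : input)
    {
        if (std::isalnum(c) || (c == '-') || (c == '_') || (c == '.') || (c == '~'))
        {
            out += static_cast<char>(c);
        }
        else
        {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Encode each source individually, then join them with unencoded commas.
std::string buildMetadataPath(const std::vector<std::string> &sources)
{
    std::string path = "/api/v2/torrents/metadata?sources=";
    for (std::size_t i = 0; i < sources.size(); ++i)
    {
        if (i > 0)
            path += ',';
        path += percentEncode(sources[i]);
    }
    return path;
}
```

Encoding before joining matters because a magnet URI can itself contain characters such as & and =, which would otherwise be read as part of the query string.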

You may also upload .torrent files. To do so, submit the file(s) as multipart MIME data. You may use any key for the uploaded value. To test file upload using curl, specify the -F flag (e.g. curl https://127.0.0.1 --get -F file=@"/root/example.torrent").

HTTP response

Given the asynchronous nature of retrieving metadata, there are two successful HTTP status codes used.

When metadata is requested for a torrent that requires asynchronous background work (i.e. connecting to DHT/peers), the client receives a 202. A 202 indicates that the request was successful, but additional background work must be completed before a meaningful response can be provided.

GET /api/v2/torrents/metadata?sources=abc
HTTP/1.1 202 Accepted
{}
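
Because a 202 carries no completion signal, a client would presumably repeat the same request until it receives a 200. The following C++ sketch shows one way such a loop could look. PollResult, pollForMetadata and the injected request callable are all hypothetical names; the transport is deliberately left abstract, and a real client would sleep between attempts.

```cpp
#include <functional>
#include <optional>
#include <string>

// Hypothetical result of one GET to the metadata endpoint.
struct PollResult
{
    int status;        // HTTP status code
    std::string body;  // response body
};

// Repeat the request until a 200 arrives or the attempt budget runs out.
// Returns the body on success, std::nullopt otherwise.
std::optional<std::string> pollForMetadata(const std::function<PollResult()> &request, const int maxAttempts)
{
    for (int attempt = 0; attempt < maxAttempts; ++attempt)
    {
        const PollResult result = request();
        if (result.status == 200)
            return result.body;
        if (result.status != 202)
            return std::nullopt;  // unexpected status: give up
        // still pending: a real client would wait here before retrying
    }
    return std::nullopt;  // metadata never became available within the budget
}
```

The attempt cap reflects the fact that metadata retrieval may never finish, so an unbounded loop would be unsafe.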

When metadata is available for all requested torrents, either because the torrent has been added or because the metadata has been retrieved from a prior request, the client receives a 200.

GET /api/v2/torrents/metadata?sources=abc
HTTP/1.1 200 OK
{
    "abc": {
        "comment": string,
        "created_by": string,
        "creation_date": int,
        "files": [],
        "hash": string,
        "infohash_v1": string,
        "infohash_v2": string,
        "name": string,
        "piece_size": int,
        "pieces_num": int,
        "private": bool,
        "total_size": int,
        "trackers": [],
        "webseeds": []
    }
}

When metadata is requested for multiple torrents, a 202 will be returned if any of the torrents requires asynchronous background work. This means that a 202 may include data for some of the requested torrents.

GET /api/v2/torrents/metadata?sources=abc,def
HTTP/1.1 202 Accepted
{
    "abc": { /** more data **/ }
}
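
The multi-source rule can be sketched as follows: the status is 200 only if every requested source already has metadata, and the body holds whichever entries are ready. This is an illustrative C++ model, not the PR's implementation; Metadata, MetadataResponse and buildResponse are hypothetical names, and the real code works with Qt and BitTorrent types.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a torrent's metadata.
struct Metadata
{
    std::string name;
};

struct MetadataResponse
{
    int status;                            // 200 or 202
    std::map<std::string, Metadata> body;  // only sources whose metadata is ready
};

// `known` maps a source to its metadata, or to std::nullopt while retrieval is pending.
MetadataResponse buildResponse(const std::vector<std::string> &sources,
                               const std::map<std::string, std::optional<Metadata>> &known)
{
    MetadataResponse response {200, {}};
    for (const std::string &source : sources)
    {
        const auto it = known.find(source);
        if ((it != known.end()) && it->second.has_value())
            response.body.emplace(source, *it->second);
        else
            response.status = 202;  // at least one source still needs background work
    }
    return response;
}
```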

Retrieved metadata is cached in the current web session. Subsequent requests within the same web session return the metadata immediately, while other web sessions must re-retrieve the torrent's metadata from peers. Once a torrent is added using the cached metadata, the metadata is removed from the cache.
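
To make the cache lifecycle concrete, here is a small C++ sketch of a per-session cache in which lookups leave the entry in place and adding a torrent consumes it. SessionMetadataCache and its methods are hypothetical names, and metadata is reduced to a string for brevity; the PR itself stores BitTorrent::TorrentInfo keyed by info hash.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical per-session cache; one instance would exist per web session.
class SessionMetadataCache
{
public:
    void store(const std::string &infoHash, const std::string &metadata)
    {
        m_entries[infoHash] = metadata;
    }

    // Lookup for a metadata request: the entry stays cached.
    std::optional<std::string> peek(const std::string &infoHash) const
    {
        const auto it = m_entries.find(infoHash);
        if (it == m_entries.end())
            return std::nullopt;
        return it->second;
    }

    // Lookup when adding the torrent: the entry is removed from the cache.
    std::optional<std::string> take(const std::string &infoHash)
    {
        const auto it = m_entries.find(infoHash);
        if (it == m_entries.end())
            return std::nullopt;
        std::string metadata = std::move(it->second);
        m_entries.erase(it);
        return metadata;
    }

private:
    std::unordered_map<std::string, std::string> m_entries;
};
```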

/add API

The existing /add API now supports using the metadata cache that's populated by the new /metadata API. When specifying a url and/or torrent file to download, the metadata cache is first checked for the torrent. If found, the metadata is used directly from the cache, rather than needing to re-retrieve it.

When metadata is retrieved directly from the cache, you may also specify a new filePriorities parameter. This parameter allows for specifying the file priority of each file in the torrent. This parameter may only be specified when adding a single torrent.

Alternatives:

I explored having the metadata API leave the request open until the metadata was available. Once the metadata was fetched, it would be returned directly in the response of the original request. One downside of this approach is that metadata retrieval can take an arbitrarily long amount of time. This could result in torrents whose metadata could never be retrieved via this API (e.g. due to the retrieval taking longer than the client's/reverse proxy's request timeout). This approach would also require some further modification of qBittorrent's web application layer to suppress the default behavior of returning a blank response.

Future work:

  • Modify the WebUI to make use of the new /metadata API. This will likely mean splitting the current Add/Upload Torrent dialog into two dialogs. The first dialog will support specifying the URL(s)/.torrent file(s) to submit, while the second dialog will display the torrent's metadata and allow for modification of file priorities.
  • Support downloading the retrieved metadata as a .torrent file (as supported in the GUI)

Closes #20966.

@@ -69,6 +69,7 @@ inline const QString KEY_TORRENT_CONTENT_PATH = u"content_path"_s;
inline const QString KEY_TORRENT_ADDED_ON = u"added_on"_s;
inline const QString KEY_TORRENT_COMPLETION_ON = u"completion_on"_s;
inline const QString KEY_TORRENT_TRACKER = u"tracker"_s;
inline const QString KEY_TORRENT_TRACKERS = u"trackers"_s;
Member Author

I wasn't certain if this file was the best place for these new keys.

Member

They are used in the dictionary returned by serialize() below, so this seems to be the wrong place for other keys.

Member Author

Moved to a new section in torrentscontroller.cpp.

Member

Moved to a new section in torrentscontroller.cpp.

I would put it into serialize/serialize_torrentinfo.* as serialize(const TorrentInfo &).

Member Author

This becomes tricky as it reuses keys defined in torrentscontroller.cpp, and I don't think it'd be worthwhile to duplicate them. I'm going to leave the serialize function in the anonymous namespace in torrentscontroller.cpp since that pattern is heavily used.

Comment on lines 1603 to 1617
// the TorrentInfo returned by Torrent::info() contains an empty list of trackers
// we must fetch the trackers directly via Torrent::trackers()
QVector<BitTorrent::TrackerEntry> trackers;
for (const BitTorrent::TrackerEntryStatus &tracker : torrent->trackers())
    trackers << BitTorrent::TrackerEntry
    {
        .url = tracker.url,
        .tier = tracker.tier
    };

setResult(serializeTorrentMetadata(torrent->info(), trackers));
Member Author

I was surprised by this and figured it was worth calling out.

private:
void onMetadataDownloaded(const BitTorrent::TorrentInfo &metadata);

QHash<BitTorrent::InfoHash, BitTorrent::TorrentInfo> m_torrentMetadata;
Member Author

Retrieved metadata is stored in the session and is only cleaned up when the session is destroyed. Alternatively, we could clean up the metadata once it's retrieved via the API.

Member

Alternatively, we could clean up the metadata once it's retrieved via the API.

👍
I believe the metadata won't be needed more than once in the vast majority of cases.

Member Author

Given that the /add API now supports using this metadata, the metadata will be deleted when the torrent is added.

@glassez glassez self-assigned this Jul 1, 2024
@glassez glassez added the WebAPI WebAPI-related issues/changes label Jul 1, 2024
Member

@glassez glassez left a comment

Only preliminary comments after brief review.

m_torrentMetadata.insert(infoHash, BitTorrent::TorrentInfo{});
}

setResult(QJsonObject {}, 202);
Member

I wouldn't change setResult but provide corresponding exception for this case to not break current API design.

Member Author

To me it makes logical sense that setResult would accept content, a status code, a mime type, etc. What's the downside of "breaking" (i.e. modifying) current API design?

Member

To me it makes logical sense that setResult would accept content, a status code, a mime type, etc.

The layer we talked about (implemented as API controller classes) is designed to be HTTP (or any other network protocol) independent. Its area of responsibility is to parse request, forward them to the application core, maintain some intermediate state between requests (if any functions require it), and serialize the results. Another layer is responsible for delivering requests and sending responses via HTTP.

What's the downside of "breaking" (i.e. modifying) current API design?

Breaking things around you in between is always a bad thing, isn't it? If you have an idea of a better architecture than we have now, and you are ready to implement it or at least outline its concept in sufficient detail so that someone else can implement it, you are welcome.

Member Author

The layer we talked about (implemented as API controller classes) is designed to be HTTP (or any other network protocol) independent. Its area of responsibility is to parse request, forward them to the application core, maintain some intermediate state between requests (if any functions require it), and serialize the results. Another layer is responsible for delivering requests and sending responses via HTTP.

How would you handle this then? We still need some way of passing an HTTP status code. I can add a setStatus method to the APIController class but I'm not sure if this would also violate the constraints you've set.

Member

How would you handle this then?

As I suggested initially, through a suitable exception (this is how it is supposed to work if it is impossible to return the result). If you really believe that it should be interpreted in some way other than the regular "not found", you can add a new type to the APIError and handle it in WebApplication. I guess you can find fault with the fact that it's not an "error" case. These are conventions. It is quite acceptable at this layer to interpret any case of inability to return a result as an "error". HTTP layer should care about transforming it into HTTP-relevant form.
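
For readers following the thread, the exception-based approach suggested here could look roughly like the C++ sketch below: the controller layer throws a typed error when it cannot return a result, and a separate HTTP layer decides which status code that error becomes. ApiErrorType, ApiError, metadataAction, toHttpStatus and handleRequest are simplified, hypothetical stand-ins and do not reproduce qBittorrent's actual APIError or WebApplication code; mapping the pending case to 202 is likewise only one possible choice.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical error categories known to the protocol-independent controller layer.
enum class ApiErrorType
{
    BadParams,
    NotFound,
    Pending  // result cannot be returned yet
};

class ApiError : public std::runtime_error
{
public:
    ApiError(const ApiErrorType type, const std::string &message)
        : std::runtime_error(message)
        , m_type(type)
    {
    }

    ApiErrorType type() const { return m_type; }

private:
    ApiErrorType m_type;
};

// Controller layer: knows nothing about HTTP status codes.
std::string metadataAction(const bool metadataReady)
{
    if (!metadataReady)
        throw ApiError(ApiErrorType::Pending, "Metadata is still being retrieved");
    return R"({"name": "example"})";
}

// HTTP layer: the only place where error types become status codes.
int toHttpStatus(const ApiErrorType type)
{
    switch (type)
    {
    case ApiErrorType::BadParams: return 400;
    case ApiErrorType::NotFound: return 404;
    case ApiErrorType::Pending: return 202;
    }
    return 500;
}

// Returns the status code the HTTP layer would send for this request.
int handleRequest(const bool metadataReady)
{
    try
    {
        metadataAction(metadataReady);
        return 200;
    }
    catch (const ApiError &error)
    {
        return toHttpStatus(error.type());
    }
}
```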

Member

Even from an HTTP perspective, I personally don't see much point in returning 202 rather than just 404. The "downloading" torrent metadata process has an "indeterminate" essence. It may never end for unspecified reasons.

Member Author

@Piccirello Piccirello Jul 3, 2024

I think you misunderstand HTTP status codes. I also think you're missing the context that this code is running asynchronously and not in the path of an HTTP request.

This is a good resource explaining 202 and why it's appropriate here: https://http.dev/202.

Member

I also think you're missing the context that this code is running asynchronously and not in the path of an HTTP request.

As I understand it, it is used for similar cases, i.e. when you give the server a task that it cannot perform (or complete) immediately in order to return the result in current response. I just meant that "searching metadata" differs from a regular task, which has a certain scope of execution. So you can't "promise" the client that "I'll complete this task then and there", you can't provide any progress, etc. The only thing that distinguishes it from the usual "not found" is that you take some action to find the metadata in between. But all the client has from this is that he could try to request it again after a while, which he could do after receiving the 404 code as well.

As per https://restfulapi.net/http-status-202-accepted/:

When using HTTP 202, it’s important to follow these best practices:

  • Provide a way for the client to monitor the progress or receive the result of the request.
  • Include relevant headers, such as ‘Location‘, to indicate where the client can obtain more information about the request’s status.
  • Clearly document the expected behavior and handling of the request, including any time limits for polling or callbacks.

It doesn't look like we can satisfy the above in this case.

Member

Anyway, I don't care much about the HTTP layer. I have expressed my opinion on this, but I leave the rest to your discretion.

Member Author

I've reverted the changes to setResult and added a new setStatus method.

@glassez
Member

glassez commented Jul 1, 2024

@Piccirello
I wonder if it's difficult to implement client side bencode parser in JS? It is very possible that one (or more) already exists.
I believe that if we had used one, we would have got a more universal solution. Since users often add local (for the WebUI) .torrent files, we would not have to send it to the server first for parsing, and then again to add the torrent. As for magnet links, it would also look easier if you sent raw metadata to the client and forget about it. Otherwise, how do you propose to add such torrents (after WebUI user selects file priorities etc.)? Of course, you can store all this metadata in a session... But this seems to be a more cumbersome implementation.

In any case, this part of the API should be thought out, implying the subsequent addition of torrents. Otherwise, we may end up with something little useful in practice.

@Piccirello
Member Author

Since users often add local (for the WebUI) .torrent files, we would not have to send it to the server first for parsing, and then again to add the torrent.

I'm not convinced that's the approach we'll eventually take. I can imagine sending the .torrent file once, returning the metadata to the client, and then allowing the torrent to be added without needing to re-send the file (likely by transmitting the info hash).

As for magnet links, it would also look easier if you sent raw metadata to the client and forget about it. Otherwise, how do you propose to add such torrents (after WebUI user selects file priorities etc.)? Of course, you can store all this metadata in a session... But this seems to be a more cumbersome implementation.

To me the session seems like an appropriate place to store this. I don't think the client should be responsible for parsing this data. It would also mean each client (official and unofficial) would need to implement it.

In any case, this part of the API should be thought out, implying the subsequent addition of torrents. Otherwise, we may end up with something little useful in practice.

I agree with this. I'll try to sketch out what the next steps would look like and how this API would be used.

@Piccirello Piccirello force-pushed the metadata-api branch 2 times, most recently from cec83e2 to f857170 Compare July 1, 2024 18:38
@NikcN22

NikcN22 commented Jul 1, 2024

@Piccirello I wonder if it's difficult to implement client side bencode parser in JS? It is very possible that one (or more) already exists. I believe that if we had used one, we would have got a more universal solution. Since users often add local (for the WebUI) .torrent files, we would not have to send it to the server first for parsing, and then again to add the torrent. As for magnet links, it would also look easier if you sent raw metadata to the client and forget about it. Otherwise, how do you propose to add such torrents (after WebUI user selects file priorities etc.)? Of course, you can store all this metadata in a session... But this seems to be a more cumbersome implementation.

In any case, this part of the API should be thought out, implying the subsequent addition of torrents. Otherwise, we may end up with something little useful in practice.

I think there is no need to embed the Bencode decoder directly into the client API. This is quite easy to do directly in the “graphical” part. More interesting is the need to provide the ability to assign priority to files in the add method.

@Piccirello Piccirello force-pushed the metadata-api branch 2 times, most recently from 1490741 to f736e62 Compare July 1, 2024 20:22
@Chocobo1
Member

Chocobo1 commented Jul 2, 2024

I wonder if it's difficult to implement client side bencode parser in JS? It is very possible that one (or more) already exists.

FYI, there certainly exists bencode encoder/decoder library in JS however I'm not aware that they are compatible with bittorrent v2. In the past, I had to mod one to suit my need.

@glassez
Member

glassez commented Jul 3, 2024

FYI, there certainly exists bencode encoder/decoder library in JS however I'm not aware that they are compatible with bittorrent v2.

"bencode" format is independent from BitTorrent so generic "bencode" decoder should not depend on it too. Or do you refer to torrent file specific parsers?

@Chocobo1
Member

Chocobo1 commented Jul 3, 2024

"bencode" format is independent from BitTorrent so generic "bencode" decoder should not depend on it too.

They had deficiencies in their implementations. Not fully conform with the spec.

@glassez
Member

glassez commented Jul 3, 2024

"bencode" format is independent from BitTorrent so generic "bencode" decoder should not depend on it too.

They had deficiencies in their implementations. Not fully conform with the spec.

It seems to be the same problem as with bencode editors. I couldn't find BitTorrent independent editor for Linux.

@Piccirello
Member Author

I ended up exploring how retrieved metadata would tie into the /add API, resulting in some changes to the /metadata API. Namely, the /metadata API now supports processing multiple sources at once. I've also made the necessary changes to the /add API to support downloading a torrent whose metadata has been previously retrieved via /metadata. This allows for adding the torrent with custom file priorities, which will enable a future PR to modify the WebUI's Add Torrent experience to mimic that of the GUI. PR description has been modified with the full changes.

@Piccirello Piccirello marked this pull request as ready for review July 9, 2024 21:42
@Piccirello Piccirello requested a review from a team July 9, 2024 21:42
Labels
WebAPI WebAPI-related issues/changes
Development

Successfully merging this pull request may close these issues.

Load torrent metadata by magnet, hash... in qBittorrent-nox
4 participants