Add WebAPI for fetching torrent metadata #21015

Piccirello · 2024-07-01T06:11:00Z

This PR implements a new API for fetching a torrent's metadata. The API accepts a magnet URI, torrent hash, .torrent URL, or uploaded .torrent file, and returns the torrent's associated metadata. This PR also modifies the /torrents/add API to support downloading a torrent whose metadata has been previously fetched. The ultimate goal is for the WebUI to provide an Add Torrent experience equivalent to that of the GUI, where content can be reprioritized/unchecked before the torrent is added.

`/metadata` API

HTTP request

To request metadata for a torrent, specify the torrent in the sources query parameter of a GET request to /api/v2/torrents/metadata. Each source must be URL-encoded, and multiple sources are separated by a literal (non-URL-encoded) comma. The sources parameter supports torrents in the following formats:

magnet URI (e.g. magnet:?xt=urn:btih:a8eeefc8a0dc402b24686ddfd775a409fe4b00e0&dn=example)
hash (e.g. a8eeefc8a0dc402b24686ddfd775a409fe4b00e0)
.torrent URL (e.g. https://example.com/example.torrent)
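
As a minimal sketch of the encoding rule above (each source URL-encoded on its own, then joined with unencoded commas), the request URL could be built as follows. The base URL, port, and the helper name `build_metadata_url` are assumptions for illustration and are not part of this PR.

```python
from urllib.parse import quote

# Hypothetical base URL; adjust host and port to your WebUI instance.
BASE_URL = "http://127.0.0.1:8080"


def build_metadata_url(sources, base_url=BASE_URL):
    """Build a GET URL for /api/v2/torrents/metadata.

    Each source is percent-encoded separately (safe="" so that ':', '/',
    '?', '&', '=' and ',' inside a source are all escaped), then the
    encoded sources are joined with a literal, unencoded comma.
    """
    encoded = ",".join(quote(source, safe="") for source in sources)
    return f"{base_url}/api/v2/torrents/metadata?sources={encoded}"


url = build_metadata_url([
    "magnet:?xt=urn:btih:a8eeefc8a0dc402b24686ddfd775a409fe4b00e0&dn=example",
    "a8eeefc8a0dc402b24686ddfd775a409fe4b00e0",
])
print(url)
```

Encoding each source before joining keeps any comma inside a magnet URI or URL from being mistaken for a separator.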

You may also upload .torrent files. To do so, submit the file(s) as multipart MIME data. You may use any key for the uploaded value. To test file upload using curl, specify the -F flag (e.g. curl https://127.0.0.1 --get -F file=@"/root/example.torrent").

HTTP response

Given the asynchronous nature of retrieving metadata, there are two successful HTTP status codes used.

When metadata is requested for a torrent that requires asynchronous background work (i.e. connecting to DHT/peers), the client receives a 202. A 202 indicates that the request was successful, but additional background work must be completed before a meaningful response can be provided.

GET /api/v2/torrents/metadata?sources=abc
HTTP/1.1 202 Accepted
{}

When metadata is available for all requested torrents, either because the torrent has been added or because the metadata has been retrieved from a prior request, the client receives a 200.

GET /api/v2/torrents/metadata?sources=abc
HTTP/1.1 200 OK
{
    "abc": {
        "comment": string,
        "created_by": string,
        "creation_date": int,
        "files": [],
        "hash": string,
        "infohash_v1": string,
        "infohash_v2": string,
        "name": string,
        "piece_size": int,
        "pieces_num": int,
        "private": bool,
        "total_size": int,
        "trackers": [],
        "webseeds": []
    }
}

When metadata is requested for multiple torrents, a 202 is returned if any of the torrents requires asynchronous background work. This means that a 202 response may already include data for some of the requested torrents.

GET /api/v2/torrents/metadata?sources=abc,def
HTTP/1.1 202 Accepted
{
    "abc": { /* more data */ }
}
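
The 200/202 behavior described above implies that a client polls until it receives a 200. A minimal sketch of such a loop follows, assuming a caller-supplied `fetch` function (hypothetical) that performs the GET request and returns the status code together with the parsed JSON body. The interval and attempt limit are arbitrary illustrative defaults.

```python
import time


def poll_metadata(fetch, sources, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Poll until every requested torrent has metadata (HTTP 200).

    `fetch(sources)` is a hypothetical caller-supplied function that issues
    the GET request and returns `(status_code, body_dict)`.
    Returns the final body on 200 and raises TimeoutError if the server
    keeps answering 202 for `max_attempts` tries.
    """
    partial = {}
    for _ in range(max_attempts):
        status, body = fetch(sources)
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        # A 202 body may already hold metadata for some of the sources.
        partial = body
        sleep(interval)
    raise TimeoutError(f"metadata still pending; received so far: {sorted(partial)}")


# Usage with a fake fetcher that succeeds on the third call.
responses = iter([
    (202, {}),
    (202, {"abc": {"name": "a"}}),
    (200, {"abc": {"name": "a"}, "def": {"name": "d"}}),
])
result = poll_metadata(lambda s: next(responses), ["abc", "def"], sleep=lambda _: None)
```

Because metadata retrieval may never complete, the sketch bounds the number of attempts rather than polling indefinitely.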

Retrieved metadata is cached in the current web session. Subsequent requests within the same web session return the metadata immediately, while other web sessions must retrieve the torrent's metadata from peers again. Once a torrent is added using the cached metadata, the metadata is removed from the cache.

`/add` API

The existing /add API now supports using the metadata cache that's populated by the new /metadata API. When specifying a URL and/or torrent file to download, the metadata cache is first checked for the torrent. If found, the metadata is used directly from the cache rather than being retrieved again.
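
The cache lifecycle described above (scoped to a web session, consulted by /add, cleared once the torrent is added) can be modeled with a toy sketch. This is not the PR's C++ implementation; the class and method names are hypothetical and the metadata is represented as a plain dict.

```python
class SessionMetadataCache:
    """Toy model of a per-web-session metadata cache."""

    def __init__(self):
        # session id -> {info hash -> metadata dict}
        self._by_session = {}

    def store(self, session_id, info_hash, metadata):
        """Remember metadata retrieved through the /metadata API."""
        self._by_session.setdefault(session_id, {})[info_hash] = metadata

    def lookup(self, session_id, info_hash):
        """Return cached metadata, or None if this session has not fetched it."""
        return self._by_session.get(session_id, {}).get(info_hash)

    def take_for_add(self, session_id, info_hash):
        """Return cached metadata for /add and remove it from the cache."""
        return self._by_session.get(session_id, {}).pop(info_hash, None)


cache = SessionMetadataCache()
cache.store("session-1", "abc", {"name": "example"})
same_session = cache.lookup("session-1", "abc")    # cached metadata
other_session = cache.lookup("session-2", "abc")   # None: must fetch again
added = cache.take_for_add("session-1", "abc")     # used by /add, then dropped
```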

When metadata is retrieved directly from the cache, you may also specify a new filePriorities parameter. This parameter allows for specifying the file priority of each file in the torrent. This parameter may only be specified when adding a single torrent.

Alternatives:

I explored having the metadata API leave the request open until the metadata was available. Once the metadata was fetched, it would be returned directly in the response to the original request. One downside of this approach is that metadata retrieval can take an arbitrarily long time. This could result in torrents whose metadata could never be retrieved via this API (e.g. due to the retrieval taking longer than the client's or reverse proxy's request timeout). This approach would also require further modification of qBittorrent's web application layer to suppress the default behavior of returning a blank response.

Future work:

Modify the WebUI to make use of the new /metadata API. This will likely mean splitting the current Add/Upload Torrent dialog into two dialogs. The first dialog will support specifying the URL(s)/.torrent file(s) to submit, while the second dialog will display the torrent's metadata and allow for modification of file priorities.
Support downloading the retrieved metadata as a .torrent file (as supported in the GUI)

Closes #20966.

src/base/bittorrent/torrentinfo.h

Piccirello · 2024-07-01T06:14:17Z

src/webui/api/serialize/serialize_torrent.h

@@ -69,6 +69,7 @@ inline const QString KEY_TORRENT_CONTENT_PATH = u"content_path"_s;
 inline const QString KEY_TORRENT_ADDED_ON = u"added_on"_s;
 inline const QString KEY_TORRENT_COMPLETION_ON = u"completion_on"_s;
 inline const QString KEY_TORRENT_TRACKER = u"tracker"_s;
+inline const QString KEY_TORRENT_TRACKERS = u"trackers"_s;


I wasn't certain if this file was the best place for these new keys.

They are used in the dictionary returned by serialize() below, so this seems to be the wrong place for other keys.

Moved to a new section in torrentscontroller.cpp.

> Moved to a new section in torrentscontroller.cpp.

I would put it into serialize/serialize_torrentinfo.* as serialize(const TorrentInfo &).

This becomes tricky as it reuses keys defined in torrentscontroller.cpp, and I don't think it'd be worthwhile to duplicate them. I'm going to leave the serialize function in the anonymous namespace in torrentscontroller.cpp since that pattern is heavily used.

Piccirello · 2024-07-01T06:15:01Z

src/webui/api/torrentscontroller.cpp

+            // the TorrentInfo returned by Torrent::info() contains an empty list of trackers
+            // we must fetch the trackers directly via Torrent::trackers()
+            QVector<BitTorrent::TrackerEntry> trackers;
+            for (const BitTorrent::TrackerEntryStatus &tracker : torrent->trackers())
+                trackers << BitTorrent::TrackerEntry
+                {
+                    .url = tracker.url,
+                    .tier = tracker.tier
+                };
+
+            setResult(serializeTorrentMetadata(torrent->info(), trackers));


I was surprised by this and figured it was worth calling out.

Piccirello · 2024-07-01T06:20:14Z

src/webui/api/torrentscontroller.h

+private:
+    void onMetadataDownloaded(const BitTorrent::TorrentInfo &metadata);
+
+    QHash<BitTorrent::InfoHash, BitTorrent::TorrentInfo> m_torrentMetadata;


Retrieved metadata is stored in the session and is only cleaned up when the session is destroyed. Alternatively, we could clean up the metadata once it's retrieved via the API.

> Alternatively, we could clean up the metadata once it's retrieved via the API.

👍
I believe the metadata won't be needed more than once in the vast majority of cases.

Given that the /add API now supports using this metadata, the metadata will be deleted when the torrent is added.

glassez

Only preliminary comments after brief review.

src/webui/api/apicontroller.h

glassez · 2024-07-01T12:00:08Z

src/webui/api/serialize/serialize_torrent.h

@@ -69,6 +69,7 @@ inline const QString KEY_TORRENT_CONTENT_PATH = u"content_path"_s;
 inline const QString KEY_TORRENT_ADDED_ON = u"added_on"_s;
 inline const QString KEY_TORRENT_COMPLETION_ON = u"completion_on"_s;
 inline const QString KEY_TORRENT_TRACKER = u"tracker"_s;
+inline const QString KEY_TORRENT_TRACKERS = u"trackers"_s;


They are used in the dictionary returned by serialize() below, so this seems to be the wrong place for other keys.

src/webui/api/torrentscontroller.cpp

glassez · 2024-07-01T12:16:41Z

src/webui/api/torrentscontroller.cpp

+            m_torrentMetadata.insert(infoHash, BitTorrent::TorrentInfo{});
+        }
+
+        setResult(QJsonObject {}, 202);


I wouldn't change setResult but provide corresponding exception for this case to not break current API design.

To me it makes logical sense that setResult would accept content, a status code, a mime type, etc. What's the downside of "breaking" (i.e. modifying) current API design?

> To me it makes logical sense that setResult would accept content, a status code, a mime type, etc.

The layer we talked about (implemented as API controller classes) is designed to be HTTP (or any other network protocol) independent. Its area of responsibility is to parse request, forward them to the application core, maintain some intermediate state between requests (if any functions require it), and serialize the results. Another layer is responsible for delivering requests and sending responses via HTTP.

> What's the downside of "breaking" (i.e. modifying) current API design?

Breaking things around you in between is always a bad thing, isn't it? If you have an idea of a better architecture than we have now, and you are ready to implement it or at least outline its concept in sufficient detail so that someone else can implement it, you are welcome.

> The layer we talked about (implemented as API controller classes) is designed to be HTTP (or any other network protocol) independent. Its area of responsibility is to parse request, forward them to the application core, maintain some intermediate state between requests (if any functions require it), and serialize the results. Another layer is responsible for delivering requests and sending responses via HTTP.

How would you handle this then? We still need some way of passing an HTTP status code. I can add a setStatus method to the APIController class but I'm not sure if this would also violate the constraints you've set.

> How would you handle this then?

As I suggested initially, through a suitable exception (this is how it is supposed to work if it is impossible to return the result). If you really believe that it should be interpreted in some way other than the regular "not found", you can add a new type to the APIError and handle it in WebApplication. I guess you can find fault with the fact that it's not an "error" case. These are conventions. It is quite acceptable at this layer to interpret any case of inability to return a result as an "error". HTTP layer should care about transforming it into HTTP-relevant form.
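
The layering suggested in the comment above (a protocol-independent controller signals "no result yet" with an error, and only the HTTP layer turns that into a status code) could be sketched as follows. All names here (`ErrorKind`, `ControllerError`, `metadata_action`, `handle_http`) are hypothetical and do not correspond to qBittorrent's actual classes; the mapping of the pending case to 202 is one possible choice, not a decision made in this thread.

```python
from enum import Enum, auto


class ErrorKind(Enum):
    NOT_FOUND = auto()
    PENDING = auto()  # hypothetical new kind: result is not available yet


class ControllerError(Exception):
    """Raised by a controller action that cannot return a result."""

    def __init__(self, kind, message=""):
        super().__init__(message)
        self.kind = kind


# Only the HTTP layer knows about status codes.
STATUS_BY_KIND = {ErrorKind.NOT_FOUND: 404, ErrorKind.PENDING: 202}


def metadata_action(cache, info_hash):
    """Protocol-independent controller action: return a result or raise."""
    metadata = cache.get(info_hash)
    if metadata is None:
        raise ControllerError(ErrorKind.PENDING, "metadata is still being retrieved")
    return metadata


def handle_http(action, *args):
    """HTTP layer: run the action and translate the outcome to (status, body)."""
    try:
        return 200, action(*args)
    except ControllerError as err:
        return STATUS_BY_KIND[err.kind], {"message": str(err)}


ready = handle_http(metadata_action, {"abc": {"name": "example"}}, "abc")
pending = handle_http(metadata_action, {}, "abc")
```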

Even from an HTTP perspective, I personally don't see much point in returning 202 rather than just 404. The "downloading" torrent metadata process has an "indeterminate" essence. It may never end for unspecified reasons.

I think you misunderstand HTTP status codes. I also think you're missing the context that this code is running asynchronously and not in the path of an HTTP request.

This is a good resource explaining 202 and why it's appropriate here: https://http.dev/202.

> I also think you're missing the context that this code is running asynchronously and not in the path of an HTTP request.

As I understand it, it is used for similar cases, i.e. when you give the server a task that it cannot perform (or complete) immediately in order to return the result in current response. I just meant that "searching metadata" differs from a regular task, which has a certain scope of execution. So you can't "promise" the client that "I'll complete this task then and there", you can't provide any progress, etc. The only thing that distinguishes it from the usual "not found" is that you take some action to find the metadata in between. But all the client has from this is that he could try to request it again after a while, which he could do after receiving the 404 code as well.

As per https://restfulapi.net/http-status-202-accepted/:

> When using HTTP 202, it’s important to follow these best practices:
>
> Provide a way for the client to monitor the progress or receive the result of the request.
>
> Include relevant headers, such as ‘Location’, to indicate where the client can obtain more information about the request’s status.
>
> Clearly document the expected behavior and handling of the request, including any time limits for polling or callbacks.

It doesn't look like we can satisfy the above in this case.

Anyway, I don't care much about the HTTP layer. I have expressed my opinion on this, but I leave the rest to your discretion.

I've reverted the changes to setResult and added a new setStatus method.

src/webui/api/torrentscontroller.cpp

glassez · 2024-07-01T12:29:16Z

src/webui/api/torrentscontroller.h

+private:
+    void onMetadataDownloaded(const BitTorrent::TorrentInfo &metadata);
+
+    QHash<BitTorrent::InfoHash, BitTorrent::TorrentInfo> m_torrentMetadata;


> Alternatively, we could clean up the metadata once it's retrieved via the API.

👍
I believe the metadata won't be needed more than once in the vast majority of cases.

glassez · 2024-07-01T12:48:39Z

@Piccirello
I wonder if it's difficult to implement client side bencode parser in JS? It is very possible that one (or more) already exists.
I believe that if we had used one, we would have got a more universal solution. Since users often add local (for the WebUI) .torrent files, we would not have to send it to the server first for parsing, and then again to add the torrent. As for magnet links, it would also look easier if you sent raw metadata to the client and forget about it. Otherwise, how do you propose to add such torrents (after WebUI user selects file priorities etc.)? Of course, you can store all this metadata in a session... But this seems to be a more cumbersome implementation.

In any case, this part of the API should be thought out, implying the subsequent addition of torrents. Otherwise, we may end up with something little useful in practice.

src/webui/api/torrentscontroller.cpp

Piccirello · 2024-07-01T18:18:18Z

> Since users often add local (for the WebUI) .torrent files, we would not have to send it to the server first for parsing, and then again to add the torrent.

I'm not convinced that's the approach we'll eventually take. I can imagine sending the .torrent file once, returning the metadata to the client, and then allowing the torrent to be added without needing to re-send the file (likely by transmitting the info hash).

> As for magnet links, it would also look easier if you sent raw metadata to the client and forget about it. Otherwise, how do you propose to add such torrents (after WebUI user selects file priorities etc.)? Of course, you can store all this metadata in a session... But this seems to be a more cumbersome implementation.

To me the session seems like an appropriate place to store this. I don't think the client should be responsible for parsing this data. It would also mean each client (official and unofficial) would need to implement it.

> In any case, this part of the API should be thought out, implying the subsequent addition of torrents. Otherwise, we may end up with something little useful in practice.

I agree with this. I'll try to sketch out what the next steps would look like and how this API would be used.

NikcN22 · 2024-07-01T18:57:20Z

> @Piccirello I wonder if it's difficult to implement client side bencode parser in JS? It is very possible that one (or more) already exists. I believe that if we had used one, we would have got a more universal solution. Since users often add local (for the WebUI) .torrent files, we would not have to send it to the server first for parsing, and then again to add the torrent. As for magnet links, it would also look easier if you sent raw metadata to the client and forget about it. Otherwise, how do you propose to add such torrents (after WebUI user selects file priorities etc.)? Of course, you can store all this metadata in a session... But this seems to be a more cumbersome implementation.
>
> In any case, this part of the API should be thought out, implying the subsequent addition of torrents. Otherwise, we may end up with something little useful in practice.

I think there is no need to embed the Bencode decoder directly into the client API. This is quite easy to do directly in the “graphical” part. More interesting is the need to provide the ability to assign priority to files in the add method.

Chocobo1 · 2024-07-02T05:33:26Z

> I wonder if it's difficult to implement client side bencode parser in JS? It is very possible that one (or more) already exists.

FYI, there certainly exists bencode encoder/decoder library in JS however I'm not aware that they are compatible with bittorrent v2. In the past, I had to mod one to suit my need.

glassez · 2024-07-03T05:45:51Z

> FYI, there certainly exists bencode encoder/decoder library in JS however I'm not aware that they are compatible with bittorrent v2.

"bencode" format is independent from BitTorrent so generic "bencode" decoder should not depend on it too. Or do you refer to torrent file specific parsers?

Chocobo1 · 2024-07-03T07:10:17Z

> "bencode" format is independent from BitTorrent so generic "bencode" decoder should not depend on it too.

They had deficiencies in their implementations. Not fully conform with the spec.

glassez · 2024-07-03T07:19:38Z

> > "bencode" format is independent from BitTorrent so generic "bencode" decoder should not depend on it too.
>
> They had deficiencies in their implementations. Not fully conform with the spec.

It seems to be the same problem as with bencode editors. I couldn't find BitTorrent independent editor for Linux.
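
To illustrate the point made above that bencode is a small format independent of BitTorrent, here is a minimal generic decoder sketch. It handles the four bencode types but is deliberately lenient: it does not enforce strictness rules such as sorted dictionary keys or the absence of leading zeros, which is the kind of conformance gap mentioned in this thread. The function names are illustrative only.

```python
def bdecode(data: bytes):
    """Decode a single bencoded value from `data` and return it.

    Supports integers (i...e), byte strings (<len>:<bytes>), lists (l...e)
    and dictionaries (d...e). Byte strings are returned as `bytes`.
    """
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data, i):
    """Decode the value starting at offset `i`; return (value, next offset)."""
    lead = data[i:i + 1]
    if lead == b"i":
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            value, i = _decode(data, i)
            result[key] = value
        return result, i + 1
    if lead.isdigit():
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        if start + length > len(data):
            raise ValueError("byte string is truncated")
        return data[start:start + length], start + length
    raise ValueError(f"invalid bencode at offset {i}")


decoded = bdecode(b"d5:filesl1:a1:be6:lengthi42e4:name7:examplee")
```

Nothing in the decoder refers to torrent-specific keys; interpreting fields such as file lists or v2 hashes would be the job of a separate torrent-file parser built on top of it.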

Piccirello · 2024-07-09T04:16:52Z

I ended up exploring how retrieved metadata would tie into the /add API, resulting in some changes to the /metadata API. Namely, the /metadata API now supports processing multiple sources at once. I've also made the necessary changes to the /add API to support downloading a torrent whose metadata has been previously retrieved via /metadata. This allows for adding the torrent with custom file priorities, which will enable a future PR to modify the WebUI's Add Torrent experience to mimic that of the GUI. PR description has been modified with the full changes.

src/webui/api/torrentscontroller.cpp

glassez self-assigned this Jul 1, 2024

glassez added the WebAPI label Jul 1, 2024

glassez reviewed Jul 1, 2024

Piccirello force-pushed the metadata-api branch 2 times, most recently from cec83e2 to f857170 Compare July 1, 2024 18:38

Piccirello force-pushed the metadata-api branch 2 times, most recently from 1490741 to f736e62 Compare July 1, 2024 20:22

Support sending custom status codes via WebAPI

5bf02c3

Piccirello force-pushed the metadata-api branch from f736e62 to fcbbac3 Compare July 9, 2024 04:14

github-advanced-security bot found potential problems Jul 9, 2024

Piccirello added 2 commits July 8, 2024 21:25

Add WebAPI for fetching torrent metadata

dc4bea9

Support downloading torrent from previously fetched metadata

a7da584

Piccirello force-pushed the metadata-api branch from fcbbac3 to a7da584 Compare July 9, 2024 04:26

Piccirello mentioned this pull request Jul 9, 2024

Add WebAPI for managing torrent webseeds #21043

Open

Piccirello marked this pull request as ready for review July 9, 2024 21:42

Piccirello requested a review from a team July 9, 2024 21:42
