From 2c490fe95a0936ba81c0e3c3124f940844a74a49 Mon Sep 17 00:00:00 2001
From: nugaon
Date: Wed, 30 Jun 2021 11:50:32 +0200
Subject: [PATCH 1/6] first thoughts about streaming of feeds

---
 SWIPs/swip-streaming-of-feeds.md | 136 +++++++++++++++++++++++++++++++
 1 file changed, 136 insertions(+)
 create mode 100644 SWIPs/swip-streaming-of-feeds.md

diff --git a/SWIPs/swip-streaming-of-feeds.md b/SWIPs/swip-streaming-of-feeds.md
new file mode 100644
index 0000000..34a5c0e
--- /dev/null
+++ b/SWIPs/swip-streaming-of-feeds.md
@@ -0,0 +1,136 @@
+---
+SWIP:
+title: Fault tolerant and flexible streaming of feeds
+author: Viktor Levente Tóth (@nugaon)
+discussions-to:
+status: Draft
+type: Interface
+category (*only required for Standard Track): ERC
+created: 2021-06-30
+---
+
+## Simple Summary
+
+Mutable content can be streamed periodically from a content creator, where the completeness of the stream is negligible, but getting the _closest state_ to an arbitrary time as fast as possible is the most important factor.
+For that, the content creator that operates the below described _feed indexing schema and its corresponding lookup_ has a lenient obligation to upload at an arbitrary time interval, while the consumers of the content can quickly and cheaply retrieve the most up-to-date state of the mutable content.
+
+## Abstract
+
+Achieve the fastest retrieval method for a feed stream, which is optimised to download the closest available segment of the feed at a given update time.
+It is intended to decrease the number of lookup processes on the network as much as possible.
+The main use-case is to get the last updated state of the content; for that, there is a finite set of feed indexes, so the lookup method can start from the last (possibly) updated feed index.
+The owner of the feed _promises_ to upload feed segments at every chosen time interval, but it is a weak requirement and it is possible to leave out indexes from the stream in exchange for slower retrieval speed of the stream.
+
+## Motivation
+
+The lookup time of feeds can be significantly long, since the lookup process only stops when there is no newer content under a feed.
+A solution is needed where this approach is reversed: a lookup that stops on a successful hit before a given upload time.
+This lookup time can already be shortened by not waiting for the last (non-existent) feed retrieval.
+If the uploader keeps to the periodic feed indexing and uploads every time it is needed, the retrieval for the user is `O(1)`.
+Nevertheless, the lookups can easily go wrong because of (1) network issues or (2) the uploader not uploading the content in time.
+These problems are equivalent from the retrieval perspective, and the first one also affects epoch-based and sequential feeds.
+If any chunk of the lookup trajectory is not available (even temporarily), it could cause huge inaccuracy, but an approach like this one always gets at least as close to a desired upload time.
+Moreover, the current lookup methods keep chunks alive that may be unnecessary from the content usage side when the goal of the content (or dApp) is to provide the most up-to-date state.
+
+## Specification
+
+
+In the following subsections I would like to detail how to:
+* [get the nearest last index for an arbitrary time](#nearest-last-index-for-an-arbitrary-time)
+* [construct the feed topic](#feed-topic-construction)
+* [upload a feed chunk](#upload-feed-chunk)
+* [download a feed chunk at a specific time](#download-feed-chunk-at-specific-time)
+* [and download the (whole) feed stream](#download-feed-stream)
+
+Feeds always have a `topic` and an `index`, which we have to define.
+
+### Nearest Last Index for an Arbitrary Time
+
+Let's figure out the nearest last `feed index` for a given arbitrary time.
+We know the current time (`Tp`), which is surely greater than or equal to the last upload time (`Tn`) of the feed stream.
+We also know two metadata of the stream: initial time (`T0`) and update period (`Δ1`).
+We want to find the nearest last index to an arbitrary time (`Tx`) where `T0 <= Tx <= Tn <= Tp`.
+All of these are timestamps and their smallest unit is 1 second; note that even with this time unit it is still uncertain whether the latest update can be downloaded if `Tp = Tn`, because of the nature of P2P storage systems.
+
+The formula to calculate the nearest last index parameter (`i`) for `Tx` is really simple:
+
+```ts
+function getIndexForArbitraryTime(Tx: number, T0: number, updatePeriod: number): number {
+  return Math.floor((Tx - T0) / updatePeriod) // which is the `i`
+}
+```
+
+### Feed Topic Construction
+
+The `feed topic` has to contain the initial time (`T0`) and optionally can be prefixed/suffixed with an additional identifier so that the uploader with the same key can maintain many distinct feed streams.
+Whichever is chosen, the feed topic construction algorithm has to be the same and well-known between the parties.
+
+### Upload Feed Chunk
+
+Within the upload function, the current time (`Tp`) is initialized and the `feed index` of the newest feed chunk can be calculated by calling `getIndexForArbitraryTime(Tp, T0, updatePeriod)`.
+After the `feed topic` is initialized as well in the aforementioned way, the upload can happen as with any other feed upload.
+
+### Download Feed Chunk at Specific Time
+
+The feed download method has two required parameters and one optional parameter: `function downloadFeed(owner: EthAddress, topic: bytes[], Tx?: number): Feed`.
+
+If the last parameter is not passed, the `downloadFeed` function calculates the nearest last index based on the current time, just like the feed upload function.
+
+There are cases when the chunk is not available at the first calculated index; then the lookup method starts and tries to find the nearest downloadable chunk.
+The most reasonable approach here is to check the previous (`i-1`) and the next (`i+1`) fetch index, in the worst case down to index `0` and up to the last (`n`) index, respectively.
+This lookup can also happen in parallel, checking _n_ chunks simultaneously on both sides in order to raise the certainty of a successful hit.
+
+### Download Feed Stream
+
+Downloading the stream is really straightforward: we should download all feed updates one-by-one or in parallel, starting from index `0` up to and including index `getIndexForArbitraryTime(Tp, T0, updatePeriod)`.
+
+The integrity check of the stream can only happen by putting versioning metadata into the feed segments, because the content creator may not intend to upload in every upload time period (despite the incentivised nature of this feed type).
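+
+The sketch below pulls the above subsections together in TypeScript. It is only a minimal illustration of the indexing and lookup logic: the `FeedClient` interface with its `uploadFeedChunkAt`/`downloadFeedChunkAt` methods is an assumed, hypothetical client API (not an existing one), and topic construction and error handling are omitted.
+
+```ts
+type EthAddress = string
+
+// hypothetical client API, assumed only for this sketch
+interface FeedClient {
+  uploadFeedChunkAt(owner: EthAddress, topic: Uint8Array, index: number, payload: Uint8Array): Promise<void>
+  downloadFeedChunkAt(owner: EthAddress, topic: Uint8Array, index: number): Promise<Uint8Array | undefined>
+}
+
+function getIndexForArbitraryTime(Tx: number, T0: number, updatePeriod: number): number {
+  return Math.floor((Tx - T0) / updatePeriod)
+}
+
+// upload: anchor the chunk to the index derived from the current time (Tp)
+async function uploadLatest(client: FeedClient, owner: EthAddress, topic: Uint8Array, T0: number, updatePeriod: number, payload: Uint8Array): Promise<number> {
+  const Tp = Math.floor(Date.now() / 1000)
+  const i = getIndexForArbitraryTime(Tp, T0, updatePeriod)
+  await client.uploadFeedChunkAt(owner, topic, i, payload)
+  return i
+}
+
+// download: try the calculated index first, then widen the search to
+// i-1/i+1, i-2/i+2, ... within [0, n], fetching both sides of a step in parallel
+async function downloadClosest(client: FeedClient, owner: EthAddress, topic: Uint8Array, T0: number, updatePeriod: number, Tx?: number): Promise<Uint8Array | undefined> {
+  const n = getIndexForArbitraryTime(Math.floor(Date.now() / 1000), T0, updatePeriod)
+  const i = Tx === undefined ? n : getIndexForArbitraryTime(Tx, T0, updatePeriod)
+  for (let distance = 0; distance <= Math.max(i, n - i); distance++) {
+    const candidates = (distance === 0 ? [i] : [i - distance, i + distance]).filter(c => c >= 0 && c <= n)
+    const results = await Promise.all(candidates.map(c => client.downloadFeedChunkAt(owner, topic, c)))
+    const hit = results.find(r => r !== undefined)
+    if (hit) return hit
+  }
+  return undefined // no update of the stream could be retrieved
+}
+
+// stream download: fetch indices 0..n in parallel; missing updates stay undefined
+async function downloadStream(client: FeedClient, owner: EthAddress, topic: Uint8Array, T0: number, updatePeriod: number): Promise<(Uint8Array | undefined)[]> {
+  const n = getIndexForArbitraryTime(Math.floor(Date.now() / 1000), T0, updatePeriod)
+  return Promise.all([...Array(n + 1).keys()].map(i => client.downloadFeedChunkAt(owner, topic, i)))
+}
+```
+
+`downloadClosest` widens the search one step per round; as noted above, a real implementation could fetch several distances per round to trade bandwidth for lookup latency.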
+
+Some other optimizations, features and other aspects are described in the [next chapter](#rationale).
+
+## Rationale
+
+
+The currently worked-out solutions for handling a content stream (Book of Swarm, Subscriptions, page 112) approach update tracking by:
+- push notifications about the update (PSS)
+- epoch-based indexing for sporadically updated feeds
+- polling for almost ideal periodic updates (frequent updates)
+
+Though the book also mentions _periodic feeds_ for calculating the index of a feed starting from a specific well-known time,
+it does not touch on the scenario where the updates are _partly sporadic_ and some of them may not be available or not uploaded.
+
+This requires an **indexing method** that extends the `periodic feeds` indexing and a **specific retrieval method** to handle the holes in the update stream.
+
+The best existing approach to get the closest feed to an arbitrary time is using epoch-based feeds, but its downside - additionally to what I summarized [above](#motivation) - is that after every write the retrieval gets slower.
+It is similar to a hard drive fragmenting because of many frequent writes within one sector (the base fragment of the epoch time lookup).
+This proposed feed indexing method behaves in the opposite way at lookup:
+if the expected amount of writes happens periodically, the retrieval of the data is faster, and can even be _O(1)_.
+Compared to the epoch-based feeds, basically the length of the base segment is arbitrary, and it encourages the user to make periodic uploads and stick to this base segment instead of sporadic uploading, in exchange for better retrieval time.
+
+The downside is that within the agreed update time period the content creator cannot update the state of the feed - although it is possible to overwrite the content of a feed by uploading it again with a different payload, on the download side it is still possible to get back the old one.
+This problem can be mitigated by changing the registry of the feed, from where the consumers get the initial metadata of the feed:
+- the initial timestamp (`T0`) should be changed to the point from which the time period changes (`T1`)
+- the time period for state uploads should be changed from the old one (`Δ1`) to the new one (`Δ2`)
+
+If the registry is settled on a blockchain in a smart contract, then it can have a defined event for metadata changes of the content, to which the clients can listen.
+Thereby, the consumers of the feed can immediately react to the upload frequency change and poll according to the new rules.
+If blockchain listening on the client side is somehow not suitable, it is possible to put the `uploading time period` and `initial timestamp` metadata into all of the feed stream updates so that the consumers can sync to the stream after `MAX(Δ1,Δ2)` time if `MAX(Δ1,Δ2) % MIN(Δ1,Δ2) = 0` and there exist positive integers `k` and `m` where `T0 + (k * Δ1) = T0 + (m * Δ2) = T1`.
+
+Though it was stated that this approach does not address downloading the whole feed stream, it is still possible:
+* In case of punctual updates without any upload time period change, the stream download is identical to the sequential/periodic feed stream download.
+* If the uploading time period has been changed, the feed index set construction should happen either by considering the aforementioned blockchain event emission or - if the change is contained only in the feed metadata - by starting a lookup backwards when the change is detected during the downloading process, if `Δ2 < Δ1`, until the change, but for a maximum of `z = Δ1 / Δ2` units.
+
+## Backwards Compatibility
+
+The whole idea can be implemented on the application layer using single owner chunks, but optionally the solution can also be integrated into the P2P client.
+
+## Test Cases
+
+_Pending_
+
+## Implementation
+
+_Pending_
+
+## Copyright
+Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).

From 30ac90312bbf49091b085d5891b5de83585dd15c Mon Sep 17 00:00:00 2001
From: nugaon
Date: Wed, 30 Jun 2021 13:21:06 +0200
Subject: [PATCH 2/6] amend summary

---
 SWIPs/swip-streaming-of-feeds.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/SWIPs/swip-streaming-of-feeds.md b/SWIPs/swip-streaming-of-feeds.md
index 34a5c0e..dedce22 100644
--- a/SWIPs/swip-streaming-of-feeds.md
+++ b/SWIPs/swip-streaming-of-feeds.md
@@ -12,7 +12,7 @@ created: 2021-06-30
 ## Simple Summary
 
 Mutable content can be streamed periodically from a content creator, where the completeness of the stream is negligible, but getting the _closest state_ to an arbitrary time as fast as possible is the most important factor.
-For that, the content creator that operates the below described _feed indexing schema and its corresponding lookup_ has a lenient obligation to upload at an arbitrary time interval, while the consumers of the content can quickly and cheaply retrieve the most up-to-date state of the mutable content.
+For that, the content creator that operates the below described _feed indexing method and its corresponding lookup_ has a lenient obligation to upload at an arbitrary time interval, while the consumers of the content can quickly and cheaply retrieve the most up-to-date state of the mutable content.
 
 ## Abstract

From 1d96dc06b36dc1bc14a15c085fe78cb7df7840d4 Mon Sep 17 00:00:00 2001
From: nugaon
Date: Wed, 30 Jun 2021 14:19:38 +0200
Subject: [PATCH 3/6] amend summary

---
 SWIPs/swip-streaming-of-feeds.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/SWIPs/swip-streaming-of-feeds.md b/SWIPs/swip-streaming-of-feeds.md
index dedce22..699223a 100644
--- a/SWIPs/swip-streaming-of-feeds.md
+++ b/SWIPs/swip-streaming-of-feeds.md
@@ -11,11 +11,17 @@ created: 2021-06-30
 
 ## Simple Summary
 
-Mutable content can be streamed periodically from a content creator, where the completeness of the stream is negligible, but getting the _closest state_ to an arbitrary time as fast as possible is the most important factor.
-For that, the content creator that operates the below described _feed indexing method and its corresponding lookup_ has a lenient obligation to upload at an arbitrary time interval, while the consumers of the content can quickly and cheaply retrieve the most up-to-date state of the mutable content.
+This _feed indexing method and its corresponding lookup_ achieve the retrieval of a feed's most recent state in `O(1)` queries.
+
+For that, the feed indices are anchored to their uploading times.
+
+This method is optimized for downloading a feed chunk queried by a point in time as quickly and cheaply as possible.
+
+It introduces new requirements for the uploader, but those are not strict and the lookup method corrects the inaccuracies of the stream.
 
 ## Abstract
 
+Mutable content can be streamed periodically from a content creator, where getting the _closest state_ to an arbitrary time as fast as possible is the most important factor.
 Achieve the fastest retrieval method for a feed stream, which is optimised to download the closest available segment of the feed at a given update time.
 It is intended to decrease the number of lookup processes on the network as much as possible.
 The main use-case is to get the last updated state of the content; for that, there is a finite set of feed indexes, so the lookup method can start from the last (possibly) updated feed index.
 The owner of the feed _promises_ to upload feed segments at every chosen time interval, but it is a weak requirement and it is possible to leave out indexes from the stream in exchange for slower retrieval speed of the stream.

From 3b7193768133fe36464b83cfd8981a2275531f3d Mon Sep 17 00:00:00 2001
From: nugaon
Date: Wed, 30 Jun 2021 15:06:44 +0200
Subject: [PATCH 4/6] rephrase abstract and motivation

---
 SWIPs/swip-streaming-of-feeds.md | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/SWIPs/swip-streaming-of-feeds.md b/SWIPs/swip-streaming-of-feeds.md
index 699223a..fb849e8 100644
--- a/SWIPs/swip-streaming-of-feeds.md
+++ b/SWIPs/swip-streaming-of-feeds.md
@@ -21,22 +21,28 @@ It introduces new requirements for the uploader, but those are not strict and th
 
 ## Abstract
 
-Mutable content can be streamed periodically from a content creator, where getting the _closest state_ to an arbitrary time as fast as possible is the most important factor.
-Achieve the fastest retrieval method for a feed stream, which is optimised to download the closest available segment of the feed at a given update time.
-It is intended to decrease the number of lookup processes on the network as much as possible.
-The main use-case is to get the last updated state of the content; for that, there is a finite set of feed indexes, so the lookup method can start from the last (possibly) updated feed index.
-The owner of the feed _promises_ to upload feed segments at every chosen time interval, but it is a weak requirement and it is possible to leave out indexes from the stream in exchange for slower retrieval speed of the stream.
+- Mutable content can be streamed periodically from a content creator, where getting the _closest state_ to an arbitrary time as fast as possible is the most important factor.
+- Define the fastest retrieval method for a feed stream, which is optimised to download the closest available segment of the feed at a given update time.
+- It is intended to decrease the number of lookup processes on the network as much as possible.
+- The main use-case is to get the most recent updated state of the content. For that, there is a finite set of feed indexes, so the lookup method can start from the last (possibly) updated feed index.
+- The owner of the feed _promises_ to upload feed segments at every chosen time interval. It is a weak requirement and it is possible to leave out indices from the stream in exchange for slower retrieval speed of the stream.
 
 ## Motivation
 
 The lookup time of feeds can be significantly long, since the lookup process only stops when there is no newer content under a feed.
+
-A solution is needed where this approach is reversed: a lookup that stops on a successful hit before a given upload time.
-This lookup time can already be shortened by not waiting for the last (non-existent) feed retrieval.
+A solution is needed where this approach is reversed: a lookup that stops on a successful hit.
+The lookup time can already be shortened by not waiting for the last (non-existent) feed retrieval.
+
 If the uploader keeps to the periodic feed indexing and uploads every time it is needed, the retrieval for the user is `O(1)`.
 Nevertheless, the lookups can easily go wrong because of (1) network issues or (2) the uploader not uploading the content in time.
 These problems are equivalent from the retrieval perspective, and the first one also affects epoch-based and sequential feeds.
+
 If any chunk of the lookup trajectory is not available (even temporarily), it could cause huge inaccuracy, but an approach like this one always gets at least as close to a desired upload time.
+
-Moreover, the current lookup methods keep chunks alive that may be unnecessary from the content usage side when the goal of the content (or dApp) is to provide the most up-to-date state.
+Moreover, the current lookup methods keep chunks alive that may be unnecessary from the content usage perspective.
+
+One use-case is when the feed stream of the content (or dApp) provides its most recent state.
 
 ## Specification

From 80d8d126835bfe65238f8dd6ffbf7fe5ec87c4be Mon Sep 17 00:00:00 2001
From: nugaon
Date: Tue, 6 Jul 2021 10:00:04 +0200
Subject: [PATCH 5/6] reindexing time period according to the time indexing

---
 SWIPs/swip-streaming-of-feeds.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/SWIPs/swip-streaming-of-feeds.md b/SWIPs/swip-streaming-of-feeds.md
index fb849e8..e1c0ee0 100644
--- a/SWIPs/swip-streaming-of-feeds.md
+++ b/SWIPs/swip-streaming-of-feeds.md
@@ -60,7 +60,7 @@
 Let's figure out the nearest last `feed index` for a given arbitrary time.
 We know the current time (`Tp`), which is surely greater than or equal to the last upload time (`Tn`) of the feed stream.
-We also know two metadata of the stream: initial time (`T0`) and update period (`Δ1`).
+We also know two metadata of the stream: initial time (`T0`) and update period (`Δ0`).
 We want to find the nearest last index to an arbitrary time (`Tx`) where `T0 <= Tx <= Tn <= Tp`.
 All of these are timestamps and their smallest unit is 1 second; note that even with this time unit it is still uncertain whether the latest update can be downloaded if `Tp = Tn`, because of the nature of P2P storage systems.
@@ -122,15 +122,15 @@ Compared to the epoch-based feeds, basically the length of the base segment is a
 The downside is that within the agreed update time period the content creator cannot update the state of the feed - although it is possible to overwrite the content of a feed by uploading it again with a different payload, on the download side it is still possible to get back the old one.
 This problem can be mitigated by changing the registry of the feed, from where the consumers get the initial metadata of the feed:
 - the initial timestamp (`T0`) should be changed to the point from which the time period changes (`T1`)
-- the time period for state uploads should be changed from the old one (`Δ1`) to the new one (`Δ2`)
+- the time period for state uploads should be changed from the old one (`Δ0`) to the new one (`Δ1`)
 
 If the registry is settled on a blockchain in a smart contract, then it can have a defined event for metadata changes of the content, to which the clients can listen.
 Thereby, the consumers of the feed can immediately react to the upload frequency change and poll according to the new rules.
-If blockchain listening on the client side is somehow not suitable, it is possible to put the `uploading time period` and `initial timestamp` metadata into all of the feed stream updates so that the consumers can sync to the stream after `MAX(Δ1,Δ2)` time if `MAX(Δ1,Δ2) % MIN(Δ1,Δ2) = 0` and there exist positive integers `k` and `m` where `T0 + (k * Δ1) = T0 + (m * Δ2) = T1`.
+If blockchain listening on the client side is somehow not suitable, it is possible to put the `uploading time period` and `initial timestamp` metadata into all of the feed stream updates so that the consumers can sync to the stream after `MAX(Δ0,Δ1)` time if `MAX(Δ0,Δ1) % MIN(Δ0,Δ1) = 0` and there exist positive integers `k` and `m` where `T0 + (k * Δ0) = T0 + (m * Δ1) = T1`.
 
 Though it was stated that this approach does not address downloading the whole feed stream, it is still possible:
 * In case of punctual updates without any upload time period change, the stream download is identical to the sequential/periodic feed stream download.
-* If the uploading time period has been changed, the feed index set construction should happen either by considering the aforementioned blockchain event emission or - if the change is contained only in the feed metadata - by starting a lookup backwards when the change is detected during the downloading process, if `Δ2 < Δ1`, until the change, but for a maximum of `z = Δ1 / Δ2` units.
+* If the uploading time period has been changed, the feed index set construction should happen either by considering the aforementioned blockchain event emission or - if the change is contained only in the feed metadata - by starting a lookup backwards when the change is detected during the downloading process, if `Δ1 < Δ0`, until the change, but for a maximum of `z = Δ0 / Δ1` units.
 
 ## Backwards Compatibility

From 9e2342e6f40c778bffa6fc19e183078b31901c93 Mon Sep 17 00:00:00 2001
From: nugaon
Date: Tue, 6 Jul 2021 10:03:39 +0200
Subject: [PATCH 6/6] change modulo percentage to 'mod' operator

---
 SWIPs/swip-streaming-of-feeds.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/SWIPs/swip-streaming-of-feeds.md b/SWIPs/swip-streaming-of-feeds.md
index e1c0ee0..f5ad081 100644
--- a/SWIPs/swip-streaming-of-feeds.md
+++ b/SWIPs/swip-streaming-of-feeds.md
@@ -126,7 +126,7 @@ This problem can be mitigated by changing the registry of the feed, from where t
 If the registry is settled on a blockchain in a smart contract, then it can have a defined event for metadata changes of the content, to which the clients can listen.
 Thereby, the consumers of the feed can immediately react to the upload frequency change and poll according to the new rules.
-If blockchain listening on the client side is somehow not suitable, it is possible to put the `uploading time period` and `initial timestamp` metadata into all of the feed stream updates so that the consumers can sync to the stream after `MAX(Δ0,Δ1)` time if `MAX(Δ0,Δ1) % MIN(Δ0,Δ1) = 0` and there exist positive integers `k` and `m` where `T0 + (k * Δ0) = T0 + (m * Δ1) = T1`.
+If blockchain listening on the client side is somehow not suitable, it is possible to put the `uploading time period` and `initial timestamp` metadata into all of the feed stream updates so that the consumers can sync to the stream after `MAX(Δ0,Δ1)` time if `MAX(Δ0,Δ1) mod MIN(Δ0,Δ1) = 0` and there exist positive integers `k` and `m` where `T0 + (k * Δ0) = T0 + (m * Δ1) = T1`.
 
 Though it was stated that this approach does not address downloading the whole feed stream, it is still possible:
 * In case of punctual updates without any upload time period change, the stream download is identical to the sequential/periodic feed stream download.