-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add block cache, remove sqlite3 #149
Conversation
Codecov Report
@@ Coverage Diff @@
## master #149 +/- ##
==========================================
- Coverage 34.46% 31.09% -3.38%
==========================================
Files 11 11
Lines 1532 1489 -43
==========================================
- Hits 528 463 -65
- Misses 935 968 +33
+ Partials 69 58 -11
Continue to review full report at Codecov.
|
Nitpick: You can give @adityapk00 formal credit for a co-authored commit (so it's shown in the GitHub UI and recognized by other automated things) by using Co-authored-by in the commit message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nearly done with this review, I just need to think a little bit more about reorgs, posting what I have now.
frontend/service.go
Outdated
@@ -29,93 +25,113 @@ var ( | |||
|
|||
// the service type | |||
type SqlStreamer struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having Sql
/ SQL
in things' names will be confusing since there's no longer an SQL db. I suggest renaming.
frontend/service.go
Outdated
"strconv" | ||
"strings" | ||
"time" | ||
|
||
"github.com/btcsuite/btcd/rpcclient" | ||
"github.com/golang/protobuf/proto" | ||
|
||
// blank import for sqlite driver support | ||
_ "github.com/mattn/go-sqlite3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this dependency?
frontend/service.go
Outdated
} | ||
return &walletrpc.RawTransaction{Data: txBytes}, nil | ||
return &walletrpc.RawTransaction{Data: txBytes, Height: uint64(txHeight)}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case txf.Block.Hash == nil
and txf.Hash == nil
, this will return an all-zero txBytes and zero txHeight. The comment in the old code says it used to return the genesis coinbase tx. We should either preserve that behavior or return an error (my preference to return an error unless something's actually relying on the old behavior).
return nil, errors.Wrap(err, fmt.Sprintf("getting tx (%x, %d)", blockHash, txIndex)) | ||
} | ||
return txBytes, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay for deleting code!
} | ||
} | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need new tests to replace these but this PR need not block on that, it's tracked here: #146
if m != nil { | ||
return m.Height | ||
} | ||
return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this lead to some kind of confusion where some other code thinks a nil
RawTransaction
was actually mined at height 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is machine-generated code, and there is no way to indicate an "invalid value" within type uint64
. If height had type int64
, then we could use -1 to mean invalid value. But that would likely be an invasive change. Let's see if it causes problems. I think height zero having special meaning is okay (no one will ever really want to do anything with the genesis block).
cmd/server/main.go
Outdated
flag.StringVar(&opts.tlsCertPath, "tls-cert", "./cert.pem", "the path to a TLS certificate") | ||
flag.StringVar(&opts.tlsKeyPath, "tls-key", "./cert.key", "the path to a TLS key file") | ||
flag.Uint64Var(&opts.logLevel, "log-level", uint64(logrus.InfoLevel), "log level (logrus 1-7)") | ||
flag.StringVar(&opts.logPath, "log-file", "./server.log", "log file to write to") | ||
flag.StringVar(&opts.zcashConfPath, "conf-file", "./zcash.conf", "conf file to pull RPC creds from") | ||
flag.BoolVar(&opts.veryInsecure, "very-insecure", false, "run without the required TLS certificate, only for debugging, DO NOT use in production") | ||
flag.BoolVar(&opts.noTLS, "no-tls", false, "Disable TLS, serve un-encrypted traffic.") // XXX redundant with -veryInsecure, remove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this new argument had the same effect as veryInsecure
had (my bad for the confusing name). We still want it to be really obvious that not using TLS is very insecure, so I suggest removing veryInsecure
and renaming no-tls
's option to no-tls-very-insecure
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, these have the same effect; what happened was that the same enhancement was made to the two forks (ours and Aditya's) independently. I intentionally allowed both options because I wanted our version of lightwalletd to be backward compatible with Aditya's version, at least initially.
But I like your suggestion, and having both is one of those things that's easy to forget about (since everything is working), and then will be much harder to fix later. I'll make your suggested change. I can coordinate with Aditya so he can make the required adjustment to his code when he switches over to using our lightwalletd.
cmd/server/main.go
Outdated
|
||
stopChan := make(chan bool, 1) | ||
|
||
// Start the block cache importer at latestblock - 100k(cache size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is out of date, the default value is 40,000. I suggest just: latestblock - cache size
.
common/cache.go
Outdated
|
||
// Don't allow out-of-order blocks. This is more of a sanity check than anything | ||
// If there is a reorg, then the ingestor needs to handle it. | ||
if c.m[height-1] != nil && !bytes.Equal(block.PrevHash, c.m[height-1].hash) { |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
10fccc4
to
65bd2c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if m != nil { | ||
return m.Height | ||
} | ||
return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is machine-generated code, and there is no way to indicate an "invalid value" within type uint64
. If height had type int64
, then we could use -1 to mean invalid value. But that would likely be an invasive change. Let's see if it causes problems. I think height zero having special meaning is okay (no one will ever really want to do anything with the genesis block).
cmd/server/main.go
Outdated
flag.StringVar(&opts.tlsCertPath, "tls-cert", "./cert.pem", "the path to a TLS certificate") | ||
flag.StringVar(&opts.tlsKeyPath, "tls-key", "./cert.key", "the path to a TLS key file") | ||
flag.Uint64Var(&opts.logLevel, "log-level", uint64(logrus.InfoLevel), "log level (logrus 1-7)") | ||
flag.StringVar(&opts.logPath, "log-file", "./server.log", "log file to write to") | ||
flag.StringVar(&opts.zcashConfPath, "conf-file", "./zcash.conf", "conf file to pull RPC creds from") | ||
flag.BoolVar(&opts.veryInsecure, "very-insecure", false, "run without the required TLS certificate, only for debugging, DO NOT use in production") | ||
flag.BoolVar(&opts.noTLS, "no-tls", false, "Disable TLS, serve un-encrypted traffic.") // XXX redundant with -veryInsecure, remove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, these have the same effect; what happened was that the same enhancement was made to the two forks (ours and Aditya's) independently. I intentionally allowed both options because I wanted our version of lightwalletd to be backward compatible with Aditya's version, at least initially.
But I like your suggestion, and having both is one of those things that's easy to forget about (since everything is working), and then will be much harder to fix later. I'll make your suggested change. I can coordinate with Aditya so he can make the required adjustment to his code when he switches over to using our lightwalletd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting one small change to prevent the cache from entering weird states. Also see the message I'm about to send on Slack.
README.md
Outdated
|
||
## To run SERVER | ||
|
||
Assuming you used `make` to build SERVER: | ||
|
||
``` | ||
./server --very-insecure=true --conf-file /home/zcash/.zcash/zcash.conf --db-path /db/sql.db --log-file /logs/server.log --bind-addr 127.0.0.1:18232 | ||
./server --very-insecure=true --conf-file /home/zcash/.zcash/zcash.conf --log-file /logs/server.log --bind-addr 127.0.0.1:18232 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be --no-tls-very-insecure
common/common.go
Outdated
case <-time.After(15 * time.Second): | ||
for { | ||
if reorgCount > 0 { | ||
height -= 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can set height
to a value less than startHeight
. If this happens, then the cache can contain invalid blocks, because the cache's Add
function only clears higher-height cached blocks if the block being added falls within the range of cached blocks (if height < startHeight
, then it won't fall in range, so the reorged-away blocks don't get cleared). We can do: height = Max(height - 10, startHeight);
here to prevent this.
delete(c.m, i) | ||
} | ||
c.LastBlock = height - 1 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should fail with an error unless either height >= c.FirstBlock && height <= c.LastBlock
or height = c.LastBlock + 1
. This will make sure the cache doesn't reach any of several weird states:
- Some
nil
blocks in the cache (currently could be reached by adding a block at height > c.LastBlock + 1). - Blocks from a mixture of reorg forks (currently could be reached by adding a block at height < c.FirstBlock; the higher-height blocks won't be deleted)
- Blocks where the PrevHash doesn't match the previous block's hash.
AFAICT enforcing this constraint won't break the ingestor code (except for the edge case about height < startHeight
I left a comment about there).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made many changes to this function, which together should solve all these problems (but some in a different way). The cache should be very resilient to anything the caller (block ingestor thread) can do -- not just what it does today, but anything it may do in the future (there should be no assumptions about the behavior of the caller). It should not be possible for any extra blocks to be present in the cache (a type of memory leak)
As a side-effect, the code is much simpler; there is much less dependency on these state variables (FirstBlock
and LastBlock
).
Reviewers, please give this a careful look. I will write a good test but I haven't done that yet.
Makefile
Outdated
# Start the lightwalletd ingester in the zcashdlwd container | ||
docker_img_run_lightwalletd_ingest: | ||
docker exec -i zcashdlwd ingest --conf-file /home/zcash/.zcash/zcash.conf --db-path /db/sql.db --log-file /logs/ingest.log | ||
|
||
# Start the lightwalletd server in the zcashdlwd container | ||
docker_img_run_lightwalletd_insecure_server: | ||
docker exec -i zcashdlwd server --very-insecure=true --conf-file /home/zcash/.zcash/zcash.conf --db-path /db/sql.db --log-file /logs/server.log --bind-addr 127.0.0.1:18232 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be changed to --no-tls-very-insecure
.
var txBytes []byte | ||
var err error | ||
var txHeight float64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be uint64
(it gets cast to uint64 right away anyway)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried changing that variable's type to uint64
(and int
), and made the corresponding change to where it's assigned, and the result was a type assertion:
$ go run cmd/server/main.go -cache-size 4 -no-tls-very-insecure -conf-file ~/.zcash/zcash.conf
panic: interface conversion: interface {} is float64, not uint64
(...)
exit status 2
$
Then I found that this code is correct; this value comes from unmarshalling a JSON string, and as stated here: https://golang.org/pkg/encoding/json/#Unmarshal
float64, for JSON numbers
So, in JSON, all numbers are (or can be) floating point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, makes sense, thanks!
3f05749
to
da4a963
Compare
# Use lightwallet server and ingest binaries from prior layer | ||
COPY --from=lightwalletd_base /usr/bin/ingest /usr/bin/server /usr/bin/ | ||
# Use lightwallet server binary from prior layer | ||
COPY --from=lightwalletd_base /usr/bin/server /usr/bin/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't test these changes to Dockerfile
@@ -75,7 +75,7 @@ docker_img_stop_zcashd: | |||
|
|||
# Start the lightwalletd server in the zcashdlwd container | |||
docker_img_run_lightwalletd_insecure_server: | |||
docker exec -i zcashdlwd server --very-insecure=true --conf-file /home/zcash/.zcash/zcash.conf --db-path /db/sql.db --log-file /logs/server.log --bind-addr 127.0.0.1:18232 | |||
docker exec -i zcashdlwd server --no-tls-very-insecure=true --conf-file /home/zcash/.zcash/zcash.conf --log-file /logs/server.log --bind-addr 127.0.0.1:18232 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't test this yet
#rm -f $(PROJECT_NAME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(for some reason this was missing a newline)
} | ||
|
||
// (this first RPC also verifies that we can communicate with zcashd) | ||
saplingHeight, blockHeight, chainName, branchID := common.GetSaplingInfo(rpcClient, log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can no longer fail (if this call fails, the app exits)
@@ -244,8 +237,7 @@ func main() { | |||
log.WithFields(logrus.Fields{ | |||
"signal": s.String(), | |||
}).Info("caught signal, stopping gRPC server") | |||
// Stop the block ingestor | |||
stopChan <- true | |||
os.Exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need for clean shutdown (because there is no longer any persistent state, no db).
@@ -48,7 +53,7 @@ func GetSaplingInfo(rpcClient *rpcclient.Client) (int, int, string, string, erro | |||
consensus := f.(map[string]interface{})["consensus"] | |||
branchID := consensus.(map[string]interface{})["nextblock"].(string) | |||
|
|||
return int(saplingHeight), int(blockHeight), chainName, branchID, nil | |||
return int(saplingHeight), int(blockHeight), chainName, branchID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error return has been removed (function will exit the entire app if it fails, see call to log.Fatal
above).
common/common.go
Outdated
@@ -175,18 +161,15 @@ func GetBlock(rpcClient *rpcclient.Client, cache *BlockCache, height int) (*wall | |||
return block, nil | |||
} | |||
|
|||
// If a block was not found, make sure user is requesting a historical block | |||
if height > cache.GetLatestBlock() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This situation is now handled in a more robust way. One interesting difference is that if a GetBlock
api request arrives while lightwalletd is syncing (retrieving and caching the most recent 40k or however many blocks), for a block height that's more recent than the current height being synced, the old behavior is to return an error here. The lightwalletd client must try again later.
The new code, upon not finding the block in the cache, will just go ahead and call getBlockFromRPC()
, bypassing the cache, and return the correct results immediately. So, with this change, it's safe to start using lightwalletd immediately after it starts, reducing the effective sync time to zero.
|
||
// TODO these are called Error but they aren't at the moment. | ||
// A success will return code 0 and message txhash. | ||
return &walletrpc.LightdInfo{ | ||
Version: "0.2.0", | ||
Version: "0.2.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this right, no functionality changes, just hardness and bug fixes.
} | ||
} | ||
// We're up to date in our polling; wait for a new block | ||
time.Sleep(10 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This entire function should better handle zcashd
disappearing or not responding for any reason.
result, rpcErr := rpcClient.RawRequest("getblockchaininfo", make([]json.RawMessage, 0)) | ||
if rpcErr == nil { | ||
if retryCount > 0 { | ||
log.Warn("getblockchaininfo RPC successful") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that if we weren't successful on our first attempt (retryCount > 0
), that means we logged some errors... so now that we're finally successful, let's indicate that in the log file (otherwise it's unclear that we're okay now). We could log this every time (not just if retryCount > 0
), but it's really only confusing not to if we've had earlier errors.
da4a963
to
e1e9873
Compare
cmd/server/main.go
Outdated
flag.BoolVar(&opts.wantVersion, "version", false, "version (major.minor.patch)") | ||
flag.IntVar(&opts.cacheSize, "cache-size", 40000, "number of blocks to hold in the cache") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should at least double this default to 80,000 with blossom
common/common.go
Outdated
|
||
// Check for reorgs once we have inital block hash from startup | ||
if reorg { | ||
height -= 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be just -1 or -2. Re-reading 10 blocks for each re-org seems excessive.
} | ||
func (s *LwdStreamer) GetAddressTxids(addressBlockFilter *walletrpc.TransparentAddressBlockFilter, resp walletrpc.CompactTxStreamer_GetAddressTxidsServer) error { | ||
params := make([]json.RawMessage, 1) | ||
st := "{\"addresses\": [\"" + addressBlockFilter.Address + "\"]," + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@defuse Identified a security problem here where the address field could contain multiple addresses, and cause a DoS attack.
See his fix here: https://github.com/adityapk00/lightwalletd/pull/4/files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the comment I just wrote on that PR: adityapk00#4 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I just cherry-picked this commit: adityapk00@cc30455 (but I didn't include the later correction to replace the ^
and $
from the pattern because I don't think that's necessary).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment on that PR (adityapk00#4 (comment)) saying \A
and \z
make it more obvious nothing's wrong, but this PR doesn't have to block on that.
e1e9873
to
1a9b3b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One bug to fix in the cache Add
function, other comments are non-blocking.
go.mod
Outdated
@@ -3,20 +3,17 @@ module github.com/zcash-hackworks/lightwalletd | |||
go 1.12 | |||
|
|||
require ( | |||
github.com/adityapk00/lightwalletd v0.0.0-20191204220227-18e8f8490371 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dependency on aditya's fork came back
height -= 10 | ||
reorgCount++ | ||
if reorgCount > 10 { | ||
log.Fatal("Reorg exceeded max of 100 blocks! Help!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously the function returned in this case. Should it still do that now? Edit: oh wait, log.Fatal
will exit the whole program right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the program will exit.
// must have occurred; these must be re-added | ||
for i := height; i <= c.LastBlock; i++ { | ||
delete(c.m, i) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I'm convinced this new approach ensures there's never any reorged-away blocks in the cache. A couple issues remain:
If there's a reorg detected or the JSON unmarshalling fails below, then the function will return after this loop has deleted from height
up to c.LastBlock
, but c.LastBlock
won't have been updated (leaving nil blocks in the cache). The same is also true of c.FirstBlock
, calling this with a really low height will empty the cache but c.FirstBlock
won't be updated if the function exits through those returns. We should make sure the cache is in a valid state immediately after this loop, which probably means something like this:
if height <= c.FirstBlock {
// the entire cache was just emptied
c.FirstBlock = -1;
c.LastBlock = -1;
// maybe assert that c.m is really empty here just in case
} else {
// height > c.FirstBlock, so we know there's still a block at height - 1 in the cache
c.LastBlock = height - 1;
}
(Please double-check this is correct since I haven't fully thought it through)
It's also still possible to add blocks at height
> c.LastBlock
+ 1
to get the cache into a state containing nil
blocks (I think this case just needs to be explicitly checked for).
17aca79
to
66e22da
Compare
I added a reorg unit test, and addressed the review comments |
@LarryRuane I checked out that branch and I'm having issues running make
|
I pulled latest changes and was able to run make. I checked that upon connecting to a yet not initialized zcashd, lwd retries and finally starts syncing.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK, awesome reorg test!
Looks good to me! |
Co-authored-by: Aditya Kulkarni <[email protected]>
a9a9064
to
758cdec
Compare
Thank you for the reviews! I'll squash; to make sure I don't make a mistake and change anything, here's tree hash before:
(Tree hash depends only on the code, not the commit comments plus code, as commit hash does.) It's the same after squashing:
It's also the same after pushing to github:
|
This PR is based on #148 (so until that PR gets merged, this PR includes its commits), so only review the most recent commit.