
Tracing: Introduction

Martin Holst Swende edited this page Jun 1, 2018 · 21 revisions

There are two different types of transactions in Ethereum: plain value transfers and contract executions. A plain value transfer just moves Ether from one account to another and as such is uninteresting from this guide's perspective. If, however, the recipient of a transaction is a contract account with associated EVM (Ethereum Virtual Machine) bytecode, then besides transferring any Ether, the code will also be executed as part of the transaction.

Having code associated with Ethereum accounts permits transactions to do arbitrarily complex data storage and enables them to act on previously stored data by further transacting internally with outside accounts and contracts. This creates an intertwined ecosystem of contracts, where a single transaction can interact with tens or hundreds of accounts.

The downside of contract execution is that it is very hard to say what a transaction actually did. A transaction receipt does contain a status code to check whether execution succeeded or not, but there's no way to see what data was modified, nor which external contracts were invoked. In order to introspect a transaction, we need to trace its execution.

Tracing prerequisites

In its simplest form, tracing a transaction entails requesting the Ethereum node to reexecute the desired transaction with varying degrees of data collection and have it return the aggregated summary for post-processing. Reexecuting a transaction, however, has a few prerequisites.

In order for an Ethereum node to reexecute a transaction, it needs to have available all historical state accessed by the transaction:

  • Balance, nonce, bytecode and storage of both the recipient and all internally invoked contracts.
  • Block metadata referenced during execution of both the outer transaction and all internally created transactions.
  • Intermediate state generated by all preceding transactions contained in the same block as the one being traced.

Depending on your node's mode of synchronization and pruning, different configurations result in different capabilities:

  • An archive node retaining all historical data can trace arbitrary transactions at any point in time. Tracing a single transaction also entails reexecuting all preceding transactions in the same block.
  • A fast-synced node retaining all historical data after initial sync can only trace transactions from blocks following the initial sync point. Tracing a single transaction also entails reexecuting all preceding transactions in the same block.
  • A fast-synced node retaining only periodic state data after initial sync can only trace transactions from blocks following the initial sync point. Tracing a single transaction entails reexecuting all preceding transactions both in the same block and in all preceding blocks back to the previously stored snapshot.
  • A light-synced node retrieving data on demand can in theory trace transactions for which all required historical state is readily available in the network. In practice, such data availability cannot be assumed.

There are exceptions to the above rules when running batch traces of entire blocks or chain segments. Those will be detailed later.

Basic traces

The simplest type of transaction trace that go-ethereum can generate is a raw EVM opcode trace. For every VM instruction the transaction executes, a structured log entry is emitted, containing all contextual metadata deemed useful. This includes the program counter, opcode name, opcode cost, remaining gas, execution depth and any error that occurred. The structured logs can optionally also contain the content of the execution stack, execution memory and contract storage.

An example log entry for a single opcode looks like:

{
	"pc":      48,
	"op":      "DIV",
	"gasCost": 5,
	"gas":     64532,
	"depth":   1,
	"error":   null,
	"stack": [
		"00000000000000000000000000000000000000000000000000000000ffffffff",
		"0000000100000000000000000000000000000000000000000000000000000000",
		"2df07fbaabbe40e3244445af30759352e348ec8bebd4dd75467a9f29ec55d98d"
	],
	"memory": [
		"0000000000000000000000000000000000000000000000000000000000000000",
		"0000000000000000000000000000000000000000000000000000000000000000",
		"0000000000000000000000000000000000000000000000000000000000000060"
	],
	"storage": {
	}
}
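To make the stack field more concrete, the JavaScript sketch below decodes the entry above and recomputes the value DIV pushes. It assumes (this page does not say so) that stack lists items bottom-to-top, so the last element is the top of the EVM stack. decodeWord and simulateDiv are hypothetical helpers, and a Node.js version with BigInt support is assumed.

```javascript
// Hypothetical helpers for reading a single struct log entry.
// Assumption: "stack" is listed bottom-to-top (last element = top).

// Convert a 32-byte hex word (no 0x prefix) into a BigInt.
function decodeWord(hexWord) {
  return BigInt("0x" + hexWord);
}

// Recompute DIV for an entry: it pops the top item (numerator) and the
// next item (denominator). The EVM defines division by zero as 0.
function simulateDiv(entry) {
  const stack = entry.stack;
  const numerator = decodeWord(stack[stack.length - 1]);
  const denominator = decodeWord(stack[stack.length - 2]);
  return denominator === 0n ? 0n : numerator / denominator;
}

// The "stack" field of the example entry above, copied verbatim.
const entry = {
  op: "DIV",
  stack: [
    "00000000000000000000000000000000000000000000000000000000ffffffff",
    "0000000100000000000000000000000000000000000000000000000000000000",
    "2df07fbaabbe40e3244445af30759352e348ec8bebd4dd75467a9f29ec55d98d"
  ]
};

console.log("0x" + simulateDiv(entry).toString(16));
```

Under that ordering assumption the denominator is 2^224, so the division keeps only the first four bytes of the top word (0x2df07fba). This resembles the common pattern of extracting a function selector from call data, with the 0xffffffff word below it likely used by a following AND. That reading is an interpretation, not something the trace itself states.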

The entire output of a raw EVM opcode trace is a JSON object with a few metadata fields (consumed gas, failure status, return value) and a list of opcode entries that take the above form:

{
	"gas":         25523,
	"failed":      false,
	"returnValue": "",
	"structLogs":  []
}
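Traces of this shape usually need some post-processing. As a sketch, the JavaScript below groups structLogs by opcode, counting occurrences and summing gasCost. The trace object is hand-written for illustration and summarizeTrace is a hypothetical helper. The summed gasCost values should not be expected to equal the top-level gas field, so treat them as a rough breakdown only.

```javascript
// Hypothetical helper: summarise a debug_traceTransaction result by opcode.
// Assumes each structLogs entry carries at least "op" and "gasCost".
function summarizeTrace(trace) {
  const byOp = {};
  for (const log of trace.structLogs) {
    // Create the per-opcode bucket on first sight.
    const stats = byOp[log.op] || (byOp[log.op] = { count: 0, gasCost: 0 });
    stats.count += 1;
    stats.gasCost += log.gasCost;
  }
  return {
    failed: trace.failed,
    steps: trace.structLogs.length,
    byOp: byOp
  };
}

// Hand-written trace, for illustration only (not real node output).
const trace = {
  gas: 25523,
  failed: false,
  returnValue: "",
  structLogs: [
    { pc: 0, op: "PUSH1", gasCost: 3, gas: 64580, depth: 1, error: null },
    { pc: 2, op: "PUSH1", gasCost: 3, gas: 64577, depth: 1, error: null },
    { pc: 4, op: "MSTORE", gasCost: 12, gas: 64574, depth: 1, error: null }
  ]
};

const summary = summarizeTrace(trace);
console.log(summary.steps, summary.byOp.PUSH1.count, summary.byOp.PUSH1.gasCost);
```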

Generating basic traces

To generate a raw EVM opcode trace, go-ethereum provides a few RPC API endpoints, of which the most commonly used is debug_traceTransaction.

In its simplest form, traceTransaction accepts a transaction hash as its sole argument, traces the transaction, aggregates all the generated data and returns it as a large JSON object. A sample invocation from the Geth console would be:

debug.traceTransaction("0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f")

The same call can also be invoked from outside the node via HTTP RPC. In this case, make sure the HTTP endpoint is enabled via --rpc and the debug API namespace is exposed via --rpcapi=debug.

$ curl -H "Content-Type: application/json" -d '{"jsonrpc": "2.0", "id": 1, "method": "debug_traceTransaction", "params": ["0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f"]}' localhost:8545

Running the above operation on the Rinkeby network (with a node retaining enough history) will result in this trace dump.

Tuning basic traces

By default the raw opcode tracer emits all relevant events that occur within the EVM while processing a transaction, such as EVM stack, EVM memory and updated storage slots. Certain use cases however may not need some of these data fields reported. To cater for those use cases, these massive fields may be omitted using a second options parameter for the tracer:

{
	"disableStack": true,
	"disableMemory": true,
	"disableStorage": true
}

Running the previous tracer invocation from the Geth console with the data fields disabled:

debug.traceTransaction("0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f", {disableStack: true, disableMemory: true, disableStorage: true})

Analogously, the filtered tracer can be run from outside the node via HTTP RPC:

$ curl -H "Content-Type: application/json" -d '{"jsonrpc": "2.0", "id": 1, "method": "debug_traceTransaction", "params": ["0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f", {"disableStack": true, "disableMemory": true, "disableStorage": true}]}' localhost:8545

Running the above operation on the Rinkeby network will result in this significantly shorter trace dump.
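When calling the endpoint from a script rather than curl, the request body can be built programmatically. The sketch below produces the same debug_traceTransaction payload, with or without the tracer options. buildTraceRequest and traceTransaction are hypothetical helpers. The "jsonrpc": "2.0" field comes from the JSON-RPC 2.0 specification, and the network call assumes Node.js 18+ (for the global fetch) and a node with the debug namespace exposed over HTTP.

```javascript
// Hypothetical helper: build a debug_traceTransaction JSON-RPC body.
// "options" is optional, e.g. { disableStack: true, ... }.
function buildTraceRequest(txHash, options, id = 1) {
  const params = options ? [txHash, options] : [txHash];
  return JSON.stringify({
    jsonrpc: "2.0",
    id: id,
    method: "debug_traceTransaction",
    params: params
  });
}

// Hypothetical helper: POST the request to a node and return "result".
// Assumes Node.js 18+ (global fetch); not invoked in this sketch.
async function traceTransaction(url, txHash, options) {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildTraceRequest(txHash, options)
  });
  const reply = await res.json();
  if (reply.error) {
    throw new Error(reply.error.message);
  }
  return reply.result;
}

const body = buildTraceRequest(
  "0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f",
  { disableStack: true, disableMemory: true, disableStorage: true }
);
console.log(body);
```

Against a local node, one might then call traceTransaction("http://localhost:8545", hash, options) and inspect result.structLogs.length.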

Limits of basic traces

Although the raw opcode traces we've generated above have their uses, this basic way of tracing is problematic in the real world. Having an individual log entry for every single opcode is too low-level for most use cases, and requires developers to create additional tools to post-process the traces. Additionally, a full opcode trace can easily run into the hundreds of megabytes, making it very resource-intensive to get out of the node and process externally.
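As an example of the extra tooling this requires, the JavaScript sketch below scans raw struct logs for the four message-call opcodes and reports each callee address. It assumes stack capture was left enabled and that each stack array is listed bottom-to-top (last element = top). extractCalls and the sample entries are hypothetical, and the sample CALL stack shows only its top few items.

```javascript
// Hypothetical post-processing step: list the message calls a
// transaction made, using only the raw struct logs.
// Assumptions: stack capture enabled; "stack" listed bottom-to-top.
const CALL_OPS = new Set(["CALL", "CALLCODE", "DELEGATECALL", "STATICCALL"]);

function extractCalls(structLogs) {
  const calls = [];
  for (const log of structLogs) {
    if (!CALL_OPS.has(log.op)) continue;
    // For all four opcodes the top item is the gas to forward and the
    // second item from the top is the callee address.
    const word = log.stack[log.stack.length - 2];
    calls.push({
      pc: log.pc,
      depth: log.depth,
      op: log.op,
      to: "0x" + word.slice(-40) // an address is the low 20 bytes
    });
  }
  return calls;
}

// Hypothetical entries (only the top items of the CALL stack are shown).
const logs = [
  { pc: 10, op: "PUSH1", depth: 1, stack: [] },
  {
    pc: 57,
    op: "CALL",
    depth: 1,
    stack: [
      "0".repeat(64),                    // value
      "0".repeat(24) + "11".repeat(20),  // callee address
      "0".repeat(60) + "2710"            // gas (top of stack)
    ]
  }
];

console.log(extractCalls(logs));
```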

To avoid all of the previously mentioned issues, go-ethereum supports running custom JavaScript tracers within the Ethereum node, which have full access to the EVM stack, memory and contract storage. This permits developers to gather only the data they need and to do any processing at the source. Please see the next section for our custom in-node tracers.
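For a taste of what an in-node tracer looks like, here is a minimal sketch that only counts opcodes, so just the counts leave the node. It assumes the JavaScript tracer interface covered in the custom tracer documentation: an object with step, fault and result functions, where log.op.toString() yields the opcode name, passed to geth as a string via the tracer option. opcodeCounter and mockLog are hypothetical names, and the mock objects only exercise the logic outside geth.

```javascript
// Tracer source, written in ES5 and kept as a string because geth
// expects the tracer option as a string (assumed interface:
// step/fault/result, log.op.toString() returns the opcode name).
const opcodeCounter = `{
  counts: {},
  step: function(log, db) {
    var op = log.op.toString();
    this.counts[op] = (this.counts[op] || 0) + 1;
  },
  fault: function(log, db) {},
  result: function(ctx, db) { return this.counts; }
}`;

// In the Geth console the call would look something like:
//   debug.traceTransaction(txHash, {tracer: opcodeCounter})

// Local dry run with hypothetical mock logs, outside geth.
const tracer = new Function("return " + opcodeCounter)();
function mockLog(op) {
  return { op: { toString: function () { return op; } } };
}
["PUSH1", "PUSH1", "MSTORE"].forEach(function (op) {
  tracer.step(mockLog(op), null);
});
console.log(tracer.result(null, null));
```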

Pruning

Geth by default does in-memory pruning of state, discarding state entries that it deems no longer necessary to maintain. This is configured via the --gcmode option. As a result, people often run into errors saying that the required state is not available.

Say you want to trace a transaction in block B. There are a few possible cases:

  1. You have done a fast sync with pivot block P, where P <= B.
  2. You have done a fast sync with pivot block P, where P > B.
  3. You have done a full sync, with pruning.
  4. You have done a full sync, without pruning (--gcmode=archive).

Here's what happens in each respective case:

  1. Geth will regenerate the desired state by replaying blocks from the closest point in time before B where it has full state. By default it goes back at most 128 blocks, but you can allow more by adding a reexec option to the tracer options in the call, e.g. "reexec": 1000.
  2. Sorry, this can't be done without replaying the chain from genesis.
  3. Same as case 1.
  4. Does not need to replay anything, can immediately load up the state and serve the request.
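The four cases can be summarised as a small decision function. This is a simplified model, not how geth decides internally: traceStrategy and the node fields (archive, pivot, stateBlock) are hypothetical, and only reexec with its default of 128 comes from the text above.

```javascript
// Hypothetical model of the four cases above (not geth's actual logic).
//   node.archive     true when running with --gcmode=archive (case 4)
//   node.pivot       fast-sync pivot block, or null after a full sync
//   node.stateBlock  nearest block at or before B with full state
// "reexec" mirrors the tracer option of the same name (default 128).
function traceStrategy(node, block, reexec = 128) {
  if (node.archive) {
    return { possible: true, replay: 0 }; // case 4: state already present
  }
  if (node.pivot !== null && node.pivot > block) {
    // case 2: no state was ever stored before the pivot block
    return { possible: false, reason: "requires replay from genesis" };
  }
  // Cases 1 and 3: regenerate state by replaying from the nearest
  // block that still has full state, if it lies within reexec blocks.
  const replay = block - node.stateBlock;
  if (replay > reexec) {
    return { possible: false, reason: "raise reexec to at least " + replay };
  }
  return { possible: true, replay: replay };
}

console.log(traceStrategy({ archive: false, pivot: 100, stateBlock: 150 }, 200));
console.log(traceStrategy({ archive: false, pivot: 100, stateBlock: 0 }, 50));
```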

There is one other option available to you, which may or may not suit your needs. That is to use Evmlab.

docker pull holiman/evmlab && docker run -it holiman/evmlab

There you can use the reproducer. The reproducer incrementally fetches data from Infura until it has all the information required to create the trace locally on an EVM bundled with the image. It creates a custom genesis containing the state that the transaction touches (balances, code, nonce etc.). Note that the evmlab reproducer is not strictly guaranteed to be exact with regard to the gas costs incurred by the outer transaction, as evmlab does not fully calculate the gas costs for nonzero data etc., but it is usually sufficient to analyze contracts and events.
