
WIP: new transaction index #32063


Draft
wants to merge 6 commits into
base: master

omerfirmak
Contributor

@omerfirmak omerfirmak commented Jun 19, 2025

This PR stores additional contextual data in each transaction index entry to speed up tx-hash-based RPC queries.

First commit only moves some code around, so the actual changes are in the second commit.

@rjl493456442
Member

I am personally against this change. Reason:

The transaction index is quite heavy. We've chosen to retain indexes for only the last 2,350,000 blocks, yet the index is still substantial in the current format: 40 bytes per entry (32-byte key and 8-byte value).

| Key-Value store       | Transaction index         | 13.39 GiB  |  388446872 |

With this change, the transaction index will grow significantly. The value size is approximately 110 bytes, which would add around 36.9 GiB for the past 2,350,000 blocks. EIP-4444 won't mitigate this issue. The situation is even worse for archive nodes, which store all transaction indexes.

type TxIndex struct {
	Type              uint8  // Transaction Type
	Nonce             uint64 // Transaction Nonce
	BlockNumber       uint64
	BlockHash         common.Hash
	BlockTime         uint64
	BaseFee           *big.Int
	TxIndex           uint32
	Sender            common.Address
	EffectiveGasPrice *big.Int
	GasUsed           uint64
	LogIndex          uint32
	To                *common.Address `rlp:"optional"`
}
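
The size estimate above can be sanity-checked with a few lines of arithmetic. This is only a sketch: it assumes the growth is the entry count from the inspect line multiplied by the difference between the new (~110-byte) and current (8-byte) value sizes, and the function name `extraBytes` is made up for illustration.

```go
package main

import "fmt"

const (
	entries       = 388_446_872 // item count from the inspect line above
	oldValueBytes = 8           // current value size quoted above
	newValueBytes = 110         // approximate new value size quoted above
	gib           = 1 << 30
)

// extraBytes returns the additional storage needed when the value of each of
// n entries grows from oldSize to newSize bytes. Hypothetical helper.
func extraBytes(n, oldSize, newSize uint64) uint64 {
	return n * (newSize - oldSize)
}

func main() {
	extra := extraBytes(entries, oldValueBytes, newValueBytes)
	fmt.Printf("extra index size: %.1f GiB\n", float64(extra)/gib)
}
```

Under these assumptions the result is roughly 36.9 GiB, which is in line with the figure quoted above. Real on-disk growth would also depend on encoding and database compression, which this sketch ignores.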

@omerfirmak
Contributor Author

Yeah, I'm not sure if this is worth it either. Let's discuss in the call today.

@omerfirmak
Contributor Author

omerfirmak commented Jun 20, 2025

So, here is the breakdown of total CPU time spent (with this PR merged) on parts of the "eth_getTransactionReceipt" RPC over ~400 queries.

'GetTransactionReceipt': {
    'GetTransaction':  {
        'ReadTxLookupEntry': 3.032301, // 53.4%
        'findTxInBlockBody': 0.233809, // 4%
        'ReadBodyRLP': 0.127024, // 2%
        'ReadCanonicalHash': 0.021332
    } = 3.483035,
    'GetReceiptByIndex': {
        'ReadRawReceipt': {
            'ReadReceiptsRLP': 1.207674, // 21.2%
            'Walked': 0.011179,
            'Parsed': 0.010929
        } = 1.289601,
        'DeriveFields': 0.016529,
        'GetHeader': 0.006045
    } = 1.336599,
    'HeaderByHash': 0.771045, // 13%
    'Marshal': 0.085203
} = 5.675882

Some of these costs cannot be eliminated. ReadReceiptsRLP and ReadTxLookupEntry are unavoidable, and together they make up almost 75% of the runtime, so the best we can do is a 25% reduction in runtime. A breakdown of where this reduction might come from:

// TxIndex is the collection of data stored per transaction to speed up
// hash-based lookups.
type TxIndex struct {
	// Allows us to avoid finding the tx in the block body (69 bytes, ~27 GB per year).
	// This gives us a ~7% speedup.
	Type              uint8  // Transaction type
	Nonce             uint64 // Transaction nonce
	To                *common.Address `rlp:"optional"`
	BlobGas           uint64          `rlp:"optional"`
	TxIndex           uint32
	Sender            common.Address
	EffectiveGasPrice *big.Int

	// Allows us to avoid reading the header (64 bytes, ~25 GB per year).
	// This gives us a ~13% speedup.
	BlockNumber       uint64
	BlockHash         common.Hash
	BlockTime         uint64
	BaseFee           *big.Int
	BlobGasPrice      *big.Int        `rlp:"optional"`

	// Allows us to avoid decoding all the receipts (12 bytes, ~5 GB per year).
	// Walking the receipts is cheap, so this gives pretty much nothing.
	GasUsed           uint64
	LogIndex          uint32
}
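
The ceiling on the achievable speedup follows directly from the profile. As a rough cross-check, the sketch below recomputes the combined share of the two reads that cannot be avoided. The timings are copied from the breakdown above; the helper name `share` and the constant names are made up for illustration.

```go
package main

import "fmt"

// share returns part as a percentage of total. Hypothetical helper.
func share(part, total float64) float64 {
	return 100 * part / total
}

func main() {
	// Timings copied from the profile above.
	const (
		total             = 5.675882 // GetTransactionReceipt
		readTxLookupEntry = 3.032301
		readReceiptsRLP   = 1.207674
	)

	unavoidable := share(readTxLookupEntry+readReceiptsRLP, total)
	fmt.Printf("unavoidable reads: %.1f%%\n", unavoidable)
	fmt.Printf("upper bound on savings: %.1f%%\n", 100-unavoidable)
}
```

Whatever remains after the two reads is the most any richer index entry could save, and only if every other step were removed entirely.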

I don't find any of these additions to be worth it. Assuming most eth_getTransactionReceipt requests are for recently included txs, we should look at better caching mechanisms that avoid disk reads at the tip, and let requests for old transactions pay the disk-read penalty.
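
One possible shape for such a cache, purely as an illustration: a small LRU keyed by tx hash that is filled when a block is imported, with misses falling through to the existing disk path and not being cached. Nothing here is from the PR or from go-ethereum. `Hash`, `Receipt`, `receiptCache` and `getReceipt` are hypothetical stand-ins, and the sketch is not safe for concurrent use.

```go
package main

import (
	"container/list"
	"fmt"
)

// Hash and Receipt are hypothetical stand-ins for the real hash and receipt types.
type Hash [32]byte

type Receipt struct {
	BlockNumber uint64
	GasUsed     uint64
}

type entry struct {
	key Hash
	val *Receipt
}

// receiptCache is a fixed-capacity LRU cache for receipts near the chain tip.
type receiptCache struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[Hash]*list.Element
}

func newReceiptCache(capacity int) *receiptCache {
	return &receiptCache{capacity: capacity, order: list.New(), items: make(map[Hash]*list.Element)}
}

// Add inserts or refreshes a receipt, evicting the least recently used entry
// once the capacity is exceeded. Intended to be called on block import.
func (c *receiptCache) Add(h Hash, r *Receipt) {
	if el, ok := c.items[h]; ok {
		el.Value.(*entry).val = r
		c.order.MoveToFront(el)
		return
	}
	c.items[h] = c.order.PushFront(&entry{key: h, val: r})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// Get returns a cached receipt and marks it as recently used.
func (c *receiptCache) Get(h Hash) (*Receipt, bool) {
	el, ok := c.items[h]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).val, true
}

// getReceipt serves recent transactions from memory. Misses (typically old
// transactions) pay the disk read and are deliberately not cached.
func getReceipt(c *receiptCache, h Hash, fromDisk func(Hash) (*Receipt, bool)) (*Receipt, bool) {
	if r, ok := c.Get(h); ok {
		return r, true
	}
	return fromDisk(h)
}

func main() {
	cache := newReceiptCache(2)
	recent, old := Hash{1}, Hash{2}
	cache.Add(recent, &Receipt{BlockNumber: 100, GasUsed: 21000})

	diskReads := 0
	disk := func(Hash) (*Receipt, bool) {
		diskReads++
		return &Receipt{BlockNumber: 1}, true
	}

	getReceipt(cache, recent, disk) // served from memory
	getReceipt(cache, old, disk)    // falls through to disk
	fmt.Println("disk reads:", diskReads)
}
```

Filling the cache at import time rather than on read keeps old-transaction lookups from evicting entries at the tip, which matches the trade-off described above. A real implementation would also need locking and reorg handling.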


3 participants