-
Notifications
You must be signed in to change notification settings - Fork 21
chain validation and fix command #198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
d0eb75b
to
7ea3dda
Compare
7ea3dda
to
bebe17e
Compare
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
8769ad1
to
0ffae31
Compare
WalkthroughThis update introduces a comprehensive blockchain data validation and repair framework. It adds new CLI commands for validating and fixing blockchain data, implements detection and removal of duplicate records, and provides mechanisms for identifying and correcting missing or invalid blocks. The changes span command definitions, storage interfaces, validation logic, and supporting utilities, with significant additions to the validation and orchestrator modules. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI
participant RPCClient
participant Storage
participant ClickHouse
participant Validator
participant Cursor
User->>CLI: Run validateAndFix command
CLI->>RPCClient: Initialize with URL
CLI->>Storage: Initialize connector
CLI->>ClickHouse: Open DB connection
CLI->>Validator: NewValidator(RPCClient, Storage)
CLI->>Cursor: InitCursor(chainId, Storage)
loop For each batch in block range
CLI->>Validator: validateAndFixRange(start, end, fixBatchSize)
Validator->>ClickHouse: FindAndRemoveDuplicates(chainId, start, end)
Validator->>Validator: FindAndFixGaps(start, end)
Validator->>Validator: ValidateBlockRange(start, end)
alt Invalid blocks found
Validator->>Validator: FixBlocks(invalidBlocks, fixBatchSize)
end
CLI->>Cursor: Update progress
end
sequenceDiagram
participant User
participant CLI
participant RPCClient
participant Storage
participant Validator
User->>CLI: Run validate command
CLI->>RPCClient: Initialize with URL
CLI->>Storage: Initialize connector
CLI->>Validator: NewValidator(RPCClient, Storage)
CLI->>Validator: ValidateBlockRange(start, end)
Validator->>Storage: GetValidationBlockData(chainId, start, end)
Validator->>Validator: ValidateBlocks(blocks)
Validator->>CLI: Output validation results
Estimated code review effort4 (~90 minutes) ✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🔭 Outside diff range comments (1)
internal/common/trace.go (1)
60-78
:⚠️ Potential issue
BlockTimestamp
is never serialized
TraceModel
includes aBlockTimestamp
field (uint64) butSerialize()
omits it. Downstream consumers will receive zero values.- BlockHash: t.BlockHash, + BlockHash: t.BlockHash, + BlockTimestamp: uint64(t.BlockTimestamp.Unix()),Please verify the desired unit (seconds vs milliseconds) and update tests accordingly.
♻️ Duplicate comments (2)
internal/validation/cursor.go (1)
37-50
: Consider making the cursor file location configurable.The current implementation writes cursor files to the working directory, which might be problematic in containerized environments with restricted filesystem access.
+// CursorFilePath is the directory where cursor files are stored +var CursorFilePath string = "." + func (c *Cursor) Update(blockNumber *big.Int) error { - cursorFile := fmt.Sprintf("validation_cursor_%s.json", c.ChainId.String()) + cursorFile := fmt.Sprintf("%s/validation_cursor_%s.json", CursorFilePath, c.ChainId.String())The same change would need to be applied to the
getLastScannedBlock
function.internal/validation/validation.go (1)
163-166
:isContractCreation
misses the “empty string” caseA contract-creation tx coming from
go-ethereum
hasTo == nil
, which your DB
may serialise as an empty string. The current check only treats the all-zero
address as creation:isContractCreation := tx.ToAddress == "0x0000…0000" && tx.Data != "0x"Consider:
isContractCreation := (tx.ToAddress == "" || - tx.ToAddress == "0x0000000000000000000000000000000000000000") && tx.Data != "0x" + tx.ToAddress == "0x0000000000000000000000000000000000000000") && + len(tx.Data) > 2 // guards against meaningless “0x”
🧹 Nitpick comments (5)
cmd/validate.go (2)
78-86
: Harden TLS configurationThe explicit
tls.Config{}
inherits Go’s default (TLS 1.2 for clients). For a CLI tool that only talks to a modern ClickHouse server you can safely raise the floor:- TLS: &tls.Config{}, + TLS: &tls.Config{ + MinVersion: tls.VersionTLS13, + },Not critical but improves security posture.
50-52
: Off-by-one adjustment is opaque
batchSize = batchSize - 1
relies on readers knowing the inclusive semantics later. Consider renaming the variable or adding a short comment to avoid future errors.// range end is inclusive, adjust once so we fetch exactly N blocks batchSizeInclusive := new(big.Int).Sub(batchSize, big.NewInt(1))internal/validation/validation.go (1)
46-54
: *Misleading debug message when printing big.Int values
log.Debug().Msgf("Processing batch of blocks %d to %d", …)
prints the decimal
representation of the pointers, not the block numbers, because%d
formats
a numeric value, not an object implementingStringer
.
Prefer%s
(or%v
) to leverage(*big.Int).String()
:- log.Debug().Msgf("Processing batch of blocks %d to %d", invalidBlocks[i], invalidBlocks[end-1]) + log.Debug().Msgf("Processing batch of blocks %s to %s", + invalidBlocks[i].String(), invalidBlocks[end-1].String())internal/validation/db.go (1)
120-124
: Potential goroutine leak if one query errors earlyEach goroutine writes exactly once to its channel, but if an early
return
occurs before the send, the receiver blocks forever. Although you currently
return
only after sending, future edits could break the guarantee.A safer pattern is to close the channel once the value is sent or use a
sync.WaitGroup
.
Example with a WaitGroup:var ( blocksResult blockResult … ) wg.Add(3) go func() { defer wg.Done(); blocksResult.blocks, blocksResult.err = … }() … wg.Wait()internal/validation/duplicates.go (1)
161-199
:removeDuplicateTransactions
may exceed ClickHouse parameter limitsThe batched deletion is great, but each loop aggregates all hashes in one
IN
clause. If a batch nears 1000 items, ClickHouse can hit the default
max_query_size
. Consider:
- Lowering
batchSize
(e.g., 100).- Or chunking
hashes
too: execute the same query multiple times with
smaller slices.This keeps memory and network utilisation predictable.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum
is excluded by!**/*.sum
📒 Files selected for processing (11)
cmd/root.go
(1 hunks)cmd/validate.go
(1 hunks)go.mod
(5 hunks)internal/common/trace.go
(3 hunks)internal/rpc/serializer.go
(2 hunks)internal/storage/clickhouse.go
(4 hunks)internal/validation/cursor.go
(1 hunks)internal/validation/db.go
(1 hunks)internal/validation/duplicates.go
(1 hunks)internal/validation/gaps.go
(1 hunks)internal/validation/validation.go
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
internal/validation/cursor.go (1)
internal/storage/connector.go (1)
IStorage
(67-71)
🪛 ast-grep (0.38.1)
cmd/validate.go
[warning] 71-71: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
🪛 golangci-lint (1.64.8)
cmd/validate.go
15-15: could not import github.com/thirdweb-dev/indexer/internal/validation (-: # github.com/thirdweb-dev/indexer/internal/validation
internal/validation/gaps.go:27:14: not enough arguments in call to poller.Poll
have ([]*big.Int)
want (context.Context, []*big.Int)
internal/validation/validation.go:56:15: not enough arguments in call to poller.Poll
have ([]*big.Int)
want (context.Context, []*big.Int))
(typecheck)
internal/validation/cursor.go
1-1: : # github.com/thirdweb-dev/indexer/internal/validation
internal/validation/gaps.go:27:14: not enough arguments in call to poller.Poll
have ([]*big.Int)
want (context.Context, []*big.Int)
internal/validation/validation.go:56:15: not enough arguments in call to poller.Poll
have ([]*big.Int)
want (context.Context, []*big.Int)
(typecheck)
🪛 GitHub Check: Test
internal/validation/validation.go
[failure] 56-56:
not enough arguments in call to poller.Poll
internal/validation/gaps.go
[failure] 27-27:
not enough arguments in call to poller.Poll
🪛 GitHub Actions: Go Unit Tests
internal/validation/gaps.go
[error] 27-27: not enough arguments in call to poller.Poll
🔇 Additional comments (12)
internal/validation/cursor.go (3)
12-16
: Well-structured cursor type for tracking validation progress.The
Cursor
struct provides a good foundation for tracking blockchain validation progress with appropriate fields for maintaining state across validation runs.
18-35
: LGTM - Good error handling and initialization logic.The
InitCursor
function properly initializes a cursor by retrieving the last scanned block and ensuring it doesn't exceed the maximum block number. The null check on line 24 formaxBlockNumber
provides good defensive programming.
52-72
: Good error handling for file operations.The implementation properly handles cases where the cursor file doesn't exist or contains invalid data by defaulting to block 0, allowing the validation to start from the beginning.
go.mod (3)
24-117
: Numerous indirect dependency updates.The PR includes many indirect dependency updates, including new CockroachDB packages, blockchain cryptography libraries, and Go standard library updates. These support the new validation functionality while keeping dependencies current.
7-10
:❓ Verification inconclusive
Significant dependency update for core Ethereum library.
The go-ethereum library has been updated from v1.14.8 to v1.15.11, which is a substantial version jump. Additionally,
holiman/uint256
has been moved from an indirect to a direct dependency, which aligns with the type changes in the serializer.
🏁 Script executed:
#!/bin/bash # Check for breaking changes in the updated go-ethereum version rg -A 5 -B 5 "github.com/ethereum/go-ethereum" --type goLength of output: 7911
Review go-ethereum v1.15.11 & holiman/uint256 Upgrade
The bump from v1.14.8→v1.15.11 spans multiple core packages and could introduce breaking changes to RPC clients, common types, trie handling, etc. We also promoted holiman/uint256 to a direct dependency to match updated serializers.
Please verify:
- All imports of
github.com/ethereum/go-ethereum
(shown below) still compile and behave as expected.- No changed function signatures or removed APIs in v1.15.x affect your code paths.
- The bump in holiman/uint256 aligns with any type conversions in serializers.
Affected files (imports of go-ethereum packages):
- internal/validation/validation.go
- internal/rpc/rpc.go
- internal/rpc/params.go
- internal/rpc/batcher.go
- internal/storage/clickhouse.go
- internal/middleware/authorization.go
- internal/common/transaction.go, abi.go, log.go
- internal/handlers/transactions_handlers.go, logs_handlers.go
- internal/common/log_test.go, transaction_test.go
Actions:
- Review the go-ethereum v1.15.11 release notes for breaking changes.
- Run the full test suite (unit + integration) against a node with v1.15.x.
- Ensure all ABI/account/rpc calls still serialize/deserialize correctly.
3-3
:❌ Incorrect review comment
Verify Go version compatibility.
The Go version has been updated to 1.23.0, which appears to be a future version since the current stable version at the time of this review is 1.22.x.
🌐 Web query:
What is the latest stable version of Go available in May 2025?
💡 Result:
As of May 22, 2025, the latest stable version of Go is 1.24.3, released on April 30, 2025. (go.dev) This release includes security fixes to the
os
package, as well as bug fixes to the runtime, compiler, linker,go
command, and thecrypto/tls
andos
packages. (golang.es)Citations:
- 1: https://go.dev/VERSION?utm_source=openai
- 2: https://golang.es/doc/devel/release?utm_source=openai
Incorrect Go version assumption in go.mod
The requirement
go 1.23.0
is not a future release. As of May 22, 2025 the latest stable Go is 1.24.3 (released April 30, 2025), and Go 1.23.0 was already available prior to that. Yourgo.mod
entry is valid.If you need fixes or features from the newest release, you may optionally bump to:
- go 1.23.0 + go 1.24• File: go.mod (line 3)
Likely an incorrect or invalid review comment.
cmd/root.go (1)
201-201
: Good integration of the new validation command.The new
validateCmd
is properly added to the root command, following the same pattern as existing commands.internal/rpc/serializer.go (2)
382-383
: Type optimization for gas values.Changing from
big.Int
touint64
for gas values is a good optimization as Ethereum gas values typically fit comfortably within uint64 range, reducing memory usage and improving performance.
402-411
:✅ Verification successful
Type change for trace addresses.
The type change from
uint64
toint64
for trace addresses indicates a design decision to allow for potential negative indices or to align with other systems' expectations.
🏁 Script executed:
#!/bin/bash # Check if there are corresponding changes in other files related to trace address types rg -A 3 -B 3 "TraceAddress.*\[\]int64" --type goLength of output: 1593
serializeTraceAddress type change is consistent
serializeTraceAddress now returns []int64, which matches the TraceAddress field defined as []int64 in internal/common/trace.go. No further updates are required.
- internal/common/trace.go:
TraceAddress []int64
confirmed.internal/storage/clickhouse.go (2)
375-377
: Gas fields handled correctly after type change – good job
trace.Gas
andtrace.GasUsed
are now inserted directly asuint64
, matching the new struct definition and saving an unnecessary allocation / dereference.
No further action needed.
936-938
: FINAL modifier ensures consistent readsAdding
FINAL
to the staging-data query is a good call – it prevents race conditions with background merges in ClickHouse.
👍internal/common/trace.go (1)
16-24
: Type migration looks consistentSwitching
TraceAddress
to[]int64
and gas fields touint64
simplifies marshaling and DB inserts. Just ensure callers no longer rely on signedness ofTraceAddress
(previouslyuint64
).
0ffae31
to
31e733b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
♻️ Duplicate comments (3)
internal/validation/validation.go (2)
286-295
:⚠️ Potential issue*Handle nil big.Int inputs to uint256.FromBig to prevent panics
The code doesn't check for nil values before converting
big.Int
touint256.Int
, which could cause panics for older blocks or special transactions where some fields may be nil.Implement a helper function at the beginning of the file:
// safeUint256 converts a *big.Int to *uint256.Int safely, handling nil values func safeUint256(b *big.Int) *uint256.Int { if b == nil { return new(uint256.Int) } out, _ := uint256.FromBig(b) return out }Then replace all the conversions:
-chainID, _ := uint256.FromBig(tx.ChainId) -gasTipCap, _ := uint256.FromBig(tx.MaxPriorityFeePerGas) -gasFeeCap, _ := uint256.FromBig(tx.MaxFeePerGas) -value, _ := uint256.FromBig(tx.Value) -blobFeeCap, _ := uint256.FromBig(tx.MaxFeePerBlobGas) -v, _ := uint256.FromBig(tx.V) -r, _ := uint256.FromBig(tx.R) -s, _ := uint256.FromBig(tx.S) +chainID := safeUint256(tx.ChainId) +gasTipCap := safeUint256(tx.MaxPriorityFeePerGas) +gasFeeCap := safeUint256(tx.MaxFeePerGas) +value := safeUint256(tx.Value) +blobFeeCap := safeUint256(tx.MaxFeePerBlobGas) +v := safeUint256(tx.V) +r := safeUint256(tx.R) +s := safeUint256(tx.S)Also apply the same pattern to lines 332-338.
56-58
:⚠️ Potential issueCheck error return from poller.Poll
The
Poll
method likely returns an error that is not being checked. This could silently ignore failures in the polling process.poller := orchestrator.NewBoundlessPoller(rpcClient, s) -poller.Poll(context.Background(), batch) +if err := poller.Poll(context.Background(), batch); err != nil { + return fmt.Errorf("failed to poll blocks %v: %w", batch, err) +} log.Debug().Msgf("Batch of invalid blocks polled: %d to %d", invalidBlocks[i], invalidBlocks[end-1])internal/validation/duplicates.go (1)
213-229
: 🛠️ Refactor suggestionString-built
IN
clause risks gigantic queries & SQL-injectionBuilding the
IN
clause directly withfmt.Sprintf
could lead to oversized queries if there are many tuples, and potentially allows SQL injection vulnerabilities even though the values come from the database.Consider one of these approaches:
- Use ClickHouse array parameters instead:
-tuples := make([]string, 0, len(batch)) +txHashes := make([]string, 0, len(batch)) +logIndices := make([]uint64, 0, len(batch)) for _, duplicateLog := range batch { blockNumbers = append(blockNumbers, duplicateLog.BlockNumber) - tuples = append(tuples, fmt.Sprintf("('%s', %d)", duplicateLog.TxHash, duplicateLog.LogIndex)) + txHashes = append(txHashes, duplicateLog.TxHash) + logIndices = append(logIndices, duplicateLog.LogIndex) } -query := fmt.Sprintf(`WITH +query := `WITH to_be_inserted AS ( SELECT chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address, data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, -sign as sign FROM default.logs FINAL - WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (%s) + WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN ZIP(?, ?) ) INSERT INTO logs ( chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address, data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, sign ) SELECT * from to_be_inserted -`, strings.Join(tuples, ",")) +` -err := conn.Exec(context.Background(), query, chainId, blockNumbers) +err := conn.Exec(context.Background(), query, chainId, blockNumbers, txHashes, logIndices)
🧹 Nitpick comments (3)
cmd/validate.go (1)
19-26
: Update the TBD command documentationThe command's
Short
andLong
descriptions are set to "TBD" which should be updated with meaningful descriptions to help users understand the purpose and functionality of the validate command.var ( validateCmd = &cobra.Command{ - Use: "validate", - Short: "TBD", - Long: "TBD", + Use: "validate [batchSize] [fixBatchSize]", + Short: "Validates and fixes blockchain data integrity", + Long: "Validates blockchain data integrity by detecting and removing duplicates, filling gaps, and verifying block data. Takes optional batch size (default: 1000) and fix batch size (default: 0) arguments.", Run: func(cmd *cobra.Command, args []string) { RunValidate(cmd, args) }, } )internal/validation/validation.go (2)
164-166
: Better contract creation detectionThe contract creation detection logic relies on checking for the zero address string and non-empty data, which could be error-prone. Consider using a more robust approach.
-isContractCreation := tx.ToAddress == "0x0000000000000000000000000000000000000000" && tx.Data != "0x" +// Contract creation transactions have empty "to" field, but in the database they might be +// stored with the zero address. Check for both empty address and zero address. +isContractCreation := (tx.ToAddress == "" || tx.ToAddress == "0x0000000000000000000000000000000000000000") && tx.Data != "0x"
98-104
: Add comments explaining transaction type checkingThe code skips transaction root validation for transaction types > 4, but doesn't explain why. Add comments to clarify which transaction types are supported and why others are skipped.
// TODO: remove this once we know how to validate all tx types for _, tx := range blockData.Transactions { - if tx.TransactionType > 4 { // Currently supported types are 0-4 + // Currently supported transaction types: + // 0: Legacy + // 1: Access List (EIP-2930) + // 2: Dynamic Fee (EIP-1559) + // 3: Blob (EIP-4844) + // 4: Set Code (EIP-7702) + // Skip validation for newer, unsupported transaction types + if tx.TransactionType > 4 { log.Warn().Msgf("Skipping transaction root validation for block %s due to unsupported transaction type %d", blockData.Block.Number, tx.TransactionType) return nil } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum
is excluded by!**/*.sum
📒 Files selected for processing (11)
cmd/root.go
(1 hunks)cmd/validate.go
(1 hunks)go.mod
(5 hunks)internal/common/trace.go
(3 hunks)internal/rpc/serializer.go
(2 hunks)internal/storage/clickhouse.go
(4 hunks)internal/validation/cursor.go
(1 hunks)internal/validation/db.go
(1 hunks)internal/validation/duplicates.go
(1 hunks)internal/validation/gaps.go
(1 hunks)internal/validation/validation.go
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- cmd/root.go
- internal/storage/clickhouse.go
- internal/rpc/serializer.go
- internal/common/trace.go
- internal/validation/cursor.go
- internal/validation/gaps.go
- go.mod
- internal/validation/db.go
🧰 Additional context used
🪛 ast-grep (0.38.1)
cmd/validate.go
[warning] 71-71: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
🪛 golangci-lint (1.64.8)
internal/validation/duplicates.go
33-33: Error return value is not checked
(errcheck)
44-44: Error return value is not checked
(errcheck)
55-55: Error return value is not checked
(errcheck)
31e733b
to
0433596
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (8)
cmd/validate.go (2)
69-87
: Set TLS MinVersion for better securityThe TLS configuration is missing a
MinVersion
setting, which could allow connections with insecure TLS protocol versions. This was previously identified and should be addressed by addingMinVersion: tls.VersionTLS13
to enhance connection security.🧰 Tools
🪛 ast-grep (0.38.1)
[warning] 71-71: MinVersion
is missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures(missing-ssl-minversion-go)
115-133
: Improve error handling in validateAndFixRangeThe function uses
log.Fatal()
which terminates the program immediately without allowing cleanup or graceful shutdown. As previously suggested, consider returning errors instead and handling them at a higher level to allow for proper cleanup and graceful error handling.internal/validation/validation.go (2)
287-294
: *Handle nil big.Int inputs to uint256.FromBig to prevent panicsThe fields
tx.MaxPriorityFeePerGas
,tx.MaxFeePerBlobGas
,tx.Value
,tx.R
,tx.S
, etc. may benil
for certain transaction types. Callinguint256.FromBig(nil)
will panic. A helper functionsafeUint256
should be introduced to handle nil inputs safely as previously suggested.
332-338
: *Handle nil big.Int inputs to uint256.FromBig to prevent panicsSame issue as lines 287-294 - the conversion to uint256 needs null-safety to prevent runtime panics when big.Int fields are nil.
internal/validation/duplicates.go (4)
32-34
: Error fromremoveDuplicateBlocks
is silently ignoredThe call to
removeDuplicateBlocks
does not check the returned error.
If the deduplication query fails you will continue as if the data were clean, potentially cascading bad state into gap-filling and validation steps.- removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers) + if err := removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers); err != nil { + return fmt.Errorf("failed to remove duplicate blocks: %w", err) + }🧰 Tools
🪛 golangci-lint (1.64.8)
33-33: Error return value is not checked
(errcheck)
43-45
: Error fromremoveDuplicateTransactions
is silently ignoredSame issue as above for transaction duplicates. Propagate or at least log the failure so that the caller can abort or retry.
- removeDuplicateTransactions(conn, chainId, duplicateTransactions) + if err := removeDuplicateTransactions(conn, chainId, duplicateTransactions); err != nil { + return fmt.Errorf("failed to remove duplicate transactions: %w", err) + }🧰 Tools
🪛 golangci-lint (1.64.8)
44-44: Error return value is not checked
(errcheck)
54-56
: Error fromremoveDuplicateLogs
is silently ignoredPlease handle the error to avoid continuing with an inconsistent database state.
- removeDuplicateLogs(conn, chainId, duplicateLogs) + if err := removeDuplicateLogs(conn, chainId, duplicateLogs); err != nil { + return fmt.Errorf("failed to remove duplicate logs: %w", err) + }🧰 Tools
🪛 golangci-lint (1.64.8)
55-55: Error return value is not checked
(errcheck)
218-230
: String-built tuple list risks gigantic queries & SQL-injection
fmt.Sprintf
concatenates every(txHash, logIndex)
tuple straight into the SQL.
Drawbacks:
- Large duplicate batches can blow ClickHouse’s max query length.
- Although data originates from the DB, handcrafted SQL bypasses driver-level sanitisation.
Prefer parameterised arrays or smaller, parameterised batches, e.g.:
- WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (%s) + WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (?)Then pass
[]tuple(String, UInt64)
as the third argument, or fall back to the existing 1 000-row batching but build the tuples slice as a proper driver parameter instead of raw SQL.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum
is excluded by!**/*.sum
📒 Files selected for processing (12)
cmd/root.go
(1 hunks)cmd/validate.go
(1 hunks)go.mod
(5 hunks)internal/common/trace.go
(3 hunks)internal/rpc/rpc.go
(1 hunks)internal/rpc/serializer.go
(2 hunks)internal/storage/clickhouse.go
(4 hunks)internal/validation/cursor.go
(1 hunks)internal/validation/db.go
(1 hunks)internal/validation/duplicates.go
(1 hunks)internal/validation/gaps.go
(1 hunks)internal/validation/validation.go
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- cmd/root.go
🚧 Files skipped from review as they are similar to previous changes (6)
- internal/validation/gaps.go
- internal/common/trace.go
- internal/validation/cursor.go
- internal/rpc/serializer.go
- internal/storage/clickhouse.go
- go.mod
🧰 Additional context used
🧬 Code Graph Analysis (1)
internal/validation/db.go (1)
internal/storage/clickhouse.go (1)
ZERO_BYTES_42
(35-35)
🪛 ast-grep (0.38.1)
cmd/validate.go
[warning] 71-71: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
🪛 golangci-lint (1.64.8)
internal/validation/duplicates.go
33-33: Error return value is not checked
(errcheck)
44-44: Error return value is not checked
(errcheck)
55-55: Error return value is not checked
(errcheck)
🔇 Additional comments (8)
internal/rpc/rpc.go (1)
111-114
: LGTM! Improves consistency between initialization functions.The addition of chain ID initialization to
InitializeSimpleRPCWithUrl
ensures consistency with the mainInitialize()
function and provides the necessary chain ID context for validation workflows.cmd/validate.go (1)
29-51
: LGTM! Well-structured argument parsing and validation.The command setup with batch size configuration and argument validation is well implemented. The inclusive range adjustment (line 51) is a nice touch for user-friendly batch processing.
internal/validation/validation.go (3)
164-165
: LGTM! Improved contract creation handling.The contract creation detection using
isContractCreation := tx.ToAddress == ""
and subsequent conditional address handling properly addresses the previous feedback about nilTo
addresses in contract creation transactions.
118-151
: LGTM! Robust logs bloom calculation.The logs bloom calculation correctly converts internal log structures to go-ethereum types and handles optional topics properly. The use of go-ethereum's
CreateBloom
ensures correctness.
56-57
:✅ Verification successful
Verify poller.Poll method signature and error handling
The call to
poller.Poll(context.Background(), batch)
needs verification to ensure it matches the expected signature and properly handles returned errors.
🏁 Script executed:
#!/bin/bash # Check the Poll method signature in the orchestrator package ast-grep --pattern 'func $_($$$) Poll($$$) $$$'Length of output: 49
🏁 Script executed:
#!/bin/bash # Locate any Go files in the orchestrator package pkg_files=$(rg -l "^package orchestrator" -n) # Print out where NewBoundlessPoller is defined echo "🔍 Searching for NewBoundlessPoller definitions..." rg -n "func NewBoundlessPoller" -n $pkg_files # Search for the BoundlessPoller type echo "🔍 Locating type BoundlessPoller..." file=$(rg -l "^type BoundlessPoller" -n $pkg_files) echo "Found BoundlessPoller in: $file" # Inspect the Poll method signature on BoundlessPoller echo "🔍 Inspecting Poll method signature..." rg -n "^func .*BoundlessPoller\\) Poll" -n "$file"Length of output: 1605
🏁 Script executed:
#!/bin/bash # Locate the Poller type definition echo "🔍 Searching for the Poller type in poller.go..." rg -n "^type Poller" internal/orchestrator/poller.go # Inspect the Poll method signature on Poller echo "🔍 Inspecting the Poll method signature..." rg -n "^func .*Poll" internal/orchestrator/poller.goLength of output: 1141
Signature verified:
Poll
returns only a block number, no error to handle
- internal/orchestrator/poller.go:149 defines
func (p *Poller) Poll(ctx context.Context, blockNumbers []*big.Int) (lastPolledBlock *big.Int)
- There is no error return, so no error handling is required here.
- If you need the returned
lastPolledBlock
, assign it; otherwise discarding the value is acceptable.internal/validation/db.go (3)
64-153
: LGTM! Excellent concurrent data fetching design.The concurrent fetching of blocks, logs, and transactions using goroutines and channels is well-implemented. The error handling, result organization by block number, and final assembly into
BlockData
structs demonstrates good engineering practices.
257-259
: LGTM! Proper handling of contract creation addresses.The special case handling for
ZERO_BYTES_42
correctly converts the zero-byte representation to an empty string for contract creation transactions, which aligns with the constant defined ininternal/storage/clickhouse.go
.
156-165
: LGTM! Optimized ClickHouse queries.The use of
FINAL
clause in the SELECT queries ensures data consistency when reading from ClickHouse tables, which is appropriate for validation operations that need the most recent, deduplicated data.
0433596
to
4b567ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (6)
cmd/validate.go (2)
69-87
: TLS configuration missing MinVersion setting.This security issue has already been identified in previous reviews. The TLS configuration should specify a minimum version for enhanced security.
🧰 Tools
🪛 ast-grep (0.38.1)
[warning] 71-71: MinVersion
is missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures(missing-ssl-minversion-go)
115-134
: Error handling uses log.Fatal() preventing graceful shutdown.This error handling issue has already been identified in previous reviews. The function should return errors instead of calling log.Fatal() to allow proper cleanup and graceful error handling.
internal/validation/validation.go (1)
286-296
: *Handle nil big.Int inputs to prevent panics.This nil pointer risk with uint256.FromBig() has already been identified in previous reviews. Fields like MaxPriorityFeePerGas, MaxFeePerBlobGas, Value, R, S can be nil and need safe handling to prevent runtime panics.
Also applies to: 332-340
internal/validation/duplicates.go (3)
33-34
: Error return values not handled.The calls to
removeDuplicateBlocks
,removeDuplicateTransactions
, andremoveDuplicateLogs
return errors that are not being checked, as already identified in previous reviews. These should be handled to prevent silent failures.Also applies to: 44-45, 55-56
🧰 Tools
🪛 golangci-lint (1.64.8)
33-33: Error return value is not checked
(errcheck)
62-67
: HAVING clause may miss certain duplicate scenarios.The
HAVING sum(sign) != 1
condition has already been identified as potentially problematic in previous reviews, as it could miss cases where duplicates exist but sum to 1.
218-232
: SQL injection risk with string interpolation.The use of
fmt.Sprintf
to build the IN clause has already been identified as a security risk in previous reviews. This should use parameterized queries instead.
🧹 Nitpick comments (1)
internal/validation/validation.go (1)
97-103
: Consider removing or refining the transaction type check.The current check skips validation for transaction types > 4, but this hardcoded limit may become outdated as new transaction types are introduced. Consider either:
- Making this configurable
- Adding a more explicit comment about which types are supported
- Implementing a more robust unsupported type detection
- for _, tx := range blockData.Transactions { - if tx.TransactionType > 4 { // Currently supported types are 0-4 - log.Warn().Msgf("Skipping transaction root validation for block %s due to unsupported transaction type %d", blockData.Block.Number, tx.TransactionType) - return nil - } - } + supportedTypes := map[uint8]bool{0: true, 1: true, 2: true, 3: true, 4: true} + for _, tx := range blockData.Transactions { + if !supportedTypes[tx.TransactionType] { + log.Warn().Msgf("Skipping transaction root validation for block %s due to unsupported transaction type %d", blockData.Block.Number, tx.TransactionType) + return nil + } + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum
is excluded by!**/*.sum
📒 Files selected for processing (13)
cmd/root.go
(1 hunks)cmd/validate.go
(1 hunks)go.mod
(5 hunks)internal/common/abi.go
(2 hunks)internal/common/trace.go
(3 hunks)internal/rpc/rpc.go
(1 hunks)internal/rpc/serializer.go
(2 hunks)internal/storage/clickhouse.go
(4 hunks)internal/validation/cursor.go
(1 hunks)internal/validation/db.go
(1 hunks)internal/validation/duplicates.go
(1 hunks)internal/validation/gaps.go
(1 hunks)internal/validation/validation.go
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- internal/common/abi.go
🚧 Files skipped from review as they are similar to previous changes (9)
- cmd/root.go
- internal/rpc/rpc.go
- internal/storage/clickhouse.go
- internal/rpc/serializer.go
- internal/validation/cursor.go
- internal/validation/gaps.go
- internal/common/trace.go
- go.mod
- internal/validation/db.go
🧰 Additional context used
🪛 golangci-lint (1.64.8)
internal/validation/duplicates.go
33-33: Error return value is not checked
(errcheck)
44-44: Error return value is not checked
(errcheck)
55-55: Error return value is not checked
(errcheck)
🪛 ast-grep (0.38.1)
cmd/validate.go
[warning] 71-71: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
4b567ef
to
734272c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (9)
cmd/validate.go (2)
69-87
:⚠️ Potential issueSet TLS MinVersion for enhanced security.
The TLS configuration lacks a
MinVersion
setting, potentially allowing insecure TLS protocol versions. This was flagged in previous reviews and by static analysis.conn, err := clickhouse.Open(&clickhouse.Options{ Addr: []string{fmt.Sprintf("%s:%d", config.Cfg.Storage.Main.Clickhouse.Host, config.Cfg.Storage.Main.Clickhouse.Port)}, Protocol: clickhouse.Native, - TLS: &tls.Config{}, + TLS: &tls.Config{ + MinVersion: tls.VersionTLS12, + }, Auth: clickhouse.Auth{ Username: config.Cfg.Storage.Main.Clickhouse.Username, Password: config.Cfg.Storage.Main.Clickhouse.Password, },🧰 Tools
🪛 ast-grep (0.38.1)
[warning] 71-71: MinVersion
is missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures(missing-ssl-minversion-go)
115-133
: 🛠️ Refactor suggestionReplace log.Fatal() with proper error handling.
The function uses
log.Fatal()
which terminates the program immediately, preventing graceful error handling and cleanup. This was flagged in previous reviews.-func validateAndFixRange(rpcClient rpc.IRPCClient, s storage.IStorage, conn clickhouse.Conn, startBlock *big.Int, endBlock *big.Int, fixBatchSize int) error { +func validateAndFixRange(rpcClient rpc.IRPCClient, s storage.IStorage, conn clickhouse.Conn, startBlock *big.Int, endBlock *big.Int, fixBatchSize int) error { chainId := rpcClient.GetChainID() err := validation.FindAndRemoveDuplicates(conn, chainId, startBlock, endBlock) if err != nil { - log.Fatal().Err(err).Msg("Failed to find and fix duplicates") + return fmt.Errorf("failed to find and fix duplicates: %w", err) } err = validation.FindAndFixGaps(rpcClient, s, conn, chainId, startBlock, endBlock) if err != nil { - log.Fatal().Err(err).Msg("Failed to find and fix gaps") + return fmt.Errorf("failed to find and fix gaps: %w", err) } err = validation.ValidateAndFixBlocks(rpcClient, s, conn, startBlock, endBlock, fixBatchSize) if err != nil { - log.Fatal().Err(err).Msg("Failed to validate and fix blocks") + return fmt.Errorf("failed to validate and fix blocks: %w", err) } log.Debug().Msgf("Validation complete for range %v-%v", startBlock, endBlock) return nil }internal/validation/validation.go (3)
288-296
:⚠️ Potential issue*Prevent panics from nil big.Int inputs to uint256.FromBig.
Fields like
tx.MaxPriorityFeePerGas
,tx.MaxFeePerBlobGas
,tx.Value
, etc. may benil
for certain transaction types, causinguint256.FromBig(nil)
to panic.Add a helper function and use it for safe conversions:
+func safeUint256(b *big.Int) *uint256.Int { + if b == nil { + return new(uint256.Int) + } + out, _ := uint256.FromBig(b) + return out +}Then replace the direct calls:
- chainID, _ := uint256.FromBig(tx.ChainId) - gasTipCap, _ := uint256.FromBig(tx.MaxPriorityFeePerGas) - gasFeeCap, _ := uint256.FromBig(tx.MaxFeePerGas) - value, _ := uint256.FromBig(tx.Value) - blobFeeCap, _ := uint256.FromBig(tx.MaxFeePerBlobGas) - v, _ := uint256.FromBig(tx.V) - r, _ := uint256.FromBig(tx.R) - s, _ := uint256.FromBig(tx.S) + chainID := safeUint256(tx.ChainId) + gasTipCap := safeUint256(tx.MaxPriorityFeePerGas) + gasFeeCap := safeUint256(tx.MaxFeePerGas) + value := safeUint256(tx.Value) + blobFeeCap := safeUint256(tx.MaxFeePerBlobGas) + v := safeUint256(tx.V) + r := safeUint256(tx.R) + s := safeUint256(tx.S)
333-340
:⚠️ Potential issueApply the same nil-safety fix for EIP-7702 transactions.
The same nil pointer risk exists here for the EIP-7702 transaction type conversion.
- chainID, _ := uint256.FromBig(tx.ChainId) - gasTipCap, _ := uint256.FromBig(tx.MaxPriorityFeePerGas) - gasFeeCap, _ := uint256.FromBig(tx.MaxFeePerGas) - value, _ := uint256.FromBig(tx.Value) - v, _ := uint256.FromBig(tx.V) - r, _ := uint256.FromBig(tx.R) - s, _ := uint256.FromBig(tx.S) + chainID := safeUint256(tx.ChainId) + gasTipCap := safeUint256(tx.MaxPriorityFeePerGas) + gasFeeCap := safeUint256(tx.MaxFeePerGas) + value := safeUint256(tx.Value) + v := safeUint256(tx.V) + r := safeUint256(tx.R) + s := safeUint256(tx.S)
56-58
:⚠️ Potential issueHandle error return from poller.Poll.
The error returned by
poller.Poll
is not checked, which could silently ignore polling failures for invalid blocks.poller := orchestrator.NewBoundlessPoller(rpcClient, s) -poller.Poll(context.Background(), batch) +if err := poller.Poll(context.Background(), batch); err != nil { + log.Error().Err(err).Msgf("Failed to poll batch of blocks %s-%s", batch[0], batch[len(batch)-1]) + return fmt.Errorf("failed to poll invalid blocks: %w", err) +}internal/validation/duplicates.go (4)
33-34
:⚠️ Potential issueHandle error return from removeDuplicateBlocks.
The error returned by
removeDuplicateBlocks
is not checked, potentially ignoring failures in duplicate removal.log.Debug().Msgf("Found %d duplicate blocks in range %v-%v: %v", len(duplicateBlockNumbers), startBlock, endBlock, duplicateBlockNumbers) -removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers) +if err := removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers); err != nil { + return fmt.Errorf("failed to remove duplicate blocks: %w", err) +}🧰 Tools
🪛 golangci-lint (1.64.8)
33-33: Error return value is not checked
(errcheck)
44-45
:⚠️ Potential issueHandle error return from removeDuplicateTransactions.
The error returned by
removeDuplicateTransactions
is not checked, potentially ignoring failures in duplicate removal.log.Debug().Msgf("Found %d duplicate transactions in range %v-%v: %v", len(duplicateTransactions), startBlock, endBlock, duplicateTransactions) -removeDuplicateTransactions(conn, chainId, duplicateTransactions) +if err := removeDuplicateTransactions(conn, chainId, duplicateTransactions); err != nil { + return fmt.Errorf("failed to remove duplicate transactions: %w", err) +}🧰 Tools
🪛 golangci-lint (1.64.8)
44-44: Error return value is not checked
(errcheck)
55-56
:⚠️ Potential issueHandle error return from removeDuplicateLogs.
The error returned by
removeDuplicateLogs
is not checked, potentially ignoring failures in duplicate removal.log.Debug().Msgf("Found %d duplicate logs in range %v-%v: %v", len(duplicateLogs), startBlock, endBlock, duplicateLogs) -removeDuplicateLogs(conn, chainId, duplicateLogs) +if err := removeDuplicateLogs(conn, chainId, duplicateLogs); err != nil { + return fmt.Errorf("failed to remove duplicate logs: %w", err) +}🧰 Tools
🪛 golangci-lint (1.64.8)
55-55: Error return value is not checked
(errcheck)
218-229
: 🛠️ Refactor suggestionAvoid string interpolation in SQL queries for security.
The SQL query uses
fmt.Sprintf
to embed tuples directly, which poses security risks and could create oversized queries.Consider using parameterized queries or batching with smaller queries. If ClickHouse supports tuple arrays as parameters, use those instead:
- query := fmt.Sprintf(`WITH - to_be_inserted AS ( - SELECT chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address, - data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, -sign as sign - FROM default.logs FINAL - WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (%s) - ) - INSERT INTO logs ( - chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address, - data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, sign - ) SELECT * from to_be_inserted - `, strings.Join(tuples, ",")) + // Alternative approach: process each log individually or use smaller batches + for _, duplicateLog := range batch { + query := `WITH + to_be_inserted AS ( + SELECT chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address, + data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, -sign as sign + FROM default.logs FINAL + WHERE chain_id = ? AND block_number = ? AND transaction_hash = ? AND log_index = ? + ) + INSERT INTO logs ( + chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address, + data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, sign + ) SELECT * from to_be_inserted` + + err := conn.Exec(context.Background(), query, chainId, duplicateLog.BlockNumber, duplicateLog.TxHash, duplicateLog.LogIndex) + if err != nil { + return err + } + }
🧹 Nitpick comments (1)
cmd/validate.go (1)
19-26
: Complete the command descriptions.The
Short
andLong
descriptions are marked as "TBD". Please provide meaningful descriptions for the validate command to improve CLI usability.var ( validateCmd = &cobra.Command{ Use: "validate", - Short: "TBD", - Long: "TBD", + Short: "Validate and fix blockchain data integrity", + Long: `Validate and fix blockchain data integrity by checking for duplicates, gaps, and block validation errors. Supports custom batch sizes for processing.`, Run: func(cmd *cobra.Command, args []string) { RunValidate(cmd, args) }, } )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum
is excluded by!**/*.sum
📒 Files selected for processing (13)
cmd/root.go
(1 hunks)cmd/validate.go
(1 hunks)go.mod
(5 hunks)internal/common/abi.go
(2 hunks)internal/common/trace.go
(3 hunks)internal/rpc/rpc.go
(1 hunks)internal/rpc/serializer.go
(2 hunks)internal/storage/clickhouse.go
(4 hunks)internal/validation/cursor.go
(1 hunks)internal/validation/db.go
(1 hunks)internal/validation/duplicates.go
(1 hunks)internal/validation/gaps.go
(1 hunks)internal/validation/validation.go
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- internal/common/abi.go
🚧 Files skipped from review as they are similar to previous changes (9)
- internal/rpc/rpc.go
- cmd/root.go
- internal/rpc/serializer.go
- internal/validation/gaps.go
- internal/validation/cursor.go
- internal/common/trace.go
- internal/storage/clickhouse.go
- go.mod
- internal/validation/db.go
🧰 Additional context used
🪛 ast-grep (0.38.1)
cmd/validate.go
[warning] 71-71: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
🪛 golangci-lint (1.64.8)
internal/validation/duplicates.go
33-33: Error return value is not checked
(errcheck)
44-44: Error return value is not checked
(errcheck)
55-55: Error return value is not checked
(errcheck)
734272c
to
b5f0fec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (6)
cmd/validate.go (2)
69-87
: Set TLS MinVersion for better security.The TLS configuration lacks a MinVersion setting, potentially allowing insecure TLS protocol versions.
TLS: &tls.Config{ + MinVersion: tls.VersionTLS12, },
115-134
: Improve error handling by returning errors instead of using log.Fatal().The function uses
log.Fatal()
which terminates the program immediately without cleanup. Consider returning errors for better error handling.func validateAndFixRange(rpcClient rpc.IRPCClient, s storage.IStorage, conn clickhouse.Conn, startBlock *big.Int, endBlock *big.Int, fixBatchSize int) error { chainId := rpcClient.GetChainID() err := validation.FindAndRemoveDuplicates(conn, chainId, startBlock, endBlock) if err != nil { - log.Fatal().Err(err).Msg("Failed to find and fix duplicates") + return fmt.Errorf("failed to find and fix duplicates: %w", err) } err = validation.FindAndFixGaps(rpcClient, s, conn, chainId, startBlock, endBlock) if err != nil { - log.Fatal().Err(err).Msg("Failed to find and fix gaps") + return fmt.Errorf("failed to find and fix gaps: %w", err) } err = validation.ValidateAndFixBlocks(rpcClient, s, conn, startBlock, endBlock, fixBatchSize) if err != nil { - log.Fatal().Err(err).Msg("Failed to validate and fix blocks") + return fmt.Errorf("failed to validate and fix blocks: %w", err) } log.Debug().Msgf("Validation complete for range %v-%v", startBlock, endBlock) return nil }internal/validation/validation.go (4)
56-58
: Fix the poller.Poll call signature.Based on past review comments, the
poller.Poll(batch)
call appears to have incorrect arity and is missing required arguments.poller := orchestrator.NewBoundlessPoller(rpcClient, s) -poller.Poll(context.Background(), batch) +if err := poller.Poll(context.Background(), batch); err != nil { + return fmt.Errorf("polling failed for blocks %s-%s: %w", batch[0], batch[len(batch)-1], err) +}
164-166
: Handle contract creation transactions correctly.Based on past review feedback, the current implementation sets
0x0000000000000000000000000000000000000000
for contract creation transactions instead of properly handling the nilTo
field.-// Convert to address -to := common.HexToAddress(tx.ToAddress) -isContractCreation := tx.ToAddress == "" +// Handle contract creation (nil To address) +var toPtr *common.Address +isContractCreation := tx.ToAddress == "" +if !isContractCreation { + addr := common.HexToAddress(tx.ToAddress) + toPtr = &addr +}Then update the transaction creation logic to use
toPtr
instead ofto
for non-legacy transactions.
288-296
: Add nil checks for big.Int to uint256 conversions to prevent panics.The
uint256.FromBig()
function will panic if passed a nil pointer. Fields likeMaxPriorityFeePerGas
,MaxFeePerBlobGas
,Value
,R
,S
, etc. can be NULL in the database.+func safeUint256(b *big.Int) *uint256.Int { + if b == nil { + return new(uint256.Int) + } + out, _ := uint256.FromBig(b) + return out +}Then replace the direct calls:
-chainID, _ := uint256.FromBig(tx.ChainId) -gasTipCap, _ := uint256.FromBig(tx.MaxPriorityFeePerGas) -gasFeeCap, _ := uint256.FromBig(tx.MaxFeePerGas) -value, _ := uint256.FromBig(tx.Value) -blobFeeCap, _ := uint256.FromBig(tx.MaxFeePerBlobGas) -v, _ := uint256.FromBig(tx.V) -r, _ := uint256.FromBig(tx.R) -s, _ := uint256.FromBig(tx.S) +chainID := safeUint256(tx.ChainId) +gasTipCap := safeUint256(tx.MaxPriorityFeePerGas) +gasFeeCap := safeUint256(tx.MaxFeePerGas) +value := safeUint256(tx.Value) +blobFeeCap := safeUint256(tx.MaxFeePerBlobGas) +v := safeUint256(tx.V) +r := safeUint256(tx.R) +s := safeUint256(tx.S)
334-340
: Apply the same nil check fix for EIP-7702 transaction uint256 conversions.Similar to the blob transaction case, these conversions also need protection against nil pointers.
-chainID, _ := uint256.FromBig(tx.ChainId) -gasTipCap, _ := uint256.FromBig(tx.MaxPriorityFeePerGas) -gasFeeCap, _ := uint256.FromBig(tx.MaxFeePerGas) -value, _ := uint256.FromBig(tx.Value) -v, _ := uint256.FromBig(tx.V) -r, _ := uint256.FromBig(tx.R) -s, _ := uint256.FromBig(tx.S) +chainID := safeUint256(tx.ChainId) +gasTipCap := safeUint256(tx.MaxPriorityFeePerGas) +gasFeeCap := safeUint256(tx.MaxFeePerGas) +value := safeUint256(tx.Value) +v := safeUint256(tx.V) +r := safeUint256(tx.R) +s := safeUint256(tx.S)
🧹 Nitpick comments (2)
cmd/validate.go (1)
18-27
: Add description for the validate command.The command currently has placeholder "TBD" values for Short and Long descriptions. Consider adding meaningful descriptions to help users understand the command's purpose.
validateCmd = &cobra.Command{ Use: "validate", - Short: "TBD", - Long: "TBD", + Short: "Validate and repair blockchain data integrity", + Long: `Validate and repair blockchain data integrity by checking for duplicates, gaps, and block validation errors. Processes data in configurable batches.`, Run: func(cmd *cobra.Command, args []string) { RunValidate(cmd, args) }, }internal/validation/validation.go (1)
97-103
: Consider making unsupported transaction type handling configurable.The current implementation skips transaction root validation entirely when encountering unsupported transaction types. This might mask validation issues in blocks with mixed transaction types.
Consider adding a flag to control this behavior or validate only the supported transactions:
// TODO: remove this once we know how to validate all tx types +unsupportedTxCount := 0 for _, tx := range blockData.Transactions { if tx.TransactionType > 4 { // Currently supported types are 0-4 - log.Warn().Msgf("Skipping transaction root validation for block %s due to unsupported transaction type %d", blockData.Block.Number, tx.TransactionType) - return nil + unsupportedTxCount++ } } +if unsupportedTxCount > 0 { + log.Warn().Msgf("Block %s contains %d unsupported transaction types, skipping transaction root validation", blockData.Block.Number, unsupportedTxCount) + return nil +}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum
is excluded by!**/*.sum
📒 Files selected for processing (13)
cmd/root.go
(1 hunks)cmd/validate.go
(1 hunks)go.mod
(6 hunks)internal/common/abi.go
(2 hunks)internal/common/trace.go
(3 hunks)internal/rpc/rpc.go
(1 hunks)internal/rpc/serializer.go
(2 hunks)internal/storage/clickhouse.go
(4 hunks)internal/validation/cursor.go
(1 hunks)internal/validation/db.go
(1 hunks)internal/validation/duplicates.go
(1 hunks)internal/validation/gaps.go
(1 hunks)internal/validation/validation.go
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- internal/common/abi.go
🚧 Files skipped from review as they are similar to previous changes (10)
- internal/rpc/rpc.go
- cmd/root.go
- internal/validation/cursor.go
- internal/rpc/serializer.go
- internal/common/trace.go
- internal/storage/clickhouse.go
- go.mod
- internal/validation/gaps.go
- internal/validation/db.go
- internal/validation/duplicates.go
🧰 Additional context used
🧠 Learnings (1)
internal/validation/validation.go (1)
Learnt from: iuwqyir
PR: #216
File: internal/orchestrator/chain_tracker.go:45-53
Timestamp: 2025-06-04T11:28:38.057Z
Learning: Go's math/big.Int type has a Float64() method that returns (float64, bool), where the boolean indicates if the conversion was exact.
🪛 ast-grep (0.38.6)
cmd/validate.go
[warning] 71-71: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
b5f0fec
to
ebe265c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
♻️ Duplicate comments (4)
cmd/validate.go (1)
31-76
: Consider improving error handling instead of using log.Fatal().The function uses
log.Fatal()
which terminates the program immediately without cleanup. This issue was previously raised for similar code.internal/validation/duplicates.go (3)
71-76
: Consider using count() > 1 for more robust duplicate detection.As previously noted,
HAVING sum(sign) != 1
may miss duplicate scenarios where rows have signs like (+1, +1, -1) that sum to 1. Usingcount() > 1
would be more reliable for detecting any group with multiple live rows.-HAVING sum(sign) != 1 +HAVING count() > 1
97-101
: Apply consistent duplicate detection logic across all find functions.The same duplicate detection issue applies to transactions and logs queries. Consider using
count() > 1
consistently across all three functions.Also applies to: 123-127
227-241
: String-built IN clause still poses risks for large queries and bypasses sanitization.The use of
fmt.Sprintf
to embed tuples directly into SQL remains problematic:
- Large duplicate sets can exceed server query length limits
- String concatenation bypasses ClickHouse's query parameter sanitization
Consider using parameterized queries with array inputs or batching with smaller, fully parameterized queries.
Does ClickHouse support array of tuples as query parameters for IN clauses?
🧹 Nitpick comments (5)
internal/storage/connector.go (1)
106-109
: Enhance the method documentation for clarity.The comment "Gets only the data required for validation" is vague. Consider specifying what subset of data is retrieved (e.g., blocks with transactions and logs, but without traces).
/** - * Gets only the data required for validation. + * Gets block data required for validation, including blocks, transactions, and logs + * for the specified block range. Traces are excluded as they are not needed for validation. */internal/validation/validator.go (4)
29-34
: Use idiomatic Go documentation format.Replace the JSDoc-style comment with Go's standard documentation format.
-/** - * Validate blocks in the range of startBlock to endBlock - * @param startBlock - The start block number (inclusive) - * @param endBlock - The end block number (inclusive) - * @return error - An error if the validation fails - */ +// ValidateBlockRange validates blocks in the range of startBlock to endBlock (inclusive). +// It returns slices of valid and invalid blocks, and an error if the validation fails.
40-45
: Remove redundant return statement.The explicit return at line 44 is unnecessary since the same values are already being returned.
validBlocks, invalidBlocks, err = v.ValidateBlocks(dbData) if err != nil { return nil, nil, err } -return validBlocks, invalidBlocks, nil +return validBlocks, invalidBlocks, err
48-49
: Simplify slice initialization.Use nil initialization for empty slices instead of make with 0 capacity.
-invalidBlocks = make([]common.BlockData, 0) -validBlocks = make([]common.BlockData, 0) +var invalidBlocks []common.BlockData +var validBlocks []common.BlockData
78-84
: Track the TODO for unsupported transaction types.The TODO comment indicates that transaction root validation is skipped for certain transaction types. This should be tracked to ensure complete validation coverage.
Would you like me to create an issue to track the implementation of validation for all transaction types?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum
is excluded by!**/*.sum
📒 Files selected for processing (16)
cmd/root.go
(1 hunks)cmd/validate.go
(1 hunks)cmd/validate_and_fix.go
(1 hunks)go.mod
(6 hunks)internal/common/abi.go
(2 hunks)internal/common/trace.go
(3 hunks)internal/rpc/rpc.go
(1 hunks)internal/rpc/serializer.go
(2 hunks)internal/storage/clickhouse.go
(5 hunks)internal/storage/connector.go
(1 hunks)internal/validation/cursor.go
(1 hunks)internal/validation/duplicates.go
(1 hunks)internal/validation/gaps.go
(1 hunks)internal/validation/root_calculator.go
(1 hunks)internal/validation/validator.go
(1 hunks)test/mocks/MockIMainStorage.go
(1 hunks)
🧬 Code Graph Analysis (3)
internal/storage/connector.go (1)
internal/common/block.go (1)
BlockData
(61-66)
internal/validation/root_calculator.go (2)
internal/common/transaction.go (1)
Transaction
(16-51)internal/common/log.go (1)
Log
(15-31)
test/mocks/MockIMainStorage.go (1)
internal/common/block.go (1)
BlockData
(61-66)
✅ Files skipped from review due to trivial changes (1)
- internal/common/abi.go
🚧 Files skipped from review as they are similar to previous changes (8)
- internal/rpc/rpc.go
- cmd/root.go
- internal/rpc/serializer.go
- internal/common/trace.go
- internal/validation/cursor.go
- internal/validation/gaps.go
- internal/storage/clickhouse.go
- go.mod
🧰 Additional context used
🧬 Code Graph Analysis (3)
internal/storage/connector.go (1)
internal/common/block.go (1)
BlockData
(61-66)
internal/validation/root_calculator.go (2)
internal/common/transaction.go (1)
Transaction
(16-51)internal/common/log.go (1)
Log
(15-31)
test/mocks/MockIMainStorage.go (1)
internal/common/block.go (1)
BlockData
(61-66)
🔇 Additional comments (11)
cmd/validate_and_fix.go (2)
72-74
: Good implementation of TLS security configuration.The TLS configuration properly sets
MinVersion: tls.VersionTLS12
, addressing the security concern from previous reviews.
117-150
: Excellent error handling in validateAndFixRange.The function properly returns errors with descriptive context using
fmt.Errorf
, allowing the caller to handle them gracefully. This is a significant improvement over usinglog.Fatal()
.internal/validation/root_calculator.go (2)
18-253
: Well-structured transaction root calculation with comprehensive type support.The implementation correctly handles all transaction types including the latest EIP-7702, properly manages signature reconstruction for legacy and EIP-155 transactions, and includes thorough error handling. The sorting by transaction index ensures deterministic root calculation.
174-178
: Fix contract creation handling for blob transactions.The
To
field is being set even when it's a contract creation. For blob transactions, theTo
field should remain empty (zero address) for contract creation.-var toAddr gethCommon.Address -if !isContractCreation { - toAddr = to -} +toAddr := toNote: The
BlobTx
struct expects a non-pointerAddress
field, so for contract creation it should use the zero address value, which is whatto
already contains whenisContractCreation
is true.Likely an incorrect or invalid review comment.
test/mocks/MockIMainStorage.go (1)
630-688
: Auto-generated mock code looks correct.The mock implementation for
GetValidationBlockData
follows standard mockery patterns and correctly matches the interface signature.internal/validation/validator.go (2)
1-19
: LGTM!The imports and struct definition are well-organized and appropriate for the validation functionality.
21-27
: LGTM!The constructor properly initializes all fields with appropriate dependencies.
internal/validation/duplicates.go (4)
1-23
: LGTM!The imports and struct definitions are well-organized with appropriate field tags for JSON and ClickHouse.
24-68
: Error handling has been properly implemented.The function now correctly handles all error returns from the removal functions, addressing the previous review comments.
148-168
: LGTM!The duplicate removal approach using negated sign values is appropriate for ClickHouse's merge tree engine.
170-208
: LGTM!The batched approach with proper parameterization is well-implemented for handling large numbers of duplicate transactions.
ebe265c
to
7bbd881
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
cmd/validate_and_fix.go (2)
57-57
: Fix typo in log message.The log message contains a typo in the function name.
- log.Info().Msgf("Running validationAndFix for chain %d", rpcClient.GetChainID()) + log.Info().Msgf("Running validateAndFix for chain %d", rpcClient.GetChainID())
114-116
: Use Go-style comments.The function comment uses C-style comment syntax instead of Go conventions.
-/** - * Validates a range of blocks (end and start are inclusive) for a given chain and fixes any problems it finds - */ +// validateAndFixRange validates a range of blocks (end and start are inclusive) for a given chain and fixes any problems it finds
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum
is excluded by!**/*.sum
📒 Files selected for processing (16)
cmd/root.go
(1 hunks)cmd/validate.go
(1 hunks)cmd/validate_and_fix.go
(1 hunks)go.mod
(6 hunks)internal/common/abi.go
(2 hunks)internal/common/trace.go
(3 hunks)internal/rpc/rpc.go
(1 hunks)internal/rpc/serializer.go
(2 hunks)internal/storage/clickhouse.go
(5 hunks)internal/storage/connector.go
(1 hunks)internal/validation/cursor.go
(1 hunks)internal/validation/duplicates.go
(1 hunks)internal/validation/gaps.go
(1 hunks)internal/validation/root_calculator.go
(1 hunks)internal/validation/validator.go
(1 hunks)test/mocks/MockIMainStorage.go
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (15)
- internal/common/abi.go
- internal/storage/connector.go
- cmd/root.go
- internal/rpc/rpc.go
- internal/rpc/serializer.go
- cmd/validate.go
- internal/common/trace.go
- internal/storage/clickhouse.go
- internal/validation/cursor.go
- internal/validation/gaps.go
- internal/validation/root_calculator.go
- go.mod
- internal/validation/validator.go
- test/mocks/MockIMainStorage.go
- internal/validation/duplicates.go
🔇 Additional comments (3)
cmd/validate_and_fix.go (3)
1-16
: LGTM! Clean import structure.The package declaration and imports are well-organized and all dependencies are utilized appropriately.
18-27
: LGTM! Command definition is well-structured.The command descriptions are now meaningful and informative, addressing the previous placeholder issue. The cobra command structure follows best practices.
117-150
: LGTM! Well-structured validation and fixing logic.The function follows a clear sequence of operations with proper error handling and separation of concerns. The logic for collecting invalid blocks and fixing them in batches is well-implemented.
7bbd881
to
1d1f463
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (4)
cmd/validate_and_fix.go (1)
70-94
: Consider reusing storage abstraction for ClickHouse connection.The ClickHouse connection is being established directly here with hardcoded configuration, which duplicates the connection logic that likely exists in the storage layer. This creates maintenance overhead and potential configuration drift.
Consider extracting a method from the storage layer to provide the ClickHouse connection, or add a method to the storage interface to get the underlying connection.
internal/storage/clickhouse.go (1)
914-920
:getBlockNumbersStringArray
generates invalid SQL with trailing comma.The function unconditionally appends a comma after every element, resulting in invalid SQL syntax like
block_number IN (123,456,)
. ClickHouse will reject the extra comma.Apply the fix suggested in the previous review:
-func getBlockNumbersStringArray(blockNumbers []*big.Int) string { - blockNumbersString := "" - for _, blockNumber := range blockNumbers { - blockNumbersString += fmt.Sprintf("%s,", blockNumber.String()) - } - return blockNumbersString -} +func getBlockNumbersStringArray(blockNumbers []*big.Int) string { + if len(blockNumbers) == 0 { + return "" + } + var b strings.Builder + for i, n := range blockNumbers { + if i > 0 { + b.WriteByte(',') + } + b.WriteString(n.String()) + } + return b.String() +}internal/validation/duplicates.go (2)
71-76
:HAVING sum(sign) != 1
may miss certain duplicate scenarios.If rows exist with signs
(+1, +1, -1)
the sum equals+1
, therefore the group is not flagged, yet you still have two live (+1) rows – a duplicate situation. Usingcount() > 1
(orabs(sum(sign)) != 1
) is safer.Consider using a more robust duplicate detection:
-HAVING sum(sign) != 1 +HAVING count(*) > 1 AND sum(sign) > 0
227-238
: String-builtIN
clause risks gigantic queries & SQL injection.
fmt.Sprintf
embeds a comma-separated list of tuples directly into the SQL string. This can create excessively large queries and bypasses ClickHouse query-parameter sanitization.Consider using parameterized queries or batching with smaller, safer queries:
-WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (%s) +// Split into smaller batches and use individual parameter binding +WHERE chain_id = ? AND block_number = ? AND transaction_hash = ? AND log_index = ?
🧹 Nitpick comments (1)
internal/orchestrator/validator.go (1)
125-127
: Consider continuing with remaining batches on partial failures.The current implementation stops processing on the first failed batch, potentially leaving many fixable blocks unprocessed. Consider collecting errors and continuing with remaining batches to maximize the number of fixed blocks.
+var allErrors []error for i := 0; i < len(invalidBlocks); i += fixBatchSize { // ... existing batch logic ... if len(failedBlocks) > 0 { log.Error().Msgf("Failed to poll %d blocks: %v", len(failedBlocks), failedBlocks) - return fmt.Errorf("failed to poll %d blocks: %v", len(failedBlocks), failedBlocks) + allErrors = append(allErrors, fmt.Errorf("failed to poll %d blocks: %v", len(failedBlocks), failedBlocks)) + continue } // ... rest of processing ... } +if len(allErrors) > 0 { + return fmt.Errorf("encountered %d errors during fixing: %v", len(allErrors), allErrors) +}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum
is excluded by!**/*.sum
📒 Files selected for processing (15)
cmd/root.go
(1 hunks)cmd/validate.go
(1 hunks)cmd/validate_and_fix.go
(1 hunks)go.mod
(6 hunks)internal/common/abi.go
(2 hunks)internal/common/trace.go
(3 hunks)internal/orchestrator/validator.go
(1 hunks)internal/rpc/rpc.go
(1 hunks)internal/rpc/serializer.go
(2 hunks)internal/storage/clickhouse.go
(5 hunks)internal/storage/connector.go
(1 hunks)internal/validation/cursor.go
(1 hunks)internal/validation/duplicates.go
(1 hunks)internal/validation/root_calculator.go
(1 hunks)test/mocks/MockIMainStorage.go
(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- internal/common/abi.go
🚧 Files skipped from review as they are similar to previous changes (10)
- cmd/root.go
- internal/rpc/rpc.go
- cmd/validate.go
- internal/common/trace.go
- internal/validation/cursor.go
- internal/rpc/serializer.go
- internal/storage/connector.go
- go.mod
- internal/validation/root_calculator.go
- test/mocks/MockIMainStorage.go
🔇 Additional comments (2)
internal/storage/clickhouse.go (2)
383-384
: Gas and GasUsed Type Compatibility ConfirmedThe
Trace
struct ininternal/common/trace.go
now defines bothGas
andGasUsed
asuint64
(lines 22–23), which maps directly to ClickHouse’sUInt64
type. Removing the prior.Uint64()
conversions is therefore correct—no further changes are needed here.
1893-1893
: Checkblock_number
nullability in your ClickHouse schemaThe
FindMissingBlockNumbers
query uses:… LEFT JOIN existing_blocks e ON s.expected_block_number = e.block_number WHERE e.block_number = 0 …With non-nullable numeric columns in ClickHouse, unmatched rows default to
0
, so this correctly picks gaps. However, if yourblocks.block_number
column is defined asNullable(UInt256)
(or any nullable type), missing rows will beNULL
and won’t satisfy= 0
.Please verify your table schema:
- If
block_number
is non-nullable (e.g.UInt256
), no change is needed.- If it’s nullable, change the filter to one of:
orWHERE isNull(e.block_number)WHERE e.block_number IS NULL
1d1f463
to
0b45721
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (6)
cmd/validate_and_fix.go (2)
70-94
: Consider reusing storage abstraction for ClickHouse connection.The ClickHouse connection is being established directly here with hardcoded configuration, which duplicates the connection logic that likely exists in the storage layer. This creates maintenance overhead and potential configuration drift.
Consider extracting a method from the storage layer to provide the ClickHouse connection, or add a method to the storage interface to get the underlying connection.
96-96
: Add nil check for cursor fields before use.The code directly accesses
cursor.LastScannedBlockNumber
andcursor.MaxBlockNumber
without verifying they are initialized. If cursor initialization fails to set these values, this could cause a nil pointer dereference.internal/storage/clickhouse.go (1)
483-486
:createBlockNumbersClause
may generate invalid SQL because of a trailing comma.
getBlockNumbersStringArray
unconditionally appends a comma after every element. When the slice is non-empty, the resulting clause becomes e.g.block_number IN (123,456,)
which ClickHouse will reject.-func getBlockNumbersStringArray(blockNumbers []*big.Int) string { - blockNumbersString := "" - for _, blockNumber := range blockNumbers { - blockNumbersString += fmt.Sprintf("%s,", blockNumber.String()) - } - return blockNumbersString -} +func getBlockNumbersStringArray(blockNumbers []*big.Int) string { + if len(blockNumbers) == 0 { + return "" + } + var b strings.Builder + for i, n := range blockNumbers { + if i > 0 { + b.WriteByte(',') + } + b.WriteString(n.String()) + } + return b.String() +}Without this fix, every query that relies on block-number filtering (including the new validation CLI) will fail.
Also applies to: 914-920
internal/orchestrator/validator.go (1)
80-85
: Consider returning a distinct validation status for unsupported transaction types.Currently, blocks with unsupported transaction types (> 4) are marked as valid without actual validation. This could mask data integrity issues.
internal/validation/duplicates.go (2)
71-76
:HAVING sum(sign) != 1
may miss certain duplicate scenarios.If rows exist with signs
(+1, +1, -1)
the sum equals+1
, therefore the group is not flagged, yet you still have two live (+1) rows – a duplicate situation. Usingcount() > 1
(orabs(sum(sign)) != 1
) is safer.Also applies to: 100-101, 126-127
227-241
: String-builtIN
clause risks gigantic queries & SQL-injection.
fmt.Sprintf
embeds a comma-separated list of tuples directly into the SQL string. This can create excessively large queries and bypasses ClickHouse query-parameter sanitization.Prefer using parameterized queries or batching with smaller queries.
🧹 Nitpick comments (4)
cmd/validate_and_fix.go (1)
115-117
: Use standard Go documentation format.Use the standard Go documentation format with
//
instead of/** */
.-/** - * Validates a range of blocks (end and start are inclusive) for a given chain and fixes any problems it finds - */ +// validateAndFixRange validates a range of blocks (end and start are inclusive) for a given chain and fixes any problems it findsinternal/storage/clickhouse.go (1)
1867-1912
: Consider clarifying the NULL check condition.The query logic is clever, using ClickHouse's
numbers()
function to generate the expected sequence. However, the conditione.block_number = 0
might be confusing as it's actually checking for NULL values after the LEFT JOIN.Consider adding a comment or using
IS NULL
for clarity:WHERE - e.block_number = 0 + e.block_number IS NULL -- No matching block found in existing_blocksinternal/orchestrator/validator.go (2)
29-34
: Use standard Go documentation format.-/** - * Validate blocks in the range of startBlock to endBlock - * @param startBlock - The start block number (inclusive) - * @param endBlock - The end block number (inclusive) - * @return error - An error if the validation fails - */ +// ValidateBlockRange validates blocks in the range of startBlock to endBlock. +// Both startBlock and endBlock are inclusive.
47-63
: Consider collecting all validation errors instead of stopping on first error.The current implementation stops validation on the first error, which might make debugging difficult when multiple blocks have issues. Consider validating all blocks and returning a summary of errors.
func (v *Validator) ValidateBlocks(blocks []common.BlockData) (validBlocks []common.BlockData, invalidBlocks []common.BlockData, err error) { invalidBlocks = make([]common.BlockData, 0) validBlocks = make([]common.BlockData, 0) + var validationErrors []error for _, blockData := range blocks { valid, err := ValidateBlock(blockData) if err != nil { log.Error().Err(err).Msgf("Block verification failed for block %s", blockData.Block.Number) - return nil, nil, err + validationErrors = append(validationErrors, fmt.Errorf("block %s: %w", blockData.Block.Number, err)) + continue } if valid { validBlocks = append(validBlocks, blockData) } else { invalidBlocks = append(invalidBlocks, blockData) } } + if len(validationErrors) > 0 { + return validBlocks, invalidBlocks, fmt.Errorf("validation errors: %v", validationErrors) + } return validBlocks, invalidBlocks, nil }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum
is excluded by!**/*.sum
📒 Files selected for processing (15)
cmd/root.go
(1 hunks)cmd/validate.go
(1 hunks)cmd/validate_and_fix.go
(1 hunks)go.mod
(6 hunks)internal/common/abi.go
(2 hunks)internal/common/trace.go
(3 hunks)internal/orchestrator/validator.go
(1 hunks)internal/rpc/rpc.go
(1 hunks)internal/rpc/serializer.go
(2 hunks)internal/storage/clickhouse.go
(5 hunks)internal/storage/connector.go
(1 hunks)internal/validation/cursor.go
(1 hunks)internal/validation/duplicates.go
(1 hunks)internal/validation/root_calculator.go
(1 hunks)test/mocks/MockIMainStorage.go
(2 hunks)
🧬 Code Graph Analysis (1)
internal/orchestrator/validator.go (5)
internal/storage/connector.go (1)
IStorage
(67-71)internal/rpc/rpc.go (1)
IRPCClient
(42-54)internal/orchestrator/poller.go (2)
Poller
(22-35)NewBoundlessPoller
(50-72)internal/common/block.go (2)
BlockData
(61-66)Block
(8-33)internal/validation/root_calculator.go (2)
CalculateLogsBloom
(255-289)CalculateTransactionsRoot
(18-253)
✅ Files skipped from review due to trivial changes (1)
- internal/storage/connector.go
🚧 Files skipped from review as they are similar to previous changes (10)
- internal/common/abi.go
- internal/rpc/rpc.go
- cmd/root.go
- internal/rpc/serializer.go
- cmd/validate.go
- internal/common/trace.go
- internal/validation/cursor.go
- go.mod
- internal/validation/root_calculator.go
- test/mocks/MockIMainStorage.go
🧰 Additional context used
🧬 Code Graph Analysis (1)
internal/orchestrator/validator.go (5)
internal/storage/connector.go (1)
IStorage
(67-71)internal/rpc/rpc.go (1)
IRPCClient
(42-54)internal/orchestrator/poller.go (2)
Poller
(22-35)NewBoundlessPoller
(50-72)internal/common/block.go (2)
BlockData
(61-66)Block
(8-33)internal/validation/root_calculator.go (2)
CalculateLogsBloom
(255-289)CalculateTransactionsRoot
(18-253)
🔇 Additional comments (3)
internal/storage/clickhouse.go (3)
383-384
: Type updates are consistent with trace data structure changes.The removal of
.Uint64()
calls aligns with the updated trace data structures mentioned in the PR summary where gas values now use uint64 types directly.Also applies to: 1306-1307
947-947
: Good addition of FINAL keyword for data consistency.Adding the
FINAL
keyword ensures consistent reads by forcing ClickHouse to read the final state of data parts, which is crucial for the deduplication logic that relies on sign-based operations.
1752-1865
: Well-implemented concurrent data fetching for validation.The implementation efficiently fetches blocks, logs, and transactions concurrently using goroutines and channels. Pre-organizing logs and transactions by block number is a smart optimization. The minimal field selection and proper error handling from all goroutines are excellent practices.
0b45721
to
7f6a1dc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
internal/orchestrator/validator.go (1)
80-85
: Consider returning a distinct validation status for unsupported transaction types.Currently, blocks with unsupported transaction types (> 4) are marked as valid without actual validation. This could mask data integrity issues.
internal/validation/root_calculator.go (1)
221-224
: Fix contract creation handling for EIP-7702 transactions.Similar to blob transactions, the
To
field handling for contract creation is incorrect.
🧹 Nitpick comments (1)
internal/orchestrator/validator.go (1)
47-63
: Consider preserving partial validation results on error.When
ValidateBlock
returns an error, the function discards all previously validated blocks by returning nil slices. This could lead to loss of validation progress if an error occurs partway through a large batch.Consider returning the partial results along with the error:
func (v *Validator) ValidateBlocks(blocks []common.BlockData) (validBlocks []common.BlockData, invalidBlocks []common.BlockData, err error) { invalidBlocks = make([]common.BlockData, 0) validBlocks = make([]common.BlockData, 0) for _, blockData := range blocks { valid, err := ValidateBlock(blockData) if err != nil { log.Error().Err(err).Msgf("Block verification failed for block %s", blockData.Block.Number) - return nil, nil, err + return validBlocks, invalidBlocks, err } if valid { validBlocks = append(validBlocks, blockData) } else { invalidBlocks = append(invalidBlocks, blockData) } } return validBlocks, invalidBlocks, nil }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum
is excluded by!**/*.sum
📒 Files selected for processing (15)
cmd/root.go
(1 hunks)cmd/validate.go
(1 hunks)cmd/validate_and_fix.go
(1 hunks)go.mod
(6 hunks)internal/common/abi.go
(2 hunks)internal/common/trace.go
(3 hunks)internal/orchestrator/validator.go
(1 hunks)internal/rpc/rpc.go
(1 hunks)internal/rpc/serializer.go
(2 hunks)internal/storage/clickhouse.go
(5 hunks)internal/storage/connector.go
(1 hunks)internal/validation/cursor.go
(1 hunks)internal/validation/duplicates.go
(1 hunks)internal/validation/root_calculator.go
(1 hunks)test/mocks/MockIMainStorage.go
(2 hunks)
🧬 Code Graph Analysis (2)
internal/validation/root_calculator.go (2)
internal/common/transaction.go (1)
Transaction
(16-51)internal/common/log.go (1)
Log
(15-31)
internal/orchestrator/validator.go (5)
internal/storage/connector.go (1)
IStorage
(67-71)internal/rpc/rpc.go (1)
IRPCClient
(42-54)internal/orchestrator/poller.go (2)
Poller
(22-35)NewBoundlessPoller
(50-72)internal/common/block.go (2)
BlockData
(61-66)Block
(8-33)internal/validation/root_calculator.go (2)
CalculateLogsBloom
(255-289)CalculateTransactionsRoot
(18-253)
✅ Files skipped from review due to trivial changes (2)
- internal/common/abi.go
- cmd/validate.go
🚧 Files skipped from review as they are similar to previous changes (11)
- cmd/root.go
- internal/rpc/rpc.go
- internal/storage/connector.go
- internal/rpc/serializer.go
- internal/common/trace.go
- internal/validation/cursor.go
- cmd/validate_and_fix.go
- internal/storage/clickhouse.go
- go.mod
- internal/validation/duplicates.go
- test/mocks/MockIMainStorage.go
🧰 Additional context used
🧬 Code Graph Analysis (2)
internal/validation/root_calculator.go (2)
internal/common/transaction.go (1)
Transaction
(16-51)internal/common/log.go (1)
Log
(15-31)
internal/orchestrator/validator.go (5)
internal/storage/connector.go (1)
IStorage
(67-71)internal/rpc/rpc.go (1)
IRPCClient
(42-54)internal/orchestrator/poller.go (2)
Poller
(22-35)NewBoundlessPoller
(50-72)internal/common/block.go (2)
BlockData
(61-66)Block
(8-33)internal/validation/root_calculator.go (2)
CalculateLogsBloom
(255-289)CalculateTransactionsRoot
(18-253)
🔇 Additional comments (2)
internal/orchestrator/validator.go (1)
102-137
: Well-implemented batch processing for fixing blocks.The method efficiently processes invalid blocks in configurable batches with proper error handling and logging.
internal/validation/root_calculator.go (1)
255-289
: Correct implementation of logs bloom calculation.The function properly converts logs to go-ethereum types and uses the standard bloom filter creation method.
TL;DR
Added a new
validate
andvalidateAndFix
commands to the indexer to verifiy and fix data integrity issues in the blockchain data.What changed?
validate
command to the CLI that performs data validationvalidateAndFix
command to the CLI that performs data validation and repairFindAndRemoveDuplicates
: Identifies and removes duplicate blocks, transactions, and logsFindAndFixGaps
: Detects and fills in missing blocks in a specified rangeValidateAndFixBlocks
: Verifies block integrity by checking transaction counts, logs bloom filters, and transaction rootsHow to test?
Run the validate command with optional batch size and fix batch size parameters:
Why make this change?
This validation tool helps maintain data integrity in the blockchain indexer by:
These validation capabilities are essential for ensuring the reliability and accuracy of the indexed blockchain data.
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Tests