Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: avoid panic when reading corrupt index in TSM file #24033

Open
wants to merge 2 commits into
base: gw-master-1.x-24033
Choose a base branch
from

Conversation

gggevorgyan
Copy link

@gggevorgyan gggevorgyan commented Jan 11, 2023

… corrupt file)

Added check on indexOfsStart, to make index slice length fit in int32, and count should have >0 value otherwise it is corruption.

  • CHANGELOG.md updated

  • [*] Rebased/mergable

  • [*] Tests pass

  • [*] Sign CLA (if not already signed)

  • Closes #

Required checklist

  • Sample config files updated (both /etc folder and NewDemoConfig methods) (influxdb and plutonium)
  • openapi swagger.yml updated (if modified API) - link openapi PR
  • Signed CLA (if not already signed)

Description

1-3 sentences describing the PR (or link to well written issue)

Context

Why was this added? What value does it add? What are risks/best practices?

Affected areas (delete section if not relevant):

List of user-visible changes. As a user, what would I need to see in docs?
Examples:
CLI commands, subcommands, and flags
API changes
Configuration (sample config blocks)

Severity (delete section if not relevant)

i.e., ("recommend to upgrade immediately", "upgrade at your leisure", etc.)

Note for reviewers:

Check the semantic commit type:

  • Feat: a feature with user-visible changes
  • Fix: a bug fix that we might tell a user “upgrade to get this fix for your issue”
  • Chore: version bumps, internal doc (e.g. README) changes, code comment updates, code formatting fixes… must not be user facing (except dependency version changes)
  • Build: build script changes, CI config changes, build tool updates
  • Refactor: non-user-visible refactoring
  • Check the PR title: we should be able to put this as a one-liner in the release notes

@gggevorgyan
Copy link
Author

Added two checks for indexOfsStart,and count against corruption, to avoid influxdb exit with panic. on start.
More details are here in issue #19916.
The following PR is opened for 1.8 branch only, but it would be good to consider to include fix other branches as well.

@@ -1233,6 +1233,10 @@ func (d *indirectIndex) UnmarshalBinary(b []byte) error {
return fmt.Errorf("indirectIndex: not enough data for index entries count")
}
count := int32(binary.BigEndian.Uint16(b[i : i+indexCountSize]))

if count <= 0 {
return fmt.Errorf("indirectIndex: the count should have >0 value")
Copy link
Contributor

@ayang64 ayang64 Jan 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a better message might include the value that was passed in; something like:

fmt.Errorf("invalid index %d; count must be greater than 0", count)

Copy link
Author

@gggevorgyan gggevorgyan Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done , replaced with your suggested one .
Thank you @ayang64

Comment on lines +1237 to +1239
if count <= 0 {
return fmt.Errorf("invalid index %d; count must be greater than 0", count)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no way for count to be less than 0. All possible values of a uint16 will fit in an int32 and since uint16 is only positive, count will never be less than 0.

count could be 0, though, and we definitely don't want to continue further in this case. It's possible count == 0 is valid, but having logged is good for now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fundamental solution for this case which I have considered was to add CRC for whole Index block as well, as it is done for other parts. But the solution obviously will break backward compatibility.
Are there any mechanism allowing to have support of different incompatible versions of *tsm file, with special hangling?
Another thing I have taken account during implementation is the following check in writer code tsdb/engine/tsm1/writer.go: 708 line , which skips count == 0 cases.

func (t *tsmWriter) WriteIndex() error {
	indexPos := t.n

	if t.index.KeyCount() == 0 {
		return ErrNoValues
	}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unlikely we will introduce an incompatible TSM format at this point in the product's lifecycle.

@@ -1350,7 +1354,7 @@ func (m *mmapAccessor) init() (*indirectIndex, error) {

indexOfsPos := len(m.b) - 8
indexStart := binary.BigEndian.Uint64(m.b[indexOfsPos : indexOfsPos+8])
if indexStart >= uint64(indexOfsPos) {
if indexStart >= uint64(indexOfsPos) || (uint64(indexOfsPos)-indexStart) > math.MaxInt32 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This added check ensures that index can be accessed using signed 32-bit offsets. Since int32 index offsets are used internally (see indirectIndex.UnmarshalBinary), I think this is a correct check. I would prefer a distinct error message for this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does look like the correct check. The error message (all the error messages in this method) should include the file name, which we have through m.f.Name()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the most effective way to wrap the file name onto error massage is the line 240 in tsdb/engine/tsm1/reader.go.
What go you think @davidby-influx ?

index, err := t.accessor.init()
	if err != nil {
		**_return nil, err_**
	}

@gwossum gwossum changed the base branch from 1.8 to master-1.x February 16, 2024 21:54
@gwossum gwossum changed the base branch from master-1.x to gw-master-1.x-24033 February 16, 2024 22:50
@gwossum gwossum changed the title Fix for critical issue #19916 (influxdb server crashed when reading a… fix: avoid panic when reading corrupt index in TSM file Feb 16, 2024
@gwossum
Copy link
Member

gwossum commented Feb 16, 2024

@gggevorgyan I would like to merge this PR and for you to get credit for it, but we have some checks on commit messages that are failing. InfluxData requires that all commit messages meet semantic commit guidelines (https://www.conventionalcommits.org/en/v1.0.0/). Starting the commit message with fix: should work. Would it be possible for you to rewrite the commit messages and then force push those to your pull request branch? Thanks!

@gggevorgyan gggevorgyan force-pushed the 19916 branch 3 times, most recently from 9be14a8 to 5ee614f Compare February 19, 2024 10:42
…n reading a corrupt file)

Added check on indexOfsStart, to make index slice length fit in int32, and count should have >0 value otherwise it is corruption.
- [ ] CHANGELOG.md updated
- [*] Rebased/mergable
- [*] Tests pass
- [*] Sign [CLA](https://influxdata.com/community/cla/) (if not already signed)
@gggevorgyan
Copy link
Author

Thank you @gwossum,
I have added fix: prefixes to boith commits

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants