Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to reuse batch if append failed #1223

Open
JILeXanDR opened this issue Mar 6, 2024 · 4 comments
Open

Allow to reuse batch if append failed #1223

JILeXanDR opened this issue Mar 6, 2024 · 4 comments

Comments

@JILeXanDR
Copy link

JILeXanDR commented Mar 6, 2024

Describe the bug

Steps to reproduce

  1. Prepare valid and bad data in any order
  2. Prepare batch
  3. Call AppendStruct in loop
  4. See logs

Expected behaviour

Currently it's not possible to understand what row is corrupted, even if 1 row of 10000 will have some invalid value, it affects all next data in the current batch.

There are 2 possible ways to give some flexibility:

  1. when AppendStruct detects any problem with current struct data it returns an error and doesn't append corrupted values to batch, developer must decide what to do with that error on his own, but it should be possible to continue and skip that rows

  2. when AppendStruct detects any problem with current struct data it returns an error, appends row like now, but any next calls of AppendStruct with valid data will be succeed

Code example

func AppendStructWithBadData() error {
	conn, err := GetNativeConnection(nil, nil, nil)
	if err != nil {
		return err
	}
	ctx := context.Background()
	defer func() {
		conn.Exec(ctx, "DROP TABLE example")
	}()
	if err := conn.Exec(ctx, `DROP TABLE IF EXISTS example`); err != nil {
		return err
	}
	if err := conn.Exec(ctx, `
		CREATE TABLE example (
			  Col1 String
			, Col2 DateTime
		) Engine = Memory
		`); err != nil {
		return err
	}

	batch, err := conn.PrepareBatch(context.Background(), "INSERT INTO example")
	if err != nil {
		return err
	}

	data := []struct {
		Col1 string
		Col2 time.Time
	}{
		{
			Col1: "valid data", // no error
			Col2: time.Now(),
		},
		{
			Col1: "bad data", // error=clickhouse: dateTime overflow. Col2 must be between 1970-01-01 00:00:00 and 2105-12-31 23:59:59
			Col2: time.Time{},
		},
		{
			Col1: "valid data", // error=clickhouse: dateTime overflow. Col2 must be between 1970-01-01 00:00:00 and 2105-12-31 23:59:59: clickhouse: batch is invalid. check appended data is correct
			Col2: time.Now(),
		},
	}

	for i, r := range data {
		err := batch.AppendStruct(&r)
		if err != nil {
			fmt.Printf("AppendStruct failed: index=%d, error=%+v\n", i, err.Error())
		} else {
			fmt.Printf("AppendStruct succed: index=%d\n", i)
		}
	}

	fmt.Printf("send batch: rows=%d\n", batch.Rows())

	return batch.Send()
}

Error log

AppendStruct succed: index=0
AppendStruct failed: index=1, error=clickhouse: dateTime overflow. Col2 must be between 1970-01-01 00:00:00 and 2105-12-31 23:59:59
AppendStruct failed: index=2, error=clickhouse: dateTime overflow. Col2 must be between 1970-01-01 00:00:00 and 2105-12-31 23:59:59: clickhouse: batch is invalid. check appended data is correct

send batch: rows=2

        	Error:      	Received unexpected error:
        	            	clickhouse: batch is invalid. check appended data is correct
        	            	clickhouse: dateTime overflow. Col2 must be between 1970-01-01 00:00:00 and 2105-12-31 23:59:59

Configuration

Environment

  • Client version:
  • Language version:
  • OS:
  • Interface: ClickHouse API / database/sql compatible driver

ClickHouse server

  • ClickHouse Server version:
  • ClickHouse Server non-default settings, if any:
  • CREATE TABLE statements for tables involved:
  • Sample data for all these tables, use clickhouse-obfuscator if necessary
@JILeXanDR JILeXanDR added the bug label Mar 6, 2024
@jkaflik
Copy link
Contributor

jkaflik commented Mar 6, 2024

Hi @JILeXanDR

This is the current behavior of batch insertion. On the very first error for data append, the connection is released, and all next append calls will return the previous error.

This error is intentionally wrapped with

ErrBatchInvalid = errors.New("clickhouse: batch is invalid. check appended data is correct")

I agree it might be counter-intuitive, but also, I don't see a good solution here. There might be client-side data validation, where we can recover (like in this case), however we cannot recover for errors sent from ClickHouse.

@jkaflik jkaflik closed this as completed Mar 7, 2024
@jkaflik jkaflik reopened this Mar 7, 2024
@yujiarista
Copy link
Contributor

however we cannot recover for errors sent from ClickHouse.

But any error from Append is local rather than from Clickhouse right?

@jkaflik
Copy link
Contributor

jkaflik commented Mar 25, 2024

@yujiarista @JILeXanDR
at the end of day, I agree this should not break batch. This requires enhancement.

@jkaflik jkaflik removed the bug label Mar 27, 2024
@jkaflik jkaflik added this to the v3.0.0-WIP milestone Mar 27, 2024
@jkaflik
Copy link
Contributor

jkaflik commented Mar 27, 2024

@yujiarista @JILeXanDR I had a look on this today and found a discussion (I forgot about it 🤦 ) we already had in the past on this: #655

tl;dr given columnar append, it's not trivial to guarantee reusable batch without data corruption. I still agree it needs to be enhanced, but not sooner than in v3.

@jkaflik jkaflik changed the title Strange behaviour of AppendStruct with partially bad data Allow to reuse batch if append failed Mar 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants