
Implement zlib compression #1487

Merged
merged 97 commits into go-sql-driver:master on Dec 19, 2024

Conversation

@joe-mann (Contributor) commented Oct 5, 2023

Description

Implemented the MySQL compression protocol. This new feature is enabled by:

  • Adding compress=true in the DSN.
  • cfg.Apply(Compress(true))

A new PR to revive, rebase, and complete #649.
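
For illustration, a minimal sketch of the DSN form above, assuming a local server; the compress parameter is the one this PR adds, while the host, credentials, and database name are placeholders:

package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// compress=true enables the zlib-compressed protocol added by this PR.
	db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/dbname?compress=true")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Compression is negotiated during the handshake, so a Ping exercises it.
	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
}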

Checklist

  • Code compiles correctly
  • Created tests which fail without the change (if possible)
  • All tests passing
  • Extended the README / documentation, if necessary
  • Added myself / the copyright holder to the AUTHORS file

Summary by CodeRabbit

  • New Features
    • Introduced zlib compression support in the MySQL driver for Go, including a compress parameter for compression modes.
    • Added new authors B Lamarche and Joe Mann to the project.
  • Bug Fixes
    • Improved error handling for connection issues and enhanced error messages during database connection attempts.
  • Documentation
    • Updated README.md to detail new compression features and parameters.
  • Tests
    • Enhanced benchmark and unit tests to cover new compression functionality and improved error handling.
  • Chores
    • Updated various internal components to support compression during MySQL communication, including packet reading/writing and connection handling.

Brigitte Lamarche and others added 30 commits August 11, 2017 15:38
* rows: implement driver.RowsColumnTypeScanType

Implementation for time.Time not yet complete!

* rows: implement driver.RowsColumnTypeNullable

* rows: move fields related code to fields.go

* fields: use NullTime for nullable datetime fields

* fields: make fieldType its own type

* rows: implement driver.RowsColumnTypeDatabaseTypeName

* fields: fix copyright year

* rows: compile time interface implementation checks

* rows: move tests to versioned driver test files

* rows: cache parseTime in resultSet instead of mysqlConn

* fields: fix string and time types

* rows: implement ColumnTypeLength

* rows: implement ColumnTypePrecisionScale

* rows: fix ColumnTypeNullable

* rows: ColumnTypes tests part1

* rows: use keyed composite literals in ColumnTypes tests

* rows: ColumnTypes tests part2

* rows: always use NullTime as ScanType for datetime

* rows: avoid errors through rounding of time values

* rows: remove parseTime cache

* fields: remove unused scanTypes

* rows: fix ColumnTypePrecisionScale implementation

* fields: sort types alphabetical

* rows: remove ColumnTypeLength implementation for now

* README: document ColumnType Support
AWS Aurora returns error 1290 after failing over, requiring the connection to
be closed and opened again before writes can be performed.
Most forks aren't registered with goveralls, so this command in travis.yml
previously failed and caused the build to fail.

Now, it doesn't!
* Drop support for Go 1.6 and lower

* Remove cloneTLSConfig for legacy Go versions
…#623)

* Added support for custom string types.

* Add author name

* Added license header

* Added a newline to force a commit.

* Remove newline.
* Also add conversions for additional types in ConvertValue
  ref golang/go@d7c0de9
* Fixed broken import for appengine/cloudsql

appengine.go
The import path of appengine/cloudsql has changed to google.golang.org/appengine/cloudsql. Fixed.

* Added my name to the AUTHORS
* Differentiate between BINARY and CHAR

When looking up the database type name, we now check the character set
for the following field types:
 * CHAR
 * VARCHAR
 * BLOB
 * TINYBLOB
 * MEDIUMBLOB
 * LONGBLOB

If the character set is 63 (which is the binary pseudo character set),
we return the binary names, which are (respectively):
 * BINARY
 * VARBINARY
 * BLOB
 * TINYBLOB
 * MEDIUMBLOB
 * LONGBLOB

If any other character set is in use, we return the text names, which
are (again, respectively):
 * CHAR
 * VARCHAR
 * TEXT
 * TINYTEXT
 * MEDIUMTEXT
 * LONGTEXT

To facilitate this, mysqlField has been extended to include a uint8
field for character set, which is read from the appropriate packet.

Column type tests have been updated to ensure coverage of binary and
text types.
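
A small illustrative sketch of the selection rule this commit describes (the names here are placeholders, not the driver's real identifiers):

const binaryCharset = 63 // MySQL's binary pseudo character set

// typeName picks the reported database type name from the field's charset.
func typeName(textName, binaryName string, charset uint8) string {
	if charset == binaryCharset {
		return binaryName
	}
	return textName
}

// typeName("CHAR", "BINARY", 63) == "BINARY"
// typeName("TEXT", "BLOB", 224)  == "TEXT" (utf8mb4)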

* Increase test coverage for column types
* Fix prepared statement

When there are many args and maxAllowedPacket is not large enough,
writeExecutePacket() attempted to use STMT_LONG_DATA even for a
0-byte string. But writeCommandLongData() doesn't support 0-byte
data, so a malformed packet was sent.

This commit loosens the threshold for using STMT_LONG_DATA.

* Change minimum size of LONG_DATA to 64 bytes

* Add test which reproduces issue 730

* TestPreparedManyCols: test only the numParams = 65535 case

* s/as possible//
@dveeden (Contributor) commented Dec 3, 2024

Are you planning to make this work for both zlib and zstd? Maybe instead of compress=true it should be compress=zlib/compress=zstd/compress=zlib,zstd? Or do you plan on a separate compress-algorithm=zlib?

@methane (Member) commented Dec 3, 2024

In the next release, no.
Next year, yes. But I don't want to introduce a dependency that increases binary size.

This is the API I have in mind:

import _ "github.com/go-sql-driver/mysql/zstd" // enable zstd compression

I will use https://github.com/klauspost/compress as a zstd library.
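
A hypothetical sketch of how such a blank-import package could register the algorithm; neither the mysql/zstd subpackage nor a RegisterCompressor hook exists in the driver, they only illustrate the Go pattern being proposed:

package zstd

import (
	"io"

	"github.com/go-sql-driver/mysql"     // hypothetical registration hook
	"github.com/klauspost/compress/zstd" // the library mentioned above
)

func init() {
	// RegisterCompressor is hypothetical; the real API is still undecided.
	mysql.RegisterCompressor("zstd", func(w io.Writer) (io.WriteCloser, error) {
		return zstd.NewWriter(w)
	})
}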

@dveeden (Contributor) commented Dec 3, 2024

@methane It's OK not to include that now. But I think the options and/or API should be designed with multiple compression algorithms in mind.

Using an import to enable zstd would make it a compile-time option. Maybe some applications need this as a runtime option.

@methane (Member) commented Dec 3, 2024

Why do you think a runtime option is needed?

I don't like thousands of options. Once I understand that an option to disable zstd even when it's available is needed by more than 1% of users, I will add it.

@dveeden (Contributor) commented Dec 3, 2024

@methane I agree that reducing options is generally better for users and may also remove complexity for developers.

However, there are situations where the user may want to choose the compression algorithm. To make things worse, there is also a compression level option in the protocol.

As an example, this is what mysql (a.k.a. the MySQL client) does:

mysql --help
...
  -C, --compress      Use compression in server/client protocol.
...
  --compression-algorithms=name 
                      Use compression algorithm in server/client protocol.
                      Valid values are any combination of
                      'zstd','zlib','uncompressed'.
  --zstd-compression-level=# 
                      Use this compression level in the client/server protocol,
                      in case --compression-algorithms=zstd. Valid range is
                      between 1 and 22, inclusive. Default is 3.

I'm not saying we should do the same, especially as mysql was created when zlib was the only option and had to add zstd later.

Situations where a runtime option would be needed/useful:

  • Benchmarking. Using some workload with compression disabled, with zlib and with zstd. And then compare the performance of these 3 options. Besides comparing performance one may also want to compare the efficiency of the compression.
  • Compatibility. Maybe some servers don't work with a client that has both zlib and zstd enabled.
  • Balancing bandwidth and CPU: For large data imports one might want to limit the CPU overhead while still reducing the bandwidth. Here zlib might in some cases be better than zstd with the default compression level.

@methane (Member) commented Dec 4, 2024

None of your use cases changes my mind.
And zstd support is off-topic.

@methane marked this pull request as ready for review December 14, 2024 09:54
@methane requested review from shogo82148 and removed the request for dolmen December 14, 2024 09:54
@coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (16)
compress.go (1)

56-56: Fix typo in comment: 'chuecksum' should be 'checksum'

Apply this diff to correct the typo:

-	err = zr.Close()         // zr.Close() may return chuecksum error.
+	err = zr.Close()         // zr.Close() may return checksum error.
packets.go (2)

Line range hint 432-438: Ensure sequence synchronization by calling mc.syncSequence() after writing command packets

In writeCommandPacket, mc.syncSequence() is not called after mc.writePacket(data). This may lead to sequence number mismatches when compression is enabled.

Apply this diff to synchronize sequences:

 func (mc *mysqlConn) writeCommandPacket(command byte) error {
 	// Reset Packet Sequence
 	mc.resetSequence()

 	data, err := mc.buf.takeSmallBuffer(4 + 1)
 	if err != nil {
 		return err
 	}

 	// Add command byte
 	data[4] = command

 	// Send CMD packet
-	return mc.writePacket(data)
+	err = mc.writePacket(data)
+	mc.syncSequence()
+	return err
 }

Line range hint 470-480: Ensure sequence synchronization by calling mc.syncSequence() after writing command packets

Similarly, in writeCommandPacketUint32, consider calling mc.syncSequence() after writing the packet to maintain proper sequence numbers when compression is enabled.

Apply this diff to synchronize sequences:

 func (mc *mysqlConn) writeCommandPacketUint32(command byte, arg uint32) error {
 	// Reset Packet Sequence
 	mc.resetSequence()

 	data, err := mc.buf.takeSmallBuffer(4 + 1 + 4)
 	if err != nil {
 		return err
 	}

 	// Add command byte
 	data[4] = command

 	// Add arg [32 bit]
 	data[5] = byte(arg)
 	data[6] = byte(arg >> 8)
 	data[7] = byte(arg >> 16)
 	data[8] = byte(arg >> 24)

 	// Send CMD packet
-	return mc.writePacket(data)
+	err = mc.writePacket(data)
+	mc.syncSequence()
+	return err
 }
compress_test.go (2)

37-51: Simplify error handling logic

The error handling can be made more explicit and clearer.

Consider this improvement:

func uncompressHelper(t *testing.T, mc *mysqlConn, compressedPacket []byte) []byte {
	conn := new(mockConn)
	conn.data = compressedPacket
	mc.netConn = conn

	uncompressedPacket, err := mc.readPacket()
-	if err != nil {
-		if err != io.EOF {
-			t.Fatalf("non-nil/non-EOF error when reading contents: %s", err.Error())
-		}
+	if err != nil && err != io.EOF {
+		t.Fatalf("unexpected error when reading contents: %v", err)
	}
	return uncompressedPacket
}

59-119: Consider adding edge cases to test suite

The test suite is comprehensive but could benefit from additional edge cases:

  • Empty packet (0 bytes)
  • Maximum packet size (16MB)
  • Packets with special characters or null bytes
  • Packets with repeated patterns that test compression efficiency

Example additional test cases:

 tests := []struct {
 	uncompressed []byte
 	desc         string
 }{
+	{uncompressed: []byte{},
+		desc: "empty packet"},
+	{uncompressed: make([]byte, 1<<24-1),
+		desc: "max packet size"},
+	{uncompressed: bytes.Repeat([]byte{0}, 1000),
+		desc: "null bytes"},
+	{uncompressed: []byte("Hello\x00World\n\r\t"),
+		desc: "special characters"},
 	// ... existing test cases ...
 }
const.go (1)

14-14: Improve documentation for debug constant

The debug constant's purpose and impact should be better documented. Consider adding:

  • What debugging information is exposed when enabled
  • Security implications of enabling it in production
  • Where the debug output is directed
-	debug = false // for debugging. Set true only in development.
+	debug = false // Controls detailed logging of compression operations.
+	              // WARNING: Setting this to true in production may impact
+	              // performance and expose sensitive information in logs.
+	              // Use only during development.
buffer.go (2)

18-21: Enhance readerFunc documentation

The documentation for readerFunc could be more comprehensive.

-// readerFunc is a function that compatible with io.Reader.
-// We use this function type instead of io.Reader because we want to
-// just pass mc.readWithTimeout.
+// readerFunc is a function type that implements the reading behavior of io.Reader.
+// It is used instead of io.Reader interface to allow direct usage of
+// connection-specific read methods (like readWithTimeout) without wrapper types.
+// This improves performance by avoiding interface allocations and indirection.

Line range hint 46-85: Consider adding error documentation for fill method

The fill method should document possible error types it can return.

-func (b *buffer) fill(need int, r readerFunc) error {
+// fill reads into the buffer until at least need bytes are available.
+// It returns io.ErrUnexpectedEOF if EOF is encountered before need bytes are read,
+// io.EOF if exactly need bytes are read before EOF, or any other error encountered
+// during reading.
+func (b *buffer) fill(need int, r readerFunc) error {
infile.go (1)

Line range hint 1-24: Consider adding compression-related documentation

Since this file handles data transmission, it would be helpful to add documentation about how compression affects the LOCAL INFILE protocol.

connection_test.go (1)

22-22: Consider adding compression-specific test cases

While the buffer initialization changes are correct, we should add test cases specifically for compression scenarios.

Consider adding tests for:

  1. Compressed packet handling
  2. Sequence synchronization with compression enabled
  3. Edge cases with large compressed packets

Also applies to: 42-42, 69-69, 86-86, 102-102, 164-164, 181-181

packets_test.go (1)

Line range hint 276-306: Consider consolidating buffer resets

The test has multiple buffer reset operations that could be consolidated into a helper function to improve maintainability.

+func resetTestBuffer(mc *mysqlConn) {
+    mc.buf = newBuffer()
+    mc.sequence = 0
+}
+
 // reset
 conn.closed = false
 conn.reads = 0
-mc.sequence = 0
-mc.buf = newBuffer()
+resetTestBuffer(mc)
benchmark_test.go (1)

70-72: Consider adding compression level benchmarks

While the basic compression benchmark is good, consider adding benchmarks with different compression levels to help users optimize their configuration.

+func BenchmarkQueryCompressionLevels(b *testing.B) {
+    levels := []int{1, 3, 6, 9}  // Different compression levels
+    for _, level := range levels {
+        b.Run(fmt.Sprintf("level_%d", level), func(b *testing.B) {
+            benchmarkQueryHelper(b, true)  // TODO: Add compression level parameter
+        })
+    }
+}
dsn.go (1)

542-546: Consider improving the error message for invalid compression values.

The error handling is correct, but the message could be more helpful by explicitly stating the valid values.

-				return errors.New("invalid bool value: " + value)
+				return errors.New("invalid value for compress: " + value + " (valid values: true, false, 1, 0)")
utils.go (1)

493-503: Consider enhancing the documentation for 24-bit integer functions.

While the implementation is correct, the documentation could be more detailed to explain the byte order and usage context.

-// 24bit integer: used for packet headers.
+// putUint24 writes a 24-bit integer in little-endian byte order.
+// It is used for MySQL packet headers where the first 3 bytes represent
+// the payload length.
 
 func putUint24(data []byte, n int) {
     data[2] = byte(n >> 16)
     data[1] = byte(n >> 8)
     data[0] = byte(n)
 }
 
+// getUint24 reads a 24-bit integer in little-endian byte order.
+// It is used for MySQL packet headers where the first 3 bytes represent
+// the payload length.
 func getUint24(data []byte) int {
     return int(data[2])<<16 | int(data[1])<<8 | int(data[0])
 }
README.md (2)

41-41: Fix markdown list indentation.

The feature list item should maintain consistent indentation with other items.

-  * Supports zlib compression.
+* Supports zlib compression.
🧰 Tools
🪛 Markdownlint (0.37.0)

41-41: Expected: 0; Actual: 2
Unordered list indentation

(MD007, ul-indent)


273-279: Add language identifier to the code block.

The code block should specify the language for proper syntax highlighting.

-```
+```properties
 Type:           bool
 Valid Values:   true, false
 Default:        false
🧰 Tools
🪛 Markdownlint (0.37.0)

273-273: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77d86ec and d58a709.

📒 Files selected for processing (17)
  • .github/workflows/test.yml (1 hunks)
  • AUTHORS (2 hunks)
  • README.md (2 hunks)
  • benchmark_test.go (7 hunks)
  • buffer.go (5 hunks)
  • compress.go (1 hunks)
  • compress_test.go (1 hunks)
  • connection.go (3 hunks)
  • connection_test.go (7 hunks)
  • connector.go (2 hunks)
  • const.go (1 hunks)
  • driver_test.go (3 hunks)
  • dsn.go (4 hunks)
  • infile.go (1 hunks)
  • packets.go (13 hunks)
  • packets_test.go (8 hunks)
  • utils.go (1 hunks)
🧰 Additional context used
🪛 Markdownlint (0.37.0)
README.md

41-41: Expected: 0; Actual: 2
Unordered list indentation

(MD007, ul-indent)


273-273: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 golangci-lint (1.62.2)
compress_test.go

20-20: Error return value of rand.Read is not checked

(errcheck)

compress.go

171-171: Error return value is not checked

(errcheck)

driver_test.go

182-182: Error return value of db.Exec is not checked

(errcheck)


190-190: Error return value of db2.Exec is not checked

(errcheck)


198-198: Error return value of db3.Exec is not checked

(errcheck)

🔇 Additional comments (18)
connection.go (2)

31-41: LGTM: Addition of compression-related fields in mysqlConn struct

The added fields compIO, compressSequence, and compress correctly integrate compression handling into the connection struct.


67-85: Proper implementation of timeout handling in readWithTimeout and writeWithTimeout

The functions correctly set read and write deadlines based on the configuration, enhancing robustness.
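
As a rough illustration of the deadline pattern this refers to (a sketch assuming a plain net.Conn and a configured timeout, not the driver's exact code):

package main

import (
	"net"
	"time"
)

// readWithTimeout mirrors the pattern described above: set an optional
// read deadline from the configuration before every read.
func readWithTimeout(conn net.Conn, timeout time.Duration, b []byte) (int, error) {
	if timeout > 0 {
		if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
			return 0, err
		}
	}
	return conn.Read(b)
}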

.github/workflows/test.yml (1)

86-86: LGTM: Increasing max_allowed_packet to 48MB

The increase is appropriate to support larger packet sizes required by the zlib compression implementation.

compress_test.go (2)

18-22: ⚠️ Potential issue

Handle error returned by rand.Read

The function does not check the error returned by rand.Read. Ignoring this error can lead to unexpected behavior if the random number generator fails.

Apply this fix:

func makeRandByteSlice(size int) []byte {
	randBytes := make([]byte, size)
-	rand.Read(randBytes)
+	if _, err := rand.Read(randBytes); err != nil {
+		panic(fmt.Sprintf("failed to generate random bytes: %v", err))
+	}
	return randBytes
}
🧰 Tools
🪛 golangci-lint (1.62.2)

20-20: Error return value of rand.Read is not checked

(errcheck)


24-35: ⚠️ Potential issue

Add sequence number synchronization

The function should call mc.syncSequenceNr() after writing the packet to maintain proper sequence numbering, as mentioned in previous reviews.

Apply this fix:

func compressHelper(t *testing.T, mc *mysqlConn, uncompressedPacket []byte) []byte {
	conn := new(mockConn)
	mc.netConn = conn

	err := mc.writePacket(append(make([]byte, 4), uncompressedPacket...))
	if err != nil {
		t.Fatal(err)
	}
+	mc.syncSequenceNr()

	return conn.written
}

Likely invalid or redundant comment.

infile.go (1)

175-175: Verify sequence synchronization with compression enabled

The addition of syncSequence() is essential for maintaining proper packet sequencing when compression is enabled. However, we should verify its behavior in edge cases.

✅ Verification successful

Based on the search results, I can now generate the final response:

Sequence synchronization is correctly implemented

The syncSequence() call at line 175 is correctly placed and follows the established pattern in the codebase. Key observations:

  • It's called after writing the final packet but before reading the response, consistent with other implementations
  • The pattern writePacket() followed by syncSequence() is used in multiple places for maintaining proper packet ordering
  • The compression-enabled path properly handles sequence synchronization through the writePackets abstraction
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential race conditions or edge cases in sequence handling
# Look for other places where sequence synchronization might be needed

# Search for packet write operations that might need sequence sync
rg -A 3 "writePacket|readPacket" 

# Look for compression-related sequence handling
ast-grep --pattern 'syncSequence()' 

Length of output: 5728

AUTHORS (1)

24-24: LGTM! Author entries are properly formatted and ordered

The new author entries follow the file's guidelines and are correctly placed in alphabetical order.

Also applies to: 66-66

connector.go (2)

169-172: LGTM! Proper compression initialization

The compression initialization correctly checks both the client configuration and server capabilities before enabling compression. This follows the MySQL protocol specification for compression negotiation.
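
For context, a sketch of what such a negotiation check typically looks like; clientCompress mirrors the protocol's CLIENT_COMPRESS capability flag (0x0020), and the surrounding names are illustrative rather than the driver's exact code:

const clientCompress uint32 = 0x0020 // CLIENT_COMPRESS capability flag

// negotiateCompression enables compression only when the client asked for
// it *and* the server advertised support in its handshake packet.
func negotiateCompression(wantCompress bool, serverCapabilities uint32) bool {
	return wantCompress && serverCapabilities&clientCompress != 0
}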


130-130: Verify buffer initialization without netConn

The buffer is now initialized without the network connection. Ensure that error handling for buffer operations is properly implemented throughout the codebase.

✅ Verification successful

Let me gather more context about the buffer implementation and usage to make a more informed decision.


Let me check how the buffer is used with the network connection and error handling patterns.


From the gathered information, I can now provide the final response:

Buffer initialization change is safe and well-implemented

The change to initialize the buffer without netConn is actually an improvement in design. The buffer struct is a simple wrapper around byte slices with proper initialization in newBuffer(). Error handling is comprehensive throughout the codebase as evidenced by the numerous error checks found. The buffer operations are properly decoupled from the network connection, with errors being consistently propagated up the call stack.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for error handling patterns around buffer operations
rg -A 3 "mc\.buf\." | grep -E "if.*err|return.*err"

Length of output: 536


Script:

#!/bin/bash
# Check buffer implementation and its error handling
ast-grep --pattern 'func newBuffer() $_' -A 10

# Find buffer read/write operations
rg -A 3 "mc\.buf\.(Read|Write|Flush)"

Length of output: 711


Script:

#!/bin/bash
# Check buffer usage with network connection
rg -A 5 "type buffer struct" 
rg -A 5 "func \(b \*buffer\) (Read|Write|fill)"

# Check how errors are propagated from buffer operations
ast-grep --pattern 'if err := $_.buf.$_; err != nil {
  $$$
}'

Length of output: 687

packets_test.go (1)

Line range hint 101-117: LGTM! Consistent buffer initialization in tests

The buffer initialization has been updated consistently with the new pattern across test cases while maintaining the original test logic.

benchmark_test.go (1)

49-54: LGTM! Clean compression parameter handling

The compression parameter is cleanly integrated into the DSN string construction while maintaining backward compatibility.

dsn.go (3)

76-79: LGTM! Well-structured addition of the compress field.

The placement in the unexported fields section and documentation are appropriate.


130-136: LGTM! Clean implementation of the EnableCompression option.

The function follows the established functional options pattern consistently.


310-312: LGTM! Consistent DSN formatting for compression.

The implementation follows the established pattern for boolean parameters in the DSN string.

driver_test.go (4)

150-154: Improved error message format for connection failures.

The error message now includes the DSN in the error output, which helps with debugging connection issues.


167-174: New test case setup for compression feature.

Added test connection with compression enabled via compress=true DSN parameter. This aligns with the PR objective of implementing zlib compression.


195-201: Test coverage for compression feature.

Good addition of test coverage for the new compression feature. The test ensures that the same test cases work with compression enabled.

🧰 Tools
🪛 golangci-lint (1.62.2)

198-198: Error return value of db3.Exec is not checked

(errcheck)


981-983: Improved error handling in datetime tests.

The changes add proper error handling for row scanning operations in datetime tests.

Also applies to: 988-990

@methane (Member) commented Dec 18, 2024

@shogo82148 Do you have time to review this this year?
I want to merge this and wait for a month before releasing 1.9.0. Some users will use the master branch.

@shogo82148 (Contributor) left a comment

LGTM

@methane merged commit 3348e57 into go-sql-driver:master on Dec 19, 2024
38 checks passed
@coderabbitai bot mentioned this pull request Dec 19, 2024
Successfully merging this pull request may close these issues:

  • Add Compression Mode