
Replace univocity-parsers with FastCSV #4606


Draft · wants to merge 37 commits into base: main

Conversation

Contributor

@vdmitrienko vdmitrienko commented Jun 1, 2025

Overview

#4339


I hereby agree to the terms of the JUnit Contributor License Agreement.


Definition of Done

Member

@marcphilipp marcphilipp left a comment


This looks very promising! 👍

Comment on lines +113 to +115
* The `CsvFileSource.lineSeparator()` parameter is deprecated because line separators
are now detected automatically during CSV parsing. This setting is no longer required
and will be ignored.
Member

Does auto-detection work in all cases? What happens if \n is used in a cell like in the following example with 4 columns?

a;b;\n
c;d\r\n
e;f;g;h\r\n

(assuming \n and \r are replaced with the corresponding character)

Contributor Author

Does auto-detection work in all cases?

The auto-detection treats each of \r, \n, and \r\n as a line separator. For example, given the following input:

a;b\r
c;d\n
e;f\r\n
g;h

The result is:

[["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"]]

In contrast, univocity-parsers (when configured with \n as the line separator) produces different results:

[["a", "b\rc", "d"], ["e", "f"], ["g", "h"]]
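The auto-detection semantics can be reproduced with a minimal, self-contained sketch (plain Java standing in for FastCSV; the `parse` method below is a hypothetical simplification that ignores quoting):

```java
import java.util.Arrays;
import java.util.List;

public class LineSeparatorDemo {

    // Hypothetical simplification of the auto-detection described above:
    // \r\n, \r, and \n are each treated as a record separator (quoting ignored).
    static List<String[]> parse(String input, String fieldSeparator) {
        return Arrays.stream(input.split("\\r\\n|\\r|\\n"))
                .map(line -> line.split(fieldSeparator, -1))
                .toList();
    }

    public static void main(String[] args) {
        List<String[]> records = parse("a;b\rc;d\ne;f\r\ng;h", ";");
        // Each of the three separator styles produces its own record boundary.
        records.forEach(record -> System.out.println(String.join(",", record)));
    }
}
```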

Contributor Author

What happens if \n is used in a cell like in the following example with 4 columns?

In this case, the results from FastCSV and univocity-parsers are mostly similar.

FastCSV:

[["a", "b", null], ["c", "d"], ["e", "f", "g", "h"]]

univocity-parsers:

// .lineSeparator("\n") - same as FastCSV
[["a", "b", null], ["c", "d"], ["e", "f", "g", "h"]]

// .lineSeparator("\r\n") - same as FastCSV
[["a", "b", null], ["c", "d"], ["e", "f", "g", "h"]]

// .lineSeparator("\r") - differs from FastCSV
[["a", "b", "c", "d"], ["e", "f", "g", "h"], [null]]


I’m afraid this breaks compatibility if someone uses a character sequence as a line delimiter that is not a newline.

Contributor Author

@vdmitrienko vdmitrienko Jun 2, 2025

So, considering the 3 possible scenarios, all of them imply a breaking change 😞

  1. User explicitly relies on \r\n as the line separator:
    \r - causes an unexpected line break;
    \n - causes an unexpected line break;

  2. User explicitly relies on \r as the line separator:
    \n - causes an unexpected line break;
    \r\n - no change, since \r is already interpreted as a line break;

  3. User explicitly relies on \n as the line separator:
    \r - causes an unexpected line break;
    \r\n - no change, since \n is already interpreted as a line break;

Contributor Author

@osiegmar, would it be possible to add support for a lineSeparator() parameter in FastCSV?


Potentially, yes. Of course, this wouldn’t be a valid CSV file at all. Is this really a desired feature or just lack of specification/documentation and a good chance to change that with the new major version of JUnit?

Is there a (good) reason why someone would separate text records by anything that is not a newline sequence?

Is there known usage of this?

Member

@marcphilipp marcphilipp Jun 3, 2025

I think dropping this in the new major version makes sense. I'm not aware of any use cases other than using the same line separator on different operating systems. IIRC I initially introduced it because univocity-parsers would use the system line separator (and only that) by default.

Contributor Author

I think dropping this in the new major version makes sense.

Great 👍
I'll make sure to clarify that in the release notes. Adding a few tests wouldn't hurt either.

Contributor Author

Just to clarify, do we intend to remove lineSeparator() entirely, or simply deprecate it? I initially considered deprecation, but now I’m not sure it’s appropriate here, since deprecation generally implies the feature will continue to function as before, which is not the case.

}
return String.join("\n", csvSource.value());
Member

Does FastCSV provide an API for line-by-line reading so we don't have to create a string first? It's probably not a big deal since it comes from literals in an annotation.


With osiegmar/FastCSV@1077389 there is one now. @vdmitrienko You may want to give it a try to see if it simplifies things for you.

Contributor Author

Thanks, @osiegmar. This works well with individual strings, but it doesn't support headers. I think adding an overload that accepts an array (or varargs) of strings could simplify this use case:

build(final CsvCallbackHandler<T> callbackHandler, final String... data)

Contributor Author

Regarding the validation of empty records, having a setting for that could be quite handy. It would also be great if the exception message included the index of the empty record. That said, we could also handle this on our side 🙂


Got it. In that case, I'd rather stick to the current String.join() implementation. The effort for this extra build-method (bigger than initially thought) doesn't seem to match this edge-case use. I also highly doubt that it would improve performance.

@vdmitrienko vdmitrienko requested a review from osiegmar June 5, 2025 20:16
@@ -24,6 +24,7 @@ dependencies {
testImplementation(libs.kotlinx.coroutines)
testImplementation(libs.groovy4)
testImplementation(libs.memoryfilesystem)
testImplementation(libs.fastcsv)

still necessary?

Contributor Author

@vdmitrienko vdmitrienko Jun 7, 2025

Since the library is now shadowed, this is no longer necessary 👍

@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2021 Oliver Siegmar

Just noticed that this hadn't been updated in a while, so I've now updated it.

Member

Is this part of FastCSV's jar file? If so, we should probably extract it from there to avoid it becoming outdated.


Just added META-INF/LICENSE

Member

@vdmitrienko Please let me know if you need help with Gradle to achieve that! 🙂

Contributor Author

@marcphilipp, please take a look at d5139e2

static CsvReader<? extends CsvRecord> createReaderFor(CsvSource csvSource, String data) {
String delimiter = selectDelimiter(csvSource.delimiter(), csvSource.delimiterString());
// @formatter:off
CsvReader.CsvReaderBuilder builder = CsvReader.builder()

As you rely on skipEmptyLines(true) (current default), you may want to set that explicitly.

Contributor Author

Good point. Updated


String delimiter = selectDelimiter(csvFileSource.delimiter(), csvFileSource.delimiterString());
// @formatter:off
CsvReader.CsvReaderBuilder builder = CsvReader.builder()

As you rely on skipEmptyLines(true) (current default), you may want to set that explicitly.

Contributor Author

Updated


// @formatter:off
if (useHeadersInDisplayName) {
return NamedCsvRecordHandler.builder()

The current implementation allows duplicate header names. For compatibility, add allowDuplicateHeaderFields(true).

Contributor Author

Updated

Comment on lines 87 to 90
Object[] arguments = new Object[record.getFields().size()];

for (int i = 0; i < record.getFields().size(); i++) {
String field = record.getFields().get(i);

You should either call getFields() only once (as it constructs new objects internally) or (preferably) use the methods getFieldCount() and getField(int).
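The suggested change can be sketched with a stand-in record class (hypothetical; it only mimics the accessor shapes mentioned above, where getFields() constructs new objects while getFieldCount()/getField(int) do not):

```java
import java.util.ArrayList;
import java.util.List;

public class FieldAccessDemo {

    // Stand-in for a CSV record: getFields() allocates a fresh list on every
    // call, while getFieldCount()/getField(int) read the backing array directly.
    static class Record {
        private final String[] fields;

        Record(String... fields) {
            this.fields = fields;
        }

        List<String> getFields() {
            return new ArrayList<>(List.of(fields)); // new object per call
        }

        int getFieldCount() {
            return fields.length;
        }

        String getField(int index) {
            return fields[index];
        }
    }

    // Preferred loop shape: no repeated getFields() allocation per iteration.
    static Object[] toArguments(Record record) {
        Object[] arguments = new Object[record.getFieldCount()];
        for (int i = 0; i < record.getFieldCount(); i++) {
            arguments[i] = record.getField(i);
        }
        return arguments;
    }

    public static void main(String[] args) {
        Object[] arguments = toArguments(new Record("a", "b", "c"));
        System.out.println(arguments.length);
    }
}
```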

Contributor Author

Thanks! Updated

@@ -147,6 +103,10 @@ static Arguments processCsvRecord(@Nullable String[] csvRecord, Set<String> null
return Named.of(name, column);
}

private static List<String> getHeaders(CsvRecord record) {
return ((NamedCsvRecord) record).getHeader();

Previously, the header field names were trimmed.

@osiegmar osiegmar Jun 6, 2025

I also just noticed that empty header fields (like in foo,,bar) cause NullPointerExceptions with the univocity implementation. I also noticed that while the emptyValue is applied to header fields, nullValues are not.

The new implementation currently applies the fieldModifier to the header record as well and would produce unexpected results (NULL_MARKER, nullValues, emptyValue).

Contributor Author

Previously, the header field names were trimmed.

Currently, they are still trimmed because header fields are treated as regular fields, meaning that ignoreLeadingAndTrailingWhitespace() (default true) applies to them as well.

The new implementation currently uses the fieldModifier also for the header record and would produce unexpected results (NULL_MARKER, nullValues, emptyValue).

Thanks for the observation! Now it applies to headers consistently as well.

I believe treating headers as regular fields would be the simplest and most reliable solution; this way, attributes like ignoreLeadingAndTrailingWhitespace(), nullValues(), etc. apply uniformly to both headers and records.

@marcphilipp @osiegmar WDYT?


Previously, the header field names were trimmed.

Currently, they are still trimmed because header fields are treated as regular fields, meaning that ignoreLeadingAndTrailingWhitespace() (default true) applies to them as well.

Regardless of the ignoreLeadingAndTrailingWhitespace setting, and regardless of whether fields were quoted or not, headers were always trimmed:

// Cannot get parsed headers until after parsing has started.
static String[] getHeaders(CsvParser csvParser) {
return Arrays.stream(csvParser.getContext().parsedHeaders())//
.map(String::trim)//
.toArray(String[]::new);
}

I believe treating headers as regular fields would be the simplest and most reliable solution; this way, attributes like ignoreLeadingAndTrailingWhitespace(), nullValues(), etc. apply uniformly to both headers and records.

@marcphilipp @osiegmar WDYT?

I believe the same, just wanted to point out that difference.

Comment on lines 63 to 70
for (int i = 0; i < csvSource.value().length; i++) {
if (csvSource.value()[i].isEmpty()) {
int finalI = i;
Preconditions.condition(!csvSource.value()[i].isEmpty(), //
() -> "CSV record at index %d is empty".formatted(finalI + 1) //
);
}
Preconditions.notNull(csvRecord,
() -> "Record at index " + index + " contains invalid CSV: \"" + input + "\"");
argumentsList.add(processCsvRecord(csvRecord, nullValues, useHeadersInDisplayName, headers));
}

I'm curious: why are empty strings in the value array handled as an error? In text blocks, empty lines are simply skipped. I also couldn't find anything in the javadoc about it. Seems like an unnecessary difference.

Contributor Author

This behavior originates from univocity's parseLine() implementation, which we use to parse individual lines: it returns null when a single line is empty. For text blocks, however, we use a different method, parseAll(), which simply discards empty lines.
I preserved this behavior for compatibility's sake, but it seems we could actually drop it and unify the handling of value arrays and text blocks.


if (useHeadersInDisplayName) {
column = asNamed(requireNonNull(headers)[i] + " = " + column, column);
String header = resolveNullMarker(getHeaders(record).get(i));

Didn't see this before: getHeaders (which calls getHeader on NamedCsvRecord) should likewise not be called in a loop, as it creates new objects.

Contributor Author

Missed that. Updated

Comment on lines +46 to +54
static void validate(CsvSource csvSource) {
validateMaxCharsPerColumn(csvSource.maxCharsPerColumn());
validateDelimiter(csvSource.delimiter(), csvSource.delimiterString(), csvSource);
}

static void validate(CsvFileSource csvFileSource) {
validateMaxCharsPerColumn(csvFileSource.maxCharsPerColumn());
validateDelimiter(csvFileSource.delimiter(), csvFileSource.delimiterString(), csvFileSource);
}

A validation should be added to ensure that the (now deprecated) lineSeparator contains only \r\n, \r, or \n.
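Such a check could look roughly like this (a hypothetical sketch; the method name and message are assumptions, not the PR's actual code):

```java
public class LineSeparatorValidation {

    // Hypothetical sketch: the deprecated lineSeparator() attribute is ignored by
    // the new parser, so any value other than a standard newline sequence would
    // silently change behavior and should be rejected up front.
    static void validateLineSeparator(String lineSeparator) {
        boolean valid = "\r\n".equals(lineSeparator)
                || "\r".equals(lineSeparator)
                || "\n".equals(lineSeparator);
        if (!valid) {
            throw new IllegalArgumentException(
                    "lineSeparator must be \\r, \\n, or \\r\\n");
        }
    }

    public static void main(String[] args) {
        validateLineSeparator("\n"); // accepted
        try {
            validateLineSeparator("|"); // anything else is rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```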

@vdmitrienko vdmitrienko requested a review from marcphilipp June 10, 2025 16:08