
Commit e4c9941

Add a test for the substring column mapping transform
This confirms that the transform handles the case where the values list doesn't have length 2 by raising an error. This prompted me to make issue #146, which I think should really simplify this transform.
1 parent 4158841 commit e4c9941

1 file changed, +29 -0 lines changed

hlink/tests/core/transforms_test.py

Lines changed: 29 additions & 0 deletions
@@ -260,6 +260,35 @@ def test_apply_transform_remove_punctuation(spark: SparkSession, is_a: bool) ->
     ]
 
 
+@pytest.mark.parametrize("values", [[1], [1, 2, 3]])
+@pytest.mark.parametrize("is_a", [True, False])
+def test_apply_transform_substring_error_when_not_exactly_2_values(
+    values: list[int], is_a: bool
+) -> None:
+    """
+    The substring transform takes a list of exactly two values, which are the
+    start position of the substring and its length. If the list has the wrong
+    number of values, then apply_transform() raises an error.
+
+    TODO: It would be simpler to have two separate attributes for the substring
+    start and length, like this:
+
+    {
+        "type": "substring",
+        "start_index": 0,
+        "length": 4,
+    }
+
+    See issue #146. Making these changes would eliminate the need for this
+    test.
+    """
+    input_col = col("input")
+    transform = {"type": "substring", "values": values}
+
+    with pytest.raises(ValueError, match="Length of substr transform should be 2"):
+        apply_transform(input_col, transform, is_a)
+
+
 @pytest.mark.parametrize("is_a", [True, False])
 def test_apply_transform_error_when_unrecognized_transform_type(is_a: bool) -> None:
     column_select = col("test")
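
For readers without the hlink source at hand, the check this test exercises can be sketched in plain Python. This is only an illustration under stated assumptions: `substring_bounds` and `substring_bounds_proposed` are hypothetical helpers, not functions from hlink, and the real `apply_transform()` works on Spark columns rather than returning a tuple. The error text is the one the test matches against, and the second helper reads the two-attribute shape suggested in the TODO.

```python
from typing import Any


def substring_bounds(transform: dict[str, Any]) -> tuple[int, int]:
    """Hypothetical helper (not part of hlink): return (start, length).

    Mirrors the behavior the test checks: a "values" list whose length
    is not exactly 2 is rejected with a ValueError.
    """
    values = transform["values"]
    if len(values) != 2:
        # Same message that the test's pytest.raises(match=...) looks for.
        raise ValueError("Length of substr transform should be 2")
    start, length = values
    return start, length


def substring_bounds_proposed(transform: dict[str, Any]) -> tuple[int, int]:
    """Hypothetical reading of the shape suggested in the TODO.

    With separate attributes there is no list length to validate;
    a missing attribute surfaces as a KeyError instead.
    """
    return transform["start_index"], transform["length"]


if __name__ == "__main__":
    print(substring_bounds({"type": "substring", "values": [0, 4]}))
    print(
        substring_bounds_proposed(
            {"type": "substring", "start_index": 0, "length": 4}
        )
    )
```

With named attributes, the wrong-length case cannot occur, which is why the TODO says the change would eliminate the need for this test.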

0 commit comments
