Commit 9b0b1ce

[SPARK-54153][PYTHON][TESTS][FOLLOWUP] Skip test_perf_profiler_data_source if pyarrow is absent
### What changes were proposed in this pull request?

This PR aims to skip `test_perf_profiler_data_source` if `pyarrow` is absent.

### Why are the changes needed?

To recover the failed `PyPy` CIs.

- https://github.com/apache/spark/actions/workflows/build_python_pypy3.10.yml
- https://github.com/apache/spark/actions/runs/19574648782
- https://github.com/apache/spark/actions/runs/19574648782/job/56056836234

```
======================================================================
ERROR: test_perf_profiler_data_source (pyspark.sql.tests.test_udf_profiler.UDFProfiler2Tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/tests/test_udf_profiler.py", line 609, in test_perf_profiler_data_source
    self.spark.read.format("TestDataSource").load().collect()
  File "/__w/spark/spark/python/pyspark/sql/classic/dataframe.py", line 469, in collect
    sock_info = self._jdf.collectToPython()
  File "/__w/spark/spark/python/lib/py4j-0.10.9.9-src.zip/py4j/java_gateway.py", line 1362, in __call__
    return_value = get_return_value(
  File "/__w/spark/spark/python/pyspark/errors/exceptions/captured.py", line 263, in deco
    return f(*a, **kw)
  File "/__w/spark/spark/python/lib/py4j-0.10.9.9-src.zip/py4j/protocol.py", line 327, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o235.collectToPython.
: org.apache.spark.SparkException: Error from python worker:
  Traceback (most recent call last):
    File "/usr/local/pypy/pypy3.10/lib/pypy3.10/runpy.py", line 199, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/usr/local/pypy/pypy3.10/lib/pypy3.10/runpy.py", line 86, in _run_code
      exec(code, run_globals)
    File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/daemon.py", line 37, in <module>
    File "/usr/local/pypy/pypy3.10/lib/pypy3.10/importlib/__init__.py", line 126, in import_module
      return _bootstrap._gcd_import(name[level:], package, level)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
    File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
    File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
    File "<frozen importlib._bootstrap_external>", line 897, in exec_module
    File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
    File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/worker/plan_data_source_read.py", line 21, in <module>
      import pyarrow as pa
  ModuleNotFoundError: No module named 'pyarrow'
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53162 from dongjoon-hyun/SPARK-54153.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent 05602d5 commit 9b0b1ce

File tree

1 file changed, +1 −0 lines changed

python/pyspark/sql/tests/test_udf_profiler.py

Lines changed: 1 addition & 0 deletions

```diff
@@ -585,6 +585,7 @@ def summarize(left, right):
         for id in self.profile_results:
             self.assert_udf_profile_present(udf_id=id, expected_line_count_prefix=2)
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_perf_profiler_data_source(self):
         class TestDataSourceReader(DataSourceReader):
             def __init__(self, schema):
```
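The one-line fix above is the standard `unittest.skipIf` pattern for tests that depend on an optional package. A minimal, self-contained sketch of the pattern (the `have_pyarrow` and `pyarrow_requirement_message` names mirror the helpers PySpark's test utilities provide, but are redefined locally here for illustration):

```python
import importlib.util
import unittest

# Probe for the optional dependency without importing it at module load time;
# importlib.util.find_spec returns None when the module is absent.
have_pyarrow = importlib.util.find_spec("pyarrow") is not None
pyarrow_requirement_message = "pyarrow is not installed"


class DataSourceProfilerTests(unittest.TestCase):
    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
    def test_perf_profiler_data_source(self):
        # This body only runs when pyarrow is importable, so interpreters
        # without pyarrow (such as the PyPy CI) report a skip instead of
        # failing with ModuleNotFoundError.
        import pyarrow as pa

        self.assertTrue(hasattr(pa, "__version__"))
```

On an interpreter without `pyarrow`, the runner marks the test as skipped with the given message, which keeps the suite green; with `pyarrow` installed, the test executes normally.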
