[SPARK-29339][R] Support Arrow 0.14 in vectorized dapply and gapply (test it in AppVeyor build)
### What changes were proposed in this pull request?
This PR proposes to:
1. Use `is.data.frame` to check whether an object is an R data frame.
2. Install Arrow and test Arrow optimization in the AppVeyor build. We're currently not testing this in CI.
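As a minimal sketch of why `is.data.frame` is a robust check in general: it returns `TRUE` for any object that inherits from `data.frame`, including subclasses that carry extra class attributes. The class name `my_tbl` below is hypothetical and used only for illustration; the sketch does not reproduce the code this PR actually changes.

```r
# A plain data.frame and a subclass that carries an extra (hypothetical) class.
plain <- data.frame(gear = c(4, 3))
sub <- structure(plain, class = c("my_tbl", "data.frame"))

# is.data.frame() accepts both because it checks inheritance.
is.data.frame(plain)
is.data.frame(sub)

# An exact comparison of the class vector rejects the subclass.
identical(class(sub), "data.frame")
```

The first two calls return `TRUE`; the exact class comparison returns `FALSE` for the subclass.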
### Why are the changes needed?
1. To support SparkR with Arrow 0.14
2. To check if there's any regression and if it works correctly.
### Does this PR introduce any user-facing change?
```r
df <- createDataFrame(mtcars)
collect(dapply(df, function(rdf) { data.frame(rdf$gear + 1) }, structType("gear double")))
```
**Before:**
```
Error in readBin(con, raw(), as.integer(dataLen), endian = "big") :
invalid 'n' argument
```
**After:**
```
gear
1 5
2 5
3 5
4 4
5 4
6 4
7 4
8 5
9 5
...
```
### How was this patch tested?
AppVeyor
Closes apache#25993 from HyukjinKwon/arrow-r-appveyor.
Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
docs/sparkr.md (9 additions & 2 deletions)
@@ -663,13 +663,20 @@ Apache Arrow is an in-memory columnar data format that is used in Spark to effic
## Ensure Arrow Installed

-Currently, Arrow R library is not on CRAN yet [ARROW-3204](https://issues.apache.org/jira/browse/ARROW-3204). Therefore, it should be installed directly from Github. You can use `remotes::install_github` as below.
+Arrow R library is available on CRAN as of [ARROW-3204](https://issues.apache.org/jira/browse/ARROW-3204). It can be installed as below.
-`apache-arrow-0.12.1` is a version tag that can be checked in [Arrow at Github](https://github.com/apache/arrow/releases). You must ensure that Arrow R package is installed and available on all cluster nodes. The current supported version is 0.12.1.
+`apache-arrow-0.12.1` is a version tag that can be checked in [Arrow at Github](https://github.com/apache/arrow/releases). You must ensure that Arrow R package is installed and available on all cluster nodes.
+The current supported minimum version is 0.12.1; however, this might change between the minor releases since Arrow optimization in SparkR is experimental.

## Enabling for Conversion to/from R DataFrame, `dapply` and `gapply`