Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nested ':=' reference assignment fails #6768

Open
AntonNM opened this issue Jan 29, 2025 · 7 comments · May be fixed by #6789
Open

nested ':=' reference assignment fails #6768

AntonNM opened this issue Jan 29, 2025 · 7 comments · May be fixed by #6789
Labels
Milestone

Comments

@AntonNM
Copy link

AntonNM commented Jan 29, 2025

Issues arise when a function called in by-reference modification, modifies the dt by reference as well. Below is the example running on the latest dev version. The result of the outer function overwrites the inner function in the incorrect column name, the column named new is left empty.

> library(data.table)
data.table 1.16.99 IN DEVELOPMENT built 2025-01-27 04:18:35 UTC; root using 7 threads (see ?getDTthreads).  Latest news: r-datatable.com

library(data.table)
options(datatable.verbose = TRUE)
dt = data.table(a=1:10)
inner <- function(dt){
    dt[,se:=1:10]
}
outer <- function(dt){
    inner(dt);
    return (11:20)
}
dt[,new:=outer(dt)]
print(dt)
Detected that j uses these columns: <none>
Detected that j uses these columns: <none>
Assigning to all 10 rows
RHS_list_of_columns == false
RHS for item 1 has been duplicated because MAYBE_REFERENCED==1 MAYBE_SHARED==1, but then is being plonked. length(values)==10; length(cols)==1)
Assigning to all 10 rows
RHS_list_of_columns == false
RHS for item 1 has been duplicated because MAYBE_REFERENCED==1 MAYBE_SHARED==1, but then is being plonked. length(values)==10; length(cols)==1)
        a    se    new
    <int> <int> <NULL>
 1:     1    11   NULL
 2:     2    12   NULL
 3:     3    13   NULL
 4:     4    14   NULL
 5:     5    15   NULL
 6:     6    16   NULL
 7:     7    17   NULL
 8:     8    18   NULL
 9:     9    19   NULL
10:    10    20   NULL
@AntonNM
Copy link
Author

AntonNM commented Jan 29, 2025

There is a workaround of returning a containing the results of the inner and outer function and assigning them all at once, however this structure was more desirable.

@MichaelChirico
Copy link
Member

could you confirm (1) whether the issue is present on CRAN (2) whether updating to the current version of data.table fixes the issue? There are some recent related commits that might fix it

@AntonNM
Copy link
Author

AntonNM commented Jan 29, 2025

Thank you, the latest CRAN does indeed resolve this issue!

@MichaelChirico
Copy link
Member

Great! Please also check on the current devel version, this is probably due to #6551

@AntonNM
Copy link
Author

AntonNM commented Jan 31, 2025

Updating to the current dev version 32dfd8d (1.16.99) re-introduces this issue.

I also pulled, and tested the commits for #6551 & #6542 (the commit directly preceding #6551) and the issue is present in both.

Perhaps it was introduced in an earlier commit?

@MichaelChirico MichaelChirico added this to the 1.17.0 milestone Jan 31, 2025
@MichaelChirico MichaelChirico modified the milestones: 1.17.0, 1.18.0 Jan 31, 2025
@MichaelChirico
Copy link
Member

I see the same behavior on 1.16.0, 1.16.4 and current master:

Detected that j uses these columns: <none>
Detected that j uses these columns: <none>
Assigning to all 10 rows
RHS_list_of_columns == false
RHS for item 1 has been duplicated because MAYBE_REFERENCED==1 MAYBE_SHARED==1, but then is being plonked. length(values)==10; length(cols)==1)
Assigning to all 10 rows
RHS_list_of_columns == false
RHS for item 1 has been duplicated because MAYBE_REFERENCED==1 MAYBE_SHARED==1, but then is being plonked. length(values)==10; length(cols)==1)
        a    se    new
    <int> <int> <NULL>
 1:     1    11   NULL
 2:     2    12   NULL
 3:     3    13   NULL
 4:     4    14   NULL
 5:     5    15   NULL
 6:     6    16   NULL
 7:     7    17   NULL
 8:     8    18   NULL
 9:     9    19   NULL
10:    10    20   NULL

It's definitely odd that you've managed to create a column that's just NULL -- the resulting dt is not a proper data.table.

Could you double check & report which version produces the correct result?

@AntonNM
Copy link
Author

AntonNM commented Feb 1, 2025

I went back as far as 1.13.0 and was unable to find a version that produces the correct result, however I believe I found the root of the confusion:

library(data.table)
options(datatable.verbose = T)

dt = data.table(a=1:10)

inner <- function(dt){
    dt[,se:=1:10]
}
outer <- function(dt){
    inner(dt);
    return (11:20)
}

dt[,new:=outer(dt)]
print(dt)
dt[,new:=outer(dt)]
print(dt)

The second occurrence of the assignment statement produces the correct result. I believe this is what occurred when I first tested the latest CRAN version.

AntonNM added a commit to AntonNM/data.table that referenced this issue Feb 2, 2025
*Followed TODO: by mattdowle from resolution to '2-space indentation Rdatatable#2420'

*Added tests for jsub that modify DT by-reference

*Added test case for interger vector indexing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants