[Bug]: repeated absence data #48

@mguerreiro24

Describe the bug

When I inspect the datasets returned by trainValTest, the absence (background) data is repeated in full in both the train and the test sets.

Steps to reproduce the bug

library(SDMtune)
species <- 'PheronemaCarpenteri'
presence_locations <- read.csv(paste(species,'/EnvironmentalData.csv',sep=''))
current <- sqrt(presence_locations$horizontalXCurrents^2+presence_locations$horizontalYCurrents^2)
presence_locations <- cbind(presence_locations,current)
presence_locations <- subset(presence_locations,current<1)
presence_locations <- presence_locations[,c(1:7,9,10,13)]#selecting variables
variables <- variable.names(presence_locations)[3:10]
  
background <- read.csv('backgroundEnvironment.csv')
background <- na.omit(background)
current <- sqrt(background$horizontalXCurrents^2+background$horizontalYCurrents^2)
background <- cbind(background,current)
background <- subset(background,current<1)
background <- background[,c(1:7,9,10,13)]#selecting variables

proportion <- dim(presence_locations)[1]*10

# For testing, these variables are set here. Normally they would come from a nested loop (I am doing a small-sample SDM, so I use 2 variables at a time; in this case I only have 3 points!)
i=1
ii=2
run=1

#this section is normally within a for loop
background <- background[sample(nrow(background), proportion), ]
data_species <- rbind(presence_locations, background)
df_coords <- data_species[, 1:2] # coordinates: presence & absence
df_data <- data_species[, c(i + 2, ii + 2)] # environmental data: presence & absence
pa <- c(rep(1, dim(presence_locations)[1]), rep(0, dim(background)[1])) # vector of 1 and 0: presence & absence
data <- SWD(species = species,
            coords = df_coords,
            data = df_data,
            pa = pa)
# Split presence locations in training (75%) and testing (25%) datasets
datasets <- trainValTest(data, test = 0.25, only_presence = TRUE, seed = run)
train <- datasets[[1]]
test <- datasets[[2]]
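
The repetition can be checked without my files. The following is a minimal sketch on simulated data, not my real dataset: the names n_presence, n_background, toy, count_pa and counts are made up for illustration, and it assumes the SWD objects expose the 0/1 vector in their pa slot. It counts presences and absences in each partition, once with only_presence = TRUE as in my script and once with only_presence = FALSE for comparison.

```r
library(SDMtune)

set.seed(1)
n_presence <- 4     # hypothetical number of presence points
n_background <- 40  # hypothetical number of background/absence points
n <- n_presence + n_background

# Simulated coordinates and two simulated environmental variables
coords <- data.frame(X = runif(n), Y = runif(n))
env <- data.frame(var1 = rnorm(n), var2 = rnorm(n))
pa <- c(rep(1, n_presence), rep(0, n_background))

toy <- SWD(species = "toy", coords = coords, data = env, pa = pa)

# Count presences (pa == 1) and absences (pa == 0) in each partition
count_pa <- function(parts, label) {
  data.frame(
    only_presence = label,
    set = c("train", "test"),
    presence = c(sum(parts[[1]]@pa == 1), sum(parts[[2]]@pa == 1)),
    absence = c(sum(parts[[1]]@pa == 0), sum(parts[[2]]@pa == 0))
  )
}

counts <- rbind(
  count_pa(trainValTest(toy, test = 0.25, only_presence = TRUE, seed = 1), TRUE),
  count_pa(trainValTest(toy, test = 0.25, only_presence = FALSE, seed = 1), FALSE)
)
print(counts)
```

If the absence column shows the full background count in both the train and test rows only when only_presence = TRUE, the repetition is tied to that argument rather than to my data preparation.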

Session information

R version 4.5.1 (2025-06-13 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)

Matrix products: default
  LAPACK version 3.12.1

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
[1] LC_COLLATE=Portuguese_Portugal.utf8  LC_CTYPE=Portuguese_Portugal.utf8   
[3] LC_MONETARY=Portuguese_Portugal.utf8 LC_NUMERIC=C                        
[5] LC_TIME=Portuguese_Portugal.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] SDMtune_1.3.3

loaded via a namespace (and not attached):
 [1] terra_1.8-70       vctrs_0.6.5        cli_3.6.5          rlang_1.1.6       
 [5] generics_0.1.4     S7_0.2.0           glue_1.8.0         sp_2.2-0          
 [9] scales_1.4.0       grid_4.5.1         tibble_3.3.0       lifecycle_1.0.4   
[13] compiler_4.5.1     dplyr_1.1.4        codetools_0.2-20   RColorBrewer_1.1-3
[17] Rcpp_1.1.0         pkgconfig_2.0.3    rstudioapi_0.17.1  farver_2.1.2      
[21] lattice_0.22-7     R6_2.6.1           tidyselect_1.2.1   pillar_1.11.1     
[25] magrittr_2.0.4     tools_4.5.1        gtable_0.3.6       dismo_1.3-16      
[29] raster_3.6-32      ggplot2_4.0.0

Additional information

No response

Reproducible example

  • I have done my best to provide the steps to reproduce the bug

Labels

bug
