-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parsing fails when protocol is missing #36
Comments
urltools::url_parse("bas%%%asa/s")
#> scheme domain port path parameter fragment
#> 1 <NA> bas%%%asa <NA> s <NA> <NA> Created on 2023-09-25 with reprex v2.0.2 A thing that we can do is to have two modes. Let's call them strict mode and lax mode. Strict mode is what we are doing. Lax mode is to relax this check: https://github.com/schochastics/adaR/blob/cd526ec1e32e4666b90a69749043515d1ca30517/src/adaR.cpp#L29 If not a valid URL, attempt to add "https://" and recheck. If it checks, then parse. If it doesn't check, return all NA. |
Ok this is rather a feature than a bug. I will table this for now (but keep it open), given that the primary use case of the package is dealing with webtracking data which should always have valid urls. But we may get back to your suggestion in case we need to |
Probably YAGNI. |
Great package! Was hoping to use it as drop-in replacement of I have two exported functions where I want to work with the domain, even when the user pastes a link with a path. With fun1 <- function(x) {
urltools::domain(x)
}
fun2 <- function(x) {
domain <- urltools::domain(x)
### Does something ###
fun1(domain)
}
fun1("https://github.com/schochastics/adaR/issues/36")
#> [1] "github.com"
fun2("https://github.com/schochastics/adaR/issues/36")
#> [1] "github.com" With fun1 <- function(x) {
adaR::ada_get_domain(x)
}
fun2 <- function(x) {
domain <- adaR::ada_get_domain(x)
### Does something ###
fun1(domain)
}
fun1("https://github.com/schochastics/adaR/issues/36")
#> [1] "github.com"
fun2("https://github.com/schochastics/adaR/issues/36")
#> [1] NA Created on 2023-10-05 with reprex v2.0.2 I'll work around it in the way @chainsawriot suggests, but I found the behaviour of |
without knowing what is happening in |
lax_get_domain <- function(url) {
## not vectorize
res <- adaR::ada_get_domain(url)
if (!is.na(res)) {
return(res)
}
return(adaR::ada_get_domain(paste0("https://", url)))
}
lax_get_domain("bit.ly/aasas")
#> [1] "bit.ly"
lax_get_domain("notdomain/notpath")
#> [1] NA
lax_get_domain("https://bit.ly/aasas")
#> [1] "bit.ly"
lax_get_domain("gopher://bit.ly/aasas")
#> [1] "bit.ly" Created on 2023-10-05 with reprex v2.0.2 |
Created on 2023-09-25 with reprex v2.0.2
not sure if/how to handle this
The text was updated successfully, but these errors were encountered: