Add support for netzschleuder #22
Comments
I'd be in favor of this being a separate package under the igraph organization, to be used as a dependency. 😇
Sounds good to me, because that would allow a bit more freedom in development (adding dependencies).
I'd say separate, as igraphdata is about sharing data, not accessing it, and if needed we can make igraphdata depend on the netzschleuder package?
Alright, makes sense. Let's see if others agree 🙂
So this was one of the "mentored projects" announced for Python, and we've looked into it a bit with @ntamas. It seems more complicated than expected.

It looks like it may be necessary to implement an importer for the […]. The GraphML files use custom data types such as […]. Some of the GML files were outright corrupt, e.g. they contained invalid character codes. Of course, we can talk to Tiago and look for a fix.

igraph currently supports only scalar attributes (number, bool, string), which is a major limitation. It does not support, e.g., a pair of coordinates as a single attribute. This is why x, y coordinates are typically stored in two separate attributes. To be precise, the Python and R interfaces of igraph do support arbitrary Python and R objects as attributes, but these are not accessible from C, so the file format readers can't handle them.

Until the attribute handling is overhauled (one of the major items in our last, rejected CZI application), we can try to handle this by reading such attributes as strings and then deserializing these strings in the host language (Python or R). This may still require changes to the C core format readers so that they actually return this data as strings. As I said, in GraphML unknown types are simply not supported (the format itself does not standardize non-scalar attributes), while in GML composite types are currently ignored with a warning.

Netzschleuder actually serializes some of the non-scalar types into strings, even when the format (such as GML) would support them. In some cases, it does the serialization in a Python-specific way, so deserializing in R may be a challenge.

I must leave now, but I wanted to type up some of these concerns. We could discuss this in a live meeting sometime.
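On the R side, the read-as-string-then-deserialize workaround could look roughly like this. It is a minimal sketch, assuming the attribute arrives as a NumPy-style repr such as `array([0.5, -1.25])` holding one flat list of numbers. The helper name `parse_array_string()` is hypothetical, and nested arrays or extra arguments such as `dtype=` are not handled.

```r
# Hypothetical helper: deserialize NumPy-style strings such as
# "array([0.5, -1.25])" into numeric vectors (one list element per string).
# Assumes a single flat list of numbers; nested arrays are not supported.
parse_array_string <- function(s) {
  # Drop the leading "array([" and the trailing "])", leaving "0.5, -1.25"
  inner <- gsub("^[[:space:]]*array\\(\\[|\\]\\)[[:space:]]*$", "", s)
  parts <- strsplit(inner, ",", fixed = TRUE)
  # Convert each comma-separated piece to a number
  lapply(parts, function(p) as.numeric(trimws(p)))
}

# Two serialized positions, as they might appear in a string attribute
pos <- c("array([0.5, -1.25])", "array([2, 3])")
coords <- parse_array_string(pos)

# Split the pairs into two scalar attributes, which igraph can store
x <- vapply(coords, `[`, numeric(1), 1)
y <- vapply(coords, `[`, numeric(1), 2)
```

Splitting into `x` and `y` at the end mirrors the two-scalar-attribute convention mentioned above; a general solution would still need to know, per attribute, which serialization Netzschleuder used.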
We definitely need to discuss this live, but the networks are also available as zipped CSV files? I'll test a bit more to see if I run into problems.
This is the working(?) prototype. I've only tested a few samples so far.
#' Download a graph from the Netzschleuder data catalogue
#'
#' Netzschleuder (<https://networks.skewed.de/>) is a large online repository for
#' network datasets with the aim of aiding scientific research.
#' @param name character. Name of the network dataset.
#' @param net character. If the dataset contains several networks, this is the network name.
#' @param directed logical. Whether a directed graph is constructed.
#' @param bipartite logical. Whether a bipartite graph is constructed.
#' @return a new graph object.
#' @keywords graphs
#' @family foreign
#' @export
graph_from_netzschleuder <- function(name, net = NULL, directed = FALSE, bipartite = FALSE) {
  if (is.null(net)) {
    net <- name
  }
  zip_url <- paste0(
    "https://networks.skewed.de/net/", name, "/files/", net, ".csv.zip"
  )
  temp <- tempfile(fileext = ".zip")
  # Register cleanup immediately so the temp file is removed even on error
  on.exit(unlink(temp))
  # mode = "wb" keeps the binary zip intact on Windows
  download.file(zip_url, temp, quiet = TRUE, mode = "wb")
  zip_contents <- unzip(temp, list = TRUE)
  edge_file_name <- zip_contents$Name[grepl("edge", zip_contents$Name)]
  node_file_name <- zip_contents$Name[grepl("node", zip_contents$Name)]
  # Vertex ids are 0-based; shift only the endpoint columns so that
  # edge attributes (e.g. weights or string labels) are left untouched
  edges_df <- read.csv(unz(temp, edge_file_name))
  names(edges_df)[c(1, 2)] <- c("from", "to")
  edges_df$from <- edges_df$from + 1
  edges_df$to <- edges_df$to + 1
  nodes_df <- read.csv(unz(temp, node_file_name))
  names(nodes_df)[1] <- "id"
  nodes_df$id <- nodes_df$id + 1
  # Positions are serialized as strings like "array([x, y])";
  # split them into two numeric vertex attributes, x and y
  if ("X_pos" %in% names(nodes_df)) {
    pos_array <- gsub("array\\(\\[|\\]|\\)", "", nodes_df[["X_pos"]])
    split_coords <- strsplit(pos_array, ",")
    x_vals <- vapply(split_coords, function(x) as.numeric(trimws(x[1])), numeric(1))
    y_vals <- vapply(split_coords, function(x) as.numeric(trimws(x[2])), numeric(1))
    nodes_df[["X_pos"]] <- NULL
    nodes_df$x <- x_vals
    nodes_df$y <- y_vals
  }
  g <- graph_from_data_frame(edges_df, directed = directed, vertices = nodes_df)
  if (bipartite) {
    # Vertices that appear as edge sources form one side of the bipartition
    types <- rep(FALSE, vcount(g))
    types[nodes_df$id %in% edges_df$from] <- TRUE
    g <- set_vertex_attr(g, "type", value = types)
  }
  g
}
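The index handling can be checked without a network connection by running the same steps on inline CSV text. This is a minimal base-R sketch: the header names (`# source`, `target`, `weight`, `# index`, `name`) and the helper `read_netzschleuder_tables()` are assumptions for illustration, not the verified contents of a Netzschleuder archive. It shows that shifting only the endpoint and id columns from 0-based to 1-based leaves edge attributes such as weights unchanged.

```r
# Inline stand-ins for the edge and node CSV files inside a .csv.zip archive;
# the header names are assumed for illustration only.
edges_csv <- "# source,target,weight\n0,1,2.5\n1,2,0.75\n"
nodes_csv <- "# index,name\n0,a\n1,b\n2,c\n"

# Hypothetical helper: parse both tables and convert 0-based ids to 1-based
read_netzschleuder_tables <- function(edges_text, nodes_text) {
  edges <- read.csv(text = edges_text, stringsAsFactors = FALSE)
  nodes <- read.csv(text = nodes_text, stringsAsFactors = FALSE)
  # Only the two endpoint columns are shifted; "weight" keeps its values
  names(edges)[1:2] <- c("from", "to")
  edges$from <- edges$from + 1
  edges$to <- edges$to + 1
  # The first node column holds the vertex id
  names(nodes)[1] <- "id"
  nodes$id <- nodes$id + 1
  list(edges = edges, nodes = nodes)
}

tabs <- read_netzschleuder_tables(edges_csv, nodes_csv)
```

The resulting `tabs$edges` and `tabs$nodes` have the shape that `graph_from_data_frame()` expects for its `d` and `vertices` arguments.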
Can we have it in igraphdata, separated into:
?
What is the feature or improvement you would like to see?
Add a function that lets users download network data from https://networks.skewed.de
Use cases for the feature
The website has a good API, and this would allow users to easily get a more diverse set of realistic network data.
I already have a prototype implementation. If this is a desired feature, I will start a PR.
(cc @szhorvat)