Improved visualization for drake #12
Comments
One aspect of the graph that irritated me tremendously when I took part in @krlmlr's workshop was that functions and files are always on the left-hand side, and not at the level where they come in. For example the function
If a function is used multiple times, the function could be either repeated (risk of cluttering the graph) or just the arrows added (loss of clarity and information). |
The current positioning deliberately shows the general order in which
The main purpose of the arrows is to show dependency relationships. Yes, the
If we duplicate nodes this way, each duplicate will no longer be connected to all of its dependencies or reverse dependencies. If you are trying to see all the connections of an imported function, you would need to track down all the duplicates, which I think would be cumbersome and tedious. |
Alternatively, we do not need to cling to a single graphical arrangement all the time. Currently, the only graph we have is the dependency graph (same as the schedule graph until ropensci/drake#283 is solved). We could optionally generate a "code graph" or a "call graph" with the relationships you described. |
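As a rough illustration of what a "call graph" could contain, here is a minimal sketch that builds caller-to-callee edges for a handful of user functions. The three functions are hypothetical stand-ins, the edge direction (caller to callee) is an assumption, and `codetools::findGlobals()` is only one possible way to detect calls. It is not how drake builds its dependency graph.

```r
library(codetools)

# Hypothetical user functions standing in for a project's imports.
simulate <- function(n) data.frame(x = rnorm(n))
reg1 <- function(d) lm(x ~ 1, data = d)
report <- function(n) summary(reg1(simulate(n)))

funs <- list(simulate = simulate, reg1 = reg1, report = report)

# For each function, keep only the globals that are other user functions.
call_edges <- do.call(rbind, lapply(names(funs), function(caller) {
  called <- findGlobals(funs[[caller]], merge = FALSE)$functions
  callees <- intersect(called, names(funs))
  if (length(callees) == 0) return(NULL)
  data.frame(from = caller, to = callees, stringsAsFactors = FALSE)
}))

call_edges
```

In a call graph like this, a function used in several places simply gains several edges, so repeating nodes becomes a layout decision rather than a data decision.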
One idea to consider if we stick with |
Do you mean we should emphasize the extended neighborhood of a selected node instead of just thickening the edges of the order-1 neighborhood? (Kind of like By the way, |
@wlandau and I think it should stay that way, as it makes sense in the |
Glad we are on the same page. I think the visuals of the dependency graph could also be part of a separate package. Seems like there is a lot more space to develop and experiment that way. |
The existing dependency graph is a valuable tool for identifying what is happening during make and for identifying why and where things go wrong or targets are outdated. I would definitely keep it in drake. It is much easier for me to understand the dependencies if I see them than if I just read them.
|
We can import and re-export any functionality we offload. Examples:
I like this approach because it lightens the code base and makes things easier and faster to test and maintain. |
Just realized I should elaborate. Let's take |
The big advantage that I see to the examples that you listed is that: This makes sense in the case of the examples you mentioned, because work on those packages is largely independent and orthogonal to each other. However, it seems that drake's graphing/visualization facilities aren't something that can easily import the defaults from other packages, and need to be pretty tightly managed. Maybe there will be an expansion at some point where drake can produce a generic network, which can be passed to a user's network visualizer of choice? |
|
@rkrug I added alternative graphical arrangements as another project idea. For the call graph, I would think it permissible to repeat mentions of imported functions because the dependency graph is something else entirely. The only issue I see is clutter. |
If I can add to the clustering/condensed graphs point: it would be nice to have targets created by

Given a plan object like

    rules <- list(i__ = 1:10)
    plan <- tribble(
      ~target, ~command,
      "x", "rnorm(i__)",
      "y", "exp(x_i__)"
    ) %>% evaluate_plan(rules)

One should be able to pass the rules into the graphing function or something

    vis_drake_graph(config, rules)

Then the code would group all |
I think this is the most natural way to think about clusters of targets. Unfortunately, it may be out of scope for the unconference because we don't have a DSL yet, but I believe it is where we should aim. |
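To make the grouping idea concrete, here is a minimal base R sketch of condensing a graph by cluster. The node and edge tables are hand-written stand-ins (three expanded targets per rule instead of ten), and the rule "a cluster is the target name minus its numeric suffix" is an assumption made for illustration. Nothing here is an existing drake feature.

```r
# Hypothetical node and edge tables, shaped like an expanded plan
# in which x_1..x_3 feed y_1..y_3. This is not real drake output.
nodes <- data.frame(
  id = c(paste0("x_", 1:3), paste0("y_", 1:3)),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  from = paste0("x_", 1:3),
  to = paste0("y_", 1:3),
  stringsAsFactors = FALSE
)

# Assumption: wildcard expansion appended "_<number>" to each target name,
# so stripping that suffix recovers the original rule name.
cluster_of <- function(id) sub("_[0-9]+$", "", id)

nodes$cluster <- cluster_of(nodes$id)

# Collapse edges to the cluster level, then drop duplicates and self-loops.
condensed <- unique(data.frame(
  from = cluster_of(edges$from),
  to = cluster_of(edges$to),
  stringsAsFactors = FALSE
))
condensed <- condensed[condensed$from != condensed$to, , drop = FALSE]

condensed
```

Passing the rules explicitly, as suggested above, would avoid guessing clusters from target names, which breaks as soon as a target legitimately ends in a number.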
Couldn't |
I suppose it could, and it would make the nested clusters you suggested fall into place naturally. If we go forward with a |
@AlexAxthelm re: opacity: I assume this would be possible with the distances functionality in igraph, either to define a cluster around a node within a given number of links, or to highlight only things that are up/downstream of some selected node. And if edges have properties, maybe a user could choose to turn some of them off if the network is cluttered? Apologies if this was already addressed elsewhere; I am new to getting caught up learning about how cool |
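A minimal sketch of the igraph idea above, assuming a small hand-written edge list loosely modeled on the mtcars example (the node names and edges are illustrative, not real drake output). It uses `ego()` for a neighborhood within a fixed number of links, `subcomponent()` for everything downstream, and `distances()` for link counts. The opacity values are arbitrary.

```r
library(igraph)

# Hypothetical edge list: from = dependency, to = the thing that depends on it.
edges <- data.frame(
  from = c("simulate", "simulate", "small", "large",
           "regression1_small", "regression1_large"),
  to = c("small", "large", "regression1_small", "regression1_large",
         "report", "report"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Nodes within two links downstream of "small" (the node itself included).
near <- as_ids(ego(g, order = 2, nodes = "small", mode = "out")[[1]])

# Everything downstream of "small", at any distance.
downstream <- as_ids(subcomponent(g, "small", mode = "out"))

# Number of links from "small" to every node; Inf means unreachable.
d <- distances(g, v = "small", mode = "out")

# One way to fade everything outside the selected neighborhood.
opacity <- ifelse(V(g)$name %in% near, 1, 0.2)
```

Switching `mode = "out"` to `mode = "in"` gives the upstream view, and `mode = "all"` ignores edge direction.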
For what it's worth,

    library(drake)
    load_mtcars_example()
    config <- drake_config(my_plan)
    deps_targets(targets = c("small", "large"), config = config)
    #> [1] "simulate"
    deps_targets(targets = c("small", "large"), config = config, reverse = TRUE)
    #> [1] "regression1_large" "regression1_small" "regression2_large"
    #> [4] "regression2_small" "\"report.md\""

See the graph for that example here (from I think the bigger challenge is to write the JavaScript for |
Anyway, I have been asked to close this thread. Unfortunately, I cannot physically be at the unconf, and most of the commenters on this thread are not attending either, so it would be difficult to make this project work on May 21-22. But let's talk more at ropensci/drake#229 and especially ropensci/drake#282. |
Current capabilities
As with many similar reproducible pipeline toolkits, the drake package can display the dependency networks of declarative workflows.
The visNetwork package powers interactivity behind the scenes. Click here for the true, interactive version of the above screenshot. There, you can hover, click, drag, zoom, and pan to explore the graph.
Start fresh and customize!
Using the dataframes_graph() function, you can directly access the network data, including the nodes, edges, and relevant metadata. That means you can create your own custom visualizations without needing to develop drake itself. You can start from a clean slate and create your own fresh tool.
Unconf18 project ideas
Condensed graphs
Ref: ropensci/drake#229. Network graphs of large workflows are cumbersome. Even with interactivity, graphs with hundreds of nodes are difficult to understand, and larger ones can max out a computer's memory and lag. Condensed graphs could potentially respond faster and more easily guide intuition. There are multiple approaches for simplifying, clustering, and downsizing. Examples:
EDIT: from ropensci/drake#229 (comment), base drake is likely to support a rudimentary form of clustering. But a separate tool could account for nested groupings, and a shiny app could allow users to assign nodes to clusters interactively.
Static graphs
Ref: ropensci/drake#279. To print a visNetwork, you can either take a screenshot or export a file from RStudio's viewer pane. Either way, you need to go through a point-and-click tool or one of the screenshot tools @maelle mentioned in #11. Drake cannot yet create static images on its own, and such images could be crisper than screenshots and would enhance reproducible examples.
Workflow plan generation
In drake, the declarative outline of a workflow is a data frame of targets and commands.
The make() function resolves the dependency network and builds the targets.
Currently, users need to write code to construct workflow plans. (See drake_plan(), wildcard templating, and ropensci/drake#233.) To begin a large project, I usually need to iterate between drake_plan() and vis_drake_graph() several times before all the nodes connect properly. A shiny app could interactively build an already-connected workflow graph and then generate a matching plan for make().
Alternative graphical arrangements (re: #12 (comment))
The default graphical arrangement in drake can be counter-intuitive. The dependency graph shows how the targets and imports depend on each other, which is super important, but it is not necessarily the order in which these objects are used chronologically. For example, in this network from vis_drake_graph(), the reg1() function appears upstream from small even though reg1() takes small as an argument to build regression1_small. An optional "code graph" or "call graph" could better demonstrate the flow of execution during make().
Final (initial?) thoughts
Drake stands out from its many peers with its intense focus on R. R stands out because of its strong community and visualization power. Collaboration on visuals will really help drake shine and hopefully improve reproducible research.
cc @krlmlr, @AlexAxthelm, @dapperjapper, @kendonB, @rkrug