Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for acyclic graphs and associated operations #154

Open
michaelpj opened this issue Dec 4, 2018 · 14 comments
Open

Add support for acyclic graphs and associated operations #154

michaelpj opened this issue Dec 4, 2018 · 14 comments

Comments

@michaelpj
Copy link

"Topological sorting" is under-specified wrt the ordering of the independent vertices at each "level" of the sort. We could reflect this in the output type, by returning something like [Set Vertex], where each set is an independent set. You can then recover the current sort by flattening the inner sets in any order.

This has two advantages:

  • We don't make a choice about how to sort the independent sets (the user might have an opinion).
  • Sometimes it's useful to know what the independent sets are (e.g. you can process all things whose dependencies have been satisfied in a "batch", but you need to know where the batch starts and ends).
@snowleopard
Copy link
Owner

This is related to this issue: #85, where a possible implementation of batch function was provided (it is no longer compatible with the new scc type though).

One problem with deriving such "level vertex sets" is that the solution is not unique, as demonstrated by this simple example: 1 + 2 * 3. Here there are two possible answers: [[1,2], 3] and [2, [1,3]], i.e. 1 can be batched with either 2 or 3. Making an arbitrary decision, e.g. picking the lexicographically first solution, makes me uneasy.

In an ideal world, I'd like to have the following signature of scc (see also #152):

scc :: Graph a -> Acyclic.Graph (NonEmpty.Graph a)

Now we could write functions like this:

topSort            :: Acyclic.Graph a        -> [a]
batchLexicographic :: Acyclic.Graph a        -> [Set a]
batchWeighted      :: Acyclic.Graph (a, Int) -> [Set a]

Where the latter does an optimal batching that aims to minimise the overall "execution time" of tasks associated with vertices. There are other interesting batching strategies, including dynamic ones.

To sum up: I'd like to have this functionality, but I think it shouldn't be a part of scc's API. It belongs to the API of the (yet non-existing) Acyclic.Graph data type.

@michaelpj
Copy link
Author

Ah, of course it's not unique, my mistake! Then we can't give a "best" most general type for topSort indeed.

So yes, this should be some kind of additional "batching" functionality, which might not even belong in this package.

Not sure if you want to keep this open for future reference, happy to close and I'll just implement my own thing.

@snowleopard
Copy link
Owner

Let's keep this issue open, but I'll give it a more general title.

@snowleopard snowleopard changed the title Enrich the output of topSort to be a list of independent sets Add support for acyclic graphs and associated operations Dec 4, 2018
@adithyaov
Copy link
Contributor

Hi, Please go through the code in the following repository and let me know if it could work as a possible representation for acyclic graphs.

https://github.com/adithyaov/acyclic-graphs

The function unsafeConvertToAcyclic in the module Acyclic.Graph provides a way to convert any type that has an instance of Graph to a graph of some acyclic type.
Functions like scc can be written like the follows, scc' = unsafeConvertToAcyclic . scc

I am currently looking at other possible implementation related to phantom types (gdp), but I haven't made much progress there.

@snowleopard
Copy link
Owner

@adithyaov Thanks! I'm not sure if your solution has any advantages compared to simply wrapping graphs into an Acyclic newtype, for example:

newtype Acyclic a = Acyclic a

scc' :: Graph a -> Acyclic (NonEmpty.Graph a)
scc' = Acyclic . scc

This is a lot simpler than what you propose, but seems to achieve the same goal: you introduce an abstract data type for acyclic graphs, which can only be created using functions like scc which are known to produce acyclic graphs.

Later on, if you extract the list of edges from this graph, the compiler gives you no static guarantees that there are no cycles in the obtained netlist. So, for example, if you want to apply topSort to the result, you'll need to use fromJust somewhere.

@adithyaov
Copy link
Contributor

adithyaov commented Mar 27, 2019

@snowleopard , I think that there is a guarantee of acyclicity.
It is guaranteed from the representation of the acyclic graphs themselves. In the above representation of acyclic graphs, there are only edges from lower ordered vertices to higher ordered vertices as + and * for acyclic graphs work in such a way.

If there is any Acyclic type, it is guaranteed that the property holds as we are not simply applying the wrapper but constructing the graph from the ground up using the specific * and + as defined for the acyclic graph.

Following is the algebra that is followed,
E is some set of ordered elements.
A is the set of acyclic graphs formed with E.
CaptureGSOC
* and + work in the following way,
CaptureGS2
unsafeConvertToAcyclic applies a new order on the elements and follows the construction of the acyclic graph.
Even if the graph has cycles, the result of unsafeConvertToAcyclic will be acyclic, the result won't make much sense but it'll still be acyclic.
The list of edges extracted from a graph of any Acyclic type is definitely guaranteed to contain no cycles.
I don't think fromJust is required anywhere. Could you please elaborate on this? The code nowhere works with a Maybe type (not internally nor externally).

@snowleopard
Copy link
Owner

It is guaranteed from the representation of the acyclic graphs themselves. In the above representation of acyclic graphs, there are only edges from lower ordered vertices to higher ordered vertices

@adithyaov Yes, this is an internal invariant of the data structure, but one can say that scc also provides the very same guarantee -- there are no cycles among strongly connected components. Alas, in both cases this guarantee is not reflected in types: the compiler is not aware of this, only the programmer.

The list of edges extracted from a graph of any Acyclic type is definitely guaranteed to contain no cycles.

And this is true of this list too: edgeList (scc g) for any g. Do you agree?

I don't think fromJust is required anywhere. Could you please elaborate on this?

Sure. As soon as you want to process the resulting acyclic graph by an algorithm that relies on the absence of cycles you run into partiality. A typical example is topSort which returns Maybe [a]. If you apply topSort to an acyclic graph using your representation you will still get a Maybe [a]. We know for sure it's a Just, but have no way of proving this to the compiler, which requires us to use fromJust.

@adithyaov
Copy link
Contributor

adithyaov commented Mar 27, 2019

@snowleopard , It seems I am a slow learner, I believe I started off on the wrong foot.
But I think I'm finally understanding what you meant! 👍

this guarantee is not reflected in types

Is considering the structure a good place to start? For example, maybe constructing the acyclic graphs like how one constructs a Tree (I don't have a concrete way of doing this yet but I'll try working on this)?
Even if I did manage to create a representation like such, how would one make it work with the general graph representation of Alga?
For example, consider the structure for binary trees, data BTree a = Empty | Node a (BTree a) (BTree a). The only way I can think of working with this and the algebraic structure Graph is to convert a graph to Tree (unsafe) and Tree to graph.

One could create a different implementation of topSort on such structures and hence Maybe won't be necessary?

If possible could you also point me in a new direction with some abstract ideas you have in mind? It seems working with the order of vertices is not getting me far enough.

It seems someone has created something similar to what's required,
https://github.com/athanclark/dag/
Do you think one could extend/modify it to suit Alga?

@snowleopard
Copy link
Owner

snowleopard commented Mar 29, 2019

@adithyaov I haven't had much time to think about type-safe acyclic graphs yet, so I don't yet have any good answers to your questions. Please try various different approaches and let us know how they work!

In addition to the link you shared, here are some relevant Twitter discussions:

https://twitter.com/phadej/status/1105201407576166400?s=19
https://twitter.com/haskell_cat/status/1105445470720139264?s=19

@anfelor
Copy link
Contributor

anfelor commented May 2, 2019

Please try various different approaches and let us know how they work!

The more I think about this, the less sure I am that there should be a representation of acyclic graphs, that is fundamentally different from alga. The problem is that acyclic graphs aren't instances of Graph: For example a * b is acyclic and b * a is acyclic but their sum and product aren't. Furthermore, it takes linear time to compute whether the sum or product of two acyclic graphs is acyclic again, since we have to consider each edge and vertex at least once to account for 2-cycles:
(1*2 + 3*4 + .. + 99*100) + (1*2 + 3*4 + .. + 32*31 + .. + 99*100). This means, that we could just as well add the underlying graphs and apply topSort to it.

The complexity of inserting many new edges is still an open problem, but the algorithm in that paper takes O(n^2.75) time, so it is still slower than just building an (possibly cyclic) graph and then applying topSort. So unless we find a different data structure to represent dags, I would suggest, we try one of the following:

  1. A wrapper around a graph and possibly a topological order (depending on how batch* is implemented):
data Acyclic a = Acyclic { getGraph :: AdjacencyMap a, getTopologicalOrder :: [a]}

topSort :: AdjacencyMap a -> Maybe (Acyclic a)
sccSort :: AdjacencyMap a -> Acyclic (NonEmpty.AdjacencyMap a) -- see below
batchLexicographic :: Acyclic a        -> [Set a]
batchWeighted      :: Acyclic (a, Int) -> [Set a]

Here one could add a monad to append new vertices as in the twitter thread.

  1. No representation at all. At least to fix topSort doesn't know that the result of scc has no cycles #152 a simple sccSort function would suffice, since most scc implementations already return the SCCs in topological order (not the one in Data.Graph I am afraid, but a custom implementation could do this):
sccSort :: Ord a => AdjacencyMap a -> [NonEmpty.AdjacencyMap a]
sccSort m = map expand sccs
  where
    sccs      = sccList m -- custom implementation of KL.scc
    expand xs = fromJust $ NonEmpty.toNonEmpty $ induce (`Set.member` s) m
      where
        s = Set.fromList xs

@snowleopard
Copy link
Owner

@anfelor I agree that the "abstract data type + smart constructors" approach is a nice solution from the pragmatic point of view: it's simple and it solves most of the API problems. I'm happy for this approach to be tried first. We could also add an "acyclic graph monad" (discussed here), as one of the ways to create such abstract acyclic graphs (let's use this term?), at least some small examples, where potential performance issues are not important.

The problem is that acyclic graphs aren't instances of Graph: For example a * b is acyclic and b * a is acyclic but their sum and product aren't.

Note that it is in principle possible to define an algebra of graphs, which operates on ordered vertices, as @adithyaov sketches in his above comment. With this approach both graph expressions a * b + b * a and (a * b) * (b * a) will actually correspond to the acyclic graph a * b, assuming a < b. This makes it possible to have one more way to create abstract acyclic graphs:

fromOrdered :: Ord a => Graph a -> Acyclic a

Having said that, I'm not sure whether this algebra has any interesting properties from the mathematical point of view. I haven't had a chance to check which laws hold and which do not.

@anfelor
Copy link
Contributor

anfelor commented May 3, 2019

Note that it is in principle possible to define an algebra of graphs, which operates on ordered vertices,

From a first glance at the laws, it looks like this should work for all partial orders <.
But I don't think Ord is the right constraint for this; how should for example in #85 a total order be defined if we only have a partial order? Furthermore, the ordering on something like a Package is probably going to involve the name, version, etc. and not the dependencies. Still a function like

fromOrdered :: (a -> a -> Bool) -> Graph a -> Acyclic a

which admits a partial order as the first argument should work (as a "graph homomorphism" :)).

But how to we get that partial order? Especially in the case of #85, I feel like we would need to compute an acyclic graph first and would get the partial order through reachability, not the other way around?

@snowleopard
Copy link
Owner

snowleopard commented May 3, 2019

@anfelor Indeed, a partial order is sufficient. Interesting!

But how to we get that partial order?

From the perspective of the algebra of graphs, the answer doesn't matter: we just operate on a partially ordered set of vertices.

But from the user's perspective, the answer does matter. In some cases you do have a partial order upfront, and it does act as a proof that the resulting graph is acyclic. Of course, if a graph can in fact have cycles, then you cannot supply such a partial order without first performing a topological sort.

Here is a simple example where we do have a partial order in advance:

  • Every object file depends only on C source files: file.c < file.o.
  • Every executable depends only on object files: file.o < file.exe.

Here we could just use < from the derived Ord instance for the extension data type:

data Extension = C | O | Exe deriving (Eq, Ord)

type File = (FilePath, Extension)

partialOrder :: File -> File -> Bool
partialOrder (_, x) (_, y) = x < y

I find this a convincing example, where we can have a type-safe construction of an acyclic graph thanks to some domain knowledge.

@anfelor
Copy link
Contributor

anfelor commented May 3, 2019

This seems to be a special case of partial orders: Those that are induced by a total order on keys (Extension in your example). But in general, transitivity could be a problem if for example some source files depend on .h files which depend on their source file in return.

Still, using a partial order as a guarantee of acyclicity is a good idea; in the general case topSort even returns a total order of the vertices!

snowleopard pushed a commit that referenced this issue Jun 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants