Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thought: We sometimes don't need to distinct Overlay and Connect #80

Open
nobrakal opened this issue Jun 12, 2018 · 20 comments
Open

Thought: We sometimes don't need to distinct Overlay and Connect #80

nobrakal opened this issue Jun 12, 2018 · 20 comments

Comments

@nobrakal
Copy link
Contributor

Hi,

I did not found a better title, but here is the idea.
Reading the Algebra.Graph code, I found there was some functions using foldg with the same two last arguments.

For example:

While searching for improvements, I thought that there was actually a duplication of information:
Overlay and Connect both kinda join two graphs.

So I came to

data Graph a = Empty
             | Vertex a
             | Cons Op (Graph a) (Graph a)
             deriving (Foldable, Functor, Show, Traversable)

data Op = Overlay | Connect deriving (Show)

(Once again I am bad for names)

You only have to change

overlay = Cons Overlay
connect = Cons Connect

and then replace every occurrence of the constructor by these counterpart.

Pros

It allows to define:

foldg' :: b -> (a -> b) -> (b -> b -> b) -> Graph a -> b
foldg' e v o = go
  where
    go Empty        = e
    go (Vertex  x ) = v x
    go (Cons _ x y) = o (go x) (go y) 

Which is faster than foldg (hasVertex is about 1.40 times faster for example using hasEdge = foldg' False (==x) (||))

Cons

  • Uses can no more play with constructors, at least not simply.

  • Foldg become a bit less efficient. I had to redefine induce:

induce :: (a -> Bool) -> Graph a -> Graph a
induce p = go
  where
    go (Cons o x y) = case go x of
                        Empty -> go y
                        x' -> case go y of
                                Empty -> x'
                                y' -> Cons o x' y'
    go (Vertex x)   = if p x then Vertex x else Empty
    go Empty = Empty

removeVertex is 1.2 times better in the actual library than with this change, but hasEdge become faster of 2 times. I don't now yet why.

There is I think more Cons than Pros, but it can worth some investigation

@snowleopard snowleopard changed the title Thouht: We sometimes don't need to distinct Overlay and Connect Thought: We sometimes don't need to distinct Overlay and Connect Jun 12, 2018
@snowleopard
Copy link
Owner

@nobrakal Interesting! Yes, of course you can represent graphs (or any algebraic expressions) simply by trees labelled with operators. In fact, this is similar to what I plan to use for labelled graphs :-)

See here: https://github.com/snowleopard/alga/blob/edge-labels/src/Algebra/Graph/Labelled.hs#L62-L71

There is one important advantage of the current data type that you didn't mention: its constructors exactly correspond to the algebra of graphs.

This also brings us back to the question: do we need to export the constructors? Maybe we shouldn't. Then we'll be free to choose whatever internal representation we like, including yours.

However, I still prefer to keep the 'vanilla' version of the data type in the library simply because it emerges so naturally from the underlying mathematics.

If the goal is, however, to find the most efficient representation possible, I'm quite sure it will actually be something else and it will be complex -- more complex than what you suggest. There is also a relevant discussion here: #61.

@nobrakal
Copy link
Contributor Author

In fact, this is similar to what I plan to use for labelled graphs :-)

Haha, I missed this!

This also brings us back to the question: do we need to export the constructors?

Finally, maybe exporting constructors is not the right idea to have.

With some GHC hack, we can allow pattern-matching without constructors, using for example ViewPatterns

@snowleopard
Copy link
Owner

Yes, ViewPatterns may be the way to go.

@nobrakal
Copy link
Contributor Author

nobrakal commented Jun 17, 2018

An other option is PatternSynonyms. It is much more like what we need to hide constructors, it provide synonyms.

There is a naming problem: A pattern cannot have the same name as the constructor... But one can pose:

data Graph a = EmptyG
             | VertexG a
             | OverlayG (Graph a) (Graph a)
             | ConnectG (Graph a) (Graph a)
             deriving (Foldable, Functor, Show, Traversable)

pattern Empty :: Graph a
pattern Empty = EmptyG

pattern Vertex :: a -> Graph a
pattern Vertex a = VertexG a

pattern Overlay :: Graph a -> Graph a -> Graph a
pattern Overlay a b = OverlayG a b

pattern Connect :: Graph a -> Graph a -> Graph a
pattern Connect a b = ConnectG a b

{-# COMPLETE Empty, Vertex, Overlay, Connect #-}

Some advantages:

  • They can be used to construct graphs effectively, so they can even replace lower-cased aliases.
  • They can be a part of the Graph class definition.

@snowleopard
Copy link
Owner

snowleopard commented Jun 18, 2018

Yes, this is a promising approach.

While we're thinking about a more efficient internal representation, I think it is also important to consider packing arguments to multi-way + and * into lists. You then end up with something like this:

data Operator = Overlay | Connect
data Graph a = Leaf a | Node Operator [Graph a]

empty = Node Overlay []
vertex = Leaf
overlays = Node Overlay
connects = Node Connect

This works thanks to the associativity of both operators.

@nobrakal
Copy link
Contributor Author

I played a bit this afternoon and came to:

data Operator = OverlayG | ConnectG deriving (Eq,Show)
data Graph a  = Leaf a | Node Operator [Graph a]
             deriving (Foldable, Functor, Show, Traversable)


pattern Empty :: Graph a
pattern Empty <- Node _ [] where
  Empty = Node OverlayG []

pattern Vertex :: a -> Graph a
pattern Vertex a = Leaf a

pattern Overlay :: Graph a -> Graph a -> Graph a
pattern Overlay a b <- (view -> Node OverlayG [a,b]) where
  Overlay a b = Node OverlayG [a,b]

pattern Connect :: Graph a -> Graph a -> Graph a
pattern Connect a b <- (view -> Node ConnectG [a,b]) where
  Connect a b = Node ConnectG [a,b]

{-# COMPLETE Empty, Vertex, Overlay, Connect #-}
{-# COMPLETE Empty, Vertex, Node #-}

view :: Graph a -> Graph a
view (Node c [a,b])  = Node c [a,b]
view (Node c (x:xs)) = Node c [x,Node c xs]
view g = g

This is passing tests, combining view patterns and patterns synonyms can lead to some interesting results ^^.

Sadly, I think this will slow down the whole thing, since the traditional foldg will not be so efficient !

@nobrakal
Copy link
Contributor Author

nobrakal commented Jun 18, 2018

The results of the bench suite, even using custom foldg like the don't require the Connect of Overlay information, for example, I redefined hasVertex with

foldg'' :: b -> (a -> b) -> ([b] -> b) -> Graph a -> b
foldg'' e v o = go
  where
    go Empty         = e
    go (Vertex  x  ) = v x
    go (Node _ xs) = o $ map go xs

hasVertex :: Eq a => a -> Graph a -> Bool
hasVertex x = foldg'' False (==x) or

The result of the benchmarking suite

isEmpty: 0.97
vertexList: 2.33
vertexCount: 2.25
hasVertex: 1.58
edgeCount: 1.94
edgeList: 1.45
hasEdge: 1.63
addEdge: 2.03
addVertex: 2.68
removeVertex: 1.50

@snowleopard
Copy link
Owner

foldg''

We might call such folds "blindfolds" :-)

Are numbers > 1 good? I'm sure I'll keep forgetting that :D Maybe should add (good), (bad) to the table!

By the way, have you also done overlays = Node Overlay and connects = Node Connect? That should give a good speed up.

@nobrakal
Copy link
Contributor Author

I added a small reminder:

Comparing Alga to AlgaOld . It means that the displayed number will be k such that Alga = k * AlgaOld

So these are very bad values ^^. I just saw that I haven't changed rnf, so maybe when I used pattern matching, it slowed down the whole thing.

I tired with more specialized foldg but I am not able to obtain descent results

I have pushed the try here: https://github.com/nobrakal/alga/tree/betterRepresentation

A bench is running: https://travis-ci.com/nobrakal/alga/jobs/130222270

@snowleopard
Copy link
Owner

It means that the displayed number will be k such that Alga = k * AlgaOld

I'm afraid this doesn't help me, still takes me a while to understand =) Why not add obvious (good)/(bad) labels instead? If you want to get fancy, you could also add (OK) when 0.9 < k < 1.1.

vertexList: 1.92

It's quite surprising to see such a slowdown here. We should be doing less work to traverse a tree and find all vertices -- do you have any ideas why this happens? By the way, it may be useful to test the whole Algebra.Graph API in the regression suite.

@nobrakal
Copy link
Contributor Author

Why not add obvious (good)/(bad) labels instead?

It is done now :)

do you have any ideas why this happens?

Corrected now, it was again a problem with foldl :) Now vertexList is a bit faster:

vertexList

Description: Produce a list of the vertices in the graph

Mesh

1 10 100 1000
Alga 42.71 ns 312.1 ns 314.5 ns 736.2 ns
AlgaOld 45.18 ns 368.3 ns 371.9 ns 695.7 ns

Clique

1 10 100 1000
Alga 42.62 ns 1.977 μs 341.9 μs 68.43 ms
AlgaOld 45.30 ns 2.367 μs 442.3 μs 99.67 ms

SUMMARY:

  • Alga was the fastest 5 times

There was 3 ex-aequo

ABSTRACT:
(Based on an average of the ratio between largest benchmarks)

  • Alga was 1.15 times faster than AlgaOld

My induce is still not convincing:

foldgWithoutEmpty' :: (a -> b) -> ([b] -> b) -> ([b] -> b) -> Graph a -> b
foldgWithoutEmpty' f o c = go
  where
    go (Leaf a) = f a
    go (Node co xs) = case co of
                        OverlayG -> o $ map go xs
                        _ -> c $ map go xs

induce :: (a -> Bool) -> Graph a -> Graph a
induce p = foldgWithoutEmpty' (\x -> if p x then Vertex x else Empty) (overlays . filter k) (connects . filter k)
  where
    k (Node _ []) = False -- Constant folding to get rid of Empty leaves
    k _ = True

removeVertex

Description: Remove a vertex of the graph

Mesh

1 10 100 1000
Alga 33.06 ns 522.7 ns 529.2 ns 1.395 μs
AlgaOld 35.66 ns 368.3 ns 362.4 ns 925.7 ns

Clique

1 10 100 1000
Alga 32.52 ns 4.863 μs 561.3 μs 61.48 ms
AlgaOld 34.27 ns 3.098 μs 456.6 μs 172.5 ms

SUMMARY:

  • AlgaOld was the fastest 10 times
  • Alga was the fastest 2 times

There was 4 ex-aequo

ABSTRACT:
(Based on an average of the ratio between largest benchmarks)

  • Alga was 1.07 times faster than AlgaOld

By the way, it may be useful to test the whole Algebra.Graph API in the regression suite.

Yeah of course. I don't really now where to do that (because I need to extend the configuration file in haskell-perf/graph). Here ? In nobrakal/benchAlgaPr (I am not satisfied with this repo).

@snowleopard
Copy link
Owner

Thanks @nobrakal!

Perhaps, for foldgWithoutEmpty' you need to use parameter Operator -> [b] -> b instead? That might lead to a more efficient code.

@nobrakal
Copy link
Contributor Author

nobrakal commented Jun 20, 2018

Ooops, I spent some time to figure out the the bench-suite was broken. A mesh wasn't a real mesh !

Here is the corrected times:

removeVertex

Description: Remove a vertex of the graph

Mesh

1000
Alga 195.8 μs
AlgaOld 185.9 μs

Clique

1000
Alga 55.75 ms
AlgaOld 163.2 ms

Circuit

1000
Alga 98.39 μs
AlgaOld 89.24 μs

SUMMARY:

  • Alga was the fastest 2 times
  • AlgaOld was the fastest 2 times

There was 2 ex-aequo

ABSTRACT:
(Based on an average of the ratio between largest benchmarks)

  • Alga was 1.20 times faster than AlgaOld

So there is a surprising drop for clique. Again, strangely,

induce :: (a -> Bool) -> Graph a -> Graph a
induce p = go
  where
    go (Leaf a) = if p a then Leaf a else Empty
    go (Node co xs) = Node co $ filter k $ map go xs 
    k (Node _ []) = False -- Constant folding to get rid of Empty leaves
    k _ = True

Is not faster, it is worse:

removeVertex

Description: Remove a vertex of the graph

Mesh

1000
Alga 224.7 μs
AlgaOld 169.3 μs

Clique

1000
Alga 68.39 ms
AlgaOld 169.2 ms

Circuit

1000
Alga 116.9 μs
AlgaOld 80.39 μs

SUMMARY:

  • AlgaOld was the fastest 4 times
  • Alga was the fastest 2 times

ABSTRACT:
(Based on an average of the ratio between largest benchmarks)

  • AlgaOld was 1.06 times faster than Alga

Perhaps, for foldgWithoutEmpty' you need to use parameter Operator -> [b] -> b instead? That might lead to a more efficient code.

What do you mean ?

@snowleopard
Copy link
Owner

snowleopard commented Jun 21, 2018

What do you mean?

I mean the function signature should be:

foldgWithoutEmpty' :: (a -> b) -> (Operator -> [b] -> b) -> Graph a -> b
foldgWithoutEmpty' v op g = ...

The function op here deals with both types of nodes, so one of the conditionals (checking the operator of the current node) disappears/is pushed inside the function.

@nobrakal
Copy link
Contributor Author

nobrakal commented Jun 21, 2018

Yep you were right, it boost a bit performances, but nothing extraordinary:

foldgWithoutEmpty' :: (a -> b) -> (Operator -> [b] -> b) -> Graph a -> b
foldgWithoutEmpty' f h = go
  where
    go (Leaf a) = f a
    go (Node co xs) = h co $ map go xs

induce :: (a -> Bool) -> Graph a -> Graph a
induce p = foldgWithoutEmpty' (\x -> if p x then Vertex x else Empty) (\co t -> Node co $ filter k t)
  where
    k (Node _ []) = False -- Constant folding to get rid of Empty leaves
    k _ = True

removeVertex

Description: Remove a vertex of the graph

Mesh

1 10 100 1000
Alga 38.59 ns 1.350 μs 19.11 μs 207.2 μs
AlgaOld 41.80 ns 1.072 μs 14.01 μs 170.5 μs

Clique

1 10 100 1000
Alga 39.35 ns 4.519 μs 539.3 μs 60.47 ms
AlgaOld 41.96 ns 3.419 μs 445.9 μs 155.9 ms

Circuit

1 10 100 1000
Alga 112.6 ns 1.039 μs 10.68 μs 104.9 μs
AlgaOld 82.09 ns 857.2 ns 7.770 μs 79.26 μs

SUMMARY:

  • AlgaOld was the fastest 17 times
  • Alga was the fastest 4 times

There was 3 ex-aequo

ABSTRACT:
(Based on an average of the ratio between largest benchmarks)

  • Alga was 1.03 times faster than AlgaOld

@nobrakal
Copy link
Contributor Author

Again, this is a problem with edges. Maybe I should change the comparing script to use the alga representation instead of edges, but edges is what users will use.

The same code benchmarked using alga's clique:

Clique

1000
Alga 29.79 μs
AlgaOld 34.93 μs

@snowleopard
Copy link
Owner

I agree: edges is very often used to construct graphs, so we shouldn't assume graph expressions are coming in a better shape.

@nobrakal
Copy link
Contributor Author

nobrakal commented Jun 27, 2018

So I don't know if we can make a better induce here (and because all the graphs structures are similar when building with edges, our induce is faster on very big lists nested inside a Node)

@nobrakal
Copy link
Contributor Author

Note that the implementation described in #84 (comment) shows great results here

### Clique
#### 1000
##### (0,1)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 76.30 ns | 1.000 |
| AlgaOld || 93.11 ms | 0.898 |
+---------++----------+-------+

##### (292,820)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 18.99 ms | 0.971 |
| AlgaOld || 95.07 ms | 0.970 |
+---------++----------+-------+

##### (997,999)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 45.77 ms | 0.996 |
| AlgaOld || 88.05 ms | 0.974 |
+---------++----------+-------+

##### (0,0)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 32.41 ms | 0.999 |
| AlgaOld || 90.33 ms | 0.978 |
+---------++----------+-------+

##### (1,0)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 45.26 ms | 1.000 |
| AlgaOld || 92.34 ms | 0.979 |
+---------++----------+-------+

##### (1,1)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 31.54 ms | 1.000 |
| AlgaOld || 90.06 ms | 0.980 |
+---------++----------+-------+

Even with an algebraic clique:

### Clique
#### 1000
##### (0,1)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 71.73 ns | 0.995 |
| AlgaOld || 48.15 μs | 1.000 |
+---------++----------+-------+

##### (292,820)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 16.92 μs | 1.000 |
| AlgaOld || 53.08 μs | 1.000 |
+---------++----------+-------+

##### (997,999)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 33.32 μs | 0.999 |
| AlgaOld || 53.66 μs | 1.000 |
+---------++----------+-------+

##### (0,0)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 26.77 μs | 1.000 |
| AlgaOld || 48.66 μs | 1.000 |
+---------++----------+-------+

##### (1,0)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 28.30 μs | 1.000 |
| AlgaOld || 48.14 μs | 1.000 |
+---------++----------+-------+

##### (1,1)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 95.58 ns | 0.996 |
| AlgaOld || 48.75 μs | 1.000 |
+---------++----------+-------+

@snowleopard
Copy link
Owner

Interesting, that's an impressive speed up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants