This repository has been archived by the owner on Mar 5, 2024. It is now read-only.

Cache old partition states in case of ZooKeeper failure. #65

Open
wants to merge 4 commits into master from scottjab/cache-old-peers-for-a-bit

Conversation

scottjab (Contributor) commented Nov 3, 2016

Sequins will start returning 501s if ZooKeeper has a blip. This PR attempts to cache partition state between blips.

r? @colinmarc
cc? @charleseff

@@ -44,6 +45,7 @@ func watchPartitions(zkWatcher *zkWatcher, peers *peers, db, version string, num
replication: replication,
local: make(map[int]bool),
remote: make(map[int][]string),
old: make(map[int][]string, 1024),

old isn't really a descriptive name. maybe disappeared?

@@ -129,6 +131,26 @@ func (p *partitions) updateRemotePartitions(nodes []string) {
}
}

// Keep track of old peers in case zookeeper goes away.

This comment isn't particularly helpful. I think it'd be better to add documentation to FindPeers about the new functionality.

for _, node := range unDedupedPartition {
if !found[node] {
found[node] = true
p.old[partitionId] = append([]string{node}, p.old[partitionId]...)

It'd probably be slightly more efficient to append rather than prepend, and then truncate as part of the dedupe (working backwards). Even better, though, would be to just have a separate dedupe method.
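In code, the append-then-dedupe suggestion might look roughly like this (a sketch only, assuming a standalone dedupe helper like the one sketched further down; not the code in this PR):

```go
// Append is cheap; prepending reallocates and shifts the whole slice.
p.old[partitionId] = append(p.old[partitionId], node)

// ...then, after the loop, collapse duplicates in a single pass:
p.old[partitionId] = dedupe(p.old[partitionId])
```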


Also, this always adds the nodes to this list, when we just want to add them if they got removed above.


oldPeers := make([]string, 1024)
copy(oldPeers, p.old[partition])
// Append old peers to peer list, in case of Zookeeper issues.

Again, I don't think this comment tells the reader much (if they don't already know what's up).

oldPeers := make([]string, 1024)
copy(oldPeers, p.old[partition])
// Append old peers to peer list, in case of Zookeeper issues.
peers = append(peers, oldPeers...)

This means the dead peers will get shuffled into the live peers by the caller of this code. I think it'd be better if FindPeers return shuffled live peers + dead peers at the end.
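A sketch of that shape, using math/rand and placeholder names (this is an illustration, not the code in the PR):

```go
import "math/rand"

// prioritizePeers returns the live peers in random order, with peers that
// have disappeared appended at the end, so callers only fall back to them
// after the live ones. (Hypothetical helper for illustration.)
func prioritizePeers(live, disappeared []string) []string {
	out := make([]string, 0, len(live)+len(disappeared))
	out = append(out, live...)

	// Fisher-Yates shuffle of the live peers only.
	for i := len(out) - 1; i > 0; i-- {
		j := rand.Intn(i + 1)
		out[i], out[j] = out[j], out[i]
	}

	return append(out, disappeared...)
}
```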

unDedupedPartition := append(newPartition, p.old[partitionId]...)
found := map[string]bool{}

// Shitty dedupe, iterate though the remote peers

Language!

@@ -111,6 +113,19 @@ func (p *partitions) updateLocalPartitions(local map[int]bool) {
}
}

func (p *partitions) deDupe(nodes []string) []string {

just call this dedupe


also there's no reason for it to be on p: it can just be func dedupe(s []string) []string

@@ -111,6 +113,19 @@ func (p *partitions) updateLocalPartitions(local map[int]bool) {
}
}

func (p *partitions) deDupe(nodes []string) []string {
found := map[string]bool{}
dedupedNodes := make([]string, len(nodes))

this should be make([]string, 0, len(nodes)) or you're going to have extra empty values at the end
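Putting the naming and capacity suggestions together, the helper might end up looking something like this (a sketch, not the code in the PR):

```go
// dedupe deduplicates a slice of strings, preserving the order of first
// occurrence. Length 0 with capacity len(nodes) avoids the trailing empty
// strings that make([]string, len(nodes)) would leave behind.
func dedupe(nodes []string) []string {
	seen := make(map[string]bool, len(nodes))
	deduped := make([]string, 0, len(nodes))
	for _, node := range nodes {
		if !seen[node] {
			seen[node] = true
			deduped = append(deduped, node)
		}
	}
	return deduped
}
```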

@@ -129,6 +144,17 @@ func (p *partitions) updateRemotePartitions(nodes []string) {
}
}

for partitionId, partition := range remote {

this still isn't checking that the remote peer actually disappeared (ie, that it used to be in p.remote and now is not)
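One way to express that check, sketched against the names in this diff (p.remote as the previous state, remote as the incoming one, p.disappeared as the cache); treat it as an illustration rather than the PR's code:

```go
// Only record peers that were in the old remote state for a partition and
// are missing from the new state.
for partitionId, oldPeers := range p.remote {
	current := make(map[string]bool, len(remote[partitionId]))
	for _, peer := range remote[partitionId] {
		current[peer] = true
	}
	for _, peer := range oldPeers {
		if !current[peer] {
			p.disappeared[partitionId] = append(p.disappeared[partitionId], peer)
		}
	}
	p.disappeared[partitionId] = dedupe(p.disappeared[partitionId])
}
```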

@@ -129,6 +144,17 @@ func (p *partitions) updateRemotePartitions(nodes []string) {
}
}

for partitionId, partition := range remote {
newPartition := make([]string, len(partition))

I'd call this peers I think

newPartition := make([]string, len(partition))
copy(newPartition, partition)

unDedupedPartition := append(newPartition, p.disappeared[partitionId]...)

you don't need the temp variable here.

@@ -221,17 +247,21 @@ func (p *partitions) partitionZKNode(partition int) string {
}

// getPeers returns the list of peers who have the given partition available.

You need to update the docstring. However, I still think it'd be cleaner to just return one list, and consider the peers "ordered" by priority. That gives us the freedom to actually add in priority later, if we want to do some light downweighting of peers from which we've seen errors.

@colinmarc (Contributor)

Speaking of the build failing, this would be great with some tests!
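For instance, a minimal test for the dedupe helper could look like this (hypothetical package and layout, assuming dedupe ends up as a package-level function):

```go
package main // hypothetical; use whatever package the helper lands in

import (
	"reflect"
	"testing"
)

func TestDedupe(t *testing.T) {
	got := dedupe([]string{"a", "b", "a", "c", "b"})
	want := []string{"a", "b", "c"}
	if !reflect.DeepEqual(got, want) {
		t.Errorf("dedupe() = %v, want %v", got, want)
	}
}
```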

@scottjab-stripe force-pushed the scottjab/cache-old-peers-for-a-bit branch from c8be9fe to e18c85e on December 1, 2016, 21:48
@@ -69,6 +71,19 @@ func WatchPartitions(zkWatcher *zk.Watcher, peers *Peers, db, version string, nu
return p
}

// Dedupelicates elements in a slice of strings.

This should be of the form "dedupe deduplicates ...".

@@ -107,17 +122,20 @@ func (p *Partitions) sync(updates chan []string) {
}

// FindPeers returns the list of peers who have the given partition available.

This docstring is still wrong, and I still think the signature would be better if it just returned a list of peers vaguely prioritized. That way we could trivially add some sort of greylisting for peers that exhibit errors a lot.

@@ -120,12 +120,12 @@ func (vs *version) serveError(w http.ResponseWriter, key string, err error) {
w.WriteHeader(http.StatusInternalServerError)
}

-func shuffle(vs []string) []string {
+func shuffle(vs []string, disappeared []string) []string {

This will return an arbitrary number of deleted nodes in addition to the live ones, which will change the behavior of proxy subtly. I think we want to try just peers, where dead peers are only considered if there aren't enough live peers to fulfill the replication factor.
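A sketch of that behavior, reusing the prioritizePeers idea from above; n would be the replication factor, and all names are placeholders:

```go
// pickPeers returns up to n peers: shuffled live peers first, and peers
// that have disappeared only if there aren't enough live ones to reach n.
// (Hypothetical helper, not the code in this PR.)
func pickPeers(live, disappeared []string, n int) []string {
	peers := prioritizePeers(live, disappeared)
	if len(peers) > n {
		peers = peers[:n]
	}
	return peers
}
```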
