Support for surfacing cache status to caller #44

mnutt · 2024-02-01T17:56:16Z

I'm looking into using galaxycache to replace an existing API caching system. One of the things I was hoping to do was to log detailed caching info in my request logs. My origin API is particularly sensitive to duplicate requests, so I field a lot of questions like "why did these two requests that came in so close together result in two cache misses?", and request logs are useful for debugging them.

I see that there is built-in tracing support. Is there any way to extract a cache status from that? From a Get() call, I'd like to know a) whether the current node was authoritative, and b) whether the result was a local_miss, peer_miss, peer_hit, maincache_hit, hotcache_hit, etc. From looking through the source, I think most of this information is available through authoritative and hitLevel; the only thing I don't see is how to distinguish a peer hit from a peer miss.

I'm not wedded to any implementation in particular, but was imagining maybe something like this?

type Metadata struct {
	Level              hitLevel
	LocalAuthoritative bool
	PeerErr            error
	LocalErr           error
}

func (g *Galaxy) GetWithMetadata(ctx context.Context, key string, dest Codec) (*Metadata, error) { }
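
As a rough illustration of how a caller might turn such metadata into the request-log labels listed above, here is a minimal sketch. Everything in it is an assumption made for illustration: `level` and its constants stand in for galaxycache's unexported hitLevel, `PeerHit` is a hypothetical field for the peer hit/miss signal that is not available today, and `statusLabel` is not part of the library.

```go
package main

import "fmt"

// level is a hypothetical stand-in for galaxycache's unexported hitLevel.
type level int

const (
	levelHotcache  level = iota + 1 // served from the local hot cache
	levelMaincache                  // served from the local main cache
	levelPeer                       // served by a remote peer
	levelBackend                    // loaded from the origin by this node
)

// metadata loosely mirrors the proposed Metadata struct. PeerHit is a
// hypothetical addition: the peer hit/miss signal the issue says is missing.
type metadata struct {
	Level              level
	LocalAuthoritative bool
	PeerHit            bool
}

// statusLabel maps metadata to one of the log labels named in the issue.
func statusLabel(md metadata) string {
	switch md.Level {
	case levelHotcache:
		return "hotcache_hit"
	case levelMaincache:
		return "maincache_hit"
	case levelPeer:
		if md.PeerHit {
			return "peer_hit"
		}
		return "peer_miss"
	case levelBackend:
		return "local_miss"
	}
	return "unknown"
}

func main() {
	md := metadata{Level: levelPeer, PeerHit: true}
	fmt.Printf("cache_status=%s authoritative=%t\n", statusLabel(md), md.LocalAuthoritative)
}
```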

I opened this issue to make sure that I wasn't missing an obvious way to handle this, and to see if adding it would be something that would be in line with the project's goals. Thanks!


dfinkel · 2024-02-05T18:59:13Z

You didn't miss an obvious way to handle this.

We haven't exposed this data because tracing generally provides a better view, and it's pretty rare that you'd want that much fidelity in request logs (distributed tracing is quite useful). For our services, we've been removing fine-grained request logs because we rarely used them and they get expensive depending on the request rate and hosting setup.

With that said, I am open to extending the Galaxy type's interface a bit to provide a more extensible form of the Get call. It's probably about time to think about how generics should change the Galaxy type (or at least a wrapper). Providing an Info return value is definitely something that's worth exploring.

Currently, neither the HTTP nor the gRPC transport is set up to plumb back hit information, so figuring out whether it's a remote hit would require some plumbing. This is possible, and possibly useful, but since both transports currently use galaxy.Get directly, they'd need a method like the one you're proposing in order to plumb it back, which makes it doable but may require an interface change.

It would definitely require an interface change (or at least an optional interface extension) to the RemoteFetcher interface, since there's no place for telemetry there:

galaxycache/peers.go

Line 43 in c7ef985

Fetch(context context.Context, galaxy string, key string) ([]byte, error)

The other complication worth considering is single-flighting: on both the authoritative host and the current host, a request that joins an in-flight fetch close to the completion of a previous request for that key may end up looking pretty close to a cache hit.
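
To make the single-flight point concrete, here is a minimal standalone sketch that does not use galaxycache's own singleflight code. The `group` type, its `do` method, the `joined` flag, and the `state` helper are all hypothetical. It shows that a caller who joins an in-flight fetch never touches the origin, so a status such as "joined" may be worth reporting separately from hit or miss.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// call tracks one in-flight fetch for a key.
type call struct {
	wg      sync.WaitGroup
	val     string
	err     error
	waiters int // callers that joined this in-flight fetch
}

// group is a minimal single-flight implementation for illustration only.
type group struct {
	mu sync.Mutex
	m  map[string]*call
}

// do runs fn at most once per key among concurrent callers. joined is true
// when this caller waited on another caller's in-flight fn.
func (g *group) do(key string, fn func() (string, error)) (val string, joined bool, err error) {
	g.mu.Lock()
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok {
		c.waiters++
		g.mu.Unlock()
		c.wg.Wait()
		return c.val, true, c.err
	}
	c := &call{}
	c.wg.Add(1)
	g.m[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()

	g.mu.Lock()
	delete(g.m, key)
	g.mu.Unlock()
	c.wg.Done()
	return c.val, false, c.err
}

// state reports whether key is in flight and how many callers have joined.
func (g *group) state(key string) (inFlight bool, waiters int) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if c, ok := g.m[key]; ok {
		return true, c.waiters
	}
	return false, 0
}

// demo runs a leader and a follower for the same key and returns their
// joined flags plus the number of origin fetches performed.
func demo() (leaderJoined, followerJoined bool, fetches int) {
	var g group
	release := make(chan struct{})
	fetch := func() (string, error) {
		fetches++
		<-release // hold the fetch open until the follower has joined
		return "value", nil
	}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); _, leaderJoined, _ = g.do("k", fetch) }()
	for in, _ := g.state("k"); !in; in, _ = g.state("k") {
		time.Sleep(time.Millisecond)
	}
	go func() { defer wg.Done(); _, followerJoined, _ = g.do("k", fetch) }()
	for _, w := g.state("k"); w == 0; _, w = g.state("k") {
		time.Sleep(time.Millisecond)
	}
	close(release)
	wg.Wait()
	return leaderJoined, followerJoined, fetches
}

func main() {
	leader, follower, fetches := demo()
	fmt.Printf("leader joined=%t, follower joined=%t, origin fetches=%d\n", leader, follower, fetches)
}
```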

I'm going to have to think about the various use-cases and the return value of a new method a bit more before I propose adding another Get-ish method. (It would be good to open the door to a Peek method while we're at it -- particularly while we have to extend both transports anyway)
