[FR] Managing Cache Deletion and Fallback to Forwarding During Unbound Recursive DNS Failures #1061

Open
kkkgo opened this issue May 4, 2024 · 3 comments

Comments

@kkkgo

kkkgo commented May 4, 2024

I have been using Unbound for about four to five years, greatly benefiting from its robust caching capabilities, which generally result in very fast DNS responses. I employ the following primary configuration to enable Unbound's optimistic caching:

    serve-expired: yes
    serve-expired-ttl: 0
    serve-expired-reply-ttl: 0
    prefetch: yes

My expected operational outcomes are:

  • When a client requests a DNS record and its TTL has expired, the cached result should be served with a TTL of 0.
  • Unbound should immediately attempt to refresh the cache, which in my configuration is done recursively.
  • Upon subsequent requests, the client should receive an updated DNS result.
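The three expectations above can be modelled with a small toy cache. This is only a sketch of the intended behaviour, not Unbound's actual implementation; `StaleServingCache`, `resolve`, and the injected `now` clock are hypothetical names introduced for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class Entry:
    value: str
    expires_at: float  # absolute time at which the original TTL runs out


class StaleServingCache:
    """Toy model of serve-expired behaviour (assumption: not Unbound's real code)."""

    def __init__(self, resolve, now=time.monotonic):
        self.resolve = resolve  # hypothetical upstream lookup: name -> (value, ttl)
        self.now = now          # injectable clock so the model can be tested
        self.entries = {}

    def refresh(self, name):
        """Resolve upstream and overwrite the cached entry."""
        value, ttl = self.resolve(name)
        self.entries[name] = Entry(value, self.now() + ttl)

    def lookup(self, name):
        """Return (value, reply_ttl, needs_refresh)."""
        if name not in self.entries:
            self.refresh(name)  # cache miss: resolve before answering
        entry = self.entries[name]
        remaining = entry.expires_at - self.now()
        if remaining > 0:
            return entry.value, int(remaining), False
        # Expired: answer at once with TTL 0 and tell the caller to refresh.
        return entry.value, 0, True
```

A caller that sees `needs_refresh` would call `refresh()` in the background, so the next request gets the updated record.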

This configuration works well under most circumstances, and I have also shared it as a Docker image with others. However, some ISPs have unstable connectivity, particularly to authoritative DNS servers located abroad, and the resulting failures show up in two ways:

  1. Some domains consistently fail recursive queries, yielding no results.
  2. Some domains occasionally succeed in recursive queries but fail most of the time, leading to stale cache data.

For the first issue, I have implemented a simple plugin for the DNS server that sits in front of Unbound, which uses a third-party DNS as a fallback. The server first attempts to resolve via Unbound, and if it does not get a response within a specific time frame (e.g., 200 ms), it forwards the request to a public DNS. This also provides fault tolerance when the Unbound service is interrupted.
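The timeout-then-fallback logic can be sketched roughly as follows. This is not the actual plugin: `resolve_with_fallback`, `primary`, and `fallback` are hypothetical names, with the two callables standing in for "query Unbound" and "query a public DNS".

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def resolve_with_fallback(name, primary, fallback, timeout=0.2):
    """Try primary(name); use fallback(name) on timeout, error, or empty answer.

    Returns (answer, source), where source is "primary" or "fallback".
    """
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(primary, name)
        try:
            answer = future.result(timeout=timeout)
        except FutureTimeout:
            # No reply within the time frame (200 ms by default).
            return fallback(name), "fallback"
        except Exception:
            # The primary resolver itself failed, e.g. the service is down.
            return fallback(name), "fallback"
        if answer is None:
            return fallback(name), "fallback"
        return answer, "primary"
    finally:
        # Do not block on a primary query that is still running.
        pool.shutdown(wait=False)
```

Note that an expired answer served with TTL 0 still counts as a successful primary reply here, which is exactly why this approach does not help with the second issue.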

For the second issue, I found it hard to make the decision downstream. Once a recursive query for a domain succeeds, the result stays in the cache and keeps being served after it expires, even with a long serve-expired-ttl, so every answer for that domain arrives with a TTL of 0. If I used TTL=0 as the trigger for the fallback DNS, every expired answer would go to the fallback, which negates the benefit of serve-expired: yes. The crux of the problem is that the downstream DNS server cannot tell whether Unbound has actually completed a recursive query successfully.

To address this, I propose two possible solutions:

  1. Introduce a threshold for deleting cache entries after consecutive DNS refresh failures, such as serve-expired-fetch-fail: 5. If a DNS refresh query fails more than five times, the expired result should be removed from the cache. This would allow downstream DNS to recognize Unbound's unavailability and switch to a public DNS result, a process feasible for most DNS servers with parallel query capabilities.
  2. On a recursive query failure, attempt to fall back to a public DNS. This could be facilitated by adding an option like recursive-first: yes:
    forward-zone:
        name: "."
        recursive-first: yes   # proposed option
        forward-addr: 8.8.8.8

This approach ensures that if recursive querying fails, a request is made to a public DNS, thus refreshing the DNS cache.
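The first proposal (dropping an expired entry after repeated refresh failures) could behave roughly like the sketch below. It is only an illustration of the requested semantics, not Unbound code; `FailureCountingCache` and its methods are hypothetical, and `max_failures` plays the role of the proposed serve-expired-fetch-fail value.

```python
class FailureCountingCache:
    """Evict a cached answer once refreshes fail more than max_failures times in a row."""

    def __init__(self, max_failures=5):
        self.max_failures = max_failures
        self.records = {}   # name -> cached answer (possibly expired)
        self.failures = {}  # name -> consecutive failed refresh attempts

    def store(self, name, answer):
        """A successful resolution stores the answer and resets the counter."""
        self.records[name] = answer
        self.failures[name] = 0

    def record_refresh_failure(self, name):
        """Count a failed refresh; drop the entry once the threshold is exceeded."""
        if name not in self.records:
            return
        self.failures[name] += 1
        if self.failures[name] > self.max_failures:
            del self.records[name]
            del self.failures[name]

    def get(self, name):
        """Return the cached answer, or None so the caller can fall back."""
        return self.records.get(name)
```

Once `get()` returns nothing, the downstream server sees a failure instead of a stale answer and can switch to the public DNS result.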

As I am not a professional programmer, these are just some of my thoughts and suggestions. I am open to hearing if there are more viable solutions or improvements to my approach.

@Dynamic5912

Not an answer - however what is the benefit of setting serve-expired-reply-ttl to zero?

I'm not using Redis or anything - just straight Unbound.

Thanks!

@kkkgo
Author

kkkgo commented Jun 4, 2024

The serve-expired-reply-ttl setting defines the TTL value used in expired cache responses sent to the client. Setting it to 0 means Unbound answers with a TTL of 0, which tells the client that the record has expired and should be re-queried immediately rather than cached. This lets expired DNS records be refreshed as quickly as possible and shortens the time clients hold on to them. If the client receives an expired record that turns out to be unusable, it can promptly initiate another DNS resolution attempt.

@Dynamic5912

Excellent - thanks for explaining.

Is there any benefit to also using serve-expired-ttl-reset?

I never really could get my head around understanding exactly what this does - and I note that by default it's set to no by Unbound.
