Allow rotation between multiple bandwidth servers #57
Measurement endpoints in the original bandwidth scanner design are set up per-authority, so each operator also has a different test endpoint. That's the intent here, with the baked in URL (https://bwauth.torproject.org) as a default fallback for testing, etc.
> Measurement endpoints in the original bandwidth scanner design are set up per-authority, so each operator also has a different test endpoint. That's the intent here,
This design results in significant biases towards relays that are located between the bandwidth scanner and the bandwidth server. For example, see gabelmoo on:
https://people.torproject.org/~irl/volatile/rsvotes/
We are going to fix this by creating a pool of bandwidth servers shared between all
bandwidth scanners, possibly including one or more CDNs:
https://trac.torproject.org/projects/tor/ticket/24674
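As a minimal sketch of what rotation over such a pool could look like (class and URL names here are hypothetical, not part of the bwscanner codebase):

```python
import itertools

class ServerPool:
    """Round-robin rotation over a pool of bandwidth server URLs.

    The pool members below are placeholders; a real deployment would
    load them from configuration rather than hard-coding them.
    """

    def __init__(self, urls):
        if not urls:
            # Fall back to the current hard-coded default.
            urls = ["https://bwauth.torproject.org"]
        self._cycle = itertools.cycle(urls)

    def next_server(self):
        """Return the next server URL, cycling through the pool."""
        return next(self._cycle)

pool = ServerPool([
    "https://bw1.example.net",  # hypothetical pool member
    "https://bw2.example.org",  # hypothetical pool member
])
```

Each measurement would call `next_server()` so that successive measurements spread across locations instead of all hitting one endpoint.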
> with the baked in URL (https://bwauth.torproject.org) as a default fallback for testing, etc.
This is not a good default, because it means that test networks running bandwidth scanners compete for bandwidth with Faravahar's bandwidth server.
We want to change the default, probably to a CDN, because it won't have these issues:
https://trac.torproject.org/projects/tor/ticket/21990
> However, one proposal is to create 'loop' circuits back to the bandwidth scanner, which is also running a tor relay (with ExitPolicy Accept to the test endpoint), see the "there and back again" circuit generator and https://trac.torproject.org/projects/tor/ticket/9762. Thoughts?
This won't work as coded: Tor relays refuse to connect back to the previous hop. So you'd need to run two relays in this design.
It seems unnecessary, less anonymous, and guarantees the kinds of geographical biases we are trying to fix.
If this design were implemented with a pool of exits, with bandwidth servers on the exit machines, it could be more accurate. But I think this design also makes it harder for us to use a CDN.
If exit scarcity ever becomes an issue, then let's use single onion services for internal downloads.
If PerConnBWRate is ever activated, we can turn scanner clients into relays by setting an ORPort and a low MaxAdvertisedBandwidth. It doesn't require a change in the way we build paths.
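As a rough illustration (the option names are standard torrc options, but the values are placeholders, not a tested recommendation), a scanner client doubling as a minimal relay might look like:

```
# torrc sketch: run the scanner client as a low-profile relay
ORPort 9001
MaxAdvertisedBandwidth 100 KBytes   # advertise little, so clients avoid us
ExitPolicy reject *:*               # never act as an exit
```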
The bandwidth scanning process shouldn't depend on the anonymity of the scanners or endpoints, because that is hard to guarantee. Ideally, a malicious exit relay shouldn't be able to learn the endpoints AND selectively limit throughput to influence the measurements of other relays in the same circuit. That is, a circuit that looks like:

bwscanner client ---> measured relay ---> malicious exit ---> known measurement endpoint

means that a malicious exit relay operator could collude with guard relays and bias measurement results. I believe this is made worse by the 'slice' approach of the current scanners, because controlling enough relays in this way can probably be used to block new relays from being fairly measured and rising in rank, via a kind of 'consensus wall'. I also suspect this happens as a side effect of circuits built between relays that coincidentally sit in the same datacenter, because those connections will scale throughput faster and bias measurements towards circuits that are geographically closer. And any relay that knows it is being measured can do whatever it can to improve its results (e.g. drop every other cell that isn't part of a measurement circuit). I'm not sure what we can do about this type of biasing, though. I think this is the sort of thing that Peerflow can mitigate, because its measurements are the result of passive observations from the rest of the network rather than active probes.

So a scanner process that looks like this:

bwscanner client ---> local relay ---> measured relay ---> endpoint exit ---> endpoint local to exit

should mean that a malicious relay can only influence results for circuits it is also part of. An active network adversary who can attack network infrastructure (e.g. the scanner's ISP or upstreams) can of course still degrade measurements towards relays it wishes to degrade with this approach.
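The selective-throttling attack described above can be made concrete with a toy model (all names and numbers are hypothetical; this is not bwscanner code): circuit throughput is the minimum of the hops' capacities, and a malicious exit additionally caps any circuit whose other members aren't colluding with it.

```python
def circuit_throughput(hops, malicious=frozenset(), colluding=frozenset(), cap=50):
    """Toy model of a malicious exit biasing measurements.

    `hops` maps each relay in the circuit to its honest capacity.
    If the circuit contains a malicious relay and the remaining hops
    are not all colluding with it, the malicious relay throttles the
    circuit down to `cap`.
    """
    rate = min(hops.values())
    if malicious & set(hops) and not (set(hops) - malicious) <= colluding:
        rate = min(rate, cap)
    return rate

# An honest relay measured through a malicious exit looks slow...
biased = circuit_throughput({"victim": 1000, "evil_exit": 1000},
                            malicious={"evil_exit"})
# ...while a colluding relay through the same exit gets full marks.
favored = circuit_throughput({"friend": 1000, "evil_exit": 1000},
                             malicious={"evil_exit"}, colluding={"friend"})
```

In this model `biased` comes out at 50 while `favored` stays at 1000, which is exactly the rank distortion the 'consensus wall' concern points at.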
The geographical biases are also difficult to address if we continue to run a small set of scanners (i.e. one or fewer per dirauth), since most of the scanners are located in the US or Europe - so even moving the test endpoints to a CDN will still leave the scanners wherever they happen to be. You'll see biases that arise from a relay being near both a bwscanner and a CDN node - all of which might be in the same datacenter! I guess it's obvious that using a single CDN hands a lot of power to the CDN operator, too.

The redesigned scanner was built with the possibility of 'sharding' the scans across a set of scanners - see circuit.TwoHop and its "partitions, this_partition" arguments. The intent here was to be able to run multiple scanners in parallel on different machines in order to scale better (i.e. reduce the time to complete a scan of the entire network). For example, a bandwidth authority operator could use a cloud computing service to spin up nodes for the duration of the scan and then combine the results. That might make attacking bandwidth scanner infrastructure harder, because endpoints won't be defined statically - though exits with a single-line exit policy towards a test endpoint are going to stand out.

Single onion services make a lot of sense, because we can avoid requiring an exit in the measurement path, so a circuit can look something like:

bwscanner client ---> bwscanner local relay ---> measured relay ---> bwscanner local relay single onion endpoint

There's a lot of speculation above :) - Thoughts?

P.S. There are other ways to measure performance that could be used for feedback rather than bandwidth measurements - e.g. circuit failure rates or extend latency, particularly for high-bandwidth relays that may be more CPU constrained than bandwidth limited.
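The sharding idea boils down to splitting the relay list deterministically across scanners so that each one measures a disjoint subset. A hedged sketch of such a partition function (the function name and hashing choice are illustrative; the real circuit.TwoHop arguments may work differently):

```python
import hashlib

def in_partition(fingerprint, partitions, this_partition):
    """Stable hash-based shard assignment: a relay always lands in the
    same shard, and the shards are disjoint and cover all relays."""
    digest = hashlib.sha256(fingerprint.encode()).digest()
    return int.from_bytes(digest[:8], "big") % partitions == this_partition

# Placeholder fingerprints standing in for the relays in a consensus.
relays = ["FPR%040d" % i for i in range(100)]

# Four scanners, each taking its own shard of the relay list.
shards = [[r for r in relays if in_partition(r, 4, p)] for p in range(4)]
```

Because the assignment depends only on the fingerprint, independently run scanners agree on the split without coordinating, and the results can simply be concatenated afterwards.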
> There's a lot of speculation above :) - Thoughts?
Our initial goal is to replace the current codebase with a system that produces similar results. We really need to do this in the next year or so.
We also want to try to reduce the bias in the current results, so any new system will probably need to support a pool of bandwidth servers, not just one hard-coded bandwidth server. This is an existing torflow/BwAuthority feature that we want to keep. (And I think we can do better, by using a config file, rather than expecting operators to edit the source like BwAuthority does.)
That's why I opened this ticket for this feature.
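For illustration, the config-file approach could be as simple as an ini-style section listing the pool (the file name, section, and key are entirely hypothetical; no such file exists in the current codebase):

```python
import configparser

# Hypothetical bwscanner.ini fragment listing the server pool.
SAMPLE = """
[bandwidth_servers]
urls = https://bw1.example.net
       https://bw2.example.org
"""

def load_server_pool(text):
    """Parse the server pool from an ini-style config, so operators
    edit a config file instead of the source like BwAuthority requires."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    # configparser joins indented continuation lines with newlines.
    return cfg["bandwidth_servers"]["urls"].split()

pool = load_server_pool(SAMPLE)
```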
> P.S. there are other ways to measure performance that could be used for feedback rather than bandwidth measurements - e.g. circuit failure rates, extend latency, particularly for high bandwidth relays that may be more CPU constrained than bandwidth limited.
I think the best way to work through alternative path selection and measurement options is a tor-dev thread. And then a proposal using the Core Tor proposals process. We can afford to take our time, test some different alternatives, and then make decisions over the next few years.
Would you like to copy most of your last post into a tor-dev email?
There is one hard-coded bandwidth server.
We need multiple bandwidth servers, so that measurements aren't biased towards one location.