proposals/268-guard-selection.txt

Filename: 268-guard-selection.txt
Title: New Guard Selection Behaviour
Author: Isis Lovecruft, George Kadianakis, [Ola Bini]
Created: 2015-10-28
Status: Obsolete

  (Editorial note: this was origianlly written as a revision of
  proposal 259, but it diverges so substantially that it seemed
  better to assign it a new number for reference, so that we
  aren't always talking about "The old 259" and "the new 259". -NM)

  This proposal has been obsoleted by proposal #271.

§1. Overview

  Tor uses entry guards to prevent an attacker who controls some
  fraction of the network from observing a fraction of every user's
  traffic. If users chose their entries and exits uniformly at
  random from the list of servers every time they build a circuit,
  then an adversary who had (k/N) of the network would deanonymize
  F=(k/N)^2 of all circuits... and after a given user had built C
  circuits, the attacker would see them at least once with
  probability 1-(1-F)^C.  With large C, the attacker would get a
  sample of every user's traffic with probability 1.

  To prevent this from happening, Tor clients choose a small number of
  guard nodes (currently 3).  These guard nodes are the only nodes
  that the client will connect to directly.  If they are not
  compromised, the user's paths are not compromised.

  But attacks remain.  Consider an attacker who can run a firewall
  between a target user and the Tor network, and make
  many of the guards they don't control appear to be unreachable.
  Or consider an attacker who can identify a user's guards, and mount
  denial-of-service attacks on them until the user picks a guard
  that the attacker controls.

  In the presence of these attacks, we can't continue to connect to
  the Tor network unconditionally.  Doing so would eventually result
  in the user choosing a hostile node as their guard, and losing
  anonymity.

  This proposal outlines a new entry guard selection algorithm, which
  addresses the following concerns:

    - Heuristics and algorithms for determining how and which guard(s)
      is(/are) chosen should be kept as simple and easy to understand
      as possible.

    - Clients in censored regions or who are behind a fascist firewall
      who connect to the Tor network should not experience any
      significant disadvantage in terms of reachability or usability.

    - Tor should make a best attempt at discovering the most
      appropriate behaviour, with as little user input and
      configuration as possible.


§2. Design

  Alice, an OP attempting to connect to the Tor network, should
  undertake the following steps to determine information about the
  local network and to select (some) appropriate entry guards.  In the
  following scenario, it is assumed that Alice has already obtained a
  recent, valid, and verifiable consensus document.

  The algorithm is divided into four components such that the full
  algorithm is implemented by first invoking START, then repeatedly
  calling NEXT while adviced it SHOULD_CONTINUE and finally calling
  END. For an example usage see §A. Appendix.

  Several components of NEXT can be invoked asynchronously. SHOULD_CONTINUE
  is used for the algorithm to be able to tell the caller whether we
  consider the work done or not - this can be used to retry primary
  guards when we finally are able to connect to a guard after a long
  network outage, for example.

  This algorithm keeps track of the unreachability status for guards
  in state global to the system, so that repeated runs will not have
  to rediscover unreachability over and over again. However, this
  state does not need to be persisted permanently - it is purely an
  optimization.

  The algorithm expects several arguments to guide its behavior. These
  will be defined in §2.1.

  The goal of this algorithm is to strongly prefer connecting to the
  same guards we have connected to before, while also trying to detect
  conditions such as a network outage. The way it does this is by keeping
  track of how many guards we have exposed ourselves to, and if we have
  connected to too many we will fall back to only retrying the ones we have
  already tried. The algorithm also decides on sample set that should
  be persisted - in order to minimize the risk of an attacker forcing
  enumeration of the whole network by triggering rebuilding of
  circuits.


§2.1. Definitions

  Bad guard: a guard is considered bad if it conforms with the function IS_BAD
  (see §G. Appendix for details).

  Dead guard: a guard is considered dead if it conforms with the function
  IS_DEAD (see §H. Appendix for details).

  Obsolete guard: a guard is considered obsolete if it conforms with the
  function IS_OBSOLETE (see §I. Appendix for details).

  Live entry guard: a guard is considered live if it conforms with the function
  IS_LIVE (see §D. Appendix for details).

§2.1. The START algorithm

  In order to start choosing an entry guard, use the START
  algorithm. This takes four arguments that can be used to fine tune
  the workings:

  USED_GUARDS
      This is a list that contains all the guards that have been used
      before by this client. We will prioritize using guards from this
      list in order to minimize our exposure. The list is expected to
      be sorted based on priority, where the first entry will have the
      highest priority.

  SAMPLED_GUARDS
      This is a set that contains all guards that should be considered
      for connection. This set should be persisted between runs. It
      should be filled by using NEXT_BY_BANDWIDTH with GUARDS as an
      argument if it's empty, or if it contains less than SAMPLE_SET_THRESHOLD
      guards after winnowing out older guards.

  N_PRIMARY_GUARDS
      The number of guards we should consider our primary
      guards. These guards will be retried more frequently and will
      take precedence in most situations. By default the primary
      guards will be the first N_PRIMARY_GUARDS guards from USED_GUARDS.
      When the algorith is used in constrained mode (have bridges or entry
      nodes in the configuration file), this value should be 1 otherwise the
      proposed value is 3.

  DIR
      If this argument is set, we should only consider guards that can
      be directory guards. If not set, we will consider all guards.

  The primary work of START is to initialize the state machine depicted
  in §2.2. The initial state of the machine is defined by:

  GUARDS
      This is a set of all guards from the consensus. It will primarily be used
      to fill in SAMPLED_GUARDS

  FILTERED_SAMPLED
      This is a set that contains all guards that we are willing to connect to.
      It will be obtained from calling FILTER_SET with SAMPLED_GUARDS as
      argument.

  REMAINING_GUARDS
      This is a running set of the guards we have not yet tried to connect to.
      It should be initialized to be FILTERED_SAMPLED without USED_GUARDS.

  STATE
      A variable that keeps track of which state in the state
      machine we are currently in. It should be initialized to
      STATE_PRIMARY_GUARDS.

  PRIMARY_GUARDS
      This list keeps track of our primary guards. These are guards
      that we will prioritize when trying to connect, and will also
      retry more often in case of failure with other guards.
      It should be initialized by calling algorithm
      NEXT_PRIMARY_GUARD repeatedly until PRIMARY_GUARDS contains
      N_PRIMARY_GUARDS elements.


§2.2. The NEXT algorithm

  The NEXT algorithm is composed of several different possibly flows. The
  first one is a simple state machine that can transfer between two
  different states. Every time NEXT is invoked, it will resume at the
  state where it left off previously. In the course of selecting an
  entry guard, a new consensus can arrive. When that happens we need
  to update the data structures used, but nothing else should change.

  Before jumping in to the state machine, we should first check if it
  was at least PRIMARY_GUARDS_RETRY_INTERVAL minutes since we tried
  any of the PRIMARY_GUARDS. If this is the case, and we are not in
  STATE_PRIMARY_GUARDS, we should save the previous state and set the
  state to STATE_PRIMARY_GUARDS.


§2.2.1. The STATE_PRIMARY_GUARDS state

  Return each entry in PRIMARY_GUARDS in turn. For each entry, if the
  guard should be retried and considered suitable use it. A guard is
  considered to eligible to retry if is marked for retry or is live
  and id not bad. Also, a guard is considered to be suitable if is
  live and, if is a directory it should not be a cache.

  If all entries have been tried transition to STATE_TRY_REMAINING.

§2.2.2. The STATE_TRY_REMAINING state

  Return each entry in USED_GUARDS that is not in PRIMARY_GUARDS in
  turn.For each entry, if a guard is found return it.

  Return each entry from REMAINING_GUARDS in turn.
  For each entry, if the guard should be retried and considered
  suitable use it and mark it as unreachable. A guard is
  considered to eligible to retry if is marked for retry or is live
  and id not bad. Also, a guard is considered to be suitable if is
  live and, if is a directory it should not be a cache.

  If no entries remain in REMAINING_GUARDS, transition to
  STATE_PRIMARY_GUARDS.


§2.2.3. ON_NEW_CONSENSUS

  First, ensure that all guard profiles are updated with information
  about whether they were in the newest consensus or not.

  Update the bad status for all guards in USED_GUARDS and SAMPLED_GUARDS.
  Remove all dead guards from USED_GUARDS and SAMPLED_GUARDS.
  Remove all obsolete guards from USED_GUARDS and SAMPLED_GUARDS.

§2.3. The SHOULD_CONTINUE algorithm

  This algorithm takes as an argument a boolean indicating whether the
  circuit was successfully built or not.

  After the caller have tried to build a circuit with a returned
  guard, they should invoke SHOULD_CONTINUE to understand if the
  algorithm is finished or not. SHOULD_CONTINUE will always return
  true if the circuit failed. If the circuit succeeded,
  SHOULD_CONTINUE will always return false, unless the guard that
  succeeded was the first guard to succeed after
  INTERNET_LIKELY_DOWN_INTERVAL minutes - in that case it will set the
  state to STATE_PRIMARY_GUARDS and return true.


§2.4. The END algorithm

  The goal of this algorithm is simply to make sure that we keep track
  of successful connections made. This algorithm should be invoked
  with the guard that was used to correctly set up a circuit.

  Once invoked, this algorithm will mark the guard as used, and make
  sure it is in USED_GUARDS, by adding it at the end if it was not there.


§2.5. Helper algorithms

  These algorithms are used in the above algorithms, but have been
  separated out here in order to make the flow clearer.

  NEXT_PRIMARY_GUARD
      - Return the first entry from USED_GUARDS that is not in
        PRIMARY_GUARDS and that is in the most recent consensus.
      - If USED_GUARDS is empty, use NEXT_BY_BANDWIDTH with
        REMAINING_GUARDS as the argument.

  NEXT_BY_BANDWIDTH
      - Takes G as an argument, which should be a set of guards to
        choose from.
      - Return a randomly select element from G, weighted by bandwidth.

  FILTER_SET
      - Takes G as an argument, which should be a set of guards to filter.
      - Filter out guards in G that don't comply with IS_LIVE (see
        §D. Appendix for details).
      - If the filtered set is smaller than MINIMUM_FILTERED_SAMPLE_SIZE and G
        is smaller than MAXIMUM_SAMPLE_SIZE_THRESHOLD, expand G and try to
        filter out again. G is expanded by adding one new guard at a time using
        NEXT_BY_BANDWIDTH with GUARDS as an argument.
      - If G is not smaller than MAXIMUM_SAMPLE_SIZE_THRESHOLD, G should not be
        expanded. Abort execution of this function by returning null and report
	      an error to the user.


§3. Consensus Parameters, & Configurable Variables

  This proposal introduces several new parameters that ideally should
  be set in the consensus but that should also be possible to
  set or override in the client configuration file. Some of these have
  proposed values, but for others more simulation and trial needs to
  happen.

  PRIMARY_GUARDS_RETRY_INTERVAL
      In order to make it more likely we connect to a primary guard,
      we would like to retry the primary guards more often than other
      types of guards. This parameter controls how many minutes should
      pass before we consider retrying primary guards again. The
      proposed value is 3.

  SAMPLE_SET_THRESHOLD
      In order to allow us to recognize completely unreachable network,
      we would like to avoid connecting to too many guards before switching
      modes. We also want to avoid exposing ourselves to too many nodes in a
      potentially hostile situation. This parameter, expressed as a
      fraction, determines the number of guards we should keep as the
      sampled set of the only guards we will consider connecting
      to. It will be used as a fraction for the sampled set.
      If we assume there are 1900 guards, a setting of 0.02
      means we will have a sample set of 38 guards.
      This limits our total exposure. Proposed value is 0.02.

  MINIMUM_FILTERED_SAMPLE_SIZE
      The minimum size of the sampled set after filtering out nodes based on
      client configuration (FILTERED_SAMPLED). Proposed value is ???.

  MAXIMUM_SAMPLE_SIZE_THRESHOLD
      In order to guarantee a minimum size of guards after filtering,
      we expand SAMPLED_GUARDS until a limit.  This fraction of GUARDS will be
      used as an upper bound when expanding SAMPLED_GUARDS.
      Proposed value is 0.03.

  INTERNET_LIKELY_DOWN_INTERVAL
      The number of minutes since we started trying to find an entry
      guard before we should consider the network down and consider
      retrying primary guards before using a functioning guard
      found. Proposed value 5.

§4.  Security properties and behavior under various conditions

  Under normal conditions, this algorithm will allow us to quickly
  connect and use guards we have used before with high likelihood of
  working. Assuming the first primary guard is reachable and in the
  consensus, this algorithm will deterministically always return that
  guard.

  Under dystopic conditions (when a firewall is in place that blocks
  all ports except for potentially port 80 and 443), this algorithm
  will try to connect to 2% of all guards before switching modes to try
  dystopic guards. Currently, that means trying to connect to circa 40
  guards before getting a successful connection. If we assume a
  connection try will take maximum 10 seconds, that means it will take
  up to 6 minutes to get a working connection.

  When the network is completely down, we will try to connect to 2% of
  all guards plus 2% of all dystopic guards before realizing we are
  down. This means circa 50 guards tried assuming there are 1900 guards
  in the network.

  In terms of exposure, we will connect to a maximum of 2% of all
  guards plus 2% of all dystopic guards, or 3% of all guards,
  whichever is lower. If N is the number of guards, and k is the
  number of guards an attacker controls, that means an attacker would
  have a probability of 1-(1-(k/N)^2)^(N * 0.03) to have one of their
  guards selected before we fall back. In real terms, this means an
  attacker would need to control over 10% of all guards in order to
  have a larger than 50% chance of controlling a guard for any given client.

  In addition, since the sampled set changes slowly (the suggestion
  here is that guards in it expire every month) it is not possible for
  an attacker to force a connection to an entry guard that isn't
  already in the users sampled set.


§A. Appendix: An example usage

  In order to clarify how this algorithm is supposed to be used, this
  pseudo code illustrates the building of a circuit:

    ESTABLISH_CIRCUIT:

      if chosen_entry_node = NULL
        if context = NULL
          context = ALGO_CHOOSE_ENTRY_GUARD_START(used_guards,
                                    sampled_guards=[],
                                    options,
                                    n_primary_guards=3,
                                    dir=false,
                                    guards_in_consensus)

        chosen_entry_node = ALGO_CHOOSE_ENTRY_GUARD_NEXT(context)
        if not IS_SUITABLE(chosen_entry_node)
          try another entry guard

      circuit = composeCircuit(chosen_entry_node)
      return circuit

    ON_FIRST_HOP_CALLBACK(channel):

      if !SHOULD_CONTINUE:
            ALGO_CHOOSE_ENTRY_GUARD_END(entryGuard)
      else
            chosen_entry_node = NULL


§B. Appendix: Entry Points in Tor

  In order to clarify how this algorithm is supposed to be integrated with
  Tor, here are some entry points to trigger actions mentioned in spec:

    When establish_circuit:

        If *chosen_entry_node* doesn't exist
            If *context* exist, populate the first one as *context*
            Otherwise, use ALGO_CHOOSE_ENTRY_GUARD_START to initalize a new *context*.

            After this when we want to choose_good_entry_server, we will use
            ALGO_CHOOSE_ENTRY_GUARD_NEXT to get a candidate.

        Use chosen_entry_node to build_circuit and handle_first_hop,
        return this circuit

    When entry_guard_register_connect_status(should_continue):

        if !should_continue:
           Call ALGO_CHOOSE_ENTRY_GUARD_END(chosen_entry_node)
        else:
           Set chosen_entry_node to NULL

    When new directory_info_has_arrived:

        Do ON_NEW_CONSENSUS


§C. Appendix: IS_SUITABLE helper function

    A guard is suitable if it satisfies all of the folowing conditions:
      - It's considered to be live, according to IS_LIVE.
      - It's a directory cache if a directory guard is requested.
      - It's not the chosen exit node.
      - It's not in the family of the chosen exit node.

    This conforms to the existing conditions in "populate_live_entry_guards()".


§D. Appendix: IS_LIVE helper function

    A guard is considered live if it satisfies all of the folowing conditions:
      - It's not disabled because of path bias issues (path_bias_disabled).
      - It was not observed to become unusable according to the directory or
        the user configuration (bad_since).
      - It's marked for retry (can_retry) or it's been unreachable for some
        time (unreachable_since) but enough time has passed since we last tried
        to connect to it (entry_is_time_to_retry).
      - It's in our node list, meaninig it's present in the latest consensus.
      - It has a usable descriptor (either a routerdescriptor or a
        microdescriptor) unless a directory guard is requested.
      - It's a general-purpose router unless UseBridges is configured.
      - It's reachable by the configuration (fascist_firewall_allows_node).

    This conforms to the existing conditions in "entry_is_live()".

    A guard is observed to become unusable according to the directory or the
    user configuration if it satisfies any of the following conditions:
      - It's not in our node list, meaninig it's present in the latest consensus.
      - It's not currently running (is_running).
      - It's not a bridge and not a configured bridge
        (node_is_a_configured_bridge) and UseBridges is True.
      - It's not a possible guard and is not in EntryNodes and UseBridges is
        False.
      - It's in ExcludeNodes. Nevertheless this is ignored when
    loading from config.
      - It's not reachable by the configuration (fascist_firewall_allows_node).
      - It's disabled because of path bias issues (path_bias_disabled).

    This conforms to the existing conditions in "entry_guards_compute_status()".

§E. Appendix: UseBridges and Bridges configurations

  This is mutually exclusive with EntryNodes.

  If options->UseBridges OR options->EntryNodes:
    - guards = populate_live_entry_guards() - this is the "bridge flavour" of
      IS_SUITABLE as mentioned before.
    - return node_sl_choose_by_bandwidth(guards, WEIGHT_FOR_GUARD)
  This is "choose a guard from S by bandwidth weight".

  UseBridges and Bridges must be set together. Bridges go to bridge_list (via
  bridge_add_from_config()), but how is it used?
  learned_bridge_descriptor() adds the bridge to the global entry_guards if
  UseBridges = True.

  We either keep the existing global entry_guards OR incorporate bridges in the
  proposal (remove non bridges from USED_GUARDS, and REMAINING_GUARDS = bridges?)

  If UseBridges is set as true, we need to fill the SAMPLED_GUARDS
  with bridges specified and learned from consensus.

§F. Appendix: EntryNodes configuration

  This is mutually exclusive with Bridges.

  The global entry_guards will be updated with entries in EntryNodes
  (see entry_guards_set_from_config()).

  If EntryNodes is set, we need to fill the SAMPLED_GUARDS with
  EntryNodes specified in options.

§G. Appendix: IS_BAD helper function

  A guard is considered bad if is not included in the newest
  consensus.

§H. Appendix: IS_DEAD helper function

  A guard is considered dead if it's marked as bad for
  ENTRY_GUARD_REMOVE_AFTER period (30 days) unless they have been disabled
  because of path bias issues (path_bias_disabled).

§I. Appendix: IS_OBSOLETE helper function

  A guard is considered obsolete if it was chosen by an Tor
  version we can't recognize or it was chosen more than GUARD_LIFETIME ago.

-*- coding: utf-8 -*-