-
Notifications
You must be signed in to change notification settings - Fork 16
/
searching.theory.txt
246 lines (212 loc) · 15.3 KB
/
searching.theory.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
SEARCH
/=+===============================+=\
/ : : \
)==: GENERAL :==(
\ :_______________________________: /
\=+===============================+=/
BRUTE FORCE ==> #Trying all combinations|permutations
#Also called "exhaustion".
#Time complexity:
# - average|worst: O(nᵐ)
# - best: O(1)
#Space complexity: O(1)
#Can sort input:
# - so than can stop when > the searched value
# - to put more likely items first
LINEAR|SEQUENTIAL SEARCH ==> #Brute force when trying to find an item in an array
#Time complexity: O(n)
SEARCH KEY ==> #Arbitrary starting vertex of search
FINGER SEARCH ==> #Search where the search key ("finger") is provided.
#Only efficient if search key is likely to be close to searched node.
#Often implemented on linked lists, treaps, etc.
/=+===============================+=\
/ : : \
)==: GRAPH SEARCH :==(
\ :_______________________________: /
\=+===============================+=/
PRUNING ==> #During a graph search algorithm, skipping a node's descendants when they
#can be guessed (from the parent's value) to be invalid.
#Unless pruning can be done, search algorithms like BFS/DFS are conceptually a linear search.
BREADTH-FIRST SEARCH (BFS) ==> #Search one level at a time, i.e.:
# - start at search key
# - visit each vertex with distance 1, then 2, etc.
#Time complexity: O(order + size)
#Space complexity: O(order)
#Good when:
# - depth is big
# - searched value's height is probably high
#Bad when:
# - space complexity is important
# - branching factor is big
DEPTH-FIRST SEARCH (DFS) ==> #Search one path at a time (until cycle or dead-end), i.e.:
# - start at search key
# - visit vertex
# - if:
# - it has non-visited children, repeat with the leftmost one
# - it is the search key, stop
# - otherwise, repeat with parent ("backtrack")
#Also called "backtracking" when applying to more general algorithms
#"Backjumping":
# - when it can know that a node and its descendants will be invalid because
# a siblings|cousin is invalid
#Time complexity: O(order + size)
#Space complexity: O(order)
#Good when:
# - branching factor is big
# - searched value's height is probably low
#Bad when:
# - depth is big
DEPTH-LIMITED SEARCH (DLS) ==> #DFS where we stop when reaching a given depth.
#Compensate some DFS problems but is an incomplete algorithm
ITERATIVE DEEPENING DFS (IDDFS) #Running DLS iteratively with increasing depth.
==> #Also called "Iterative deepening search" (IDS)
#Each iteration repeats DLS on nodes already visited:
# - cons: time complexity
# - pro: space complexity, i.e. does not need to store nodes to know if already visited
#I.e. has pros of BFS but without space complexity problem.
/=+===============================+=\
/ : : \
)==: ABSTRACT DATA TYPES :==(
\ :_______________________________: /
\=+===============================+=/
SUMMARY ==> #Relationship between data items:
# - none:
# - with keys: associative array
# - unique values: bidirectional map
# - non-unique keys: multimap
# - without keys: multiset
# - unique values: set
# - serial: list
# - FILO access: stack
# - FIFO access: queue
# - FILO/FIFO access: deque
# - other: graph
# - parent/child: tree
ASSOCIATIVE ARRAY ==> #Unordered array of (KEY, VAL) pairs, with unique keys.
#Also called "map" or "dictionary"
#Operations:
# - access(KEY)->VAL
# - set(KEY, VAL)
# - delete(KEY)
#Implementations:
# - static data structure:
# - varying size items: variant
# - same size items: random access array (index map)
# - dynamic data structure:
# - small size: association list
# - medium size: search tree
# - big size:
# - more space efficient: search tree
# - faster, not persistent: hash table
BIDIRECTIONAL MAP ==> #Associative array where values are unique.
#I.e. values can be used as keys as well.
#Also called "hash bag"
MULTIMAP ==> #Associate array where keys are not unique.
#Can also be thought as associative array where values are arrays.
#Also called "multihash" or "multidict"
MULTISET ==> #Unordered array of values.
#Also called "bag"
#Operations:
# - insert(VAL)
# - has(VAL)->BOOL
# - count(VAL)->NUM
# - delete(VAL)
# - clear()
# - length()->NUM: also called "cardinality" or "capacity"
# - pop()->VAL, pick()->VAL: pop|get random element
# - union(SET2)->SET3
# - intersection(SET2)->SET3
# - difference(SET2)->SET3
# - is_subset(SET2)->BOOL
#Implementation: like associative array
SET ==> #Multiset with unique values.
#Operations: like multiset, except count()
#Implementation: like multiset with also:
# - bloom filter: similar pros|cons to hash table but:
# - more space efficient
# - false positives, limited operations
LIST ==> #Ordered array of values
#Also called "sequence"/"linear data structure"
#Operations:
# - length()->NUM
# - access(INDEX)->VAL
# - search(VAL)->VAL
# - set(INDEX|VAL2, VAL)
# - delete(INDEX|VAL)
# - slice(INDEX, LENGTH)
# - clear()
#Good for algorithms like rank query, nearest neighbors and range queries.
#Implementation:
# - static data structure: random access array
# - dynamic data structure:
# - random access array:
# - fast access by key
# - slow search by value, insert|delete
# - best space efficiency, especially if values are small
# - high locality of reference
# - not persistent
# - linked list:
# - fast insert|delete (after initial search)
# - slow access|search
# - fast memory allocation
# - search tree:
# - fair insert|access|search (neither best nor worst)
# - good for ordered iteration
# - less time efficient if small number of items
CONS ==> #List of size 2, i.e. ordered pair
#Usually used with Lisp languages to build other data structures like linked list or trees
ASSOCIATION LIST ==> #List of (KEY, VAL) pairs
#I.e. ordered associative array
#Also called "alist"
#Implementation: like list
STACK/LIFO ==> #List with read|write to most-recent item
#Operations:
# - push(VAL)
# - pop()->VAL
# - peek()->VAL
#Implementation:
# - random access array
# - linked list:
# - pros:
# - persistent
# - memory allocation might be easier or more incremental
# - cons:
# - less space efficient
QUEUE/FIFO ==> #List with read|write to least-recent item
#Sometimes used for buffers.
#Operations:
# - push(VAL)
# - shift()->VAL
# - head()->VAL
#Implementation:
# - random access array (circular buffer)
# - linked list (circular linked list):
# - same pros|cons as for stack
DOUBLE-ENDED QUEUE ==> #Combination of a stack and a queue
#Also called "deque[ue]" or "head-tail"
#Implementation: like queue
GRAPH ==> #Array of values with relationships (edges).
#Operations:
# - insert|delete_node(NODE)
# - insert|delete_edge(NODE, NODE2)
# - neighbors(NODE|EDGE)->NODE2|EDGE2_ARR
# - adjacent(NODE|EDGE, NODE2|EDGE2)->BOOL
# - get_node_value(NODE)->VAL
# - set_node_value(NODE, VAL)
# - get_edge_value(NODE, NODE2)->VAL
# - set_edge_value(NODE, NODE2, VAL)
# - traverse|walk(FUNC(NODE|EDGE))
#EDGE:
# - is [NODE, NODE2] pair.
# - can be encapsuled in a single EDGE object:
# - can be used instead of (NODE, NODE2) in operations
# - e.g. get_edge_value(EDGE)->VAL
#See graph theory doc
#Implementation (see graph doc for comparison):
# - multiset (node|edge list)
# - associative array (adjacency|incidence list)
# - matrix (adjacency|incidence matrix)
TREE ==> #In computer science, usually means arborescence in graph theory, i.e. specific type of graph
#Implementation: like graph