Commit 7c21a23

checkpoint - several content updates
1 parent 2161325 commit 7c21a23

15 files changed

+183
-41
lines changed

docs/01_Welcome.mdx

Lines changed: 3 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -30,6 +30,8 @@ Fill in any other blanks for welcoming and setting the stage for the workshop.
3030

3131
## The Exercises
3232

33-
This workshop uses the Atlas Search Playground for the exercises. All you need is a browser and network connectivity. This handy developer tool allows us to work in an isolated, focused environment
33+
This workshop uses the Atlas Search Playground for the exercises.
34+
All you need is a browser and network connectivity.
35+
This handy developer tool allows us to work in an isolated, focused environment
3436
with no setup.
3537

docs/10_About_workshop/1_intro.mdx

Lines changed: 10 additions & 3 deletions
@@ -1,16 +1,23 @@
11
# 📘 Atlas Search Playground
22

3-
The Atlas Search Playground (or just Playground) is used for the exercises in this workshop. The Playground is a self-contained lightweight, yet feature-rich Atlas Search environment which does not require an Atlas account to use.
3+
The Atlas Search Playground (or just Playground) is used for the exercises in this workshop.
4+
The Playground is a self-contained lightweight, yet feature-rich Atlas Search environment
5+
which does not require an Atlas account to use.
46

57
There are two tools in the Playground:
68
* `Code Sandbox`: your data, an index configuration, an aggregation pipeline, and results
79
* `Search Demo Builder`: a configurable search UI on your data
810

9-
The exercises will only use the Code Sandbox, as it allows saving and sharing links to the full environment and allows us to work on one topic at a time. We'll cover the Search Demo Builder briefly near the end of thw workshop.
11+
The exercises will only use the Code Sandbox, as it allows saving and sharing links to
12+
the full environment and allows us to work on one topic at a time.
13+
14+
We'll cover the Search Demo Builder briefly near the end of the workshop.
1015

1116
## Code Sandbox layout
1217

13-
To begin, navigate to the [Atlas Search Playground](https://search-playground.mongodb.com/). In the next section, you'll work through the first exercise to get familiar with the Playground's Code Sandbox.
18+
To begin, navigate to the [Atlas Search Playground](https://search-playground.mongodb.com/).
19+
In the next section, you'll work through the first exercise to get familiar with the
20+
Playground's Code Sandbox.
1421

1522
Let's dive into the world of Atlas Search using this convenient and powerful playground!
1623

docs/20_Intro_to_Atlas_Search/1_system.mdx

Lines changed: 7 additions & 2 deletions
@@ -9,9 +9,14 @@ or dynamically mapping any and all fields supported.
99
![system diagram](/img/system_diagram.png)
1010

1111
Changes to a collection via updates, deletes, or additions are *eventually consistent*, meaning the
12-
index is updated independently of changes to the collection in a separate process, asynchronously. The lag between a change made to the database and refelected in a subsequent search is dependent on many factors such as deployment tier and architecture, the complexity of the index mapping, the other changes that are also queued, and the laws of physics.
12+
index is updated independently of changes to the collection in a separate process, asynchronously.
13+
The lag between a change made to the database and reflected in a subsequent search is dependent
14+
on many factors such as deployment tier and architecture, the complexity of the index mapping,
15+
the other changes that are also queued, and the laws of physics.
1316

14-
The Atlas Search process can be deployed either coupled alongside the database nodes, or on separate dedicated nodes. Dedicated nodes provide separation of concerns, alleviating resource contention. Dedicated search nodes are recommended for production workloads.
17+
The Atlas Search process can be deployed either coupled alongside the database nodes,
18+
or on separate dedicated nodes. Dedicated nodes provide separation of concerns, alleviating
19+
resource contention. Dedicated search nodes are recommended for production workloads.
1520

1621
## Coupled nodes
1722
![coupled nodes](/img/coupled.png)

docs/20_Intro_to_Atlas_Search/2_aggregation_stages.mdx

Lines changed: 54 additions & 2 deletions
@@ -6,6 +6,58 @@ where the magic happens
66

77
## $searchMeta
88

9-
The `$searchMeta` stage performs the same search that `$search` does, but only returns the results metadata, not actual matching documents. Results metadata includes the count of matching results and facets. This same metadata is available when using `$search` too, accessible in the $$SEARCH_META context variable.
9+
The `$searchMeta` stage performs the same search that `$search` does,
10+
but only returns the results metadata, not actual matching documents.
11+
Results metadata includes the count of matching results and facets.
12+
This same metadata is available when using `$search` too,
13+
accessible in the $$SEARCH_META context variable.
1014

11-
**TODO**: Add exercise using $search and have developer switch to $searchMeta to see the results
15+
## Exercises: search pipeline stages
16+
17+
### Step 1
18+
1. Navigate to the original Playground used in the last section's exercise
19+
https://search-playground.mongodb.com/tools/code-sandbox/snapshots/6782aea0667feaaf06324b87
20+
2. Press Run. Got the empty `[]` array of results?
21+
3. Change `$search` to `$searchMeta` (in the Query pane), and press Run again.
22+
23+
<details>
24+
<summary>Here are the expected results...</summary>
25+
<div>
26+
```js
27+
[
28+
{
29+
"count": {
30+
"lowerBound": 0
31+
}
32+
}
33+
]
34+
```
35+
</div>
36+
</details>
37+
38+
### Step 2
39+
40+
1. Now fix the query to match the document as you did previously
41+
2. Press Run again
42+
3. Did the `$searchMeta` results change?
43+
44+
<details>
45+
<summary>Here are the expected results...</summary>
46+
<div>
47+
```js
48+
[
49+
{
50+
"count": {
51+
"lowerBound": 1
52+
}
53+
}
54+
]
55+
```
56+
</div>
57+
</details>
58+
59+
## Post $search-stages
60+
61+
* Be cautious with stages such as $sort and $group, or any stage that consumes **all** documents from the previous stage.
62+
* $limit, $addFields, $project and the like are fine as they only operate on one doc at a time
63+
or cut-off
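
The switch from `$search` to `$searchMeta` described in the exercise can be sketched as plain pipeline objects. This is a minimal illustration, not the Playground snapshot's actual query: the `title` path, the `playground` query term, and the `toSearchMeta` helper are assumptions made for the example.

```js
// Hypothetical query: the "title" path and "playground" term are illustrative only.
const searchPipeline = [
  { $search: { text: { query: "playground", path: "title" } } },
  { $limit: 10 },
  // $$SEARCH_META exposes the results metadata alongside each matching document.
  { $project: { title: 1, meta: "$$SEARCH_META" } },
];

// Swap $search for $searchMeta, keeping the same operator body.
// Later stages are dropped here because $searchMeta emits a single
// metadata document rather than the matching documents.
function toSearchMeta(pipeline) {
  const [first] = pipeline;
  if (!first || !first.$search) {
    throw new Error("expected $search as the first stage");
  }
  return [{ $searchMeta: first.$search }];
}

const metaPipeline = toSearchMeta(searchPipeline);
console.log(JSON.stringify(metaPipeline, null, 2));
```

In the sketch, `$limit` and `$project` follow `$search` because they work on one document at a time or cut the stream short, in line with the notes on post-`$search` stages.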
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
1+
# 📘 Powered by Lucene
2+
3+
https://lucene.apache.org
4+
5+
## Anatomy of a Lucene index
6+
7+
A Lucene index encapsulates specialized data structures unique to each type of data indexed.
8+
9+
* Numbers and dates: ...
10+
* Geo-spatial: ...
11+
* Text: via inverted indexes
12+
13+
Each field is indexed independently.
14+
15+
Segmented architecture, append-only, for fast indexing. Background processes to optimize the index
16+
segments.
17+
18+
## Inverted Index
19+
20+
![inverted index](/img/analysis_lucene_standard.png)
21+
22+
## Search algorithms
23+
24+
* "index intersection" using skip lists
25+
* link to Adrien's presentation
26+
27+
Atlas Search translates its search operators to Lucene's `Query` API.
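
To make the inverted index and "index intersection" ideas concrete, here is a toy sketch in plain JavaScript. It is not Lucene's implementation: the tokenizer is a crude stand-in for real analysis, and the intersection uses a simple two-pointer walk over sorted posting lists rather than skip lists. All function names are hypothetical.

```js
// Toy analyzer: lowercase and split on non-alphanumeric characters.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Build a map of term -> sorted array of document ids (a posting list).
function buildIndex(docs) {
  const index = new Map();
  docs.forEach((text, docId) => {
    for (const term of new Set(tokenize(text))) {
      if (!index.has(term)) index.set(term, []);
      index.get(term).push(docId); // doc ids arrive in ascending order
    }
  });
  return index;
}

// Intersect two sorted posting lists with a two-pointer walk.
function intersect(a, b) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(a[i]);
      i++;
      j++;
    } else if (a[i] < b[j]) {
      i++;
    } else {
      j++;
    }
  }
  return out;
}

const docs = [
  "Atlas Search playground",
  "Search with Lucene",
  "Lucene powers Atlas Search",
];
const index = buildIndex(docs);
// Documents containing both "lucene" and "atlas".
const both = intersect(index.get("lucene"), index.get("atlas"));
console.log(both); // [ 2 ]
```

Skip lists speed up the same walk by letting the longer list jump ahead several entries at a time instead of advancing one by one.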

docs/30_Index_configuration/2_index_config.mdx

Lines changed: 4 additions & 0 deletions
@@ -8,9 +8,13 @@ Documents are mapped to an index through a flexible configuration.
88

99
## Dynamic mapping
1010

11+
You can configure an entire index to use dynamic mappings, or specify individual fields,
12+
such as fields of type `document`, to be dynamically mapped.
13+
1114
## Configuring a real Atlas Search index
1215

1316
* Atlas Search Visual Editor or JSON Editor
1417
* via Compass
1518
* Atlas CLI
1619
* Driver commands
20+
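
The two dynamic mapping options mentioned above can be sketched as index definitions. The field names (`title`, `address`) are hypothetical; the objects are written as JavaScript so they can be serialized and pasted into a JSON editor.

```js
// Option 1: the whole index is dynamically mapped.
const fullyDynamic = {
  mappings: { dynamic: true },
};

// Option 2: static mappings overall, with one embedded document
// ("address", a hypothetical field) mapped dynamically.
const partiallyDynamic = {
  mappings: {
    dynamic: false,
    fields: {
      title: { type: "string" },
      address: { type: "document", dynamic: true },
    },
  },
};

console.log(JSON.stringify(partiallyDynamic, null, 2));
```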

docs/30_Index_configuration/6_string.mdx

Lines changed: 13 additions & 33 deletions
@@ -14,36 +14,16 @@ See also: Relevant As-You-Type Suggestions Search Solution
1414
Basic examples
1515
lucene.standard matching case-insensitive: https://search-playground.mongodb.com/tools/code-playground/snapshots/664738af4e0a3f240a5de9d9
1616

17-
## Analysis matters
18-
“searches” query does not match “Search” with lucene.standard: https://search-playground.mongodb.com/tools/code-playground/snapshots/664739964e0a3f240a5de9db
19-
“searches” matches “Search” using lucene.english: https://search-playground.mongodb.com/tools/code-playground/snapshots/66473aa64e0a3f240a5de9dd
20-
21-
## Custom analyzers
22-
Last 4 digit of phone number matching (regex extraction during indexing, keyword analysis at query time): https://search-playground.corp.mongodb.com/tools/code-playground/snapshots/669e6c98d49ef6fad98118ba
23-
Example of being able to do ‘startsWith’ and ‘endsWith’ using wildcard and ‘reverse’ token filter:
24-
https://search-playground.mongodb.com/tools/code-playground/snapshots/6683c8bc4a45448733549bbc
25-
26-
Example of being able to do ‘startsWith’, ‘endsWith’ and ‘contains’ using nGrams: https://search-playground.mongodb.com/tools/code-playground/snapshots/6683c999934a05d9b585b6e7
27-
Relevancy
28-
Example of an as-you-type suggest configuration; sophisticated use of multi and several weighted query clauses: https://search-playground.mongodb.com/tools/code-playground/snapshots/66473b744e0a3f240a5de9e1
29-
$project score?
30-
$project scoreDetails?
31-
32-
# multi
33-
Why?
34-
Relevancy example: boost each multi uniquely
35-
Multiple language example: may not know the language of the content and each document could be different - multi across all possible languages, query across them as desired at query-time, let relevancy sort it out
36-
Example of being able to do ‘startsWith’ and ‘endsWith’ using wildcard and ‘reverse’ token filter:
37-
https://search-playground.mongodb.com/tools/code-playground/snapshots/6683c8bc4a45448733549bbc
38-
39-
40-
* Text: the heart and soul of your content
41-
* Strings are analyzed, tokenized into terms
42-
* Multiple analyzers can be used for a single string field (multi)
43-
* Terms: words, fragments, atomic searchable units
44-
* An inverted index structure organizes terms lexicographically/alphabetically for quick lookup (aka a dictionary)
45-
* Term statistics:
46-
* Posting list: document identifiers
47-
* Term frequency (tf): how many occurrences of the term per document
48-
* Document frequency (df): how many documents contain the term
49-
* Positions: where in the document does this term occur
17+
Query operators:
18+
text: matches any of the query terms; can include synonyms and fuzziness
19+
phrase: matches query terms that occur in proximity
20+
regex: pattern matching
21+
wildcard: matches across missing characters
22+
moreLikeThis: matches documents that overlap important terms
23+
Analysis occurs on query values
24+
Except on regex and wildcard operators: partial strings not analyzable
25+
Index-time and search-time analyzers can be different, if needed
26+
Remember: it’s a dictionary
27+
Index it how you’d like to find it; search for it how you indexed it
28+
Leverage analyzers to index text efficiently for searching
29+
Index statistics factor into score computations
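
A few of the query operators listed above, sketched as `$search` stage bodies. The `title` path and the query strings are assumptions for illustration; the `toPipeline` helper is hypothetical.

```js
const path = "title"; // hypothetical field name

// text: matches any of the query terms; fuzzy tolerates one edit ("serch").
const textQuery = {
  text: { query: "serch playground", path, fuzzy: { maxEdits: 1 } },
};

// phrase: terms must occur in proximity; slop allows one position of distance.
const phraseQuery = {
  phrase: { query: "search playground", path, slop: 1 },
};

// wildcard: the query value is not analyzed, so it is written in the
// lowercase form that the indexed terms would have.
const wildcardQuery = {
  wildcard: { query: "play*", path, allowAnalyzedField: true },
};

// Wrap an operator in a one-stage aggregation pipeline.
const toPipeline = (operator) => [{ $search: operator }];

console.log(JSON.stringify(toPipeline(phraseQuery)));
```

`allowAnalyzedField` is set here on the assumption that `title` is indexed with a tokenizing analyzer rather than `lucene.keyword`.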

docs/40_Analysis/custom.mdx

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
Composed from analyzer building blocks
2+
charFilters: pre-process characters of text for filtering/replacing (optional)
3+
htmlStrip, icuNormalize, mapping, persian
4+
tokenizer: splits text into tokens
5+
edgeGram, keyword, nGram, regexCaptureGroup, regexSplit, standard, uaxUrlEmail, whitespace
6+
tokenFilters: processes individual tokens (optional)
7+
asciiFolding, daitchMokotoffSoundex, edgeGram, englishPossessive, flattenGraph, icuFolding, icuNormalizer, kStemming, length, lowercase, nGram, porterStemming, regex, reverse, shingle, snowballStemming, spanishPluralStemming, stempel, stopword, trim, wordDelimiterGraph
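
As a sketch of how these building blocks compose, the index definition below chains one character filter, one tokenizer, and two token filters. The analyzer name `reversedKeyword` and the `title` field are hypothetical, and `emulate` is only a rough local imitation of the chain, not how Atlas Search runs analysis.

```js
// Custom analyzer: strip HTML, keep the whole value as one token,
// then lowercase and reverse it (useful for "endsWith"-style wildcards).
const indexDefinition = {
  analyzers: [
    {
      name: "reversedKeyword", // hypothetical name
      charFilters: [{ type: "htmlStrip" }],
      tokenizer: { type: "keyword" },
      tokenFilters: [{ type: "lowercase" }, { type: "reverse" }],
    },
  ],
  mappings: {
    dynamic: false,
    fields: {
      title: { type: "string", analyzer: "reversedKeyword" },
    },
  },
};

// Rough imitation of the chain above, for intuition only.
function emulate(text) {
  const stripped = text.replace(/<[^>]*>/g, ""); // crude htmlStrip
  const token = stripped; // keyword tokenizer: one token for the whole value
  return [...token.toLowerCase()].reverse().join(""); // lowercase, then reverse
}

console.log(emulate("<b>Atlas</b> Search")); // hcraes salta
```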

docs/40_Analysis/index.mdx

Lines changed: 39 additions & 0 deletions
@@ -8,3 +8,42 @@ https://www.mongodb.com/docs/atlas/atlas-search/analyzers/
88

99
![Visual Editor standard analyzer output](/img/editor_analysis.png)
1010

11+
## Analysis matters
12+
“searches” query does not match “Search” with lucene.standard: https://search-playground.mongodb.com/tools/code-playground/snapshots/664739964e0a3f240a5de9db
13+
“searches” matches “Search” using lucene.english: https://search-playground.mongodb.com/tools/code-playground/snapshots/66473aa64e0a3f240a5de9dd
14+
15+
## Custom analyzers
16+
Last 4 digit of phone number matching (regex extraction during indexing, keyword analysis at query time): https://search-playground.corp.mongodb.com/tools/code-playground/snapshots/669e6c98d49ef6fad98118ba
17+
Example of being able to do ‘startsWith’ and ‘endsWith’ using wildcard and ‘reverse’ token filter:
18+
https://search-playground.mongodb.com/tools/code-playground/snapshots/6683c8bc4a45448733549bbc
19+
20+
Example of being able to do ‘startsWith’, ‘endsWith’ and ‘contains’ using nGrams: https://search-playground.mongodb.com/tools/code-playground/snapshots/6683c999934a05d9b585b6e7
21+
Relevancy
22+
Example of an as-you-type suggest configuration; sophisticated use of multi and several weighted query clauses: https://search-playground.mongodb.com/tools/code-playground/snapshots/66473b744e0a3f240a5de9e1
23+
$project score?
24+
$project scoreDetails?
25+
26+
# multi
27+
Why?
28+
Relevancy example: boost each multi uniquely
29+
Multiple language example: may not know the language of the content and each document could be different - multi across all possible languages, query across them as desired at query-time, let relevancy sort it out
30+
Example of being able to do ‘startsWith’ and ‘endsWith’ using wildcard and ‘reverse’ token filter:
31+
https://search-playground.mongodb.com/tools/code-playground/snapshots/6683c8bc4a45448733549bbc
32+
33+
34+
* Text: the heart and soul of your content
35+
* Strings are analyzed, tokenized into terms
36+
* Multiple analyzers can be used for a single string field (multi)
37+
* Terms: words, fragments, atomic searchable units
38+
* An inverted index structure organizes terms lexicographically/alphabetically for quick lookup (aka a dictionary)
39+
* Term statistics:
40+
* Posting list: document identifiers
41+
* Term frequency (tf): how many occurrences of the term per document
42+
* Document frequency (df): how many documents contain the term
43+
* Positions: where in the document does this term occur
44+
45+
lucene.standard (default): tokenizes at word break characters, removes punctuation, and lowercases
46+
lucene.english: standard tokenization plus de-pluralization, stop word removal, and stemming
47+
lucene.keyword: Tokenizes text as a single term; suitable for wildcard or regex matching over entire value
48+
Many language-specific analyzers built-in: (lucene.)arabic, armenian, basque, bengali, brazilian, bulgarian, catalan, chinese, cjk, czech, danish, dutch, english, finnish, french, galician, german, greek, hindi, hungarian, indonesian, irish, italian, japanese, korean, kuromoji, latvian, lithuanian, morfologik, nori, norwegian, persian, polish, portuguese, romanian, russian, smartcn, sorani, spanish, swedish, thai, turkish, ukrainian
49+
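
Why "searches" matches "Search" under `lucene.english` but not under `lucene.standard` can be illustrated with toy analyzers. These are crude stand-ins: the suffix-stripping `stem` function is an assumption for the example and is far simpler than Lucene's real stemming.

```js
// Standard-like: split at non-alphanumeric characters and lowercase.
const standardLike = (text) =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Naive stemmer: strip a trailing "es" or "s" (illustration only).
const stem = (term) => term.replace(/(es|s)$/, "");

// English-like: standard tokenization plus the naive stemmer.
const englishLike = (text) => standardLike(text).map(stem);

// A query matches when any analyzed query term equals an indexed term.
function matches(analyze, indexedText, queryText) {
  const terms = new Set(analyze(indexedText));
  return analyze(queryText).some((term) => terms.has(term));
}

console.log(matches(standardLike, "Search", "searches")); // false
console.log(matches(englishLike, "Search", "searches")); // true
```

The same analyzer runs on both the indexed text and the query, which is why the terms line up in the second case.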

docs/50_Operators/1_index.mdx

Lines changed: 4 additions & 0 deletions
@@ -3,3 +3,7 @@
33

44
* Search operators w/ quick labs for each one
55
* compound: filter, mustNot, must, should
6+
7+
# TODO
8+
* "For string type, the moreLikeThis and queryString operators don't support an array of strings."
9+
huh? Really?

docs/50_Operators/compound.mdx

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
# `compound` operators
2+
3+
* should
4+
* must
5+
* mustNot
6+
* filter
7+
8+
`minimumShouldMatch`
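
A minimal sketch of a `compound` query that uses all four clause types together with `minimumShouldMatch`. The field names (`title`, `body`, `tags`, `year`) and query values are hypothetical.

```js
const compoundQuery = {
  compound: {
    // must: required, and contributes to the score
    must: [{ text: { query: "search", path: "title" } }],
    // should: optional clauses; at least one must match (see below)
    should: [
      { text: { query: "atlas", path: "title" } },
      { text: { query: "lucene", path: "body" } },
    ],
    // mustNot: excludes matching documents
    mustNot: [{ text: { query: "deprecated", path: "tags" } }],
    // filter: required, but does not affect the score
    filter: [{ range: { path: "year", gte: 2020 } }],
    minimumShouldMatch: 1,
  },
};

const pipeline = [{ $search: compoundQuery }];
console.log(JSON.stringify(pipeline, null, 2));
```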

docs/60_Relevancy/index.mdx

Lines changed: 5 additions & 0 deletions
@@ -5,3 +5,8 @@
55
* TF/IDF, BM25
66
* compound: additive clause scoring
77
* scoreDetails w/ demonstrative lab
8+
9+
Relevancy
10+
Example of an as-you-type suggest configuration; sophisticated use of multi and several weighted query clauses: https://search-playground.mongodb.com/tools/code-playground/snapshots/66473b744e0a3f240a5de9e1
11+
$project score?
12+
$project scoreDetails?
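
The two `$project` questions above could be answered with a pipeline along these lines. It is a sketch under assumptions: the `title` path and query text are hypothetical, and `scoreDetails: true` is set on `$search` so that the details are available to project.

```js
const relevancyPipeline = [
  {
    $search: {
      text: { query: "search", path: "title" }, // hypothetical query
      scoreDetails: true, // ask for the score breakdown
    },
  },
  {
    $project: {
      title: 1,
      score: { $meta: "searchScore" },
      scoreDetails: { $meta: "searchScoreDetails" },
    },
  },
];

console.log(JSON.stringify(relevancyPipeline, null, 2));
```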

docs/99_TODO.md

Lines changed: 2 additions & 0 deletions
@@ -14,3 +14,5 @@ https://search-playground.corp.mongodb.com/tools/code-playground/snapshots/669e8
1414
Which embedded document matched?
1515
https://search-playground.corp.mongodb.com/tools/code-playground/snapshots/669e850dd49ef6fad98118d6
1616
Using scoreDetails to glimpse analysis in action:
17+
18+
Synonyms: https://search-playground.mongodb.com/tools/code-sandbox/snapshots/6785d30eb6487c1cfd0bb817