checkpoint - several content updates

erikhatcher · erikhatcher · commit 7c21a2320703 · 2025-01-14T08:36:34.000-05:00
diff --git a/docs/01_Welcome.mdx b/docs/01_Welcome.mdx
@@ -30,6 +30,8 @@ Fill in any other blanks for welcoming and setting the stage for the workshop.
 
 ## The Exercises
 
-This workshop uses the Atlas Search Playground for the exercises. All you need is a browser and network connectivity. This handy developer tool allows us to work in an isolated, focused environment
+This workshop uses the Atlas Search Playground for the exercises.
+All you need is a browser and network connectivity.
+This handy developer tool allows us to work in an isolated, focused environment
 with no setup.
 
diff --git a/docs/10_About_workshop/1_intro.mdx b/docs/10_About_workshop/1_intro.mdx
@@ -1,16 +1,23 @@
 # 📘 Atlas Search Playground
 
-The Atlas Search Playground (or just Playground) is used for the exercises in this workshop.  The Playground is a self-contained lightweight, yet feature-rich Atlas Search environment which does not require an Atlas account to use.
+The Atlas Search Playground (or just Playground) is used for the exercises in this workshop.
+The Playground is a self-contained, lightweight, yet feature-rich Atlas Search environment
+that does not require an Atlas account to use.
 
 There are two tools in the Playground:
   * `Code Sandbox`: your data, an index configuration, an aggregation pipeline, and results
   * `Search Demo Builder`: a configurable search UI on your data
 
-The exercises will only use the Code Sandbox, as it allows saving and sharing links to the full environment and allows us to work on one topic at a time. We'll cover the Search Demo Builder briefly near the end of thw workshop.
+The exercises will only use the Code Sandbox, as it allows saving and sharing links to
+the full environment and allows us to work on one topic at a time.
+
+We'll cover the Search Demo Builder briefly near the end of the workshop.
 
 ## Code Sandbox layout
 
-To begin, navigate to the [Atlas Search Playground](https://search-playground.mongodb.com/). In the next section, you'll work through the first exercise to get familiar with the Playground's Code Sandbox.
+To begin, navigate to the [Atlas Search Playground](https://search-playground.mongodb.com/).
+In the next section, you'll work through the first exercise to get familiar with the
+Playground's Code Sandbox.
 
 Let's dive into the world of Atlas Search using this convenient and powerful playground!
 
diff --git a/docs/20_Intro_to_Atlas_Search/1_system.mdx b/docs/20_Intro_to_Atlas_Search/1_system.mdx
@@ -9,9 +9,14 @@ or dynamically mapping any and all fields supported.
 ![system diagram](/img/system_diagram.png)
 
 Changes to a collection via updates, deletes, or additions are *eventually consistent*, meaning the
-index is updated independently of changes to the collection in a separate process, asynchronously. The lag between a change made to the database and refelected in a subsequent search is dependent on many factors such as deployment tier and architecture, the complexity of the index mapping, the other changes that are also queued, and the laws of physics.
+index is updated independently of changes to the collection in a separate process, asynchronously.
+The lag between a change made to the database and that change being reflected in a subsequent search
+depends on many factors, such as deployment tier and architecture, the complexity of the index mapping,
+the other changes that are also queued, and the laws of physics.
 
-The Atlas Search process can be deployed either coupled alongside the database nodes, or on separate dedicated nodes. Dedicated nodes provide separation of concerns, alleviating resource contention. Dedicated search nodes are recommended for production workloads.
+The Atlas Search process can be deployed either coupled alongside the database nodes,
+or on separate dedicated nodes. Dedicated nodes provide separation of concerns, alleviating
+resource contention. Dedicated search nodes are recommended for production workloads.
 
 ## Coupled nodes
 ![coupled nodes](/img/coupled.png)
diff --git a/docs/20_Intro_to_Atlas_Search/2_aggregation_stages.mdx b/docs/20_Intro_to_Atlas_Search/2_aggregation_stages.mdx
@@ -6,6 +6,58 @@ where the magic happens
 
 ## $searchMeta
 
-The `$searchMeta` stage performs the same search that `$search` does, but only returns the results metadata, not actual matching documents. Results metadata includes the count of matching results and facets. This same metadata is available when using `$search` too, accessible in the $$SEARCH_META context variable.
+The `$searchMeta` stage performs the same search that `$search` does,
+but only returns the results metadata, not actual matching documents.
+Results metadata includes the count of matching results and facets.
+This same metadata is available when using `$search` too,
+accessible in the `$$SEARCH_META` context variable.
 
-**TODO**: Add exercise using $search and have developer switch to $searchMeta to see the results
+## Exercises: search pipeline stages
+
+### Step 1
+1. Navigate to the original Playground used in the last section's exercise
+   https://search-playground.mongodb.com/tools/code-sandbox/snapshots/6782aea0667feaaf06324b87
+2. Press Run. Got the empty `[]` array of results?
+3. Change `$search` to `$searchMeta` (in the Query pane), and press Run again.
+
+<details>
+<summary>Here are the expected results...</summary>
+<div>
+```js
+[
+  {
+    "count": {
+      "lowerBound": 0
+    }
+  }
+]
+```
+</div>
+</details>
+
+### Step 2
+
+1. Now fix the query to match the document as you did previously
+2. Press Run again
+3. Did the `$searchMeta` results change?
+
+<details>
+<summary>Here are the expected results...</summary>
+<div>
+```js
+[
+  {
+    "count": {
+      "lowerBound": 1
+    }
+  }
+]
+```
+</div>
+</details>
+
+## Post-`$search` stages
+
+  * Watch out for stages such as `$sort` and `$group`, or any stage that consumes **all** documents from the previous stage.
+  * `$limit`, `$addFields`, `$project`, and the like are fine, as they only operate on one document at a time
+    or cut off the results.
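The stages described in the `2_aggregation_stages.mdx` hunk above can be sketched as plain pipeline definitions. This is a minimal sketch rather than part of the commit: the index name `default`, the `title` field, and the query text are assumptions, and each array is meant to be pasted into the Playground's Query pane or passed to a driver's `aggregate()` call.

```javascript
// Hypothetical index name, field path, and query text; adjust to your data.
const searchStage = {
  index: "default",
  text: { query: "search", path: "title" },
};

// $search returns matching documents. The stages that follow work on one
// document at a time ($project) or cut off the results ($limit).
const searchPipeline = [
  { $search: searchStage },
  { $limit: 10 },
  { $project: { _id: 0, title: 1 } },
];

// $searchMeta runs the same search but returns only results metadata,
// such as the count of matching results.
const searchMetaPipeline = [
  { $searchMeta: { ...searchStage, count: { type: "lowerBound" } } },
];

// With $search, the same metadata is reachable through $$SEARCH_META.
const searchWithMetaPipeline = [
  { $search: searchStage },
  { $limit: 10 },
  { $project: { _id: 0, title: 1, meta: "$$SEARCH_META" } },
];
```

Switching between the first two pipelines mirrors the exercise: the query stays the same and only the stage name changes.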
diff --git a/docs/20_Intro_to_Atlas_Search/3_lucene.mdx b/docs/20_Intro_to_Atlas_Search/3_lucene.mdx
@@ -0,0 +1,27 @@
+# 📘 Powered by Lucene
+
+https://lucene.apache.org
+
+## Anatomy of a Lucene index
+
+A Lucene index encapsulates specialized data structures unique to each type of data indexed.
+
+  * Numbers and dates: ...
+  * Geo-spatial: ...
+  * Text: via inverted indexes
+
+Each field is indexed independently.
+
+Lucene uses a segmented, append-only architecture for fast indexing. Background processes optimize the index
+segments.
+
+## Inverted Index
+
+![inverted index](/img/analysis_lucene_standard.png)
+
+## Search algorithms
+
+  * "index intersection" using skip lists
+  * link to Adrien's presentation
+
+Atlas Search translates its search operators to Lucene's `Query` API.
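To make the inverted index and "index intersection" notes in the `3_lucene.mdx` hunk above concrete, here is a toy sketch. It is an illustration only, not Lucene's actual data structures: the function names and sample documents are made up, and the intersection walks the posting lists linearly where Lucene would use skip lists to jump ahead.

```javascript
// Toy in-memory inverted index: term -> ascending array of document ids.
function buildInvertedIndex(docs) {
  const index = new Map();
  docs.forEach((text, docId) => {
    // Lowercase and split on non-word characters; keep each term once per doc.
    const terms = new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
    for (const term of terms) {
      if (!index.has(term)) index.set(term, []);
      index.get(term).push(docId); // ids arrive in ascending order
    }
  });
  return index;
}

// Intersect two sorted posting lists by advancing whichever side is behind.
function intersect(a, b) {
  const result = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      result.push(a[i]);
      i++;
      j++;
    } else if (a[i] < b[j]) {
      i++;
    } else {
      j++;
    }
  }
  return result;
}

const docs = [
  "Atlas Search is powered by Lucene",
  "Lucene builds an inverted index",
  "Search the index",
];
const index = buildInvertedIndex(docs);
// Documents containing both "lucene" and "index".
const both = intersect(index.get("lucene") ?? [], index.get("index") ?? []);
```

Here `both` holds the ids of documents whose text contains both terms, which is the essence of an AND query over two posting lists.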
diff --git a/docs/30_Index_configuration/2_index_config.mdx b/docs/30_Index_configuration/2_index_config.mdx
@@ -8,9 +8,13 @@ Documents are mapped to an index through a flexible configuration.
 
 ## Dynamic mapping
 
+You can configure an entire index to use dynamic mappings, or specify individual fields,
+such as fields of type `document`, to be dynamically mapped.
+
 ## Configuring a real Atlas Search index
 
   * Atlas Search Visual Editor or JSON Editor
   * via Compass
   * Atlas CLI
   * Driver commands
+  
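The dynamic mapping sentence added in the `2_index_config.mdx` hunk above can be illustrated with two index definitions. This is a sketch under assumptions: the `title` and `address` field names are hypothetical and do not come from the workshop data.

```javascript
// Entire index dynamically mapped: every supported field is indexed.
const fullyDynamicIndex = {
  mappings: { dynamic: true },
};

// Static mappings, with one hypothetical `address` field of type `document`
// whose sub-fields are dynamically mapped.
const partlyDynamicIndex = {
  mappings: {
    dynamic: false,
    fields: {
      title: { type: "string" },
      address: { type: "document", dynamic: true },
    },
  },
};
```

Either object can be pasted as JSON into the Playground's index pane or the Atlas JSON Editor.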
diff --git a/docs/30_Index_configuration/6_string.mdx b/docs/30_Index_configuration/6_string.mdx
@@ -14,36 +14,16 @@ See also: Relevant As-You-Type Suggestions Search Solution
 Basic examples
 lucene.standard matching case-insensitive: https://search-playground.mongodb.com/tools/code-playground/snapshots/664738af4e0a3f240a5de9d9
 
-## Analysis matters
-“searches” query does not match “Search” with lucene.standard: https://search-playground.mongodb.com/tools/code-playground/snapshots/664739964e0a3f240a5de9db
-“searches” matches “Search” using lucene.english: https://search-playground.mongodb.com/tools/code-playground/snapshots/66473aa64e0a3f240a5de9dd
-
-## Custom analyzers
-Last 4 digit of phone number matching (regex extraction during indexing, keyword analysis at query time): https://search-playground.corp.mongodb.com/tools/code-playground/snapshots/669e6c98d49ef6fad98118ba
-Example of being able to do ‘startsWith’ and ‘endsWith’ using wildcard and ‘reverse’ token filter:
-https://search-playground.mongodb.com/tools/code-playground/snapshots/6683c8bc4a45448733549bbc
-
-Example of being able to do ‘startsWith’, ‘endsWith’ and ‘contains’ using nGrams: https://search-playground.mongodb.com/tools/code-playground/snapshots/6683c999934a05d9b585b6e7 
-Relevancy
-Example of an as-you-type suggest configuration; sophisticated use of multi and several weighted query clauses: https://search-playground.mongodb.com/tools/code-playground/snapshots/66473b744e0a3f240a5de9e1
-$project score?
-$project scoreDetails?
-
-# multi
-Why?
-Relevancy example: boost each multi uniquely
-Multiple language example: may not know the language of the content and each document could be different - multi across all possible languages, query across them as desired at query-time, let relevancy sort it out
-Example of being able to do ‘startsWith’ and ‘endsWith’ using wildcard and ‘reverse’ token filter:
-https://search-playground.mongodb.com/tools/code-playground/snapshots/6683c8bc4a45448733549bbc
-
-
-* Text: the heart and soul of your content
-* Strings are analyzed, tokenized into terms
-  * Multiple analyzers can be used for a single string field (multi)
-* Terms: words, fragments, atomic searchable units
-* An inverted index structure organizes terms lexicographically/alphabetically for quick lookup (aka a dictionary)
-* Term statistics:
-  * Posting list: document identifiers
-  * Term frequency (tf): how many occurrences of the term per document
-  * Document frequency (df): how many documents contain the term
-  * Positions: where in the document does this term occur
+* Query operators:
+  * `text`: matches any of the query terms; can include synonyms and fuzziness
+  * `phrase`: matches query terms that occur in proximity
+  * `regex`: pattern matching
+  * `wildcard`: matches across missing characters
+  * `moreLikeThis`: matches documents that overlap important terms
+* Analysis occurs on query values
+  * Except on `regex` and `wildcard` operators: partial strings are not analyzable
+* Index-time and search-time analyzers can be different, if needed
+* Remember: it’s a dictionary
+  * Index it how you’d like to find it; search for it how you indexed it
+* Leverage analyzers to index text efficiently for searching
+* Index statistics factor into score computations
diff --git a/docs/40_Analysis/custom.mdx b/docs/40_Analysis/custom.mdx
@@ -0,0 +1,7 @@
+Custom analyzers are composed from analyzer building blocks:
+* `charFilters`: pre-process characters of text for filtering/replacing (optional)
+  * `htmlStrip`, `icuNormalize`, `mapping`, `persian`
+* `tokenizer`: splits text into tokens
+  * `edgeGram`, `keyword`, `nGram`, `regexCaptureGroup`, `regexSplit`, `standard`, `uaxUrlEmail`, `whitespace`
+* `tokenFilters`: process individual tokens (optional)
+  * `asciiFolding`, `daitchMokotoffSoundex`, `edgeGram`, `englishPossessive`, `flattenGraph`, `icuFolding`, `icuNormalizer`, `kStemming`, `length`, `lowercase`, `nGram`, `porterStemming`, `regex`, `reverse`, `shingle`, `snowballStemming`, `spanishPluralStemming`, `stempel`, `stopword`, `trim`, `wordDelimiterGraph`
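The building blocks listed in the `custom.mdx` hunk above fit together roughly as follows. This is a hedged sketch, not an analyzer from the workshop: the analyzer name `htmlLowercase` and the `body` field are hypothetical, and the chosen filters are just one possible combination.

```javascript
// A custom analyzer: strip HTML, split into words, then lowercase each token.
const indexDefinition = {
  analyzers: [
    {
      name: "htmlLowercase", // hypothetical analyzer name
      charFilters: [{ type: "htmlStrip" }], // optional pre-processing of characters
      tokenizer: { type: "standard" }, // exactly one tokenizer
      tokenFilters: [{ type: "lowercase" }], // optional per-token processing
    },
  ],
  mappings: {
    dynamic: false,
    fields: {
      // Hypothetical field that uses the custom analyzer by name.
      body: { type: "string", analyzer: "htmlLowercase" },
    },
  },
};
```

The order matters conceptually: character filters run first, then the tokenizer, then the token filters.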
diff --git a/docs/40_Analysis/index.mdx b/docs/40_Analysis/index.mdx
@@ -8,3 +8,42 @@ https://www.mongodb.com/docs/atlas/atlas-search/analyzers/
 
 ![Visual Editor standard analyzer output](/img/editor_analysis.png)
 
+## Analysis matters
+* “searches” query does not match “Search” with `lucene.standard`: https://search-playground.mongodb.com/tools/code-playground/snapshots/664739964e0a3f240a5de9db
+* “searches” matches “Search” using `lucene.english`: https://search-playground.mongodb.com/tools/code-playground/snapshots/66473aa64e0a3f240a5de9dd
+
+## Custom analyzers
+Last 4 digits of phone number matching (regex extraction during indexing, keyword analysis at query time): https://search-playground.corp.mongodb.com/tools/code-playground/snapshots/669e6c98d49ef6fad98118ba
+Example of being able to do ‘startsWith’ and ‘endsWith’ using wildcard and ‘reverse’ token filter:
+https://search-playground.mongodb.com/tools/code-playground/snapshots/6683c8bc4a45448733549bbc
+
+Example of being able to do ‘startsWith’, ‘endsWith’ and ‘contains’ using nGrams: https://search-playground.mongodb.com/tools/code-playground/snapshots/6683c999934a05d9b585b6e7 
+Relevancy
+Example of an as-you-type suggest configuration; sophisticated use of multi and several weighted query clauses: https://search-playground.mongodb.com/tools/code-playground/snapshots/66473b744e0a3f240a5de9e1
+$project score?
+$project scoreDetails?
+
+# multi
+Why?
+Relevancy example: boost each multi uniquely
+Multiple language example: may not know the language of the content and each document could be different - multi across all possible languages, query across them as desired at query-time, let relevancy sort it out
+Example of being able to do ‘startsWith’ and ‘endsWith’ using wildcard and ‘reverse’ token filter:
+https://search-playground.mongodb.com/tools/code-playground/snapshots/6683c8bc4a45448733549bbc
+
+
+* Text: the heart and soul of your content
+* Strings are analyzed, tokenized into terms
+  * Multiple analyzers can be used for a single string field (multi)
+* Terms: words, fragments, atomic searchable units
+* An inverted index structure organizes terms lexicographically/alphabetically for quick lookup (aka a dictionary)
+* Term statistics:
+  * Posting list: document identifiers
+  * Term frequency (tf): how many occurrences of the term per document
+  * Document frequency (df): how many documents contain the term
+  * Positions: where in the document does this term occur
+
+* `lucene.standard` (default): tokenizes at word break characters, removes punctuation, and lowercases
+* `lucene.english`: standard tokenization plus de-pluralization, stop word removal, and stemming
+* `lucene.keyword`: tokenizes text as a single term; suitable for wildcard or regex matching over entire value
+* Many language-specific analyzers built-in: (lucene.)arabic, armenian, basque, bengali, brazilian, bulgarian, catalan, chinese, cjk, czech, danish, dutch, english, finnish, french, galician, german, greek, hindi, hungarian, indonesian, irish, italian, japanese, korean, kuromoji, latvian, lithuanian, morfologik, nori, norwegian, persian, polish, portuguese, romanian, russian, smartcn, sorani, spanish, swedish, thai, turkish, ukrainian
+
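The `multi` notes in the `40_Analysis/index.mdx` hunk above (several analyzers on one string field, each boosted differently) might look like the sketch below. It is an assumption-laden illustration: the `title` field, the multi name `english`, the query text, and the boost value are all hypothetical.

```javascript
// One string field indexed twice: once with lucene.standard, and once more
// with lucene.english under a multi named "english".
const indexDefinition = {
  mappings: {
    dynamic: false,
    fields: {
      title: {
        type: "string",
        analyzer: "lucene.standard",
        multi: {
          english: { type: "string", analyzer: "lucene.english" },
        },
      },
    },
  },
};

// Query both variants; matches on the standard-analyzed terms get a
// higher weight than matches on the stemmed terms.
const pipeline = [
  {
    $search: {
      compound: {
        should: [
          {
            text: {
              query: "searches",
              path: "title",
              score: { boost: { value: 2 } },
            },
          },
          {
            text: {
              query: "searches",
              path: { value: "title", multi: "english" },
            },
          },
        ],
      },
    },
  },
];
```

With this shape, a document containing only “Search” can still match through the `english` multi, while exact-form matches rank higher.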
diff --git a/docs/50_Operators/1_index.mdx b/docs/50_Operators/1_index.mdx
@@ -3,3 +3,7 @@
 
 * Search operators w/ quick labs for each one
 * compound: filter, mustNot, must, should
+
+# TODO
+  * "For string type, the moreLikeThis and queryString operators don't support an array of strings."
+    huh?  Really?
diff --git a/docs/50_Operators/compound.mdx b/docs/50_Operators/compound.mdx
@@ -0,0 +1,8 @@
+# `compound` operators
+
+  * should
+  * must
+  * mustNot
+  * filter
+
+`minimumShouldMatch`
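The clause types listed in the `compound.mdx` hunk above combine as in the following sketch. The field names (`title`, `year`) and query values are hypothetical, and the `range` clause assumes `year` is indexed as a number.

```javascript
const compoundPipeline = [
  {
    $search: {
      compound: {
        // must: required to match, contributes to the score.
        must: [{ text: { query: "search", path: "title" } }],
        // mustNot: excludes matching documents.
        mustNot: [{ text: { query: "deprecated", path: "title" } }],
        // should: optional clauses that add to the score when they match.
        should: [
          { text: { query: "atlas", path: "title" } },
          { text: { query: "lucene", path: "title" } },
        ],
        // filter: required to match, but does not affect the score.
        filter: [{ range: { path: "year", gte: 2020 } }],
        // At least one of the should clauses has to match.
        minimumShouldMatch: 1,
      },
    },
  },
];
```

Without `minimumShouldMatch`, the `should` clauses here would only influence ranking; setting it to 1 turns "at least one of them" into a requirement.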
diff --git a/docs/60_Relevancy/index.mdx b/docs/60_Relevancy/index.mdx
@@ -5,3 +5,8 @@
 * TF/IDF, BM25
 * compound: additive clause scoring
 * scoreDetails w/ demonstrative lab
+
+Relevancy
+Example of an as-you-type suggest configuration; sophisticated use of multi and several weighted query clauses: https://search-playground.mongodb.com/tools/code-playground/snapshots/66473b744e0a3f240a5de9e1
+$project score?
+$project scoreDetails?
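For the open `$project score?` and `$project scoreDetails?` notes in the `60_Relevancy/index.mdx` hunk above, one possible shape is sketched below. The `title` field and query text are assumptions; the `$meta` values are the ones Atlas Search exposes for the score and its breakdown.

```javascript
const scorePipeline = [
  {
    $search: {
      text: { query: "search", path: "title" }, // hypothetical field and query
      scoreDetails: true, // ask for the score breakdown to be computed
    },
  },
  {
    $project: {
      title: 1,
      // Relevancy score assigned to each matching document.
      score: { $meta: "searchScore" },
      // Explanation of how that score was computed.
      scoreDetails: { $meta: "searchScoreDetails" },
    },
  },
];
```

Projecting both values side by side is a convenient way to see how term statistics feed into the final score.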
diff --git a/docs/99_TODO.md b/docs/99_TODO.md
@@ -14,3 +14,5 @@ https://search-playground.corp.mongodb.com/tools/code-playground/snapshots/669e8
 Which embedded document matched?
 https://search-playground.corp.mongodb.com/tools/code-playground/snapshots/669e850dd49ef6fad98118d6
 Using scoreDetails to glimpse analysis in action:
+
+Synonyms: https://search-playground.mongodb.com/tools/code-sandbox/snapshots/6785d30eb6487c1cfd0bb817
diff --git a/static/img/analysis_lucene_english.png b/static/img/analysis_lucene_english.png
diff --git a/static/img/analysis_lucene_standard.png b/static/img/analysis_lucene_standard.png