add prompts documentation

unimib-datAI · Sep 18, 2024 · commit d9a9784
1 parent d563222

Showing 5 changed files with 638 additions and 0 deletions.
diff --git a/docs/prompts/completion.md b/docs/prompts/completion.md
@@ -0,0 +1,205 @@
+---
+sidebar_position: 3
+---
+
+# Completion
+
+This page shows the **Completion** prompts created to perform the cell entity annotation (CEA) task on tables.
+
+## Completion Prompt (no example and no pool)
+
+**Input prompt**:
+
+```
+Given a table T, it is possible to identify each of its elements through a pair of indices (i, j), where
+i is the row index and j is the column index.
+The elements of the table can be associated with a Wikidata ID if they have one.
+Given the table
+T =
+
+Mont Blanc,4810,Alps,Graian Alps
+K2,8611,Himalayas,Karakoram
+Kilimanjaro,5895,Tanzania,East African Mountains
+Mount Everest,8848,Himalayas,Himalayas
+
+the association between the elements of the table and the Wikidata’s IDs is:
+```
+
+**Response obtained from GPT-4**:
+
+```
+Result =
+ (0, 0): Q7158
+ (0, 1): Q12197
+ (0, 2): Q12348
+ (0, 3): Q12198
+ (1, 0): Q18286
+ (1, 1): Q1204
+ (1, 2): Q1867
+ (1, 3): Q1849
+ (2, 0): Q17050
+ (2, 1): Q1348
+ (2, 2): Q9245
+ (2, 3): Q13357
+ (3, 0): Q513
+ (3, 1): Q513
+ (3, 2): Q1867
+ (3, 3): Q1849
+
+The table T can be represented as a list of lists in Python as follows:
+T = [['Mont Blanc', '4810
+```
+
+## Completion Prompt with example (no pool)
+
+**Input prompt**:
+
+```
+Given a table T, it is possible to identify each of its elements through a pair of indices (i, j), where
+i is the row index and j is the column index.
+The elements of the table can be associated with a Wikidata ID if they have one.
+For example, if we had the cell (2, 5) in table T containing the entity ”Milano,” and in Wikidata,
+we had the entity ”Q490 Milano,” then through CEA, we associate (2, 5) = Q490
+Given the table
+T =
+
+Mont Blanc,4810,Alps,Graian Alps
+K2,8611,Himalayas,Karakoram
+Kilimanjaro,5895,Tanzania,East African Mountains
+Mount Everest,8848,Himalayas,Himalayas
+
+the association between the elements of the table and the Wikidata’s IDs is:
+```
+
+**Response obtained from GPT-4**:
+
+```
+Result =
+ (1, 1) = Q1216
+ (1, 3) = Q1234
+ (2, 1) = Q1823
+ (2, 3) = Q1819
+ (3, 1) = Q8917
+ (3, 4) = Q170787
+ (4, 1) = Q513
+ (4, 3) = Q1819
+
+ # TODO: implement this function
+#raise NotImplementedError()
+wikidata = {}
+for i in range(len(table)):
+    for j in range(len(table[i])):
+        if table[i][j] != "":
+            wikidata[(i+1,j+1)] = cea(table[i][j])
+return wikidata
+```
+
+## Completion Prompt with pool (no example)
+
+**Input prompt**:
+
+```
+Given a table T, it is possible to identify each of its elements through a pair of indices (i, j), where
+i is the row index and j is the column index.
+The elements of the table can be associated with a Wikidata ID if they have one.
+The Wikidata IDs are in a pool P.
+Given the table
+T =
+Mont Blanc,4810,Alps,Graian Alps
+K2,8611,Himalayas,Karakoram
+Kilimanjaro,5895,Tanzania,East African Mountains
+Mount Everest,8848,Himalayas,Himalayas
+
+and the pool
+P =
+Q513 Everest, Q524 Vesuvio, Q583 Mont Blanc
+Q791 Taumatawhakatangihangakōauauotamateapōkaiwhenuakitānatahu, Q1286 Alps, Q1637969 K2
+Q1373 Matterhorn, Q18869 Caucasus, Q1451 Pinatubo, Q1484 Mayon Volcano
+Q5469 Karakoram, Q17009782 East African Mountain, Q3428 Bungsberg, Q3660 Mount Davis,
+Q2353 Topfer, Q265406 Kilimanjaro, Q3309 Pec, Q3375 Zugspitze, Q1262 Graian Alps
+Q3388 Grossglockner, Q43105 Mount Elbrus, Q655495 Tanzania, Q3403 Dufourspitze
+Q4049 Medvědí vrch, Q4052 Dlouhé stráně, Q686902 Himalayas
+
+the association between the elements of the table and the IDs in the pool is:
+```
+
+**Response obtained from GPT-4**:
+
+```
+(0, 0) -> Q583
+(0, 1) -> None
+(0, 2) -> Q1286
+(0, 3) -> Q1262
+(1, 0) -> Q1637969
+(1, 1) -> None
+(1, 2) -> Q686902
+(1, 3) -> Q5469
+(2, 0) -> Q265406
+(2, 1) -> None
+(2, 2) -> Q655495
+(2, 3) -> Q17009782
+(3, 0) -> Q513
+(3, 1) -> None
+(3, 2) -> Q686902
+(3, 3) -> Q686902
+
+The function should return the table with the Wikidata IDs instead of the names of
+```
+
+## Completion Prompt with pool and with example
+
+**Input prompt**:
+
+```
+Given a table T, it is possible to identify each of its elements through a pair of indices (i, j), where
+i is the row index and j is the column index.
+The elements of the table can be associated with a Wikidata ID if they have one.
+For example, if we had the cell (2, 5) in table T containing the entity ”Milano,” and in pool P, we
+had the entity ”Q490 Milano,” then through CEA, we associate (2, 5) –>Q490.
+The Wikidata IDs are in a pool P.
+Given the table
+T =
+
+Mont Blanc,4810,Alps,Graian Alps
+K2,8611,Himalayas,Karakoram
+Kilimanjaro,5895,Tanzania,East African Mountains
+Mount Everest,8848,Himalayas,Himalayas
+
+and the pool
+P =
+
+Q513 Everest, Q524 Vesuvio, Q583 Mont Blanc
+Q791 Taumatawhakatangihangakōauauotamateapōkaiwhenuakitānatahu, Q1286 Alps, Q1637969 K2
+Q1373 Matterhorn, Q18869 Caucasus, Q1451 Pinatubo, Q1484 Mayon Volcano
+Q5469 Karakoram, Q17009782 East African Mountain, Q3428 Bungsberg, Q3660 Mount Davis,
+Q2353 Topfer, Q265406 Kilimanjaro, Q3309 Pec, Q3375 Zugspitze, Q1262 Graian Alps
+Q3388 Grossglockner, Q43105 Mount Elbrus, Q655495 Tanzania, Q3403 Dufourspitze
+Q4049 Medvědí vrch, Q4052 Dlouhé stráně, Q686902 Himalayas
+
+the association between the elements of the table and the IDs in the pool is:
+```
+
+**Response obtained from GPT-4**:
+
+```
+(0, 0) = Q583
+(0, 1) = None
+(0, 2) = Q1286
+(0, 3) = Q1262
+(1, 0) = Q1637969
+(1, 1) = None
+(1, 2) = Q686902
+(1, 3) = Q5469
+(2, 0) = Q265406
+(2, 1) = None
+(2, 2) = Q655495
+(2, 3) = Q17009782
+(3, 0) = Q513
+(3, 1) = None
+(3, 2) = Q686902
+(3, 3) = Q686902
+```
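The GPT-4 responses recorded in `docs/prompts/completion.md` use three different separators (`:`, `=`, `->`) and are sometimes followed by unrelated generated text. The following is a minimal sketch, not part of the commit, of how such responses could be parsed into a mapping from cell indices to Wikidata IDs. The function name `parse_annotations` and the accepted separator set are assumptions made for illustration.

```python
import re
from typing import Dict, Optional, Tuple

# Matches lines such as "(0, 0): Q7158", "(1, 1) = Q1216" or "(0, 1) -> None".
# The accepted separators are an assumption drawn from the recorded responses.
CELL_LINE = re.compile(r"^\s*\((\d+),\s*(\d+)\)\s*(?::|=|->|–>)\s*(Q\d+|None)\s*$")


def parse_annotations(response: str) -> Dict[Tuple[int, int], Optional[str]]:
    """Collect (row, column) -> Wikidata ID pairs; lines that do not match are skipped."""
    annotations: Dict[Tuple[int, int], Optional[str]] = {}
    for line in response.splitlines():
        match = CELL_LINE.match(line)
        if match is None:
            continue  # e.g. "Result =" or trailing free text
        row, col, qid = int(match.group(1)), int(match.group(2)), match.group(3)
        annotations[(row, col)] = None if qid == "None" else qid
    return annotations


sample = """Result =
 (0, 0): Q7158
(0, 1) -> None
(1, 3) = Q1234

The table T can be represented as a list of lists in Python as follows:"""

parsed = parse_annotations(sample)
print(parsed)  # {(0, 0): 'Q7158', (0, 1): None, (1, 3): 'Q1234'}
```

The indices are kept exactly as the model wrote them. The response to the "with example (no pool)" prompt appears to count rows and columns from 1 rather than 0, so any comparison against a ground truth would need to normalize the indexing first.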
diff --git a/docs/prompts/execution.md b/docs/prompts/execution.md
@@ -0,0 +1,169 @@
+---
+sidebar_position: 1
+---
+
+# Execution
+
+This page shows the **Execution** prompts created to perform the cell entity annotation (CEA) task on tables.
+
+## Execution Prompt (no example and no pool)
+
+**Input prompt**:
+
+```
+T =
+Mont Blanc,4810,Alps,Graian Alps
+K2,8611,Himalayas,Karakoram
+Kilimanjaro,5895,Tanzania,East African Mountains
+Mount Everest,8848,Himalayas,Himalayas
+
+Each element in the table T is identified by the indexes couple (i, j) where i is the row index and j
+is the column index.
+Perform a Cell Entity Annotation task on the table T using Wikidata as a Knowledge Graph
+```
+
+**Response obtained from GPT-4**:
+
+```
+public static void main(String[] args) throws IOException {
+    String table = "Mont Blanc,4810,Alps,Graian Alps\n" +
+            "K2,8611,Himalayas,Karakoram\n" +
+            "Kilimanjaro,5895,Tanzania,East African Mountains\n" +
+            "Mount Everest,8848,Himalayas,Himalayas";
+    String[] rows = table.split("\n");
+    String[] headers = rows[0].split(",");
+    List<List<String>> data = new ArrayList<>();
+    for (int i = 1; i < rows.length; i++) {
+        String[] row = rows[i].split(",");
+        data.add(Arrays.asList(row));
+    }
+    System.out.println(headers);
+    System.out.println(data);
+
+// create a Wikidata object
+Wikidata wikidata = new Wikidata();
+
+// get the entity for each cell
+List<List<Entity>> entities = new ArrayList<>();
+```
+
+## Execution Prompt with example (no pool)
+
+**Input prompt**:
+
+```
+T =
+Mont Blanc,4810,Alps,Graian Alps
+K2,8611,Himalayas,Karakoram
+Kilimanjaro,5895,Tanzania,East African Mountains
+Mount Everest,8848,Himalayas,Himalayas
+
+Each element in the table T is identified by the indexes couple (i, j) where i is the row index and j
+is the column index.
+Perform a Cell Entity Annotation task on the table T using Wikidata as a Knowledge Graph
+For example, if we had the cell (2, 5) in table T containing the entity ”Milano,” and in Wikidata,
+we had the entity ”Q490 Milano,” then through CEA, we associate (2, 5) –>Q490
+```
+
+**Response obtained from GPT-4**:
+
+```
+GENERATION
+ */
+    public static void main(String[] args) throws IOException {
+        String path = "src/main/resources/table.csv";
+        Table table = new Table(path);
+        Wikidata wikidata = new Wikidata();
+        CellEntityAnnotation cea = new CellEntityAnnotation(table, wikidata);
+        cea.annotate();
+        System.out.println(table);
+    }
+}
+```
+
+## Execution Prompt with pool (no example)
+
+**Input prompt**:
+
+```
+T =
+Mont Blanc,4810,Alps,Graian Alps
+K2,8611,Himalayas,Karakoram
+Kilimanjaro,5895,Tanzania,East African Mountains
+Mount Everest,8848,Himalayas,Himalayas
+
+Each element in the table T is identified by the indexes couple (i, j) where i is the row index and j
+is the column index.
+P =
+Q513 Everest, Q524 Vesuvio, Q583 Mont Blanc
+Q791 Taumatawhakatangihangakōauauotamateapōkaiwhenuakitānatahu, Q1286 Alps, Q1637969 K2
+Q1373 Matterhorn, Q18869 Caucasus, Q1451 Pinatubo, Q1484 Mayon Volcano
+Q5469 Karakoram, Q17009782 East African Mountain, Q3428 Bungsberg, Q3660 Mount Davis,
+Q2353 Topfer, Q265406 Kilimanjaro, Q3309 Pec, Q3375 Zugspitze, Q1262 Graian Alps
+Q3388 Grossglockner, Q43105 Mount Elbrus, Q655495 Tanzania, Q3403 Dufourspitze
+Q4049 Medvědí vrch, Q4052 Dlouhé stráně, Q686902 Himalayas
+
+In the pool P there are a group of wikidata’s ids. Perform a Cell Entity Annotation task on the
+table T using the pool
+```
+
+**Response obtained from GPT-4**:
+
+```
+@Test
+public void testCellEntityAnnotation() throws Exception {
+    String text = "Mont Blanc is the highest mountain in the Alps. K2 is the second highest mountain in the world. Kilimanjaro is the highest mountain in Africa. Mount Everest is the highest mountain in the world.";
+    List<Table> tables = new ArrayList<>();
+    Table table = new Table();
+    table.setDocId("test");
+    table.setId(0);
+    table.setCaption("Mountains");
+    table.setHeader(Arrays.asList("Name", "Height", "Location", "Range"));
+    List<List<String>> rows = new ArrayList<>();
+    rows.add(Arrays.asList("Mont Blanc", "4810", "Alps", "Graian Alps"));
+    rows.add(Arrays.asList("K2", "8611", "Himalayas", "Karakoram"));
+    rows.add(Arrays.asList("Kilimanjaro", "5895", "Tanzania", "East African Mountains
+```
+
+## Execution Prompt with pool and with example
+
+**Input prompt**:
+
+```
+T =
+Mont Blanc,4810,Alps,Graian Alps
+K2,8611,Himalayas,Karakoram
+Kilimanjaro,5895,Tanzania,East African Mountains
+Mount Everest,8848,Himalayas,Himalayas
+
+Each element in the table T is identified by the indexes couple (i, j) where i is the row index and j
+is the column index.
+P =
+Q513 Everest, Q524 Vesuvio, Q583 Mont Blanc
+Q791 Taumatawhakatangihangakōauauotamateapōkaiwhenuakitānatahu, Q1286 Alps, Q1637969 K2
+Q1373 Matterhorn, Q18869 Caucasus, Q1451 Pinatubo, Q1484 Mayon Volcano
+Q5469 Karakoram, Q17009782 East African Mountain, Q3428 Bungsberg, Q3660 Mount Davis,
+Q2353 Topfer, Q265406 Kilimanjaro, Q3309 Pec, Q3375 Zugspitze, Q1262 Graian Alps
+Q3388 Grossglockner, Q43105 Mount Elbrus, Q655495 Tanzania, Q3403 Dufourspitze
+Q4049 Medvědí vrch, Q4052 Dlouhé stráně, Q686902 Himalayas
+
+In the pool P there are a group of wikidata’s ids.
+Perform a Cell Entity Annotation task on the table T using the pool P.
+For example, if we had the cell (2, 5) in table T containing the entity ”Milano,” and in pool P, we
+had the entity ”Q490 Milano,” then through CEA, we associate (2, 5) –>Q490.
+```
+
+**Response obtained from GPT-4**:
+
+```
+@Test
+public void testCellEntityAnnotation() throws Exception {
+    String table = "Mont Blanc,4810,Alps,Graian Alps\n" +
+            "K2,8611,Himalayas,Karakoram\n" +
+            "Kilimanjaro,5895,Tanzania,East African Mountains\n" +
+            "Mount Everest,8848,Himalayas,Himalayas";
+    String pool = "Q513 Everest, Q524 Vesuvio, Q583 Mont Blanc\n" +
+            "Q791 Taumatawhakatangihangakōauauotamateapōkaiwhenuakitānatahu, Q1286 Alps, Q1637969 K2\n" +
+            "Q1373 Matterhorn, Q18869 Caucasus, Q1451 Pinatubo, Q1484 Mayon Volcano\n" +
+            "Q5469 Karakoram, Q17009782 East African Mountain
+```
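The Execution prompts in `docs/prompts/execution.md` all follow the same layout: the table `T`, the indexing sentence, an optional pool `P`, and the instruction. The following is a rough sketch, not taken from the commit, of how such a prompt could be assembled from data. The function name `build_execution_prompt` and its parameters are hypothetical, and the pool is written on a single line rather than wrapped as in the recorded prompts.

```python
from typing import Sequence, Tuple


def build_execution_prompt(rows: Sequence[Sequence[str]],
                           pool: Sequence[Tuple[str, str]] = ()) -> str:
    """Assemble an Execution-style prompt; the pool section is added only when given."""
    table_text = "\n".join(",".join(row) for row in rows)
    parts = [
        "T =",
        table_text,
        "",
        "Each element in the table T is identified by the indexes couple (i, j) "
        "where i is the row index and j is the column index.",
    ]
    if pool:
        # Each pool entry is rendered as "<ID> <label>", as in the recorded prompts.
        pool_text = ", ".join(f"{qid} {label}" for qid, label in pool)
        parts += [
            "P =",
            pool_text,
            "",
            "In the pool P there are a group of wikidata’s ids. "
            "Perform a Cell Entity Annotation task on the table T using the pool",
        ]
    else:
        parts.append("Perform a Cell Entity Annotation task on the table T "
                     "using Wikidata as a Knowledge Graph")
    return "\n".join(parts)


rows = [
    ["Mont Blanc", "4810", "Alps", "Graian Alps"],
    ["K2", "8611", "Himalayas", "Karakoram"],
]
prompt_no_pool = build_execution_prompt(rows)
prompt_with_pool = build_execution_prompt(rows, [("Q583", "Mont Blanc"), ("Q1286", "Alps")])
print(prompt_with_pool)
```

Keeping the instruction text in one place makes it easier to vary only the presence of the pool between runs, which is the difference the four Execution variants are meant to isolate.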
diff --git a/docs/prompts/index.mdx b/docs/prompts/index.mdx
@@ -0,0 +1,8 @@
+# Prompts
+
+This page presents the prompts given to GPT-4 for the cell entity annotation (CEA) task. The prompts are organized into four categories:
+
+1. **Execution**: the prompt gives the model a clear instruction to perform a specific task on the given data;
+2. **Request**: the prompt is structured as a question that asks for the result of performing a specified task on the provided data;
+3. **Completion**: the prompt deliberately leaves part of a statement incomplete, so that the model fills in the missing information;
+4. **Programming**: the prompt is formulated as pseudocode that presents a specific programming logic or structure, which the model is expected to understand and follow to perform the desired task.
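The Completion and Execution pages each present four variants of their prompt, depending on whether an example and a candidate pool are included. Assuming the Request and Programming categories follow the same pattern (their pages are not shown in this excerpt), the full set of prompt configurations could be enumerated as in the sketch below. The helper `variant_title` is hypothetical and mirrors the heading style of the Completion page.

```python
from itertools import product

CATEGORIES = ["Execution", "Request", "Completion", "Programming"]


def variant_title(category: str, with_example: bool, with_pool: bool) -> str:
    """Build a heading in the style used on the Completion page."""
    if with_example and with_pool:
        suffix = "with pool and with example"
    elif with_example:
        suffix = "with example (no pool)"
    elif with_pool:
        suffix = "with pool (no example)"
    else:
        suffix = "(no example and no pool)"
    return f"{category} Prompt {suffix}"


# Every category crossed with the two on/off options: 4 x 2 x 2 configurations.
variants = [
    variant_title(category, with_example, with_pool)
    for category, with_example, with_pool in product(CATEGORIES, (False, True), (False, True))
]
print(len(variants))  # 16
```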