
Commit fcacf55: Batch inserts (#56)

* feat: add batch mode for improved bulk insert performance. Batch mode defers flush() operations during bulk inserts, significantly improving performance when inserting large graphs.
* docs: update README, add benchmarks and roadmap, bump to v3.1.0
* ci: add GitHub Actions workflow for PR tests
* fix: use deterministic keys in test_indexer to avoid flaky hash collisions
* fix: use deterministic keys in test_index to avoid flaky hash collisions

1 parent: 19f79c4

File tree: 11 files changed, +832 −10 lines

.github/workflows/python-tests.yml (38 additions, 0 deletions)

```yaml
# This workflow runs Python tests on pull requests and pushes
name: Python Tests

on:
  push:
    branches: [ master, main ]
  pull_request:
    branches: [ master, main ]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.9', '3.10', '3.11', '3.12']

    steps:
    - uses: actions/checkout@v4

    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v5
      with:
        python-version: ${{ matrix.python-version }}

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install xxhash==3.2.0
        pip install pytest

    - name: Run tests
      run: |
        python -m pytest test/ -v --ignore=test/bench.py --ignore=test/benchmark.py

    - name: Run quick benchmark (optional)
      run: |
        python test/benchmark.py --quick --skip-individual
      continue-on-error: true
```

PERFORMANCE_ROADMAP.md (76 additions, 0 deletions)

````markdown
# CogDB Performance Roadmap

> Weekly release cadence - tracking performance issues and optimizations

## 🔴 High Priority

### Star Graph / High-Degree Vertex Performance Degradation
**Discovered:** 2024-12-10 via benchmark
**Issue:** When inserting many edges from/to the same vertex, performance degrades severely.

| Edges | Speed | Degradation |
|-------|-------|-------------|
| 100 | 569 edges/s | baseline |
| 500 | 633 edges/s | - |
| 1,000 | 338 edges/s | 47% slower |
| 5,000 | 83 edges/s | **87% slower** |

**Root Cause:** `put_set()` in `database.py` traverses linked lists to check for duplicates. This is O(n) per insert, making high-degree vertices O(n²) overall.

**Location:** `database.py:241-277` (put_set method)

**Potential Fix:**
1. Use a hash-based set for duplicate checking instead of linked-list traversal
2. Consider a Bloom filter for faster "definitely not present" checks
3. Or maintain an in-memory index of vertex adjacencies

---

## 🟡 Medium Priority

### Unbounded Cache Growth
**Issue:** Cache in `cache.py` grows unboundedly - no eviction policy.
**Fix:** Implement an LRU cache with `collections.OrderedDict`
**Effort:** Low

### Redundant Table Switches in put_node
**Issue:** `put_node()` calls `use_table()` 5 times per edge insert.
**Fix:** Cache table references within the method
**Effort:** Low

---

## 🟢 Low Priority / Nice to Have

### Efficient Serialization
**Issue:** `Record.marshal()` uses string concatenation with `+`
**Fix:** Use a bytearray for efficient concatenation
**Effort:** Low, ~5% improvement

### Configurable Auto-Flush
**Issue:** Currently binary (batch mode on/off)
**Fix:** Add a config option for flush frequency (every N records)
**Effort:** Low

---

## ✅ Completed

### v3.1.0 (2024-12-10)
- [x] Batch flush mode - defer flush() during bulk inserts
- [x] `Graph.put_batch()` method for efficient bulk loading
- [x] Comprehensive benchmark suite (`test/benchmark.py`)
- [x] ~1.6x speedup for large batch inserts

---

## Benchmark Baselines (v3.1.0)

```
Chain graph (batch, 5000 edges): 4,377 edges/s
Social graph (12,492 edges):     3,233 edges/s
Dense graph (985 edges):         2,585 edges/s
Read performance:                20,000+ ops/s
```

Run benchmarks: `python3 test/benchmark.py`
````
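The roadmap's medium-priority cache fix can be sketched with `collections.OrderedDict`. This is a hypothetical illustration of the technique, not CogDB's actual cache interface; the real class in `cache.py` may expose different method names:

```python
from collections import OrderedDict

class LRUCache:
    """Bounded cache with least-recently-used eviction.

    Illustrative sketch of the roadmap item for cache.py,
    not the actual CogDB cache implementation.
    """

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
```

Calling `move_to_end()` on every hit keeps insertion order equal to recency order, so eviction is a single `popitem(last=False)`.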

README.md (36 additions, 2 deletions)

````diff
@@ -5,7 +5,10 @@
 # CogDB - Micro Graph Database for Python Applications
 > Documents and examples at [cogdb.io](https://cogdb.io)
 
-> New release: 3.0.5
+> New release: 3.1.0
+> - **Batch insert mode** for significantly faster bulk graph loading
+> - New `put_batch()` method for efficient triple insertion
+> - Performance improvements: up to 1.6x faster inserts at scale
 > - New word embeddings API
 > - Similarity filtering using word embeddings
 > - Filter step
@@ -45,6 +48,22 @@ g.put("alice","score","7")
 g.put("dani","score","100")
 ```
 
+#### Using `put_batch` for bulk inserts (faster)
+
+```python
+from cog.torque import Graph
+g = Graph("people")
+
+# Insert multiple triples at once - significantly faster for large graphs
+g.put_batch([
+    ("alice", "follows", "bob"),
+    ("bob", "follows", "charlie"),
+    ("charlie", "follows", "alice"),
+    ("alice", "likes", "pizza"),
+    ("bob", "likes", "tacos"),
+])
+```
+
 #### Drop Edge ###
 
 ```python
@@ -292,4 +311,19 @@ for r in scanner:
 
 ## Benchmark
 
-# ![Put Perf](notes/bench.png)
+![Put Perf](notes/bench.png)
+
+### Performance Results
+
+Run benchmarks with: `python3 test/benchmark.py`
+
+| Graph Type | Edges | Speed (edges/s) |
+|------------|-------|----------------|
+| Chain (batch) | 5,000 | 4,377 |
+| Social network | 12,492 | 3,233 |
+| Dense graph | 985 | 2,585 |
+| Chain (individual) | 5,000 | 2,712 |
+
+**Batch vs Individual Insert:**
+- 1.6x faster at 5,000 edges
+- Read performance: 20,000+ ops/second
````

cog/core.py (21 additions, 2 deletions)

```diff
@@ -367,6 +367,7 @@ class Store:
 
     def __init__(self, tablemeta, config, logger, caching_enabled=True, shared_cache=None):
         self.caching_enabled = caching_enabled
+        self.batch_mode = False  # When True, defers flush() until end_batch()
         self.logger = logging.getLogger('store')
         self.tablemeta = tablemeta
         self.config = config
@@ -380,8 +381,24 @@ def __init__(self, tablemeta, config, logger, caching_enabled=True, shared_cache
         logger.info("Store for file init: " + self.store)
 
     def close(self):
+        if self.batch_mode:
+            self.store_file.flush()  # Ensure pending writes are flushed on close
         self.store_file.close()
 
+    def begin_batch(self):
+        """
+        Enable batch mode - defers flush() until end_batch() is called.
+        Use this when inserting many records for significantly better performance.
+        """
+        self.batch_mode = True
+
+    def end_batch(self):
+        """
+        End batch mode and flush all pending writes to disk.
+        """
+        self.store_file.flush()
+        self.batch_mode = False
+
     def save(self, record):
         """
         Store data
@@ -391,7 +408,8 @@ def save(self, record):
         record.set_store_position(store_position)
         marshalled_record = record.marshal()
         self.store_file.write(marshalled_record)
-        self.store_file.flush()
+        if not self.batch_mode:
+            self.store_file.flush()
         if self.caching_enabled:
             self.store_cache.put(store_position, marshalled_record)
         return store_position
@@ -408,7 +426,8 @@ def update_record_link_inplace(self, start_pos, int_value):
 
         if self.caching_enabled:
             self.store_cache.partial_update_from_zero_index(start_pos, byte_value)
-        self.store_file.flush()
+        if not self.batch_mode:
+            self.store_file.flush()
 
     # @profile
     def read(self, position):
```
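The core change above is a deferred-flush pattern: the per-write `flush()` is skipped while `batch_mode` is set, and a single flush happens at `end_batch()`. A minimal standalone sketch of the pattern; the `FileStore` class and its `flush_count` counter are illustrative stand-ins, not CogDB code:

```python
import io

class FileStore:
    """Toy writer demonstrating deferred flushing, as in cog.core.Store."""

    def __init__(self, fileobj):
        self.store_file = fileobj
        self.batch_mode = False
        self.flush_count = 0  # instrumentation for illustration only

    def save(self, payload: bytes) -> int:
        pos = self.store_file.tell()
        self.store_file.write(payload)
        if not self.batch_mode:  # batch mode defers the flush
            self.store_file.flush()
            self.flush_count += 1
        return pos

    def begin_batch(self):
        self.batch_mode = True

    def end_batch(self):
        # One flush covers every write made since begin_batch()
        self.store_file.flush()
        self.flush_count += 1
        self.batch_mode = False
```

The speedup comes from collapsing N flush syscalls into one; the trade-off is that a crash mid-batch can lose unflushed writes, which is why `close()` in the diff flushes when batch mode is still active.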

cog/database.py (19 additions, 0 deletions)

```diff
@@ -182,6 +182,25 @@ def print_cache_info(self):
         print("::: cache info ::: {}, {}, {}".format(self.current_namespace, self.current_table.table_meta.name,
                                                      str(self.current_table.store.store_cache.size_list())))
 
+    def begin_batch(self):
+        """
+        Enable batch mode on all tables in the current namespace.
+        Defers flush() until end_batch() is called for better bulk insert performance.
+        """
+        if self.current_namespace in self.namespaces and self.namespaces[self.current_namespace]:
+            for table in self.namespaces[self.current_namespace].values():
+                if table:
+                    table.store.begin_batch()
+
+    def end_batch(self):
+        """
+        End batch mode and flush all pending writes to disk.
+        """
+        if self.current_namespace in self.namespaces and self.namespaces[self.current_namespace]:
+            for table in self.namespaces[self.current_namespace].values():
+                if table:
+                    table.store.end_batch()
+
     def close(self):
         for name, space in self.namespaces.items():
             if space is None:
```

cog/torque.py (26 additions, 0 deletions)

```diff
@@ -221,6 +221,32 @@ def put(self, vertex1, predicate, vertex2, update=False, create_new_edge=False):
         self.all_predicates = self.cog.list_tables()
         return self
 
+    def put_batch(self, triples):
+        """
+        Insert multiple triples efficiently using batch mode.
+        Significantly faster than calling put() in a loop for large datasets.
+
+        :param triples: List of (vertex1, predicate, vertex2) tuples
+        :return: self for method chaining
+
+        Example:
+            g.put_batch([
+                ("alice", "follows", "bob"),
+                ("bob", "follows", "charlie"),
+                ("charlie", "follows", "alice")
+            ])
+        """
+        self.cog.use_namespace(self.graph_name)
+        self.cog.begin_batch()
+        try:
+            for v1, pred, v2 in triples:
+                self.cog.use_table(pred)
+                self.cog.put_node(v1, pred, v2)
+        finally:
+            self.cog.end_batch()
+        self.all_predicates = self.cog.list_tables()
+        return self
+
     def drop(self, vertex1, predicate, vertex2):
         """
         Drops edge between vertex1 and vertex2 for the given predicate.
```
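The `try/finally` in `put_batch` guarantees that batch mode is exited and pending writes are flushed even when a malformed triple raises mid-loop. A toy sketch of that guarantee; the `BatchTable` class here is hypothetical and not part of CogDB:

```python
class BatchTable:
    """Toy table illustrating the try/finally guarantee in put_batch."""

    def __init__(self):
        self.batch_mode = False
        self.rows = []
        self.flushed = 0  # number of rows made durable, for illustration

    def begin_batch(self):
        self.batch_mode = True

    def end_batch(self):
        self.flushed = len(self.rows)  # stand-in for flushing to disk
        self.batch_mode = False

    def put_batch(self, triples):
        self.begin_batch()
        try:
            for v1, pred, v2 in triples:  # raises on malformed tuples
                self.rows.append((v1, pred, v2))
        finally:
            self.end_batch()  # always restores per-write flushing
```

Without the `finally`, an exception would leave `batch_mode` set, silently disabling per-write flushing for every later insert on the same store.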

setup.py (1 addition, 1 deletion)

```diff
@@ -2,7 +2,7 @@
 
 
 setup(name='cogdb',
-      version='3.0.9',
+      version='3.1.0',
       description='Persistent Embedded Graph Database',
       url='http://github.com/arun1729/cog',
       author='Arun Mahendra',
```
