Conversation
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
| Get the number of documents. | ||
| """ | ||
| return self._get_num_docs_sqlite() | ||
| return self._num_docs |
There was a problem hiding this comment.
How do we update this value?
I think it may be better to handle it here totally: here do:
self._num_docs = self._num_docs or self._get_num_docs_sqlite()
return self._num_docsAnd simply in all updates, deletes, index u set self._num_docs to 0.
There was a problem hiding this comment.
we update the value in the end of index and del, I think this way is more intuitive rather than setting it to 0
There was a problem hiding this comment.
yes there are sufficient tests for this and they all pass. I modified it to the way you suggested
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #1706 +/- ##
=======================================
Coverage 85.51% 85.51%
=======================================
Files 132 132
Lines 8303 8306 +3
=======================================
+ Hits 7100 7103 +3
Misses 1203 1203
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
|
📝 Docs are deployed on https://ft-fix-hnsw-performance--jina-docs.netlify.app 🎉 |
As the user reported, num_docs() operation is expensive and slows down the search. This PR addresses the issue by storing/caching num_docs and updating it after index/del
I tested on this simple code snippet and while index time increases slightly (from 6.13 to 6.14 seconds) we can see the speedup in the search time (from 0.0238 to 0.0018, 13x speedup)