real time indexing #20

Open
karlicoss opened this issue Dec 27, 2019 · 6 comments · Fixed by #211
Labels
backend Related to indexing/serving

Comments

@karlicoss
Owner

E.g. something inotify-based. That would make the implementation quite a bit more complex than it is at the moment.
Also, since many exports are periodic by nature, indexing won't be realtime unless the underlying exports are realtime.
Still, it could at least detect changes to source files, etc.
It would also work well in conjunction with Grasp.
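
To make the "detect source file changes" part concrete, here is a minimal sketch. It swaps inotify for plain modification-time polling, since the Python standard library has no inotify bindings. The helper names `snapshot` and `changed` are made up for illustration and are not part of Promnesia.

```python
from pathlib import Path
from typing import Dict, Iterable, Set


def snapshot(paths: Iterable[Path]) -> Dict[Path, int]:
    """Map each existing path to its modification time in nanoseconds."""
    result: Dict[Path, int] = {}
    for p in paths:
        try:
            result[p] = p.stat().st_mtime_ns
        except FileNotFoundError:
            # A file that has vanished simply drops out of the snapshot.
            continue
    return result


def changed(before: Dict[Path, int], after: Dict[Path, int]) -> Set[Path]:
    """Paths that were added, removed, or modified between two snapshots."""
    return {p for p in before.keys() | after.keys() if before.get(p) != after.get(p)}
```

Taking a snapshot on every tick and comparing it with the previous one tells you which sources might need reindexing. An inotify-based watcher would deliver the same information without the polling delay.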

@karlicoss karlicoss added the backend Related to indexing/serving label Dec 27, 2019
@karlicoss karlicoss pinned this issue May 25, 2020
@karlicoss
Owner Author

Might need to be careful about closing libmagic #124 (comment)

@karlicoss
Owner Author

Relevant: I've implemented 'almost realtime' indexing recently:

INDEX_POLICY = os.environ.get('PROMNESIA_INDEX_POLICY', 'overwrite_all')

E.g. you can have a separate config file containing only your text notes (which should be indexed very fast). Then if you run

PROMNESIA_INDEX_POLICY=update promnesia index --config /path/to/small/config, it will merge the results into the main database.

That means you can run it very often (e.g. every five minutes), or potentially combine it with entr to achieve 'realtime' indexing.
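
For anyone who wants to script this from Python rather than from cron or entr, a minimal sketch follows. It only uses the env variable and the command shown above. The function names `update_invocation` and `run_update` are hypothetical, and the sketch assumes `promnesia` is on `PATH`.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Tuple


def update_invocation(config: str, base_env: Mapping[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Build the command line and environment for an 'update' index run."""
    env = dict(base_env)  # copy, so the caller's environment is left untouched
    env['PROMNESIA_INDEX_POLICY'] = 'update'
    cmd = ['promnesia', 'index', '--config', config]
    return cmd, env


def run_update(config: str) -> int:
    """Run the indexer once in update mode and return its exit code."""
    cmd, env = update_invocation(config, os.environ)
    return subprocess.run(cmd, env=env).returncode
```

Calling `run_update('/path/to/small/config')` from a timer or a file watcher gives the same effect as the shell one-liner.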

@karlicoss karlicoss added this to the prio-1 milestone Nov 20, 2020
@karlicoss karlicoss removed this from the prio-1 milestone Nov 21, 2020
@ankostis
Contributor

The last comment here needs to make it into the main docs.

Even better would be a new option like promnesia index --update, so that the above preserves existing items in the server's database:

promnesia index --update --config <small-config> --secrets <secret-file>

But what about de-duplication? Are there any issues with updates?

@karlicoss
Owner Author

Yep, good idea to pass it in cmdline args! It was somewhat experimental at first, so I made it an env variable, but it seems to work pretty well (apart from one minor race condition I might need to fix first).
Maybe it even makes sense to make --update mode the default? I guess the worst that would happen is that some stale entries would remain in the database; if the user notices them, they can do a full reindex manually.

Regarding deduplication: not sure what you mean?
This is how it works at the moment:

policy_update = update_policy_active()
if not policy_update:
    engine = create_engine(f'sqlite:///{tpath}')
else:
    engine = create_engine(f'sqlite:///{db_path}')
binder = NTBinder.make(DbVisit)
meta = MetaData(engine)
table = Table('visits', meta, *binder.columns)
meta.create_all()
cleared: Set[str] = set()
with engine.begin() as conn:
    for chunk in chunked(vit_ok(), n=_CHUNK_BY):
        srcs = set(v.src or '' for v in chunk)
        new = srcs.difference(cleared)
        for src in new:
            conn.execute(table.delete().where(table.c.src == src))
            cleared.add(src)

So it clears all the entries corresponding to the data source first and then inserts them. Hopefully that shouldn't result in duplication!
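
To illustrate why this avoids duplicates, here is a simplified, self-contained sketch of the same clear-then-insert idea. It uses the standard `sqlite3` module instead of SQLAlchemy, and a two-column `(src, url)` tuple as a made-up stand-in for `DbVisit`. It is not the actual Promnesia code.

```python
import sqlite3
from typing import Iterable, Set, Tuple

Visit = Tuple[str, str]  # (src, url): simplified stand-in for DbVisit


def update_visits(conn: sqlite3.Connection, visits: Iterable[Visit]) -> None:
    """Merge visits into the table, replacing old rows of every source seen."""
    conn.execute('CREATE TABLE IF NOT EXISTS visits (src TEXT, url TEXT)')
    cleared: Set[str] = set()
    with conn:  # one transaction: commit on success, roll back on error
        for src, url in visits:
            if src not in cleared:
                # First time we see this source in this run: drop its old rows.
                conn.execute('DELETE FROM visits WHERE src = ?', (src,))
                cleared.add(src)
            conn.execute('INSERT INTO visits (src, url) VALUES (?, ?)', (src, url))
```

Running it twice with overlapping data for the same source leaves only the rows from the latest run for that source, while rows from sources absent in the latest run are preserved.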

ankostis added a commit to ankostis/promnesia that referenced this issue Mar 1, 2021
- CLI options described in the 2nd case explained in karlicoss#211,
  due to simplicity.
- The precedence for deciding update/overwrite:
  env-var, --update/overwrite, --source exist?
- Function defaults are false, as suggested in
  [karlicoss#20](karlicoss#20 (comment)).
- Both index & demo updated.
- Env-var now checks that its value is one of (update|overwrite).
- All update/overwrite decision logic moved to __main__.
ankostis added a commit to ankostis/promnesia that referenced this issue Mar 5, 2021
as suggested in karlicoss#20:

- drop PROMNESIA_INDEX_POLICY env-var.
- CLI options described in the 2nd case explained in karlicoss#211,
  due to simplicity.
- Function defaults are false, as suggested in
  [karlicoss#20](karlicoss#20 (comment)).
- Both index & demo updated.
- Env-var now checks that its value is one of (update|overwrite).
- All update/overwrite decision logic moved to __main__.
karlicoss pushed a commit that referenced this issue Mar 7, 2021
as suggested in #20:

- drop PROMNESIA_INDEX_POLICY env-var.
- CLI options described in the 2nd case explained in #211,
  due to simplicity.
- Function defaults are false, as suggested in
  [#20](#20 (comment)).
- Both index & demo updated.
- Env-var now checks that its value is one of (update|overwrite).
- All update/overwrite decision logic moved to __main__.
@karlicoss
Owner Author

Hmm, it seems this was closed automatically by GitHub. We don't really have realtime indexing yet, so I'll reopen.

@karlicoss karlicoss reopened this Jan 25, 2023
@karlicoss
Owner Author

Perhaps actual 'realtime' indexing would need proper HPI support.
E.g. an HPI module exposes a generator or something similar, which Promnesia can poll (presumably in a loop over all Promnesia sources).
Not sure how easy it'll be to make it asynchronous enough, though, and it's also going to be tricky to 'expire' stale Visits, but it could work well for incremental/synthetic sources (which are typically the most computationally expensive).
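
A rough sketch of what such a polling loop might look like, purely as an illustration: `poll_once` is a hypothetical helper, visits are plain strings here, and nothing like this exists in HPI or Promnesia as far as this thread says. It also ignores the 'expire stale Visits' problem entirely.

```python
from typing import Dict, Iterator, List, Tuple

Visit = str  # placeholder type; a real visit would carry url, timestamp, etc.


def poll_once(sources: Dict[str, Iterator[Visit]], limit: int) -> List[Tuple[str, Visit]]:
    """Take up to `limit` items from each source; drop sources that are exhausted."""
    out: List[Tuple[str, Visit]] = []
    for name in list(sources):  # copy the keys, since we may delete while looping
        it = sources[name]
        for _ in range(limit):
            try:
                out.append((name, next(it)))
            except StopIteration:
                del sources[name]
                break
    return out
```

Capping each source at `limit` items per pass keeps one expensive source from starving the others. A long-lived source would need a generator that blocks or yields nothing until new data arrives, which is where the asynchrony question comes in.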

2 participants