doc: browser-history instructions missing #339

Open
ankostis opened this issue Nov 25, 2022 · 9 comments
Labels
documentation documentation/readme enhancements source Related to specific sources/modules/indexers

Comments

@ankostis
Contributor

I noticed that browser-history instructions are completely absent from SOURCES.org, so I could provide an MR (if I figure out how to do it :-)).

Would it suffice to do the following:

  1. run some browserexport commands (which ones?),

  2. configure the respective HPI module, and

  3. somehow (?) instruct promnesia to hook into the above HPI source.

Another question concerns the old txt files generated by the deprecated script: how can they also be parsed into a unified browser history?

@purarue
Contributor

purarue commented Nov 25, 2022

For 1, you should use browserexport save -b [browser] -t ~/data/browsing or something like that, see here

Can copy over the example part of the readme:

$ browserexport save -b firefox --to ~/data/browser_history
$ browserexport save -b chrome --to ~/data/browser_history
$ browserexport save -b safari --to ~/data/browser_history
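
For step 2, a minimal sketch of what the HPI side might look like, assuming the databases were saved to ~/data/browser_history as in the commands above. The nesting (browser, then export, then export_path) reflects my understanding of the HPI module docs; treat it as an assumption and check it against your installed HPI version.

```python
# Hypothetical fragment of an HPI user config (e.g. my/config/__init__.py).
# Assumption: the browser module reads its settings from browser.export.export_path.
class browser:
    class export:
        # Directory (or glob) holding the databases written by `browserexport save`
        export_path = "~/data/browser_history"
```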

For 3, after my.browser is configured in HPI, the Visit source just needs a quick transformation, like this. That source (currently in my own promnesia module repo) can be copied over/merged here
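
To give an idea of how small that transformation is, here is a self-contained sketch. BrowserVisit and PromnesiaVisit are hypothetical stand-ins invented for this example so it runs without HPI or promnesia installed; the real source would consume the visits from my.browser and yield promnesia's own Visit type, whose fields may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional


@dataclass
class BrowserVisit:
    """Stand-in for a visit yielded by the HPI browser module (hypothetical shape)."""
    url: str
    dt: datetime
    title: Optional[str] = None


@dataclass
class PromnesiaVisit:
    """Stand-in for promnesia's Visit, reduced to four fields (hypothetical)."""
    url: str
    dt: datetime
    locator: str
    context: Optional[str] = None


def to_promnesia(history: Iterable[BrowserVisit]) -> Iterator[PromnesiaVisit]:
    # One output visit per input visit: copy url and timestamp,
    # use the URL as the locator and the page title (if any) as context.
    for v in history:
        yield PromnesiaVisit(url=v.url, dt=v.dt, locator=v.url, context=v.title)
```

Using the page title as context is a choice made for this sketch, not something the thread specifies.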

@karlicoss karlicoss added source Related to specific sources/modules/indexers documentation documentation/readme enhancements labels Jan 25, 2023
@karlicoss
Owner

Yeah, I've been meaning to switch promnesia to use https://github.com/seanbreckenridge/promnesia/blob/master/promnesia_sean/sources/browsing.py
I'll just need to implement a fallback to the old method (in case someone doesn't have HPI configured, similar to what takeout does).
I'll also run some final comparisons first, to make sure nothing is lost and there are no timezone issues.
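
A rough sketch of the fallback idea, under stated assumptions: old_index is a placeholder for the old indexer, pick_indexer is a hypothetical helper name, and I'm assuming the HPI module exposes a history() function and fails with ImportError when HPI is missing. The eventual implementation may well look different.

```python
import importlib
from typing import Callable, Iterator


def old_index() -> Iterator[str]:
    """Placeholder for the old indexer that reads browser databases directly."""
    yield "visit from the old module"


def pick_indexer(hpi_module: str = "my.browser.export") -> Callable[[], Iterator[str]]:
    # Prefer the HPI-backed module; fall back to the old method if it can't be imported.
    try:
        mod = importlib.import_module(hpi_module)
    except ImportError:
        return old_index
    # Assumption: the HPI module exposes a history() function.
    return mod.history
```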

karlicoss added a commit that referenced this issue Feb 5, 2023
…wserexport

later will rename to browser, and implement defensive fallback onto browser_old

adapted from https://github.com/seanbreckenridge/promnesia/blob/master/promnesia_sean/sources/browsing.py

related: #339

Old vs new modules produce almost identical results (tested on various chrome & firefox databases)
There are some minor differences vs the old module:

- old database timestamps end with +00:00 UTC, new ones with +00:00 -- likely because browserexport is using timezone.utc instead of pytz
- previously locator was pointing at the database file, now it's pointing at the URL

  I guess it's not necessarily in the 'spirit' of locator field, but on the other hand, not that it's very useful to point to an sqlite file either.
  Perhaps later it could be in some sort of extra debug field instead.
karlicoss added a commit that referenced this issue Feb 9, 2023
karlicoss added a commit that referenced this issue Feb 10, 2023
@purarue
Contributor

purarue commented Feb 10, 2023

@karlicoss if you'd like, I can make a PR with some instructions for using browserexport here. Should that go in SOURCES.org or somewhere else? edit: perhaps a section in GUIDE.org?

@karlicoss
Owner

@seanbreckenridge thanks, of course would be appreciated! GUIDE.org feels a bit more generic/high level. I think a separate section in SOURCES.org would be good (since the ones that are listed there are just autogenerated from docstrings).

@ankostis
Contributor Author

Given the opportunity, browsers forget their history pretty soon, and extending the history span is the most precious feature of promnesia for me. I would like the instructions to explain in detail how not to lose one's accumulated history entries when updating, and whether duplicates and overlaps are dropped.

Forgive me if what I'm saying doesn't make sense.

@karlicoss
Owner

karlicoss commented Feb 10, 2023

No, that makes sense @ankostis! It's worth mentioning why one would even bother setting up a promnesia module if the extension can work with local browser history directly.
In addition, it works across different devices/browsers/etc. regardless of the way cloud sync is set up.

Promnesia itself isn't dealing with browser history backups etc though -- so I guess the instructions will link to more detailed instructions in https://github.com/seanbreckenridge/browserexport#usage

@purarue
Contributor

purarue commented Feb 10, 2023

> And whether duplicates and overlaps are dropped.

Duplicates in this case would use the timestamp as part of the unique check, so whenever a new database is backed up to your local data directory (e.g. ~/data/browsing), running promnesia index should pick it up. And if you have the my.browser.active_browser module set up, it'll additionally snapshot your current browser database whenever you run index. I'll expand on this in the docs I PR in a bit.
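
To illustrate the unique check, here is a minimal sketch of merging overlapping database snapshots. The Visit tuple and the (url, dt) key are assumptions made for the example; the key actually used may include other fields.

```python
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class Visit(NamedTuple):
    """Hypothetical minimal visit record."""
    url: str
    dt: datetime


def merge_visits(snapshots: Iterable[Iterable[Visit]]) -> Iterator[Visit]:
    # Emit each (url, timestamp) pair once, no matter how many snapshots contain it.
    seen: Set[Tuple[str, datetime]] = set()
    for snapshot in snapshots:
        for v in snapshot:
            key = (v.url, v.dt)
            if key in seen:
                continue
            seen.add(key)
            yield v
```

With a key like this, a visit that appears in both an older and a newer backup is kept once, while a later visit to the same URL is kept as a separate entry because its timestamp differs.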

Just as an example (I haven't switched to the new browser module yet, so ignore the promnesia_sean.sources. prefix), the page we are currently on looks like this for me:

image

@karlicoss
Owner

I guess if you use active_browser (I haven't set it up yet), you'd have duplicates in the extension coming both from the backend and from the local browser history API (if they have different source names). Maybe worth doing some frontend changes to handle that

@purarue
Contributor

purarue commented Feb 10, 2023

Ah, that is true... even my.browser.export might have duplicates (with local browser history API) if you recently backed up a database. I haven't found it to be too bothersome, but might be worth deduping them

3 participants