Customizing TimeMap formats served #122

Open
machawk1 opened this issue Feb 21, 2020 · 3 comments

Comments

@machawk1
Member

In server mode, MemGator serves 3 formats of TimeMaps: Link, CDXJ, and JSON.

In the vein of #116, I would like to be able to restrict the formats that my MemGator binary serves when in server mode.

Because most Memento-compliant web archives serve only Link-formatted TimeMaps, I would like to impose a similar restriction at the aggregator level using MemGator (without touching the code).

@ibnesayeed
Member

While it is doable by introducing yet another flag and some logic changes, I am not sure the added complexity would be worth the effort unless you have a convincing use case.

@machawk1
Member Author

What are your thoughts on reusing the -f flag to have this functionality in server mode, @ibnesayeed? Currently it looks to be ignored when run in server mode.

My use case stems from the aggregator-of-aggregators (dare I say "meta aggregator" ;) ) concept and the need to pull from well-formatted known sources when the ideal source is not available. From what I recall, MemGator expects a Link-formatted TimeMap to be available, but it would be interesting to promote each format to first-class rather than derived status.

For example, suppose a MemGator that serves all three formats were to query a differently configured MemGator (with regard to the number of archives and the TimeMap formats served). If the latter did not serve Link, would the former still be able to use it as a source?

@ibnesayeed
Member

> What are your thoughts on reusing the -f flag to have this functionality in server mode, @ibnesayeed? Currently it looks to be ignored when run in server mode.

The flag can be reused, but it will need clear documentation to distinguish its behavior in one-off mode from its behavior in server mode. However, parsing a comma-separated list, validating that it holds only one value where only one is allowed, and then plugging in all the related logic would add complexity and get messy.
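To make the parsing and validation burden concrete, here is a minimal sketch in Go of what reusing the flag could involve. It is not MemGator code: `parseFormats`, `supportedFormats`, and the `serverMode` parameter are hypothetical names, and the rule that one-off mode accepts exactly one value while server mode accepts a list is an assumption drawn from the comment above.

```go
package main

import (
	"fmt"
	"strings"
)

// supportedFormats lists the three TimeMap formats named in this thread.
var supportedFormats = map[string]bool{"link": true, "cdxj": true, "json": true}

// parseFormats is a hypothetical helper (not part of MemGator) that splits a
// comma-separated flag value, normalizes case, drops duplicates, and enforces
// the assumed rule that one-off mode takes exactly one format.
func parseFormats(raw string, serverMode bool) ([]string, error) {
	var formats []string
	seen := make(map[string]bool)
	for _, part := range strings.Split(raw, ",") {
		f := strings.ToLower(strings.TrimSpace(part))
		if f == "" {
			continue // tolerate stray commas and whitespace
		}
		if !supportedFormats[f] {
			return nil, fmt.Errorf("unsupported format %q", part)
		}
		if !seen[f] {
			seen[f] = true
			formats = append(formats, f)
		}
	}
	if len(formats) == 0 {
		return nil, fmt.Errorf("no format given")
	}
	if !serverMode && len(formats) != 1 {
		return nil, fmt.Errorf("one-off mode accepts exactly one format, got %d", len(formats))
	}
	return formats, nil
}
```

Even this small helper has to handle normalization, duplicates, unknown values, and a mode-dependent rule, which is the extra surface area the comment is worried about.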

For the use case you are describing, I would say it would be more dangerous to allow customization of the formats in which responses are served. For now, we know for sure that if a MemGator instance is running, it returns Link-formatted TimeMaps, regardless of which other formats it supports. MemGator is programmed to read only the Link format as the common ground. The other output formats exist so that a client can pick whichever one fits its specific use case better. However, for the sake of interoperability, the standard Link format is always going to be there.

If, on the other hand, we allow customization and someone chooses to return only the CDXJ format, then any secondary aggregator will be out of luck when trying to use that instance as an upstream endpoint. Even if the secondary aggregator understands how to parse other formats (say, when #116 is implemented), it has to switch parsers depending on the content type, or some sort of content sniffing would come into play. I do not think going that route solves any problem that we have; it would only cause more issues.
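The parser switching mentioned above could look roughly like the following Go sketch. It is illustrative only and not how MemGator works today: `formatFromContentType` and `mediaTypeToFormat` are hypothetical names, `application/link-format` is the TimeMap media type used by the Memento specification, and the CDXJ media type in the table is an assumption made for the example.

```go
package main

import (
	"fmt"
	"mime"
)

// mediaTypeToFormat maps an upstream response media type to a TimeMap format.
// The CDXJ entry is an assumed value for illustration.
var mediaTypeToFormat = map[string]string{
	"application/link-format": "link",
	"application/json":        "json",
	"application/cdxj+ors":    "cdxj",
}

// formatFromContentType is a hypothetical helper that picks the parser a
// secondary aggregator would need, based only on the Content-Type header.
func formatFromContentType(header string) (string, error) {
	// ParseMediaType lowercases the media type and strips parameters
	// such as charset.
	mediaType, _, err := mime.ParseMediaType(header)
	if err != nil {
		return "", fmt.Errorf("cannot parse Content-Type %q: %w", header, err)
	}
	format, ok := mediaTypeToFormat[mediaType]
	if !ok {
		return "", fmt.Errorf("no TimeMap parser for media type %q", mediaType)
	}
	return format, nil
}
```

A missing or mislabeled header leaves the aggregator with an error or with content sniffing, whereas an upstream that always serves Link needs no dispatch at all.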
