Glottolog codes #340

Open
jangari opened this issue Oct 15, 2013 · 17 comments

@jangari

jangari commented Oct 15, 2013

Some people at MPI Leipzig are working on a replacement for Ethnologue codes, called Glottolog (http://glottolog.org). Glottocodes (as they are called) have several advantages over Ethnologue codes, which I won't go into here. More importantly, the Glottolog website collates data from other databases (including OLAC).

I think it would be in PARADISEC's interests to support Glottolog codes and publish them on collection pages in NABU. I've had discussions with the Glottolog team, and if we do this, they can harvest metadata from our catalog (or potentially our rif-cs feed) and display a link from a language page on Glottolog to collection pages in NABU if we have a collection relating to that language. This would increase our interoperability significantly.

Moreover, if we display glottocodes in NABU, we could quite easily point from our page on a language to the glottolog page for it.

Most glottocodes correspond to an iso639-3 code, which makes this a fairly simple process, but some will not. However, I'm confident that these won't be numerous enough to make it too big a job.

@silviapfeiffer
Contributor

Is there a spreadsheet or an API that we can use to map iso639-3 codes to glottolog codes?
We could:

  • add a column to the languages table (http://catalog.paradisec.org.au/admin/languages)
  • import such a spreadsheet with those codes that have a mapping
  • then report to you which languages don't have a mapping
  • so you can fill in the remaining glotto codes & languages by hand.
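The steps in the list above could be sketched roughly as follows. This is a minimal Python illustration only, not Nabu code: the CSV column names `iso639_3` and `glottocode` and the helper names `load_mapping` and `split_by_mapping` are hypothetical, and a real import would go through the languages table rather than plain lists.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def load_mapping(csv_text: str) -> Dict[str, str]:
    """Read an iso639-3 -> glottocode mapping from CSV text.

    Assumes (hypothetically) a header row with the columns
    'iso639_3' and 'glottocode'. Rows without a glottocode are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    mapping: Dict[str, str] = {}
    for row in reader:
        iso = (row.get("iso639_3") or "").strip()
        glottocode = (row.get("glottocode") or "").strip()
        if iso and glottocode:
            mapping[iso] = glottocode
    return mapping


def split_by_mapping(
    iso_codes: Iterable[str], mapping: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Split existing language codes into mapped and unmapped ones.

    The unmapped list is what would be reported back for manual assignment.
    """
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for code in iso_codes:
        if code in mapping:
            mapped[code] = mapping[code]
        else:
            unmapped.append(code)
    return mapped, unmapped
```

The unmapped list preserves the order of the input codes, so it can be handed over directly as the list of languages that still need a glottocode filled in by hand.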

We could introduce a special iso code ("glt" or something - similar to how we have "mul" for "multiple") that we can use for glotto-only codes.

The most important question: do the glotto codes have map coordinates and would you prefer importing/using those coordinates over the ones that are currently in our DB?

I can see that there are coordinates here, for example: http://glottolog.org/resource/languoid/id/gala1264. How do they compare to ours (http://catalog.paradisec.org.au/admin/languages?utf8=%E2%9C%93&q[code_contains]=glo&commit=Filter&order=name)?

Just looking around their source code (https://github.com/clld/glottolog3) - it doesn't seem like they have an API. I'd suggest waiting until they offer one. Without an API, the data is pretty useless IMHO. Might be worth asking them about it.

@xrotwang

Hi, glottolog3 developer here. It's probably not what you would call an API, but since Glottolog allows requesting language pages by iso code, you can use this as an API to get the mapping:

```
robert@astroman:~$ curl -I http://glottolog.org/resource/languoid/iso/deu
HTTP/1.1 302 Found
Date: Tue, 15 Oct 2013 23:42:09 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 218
Connection: keep-alive
Server: gunicorn/17.5
Location: http://glottolog.org/resource/languoid/id/stan1295
```

so iso-639-3 deu maps to glottocode stan1295.
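The redirect-based lookup described above could be scripted along these lines. This is a minimal Python sketch under stated assumptions, not part of Glottolog or Nabu: the function names `glottocode_from_location` and `lookup_iso` are hypothetical, the glottocode pattern (four alphanumeric characters plus four digits) is inferred from the codes quoted in this thread, and the endpoint is assumed to still answer with a redirect and a `Location` header of the form shown in the curl output, which may no longer hold.

```python
import http.client
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed glottocode shape, inferred from examples such as stan1295.
GLOTTOCODE_PATH_RE = re.compile(r"/resource/languoid/id/([a-z0-9]{4}[0-9]{4})$")


def glottocode_from_location(location: str) -> Optional[str]:
    """Extract the glottocode from a redirect Location URL, or None."""
    match = GLOTTOCODE_PATH_RE.search(urlparse(location).path)
    return match.group(1) if match else None


def lookup_iso(iso_code: str, host: str = "glottolog.org") -> Optional[str]:
    """Send a HEAD request for an iso639-3 code and read the redirect target.

    http.client does not follow redirects, so the Location header of the
    first response is inspected directly, like `curl -I` does.
    """
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", f"/resource/languoid/iso/{iso_code}")
        response = conn.getresponse()
        if response.status not in (301, 302, 303, 307, 308):
            return None
        return glottocode_from_location(response.getheader("Location", ""))
    finally:
        conn.close()
```

Keeping the header parsing separate from the network call means the mapping logic can be checked offline, and only `lookup_iso` needs adjusting if the host or scheme changes.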

@jangari
Author

jangari commented Oct 16, 2013

"The most important question: do the glotto codes have map coordinates and would you prefer importing/using those coordinates over the ones that are currently in our DB?"

I think we should retain the coords used in our system, since ours have bounding boxes (it would be incredibly awesome if they were polygons, but I doubt such data is available) and Glottolog has what appear (on the basis of the Galambu example) to be points which are the centroids of those bounding boxes.
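To compare a stored bounding box against a Glottolog point, the box's midpoint can be computed as below. This is a hedged Python sketch: the function name `bbox_centroid` and the (west, south, east, north) argument order are assumptions rather than Nabu's actual schema, the midpoint is a simple average in degrees rather than a geodesic centroid, and whether Glottolog's points really are such centroids is only an impression based on the Galambu example.

```python
from typing import Tuple


def bbox_centroid(
    west: float, south: float, east: float, north: float
) -> Tuple[float, float]:
    """Return (latitude, longitude) of the midpoint of a bounding box.

    A box whose east edge is numerically smaller than its west edge is
    treated as crossing the antimeridian (180 degrees longitude).
    """
    if east < west:
        east += 360.0  # unwrap so the box is contiguous
    longitude = (west + east) / 2.0
    if longitude > 180.0:
        longitude -= 360.0  # wrap back into the -180..180 range
    latitude = (south + north) / 2.0
    return latitude, longitude
```

The antimeridian branch matters for a Pacific-focused catalog, where a plain average of the two longitudes would place the point on the wrong side of the globe.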

Another reason is that occasionally our map data differs from the language map coordinates if, for example, the recordings took place in a diaspora community or if the language has moved location in a way that hasn't been recognised by Ethnologue and by these bounding boxes. I can think of one collection in particular (http://catalog.paradisec.org.au/collections/LG1) where the location of the language recorded differs from the current understanding of where it is spoken.

@jangari
Author

jangari commented Oct 16, 2013

Hi Robert,

So I take it that it wouldn't be terribly complicated to construct a table that links glottocodes to iso639-3, which Silvia can use to introduce glottocodes into our languages table.

If we add glottocodes into our languages table, then how can we indicate them as such? I.e., what should we use as an analogue of "iso639-3"?

I also have intentions of disseminating glottocodes using our rif-cs feed, by the way. Robert, this could be used as a means of linking from languoids back to collection pages in Paradisec that contain items relating to that languoid.

@LindaBarwick

Just to throw into the discussion the multitree.org codes, which I think are immediately relevant.

Professor Linda Barwick
Associate Dean (Research)
Sydney Conservatorium of Music
Greenway Building C41
The University of Sydney NSW 2006
T +61 (0)2 9351 1383 M +61 (0)409 712 722
E [email protected]
W sydney.edu.au/music


@xrotwang

@jangari yes, I think such a mapping could be easily constructed. We could also provide a spreadsheet for the mapping, but I think the method I explained above would be even better suited for periodically updating the mapping or adding codes for new languages in paradisec. I would prefer glottocodes to be indicated as "glottocode" :) It's not much of a term yet, but we're on the way.
Disseminating the relevant glottocodes with the rif-cs feed seems perfect from my end. We can certainly parse that.

@silviapfeiffer
Contributor

@xrotwang Thanks for that - it looks like it would indeed be simple to get the language code mappings from iso to glottocodes. This way, we can add glottocodes to the languages that are already in nabu.

However, how would we get languages and glottocodes that don't have an iso equivalent?

@silviapfeiffer
Contributor

@jangari We'd simply add a glottocode entry to the languages table - that part will be easy. I assume you'd also like to have a glottocode field in items and collections? Maybe we can use either the iso codes or the glotto codes to determine the map, depending on which one has data?

@xrotwang

@silviapfeiffer well, when there is no iso code, you could try with the name:

```
robert@astroman:~$ curl -I "http://glottolog.org/glottolog?name=Gamella&namequerytype=whole"
HTTP/1.1 302 Found
Date: Wed, 16 Oct 2013 13:31:44 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 218
Connection: keep-alive
Server: gunicorn/17.5
Location: http://glottolog.org/resource/languoid/id/game1240
```

If a search like the one above finds only one matching languoid, we redirect to that languoid instead of showing the search results page. So an HTTP 302 response can be used to identify these cases. However, language names are not the most reliable identifiers, so a human will most probably have to find the appropriate glottocode when there is no iso code.
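The name-based fallback could be handled with two small helpers like the ones below. This is an illustrative Python sketch only: `name_search_url` and `unique_match_glottocode` are hypothetical names, the query parameters are copied from the curl command above, and the behaviour of the search page may have changed since this comment was written.

```python
from typing import Optional
from urllib.parse import urlencode


def name_search_url(name: str, base: str = "http://glottolog.org/glottolog") -> str:
    """Build the whole-name search URL used in the curl example."""
    query = urlencode({"name": name, "namequerytype": "whole"})
    return f"{base}?{query}"


def unique_match_glottocode(status: int, location: Optional[str]) -> Optional[str]:
    """Interpret the response to a name search.

    A 302 redirect to a languoid page is taken to mean exactly one match;
    anything else (for example a 200 results page) returns None so the
    language can be queued for manual checking.
    """
    if status == 302 and location and "/resource/languoid/id/" in location:
        return location.rstrip("/").rsplit("/", 1)[-1]
    return None
```

Because names are unreliable identifiers, a result from this path is probably best treated as a suggestion for a human to confirm rather than written straight into the languages table.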

@jangari
Author

jangari commented Oct 16, 2013

I'd be happy to manually link the ones that don't automatically correspond.

@silviapfeiffer, yes, I think it'd be a good idea to have a glottocode field on collection and item pages, next to the eth codes. Although looking at it now, that might be kind of complex; as one might expect, it doesn't currently say 'ethnologue code' or 'iso639-3', it just says 'language'. Should we have two fields? Or one field that populates both glottocode and eth code at once? So it could for example say:

Language: Tiwi (iso639-3: tiw; glottocode: tiwi1244)

So if we have a direct correspondence with glottocodes and ethnologue codes (after manually linking the rest), then selecting a language from a list will populate both at once.
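One way to picture a single language selection carrying both codes is the small record below. It is a hypothetical Python sketch, not Nabu's model: the `Language` class and its field names are invented for illustration, and the label format simply follows the Tiwi example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Language:
    """One language entry holding both identifiers (either may be missing)."""

    name: str
    iso639_3: Optional[str] = None
    glottocode: Optional[str] = None

    def label(self) -> str:
        """Render the display string, listing only the codes that are present."""
        codes = []
        if self.iso639_3:
            codes.append(f"iso639-3: {self.iso639_3}")
        if self.glottocode:
            codes.append(f"glottocode: {self.glottocode}")
        suffix = f" ({'; '.join(codes)})" if codes else ""
        return f"Language: {self.name}{suffix}"
```

For example, `Language("Tiwi", "tiw", "tiwi1244").label()` yields the string shown in the comment above, while an entry with only a glottocode omits the iso639-3 part.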

This might bring up further problems later, say if someone needs to add a language because it isn't in the ethnologue list but does have a glottocode, but these will be rare presumably and we can deal with them when they happen.

For rif-cs, we can add a subject field at the feed level alongside the ethnologue code. For example, if we have (taking the example above):

```xml
<subject type="local">Tiwi</subject>
<subject type="iso639-3">tiw</subject>
```

Then we can append to it the following:

```xml
<subject type="glottocode">tiwi1244</subject>
```

Then I presume Glottolog will be able to harvest our feed, parsing out that element.
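Both sides of this exchange, emitting the subject elements and parsing the glottocode back out, could look roughly like this. It is a Python sketch under assumptions: the helper names `subject_elements` and `glottocodes_in` are hypothetical, the `type="glottocode"` value is the one proposed in this thread rather than an established RIF-CS vocabulary term, and XML namespaces, which a real RIF-CS feed would carry, are ignored for brevity.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def subject_elements(
    name: str, iso639_3: str, glottocode: Optional[str] = None
) -> List[str]:
    """Serialise the subject elements for one language as XML strings."""
    subjects = [("local", name), ("iso639-3", iso639_3)]
    if glottocode:
        subjects.append(("glottocode", glottocode))
    lines = []
    for subject_type, text in subjects:
        element = ET.Element("subject", {"type": subject_type})
        element.text = text
        lines.append(ET.tostring(element, encoding="unicode"))
    return lines


def glottocodes_in(fragment_xml: str) -> List[str]:
    """Collect the text of every subject element typed as a glottocode.

    Assumes un-namespaced element names; a harvester for the real feed
    would need to match the RIF-CS namespace as well.
    """
    root = ET.fromstring(fragment_xml)
    return [
        element.text or ""
        for element in root.iter("subject")
        if element.get("type") == "glottocode"
    ]
```

Wrapping the serialised lines in any parent element and passing the result to `glottocodes_in` returns the codes a harvester would see, which gives a quick round-trip check before changing the feed.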

@jangari
Author

jangari commented Oct 17, 2013

@silviapfeiffer
Contributor

@xrotwang I'm not worried about matching the languages that we have in our DB right now to a glottocode (we can indeed do that with both your suggestions) - I'm worried about getting glottocodes and languages that we don't have yet.

@silviapfeiffer
Contributor

@jangari We need a new field, so two fields, but one language name only. As I suggested: we can use a special iso code for languages that only have a glottocode (e.g. glt) similar to how we have mul right now. Shouldn't be a problem.

@xrotwang

@silviapfeiffer For new languages I don't see a way around manual assignment of a glottocode upon checking with http://glottolog.org - unless at some point in time depositors of data already provide a glottocode. But that's not going to happen soon, I guess.

@NickWardPDSC
Collaborator

closing this as it's not a current issue; but noting there is still a good lot of useful information in the above comments

@johnf
Member

johnf commented May 8, 2024

@nthieberger I think glottolog was mentioned in some meeting in the last 12 months.
Is this still of interest?

@johnf
Member

johnf commented Jan 24, 2025

A bit more research needed on the right approach to take
