This repository has been archived by the owner on Feb 11, 2024. It is now read-only.

Dependency proximity #52

Open
chainsawriot opened this issue Nov 22, 2023 · 2 comments


chainsawriot commented Nov 22, 2023

require(udpipe)
#> Loading required package: udpipe
require(textplot)
#> Loading required package: textplot
## Uncomment to download the model once:
## m_eng_ewt <- udpipe_download_model(language = "english-ewt", "~/dev/misc")
## Change this path to wherever the model file is stored locally
m_eng_ewt_path <- "~/dev/misc/english-ewt-ud-2.5-191206.udpipe"
m_eng_ewt_loaded <- udpipe::udpipe_load_model(file = m_eng_ewt_path)


sentence <- udpipe::udpipe_annotate(m_eng_ewt_loaded, x = "Turkish President Tayyip Erdogan, in his strongest comments yet on the Gaza conflict, said on Wednesday the Palestinian militant group Hamas was not a terrorist organisation but a liberation group fighting to protect Palestinian lands.") |> as.data.frame()
textplot::textplot_dependencyparser(sentence)
#> Loading required namespace: ggraph

Created on 2023-11-22 with reprex v2.0.2


chainsawriot commented Nov 22, 2023

Use igraph to calculate the syntactic distance. (UPDATE: the result below is probably incorrect; compare, e.g., distances(graph, mode = "all")[, "ROOT"].)

require(textplot)
#> Loading required package: textplot
require(igraph)
#> Loading required package: igraph
#> 
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#> 
#>     decompose, spectrum
#> The following object is masked from 'package:base':
#> 
#>     union

## Uncomment to download the model once:
## m_eng_ewt <- udpipe_download_model(language = "english-ewt", "~/dev/misc")
## Change this path to wherever the model file is stored locally
m_eng_ewt_path <- "~/dev/misc/english-ewt-ud-2.5-191206.udpipe"
m_eng_ewt_loaded <- udpipe::udpipe_load_model(file = m_eng_ewt_path)


sentence <- udpipe::udpipe_annotate(m_eng_ewt_loaded, x = "Turkish President Tayyip Erdogan, in his strongest comments yet on the Gaza conflict, said on Wednesday the Palestinian militant group Hamas was not a terrorist organisation but a liberation group fighting to protect Palestinian lands.") |> as.data.frame()
textplot::textplot_dependencyparser(sentence)
#> Loading required namespace: ggraph

sentence[,c("token_id", "head_token_id", "token", "dep_rel")]
#>    token_id head_token_id        token   dep_rel
#> 1         1             2      Turkish      amod
#> 2         2            16    President     nsubj
#> 3         3             2       Tayyip      flat
#> 4         4             2      Erdogan      flat
#> 5         5             2            ,     punct
#> 6         6             9           in      case
#> 7         7             9          his nmod:poss
#> 8         8             9    strongest      amod
#> 9         9             2     comments      nmod
#> 10       10             9          yet    advmod
#> 11       11            14           on      case
#> 12       12            14          the       det
#> 13       13            14         Gaza  compound
#> 14       14             9     conflict      nmod
#> 15       15            16            ,     punct
#> 16       16             0         said      root
#> 17       17            18           on      case
#> 18       18            16    Wednesday       obl
#> 19       19            22          the       det
#> 20       20            22  Palestinian      amod
#> 21       21            22     militant      amod
#> 22       22            28        group     nsubj
#> 23       23            22        Hamas     appos
#> 24       24            28          was       cop
#> 25       25            28          not    advmod
#> 26       26            28            a       det
#> 27       27            28    terrorist  compound
#> 28       28            18 organisation      flat
#> 29       29            32          but        cc
#> 30       30            32            a       det
#> 31       31            32   liberation  compound
#> 32       32            18        group      conj
#> 33       33            32     fighting       acl
#> 34       34            35           to      mark
#> 35       35            33      protect     xcomp
#> 36       36            37  Palestinian      amod
#> 37       37            35        lands       obj
#> 38       38            16            .     punct

graph <- graph_from_data_frame(sentence[,c("head_token_id", "token_id")])
V(graph)$name <- c("ROOT", sentence$token)
distances(graph, mode = "all")[, "terrorist"]
#>         ROOT      Turkish    President       Tayyip      Erdogan            , 
#>            5            4            6            7            5            3 
#>           in          his    strongest     comments          yet           on 
#>            1            2            4            6            5            7 
#>          the         Gaza     conflict            ,         said           on 
#>            6            6            6            6            7            7 
#>    Wednesday          the  Palestinian     militant        group        Hamas 
#>            7            7            8            8            8            5 
#>          was          not            a    terrorist organisation          but 
#>            4            2            2            0            2            3 
#>            a   liberation        group     fighting           to      protect 
#>            3            3            3            5            5            5 
#>  Palestinian        lands            . 
#>            7            8            5

Created on 2023-11-22 with reprex v2.0.2
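
A possible reason the output above looks off: as far as I understand, graph_from_data_frame() orders vertices by first appearance in the edge list, not by token_id, so assigning c("ROOT", sentence$token) positionally may attach labels to the wrong vertices. For instance, "in" is reported at distance 1 from "terrorist", although in the table "in" attaches to "comments". The following is a minimal sketch of one way around this, assuming the vertex table is passed explicitly and lookups go through token_id (tokens such as "group" and "the" repeat, so they are not unique keys). The helper name dependency_distances is hypothetical, and the token table is copied by hand from the parse above so the block runs without the udpipe model.

```r
library(igraph)

# Token table copied from the parse above (token_id 1..38).
tokens <- c("Turkish", "President", "Tayyip", "Erdogan", ",", "in", "his",
            "strongest", "comments", "yet", "on", "the", "Gaza", "conflict",
            ",", "said", "on", "Wednesday", "the", "Palestinian", "militant",
            "group", "Hamas", "was", "not", "a", "terrorist", "organisation",
            "but", "a", "liberation", "group", "fighting", "to", "protect",
            "Palestinian", "lands", ".")
heads <- c(2, 16, 2, 2, 2, 9, 9, 9, 2, 9, 14, 14, 14, 9, 16, 0, 18, 16, 22,
           22, 22, 28, 22, 28, 28, 28, 28, 18, 32, 32, 32, 18, 32, 35, 33,
           37, 35, 16)
sentence <- data.frame(token_id = as.character(seq_along(tokens)),
                       head_token_id = as.character(heads),
                       token = tokens,
                       stringsAsFactors = FALSE)

# Hypothetical helper: pairwise path lengths in the dependency tree,
# with rows and columns keyed by token_id ("0" is the artificial ROOT).
dependency_distances <- function(sentence) {
  # An explicit vertex table fixes the vertex order and keeps the token
  # text as a vertex attribute instead of using it as the vertex name.
  vertices <- data.frame(name = c("0", sentence$token_id),
                         token = c("ROOT", sentence$token),
                         stringsAsFactors = FALSE)
  edges <- sentence[, c("head_token_id", "token_id")]
  graph <- graph_from_data_frame(edges, directed = FALSE, vertices = vertices)
  distances(graph)
}

d <- dependency_distances(sentence)

# Distances from token 27 ("terrorist"), labelled as "token_id:token".
from_terrorist <- d["27", ]
names(from_terrorist) <- paste(colnames(d), c("ROOT", sentence$token), sep = ":")
from_terrorist
```

Keying by token_id also makes it possible to ask for a specific occurrence of a repeated word, e.g. d["27", "22"] for the first "group" and d["27", "32"] for the second.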

chainsawriot changed the title from "Syntactic proximity" to "Dependency proximity" on Nov 23, 2023
@chainsawriot

Following the terminology in https://arxiv.org/pdf/1909.10171.pdf, maybe this should be called dependency proximity.
