From 76723dc442eba166e357ff338acedff2e6815b39 Mon Sep 17 00:00:00 2001
From: slowkow
Date: Wed, 15 May 2024 21:05:44 +0000
Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20@=20slowkow/?=
 =?UTF-8?q?hlabud@ae1ca3a4a26aa9ff8aa15b9ba9b3354b3c215cf2=20=F0=9F=9A=80?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 articles/examples.html                | 57 ++++++++++++++++++---------
 articles/numbering.html               |  2 +-
 articles/visualize-hla-structure.html |  2 +-
 pkgdown.yml                           |  2 +-
 search.json                           |  2 +-
 5 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/articles/examples.html b/articles/examples.html
index d691a58..c671828 100644
--- a/articles/examples.html
+++ b/articles/examples.html
@@ -90,7 +90,7 @@

Introduction

Kamil Slowikowski

-

2024-05-03

+

2024-05-15

hlabud is an R package that provides functions to facilitate download and analysis of human leukocyte antigen (HLA) genotype sequence alignments from IMGTHLA in R.

@@ -242,12 +242,31 @@

Get a one-hot enc #> DRB1*01:01:01:03 1 0 1 0 #> DRB1*01:01:01:04 1 0 1 0 #> DRB1*01:01:01:05 1 0 1 0 +

What is a one-hot encoded matrix? Here is a simple example to demonstrate the idea:

+
+dat <- data.frame(
+  V1 = c("A", "A", "B"),
+  V2 = c("B", "B", "B"),
+  V3 = c("C", "B", "B"),
+  stringsAsFactors = TRUE
+)
+dat
+#>   V1 V2 V3
+#> 1  A  B  C
+#> 2  A  B  B
+#> 3  B  B  B
+predict(onehot::onehot(dat), dat)
+#>      V1=A V1=B V2=B V3=B V3=C
+#> [1,]    1    0    1    0    1
+#> [2,]    1    0    1    1    0
+#> [3,]    0    1    1    1    0

Convert genotypes to a dosage matrix

Suppose we have some individuals with the following genotypes:

-
+
 genotypes <- c(
   "DRB1*12:02:02:03,DRB1*12:02:02:03",
   "DRB1*04:174,DRB1*15:152",
@@ -260,7 +279,7 @@ 

Convert genotypes to a dosage matr (e.g., 0, 1, 2).

We can use dosage() to convert each individual’s genotypes to amino acid dosages:

-
+
 dosage <- dosage(a$onehot, genotypes)
 dosage[,1:8]
 #>                                   n29unk Mn29 n28unk Vn28 n27unk Cn27 n26unk
@@ -298,7 +317,7 @@ 

Logistic regre

Let’s simulate a dataset with cases and controls to demonstrate one approach for testing which amino acid positions might be associated with cases.

-
+
 set.seed(2)
 n <- 100
 d <- data.frame(
@@ -332,7 +351,7 @@ 

Logistic regre acid position. This could reveal if any amino acid position might be associated with the case variable in our simulated dataset.

-
+
 # prepare column names for use in formulas
 ix <- 4:ncol(d)
 colnames(d)[ix] <- sprintf("VAR%s", colnames(d)[ix])
@@ -382,7 +401,7 @@ 

UMAP embedding of HLA-DRB1 alleles

For example, here is a UMAP embedding of HLA-DRB1 alleles encoded as a one-hot amino acid matrix with 1658 columns, one for each amino acid at each position. The color indicates the 2-digit allele name.

-
+
 umap(a$onehot, n_epochs = 200, min_dist = 1, spread = 2)

We can highlight which alleles have aspartic acid (Asp or D) at
@@ -409,7 +428,7 @@

Get 2020;48: D783–D788. doi:10.1093/nar/gkz1029 -
+
 af <- hla_frequencies()
 af
 #> # A tibble: 123,502 × 7
@@ -429,7 +448,7 @@ 

Get

We can use this data to plot the frequency of a specific allele (e.g. DQB1*02:01) in populations with more than 1000 sampled individuals:

-
+
 my_allele <- "DQB1*02:01"
 my_af <- af %>% filter(allele == my_allele) %>%
   filter(n > 1000) %>%
@@ -465,7 +484,7 @@ 

Compute HLA di

The amino acid distance matrix by Grantham 1974 (https://doi.org/10.1126/science.185.4154.862) encodes information about the composition, polarity, and molecular volume of each amino acid.

-
+
 grantham
 #>    amino    c    p     v
 #> 1    Ser 1.42  9.2  32.0
@@ -490,14 +509,14 @@ 

Compute HLA di #> 20 Trp 0.13 5.4 170.0
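
As a rough illustration of how the three properties in this table combine into a single pairwise distance, here is a minimal sketch of the weighted formula published by Grantham (1974). The helper name grantham_distance() is hypothetical, and whether hlabud computes its matrix this way internally is an assumption; only the Ser and Trp rows shown above are used.

```r
# Properties for two amino acids, copied from the table above
# (c = composition, p = polarity, v = molecular volume).
props <- data.frame(
  amino = c("Ser", "Trp"),
  c = c(1.42, 0.13),
  p = c(9.2, 5.4),
  v = c(32.0, 170.0)
)

# Hypothetical helper: weights (alpha, beta, gamma) and the scaling
# factor are the values published by Grantham (1974).
grantham_distance <- function(x, y,
                              alpha = 1.833, beta = 0.1018, gamma = 0.000399,
                              scale = 50.723) {
  scale * sqrt(
    alpha * (x$c - y$c)^2 +
    beta  * (x$p - y$p)^2 +
    gamma * (x$v - y$v)^2
  )
}

ser <- props[props$amino == "Ser", ]
trp <- props[props$amino == "Trp", ]
round(grantham_distance(ser, trp))
#> [1] 177
```

The difference in molecular volume dominates this pair, which is why Ser and Trp end up far apart even though their polarity values are fairly close.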

We can use that matrix to compute an HLA divergence metric for a set of individuals like this:

-
+
 my_genos <- c("A*23:01:12,A*24:550", "A*25:12N,A*11:27", "A*24:381,A*33:85")
 
 hla_divergence(my_genos)
 #> A*23:01:12,A*24:550    A*25:12N,A*11:27    A*24:381,A*33:85 
 #>           0.5131579           3.4736842           5.1078947

The divergence for a homozygote is equal to zero, by definition:

-
+
 hla_divergence("A*01:01,A*01:01")
 #> A*01:01,A*01:01 
 #>               0
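
To see why a homozygote scores zero, here is a minimal sketch that assumes divergence is the mean per-position distance between two aligned amino acid sequences. The names uniform_distance() and toy_divergence() and the short toy sequences are hypothetical and only illustrate the idea; they are not hlabud's implementation.

```r
# Toy "uniform" distance: 0 for identical residues, 1 otherwise.
uniform_distance <- function(x, y) as.numeric(x != y)

# Hypothetical helper: mean per-position distance between two
# aligned sequences of equal length.
toy_divergence <- function(seq1, seq2, dist_fun = uniform_distance) {
  a <- strsplit(seq1, "")[[1]]
  b <- strsplit(seq2, "")[[1]]
  stopifnot(length(a) == length(b))
  mean(dist_fun(a, b))
}

# Identical sequences: every per-position distance is 0, so the mean is 0.
toy_divergence("GSHSMRY", "GSHSMRY")

# Two of the seven positions differ, so the mean is 2/7.
toy_divergence("GSHSMRY", "GSHSLKY")
```

Swapping dist_fun for a lookup into a Grantham-style matrix would weight each mismatch by how different the two residues are, rather than counting all mismatches equally.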
@@ -505,7 +524,7 @@

Compute HLA di translated from the original Perl code by Pierini & Lenz 2018 (https://doi.org/10.1093/molbev/msy116).

The amino acid distance matrix is easily accessible, and we provide two built-in options “grantham” and “uniform”:

-
+
 amino_distance_matrix(method = "grantham")
 #>     A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y
 #> A   0 112 111 126 195  91 107  60  86  94  96 106  84 113  27  99  58 148 112
@@ -565,7 +584,7 @@ 

Download a

Here are a few examples of how to download releases or get a list of release names.

Download the latest release (default) or a specific release:

-
+
 # Download all of the data (120MB) for the latest IMGTHLA release
 install_hla(release = "latest")
 
@@ -573,7 +592,7 @@ 

Download a install_hla(release = "3.51.0")

Optionally, get or set the directory hlabud uses to store the data:

-
+
 getOption("hlabud_dir")
 #> [1] "/home/username/.local/share/hlabud"
 
@@ -594,7 +613,7 @@ 

Download a

Count the number of alleles in each IMGTHLA release

We can get a list of the release names:

-
+
 releases <- hla_releases()
 releases
 #>  [1] "3.56.0"   "3.55.0"   "3.54.0"   "3.53.0"   "3.52.0"   "3.51.0"  
@@ -603,7 +622,7 @@ 

Count the number of #> [19] "3.42.0" "3.41.2" "3.41.0" "3.40.0" "3.39.0" "3.38.0" #> [25] "3.37.0" "3.36.0" "3.35.0" "3.34.0" "3.33.0" "3.32.0"

Then we can get the allele names for each release:

-
+
 my_alleles <- rbindlist(lapply(releases, function(release) {
   retval <- hla_alleles(release = release)
   retval$release <- release
@@ -620,7 +639,7 @@ 

Count the number of #> Warning in hla_alleles(release = release): unrecognized release name #> 'Allelelist.3412.txt'

Next, count how many alleles we have in each release:

-
+
 d <- my_alleles %>% count(release) %>% filter(n > 1)
 d
 #>     release     n
@@ -652,7 +671,7 @@ 

Count the number of #> 25: 3.56.0 39886 #> release n

And plot the number of alleles as a line plot:

-
+
 ggplot(d) +
   aes(x = release, y = n, group = 1) +
   geom_line() +
@@ -664,7 +683,7 @@ 

Count the number of axis.ticks.x = element_blank(), )

-
+
 d2 <- my_alleles %>% mutate(gene = str_split_fixed(Allele, "\\*", 2)[,1]) %>% count(release, gene)
 ggplot() +
   aes(x = release, y = n) +
diff --git a/articles/numbering.html b/articles/numbering.html
index 3c2ba12..335d135 100644
--- a/articles/numbering.html
+++ b/articles/numbering.html
@@ -90,7 +90,7 @@
 

Introduction

Kamil Slowikowski

-

2024-05-03

+

2024-05-15

The IMGTHLA provides a GitHub repo with alignments of amino acid sequences and nucleotide sequences for thousands of alleles of the HLA genes. The IMGTHLA
diff --git a/articles/visualize-hla-structure.html b/articles/visualize-hla-structure.html
index 586f2b9..f6f296e 100644
--- a/articles/visualize-hla-structure.html
+++ b/articles/visualize-hla-structure.html
@@ -90,7 +90,7 @@

Introduction

Kamil Slowikowski

-

2024-05-03

+

2024-05-15

In this vignette, we explore a few different methods for visualizing the molecular structure of HLA proteins. First, we’ll look at an example of how to use the NGLVieweR R package to diff --git a/pkgdown.yml b/pkgdown.yml index 9722170..6b63d94 100644 --- a/pkgdown.yml +++ b/pkgdown.yml @@ -5,7 +5,7 @@ articles: examples: examples.html numbering: numbering.html visualize-hla-structure: visualize-hla-structure.html -last_built: 2024-05-03T01:05Z +last_built: 2024-05-15T21:04Z urls: reference: https://slowkow.github.io/hlabud/reference article: https://slowkow.github.io/hlabud/articles diff --git a/search.json b/search.json index 08f6e77..3a026f5 100644 --- a/search.json +++ b/search.json @@ -1 +1 @@ -[{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"GNU General Public License","title":"GNU General Public License","text":"Version 3, 29 June 2007Copyright © 2007 Free Software Foundation, Inc.  Everyone permitted copy distribute verbatim copies license document, changing allowed.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"preamble","dir":"","previous_headings":"","what":"Preamble","title":"GNU General Public License","text":"GNU General Public License free, copyleft license software kinds works. licenses software practical works designed take away freedom share change works. contrast, GNU General Public License intended guarantee freedom share change versions program–make sure remains free software users. , Free Software Foundation, use GNU General Public License software; applies also work released way authors. can apply programs, . speak free software, referring freedom, price. General Public Licenses designed make sure freedom distribute copies free software (charge wish), receive source code can get want , can change software use pieces new free programs, know can things. protect rights, need prevent others denying rights asking surrender rights. 
Therefore, certain responsibilities distribute copies software, modify : responsibilities respect freedom others. example, distribute copies program, whether gratis fee, must pass recipients freedoms received. must make sure , , receive can get source code. must show terms know rights. Developers use GNU GPL protect rights two steps: (1) assert copyright software, (2) offer License giving legal permission copy, distribute /modify . developers’ authors’ protection, GPL clearly explains warranty free software. users’ authors’ sake, GPL requires modified versions marked changed, problems attributed erroneously authors previous versions. devices designed deny users access install run modified versions software inside , although manufacturer can . fundamentally incompatible aim protecting users’ freedom change software. systematic pattern abuse occurs area products individuals use, precisely unacceptable. Therefore, designed version GPL prohibit practice products. problems arise substantially domains, stand ready extend provision domains future versions GPL, needed protect freedom users. Finally, every program threatened constantly software patents. States allow patents restrict development use software general-purpose computers, , wish avoid special danger patents applied free program make effectively proprietary. prevent , GPL assures patents used render program non-free. precise terms conditions copying, distribution modification follow.","code":""},{"path":[]},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_0-definitions","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"0. Definitions","title":"GNU General Public License","text":"“License” refers version 3 GNU General Public License. “Copyright” also means copyright-like laws apply kinds works, semiconductor masks. “Program” refers copyrightable work licensed License. licensee addressed “”. “Licensees” “recipients” may individuals organizations. 
“modify” work means copy adapt part work fashion requiring copyright permission, making exact copy. resulting work called “modified version” earlier work work “based ” earlier work. “covered work” means either unmodified Program work based Program. “propagate” work means anything , without permission, make directly secondarily liable infringement applicable copyright law, except executing computer modifying private copy. Propagation includes copying, distribution (without modification), making available public, countries activities well. “convey” work means kind propagation enables parties make receive copies. Mere interaction user computer network, transfer copy, conveying. interactive user interface displays “Appropriate Legal Notices” extent includes convenient prominently visible feature (1) displays appropriate copyright notice, (2) tells user warranty work (except extent warranties provided), licensees may convey work License, view copy License. interface presents list user commands options, menu, prominent item list meets criterion.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_1-source-code","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"1. Source Code","title":"GNU General Public License","text":"“source code” work means preferred form work making modifications . “Object code” means non-source form work. “Standard Interface” means interface either official standard defined recognized standards body, , case interfaces specified particular programming language, one widely used among developers working language. “System Libraries” executable work include anything, work whole, () included normal form packaging Major Component, part Major Component, (b) serves enable use work Major Component, implement Standard Interface implementation available public source code form. 
“Major Component”, context, means major essential component (kernel, window system, ) specific operating system () executable work runs, compiler used produce work, object code interpreter used run . “Corresponding Source” work object code form means source code needed generate, install, (executable work) run object code modify work, including scripts control activities. However, include work’s System Libraries, general-purpose tools generally available free programs used unmodified performing activities part work. example, Corresponding Source includes interface definition files associated source files work, source code shared libraries dynamically linked subprograms work specifically designed require, intimate data communication control flow subprograms parts work. Corresponding Source need include anything users can regenerate automatically parts Corresponding Source. Corresponding Source work source code form work.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_2-basic-permissions","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"2. Basic Permissions","title":"GNU General Public License","text":"rights granted License granted term copyright Program, irrevocable provided stated conditions met. License explicitly affirms unlimited permission run unmodified Program. output running covered work covered License output, given content, constitutes covered work. License acknowledges rights fair use equivalent, provided copyright law. may make, run propagate covered works convey, without conditions long license otherwise remains force. may convey covered works others sole purpose make modifications exclusively , provide facilities running works, provided comply terms License conveying material control copyright. thus making running covered works must exclusively behalf, direction control, terms prohibit making copies copyrighted material outside relationship . Conveying circumstances permitted solely conditions stated . 
Sublicensing allowed; section 10 makes unnecessary.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_3-protecting-users-legal-rights-from-anti-circumvention-law","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"3. Protecting Users’ Legal Rights From Anti-Circumvention Law","title":"GNU General Public License","text":"covered work shall deemed part effective technological measure applicable law fulfilling obligations article 11 WIPO copyright treaty adopted 20 December 1996, similar laws prohibiting restricting circumvention measures. convey covered work, waive legal power forbid circumvention technological measures extent circumvention effected exercising rights License respect covered work, disclaim intention limit operation modification work means enforcing, work’s users, third parties’ legal rights forbid circumvention technological measures.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_4-conveying-verbatim-copies","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"4. Conveying Verbatim Copies","title":"GNU General Public License","text":"may convey verbatim copies Program’s source code receive , medium, provided conspicuously appropriately publish copy appropriate copyright notice; keep intact notices stating License non-permissive terms added accord section 7 apply code; keep intact notices absence warranty; give recipients copy License along Program. may charge price price copy convey, may offer support warranty protection fee.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_5-conveying-modified-source-versions","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"5. 
Conveying Modified Source Versions","title":"GNU General Public License","text":"may convey work based Program, modifications produce Program, form source code terms section 4, provided also meet conditions: ) work must carry prominent notices stating modified , giving relevant date. b) work must carry prominent notices stating released License conditions added section 7. requirement modifies requirement section 4 “keep intact notices”. c) must license entire work, whole, License anyone comes possession copy. License therefore apply, along applicable section 7 additional terms, whole work, parts, regardless packaged. License gives permission license work way, invalidate permission separately received . d) work interactive user interfaces, must display Appropriate Legal Notices; however, Program interactive interfaces display Appropriate Legal Notices, work need make . compilation covered work separate independent works, nature extensions covered work, combined form larger program, volume storage distribution medium, called “aggregate” compilation resulting copyright used limit access legal rights compilation’s users beyond individual works permit. Inclusion covered work aggregate cause License apply parts aggregate.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_6-conveying-non-source-forms","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"6. Conveying Non-Source Forms","title":"GNU General Public License","text":"may convey covered work object code form terms sections 4 5, provided also convey machine-readable Corresponding Source terms License, one ways: ) Convey object code , embodied , physical product (including physical distribution medium), accompanied Corresponding Source fixed durable physical medium customarily used software interchange. 
b) Convey object code , embodied , physical product (including physical distribution medium), accompanied written offer, valid least three years valid long offer spare parts customer support product model, give anyone possesses object code either (1) copy Corresponding Source software product covered License, durable physical medium customarily used software interchange, price reasonable cost physically performing conveying source, (2) access copy Corresponding Source network server charge. c) Convey individual copies object code copy written offer provide Corresponding Source. alternative allowed occasionally noncommercially, received object code offer, accord subsection 6b. d) Convey object code offering access designated place (gratis charge), offer equivalent access Corresponding Source way place charge. need require recipients copy Corresponding Source along object code. place copy object code network server, Corresponding Source may different server (operated third party) supports equivalent copying facilities, provided maintain clear directions next object code saying find Corresponding Source. Regardless server hosts Corresponding Source, remain obligated ensure available long needed satisfy requirements. e) Convey object code using peer--peer transmission, provided inform peers object code Corresponding Source work offered general public charge subsection 6d. separable portion object code, whose source code excluded Corresponding Source System Library, need included conveying object code work. “User Product” either (1) “consumer product”, means tangible personal property normally used personal, family, household purposes, (2) anything designed sold incorporation dwelling. determining whether product consumer product, doubtful cases shall resolved favor coverage. 
particular product received particular user, “normally used” refers typical common use class product, regardless status particular user way particular user actually uses, expects expected use, product. product consumer product regardless whether product substantial commercial, industrial non-consumer uses, unless uses represent significant mode use product. “Installation Information” User Product means methods, procedures, authorization keys, information required install execute modified versions covered work User Product modified version Corresponding Source. information must suffice ensure continued functioning modified object code case prevented interfered solely modification made. convey object code work section , , specifically use , User Product, conveying occurs part transaction right possession use User Product transferred recipient perpetuity fixed term (regardless transaction characterized), Corresponding Source conveyed section must accompanied Installation Information. requirement apply neither third party retains ability install modified object code User Product (example, work installed ROM). requirement provide Installation Information include requirement continue provide support service, warranty, updates work modified installed recipient, User Product modified installed. Access network may denied modification materially adversely affects operation network violates rules protocols communication across network. Corresponding Source conveyed, Installation Information provided, accord section must format publicly documented (implementation available public source code form), must require special password key unpacking, reading copying.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_7-additional-terms","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"7. Additional Terms","title":"GNU General Public License","text":"“Additional permissions” terms supplement terms License making exceptions one conditions. 
Additional permissions applicable entire Program shall treated though included License, extent valid applicable law. additional permissions apply part Program, part may used separately permissions, entire Program remains governed License without regard additional permissions. convey copy covered work, may option remove additional permissions copy, part . (Additional permissions may written require removal certain cases modify work.) may place additional permissions material, added covered work, can give appropriate copyright permission. Notwithstanding provision License, material add covered work, may (authorized copyright holders material) supplement terms License terms: ) Disclaiming warranty limiting liability differently terms sections 15 16 License; b) Requiring preservation specified reasonable legal notices author attributions material Appropriate Legal Notices displayed works containing ; c) Prohibiting misrepresentation origin material, requiring modified versions material marked reasonable ways different original version; d) Limiting use publicity purposes names licensors authors material; e) Declining grant rights trademark law use trade names, trademarks, service marks; f) Requiring indemnification licensors authors material anyone conveys material (modified versions ) contractual assumptions liability recipient, liability contractual assumptions directly impose licensors authors. non-permissive additional terms considered “restrictions” within meaning section 10. Program received , part , contains notice stating governed License along term restriction, may remove term. license document contains restriction permits relicensing conveying License, may add covered work material governed terms license document, provided restriction survive relicensing conveying. add terms covered work accord section, must place, relevant source files, statement additional terms apply files, notice indicating find applicable terms. 
Additional terms, permissive non-permissive, may stated form separately written license, stated exceptions; requirements apply either way.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_8-termination","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"8. Termination","title":"GNU General Public License","text":"may propagate modify covered work except expressly provided License. attempt otherwise propagate modify void, automatically terminate rights License (including patent licenses granted third paragraph section 11). However, cease violation License, license particular copyright holder reinstated () provisionally, unless copyright holder explicitly finally terminates license, (b) permanently, copyright holder fails notify violation reasonable means prior 60 days cessation. Moreover, license particular copyright holder reinstated permanently copyright holder notifies violation reasonable means, first time received notice violation License (work) copyright holder, cure violation prior 30 days receipt notice. Termination rights section terminate licenses parties received copies rights License. rights terminated permanently reinstated, qualify receive new licenses material section 10.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_9-acceptance-not-required-for-having-copies","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"9. Acceptance Not Required for Having Copies","title":"GNU General Public License","text":"required accept License order receive run copy Program. Ancillary propagation covered work occurring solely consequence using peer--peer transmission receive copy likewise require acceptance. However, nothing License grants permission propagate modify covered work. actions infringe copyright accept License. 
Therefore, modifying propagating covered work, indicate acceptance License .","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_10-automatic-licensing-of-downstream-recipients","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"10. Automatic Licensing of Downstream Recipients","title":"GNU General Public License","text":"time convey covered work, recipient automatically receives license original licensors, run, modify propagate work, subject License. responsible enforcing compliance third parties License. “entity transaction” transaction transferring control organization, substantially assets one, subdividing organization, merging organizations. propagation covered work results entity transaction, party transaction receives copy work also receives whatever licenses work party’s predecessor interest give previous paragraph, plus right possession Corresponding Source work predecessor interest, predecessor can get reasonable efforts. may impose restrictions exercise rights granted affirmed License. example, may impose license fee, royalty, charge exercise rights granted License, may initiate litigation (including cross-claim counterclaim lawsuit) alleging patent claim infringed making, using, selling, offering sale, importing Program portion .","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_11-patents","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"11. Patents","title":"GNU General Public License","text":"“contributor” copyright holder authorizes use License Program work Program based. work thus licensed called contributor’s “contributor version”. contributor’s “essential patent claims” patent claims owned controlled contributor, whether already acquired hereafter acquired, infringed manner, permitted License, making, using, selling contributor version, include claims infringed consequence modification contributor version. 
purposes definition, “control” includes right grant patent sublicenses manner consistent requirements License. contributor grants non-exclusive, worldwide, royalty-free patent license contributor’s essential patent claims, make, use, sell, offer sale, import otherwise run, modify propagate contents contributor version. following three paragraphs, “patent license” express agreement commitment, however denominated, enforce patent (express permission practice patent covenant sue patent infringement). “grant” patent license party means make agreement commitment enforce patent party. convey covered work, knowingly relying patent license, Corresponding Source work available anyone copy, free charge terms License, publicly available network server readily accessible means, must either (1) cause Corresponding Source available, (2) arrange deprive benefit patent license particular work, (3) arrange, manner consistent requirements License, extend patent license downstream recipients. “Knowingly relying” means actual knowledge , patent license, conveying covered work country, recipient’s use covered work country, infringe one identifiable patents country reason believe valid. , pursuant connection single transaction arrangement, convey, propagate procuring conveyance , covered work, grant patent license parties receiving covered work authorizing use, propagate, modify convey specific copy covered work, patent license grant automatically extended recipients covered work works based . patent license “discriminatory” include within scope coverage, prohibits exercise , conditioned non-exercise one rights specifically granted License. 
may convey covered work party arrangement third party business distributing software, make payment third party based extent activity conveying work, third party grants, parties receive covered work , discriminatory patent license () connection copies covered work conveyed (copies made copies), (b) primarily connection specific products compilations contain covered work, unless entered arrangement, patent license granted, prior 28 March 2007. Nothing License shall construed excluding limiting implied license defenses infringement may otherwise available applicable patent law.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_12-no-surrender-of-others-freedom","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"12. No Surrender of Others’ Freedom","title":"GNU General Public License","text":"conditions imposed (whether court order, agreement otherwise) contradict conditions License, excuse conditions License. convey covered work satisfy simultaneously obligations License pertinent obligations, consequence may convey . example, agree terms obligate collect royalty conveying convey Program, way satisfy terms License refrain entirely conveying Program.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_13-use-with-the-gnu-affero-general-public-license","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"13. Use with the GNU Affero General Public License","title":"GNU General Public License","text":"Notwithstanding provision License, permission link combine covered work work licensed version 3 GNU Affero General Public License single combined work, convey resulting work. 
terms License continue apply part covered work, special requirements GNU Affero General Public License, section 13, concerning interaction network apply combination .","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_14-revised-versions-of-this-license","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"14. Revised Versions of this License","title":"GNU General Public License","text":"Free Software Foundation may publish revised /new versions GNU General Public License time time. new versions similar spirit present version, may differ detail address new problems concerns. version given distinguishing version number. Program specifies certain numbered version GNU General Public License “later version” applies , option following terms conditions either numbered version later version published Free Software Foundation. Program specify version number GNU General Public License, may choose version ever published Free Software Foundation. Program specifies proxy can decide future versions GNU General Public License can used, proxy’s public statement acceptance version permanently authorizes choose version Program. Later license versions may give additional different permissions. However, additional obligations imposed author copyright holder result choosing follow later version.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_15-disclaimer-of-warranty","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"15. Disclaimer of Warranty","title":"GNU General Public License","text":"WARRANTY PROGRAM, EXTENT PERMITTED APPLICABLE LAW. EXCEPT OTHERWISE STATED WRITING COPYRIGHT HOLDERS /PARTIES PROVIDE PROGRAM “” WITHOUT WARRANTY KIND, EITHER EXPRESSED IMPLIED, INCLUDING, LIMITED , IMPLIED WARRANTIES MERCHANTABILITY FITNESS PARTICULAR PURPOSE. ENTIRE RISK QUALITY PERFORMANCE PROGRAM . 
PROGRAM PROVE DEFECTIVE, ASSUME COST NECESSARY SERVICING, REPAIR CORRECTION.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_16-limitation-of-liability","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"16. Limitation of Liability","title":"GNU General Public License","text":"EVENT UNLESS REQUIRED APPLICABLE LAW AGREED WRITING COPYRIGHT HOLDER, PARTY MODIFIES /CONVEYS PROGRAM PERMITTED , LIABLE DAMAGES, INCLUDING GENERAL, SPECIAL, INCIDENTAL CONSEQUENTIAL DAMAGES ARISING USE INABILITY USE PROGRAM (INCLUDING LIMITED LOSS DATA DATA RENDERED INACCURATE LOSSES SUSTAINED THIRD PARTIES FAILURE PROGRAM OPERATE PROGRAMS), EVEN HOLDER PARTY ADVISED POSSIBILITY DAMAGES.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_17-interpretation-of-sections-15-and-16","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"17. Interpretation of Sections 15 and 16","title":"GNU General Public License","text":"disclaimer warranty limitation liability provided given local legal effect according terms, reviewing courts shall apply local law closely approximates absolute waiver civil liability connection Program, unless warranty assumption liability accompanies copy Program return fee. END TERMS CONDITIONS","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"how-to-apply-these-terms-to-your-new-programs","dir":"","previous_headings":"","what":"How to Apply These Terms to Your New Programs","title":"GNU General Public License","text":"develop new program, want greatest possible use public, best way achieve make free software everyone can redistribute change terms. , attach following notices program. safest attach start source file effectively state exclusion warranty; file least “copyright” line pointer full notice found. Also add information contact electronic paper mail. 
program terminal interaction, make output short notice like starts interactive mode: hypothetical commands show w show c show appropriate parts General Public License. course, program’s commands might different; GUI interface, use “box”. also get employer (work programmer) school, , sign “copyright disclaimer” program, necessary. information , apply follow GNU GPL, see . GNU General Public License permit incorporating program proprietary programs. program subroutine library, may consider useful permit linking proprietary applications library. want , use GNU Lesser General Public License instead License. first, please read .","code":" Copyright (C) This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . Copyright (C) This program comes with ABSOLUTELY NO WARRANTY; for details type 'show w'. This is free software, and you are welcome to redistribute it under certain conditions; type 'show c' for details."},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"hlabud usage examples","text":"Kamil Slowikowski 2024-05-03 hlabud R package provides functions facilitate download analysis human leukocyte antigen (HLA) genotype sequence alignments IMGTHLA R. Let’s consider question might want answer HLA genotypes. amino acid positions different two genotypes? 
nucleotides different?","code":"library(hlabud) a <- hla_alignments(\"DRB1\") a$release #> [1] \"3.56.0\" dosage(a$onehot, c(\"DRB1*03:01:05\", \"DRB1*03:02:03\")) #> F26 Y26 D28 E28 F47 Y47 G86 V86 #> DRB1*03:01:05 0 1 1 0 1 0 0 1 #> DRB1*03:02:03 1 0 0 1 0 1 1 0 n <- hla_alignments(\"DRB1\", type = \"nuc\") n$release #> [1] \"3.56.0\" dosage(n$onehot, c(\"DRB1*03:01:05\", \"DRB1*03:02:03\")) #> A164 T164 C171 G171 A227 T227 A240 G240 G344 T344 G345 T345 A357 #> DRB1*03:01:05 1 0 1 0 0 1 1 0 0 1 1 0 1 #> DRB1*03:02:03 0 1 0 1 1 0 0 1 1 0 0 1 0 #> G357 #> DRB1*03:01:05 0 #> DRB1*03:02:03 1"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"installation","dir":"Articles","previous_headings":"","what":"Installation","title":"hlabud usage examples","text":"quickest way get hlabud install GitHub: , included usage examples. hope inspire share HLA analyses. source code page available . Thank reporting issues hlabud.","code":"# install.packages(\"devtools\") devtools::install_github(\"slowkow/hlabud\")"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"get-a-one-hot-encoded-matrix-for-all-hla-drb1-alleles","dir":"Articles","previous_headings":"","what":"Get a one-hot encoded matrix for all HLA-DRB1 alleles","title":"hlabud usage examples","text":"can use hla_alignments(\"DRB1\") load DRB1_prot.txt file latest IMGTHLA release: object list three items: $sequences amino acid sequence alignments named character vector: conventions used alignments (copied EBI help page): entry allele displayed respect reference sequences. identity reference sequence present base displayed hyphen (-). Non-identity reference sequence shown displaying appropriate base position. insertion deletion occurred represented period (.). sequence unknown point alignment, represented asterisk (*). protein alignments null alleles, ‘Stop’ codons represented hash (X). protein alignments, sequence following termination codon, marked appear blank. 
conventions used nucleotide protein alignments. $alleles matrix amino acids one column position: $onehot one-hot encoded matrix one column amino acid position:","code":"library(hlabud) a <- hla_alignments(gene = \"DRB1\", verbose = TRUE) #> Reading /home/runner/.local/share/hlabud/3.56.0/alignments/DRB1_prot.txt str(a) #> List of 7 #> $ sequences: Named chr [1:3671] \"MVCLKLPGGSCMTALTVTLMVLSSPLALAGDTRPRFLWQLKFECHFFNGTERVR.LLERCIYNQEE.SVRFDSDVGEYRAVTELGRPDAEYWNSQKDLLEQRRAAVDTYCR\"| __truncated__ \"------------------------------------------------------.-----------.--------------------------------------------\"| __truncated__ \"------------------------------------------------------.-----------.--------------------------------------------\"| __truncated__ \"------------------------------------------------------.-----------.--------------------------------------------\"| __truncated__ ... #> ..- attr(*, \"names\")= chr [1:3671] \"DRB1*01:01:01:01\" \"DRB1*01:01:01:02\" \"DRB1*01:01:01:03\" \"DRB1*01:01:01:04\" ... #> $ alleles : chr [1:3671, 1:288] \"M\" \"M\" \"M\" \"M\" ... #> ..- attr(*, \"dimnames\")=List of 2 #> .. ..$ : chr [1:3671] \"DRB1*01:01:01:01\" \"DRB1*01:01:01:02\" \"DRB1*01:01:01:03\" \"DRB1*01:01:01:04\" ... #> .. ..$ : chr [1:288] \"n29\" \"n28\" \"n27\" \"n26\" ... #> $ onehot : num [1:3671, 1:1658] 0 0 0 0 0 0 0 0 0 0 ... #> ..- attr(*, \"dimnames\")=List of 2 #> .. ..$ : chr [1:3671] \"DRB1*01:01:01:01\" \"DRB1*01:01:01:02\" \"DRB1*01:01:01:03\" \"DRB1*01:01:01:04\" ... #> .. ..$ : chr [1:1658] \"n29unk\" \"Mn29\" \"n28unk\" \"Ln28\" ... 
#> $ gene : chr \"DRB1\" #> $ type : chr \"prot\" #> $ release : chr \"3.56.0\" #> $ file : chr \"/home/runner/.local/share/hlabud/3.56.0/alignments/DRB1_prot.txt\" substr(head(a$sequences, 6), 1, 50) #> DRB1*01:01:01:01 #> \"MVCLKLPGGSCMTALTVTLMVLSSPLALAGDTRPRFLWQLKFECHFFNGT\" #> DRB1*01:01:01:02 #> \"--------------------------------------------------\" #> DRB1*01:01:01:03 #> \"--------------------------------------------------\" #> DRB1*01:01:01:04 #> \"--------------------------------------------------\" #> DRB1*01:01:01:05 #> \"--------------------------------------------------\" #> DRB1*01:01:01:06 #> \"--------------------------------------------------\" a$alleles[1:5,1:40] #> n29 n28 n27 n26 n25 n24 n23 n22 n21 n20 n19 n18 n17 n16 n15 #> DRB1*01:01:01:01 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" \"P\" \"G\" \"G\" \"S\" \"C\" \"M\" \"T\" \"A\" \"L\" #> DRB1*01:01:01:02 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" \"P\" \"G\" \"G\" \"S\" \"C\" \"M\" \"T\" \"A\" \"L\" #> DRB1*01:01:01:03 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" \"P\" \"G\" \"G\" \"S\" \"C\" \"M\" \"T\" \"A\" \"L\" #> DRB1*01:01:01:04 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" \"P\" \"G\" \"G\" \"S\" \"C\" \"M\" \"T\" \"A\" \"L\" #> DRB1*01:01:01:05 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" \"P\" \"G\" \"G\" \"S\" \"C\" \"M\" \"T\" \"A\" \"L\" #> n14 n13 n12 n11 n10 n9 n8 n7 n6 n5 n4 n3 n2 n1 1 #> DRB1*01:01:01:01 \"T\" \"V\" \"T\" \"L\" \"M\" \"V\" \"L\" \"S\" \"S\" \"P\" \"L\" \"A\" \"L\" \"A\" \"G\" #> DRB1*01:01:01:02 \"T\" \"V\" \"T\" \"L\" \"M\" \"V\" \"L\" \"S\" \"S\" \"P\" \"L\" \"A\" \"L\" \"A\" \"G\" #> DRB1*01:01:01:03 \"T\" \"V\" \"T\" \"L\" \"M\" \"V\" \"L\" \"S\" \"S\" \"P\" \"L\" \"A\" \"L\" \"A\" \"G\" #> DRB1*01:01:01:04 \"T\" \"V\" \"T\" \"L\" \"M\" \"V\" \"L\" \"S\" \"S\" \"P\" \"L\" \"A\" \"L\" \"A\" \"G\" #> DRB1*01:01:01:05 \"T\" \"V\" \"T\" \"L\" \"M\" \"V\" \"L\" \"S\" \"S\" \"P\" \"L\" \"A\" \"L\" \"A\" \"G\" #> 2 3 4 5 6 7 8 9 10 11 #> DRB1*01:01:01:01 \"D\" \"T\" \"R\" \"P\" \"R\" \"F\" \"L\" \"W\" 
\"Q\" \"L\" #> DRB1*01:01:01:02 \"D\" \"T\" \"R\" \"P\" \"R\" \"F\" \"L\" \"W\" \"Q\" \"L\" #> DRB1*01:01:01:03 \"D\" \"T\" \"R\" \"P\" \"R\" \"F\" \"L\" \"W\" \"Q\" \"L\" #> DRB1*01:01:01:04 \"D\" \"T\" \"R\" \"P\" \"R\" \"F\" \"L\" \"W\" \"Q\" \"L\" #> DRB1*01:01:01:05 \"D\" \"T\" \"R\" \"P\" \"R\" \"F\" \"L\" \"W\" \"Q\" \"L\" a$onehot[1:5,1:25] #> n29unk Mn29 n28unk Ln28 Vn28 n27unk Cn27 n26unk Ln26 n25unk #> DRB1*01:01:01:01 0 1 0 0 1 0 1 0 1 0 #> DRB1*01:01:01:02 0 1 0 0 1 0 1 0 1 0 #> DRB1*01:01:01:03 0 1 0 0 1 0 1 0 1 0 #> DRB1*01:01:01:04 0 1 0 0 1 0 1 0 1 0 #> DRB1*01:01:01:05 0 1 0 0 1 0 1 0 1 0 #> Kn25 Rn25 n24unk Fn24 Ln24 n23unk Pn23 n22unk Gn22 n21unk Cn21 #> DRB1*01:01:01:01 1 0 0 0 1 0 1 0 1 0 0 #> DRB1*01:01:01:02 1 0 0 0 1 0 1 0 1 0 0 #> DRB1*01:01:01:03 1 0 0 0 1 0 1 0 1 0 0 #> DRB1*01:01:01:04 1 0 0 0 1 0 1 0 1 0 0 #> DRB1*01:01:01:05 1 0 0 0 1 0 1 0 1 0 0 #> Gn21 n20unk Sn20 n19unk #> DRB1*01:01:01:01 1 0 1 0 #> DRB1*01:01:01:02 1 0 1 0 #> DRB1*01:01:01:03 1 0 1 0 #> DRB1*01:01:01:04 1 0 1 0 #> DRB1*01:01:01:05 1 0 1 0"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"convert-genotypes-to-a-dosage-matrix","dir":"Articles","previous_headings":"","what":"Convert genotypes to a dosage matrix","title":"hlabud usage examples","text":"Suppose individuals following genotypes: want run association test amino acid positions, need convert genotype names matrix allele dosages (e.g., 0, 1, 2). can use dosage() convert individual’s genotypes amino acid dosages: Note: dosage matrix one row individual one column amino acid position. default, dosage() discard columns individuals identical. input allele names truncated 4-digits 2-digits (e.g. DRB1*03:01 DRB1*03), hlabud pick first allele matches input allele (e.g. DRB1*03:01:01:01). want specific allele, need provide full allele name input. 
Please careful check data looks way expect!","code":"genotypes <- c( \"DRB1*12:02:02:03,DRB1*12:02:02:03\", \"DRB1*04:174,DRB1*15:152\", \"DRB1*04:56:02,DRB1*15:01:48\", \"DRB1*14:172,DRB1*04:160\", \"DRB1*04:359,DRB1*04:284:02\" ) dosage <- dosage(a$onehot, genotypes) dosage[,1:8] #> n29unk Mn29 n28unk Vn28 n27unk Cn27 n26unk #> DRB1*12:02:02:03,DRB1*12:02:02:03 0 2 0 2 0 2 0 #> DRB1*04:174,DRB1*15:152 2 0 2 0 2 0 2 #> DRB1*04:56:02,DRB1*15:01:48 2 0 2 0 2 0 2 #> DRB1*14:172,DRB1*04:160 2 0 2 0 2 0 2 #> DRB1*04:359,DRB1*04:284:02 2 0 2 0 2 0 2 #> Ln26 #> DRB1*12:02:02:03,DRB1*12:02:02:03 2 #> DRB1*04:174,DRB1*15:152 0 #> DRB1*04:56:02,DRB1*15:01:48 0 #> DRB1*14:172,DRB1*04:160 0 #> DRB1*04:359,DRB1*04:284:02 0 dim(dosage) #> [1] 5 428"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"logistic-regression-association-for-amino-acid-positions","dir":"Articles","previous_headings":"","what":"Logistic regression association for amino acid positions","title":"hlabud usage examples","text":"Let’s simulate dataset cases controls demonstrate one approach testing amino acid positions might associated cases. simulated dataset 100 individuals, 52 cases 48 controls. also one column amino acid position might want test association case variable. One possible approach association testing use glm() fit logistic regression model amino acid position. reveal amino acid position might associated case variable simulated dataset. volcano shows Odds Ratio P-value amino acid position. top hits P < 0.05 labeled. 
simulation, case variable associated F37 (P = 0.021, = 4, 95% CI 1.4 15).","code":"set.seed(2) n <- 100 d <- data.frame( geno = paste( sample(rownames(a$onehot), n, replace = TRUE), sample(rownames(a$onehot), n, replace = TRUE), sep = \",\" ), age = sample(21:100, n, replace = TRUE), case = sample(0:1, n, replace = TRUE) ) d <- cbind(d, dosage(a$onehot, d$geno)) d[1:5,1:6] #> geno age case n29unk #> DRB1*04:243,DRB1*15:01:01:08 DRB1*04:243,DRB1*15:01:01:08 67 0 1 #> DRB1*04:08:01:01,DRB1*04:56:02 DRB1*04:08:01:01,DRB1*04:56:02 38 1 1 #> DRB1*13:339,DRB1*04:112 DRB1*13:339,DRB1*04:112 67 0 2 #> DRB1*03:85,DRB1*01:02:10 DRB1*03:85,DRB1*01:02:10 55 0 2 #> DRB1*03:62,DRB1*14:224 DRB1*03:62,DRB1*14:224 73 1 1 #> Mn29 n28unk #> DRB1*04:243,DRB1*15:01:01:08 1 1 #> DRB1*04:08:01:01,DRB1*04:56:02 1 1 #> DRB1*13:339,DRB1*04:112 0 2 #> DRB1*03:85,DRB1*01:02:10 0 2 #> DRB1*03:62,DRB1*14:224 1 1 # prepare column names for use in formulas ix <- 4:ncol(d) colnames(d)[ix] <- sprintf(\"VAR%s\", colnames(d)[ix]) # select the amino acid positions that have at least 3 people with dosage > 0 my_as <- names(which(colSums(d[,4:ncol(d)] > 0) >= 3)) # run the association tests my_glm <- rbindlist(pblapply(my_as, function(my_a) { f <- sprintf(\"case ~ %s\", my_a) glm(as.formula(f), data = d, family = \"binomial\") %>% parameters(exponentiate = TRUE) })) # look at the top hits my_glm %>% arrange(p) %>% filter(!Parameter %in% c(\"(Intercept)\")) %>% head #> Parameter Coefficient SE CI CI_low CI_high z #> #> 1: VARF37 3.9529448 2.3501312 0.95 1.35121582 14.6317263 2.311857 #> 2: VARY60 0.4269585 0.1790904 0.95 0.18053396 0.9458131 -2.028981 #> 3: VARK98 0.5739907 0.1603272 0.95 0.32635370 0.9824460 -1.987475 #> 4: VARS104 0.5739907 0.1603272 0.95 0.32635370 0.9824460 -1.987475 #> 5: VARQ96 0.3253919 0.1886793 0.95 0.08932709 0.9191775 -1.936226 #> 6: VARS179 0.6085247 0.1617164 0.95 0.35644231 1.0163414 -1.869106 #> df_error p #> #> 1: Inf 0.02078556 #> 2: Inf 0.04246025 #> 3: Inf 0.04686976 
#> 4: Inf 0.04686976 #> 5: Inf 0.05284007 #> 6: Inf 0.06160809"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"umap-embedding-of-hla-drb1-alleles","dir":"Articles","previous_headings":"","what":"UMAP embedding of HLA-DRB1 alleles","title":"hlabud usage examples","text":"many possibilities analysis one-hot encoding matrix. example, UMAP embedding HLA-DRB1 alleles encoded one-hot amino acid matrix 1658 columns, one amino acid position. color indicates 2-digit allele name. can highlight alleles aspartic acid (Asp D) position 57: can use color represent amino acid residue position 57:","code":"umap(a$onehot, n_epochs = 200, min_dist = 1, spread = 2)"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"get-hla-allele-frequencies-from-allele-frequency-net-database-afnd","dir":"Articles","previous_headings":"","what":"Get HLA allele frequencies from Allele Frequency Net Database (AFND)","title":"hlabud usage examples","text":"hlabud R package includes table HLA allele frequencies Allele Frequency Net Database (AFND). use data, please cite latest manuscript Allele Frequency Net Database: Gonzalez-Galarza FF, McCabe , Santos EJMD, Jones J, Takeshita L, Ortega-Rivera ND, et al. Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data new query tools. Nucleic Acids Res. 2020;48: D783–D788. doi:10.1093/nar/gkz1029 can use data plot frequency specific allele (e.g. 
DQB1*02:01) populations 1000 sampled individuals: See github.com/slowkow/allelefrequencies examples might use data.","code":"af <- hla_frequencies() af #> # A tibble: 123,502 × 7 #> group gene allele population indivs_over_n alleles_over_2n n #> #> 1 hla A A*01:01 Argentina Rosario To… 15.1 0.076 86 #> 2 hla A A*01:01 Armenia combined Reg… NA 0.125 100 #> 3 hla A A*01:01 Australia Cape York … NA 0.053 103 #> 4 hla A A*01:01 Australia Groote Eyl… NA 0.027 75 #> 5 hla A A*01:01 Australia New South … NA 0.187 134 #> 6 hla A A*01:01 Australia Yuendumu A… NA 0.008 191 #> 7 hla A A*01:01 Austria 27 0.146 200 #> 8 hla A A*01:01 Azores Central Islan… NA 0.08 59 #> 9 hla A A*01:01 Azores Oriental Isla… NA 0.115 43 #> 10 hla A A*01:01 Azores Terceira Isla… NA 0.109 130 #> # ℹ 123,492 more rows my_allele <- \"DQB1*02:01\" my_af <- af %>% filter(allele == my_allele) %>% filter(n > 1000) %>% arrange(-alleles_over_2n) ggplot(my_af) + aes(x = alleles_over_2n, y = reorder(population, alleles_over_2n)) + scale_y_discrete(position = \"right\") + geom_colh() + labs( x = \"Allele Frequency (Alleles / 2N)\", y = NULL, title = glue(\"Frequency of {my_allele} across {length(unique(my_af$population))} populations\"), caption = \"Data from AFND http://allelefrequencies.net\" )"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"compute-hla-divergence-with-the-grantham-distance-matrix","dir":"Articles","previous_headings":"","what":"Compute HLA divergence with the Grantham distance matrix","title":"hlabud usage examples","text":"Humans diploid, us two copies HLA gene. individual two highly dissimilar alleles can bind greater number different peptides homozygous individual (https://doi.org/10.1007/BF02918202): MHC class II allele capacity bind present specific set peptides processed antigens. inability specific class II allele bind present fragment derived processed antigen results loss immune responsiveness antigen individuals homozygous class II allele. 
amino acid distance matrix Grantham 1974 (https://doi.org/10.1126/science.185.4154.862) encodes information composition, polarity, molecular volume amino acid. can use matrix compute HLA divergence metric set individuals like : divergence homozygote equal zero, definition: hlabud includes R code divergence calculations translated original Perl code Pierini & Lenz 2018 (https://doi.org/10.1093/molbev/msy116). amino acid distance matrix easily accessible, provide two built-options “grantham” “uniform”:","code":"grantham #> amino c p v #> 1 Ser 1.42 9.2 32.0 #> 2 Arg 0.65 10.5 124.0 #> 3 Leu 0.00 4.9 111.0 #> 4 Pro 0.39 8.0 32.5 #> 5 Thr 0.71 8.6 61.0 #> 6 Ala 0.00 8.1 31.0 #> 7 Val 0.00 5.9 84.0 #> 8 Gly 0.74 9.0 3.0 #> 9 Ile 0.00 5.2 111.0 #> 10 Phe 0.00 5.2 132.0 #> 11 Tyr 0.20 6.2 136.0 #> 12 Cys 2.75 5.5 55.0 #> 13 His 0.58 10.4 96.0 #> 14 Gln 0.89 10.5 85.0 #> 15 Asn 1.33 11.6 56.0 #> 16 Lys 0.33 11.3 119.0 #> 17 Asp 1.38 13.0 54.0 #> 18 Glu 0.92 12.3 83.0 #> 19 Met 0.00 5.7 105.0 #> 20 Trp 0.13 5.4 170.0 my_genos <- c(\"A*23:01:12,A*24:550\", \"A*25:12N,A*11:27\", \"A*24:381,A*33:85\") hla_divergence(my_genos) #> A*23:01:12,A*24:550 A*25:12N,A*11:27 A*24:381,A*33:85 #> 0.5131579 3.4736842 5.1078947 hla_divergence(\"A*01:01,A*01:01\") #> A*01:01,A*01:01 #> 0 amino_distance_matrix(method = \"grantham\") #> A R N D C Q E G H I L K M F P S T W Y #> A 0 112 111 126 195 91 107 60 86 94 96 106 84 113 27 99 58 148 112 #> R 112 0 86 96 180 43 54 125 29 97 102 26 91 97 103 110 71 101 77 #> N 111 86 0 23 139 46 42 80 68 149 153 94 142 158 91 46 65 174 143 #> D 126 96 23 0 154 61 45 94 81 168 172 101 160 177 108 65 85 181 160 #> C 195 180 139 154 0 154 170 159 174 198 198 202 196 205 169 112 149 215 194 #> Q 91 43 46 61 154 0 29 87 24 109 113 53 101 116 76 68 42 130 99 #> E 107 54 42 45 170 29 0 98 40 134 138 56 126 140 93 80 65 152 122 #> G 60 125 80 94 159 87 98 0 98 135 138 127 127 153 42 56 59 184 147 #> H 86 29 68 81 174 24 40 98 0 94 99 32 87 100 77 89 47 115 83 #> I 
94 97 149 168 198 109 134 135 94 0 5 102 10 21 95 142 89 61 33 #> L 96 102 153 172 198 113 138 138 99 5 0 107 15 22 98 145 92 61 36 #> K 106 26 94 101 202 53 56 127 32 102 107 0 95 102 103 121 78 110 85 #> M 84 91 142 160 196 101 126 127 87 10 15 95 0 28 87 135 81 67 36 #> F 113 97 158 177 205 116 140 153 100 21 22 102 28 0 114 155 103 40 22 #> P 27 103 91 108 169 76 93 42 77 95 98 103 87 114 0 74 38 147 110 #> S 99 110 46 65 112 68 80 56 89 142 145 121 135 155 74 0 58 177 144 #> T 58 71 65 85 149 42 65 59 47 89 92 78 81 103 38 58 0 128 92 #> W 148 101 174 181 215 130 152 184 115 61 61 110 67 40 147 177 128 0 37 #> Y 112 77 143 160 194 99 122 147 83 33 36 85 36 22 110 144 92 37 0 #> V 64 96 133 152 192 96 121 109 84 29 32 97 21 50 68 124 69 88 55 #> V #> A 64 #> R 96 #> N 133 #> D 152 #> C 192 #> Q 96 #> E 121 #> G 109 #> H 84 #> I 29 #> L 32 #> K 97 #> M 21 #> F 50 #> P 68 #> S 124 #> T 69 #> W 88 #> Y 55 #> V 0"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"download-and-unpack-all-data-from-the-latest-imgthla-release","dir":"Articles","previous_headings":"","what":"Download and unpack all data from the latest IMGTHLA release","title":"hlabud usage examples","text":"want use hla_alignments(), don’t need install_hla() data files downloaded automatically needed cached future use. users might need access additional files present full data release. Run install_hla() download unpack latest IMGTHLA release. destination folder downloaded data files getOption(\"hlabud_dir\") (automatically tailored operating system thanks rappdirs package). examples download releases get list release names. 
Download latest release (default) specific release: Optionally, get set directory hlabud uses store data: installing releases, hlabud folder might look like :","code":"# Download all of the data (120MB) for the latest IMGTHLA release install_hla(release = \"latest\") # Download a specific release install_hla(release = \"3.51.0\") getOption(\"hlabud_dir\") #> [1] \"/home/username/.local/share/hlabud\" # Manually override the directory for hlabud to use options(hlabud_dir = \"/path/to/my/dir\") ❯ ls -lah \"/home/user/.local/share/hlabud\" total 207M drwxrwxr-x 3 user user 32 Apr 5 01:19 3.30.0 drwxrwxr-x 11 user user 4.0K Apr 7 19:31 3.40.0 drwxrwxr-x 12 user user 4.0K Apr 5 00:27 3.51.0 -rw-rw-r-- 1 user user 15K Apr 7 19:23 tags.json -rw-rw-r-- 1 user user 79M Apr 7 19:28 v3.40.0-alpha.tar.gz -rw-rw-r-- 1 user user 129M Apr 4 20:07 v3.51.0-alpha.tar.gz"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"count-the-number-of-alleles-in-each-imgthla-release","dir":"Articles","previous_headings":"","what":"Count the number of alleles in each IMGTHLA release","title":"hlabud usage examples","text":"can get list release names: can get allele names release: Next, count many alleles release: plot number alleles line plot:","code":"releases <- hla_releases() releases #> [1] \"3.56.0\" \"3.55.0\" \"3.54.0\" \"3.53.0\" \"3.52.0\" \"3.51.0\" #> [7] \"3.50.0\" \"3.49.0\" \"3.48.0\" \"3.47.0\" \"3.46.0\" \"3.45.1\" #> [13] \"3.45.01\" \"3.45.0.1\" \"3.45.0\" \"3.44.1\" \"3.44.0\" \"3.43.0\" #> [19] \"3.42.0\" \"3.41.2\" \"3.41.0\" \"3.40.0\" \"3.39.0\" \"3.38.0\" #> [25] \"3.37.0\" \"3.36.0\" \"3.35.0\" \"3.34.0\" \"3.33.0\" \"3.32.0\" my_alleles <- rbindlist(lapply(releases, function(release) { retval <- hla_alleles(release = release) retval$release <- release return(retval) }), fill = TRUE) #> Warning in hla_alleles(release = release): unrecognized release name #> 'Allelelist.3451.txt' #> Warning in hla_alleles(release = release): unrecognized release 
name #> 'Allelelist.34501.txt' #> Warning in hla_alleles(release = release): unrecognized release name #> 'Allelelist.34501.txt' #> Warning in hla_alleles(release = release): unrecognized release name #> 'Allelelist.3441.txt' #> Warning in hla_alleles(release = release): unrecognized release name #> 'Allelelist.3412.txt' d <- my_alleles %>% count(release) %>% filter(n > 1) d #> release n #> #> 1: 3.32.0 18363 #> 2: 3.33.0 18955 #> 3: 3.34.0 20272 #> 4: 3.35.0 21683 #> 5: 3.36.0 22548 #> 6: 3.37.0 24093 #> 7: 3.38.0 25958 #> 8: 3.39.0 26512 #> 9: 3.40.0 27273 #> 10: 3.41.0 27980 #> 11: 3.42.0 28786 #> 12: 3.43.0 29417 #> 13: 3.44.0 30523 #> 14: 3.45.0 31552 #> 15: 3.46.0 32330 #> 16: 3.47.0 33552 #> 17: 3.48.0 34145 #> 18: 3.49.0 35077 #> 19: 3.50.0 36016 #> 20: 3.51.0 36625 #> 21: 3.52.0 37068 #> 22: 3.53.0 37619 #> 23: 3.54.0 38416 #> 24: 3.55.0 38909 #> 25: 3.56.0 39886 #> release n ggplot(d) + aes(x = release, y = n, group = 1) + geom_line() + geom_text(aes(label = release), hjust = 1) + labs(x = NULL, y = \"Number of alleles\", title = \"Each release has more HLA alleles\") + theme( axis.text.x = element_blank(), axis.ticks.x = element_blank(), ) d2 <- my_alleles %>% mutate(gene = str_split_fixed(Allele, \"\\\\*\", 2)[,1]) %>% count(release, gene) ggplot() + aes(x = release, y = n) + geom_line( data = d2, aes(group = gene, color = gene) ) + scale_color_discrete(guide = \"none\") + geom_text( data = d2 %>% filter(release == \"3.52.0\"), mapping = aes(label = gene), hjust = 0 ) + labs(x = NULL, y = \"Number of alleles\", title = \"Number of alleles per release and gene\") + scale_x_discrete(expand = expansion(mult = c(0.01, 0.1))) + scale_y_log10() + theme( panel.grid.major.y = element_line(), axis.text.x = element_blank(), axis.ticks.x = element_blank(), )"},{"path":"https://slowkow.github.io/hlabud/articles/numbering.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Numbering amino acid positions","text":"Kamil 
Slowikowski 2024-05-03 IMGTHLA provides Github repo alignments amino acid sequences nucleotide sequences thousands alleles HLA genes. IMGTHLA alignments define official numbering scheme, provide explanations conventions help page. hlabud R package provides easy access alignment data, hlabud follows official numbering scheme. examples help beginners visualize understand conventions work.","code":""},{"path":"https://slowkow.github.io/hlabud/articles/numbering.html","id":"alignment-files-on-the-imgthla-github-page","dir":"Articles","previous_headings":"","what":"Alignment files on the IMGTHLA Github page","title":"Numbering amino acid positions","text":"IMGTHLA Github page provides folder alignment files. examples vignette, use HLA-DRB1 gene. DRB1, can find three separate files: https://raw.githubusercontent.com/ANHIG/IMGTHLA/v3.56.0-alpha/alignments/DRB1_gen.txt https://raw.githubusercontent.com/ANHIG/IMGTHLA/v3.56.0-alpha/alignments/DRB1_nuc.txt https://raw.githubusercontent.com/ANHIG/IMGTHLA/v3.56.0-alpha/alignments/DRB1_prot.txt files contain different information: gen contains genomic DNA sequences. nuc contains nucleotide coding sequences (CDS). prot contains protein sequences (amino acids). Let’s consider DRB1_prot.txt file. file look like? plain text file header sequence alignments. alignment, line represents one sequence (allele), line 100 residues. first 100 residues alleles shown first block. , next block next 100 residues alleles, .","code":""},{"path":"https://slowkow.github.io/hlabud/articles/numbering.html","id":"numbering-conventions","dir":"Articles","previous_headings":"Alignment files on the IMGTHLA Github page","what":"Numbering conventions","title":"Numbering amino acid positions","text":"conventions used alignments (copied EBI): entry allele displayed respect reference sequences. identity reference sequence present base displayed hyphen (-). Non-identity reference sequence shown displaying appropriate base position. 
insertion deletion occurred represented period (.). sequence unknown point alignment, represented asterisk (*). protein alignments null alleles, ‘Stop’ codons represented hash (X). protein alignments, sequence following termination codon, marked appear blank. conventions used nucleotide protein alignments. ’s lot information! Let’s try work example illustrate works. first sequence alignment reference sequence. position numbering relative reference sequence. means deletions (.) reference sequence numbered. Notice numbering starts negative numbers. help page clarifies: Protein Sequence Numbering amino acid-based systems, start codon mature protein labeled codon 1. codon 5’ numbered -1. numbering based reference sequence. amino acid number 0.","code":""},{"path":"https://slowkow.github.io/hlabud/articles/numbering.html","id":"numbering-indels","dir":"Articles","previous_headings":"Alignment files on the IMGTHLA Github page","what":"Numbering indels","title":"Numbering amino acid positions","text":"alignment shows 100 residues displayed chunks 10: numbering convention says indels reference sequence numbered. clarify point, manually added additional numbers (11, 21, 30, 39, 49, 59) alignment : Notice move first chunk GDTRPRFLWQ next chunk LKFECHFFNG simply add 10 1 get 11 number L amino acid. , move TERVR.LLER, add 10 11 get 21 T amino acid. However, move CIYNQEE.SV rule “add 10” work. Instead labeling C position 31, label position 30. ? reason C 31, 30, indel (gap) reference sequence position 25_26 (notice . R.L). convention deletions reference sequence numbered. Let’s take closer look data hlabud. first amino acid positions first 4 sequences: hlabud numbers positions focusing example: hlabud using correct numbering, see: - T position 21 - C position 30 see positions 25, 26, 25_26? alignment file: result hlabud: , can see deletion positions 25 26 numbered like residues. Instead, gets special label (25_26) consists positions flanking indel (25 26). 
alleles observe position 25_26? three possibilities position 25_26: . indicates deletion 1 amino acid (absence amino position) * indicates sequence unknown position W indicates tryptophan position hope example helps explain numbering indels. notice discrepancy hlabud IMGT, please report .","code":"library(hlabud) a <- hla_alignments(\"DRB1\", release = \"3.56.0\") seqs <- substr(a$sequences[1:4], 30, 89) str_replace_all(seqs, \"(\\\\S{10})\", \"\\\\1 \") #> [1] \"GDTRPRFLWQ LKFECHFFNG TERVR.LLER CIYNQEE.SV RFDSDVGEYR AVTELGRPDA \" #> [2] \"---------- ---------- -----.---- -------.-- ---------- ---------- \" #> [3] \"---------- ---------- -----.---- -------.-- ---------- ---------- \" #> [4] \"---------- ---------- -----.---- -------.-- ---------- ---------- \" colnames(a$alleles)[50:70] #> [1] \"21\" \"22\" \"23\" \"24\" \"25\" \"25_26\" \"26\" \"27\" \"28\" #> [10] \"29\" \"30\" \"31\" \"32\" \"33\" \"34\" \"35\" \"36\" \"36_37\" #> [19] \"37\" \"38\" \"39\" a$alleles[1,\"21\"] #> [1] \"T\" a$alleles[1,\"30\"] #> [1] \"C\" a$alleles[1,\"25\"] #> [1] \"R\" a$alleles[1,\"26\"] #> [1] \"L\" a$alleles[1,\"25_26\"] #> [1] \".\" table(a$alleles[,\"25_26\"]) #> #> . * W #> 3658 12 1"},{"path":"https://slowkow.github.io/hlabud/articles/visualize-hla-structure.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Visualize HLA protein structures","text":"Kamil Slowikowski 2024-05-03 vignette, explore different methods visualizing molecular structure HLA proteins. First, ’ll look example use NGLVieweR R package show HLA protein structures. 
Next, ’ll use PyMOL thing.","code":""},{"path":"https://slowkow.github.io/hlabud/articles/visualize-hla-structure.html","id":"what-are-the-pdb-identifiers-for-each-hla-gene","dir":"Articles","previous_headings":"","what":"What are the PDB identifiers for each HLA gene?","title":"Visualize HLA protein structures","text":"list PDB identifiers might consider using represent HLA protein: Also try searching PDB website , e.g., \"HLA-DR\" see appropriate structure analysis.","code":"HLA-A 2xpg HLA-B 2bvp HLA-C 4nt6 HLA-DP 3lqz HLA-DQ 4z7w HLA-DR 3pdo"},{"path":"https://slowkow.github.io/hlabud/articles/visualize-hla-structure.html","id":"using-nglviewer","dir":"Articles","previous_headings":"","what":"Using NGLVieweR","title":"Visualize HLA protein structures","text":"Let’s try visualize amino acid PDB position 9 HLA-B protein structure. visualize structure 2bvp Protein Data Bank (PDB). example NGLVieweR R package Niels van der Velden: view , see blue peptide red HLA-B protein. tyrosine PDB position 9 highlighted ball+stick representation, also labeled text label. structure rotating can get better view. can use hlabud answer questions HLA-B amino acid sequence. first question need ask : IMGT position corresponds tyrosine PDB position 9? need open PDB Sequence Annotations tool order figure IMGT number corresponds PDB number 9. screenshot tool: Next, can view amino acid sequence numbering IMGT: eye, can see sequence YFYT starting PDB position 9 corresponds YFYT sequence IMGT position 3. , manually confirmed PDB position 9 matches IMGT position 3. Next, might ask HLA-B alleles Y3? fraction reported HLA-B alleles tyrosine IMGT position 3 (Y3)? 
turns , almost HLA-B alleles Y3.","code":"# devtools::install_github(\"nvelden/NGLVieweR\") # we need the latest version library(NGLVieweR) library(magrittr) my_sele <- \"9:A\" NGLVieweR(\"2bvp\") %>% stageParameters( backgroundColor = \"white\", zoomSpeed = 1, cameraFov = 80 ) %>% addRepresentation( type = \"cartoon\" ) %>% addRepresentation( type = \"ball+stick\", param = list( sele = my_sele ) ) %>% addRepresentation( type = \"label\", param = list( sele = my_sele, labelType = \"format\", labelFormat = \"[%(resname)s]%(resno)s\", # or enter custom text labelGrouping = \"residue\", # or \"atom\" (eg. sele = \"20:A.CB\") color = \"black\", fontFamiliy = \"sans-serif\", xOffset = 1, yOffset = 0, zOffset = 0, fixedSize = TRUE, radiusType = 1, radiusSize = 5.5, # Label size showBackground = TRUE # backgroundColor=\"black\", # backgroundOpacity=0.5 ) ) %>% zoomMove( center = my_sele, zoom = my_sele, duration = 0, # animation time in ms z_offSet = -20 ) %>% setSpin() library(hlabud) a <- hla_alignments(\"B\") library(stringr) a$alleles[which(str_detect(rownames(a$alleles), \"B*57:03\")),][1,1:50] #> n30 n29 n28 n27 n26 n25 n24 n23 #> \"M\" \"R\" \"V\" \"T\" \"A\" \"P\" \"R\" \"T\" #> n22 n22_n21 n21 n20 n19 n18 n17 n16 #> \"V\" \"......\" \"L\" \"L\" \"L\" \"L\" \"W\" \"G\" #> n15 n14 n13 n12 n11 n10 n9 n8 #> \"A\" \"V\" \"A\" \"L\" \"T\" \"E\" \"T\" \"W\" #> n7 n6 n5 n4 n3 n2 n1 1 #> \"A\" \"G\" \"S\" \"H\" \"S\" \"M\" \"R\" \"Y\" #> 2 3 4 5 6 7 8 9 #> \"F\" \"Y\" \"T\" \"A\" \"M\" \"S\" \"R\" \"P\" #> 10 11 12 13 14 15 16 17 #> \"G\" \"R\" \"G\" \"E\" \"P\" \"R\" \"F\" \"I\" #> 18 18_19 #> \"A\" \".....\" my_alleles <- names(which(a$onehot[,\"Y3\"] == 1)) length(my_alleles) #> [1] 7023 head(my_alleles, 20) #> [1] \"B*07:02:01:01\" \"B*07:02:01:02\" \"B*07:02:01:03\" \"B*07:02:01:04\" #> [5] \"B*07:02:01:05\" \"B*07:02:01:06\" \"B*07:02:01:07\" \"B*07:02:01:08\" #> [9] \"B*07:02:01:09\" \"B*07:02:01:10\" \"B*07:02:01:11\" \"B*07:02:01:12\" #> [13] \"B*07:02:01:13\" 
\"B*07:02:01:14\" \"B*07:02:01:15\" \"B*07:02:01:16\" #> [17] \"B*07:02:01:17\" \"B*07:02:01:18\" \"B*07:02:01:19\" \"B*07:02:01:20\" sum(a$onehot[,\"Y3\"] == 1) / nrow(a$onehot) #> [1] 0.711406"},{"path":"https://slowkow.github.io/hlabud/articles/visualize-hla-structure.html","id":"using-pymol","dir":"Articles","previous_headings":"","what":"Using PyMOL","title":"Visualize HLA protein structures","text":"PyMOL one favorite methods visualizing protein structures, allows us change residue existing protein visualize new mutated protein. takes lines PyMOL create nice figure. example, want quickly highlight positions 13 45 HLA-DQB1, snippet PyMOL code produce figure . Bash script : Write PyMOL script Run PyMOL script pymol command PyMOL script : Load structure Protein Data Bank (PDB). 7kei identifier published protein structure. Color HLA-DQA1 protein teal. Color HLA-DQB1 protein orange. Color peptide purple. color residues 13 45 HLA-DQB1 red. Label residues positions names. Write PNG file view structure. image , manually rotated structure mouse added text labels like \"PDB: 7kei\" saving file.","code":"#!/usr/bin/env bash # Write a pymol script cat << EOF > script.pml fetch 7kei show cartoon remove solvent remove chain D remove chain H color teal, chain A color orange, chain B color purple, chain C color red, chain B & resi 13 color red, chain B & resi 45 label n. CA and chain B & resi 13, \"%s %s\" % (resi, resn) label n. 
CA and chain B & resi 45, \"%s %s\" % (resi, resn) png 7kei.png, width=1200, height=800, dpi=300 EOF # On Linux, we can just use `pymol` without making an alias # On macOS, we need to make an alias alias pymol=/Applications/PyMOL.app/Contents/MacOS/PyMOL pymol -c script.pml"},{"path":"https://slowkow.github.io/hlabud/articles/visualize-hla-structure.html","id":"other-software-for-viewing-pdb-data","dir":"Articles","previous_headings":"","what":"Other software for viewing PDB data","title":"Visualize HLA protein structures","text":"ChimeraX: https://www.cgl.ucsf.edu/chimerax/ Python: https://github.com/nglviewer/nglview Javascript: https://www.rcsb.org/3d-view https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?mmdbid=7kei&bu=1 https://github.com/nglviewer/ngl https://github.com/biasmv/pv R: https://www.raymolecule.com","code":""},{"path":"https://slowkow.github.io/hlabud/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Kamil Slowikowski. Author, maintainer.","code":""},{"path":"https://slowkow.github.io/hlabud/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"J R, DJ B, X G, MA C, P F, SGE. M (2019). “IPD-IMGT/HLA Database.” Nucleic Acids Research, 48(D1), D948–D955. doi:10.1093/nar/gkz950. Slowikowski K (2023). hlabud: IMGTHLA Data R. 
doi:10.5281/zenodo.8183949, R package version 2.0.0, https://github.com/slowkow/hlabud.","code":"@Article{, author = {Robinson J and Barker DJ and Georgiou X and Cooper MA and Flicek P and Marsh SGE.}, title = {IPD-IMGT/HLA Database}, doi = {10.1093/nar/gkz950}, year = {2019}, month = {oct}, publisher = {Oxford University Press}, volume = {48}, number = {D1}, pages = {D948–D955}, journal = {Nucleic Acids Research}, } @Manual{, title = {{hlabud}: IMGTHLA Data from R}, author = {Kamil Slowikowski}, year = {2023}, note = {R package version 2.0.0}, doi = {10.5281/zenodo.8183949}, url = {https://github.com/slowkow/hlabud}, }"},{"path":"https://slowkow.github.io/hlabud/index.html","id":"hlabud-hla-analysis-in-r-","dir":"","previous_headings":"","what":"Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA","title":"Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA","text":"hlabud provides methods retrieve sequence alignment data IMGTHLA convert data convenient R matrices ready downstream analysis. See usage examples learn use data logistic regression dimensionality reduction. example, let’s consider simple question two HLA genotypes. amino acid positions different two genotypes? output, can conclude four positions (26, 28, 47, 86) distinguish two HLA-DRB1 alleles. 
see DRB1*03:01:05 Y position 26 DRB1*03:02:03 F.","code":"library(hlabud) a <- hla_alignments(\"DRB1\") dosage(a$onehot, c(\"DRB1*03:01:05\", \"DRB1*03:02:03\")) ## F26 Y26 D28 E28 F47 Y47 G86 V86 ## DRB1*03:01:05 0 1 1 0 1 0 0 1 ## DRB1*03:02:03 1 0 0 1 0 1 1 0"},{"path":"https://slowkow.github.io/hlabud/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA","text":"quickest way get hlabud install GitHub:","code":"# install.packages(\"devtools\") devtools::install_github(\"slowkow/hlabud\")"},{"path":"https://slowkow.github.io/hlabud/index.html","id":"examples","dir":"","previous_headings":"","what":"Examples","title":"Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA","text":"See usage examples get ideas use hlabud analyses. Get one-hot encoded matrix HLA-DRB1 alleles Convert genotypes dosage matrix Logistic regression association amino acid positions UMAP embedding 3,516 HLA-DRB1 alleles Get HLA allele frequencies Allele Frequency Net Database (AFND) Compute HLA divergence Grantham distance matrix Download unpack data latest IMGTHLA release Visualize 3D molecular structure HLA proteins highlight specific amino acid residues","code":""},{"path":"https://slowkow.github.io/hlabud/index.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA","text":"hlabud provides access data IMGT/HLA database. Therefore, use hlabud please cite IMGT/HLA paper: Robinson J, Barker DJ, Georgiou X, Cooper MA, Flicek P, Marsh SGE. IPD-IMGT/HLA Database. Nucleic Acids Res. 2020;48: D948–D955. doi:10.1093/nar/gkz950 hlabud also provides access data Allele Frequency Net Database (AFND). 
Therefore, use hlabud::hla_frequencies() please cite AFND paper: Gonzalez-Galarza FF, McCabe , Santos EJMD, Jones J, Takeshita L, Ortega-Rivera ND, et al. Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data new query tools. Nucleic Acids Res. 2020;48: D783–D788. doi:10.1093/nar/gkz1029 Additionally, can also cite hlabud package like : Slowikowski K. hlabud: methods access analysis human leukocyte antigen (HLA) gene sequence alignments IMGT/HLA. R package version 1.0.0.","code":""},{"path":"https://slowkow.github.io/hlabud/index.html","id":"related-work","dir":"","previous_headings":"","what":"Related work","title":"Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA","text":"recommend article anyone new HLA, beautiful figures help build intuition: La Gruta NL, Gras S, Daley SR, Thomas PG, Rossjohn J. Understanding drivers MHC restriction T cell receptors. Nat Rev Immunol. 2018;18: 467–478. Learn conventions HLA nomenclature: Marsh SGE, Albert ED, Bodmer WF, Bontrop RE, Dupont B, Erlich HA, et al. Nomenclature factors HLA system, 2010. Tissue Antigens. 2010;75: 291–455. case-control analysis HLA genotype data, consider BIGDAWG R package available CRAN. related article: Pappas DJ, Marin W, Hollenbach JA, Mack SJ. Bridging ImmunoGenomic Data Analysis Workflow Gaps (BIGDAWG): integrated case-control analysis pipeline. Hum Immunol. 
2016;77: 283–287.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/amino_distance_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Get a pairwise 20x20 distance matrix for all pairs of amino acids — amino_distance_matrix","title":"Get a pairwise 20x20 distance matrix for all pairs of amino acids — amino_distance_matrix","text":"default, return amino acid distance matrix Grantham 1974 (doi:10.1126/science.185.4154.862).","code":""},{"path":"https://slowkow.github.io/hlabud/reference/amino_distance_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get a pairwise 20x20 distance matrix for all pairs of amino acids — amino_distance_matrix","text":"","code":"amino_distance_matrix(method = \"grantham\")"},{"path":"https://slowkow.github.io/hlabud/reference/amino_distance_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get a pairwise 20x20 distance matrix for all pairs of amino acids — amino_distance_matrix","text":"method \"grantham\" Grantham 1974 matrix \"uniform\" matrix ones non-diagonal.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/amino_distance_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get a pairwise 20x20 distance matrix for all pairs of amino acids — amino_distance_matrix","text":"20x20 symmetric matrix positive numbers zeros diagonal.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/amino_distance_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get a pairwise 20x20 distance matrix for all pairs of amino acids — amino_distance_matrix","text":"","code":"# By default, the Grantham 1974 matrix amino_distance_matrix(\"grantham\") #> A R N D C Q E G H I L K M F P S T W Y #> A 0 112 111 126 195 91 107 60 86 94 96 106 84 113 27 99 58 148 112 #> R 112 0 86 96 180 43 54 125 29 97 102 26 91 97 103 110 71 101 
77 #> N 111 86 0 23 139 46 42 80 68 149 153 94 142 158 91 46 65 174 143 #> D 126 96 23 0 154 61 45 94 81 168 172 101 160 177 108 65 85 181 160 #> C 195 180 139 154 0 154 170 159 174 198 198 202 196 205 169 112 149 215 194 #> Q 91 43 46 61 154 0 29 87 24 109 113 53 101 116 76 68 42 130 99 #> E 107 54 42 45 170 29 0 98 40 134 138 56 126 140 93 80 65 152 122 #> G 60 125 80 94 159 87 98 0 98 135 138 127 127 153 42 56 59 184 147 #> H 86 29 68 81 174 24 40 98 0 94 99 32 87 100 77 89 47 115 83 #> I 94 97 149 168 198 109 134 135 94 0 5 102 10 21 95 142 89 61 33 #> L 96 102 153 172 198 113 138 138 99 5 0 107 15 22 98 145 92 61 36 #> K 106 26 94 101 202 53 56 127 32 102 107 0 95 102 103 121 78 110 85 #> M 84 91 142 160 196 101 126 127 87 10 15 95 0 28 87 135 81 67 36 #> F 113 97 158 177 205 116 140 153 100 21 22 102 28 0 114 155 103 40 22 #> P 27 103 91 108 169 76 93 42 77 95 98 103 87 114 0 74 38 147 110 #> S 99 110 46 65 112 68 80 56 89 142 145 121 135 155 74 0 58 177 144 #> T 58 71 65 85 149 42 65 59 47 89 92 78 81 103 38 58 0 128 92 #> W 148 101 174 181 215 130 152 184 115 61 61 110 67 40 147 177 128 0 37 #> Y 112 77 143 160 194 99 122 147 83 33 36 85 36 22 110 144 92 37 0 #> V 64 96 133 152 192 96 121 109 84 29 32 97 21 50 68 124 69 88 55 #> V #> A 64 #> R 96 #> N 133 #> D 152 #> C 192 #> Q 96 #> E 121 #> G 109 #> H 84 #> I 29 #> L 32 #> K 97 #> M 21 #> F 50 #> P 68 #> S 124 #> T 69 #> W 88 #> Y 55 #> V 0 # All ones, and zeros on the diagonal amino_distance_matrix(\"uniform\") #> A R N D C Q E G H I L K M F P S T W Y V #> A 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 #> R 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 #> N 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 #> D 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 #> C 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 #> Q 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 #> E 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 #> G 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 #> H 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 #> I 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 #> L 1 1 
1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 #> K 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 #> M 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 #> F 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 #> P 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 #> S 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 #> T 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 #> W 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 #> Y 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 #> V 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0"},{"path":"https://slowkow.github.io/hlabud/reference/dosage.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert a set of genotype names into a dosage matrix of each residue at each position — dosage","title":"Convert a set of genotype names into a dosage matrix of each residue at each position — dosage","text":"genotype name, return dosage matrix residue (amino acid nucleotide) position.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/dosage.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert a set of genotype names into a dosage matrix of each residue at each position — dosage","text":"","code":"dosage( mat, names, drop_constants = TRUE, drop_duplicates = FALSE, verbose = FALSE )"},{"path":"https://slowkow.github.io/hlabud/reference/dosage.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert a set of genotype names into a dosage matrix of each residue at each position — dosage","text":"mat one-hot encoded matrix one row per allele one column residue (amino acid nucleotide) position. names Input character vector one genotype individual. entries must present rownames(mat). drop_constants Filter constant amino acid positions. TRUE default. drop_duplicates Filter duplicate amino acid positions. FALSE default. 
verbose TRUE, print messages along way.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/dosage.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert a set of genotype names into a dosage matrix of each residue at each position — dosage","text":"matrix one row input genotype, one column residue position.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/dosage.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Convert a set of genotype names into a dosage matrix of each residue at each position — dosage","text":"genotype represented like \"HLA-*01:01,HLA-*01:01\" default, returned matrix filtered exclude: positions input genotypes allele","code":""},{"path":"https://slowkow.github.io/hlabud/reference/dosage.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert a set of genotype names into a dosage matrix of each residue at each position — dosage","text":"","code":"DRB1_file <- file.path( \"https://github.com/ANHIG/IMGTHLA/raw\", \"5f2c562056f8ffa89aeea0631f2a52300ee0de17\", \"alignments/DRB1_prot.txt\" ) a <- read_alignments(DRB1_file) genotypes <- c( \"DRB1*12:02:02:03,DRB1*12:02:02:03,DRB1*14:54:02\", \"DRB1*04:174,DRB1*15:152\", \"DRB1*04:56:02,DRB1*15:01:48\", \"DRB1*14:172,DRB1*04:160\", \"DRB1*04:359,DRB1*04:284:02\" ) dosage <- dosage(a$onehot, genotypes) dosage[,1:5] #> n29unk Mn29 n28unk Vn28 n27unk #> DRB1*12:02:02:03,DRB1*12:02:02:03,DRB1*14:54:02 1 2 1 2 1 #> DRB1*04:174,DRB1*15:152 2 0 2 0 2 #> DRB1*04:56:02,DRB1*15:01:48 2 0 2 0 2 #> DRB1*14:172,DRB1*04:160 2 0 2 0 2 #> DRB1*04:359,DRB1*04:284:02 2 0 2 0 2"},{"path":"https://slowkow.github.io/hlabud/reference/get_hlabud_dir.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the name of the folder for caching downloaded IMGTHLA files — get_hlabud_dir","title":"Get the name of the folder for caching downloaded IMGTHLA files — 
get_hlabud_dir","text":"function : Get folder name getOption(\"hlabud_dir\") else automatically choose appropriate folder operating system thanks rappdirs. Create folder automatically already exist. Set hlabud_dir option new folder.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/get_hlabud_dir.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the name of the folder for caching downloaded IMGTHLA files — get_hlabud_dir","text":"","code":"get_hlabud_dir()"},{"path":"https://slowkow.github.io/hlabud/reference/get_hlabud_dir.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the name of the folder for caching downloaded IMGTHLA files — get_hlabud_dir","text":"name folder.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/get_hlabud_dir.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Get the name of the folder for caching downloaded IMGTHLA files — get_hlabud_dir","text":"locations hlabud_dir folder operating system. Linux: Mac: Windows: set hlabud_dir option, please use:","code":"~/.local/share/hlabud ~/Library/Application Support/hlabud C:\\Documents and Settings\\{User}\\Application Data\\slowkow\\hlabud options(hlabud_dir = \"/my/favorite/path\")"},{"path":"https://slowkow.github.io/hlabud/reference/get_hlabud_dir.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the name of the folder for caching downloaded IMGTHLA files — get_hlabud_dir","text":"","code":"if (FALSE) { hlabud_dir <- get_hlabud_dir() }"},{"path":"https://slowkow.github.io/hlabud/reference/get_onehot.html","id":null,"dir":"Reference","previous_headings":"","what":"Make a one-hot encoded matrix from a dataframe of amino acid sequences. — get_onehot","title":"Make a one-hot encoded matrix from a dataframe of amino acid sequences. 
— get_onehot","text":"Make one-hot encoded matrix dataframe amino acid sequences.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/get_onehot.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Make a one-hot encoded matrix from a dataframe of amino acid sequences. — get_onehot","text":"","code":"get_onehot(sequences, n_pre, verbose = FALSE)"},{"path":"https://slowkow.github.io/hlabud/reference/get_onehot.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Make a one-hot encoded matrix from a dataframe of amino acid sequences. — get_onehot","text":"n_pre number amino acid sequences position 1. verbose Print messages along way. al dataframe columns allele, seq","code":""},{"path":"https://slowkow.github.io/hlabud/reference/grantham.html","id":null,"dir":"Reference","previous_headings":"","what":"Table 1 from Grantham 1974 — grantham","title":"Table 1 from Grantham 1974 — grantham","text":"Grantham R. Amino Acid Difference Formula Help Explain Protein Evolution. Science. 1974;185: 862–864. 
doi:10.1126/science.185.4154.862","code":""},{"path":"https://slowkow.github.io/hlabud/reference/grantham.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Table 1 from Grantham 1974 — grantham","text":"","code":"grantham"},{"path":"https://slowkow.github.io/hlabud/reference/grantham.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Table 1 from Grantham 1974 — grantham","text":"data frame 20 rows 5 columns: amino Amino acid c Composition c, atomic weight ratio noncarbon elements end groups rings carbons side chain p Polarity p published data v Volume v published data","code":""},{"path":"https://slowkow.github.io/hlabud/reference/grantham.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Table 1 from Grantham 1974 — grantham","text":"","code":"grantham #> amino c p v #> 1 Ser 1.42 9.2 32.0 #> 2 Arg 0.65 10.5 124.0 #> 3 Leu 0.00 4.9 111.0 #> 4 Pro 0.39 8.0 32.5 #> 5 Thr 0.71 8.6 61.0 #> 6 Ala 0.00 8.1 31.0 #> 7 Val 0.00 5.9 84.0 #> 8 Gly 0.74 9.0 3.0 #> 9 Ile 0.00 5.2 111.0 #> 10 Phe 0.00 5.2 132.0 #> 11 Tyr 0.20 6.2 136.0 #> 12 Cys 2.75 5.5 55.0 #> 13 His 0.58 10.4 96.0 #> 14 Gln 0.89 10.5 85.0 #> 15 Asn 1.33 11.6 56.0 #> 16 Lys 0.33 11.3 119.0 #> 17 Asp 1.38 13.0 54.0 #> 18 Glu 0.92 12.3 83.0 #> 19 Met 0.00 5.7 105.0 #> 20 Trp 0.13 5.4 170.0"},{"path":"https://slowkow.github.io/hlabud/reference/hla_alignments.html","id":null,"dir":"Reference","previous_headings":"","what":"Get sequence alignments from IMGTHLA — hla_alignments","title":"Get sequence alignments from IMGTHLA — hla_alignments","text":"conventions used alignments (EBI IMGT-HLA help page): entry allele displayed respect reference sequences. identity reference sequence present base displayed hyphen (-). Non-identity reference sequence shown displaying appropriate base position. insertion deletion occurred represented period (.). sequence unknown point alignment, represented asterisk (*). 
protein alignments null alleles, 'Stop' codons represented hash (X). protein alignments, sequence following termination codon, marked appear blank. conventions used nucleotide protein alignments.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_alignments.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get sequence alignments from IMGTHLA — hla_alignments","text":"","code":"hla_alignments( gene = \"DRB1\", type = \"prot\", release = \"latest\", verbose = FALSE )"},{"path":"https://slowkow.github.io/hlabud/reference/hla_alignments.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get sequence alignments from IMGTHLA — hla_alignments","text":"gene name gene like \"DRB1\" type type sequence, one \"prot\", \"nuc\", \"gen\" release Default \"latest\". release name like \"3.51.0\". verbose TRUE, print messages along way.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_alignments.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get sequence alignments from IMGTHLA — hla_alignments","text":"list character vector called sequences two matrices called alleles onehot. character vector sequences one sequence allele, names allele names. matrix alleles one row allele, one column position, values representing residues position allele. 
matrix onehot one-hot encoding variants distinguish alleles, one row allele one column amino acid position.","code":""},{"path":[]},{"path":"https://slowkow.github.io/hlabud/reference/hla_alignments.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get sequence alignments from IMGTHLA — hla_alignments","text":"","code":"# \\donttest{ a <- hla_alignments(\"DRB1\") head(a$sequences) #> DRB1*01:01:01:01 #> \"MVCLKLPGGSCMTALTVTLMVLSSPLALAGDTRPRFLWQLKFECHFFNGTERVR.LLERCIYNQEE.SVRFDSDVGEYRAVTELGRPDAEYWNSQKDLLEQRRAAVDTYCRHNYGVGESFTVQRR.VEPKVTVYPSKTQPLQHHNLLVCSVSGFYPGSIEVRWFRNGQEEKAGVVSTGLIQNGDWTFQTLVMLETVPRSGEVYTCQVEHPSVTSPLTVEWRARSESAQSKMLSGVGGFVLGLLFLGAGLFIYFRNQKGHSGLQPTGFLS\" #> DRB1*01:01:01:02 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" #> DRB1*01:01:01:03 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" #> DRB1*01:01:01:04 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" #> DRB1*01:01:01:05 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" #> DRB1*01:01:01:06 #> 
\"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" a$alleles[1:6,1:6] #> n29 n28 n27 n26 n25 n24 #> DRB1*01:01:01:01 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" #> DRB1*01:01:01:02 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" #> DRB1*01:01:01:03 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" #> DRB1*01:01:01:04 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" #> DRB1*01:01:01:05 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" #> DRB1*01:01:01:06 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" a$onehot[1:6,1:6] #> n29unk Mn29 n28unk Ln28 Vn28 n27unk #> DRB1*01:01:01:01 0 1 0 0 1 0 #> DRB1*01:01:01:02 0 1 0 0 1 0 #> DRB1*01:01:01:03 0 1 0 0 1 0 #> DRB1*01:01:01:04 0 1 0 0 1 0 #> DRB1*01:01:01:05 0 1 0 0 1 0 #> DRB1*01:01:01:06 0 1 0 0 1 0 # }"},{"path":"https://slowkow.github.io/hlabud/reference/hla_alleles.html","id":null,"dir":"Reference","previous_headings":"","what":"Get a table of allele names for a particular IMGTHLA release — hla_alleles","title":"Get a table of allele names for a particular IMGTHLA release — hla_alleles","text":"Download list allele names HLA genes particular IMGTHLA release.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_alleles.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get a table of allele names for a particular IMGTHLA release — hla_alleles","text":"","code":"hla_alleles(release = \"latest\", overwrite = FALSE, verbose = FALSE)"},{"path":"https://slowkow.github.io/hlabud/reference/hla_alleles.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get a table of allele names for a particular IMGTHLA release — hla_alleles","text":"release Default \"latest\". release name like \"3.51.0\". 
overwrite Overwrite existing alleles.json file Allelelist.{version}.txt file verbose TRUE, print messages along way.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_alleles.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get a table of allele names for a particular IMGTHLA release — hla_alleles","text":"data frame HLA allele ids names","code":""},{"path":[]},{"path":"https://slowkow.github.io/hlabud/reference/hla_alleles.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get a table of allele names for a particular IMGTHLA release — hla_alleles","text":"","code":"# \\donttest{ head(hla_alleles()) #> AlleleID Allele #> 1 HLA00001 A*01:01:01:01 #> 2 HLA02169 A*01:01:01:02N #> 3 HLA14798 A*01:01:01:03 #> 4 HLA15760 A*01:01:01:04 #> 5 HLA16415 A*01:01:01:05 #> 6 HLA16417 A*01:01:01:06 # }"},{"path":"https://slowkow.github.io/hlabud/reference/hla_divergence.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate HLA divergence for each individual — hla_divergence","title":"Calculate HLA divergence for each individual — hla_divergence","text":"First, convert allele name (e.g. *01:01) amino acid sequence. divergence sum distances pair amino acids position, divided total sequence length. 
amino acid distance matrix use one published Grantham 1974 (doi:10.1126/science.185.4154.862), based three physical properties amino acids (composition, polarity, molecular volume) correlated estimate relative substitution frequency.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_divergence.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate HLA divergence for each individual — hla_divergence","text":"","code":"hla_divergence( alleles = c(\"A*01:01,A*02:01\"), method = \"grantham\", release = \"latest\", verbose = FALSE )"},{"path":"https://slowkow.github.io/hlabud/reference/hla_divergence.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate HLA divergence for each individual — hla_divergence","text":"alleles character vector comma-delimited alleles individual. usually expect two alleles per individual, possible (fewer) copies due copy number alterations. function still works individual different number alleles. method pairwise amino acid matrix, method name: \"grantham\" \"uniform\" indicate pairwise amino acid distance matrix use. choose pass matrix, 20x20 symmetric matrix zeros diagonal, rownames colnames one-letter amino acid codes A R N D C Q E G H I L K M F P S T W Y V. release Default \"latest\". release name like \"3.51.0\". 
verbose TRUE, print messages along way.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_divergence.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate HLA divergence for each individual — hla_divergence","text":"dataframe divergence individual.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_divergence.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculate HLA divergence for each individual — hla_divergence","text":"code function translation original Perl code Tobias Lenz, published Pierini & Lenz 2018 MolBiolEvol (https://doi.org/10.1093/molbev/msy116). comparing two amino acid sequences, characters one 20 amino acids considered divergence calculation, gaps (characters) count.","code":""},{"path":[]},{"path":"https://slowkow.github.io/hlabud/reference/hla_divergence.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate HLA divergence for each individual — hla_divergence","text":"","code":"my_genos <- c(\"A*23:01:12,A*24:550\", \"A*25:12N,A*11:27\", \"A*24:381,A*33:85\", \"A*01:01:,A*01:01,A*02:01\") hla_divergence(my_genos, method = \"grantham\") #> A*23:01:12,A*24:550 A*25:12N,A*11:27 A*24:381,A*33:85 #> 0.5131579 3.4736842 5.1078947 #> A*01:01:,A*01:01,A*02:01 #> 3.9982456 # This is equivalent hla_divergence(my_genos, method = amino_distance_matrix(\"grantham\")) #> A*23:01:12,A*24:550 A*25:12N,A*11:27 A*24:381,A*33:85 #> 0.5131579 3.4736842 5.1078947 #> A*01:01:,A*01:01,A*02:01 #> 3.9982456"},{"path":"https://slowkow.github.io/hlabud/reference/hla_frequencies.html","id":null,"dir":"Reference","previous_headings":"","what":"Get HLA frequencies from Allele Frequency Net Database (AFND) — hla_frequencies","title":"Get HLA frequencies from Allele Frequency Net Database (AFND) — hla_frequencies","text":"Download read table HLA allele frequencies Allele Frequency Net Database 
(AFND).","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_frequencies.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get HLA frequences from Allele Frequency Net Database (AFND) — hla_frequencies","text":"","code":"hla_frequencies(verbose = FALSE)"},{"path":"https://slowkow.github.io/hlabud/reference/hla_frequencies.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get HLA frequences from Allele Frequency Net Database (AFND) — hla_frequencies","text":"verbose TRUE, print messages along way.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_frequencies.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get HLA frequences from Allele Frequency Net Database (AFND) — hla_frequencies","text":"dataframe HLA allele frequencies genes.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_frequencies.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Get HLA frequences from Allele Frequency Net Database (AFND) — hla_frequencies","text":"use data, please cite latest manuscript Allele Frequency Net Database: Gonzalez-Galarza FF, McCabe , Santos EJMD, Jones J, Takeshita L, Ortega-Rivera ND, et al. Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data new query tools. Nucleic Acids Res. 2020;48: D783–D788. 
doi:10.1093/nar/gkz1029","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_frequencies.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get HLA frequences from Allele Frequency Net Database (AFND) — hla_frequencies","text":"","code":"# \\donttest{ hla_frequencies() #> # A tibble: 123,502 × 7 #> group gene allele population indivs_over_n alleles_over_2n n #> #> 1 hla A A*01:01 Argentina Rosario To… 15.1 0.076 86 #> 2 hla A A*01:01 Armenia combined Reg… NA 0.125 100 #> 3 hla A A*01:01 Australia Cape York … NA 0.053 103 #> 4 hla A A*01:01 Australia Groote Eyl… NA 0.027 75 #> 5 hla A A*01:01 Australia New South … NA 0.187 134 #> 6 hla A A*01:01 Australia Yuendumu A… NA 0.008 191 #> 7 hla A A*01:01 Austria 27 0.146 200 #> 8 hla A A*01:01 Azores Central Islan… NA 0.08 59 #> 9 hla A A*01:01 Azores Oriental Isla… NA 0.115 43 #> 10 hla A A*01:01 Azores Terceira Isla… NA 0.109 130 #> # ℹ 123,492 more rows # }"},{"path":"https://slowkow.github.io/hlabud/reference/hla_genes.html","id":null,"dir":"Reference","previous_headings":"","what":"Get HLA gene names from IMGTHLA — hla_genes","title":"Get HLA gene names from IMGTHLA — hla_genes","text":"Retrieve list txt files github.com/ANHIG/IMGTHLA/alignments return list gene names derived file names.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_genes.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get HLA gene names from IMGTHLA — hla_genes","text":"","code":"hla_genes(release = \"latest\", overwrite = FALSE, verbose = FALSE)"},{"path":"https://slowkow.github.io/hlabud/reference/hla_genes.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get HLA gene names from IMGTHLA — hla_genes","text":"release Default \"latest\". release name like \"3.51.0\". 
overwrite Overwrite existing genes.json file new one GitHub verbose TRUE, print messages along way.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_genes.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get HLA gene names from IMGTHLA — hla_genes","text":"tibble two columns: HLA gene names (\"\", \"DRB1\") types (\"nuc\", \"gen\", \"prot\").","code":""},{"path":[]},{"path":"https://slowkow.github.io/hlabud/reference/hla_genes.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get HLA gene names from IMGTHLA — hla_genes","text":"","code":"# \\donttest{ hla_genes() #> # A tibble: 107 × 2 #> gene type #> #> 1 A gen #> 2 A nuc #> 3 A prot #> 4 B gen #> 5 B nuc #> 6 B prot #> 7 C gen #> 8 C nuc #> 9 C prot #> 10 DMA gen #> # ℹ 97 more rows # }"},{"path":"https://slowkow.github.io/hlabud/reference/hla_releases.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the names of releases from IMGTHLA — hla_releases","title":"Get the names of releases from IMGTHLA — hla_releases","text":"Get tags github.com/ANHIG/IMGTHLA, save file called tags.json getOption(\"hlabud_dir\"), return release names file.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_releases.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the names of releases from IMGTHLA — hla_releases","text":"","code":"hla_releases(overwrite = FALSE)"},{"path":"https://slowkow.github.io/hlabud/reference/hla_releases.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the names of releases from IMGTHLA — hla_releases","text":"overwrite Overwrite existing tags.json file getOption(\"hlabud_dir\")","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_releases.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the names of releases from IMGTHLA — 
hla_releases","text":"character vector release names like \"3.51.0\"","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_releases.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Get the names of releases from IMGTHLA — hla_releases","text":"tags.json file automatically overwritten older 24 hours.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_releases.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the names of releases from IMGTHLA — hla_releases","text":"","code":"# \\donttest{ hla_releases() #> [1] \"3.56.0\" \"3.55.0\" \"3.54.0\" \"3.53.0\" \"3.52.0\" \"3.51.0\" #> [7] \"3.50.0\" \"3.49.0\" \"3.48.0\" \"3.47.0\" \"3.46.0\" \"3.45.1\" #> [13] \"3.45.01\" \"3.45.0.1\" \"3.45.0\" \"3.44.1\" \"3.44.0\" \"3.43.0\" #> [19] \"3.42.0\" \"3.41.2\" \"3.41.0\" \"3.40.0\" \"3.39.0\" \"3.38.0\" #> [25] \"3.37.0\" \"3.36.0\" \"3.35.0\" \"3.34.0\" \"3.33.0\" \"3.32.0\" # }"},{"path":"https://slowkow.github.io/hlabud/reference/hlabud-package.html","id":null,"dir":"Reference","previous_headings":"","what":"hlabud: Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA — hlabud-package","title":"hlabud: Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA — hlabud-package","text":"Fetch sequence alignment data IMGTHLA database Robinson et al (2020) doi:10.1093/nar/gkz950 , automatically convert sequence alignments convenient R matrices ready downstream analysis. vignette shows examples using one-hot encoding data logistic regression dimensionality reduction. 
Data downloaded lazily, -needed, cached user-configurable folder.","code":""},{"path":[]},{"path":"https://slowkow.github.io/hlabud/reference/hlabud-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"hlabud: Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA — hlabud-package","text":"Maintainer: Kamil Slowikowski kslowikowski@gmail.com (ORCID)","code":""},{"path":"https://slowkow.github.io/hlabud/reference/install_hla.html","id":null,"dir":"Reference","previous_headings":"","what":"Download and unpack a tarball release from IMGTHLA — install_hla","title":"Download and unpack a tarball release from IMGTHLA — install_hla","text":"release tarball Github unpacked getOption(\"hlabud_dir\") folder.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/install_hla.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Download and unpack a tarball release from IMGTHLA — install_hla","text":"","code":"install_hla(release = \"latest\", overwrite = FALSE, verbose = FALSE)"},{"path":"https://slowkow.github.io/hlabud/reference/install_hla.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Download and unpack a tarball release from IMGTHLA — install_hla","text":"release Default \"latest\". release name like \"3.51.0\". overwrite TRUE, overwrite existing files release folder. 
verbose TRUE, print messages along way.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/install_hla.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Download and unpack a tarball release from IMGTHLA — install_hla","text":"Note latest releases 100 MB size, download might take slow connections.","code":""},{"path":[]},{"path":"https://slowkow.github.io/hlabud/reference/install_hla.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Download and unpack a tarball release from IMGTHLA — install_hla","text":"","code":"if (FALSE) { install_hla() install_hla(\"3.51.0\") install_hla(\"3.51.0\", verbose = TRUE) # Change the install directory options(hlabud_dir = \"path/to/my/dir\") install_hla() }"},{"path":"https://slowkow.github.io/hlabud/reference/one_to_three.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert one letter amino acid codes to three letter amino acid codes — one_to_three","title":"Convert one letter amino acid codes to three letter amino acid codes — one_to_three","text":"Convert one letter amino acid codes three letter amino acid codes","code":""},{"path":"https://slowkow.github.io/hlabud/reference/one_to_three.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert one letter amino acid codes to three letter amino acid codes — one_to_three","text":"","code":"one_to_three(aminos)"},{"path":"https://slowkow.github.io/hlabud/reference/pipe.html","id":null,"dir":"Reference","previous_headings":"","what":"Pipe operator — %>%","title":"Pipe operator — %>%","text":"See magrittr::%>% details.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/pipe.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pipe operator — %>%","text":"lhs value magrittr placeholder. 
rhs function call using magrittr semantics.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/pipe.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Pipe operator — %>%","text":"result calling rhs(lhs).","code":""},{"path":"https://slowkow.github.io/hlabud/reference/read_alignments.html","id":null,"dir":"Reference","previous_headings":"","what":"Read an alignment file *_(nuc|gen|prot).txt from IMGTHLA — read_alignments","title":"Read an alignment file *_(nuc|gen|prot).txt from IMGTHLA — read_alignments","text":"function reads txt files provided IMGTHLA.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/read_alignments.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read an alignment file *_(nuc|gen|prot).txt from IMGTHLA — read_alignments","text":"","code":"read_alignments(file)"},{"path":"https://slowkow.github.io/hlabud/reference/read_alignments.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read an alignment file *_(nuc|gen|prot).txt from IMGTHLA — read_alignments","text":"file File name txt file IMGTHLA like \"DQB1_prot.txt\"","code":""},{"path":"https://slowkow.github.io/hlabud/reference/read_alignments.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read an alignment file *_(nuc|gen|prot).txt from IMGTHLA — read_alignments","text":"list character vector called sequences two matrices alleles onehot. matrix alleles one row allele, one column position, values representing residues position allele. 
matrix onehot one-hot encoding variants distinguish alleles, one row allele one column amino acid position.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/read_alignments.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Read an alignment file *_(nuc|gen|prot).txt from IMGTHLA — read_alignments","text":"Consider using hla_alignments() instead function. already txt file want read, can read read_alignments(\"myfile.txt\"). sequences contained file: {gene}_prot.txt amino acid sequence HLA allele. {gene}_nuc.txt nucleotide sequence exons. {gene}_gen.txt genomic sequence exons introns.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/read_alignments.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read an alignment file *_(nuc|gen|prot).txt from IMGTHLA — read_alignments","text":"","code":"my_file <- file.path( \"https://github.com/ANHIG/IMGTHLA/raw\", \"5f2c562056f8ffa89aeea0631f2a52300ee0de17\", \"alignments/DRB1_prot.txt\" ) a <- read_alignments(my_file) head(a$sequences) #> DRB1*01:01:01:01 #> \"MVCLKLPGGSCMTALTVTLMVLSSPLALAGDTRPRFLWQLKFECHFFNGTERVR.LLERCIYNQEE.SVRFDSDVGEYRAVTELGRPDAEYWNSQKDLLEQRRAAVDTYCRHNYGVGESFTVQRR.VEPKVTVYPSKTQPLQHHNLLVCSVSGFYPGSIEVRWFRNGQEEKAGVVSTGLIQNGDWTFQTLVMLETVPRSGEVYTCQVEHPSVTSPLTVEWRARSESAQSKMLSGVGGFVLGLLFLGAGLFIYFRNQKGHSGLQPTGFLS\" #> DRB1*01:01:01:02 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" #> DRB1*01:01:01:03 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" #> 
DRB1*01:01:01:04 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" #> DRB1*01:01:01:05 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" #> DRB1*01:01:01:06 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" a$alleles[1:5,1:5] #> n29 n28 n27 n26 n25 #> DRB1*01:01:01:01 \"M\" \"V\" \"C\" \"L\" \"K\" #> DRB1*01:01:01:02 \"M\" \"V\" \"C\" \"L\" \"K\" #> DRB1*01:01:01:03 \"M\" \"V\" \"C\" \"L\" \"K\" #> DRB1*01:01:01:04 \"M\" \"V\" \"C\" \"L\" \"K\" #> DRB1*01:01:01:05 \"M\" \"V\" \"C\" \"L\" \"K\" a$onehot[1:5,1:5] #> n29unk Mn29 n28unk Vn28 n27unk #> DRB1*01:01:01:01 0 1 0 1 0 #> DRB1*01:01:01:02 0 1 0 1 0 #> DRB1*01:01:01:03 0 1 0 1 0 #> DRB1*01:01:01:04 0 1 0 1 0 #> DRB1*01:01:01:05 0 1 0 1 0"},{"path":[]},{"path":"https://slowkow.github.io/hlabud/news/index.html","id":"bug-fixes-2-0-0","dir":"Changelog","previous_headings":"","what":"Bug fixes","title":"hlabud 2.0.0","text":"Fix incorrect position numbering, accounting insertions deletions indicated “.” character. Thanks Vinicius Stelet bringing attention issue #3. Instead discarding positions *, include label unk, example pos241_unk indicates unknown amino acid position 241. Thanks Sreekar Mantena reporting issue! Fix --one error. example, HLA-pos361_- colnames($onehot) reference allele instead -. now fixed. 
Thanks Sreekar Mantena reporting issue!","code":""},{"path":"https://slowkow.github.io/hlabud/news/index.html","id":"changes-2-0-0","dir":"Changelog","previous_headings":"","what":"Changes","title":"hlabud 2.0.0","text":"Change position names pos21_D D21. negative, posn21_D Dn21. Change dosage() take one-hot matrix first argument. Change dosage() return full allele names IMGT matching partial allele names like DRB1*03 DRB1*03:01. show messages indicating alleles matched verbose=TRUE. Automatically overwrite {hlabud_dir}/alleles.json older 24 hours. Automatically overwrite {hlabud_dir}/tags.json older 24 hours.","code":""},{"path":"https://slowkow.github.io/hlabud/news/index.html","id":"hlabud-100","dir":"Changelog","previous_headings":"","what":"hlabud 1.0.0","title":"hlabud 1.0.0","text":"Initial release. Added NEWS.md file track changes package.","code":""}] +[{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"GNU General Public License","title":"GNU General Public License","text":"Version 3, 29 June 2007Copyright © 2007 Free Software Foundation, Inc.  Everyone permitted copy distribute verbatim copies license document, changing allowed.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"preamble","dir":"","previous_headings":"","what":"Preamble","title":"GNU General Public License","text":"GNU General Public License free, copyleft license software kinds works. licenses software practical works designed take away freedom share change works. contrast, GNU General Public License intended guarantee freedom share change versions program–make sure remains free software users. , Free Software Foundation, use GNU General Public License software; applies also work released way authors. can apply programs, . speak free software, referring freedom, price. 
General Public Licenses designed make sure freedom distribute copies free software (charge wish), receive source code can get want , can change software use pieces new free programs, know can things. protect rights, need prevent others denying rights asking surrender rights. Therefore, certain responsibilities distribute copies software, modify : responsibilities respect freedom others. example, distribute copies program, whether gratis fee, must pass recipients freedoms received. must make sure , , receive can get source code. must show terms know rights. Developers use GNU GPL protect rights two steps: (1) assert copyright software, (2) offer License giving legal permission copy, distribute /modify . developers’ authors’ protection, GPL clearly explains warranty free software. users’ authors’ sake, GPL requires modified versions marked changed, problems attributed erroneously authors previous versions. devices designed deny users access install run modified versions software inside , although manufacturer can . fundamentally incompatible aim protecting users’ freedom change software. systematic pattern abuse occurs area products individuals use, precisely unacceptable. Therefore, designed version GPL prohibit practice products. problems arise substantially domains, stand ready extend provision domains future versions GPL, needed protect freedom users. Finally, every program threatened constantly software patents. States allow patents restrict development use software general-purpose computers, , wish avoid special danger patents applied free program make effectively proprietary. prevent , GPL assures patents used render program non-free. precise terms conditions copying, distribution modification follow.","code":""},{"path":[]},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_0-definitions","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"0. 
Definitions","title":"GNU General Public License","text":"“License” refers version 3 GNU General Public License. “Copyright” also means copyright-like laws apply kinds works, semiconductor masks. “Program” refers copyrightable work licensed License. licensee addressed “”. “Licensees” “recipients” may individuals organizations. “modify” work means copy adapt part work fashion requiring copyright permission, making exact copy. resulting work called “modified version” earlier work work “based ” earlier work. “covered work” means either unmodified Program work based Program. “propagate” work means anything , without permission, make directly secondarily liable infringement applicable copyright law, except executing computer modifying private copy. Propagation includes copying, distribution (without modification), making available public, countries activities well. “convey” work means kind propagation enables parties make receive copies. Mere interaction user computer network, transfer copy, conveying. interactive user interface displays “Appropriate Legal Notices” extent includes convenient prominently visible feature (1) displays appropriate copyright notice, (2) tells user warranty work (except extent warranties provided), licensees may convey work License, view copy License. interface presents list user commands options, menu, prominent item list meets criterion.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_1-source-code","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"1. Source Code","title":"GNU General Public License","text":"“source code” work means preferred form work making modifications . “Object code” means non-source form work. “Standard Interface” means interface either official standard defined recognized standards body, , case interfaces specified particular programming language, one widely used among developers working language. 
“System Libraries” executable work include anything, work whole, () included normal form packaging Major Component, part Major Component, (b) serves enable use work Major Component, implement Standard Interface implementation available public source code form. “Major Component”, context, means major essential component (kernel, window system, ) specific operating system () executable work runs, compiler used produce work, object code interpreter used run . “Corresponding Source” work object code form means source code needed generate, install, (executable work) run object code modify work, including scripts control activities. However, include work’s System Libraries, general-purpose tools generally available free programs used unmodified performing activities part work. example, Corresponding Source includes interface definition files associated source files work, source code shared libraries dynamically linked subprograms work specifically designed require, intimate data communication control flow subprograms parts work. Corresponding Source need include anything users can regenerate automatically parts Corresponding Source. Corresponding Source work source code form work.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_2-basic-permissions","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"2. Basic Permissions","title":"GNU General Public License","text":"rights granted License granted term copyright Program, irrevocable provided stated conditions met. License explicitly affirms unlimited permission run unmodified Program. output running covered work covered License output, given content, constitutes covered work. License acknowledges rights fair use equivalent, provided copyright law. may make, run propagate covered works convey, without conditions long license otherwise remains force. 
may convey covered works others sole purpose make modifications exclusively , provide facilities running works, provided comply terms License conveying material control copyright. thus making running covered works must exclusively behalf, direction control, terms prohibit making copies copyrighted material outside relationship . Conveying circumstances permitted solely conditions stated . Sublicensing allowed; section 10 makes unnecessary.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_3-protecting-users-legal-rights-from-anti-circumvention-law","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"3. Protecting Users’ Legal Rights From Anti-Circumvention Law","title":"GNU General Public License","text":"covered work shall deemed part effective technological measure applicable law fulfilling obligations article 11 WIPO copyright treaty adopted 20 December 1996, similar laws prohibiting restricting circumvention measures. convey covered work, waive legal power forbid circumvention technological measures extent circumvention effected exercising rights License respect covered work, disclaim intention limit operation modification work means enforcing, work’s users, third parties’ legal rights forbid circumvention technological measures.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_4-conveying-verbatim-copies","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"4. Conveying Verbatim Copies","title":"GNU General Public License","text":"may convey verbatim copies Program’s source code receive , medium, provided conspicuously appropriately publish copy appropriate copyright notice; keep intact notices stating License non-permissive terms added accord section 7 apply code; keep intact notices absence warranty; give recipients copy License along Program. 
may charge price price copy convey, may offer support warranty protection fee.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_5-conveying-modified-source-versions","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"5. Conveying Modified Source Versions","title":"GNU General Public License","text":"may convey work based Program, modifications produce Program, form source code terms section 4, provided also meet conditions: ) work must carry prominent notices stating modified , giving relevant date. b) work must carry prominent notices stating released License conditions added section 7. requirement modifies requirement section 4 “keep intact notices”. c) must license entire work, whole, License anyone comes possession copy. License therefore apply, along applicable section 7 additional terms, whole work, parts, regardless packaged. License gives permission license work way, invalidate permission separately received . d) work interactive user interfaces, must display Appropriate Legal Notices; however, Program interactive interfaces display Appropriate Legal Notices, work need make . compilation covered work separate independent works, nature extensions covered work, combined form larger program, volume storage distribution medium, called “aggregate” compilation resulting copyright used limit access legal rights compilation’s users beyond individual works permit. Inclusion covered work aggregate cause License apply parts aggregate.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_6-conveying-non-source-forms","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"6. 
Conveying Non-Source Forms","title":"GNU General Public License","text":"may convey covered work object code form terms sections 4 5, provided also convey machine-readable Corresponding Source terms License, one ways: ) Convey object code , embodied , physical product (including physical distribution medium), accompanied Corresponding Source fixed durable physical medium customarily used software interchange. b) Convey object code , embodied , physical product (including physical distribution medium), accompanied written offer, valid least three years valid long offer spare parts customer support product model, give anyone possesses object code either (1) copy Corresponding Source software product covered License, durable physical medium customarily used software interchange, price reasonable cost physically performing conveying source, (2) access copy Corresponding Source network server charge. c) Convey individual copies object code copy written offer provide Corresponding Source. alternative allowed occasionally noncommercially, received object code offer, accord subsection 6b. d) Convey object code offering access designated place (gratis charge), offer equivalent access Corresponding Source way place charge. need require recipients copy Corresponding Source along object code. place copy object code network server, Corresponding Source may different server (operated third party) supports equivalent copying facilities, provided maintain clear directions next object code saying find Corresponding Source. Regardless server hosts Corresponding Source, remain obligated ensure available long needed satisfy requirements. e) Convey object code using peer--peer transmission, provided inform peers object code Corresponding Source work offered general public charge subsection 6d. separable portion object code, whose source code excluded Corresponding Source System Library, need included conveying object code work. 
“User Product” either (1) “consumer product”, means tangible personal property normally used personal, family, household purposes, (2) anything designed sold incorporation dwelling. determining whether product consumer product, doubtful cases shall resolved favor coverage. particular product received particular user, “normally used” refers typical common use class product, regardless status particular user way particular user actually uses, expects expected use, product. product consumer product regardless whether product substantial commercial, industrial non-consumer uses, unless uses represent significant mode use product. “Installation Information” User Product means methods, procedures, authorization keys, information required install execute modified versions covered work User Product modified version Corresponding Source. information must suffice ensure continued functioning modified object code case prevented interfered solely modification made. convey object code work section , , specifically use , User Product, conveying occurs part transaction right possession use User Product transferred recipient perpetuity fixed term (regardless transaction characterized), Corresponding Source conveyed section must accompanied Installation Information. requirement apply neither third party retains ability install modified object code User Product (example, work installed ROM). requirement provide Installation Information include requirement continue provide support service, warranty, updates work modified installed recipient, User Product modified installed. Access network may denied modification materially adversely affects operation network violates rules protocols communication across network. 
Corresponding Source conveyed, Installation Information provided, accord section must format publicly documented (implementation available public source code form), must require special password key unpacking, reading copying.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_7-additional-terms","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"7. Additional Terms","title":"GNU General Public License","text":"“Additional permissions” terms supplement terms License making exceptions one conditions. Additional permissions applicable entire Program shall treated though included License, extent valid applicable law. additional permissions apply part Program, part may used separately permissions, entire Program remains governed License without regard additional permissions. convey copy covered work, may option remove additional permissions copy, part . (Additional permissions may written require removal certain cases modify work.) may place additional permissions material, added covered work, can give appropriate copyright permission. Notwithstanding provision License, material add covered work, may (authorized copyright holders material) supplement terms License terms: ) Disclaiming warranty limiting liability differently terms sections 15 16 License; b) Requiring preservation specified reasonable legal notices author attributions material Appropriate Legal Notices displayed works containing ; c) Prohibiting misrepresentation origin material, requiring modified versions material marked reasonable ways different original version; d) Limiting use publicity purposes names licensors authors material; e) Declining grant rights trademark law use trade names, trademarks, service marks; f) Requiring indemnification licensors authors material anyone conveys material (modified versions ) contractual assumptions liability recipient, liability contractual assumptions directly impose licensors authors. 
non-permissive additional terms considered “restrictions” within meaning section 10. Program received , part , contains notice stating governed License along term restriction, may remove term. license document contains restriction permits relicensing conveying License, may add covered work material governed terms license document, provided restriction survive relicensing conveying. add terms covered work accord section, must place, relevant source files, statement additional terms apply files, notice indicating find applicable terms. Additional terms, permissive non-permissive, may stated form separately written license, stated exceptions; requirements apply either way.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_8-termination","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"8. Termination","title":"GNU General Public License","text":"may propagate modify covered work except expressly provided License. attempt otherwise propagate modify void, automatically terminate rights License (including patent licenses granted third paragraph section 11). However, cease violation License, license particular copyright holder reinstated () provisionally, unless copyright holder explicitly finally terminates license, (b) permanently, copyright holder fails notify violation reasonable means prior 60 days cessation. Moreover, license particular copyright holder reinstated permanently copyright holder notifies violation reasonable means, first time received notice violation License (work) copyright holder, cure violation prior 30 days receipt notice. Termination rights section terminate licenses parties received copies rights License. rights terminated permanently reinstated, qualify receive new licenses material section 10.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_9-acceptance-not-required-for-having-copies","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"9. 
Acceptance Not Required for Having Copies","title":"GNU General Public License","text":"required accept License order receive run copy Program. Ancillary propagation covered work occurring solely consequence using peer--peer transmission receive copy likewise require acceptance. However, nothing License grants permission propagate modify covered work. actions infringe copyright accept License. Therefore, modifying propagating covered work, indicate acceptance License .","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_10-automatic-licensing-of-downstream-recipients","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"10. Automatic Licensing of Downstream Recipients","title":"GNU General Public License","text":"time convey covered work, recipient automatically receives license original licensors, run, modify propagate work, subject License. responsible enforcing compliance third parties License. “entity transaction” transaction transferring control organization, substantially assets one, subdividing organization, merging organizations. propagation covered work results entity transaction, party transaction receives copy work also receives whatever licenses work party’s predecessor interest give previous paragraph, plus right possession Corresponding Source work predecessor interest, predecessor can get reasonable efforts. may impose restrictions exercise rights granted affirmed License. example, may impose license fee, royalty, charge exercise rights granted License, may initiate litigation (including cross-claim counterclaim lawsuit) alleging patent claim infringed making, using, selling, offering sale, importing Program portion .","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_11-patents","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"11. Patents","title":"GNU General Public License","text":"“contributor” copyright holder authorizes use License Program work Program based. 
work thus licensed called contributor’s “contributor version”. contributor’s “essential patent claims” patent claims owned controlled contributor, whether already acquired hereafter acquired, infringed manner, permitted License, making, using, selling contributor version, include claims infringed consequence modification contributor version. purposes definition, “control” includes right grant patent sublicenses manner consistent requirements License. contributor grants non-exclusive, worldwide, royalty-free patent license contributor’s essential patent claims, make, use, sell, offer sale, import otherwise run, modify propagate contents contributor version. following three paragraphs, “patent license” express agreement commitment, however denominated, enforce patent (express permission practice patent covenant sue patent infringement). “grant” patent license party means make agreement commitment enforce patent party. convey covered work, knowingly relying patent license, Corresponding Source work available anyone copy, free charge terms License, publicly available network server readily accessible means, must either (1) cause Corresponding Source available, (2) arrange deprive benefit patent license particular work, (3) arrange, manner consistent requirements License, extend patent license downstream recipients. “Knowingly relying” means actual knowledge , patent license, conveying covered work country, recipient’s use covered work country, infringe one identifiable patents country reason believe valid. , pursuant connection single transaction arrangement, convey, propagate procuring conveyance , covered work, grant patent license parties receiving covered work authorizing use, propagate, modify convey specific copy covered work, patent license grant automatically extended recipients covered work works based . patent license “discriminatory” include within scope coverage, prohibits exercise , conditioned non-exercise one rights specifically granted License. 
may convey covered work party arrangement third party business distributing software, make payment third party based extent activity conveying work, third party grants, parties receive covered work , discriminatory patent license () connection copies covered work conveyed (copies made copies), (b) primarily connection specific products compilations contain covered work, unless entered arrangement, patent license granted, prior 28 March 2007. Nothing License shall construed excluding limiting implied license defenses infringement may otherwise available applicable patent law.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_12-no-surrender-of-others-freedom","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"12. No Surrender of Others’ Freedom","title":"GNU General Public License","text":"conditions imposed (whether court order, agreement otherwise) contradict conditions License, excuse conditions License. convey covered work satisfy simultaneously obligations License pertinent obligations, consequence may convey . example, agree terms obligate collect royalty conveying convey Program, way satisfy terms License refrain entirely conveying Program.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_13-use-with-the-gnu-affero-general-public-license","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"13. Use with the GNU Affero General Public License","title":"GNU General Public License","text":"Notwithstanding provision License, permission link combine covered work work licensed version 3 GNU Affero General Public License single combined work, convey resulting work. 
terms License continue apply part covered work, special requirements GNU Affero General Public License, section 13, concerning interaction network apply combination .","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_14-revised-versions-of-this-license","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"14. Revised Versions of this License","title":"GNU General Public License","text":"Free Software Foundation may publish revised /new versions GNU General Public License time time. new versions similar spirit present version, may differ detail address new problems concerns. version given distinguishing version number. Program specifies certain numbered version GNU General Public License “later version” applies , option following terms conditions either numbered version later version published Free Software Foundation. Program specify version number GNU General Public License, may choose version ever published Free Software Foundation. Program specifies proxy can decide future versions GNU General Public License can used, proxy’s public statement acceptance version permanently authorizes choose version Program. Later license versions may give additional different permissions. However, additional obligations imposed author copyright holder result choosing follow later version.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_15-disclaimer-of-warranty","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"15. Disclaimer of Warranty","title":"GNU General Public License","text":"WARRANTY PROGRAM, EXTENT PERMITTED APPLICABLE LAW. EXCEPT OTHERWISE STATED WRITING COPYRIGHT HOLDERS /PARTIES PROVIDE PROGRAM “” WITHOUT WARRANTY KIND, EITHER EXPRESSED IMPLIED, INCLUDING, LIMITED , IMPLIED WARRANTIES MERCHANTABILITY FITNESS PARTICULAR PURPOSE. ENTIRE RISK QUALITY PERFORMANCE PROGRAM . 
PROGRAM PROVE DEFECTIVE, ASSUME COST NECESSARY SERVICING, REPAIR CORRECTION.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_16-limitation-of-liability","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"16. Limitation of Liability","title":"GNU General Public License","text":"EVENT UNLESS REQUIRED APPLICABLE LAW AGREED WRITING COPYRIGHT HOLDER, PARTY MODIFIES /CONVEYS PROGRAM PERMITTED , LIABLE DAMAGES, INCLUDING GENERAL, SPECIAL, INCIDENTAL CONSEQUENTIAL DAMAGES ARISING USE INABILITY USE PROGRAM (INCLUDING LIMITED LOSS DATA DATA RENDERED INACCURATE LOSSES SUSTAINED THIRD PARTIES FAILURE PROGRAM OPERATE PROGRAMS), EVEN HOLDER PARTY ADVISED POSSIBILITY DAMAGES.","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"id_17-interpretation-of-sections-15-and-16","dir":"","previous_headings":"TERMS AND CONDITIONS","what":"17. Interpretation of Sections 15 and 16","title":"GNU General Public License","text":"disclaimer warranty limitation liability provided given local legal effect according terms, reviewing courts shall apply local law closely approximates absolute waiver civil liability connection Program, unless warranty assumption liability accompanies copy Program return fee. END TERMS CONDITIONS","code":""},{"path":"https://slowkow.github.io/hlabud/LICENSE.html","id":"how-to-apply-these-terms-to-your-new-programs","dir":"","previous_headings":"","what":"How to Apply These Terms to Your New Programs","title":"GNU General Public License","text":"develop new program, want greatest possible use public, best way achieve make free software everyone can redistribute change terms. , attach following notices program. safest attach start source file effectively state exclusion warranty; file least “copyright” line pointer full notice found. Also add information contact electronic paper mail. 
program terminal interaction, make output short notice like starts interactive mode: hypothetical commands show w show c show appropriate parts General Public License. course, program’s commands might different; GUI interface, use “box”. also get employer (work programmer) school, , sign “copyright disclaimer” program, necessary. information , apply follow GNU GPL, see . GNU General Public License permit incorporating program proprietary programs. program subroutine library, may consider useful permit linking proprietary applications library. want , use GNU Lesser General Public License instead License. first, please read .","code":" Copyright (C) This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . Copyright (C) This program comes with ABSOLUTELY NO WARRANTY; for details type 'show w'. This is free software, and you are welcome to redistribute it under certain conditions; type 'show c' for details."},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"hlabud usage examples","text":"Kamil Slowikowski 2024-05-15 hlabud R package provides functions facilitate download analysis human leukocyte antigen (HLA) genotype sequence alignments IMGTHLA R. Let’s consider question might want answer HLA genotypes. amino acid positions different two genotypes? 
nucleotides different?","code":"library(hlabud) a <- hla_alignments(\"DRB1\") a$release #> [1] \"3.56.0\" dosage(a$onehot, c(\"DRB1*03:01:05\", \"DRB1*03:02:03\")) #> F26 Y26 D28 E28 F47 Y47 G86 V86 #> DRB1*03:01:05 0 1 1 0 1 0 0 1 #> DRB1*03:02:03 1 0 0 1 0 1 1 0 n <- hla_alignments(\"DRB1\", type = \"nuc\") n$release #> [1] \"3.56.0\" dosage(n$onehot, c(\"DRB1*03:01:05\", \"DRB1*03:02:03\")) #> A164 T164 C171 G171 A227 T227 A240 G240 G344 T344 G345 T345 A357 #> DRB1*03:01:05 1 0 1 0 0 1 1 0 0 1 1 0 1 #> DRB1*03:02:03 0 1 0 1 1 0 0 1 1 0 0 1 0 #> G357 #> DRB1*03:01:05 0 #> DRB1*03:02:03 1"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"installation","dir":"Articles","previous_headings":"","what":"Installation","title":"hlabud usage examples","text":"quickest way get hlabud install GitHub: , included usage examples. hope inspire share HLA analyses. source code page available . Thank reporting issues hlabud.","code":"# install.packages(\"devtools\") devtools::install_github(\"slowkow/hlabud\")"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"get-a-one-hot-encoded-matrix-for-all-hla-drb1-alleles","dir":"Articles","previous_headings":"","what":"Get a one-hot encoded matrix for all HLA-DRB1 alleles","title":"hlabud usage examples","text":"can use hla_alignments(\"DRB1\") load DRB1_prot.txt file latest IMGTHLA release: object list three items: $sequences amino acid sequence alignments named character vector: conventions used alignments (copied EBI help page): entry allele displayed respect reference sequences. identity reference sequence present base displayed hyphen (-). Non-identity reference sequence shown displaying appropriate base position. insertion deletion occurred represented period (.). sequence unknown point alignment, represented asterisk (*). protein alignments null alleles, ‘Stop’ codons represented hash (X). protein alignments, sequence following termination codon, marked appear blank. 
conventions used nucleotide protein alignments. $alleles matrix amino acids one column position: $onehot one-hot encoded matrix one column amino acid position: one-hot encoded matrix? simple example demonstrate idea:","code":"library(hlabud) a <- hla_alignments(gene = \"DRB1\", verbose = TRUE) #> Reading /home/runner/.local/share/hlabud/3.56.0/alignments/DRB1_prot.txt str(a) #> List of 7 #> $ sequences: Named chr [1:3671] \"MVCLKLPGGSCMTALTVTLMVLSSPLALAGDTRPRFLWQLKFECHFFNGTERVR.LLERCIYNQEE.SVRFDSDVGEYRAVTELGRPDAEYWNSQKDLLEQRRAAVDTYCR\"| __truncated__ \"------------------------------------------------------.-----------.--------------------------------------------\"| __truncated__ \"------------------------------------------------------.-----------.--------------------------------------------\"| __truncated__ \"------------------------------------------------------.-----------.--------------------------------------------\"| __truncated__ ... #> ..- attr(*, \"names\")= chr [1:3671] \"DRB1*01:01:01:01\" \"DRB1*01:01:01:02\" \"DRB1*01:01:01:03\" \"DRB1*01:01:01:04\" ... #> $ alleles : chr [1:3671, 1:288] \"M\" \"M\" \"M\" \"M\" ... #> ..- attr(*, \"dimnames\")=List of 2 #> .. ..$ : chr [1:3671] \"DRB1*01:01:01:01\" \"DRB1*01:01:01:02\" \"DRB1*01:01:01:03\" \"DRB1*01:01:01:04\" ... #> .. ..$ : chr [1:288] \"n29\" \"n28\" \"n27\" \"n26\" ... #> $ onehot : num [1:3671, 1:1658] 0 0 0 0 0 0 0 0 0 0 ... #> ..- attr(*, \"dimnames\")=List of 2 #> .. ..$ : chr [1:3671] \"DRB1*01:01:01:01\" \"DRB1*01:01:01:02\" \"DRB1*01:01:01:03\" \"DRB1*01:01:01:04\" ... #> .. ..$ : chr [1:1658] \"n29unk\" \"Mn29\" \"n28unk\" \"Ln28\" ... 
#> $ gene : chr \"DRB1\" #> $ type : chr \"prot\" #> $ release : chr \"3.56.0\" #> $ file : chr \"/home/runner/.local/share/hlabud/3.56.0/alignments/DRB1_prot.txt\" substr(head(a$sequences, 6), 1, 50) #> DRB1*01:01:01:01 #> \"MVCLKLPGGSCMTALTVTLMVLSSPLALAGDTRPRFLWQLKFECHFFNGT\" #> DRB1*01:01:01:02 #> \"--------------------------------------------------\" #> DRB1*01:01:01:03 #> \"--------------------------------------------------\" #> DRB1*01:01:01:04 #> \"--------------------------------------------------\" #> DRB1*01:01:01:05 #> \"--------------------------------------------------\" #> DRB1*01:01:01:06 #> \"--------------------------------------------------\" a$alleles[1:5,1:40] #> n29 n28 n27 n26 n25 n24 n23 n22 n21 n20 n19 n18 n17 n16 n15 #> DRB1*01:01:01:01 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" \"P\" \"G\" \"G\" \"S\" \"C\" \"M\" \"T\" \"A\" \"L\" #> DRB1*01:01:01:02 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" \"P\" \"G\" \"G\" \"S\" \"C\" \"M\" \"T\" \"A\" \"L\" #> DRB1*01:01:01:03 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" \"P\" \"G\" \"G\" \"S\" \"C\" \"M\" \"T\" \"A\" \"L\" #> DRB1*01:01:01:04 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" \"P\" \"G\" \"G\" \"S\" \"C\" \"M\" \"T\" \"A\" \"L\" #> DRB1*01:01:01:05 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" \"P\" \"G\" \"G\" \"S\" \"C\" \"M\" \"T\" \"A\" \"L\" #> n14 n13 n12 n11 n10 n9 n8 n7 n6 n5 n4 n3 n2 n1 1 #> DRB1*01:01:01:01 \"T\" \"V\" \"T\" \"L\" \"M\" \"V\" \"L\" \"S\" \"S\" \"P\" \"L\" \"A\" \"L\" \"A\" \"G\" #> DRB1*01:01:01:02 \"T\" \"V\" \"T\" \"L\" \"M\" \"V\" \"L\" \"S\" \"S\" \"P\" \"L\" \"A\" \"L\" \"A\" \"G\" #> DRB1*01:01:01:03 \"T\" \"V\" \"T\" \"L\" \"M\" \"V\" \"L\" \"S\" \"S\" \"P\" \"L\" \"A\" \"L\" \"A\" \"G\" #> DRB1*01:01:01:04 \"T\" \"V\" \"T\" \"L\" \"M\" \"V\" \"L\" \"S\" \"S\" \"P\" \"L\" \"A\" \"L\" \"A\" \"G\" #> DRB1*01:01:01:05 \"T\" \"V\" \"T\" \"L\" \"M\" \"V\" \"L\" \"S\" \"S\" \"P\" \"L\" \"A\" \"L\" \"A\" \"G\" #> 2 3 4 5 6 7 8 9 10 11 #> DRB1*01:01:01:01 \"D\" \"T\" \"R\" \"P\" \"R\" \"F\" \"L\" \"W\" 
\"Q\" \"L\" #> DRB1*01:01:01:02 \"D\" \"T\" \"R\" \"P\" \"R\" \"F\" \"L\" \"W\" \"Q\" \"L\" #> DRB1*01:01:01:03 \"D\" \"T\" \"R\" \"P\" \"R\" \"F\" \"L\" \"W\" \"Q\" \"L\" #> DRB1*01:01:01:04 \"D\" \"T\" \"R\" \"P\" \"R\" \"F\" \"L\" \"W\" \"Q\" \"L\" #> DRB1*01:01:01:05 \"D\" \"T\" \"R\" \"P\" \"R\" \"F\" \"L\" \"W\" \"Q\" \"L\" a$onehot[1:5,1:25] #> n29unk Mn29 n28unk Ln28 Vn28 n27unk Cn27 n26unk Ln26 n25unk #> DRB1*01:01:01:01 0 1 0 0 1 0 1 0 1 0 #> DRB1*01:01:01:02 0 1 0 0 1 0 1 0 1 0 #> DRB1*01:01:01:03 0 1 0 0 1 0 1 0 1 0 #> DRB1*01:01:01:04 0 1 0 0 1 0 1 0 1 0 #> DRB1*01:01:01:05 0 1 0 0 1 0 1 0 1 0 #> Kn25 Rn25 n24unk Fn24 Ln24 n23unk Pn23 n22unk Gn22 n21unk Cn21 #> DRB1*01:01:01:01 1 0 0 0 1 0 1 0 1 0 0 #> DRB1*01:01:01:02 1 0 0 0 1 0 1 0 1 0 0 #> DRB1*01:01:01:03 1 0 0 0 1 0 1 0 1 0 0 #> DRB1*01:01:01:04 1 0 0 0 1 0 1 0 1 0 0 #> DRB1*01:01:01:05 1 0 0 0 1 0 1 0 1 0 0 #> Gn21 n20unk Sn20 n19unk #> DRB1*01:01:01:01 1 0 1 0 #> DRB1*01:01:01:02 1 0 1 0 #> DRB1*01:01:01:03 1 0 1 0 #> DRB1*01:01:01:04 1 0 1 0 #> DRB1*01:01:01:05 1 0 1 0 dat <- data.frame( V1 = c(\"A\", \"A\", \"B\"), V2 = c(\"B\", \"B\", \"B\"), V3 = c(\"C\", \"B\", \"B\"), stringsAsFactors = TRUE ) dat #> V1 V2 V3 #> 1 A B C #> 2 A B B #> 3 B B B predict(onehot::onehot(dat), dat) #> V1=A V1=B V2=B V3=B V3=C #> [1,] 1 0 1 0 1 #> [2,] 1 0 1 1 0 #> [3,] 0 1 1 1 0"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"convert-genotypes-to-a-dosage-matrix","dir":"Articles","previous_headings":"","what":"Convert genotypes to a dosage matrix","title":"hlabud usage examples","text":"Suppose individuals following genotypes: want run association test amino acid positions, need convert genotype names matrix allele dosages (e.g., 0, 1, 2). can use dosage() convert individual’s genotypes amino acid dosages: Note: dosage matrix one row individual one column amino acid position. default, dosage() discard columns individuals identical. input allele names truncated 4-digits 2-digits (e.g. 
DRB1*03:01 DRB1*03), hlabud pick first allele matches input allele (e.g. DRB1*03:01:01:01). want specific allele, need provide full allele name input. Please careful check data looks way expect!","code":"genotypes <- c( \"DRB1*12:02:02:03,DRB1*12:02:02:03\", \"DRB1*04:174,DRB1*15:152\", \"DRB1*04:56:02,DRB1*15:01:48\", \"DRB1*14:172,DRB1*04:160\", \"DRB1*04:359,DRB1*04:284:02\" ) dosage <- dosage(a$onehot, genotypes) dosage[,1:8] #> n29unk Mn29 n28unk Vn28 n27unk Cn27 n26unk #> DRB1*12:02:02:03,DRB1*12:02:02:03 0 2 0 2 0 2 0 #> DRB1*04:174,DRB1*15:152 2 0 2 0 2 0 2 #> DRB1*04:56:02,DRB1*15:01:48 2 0 2 0 2 0 2 #> DRB1*14:172,DRB1*04:160 2 0 2 0 2 0 2 #> DRB1*04:359,DRB1*04:284:02 2 0 2 0 2 0 2 #> Ln26 #> DRB1*12:02:02:03,DRB1*12:02:02:03 2 #> DRB1*04:174,DRB1*15:152 0 #> DRB1*04:56:02,DRB1*15:01:48 0 #> DRB1*14:172,DRB1*04:160 0 #> DRB1*04:359,DRB1*04:284:02 0 dim(dosage) #> [1] 5 428"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"logistic-regression-association-for-amino-acid-positions","dir":"Articles","previous_headings":"","what":"Logistic regression association for amino acid positions","title":"hlabud usage examples","text":"Let’s simulate dataset cases controls demonstrate one approach testing amino acid positions might associated cases. simulated dataset 100 individuals, 52 cases 48 controls. also one column amino acid position might want test association case variable. One possible approach association testing use glm() fit logistic regression model amino acid position. reveal amino acid position might associated case variable simulated dataset. volcano shows Odds Ratio P-value amino acid position. top hits P < 0.05 labeled. 
simulation, case variable associated F37 (P = 0.021, = 4, 95% CI 1.4 15).","code":"set.seed(2) n <- 100 d <- data.frame( geno = paste( sample(rownames(a$onehot), n, replace = TRUE), sample(rownames(a$onehot), n, replace = TRUE), sep = \",\" ), age = sample(21:100, n, replace = TRUE), case = sample(0:1, n, replace = TRUE) ) d <- cbind(d, dosage(a$onehot, d$geno)) d[1:5,1:6] #> geno age case n29unk #> DRB1*04:243,DRB1*15:01:01:08 DRB1*04:243,DRB1*15:01:01:08 67 0 1 #> DRB1*04:08:01:01,DRB1*04:56:02 DRB1*04:08:01:01,DRB1*04:56:02 38 1 1 #> DRB1*13:339,DRB1*04:112 DRB1*13:339,DRB1*04:112 67 0 2 #> DRB1*03:85,DRB1*01:02:10 DRB1*03:85,DRB1*01:02:10 55 0 2 #> DRB1*03:62,DRB1*14:224 DRB1*03:62,DRB1*14:224 73 1 1 #> Mn29 n28unk #> DRB1*04:243,DRB1*15:01:01:08 1 1 #> DRB1*04:08:01:01,DRB1*04:56:02 1 1 #> DRB1*13:339,DRB1*04:112 0 2 #> DRB1*03:85,DRB1*01:02:10 0 2 #> DRB1*03:62,DRB1*14:224 1 1 # prepare column names for use in formulas ix <- 4:ncol(d) colnames(d)[ix] <- sprintf(\"VAR%s\", colnames(d)[ix]) # select the amino acid positions that have at least 3 people with dosage > 0 my_as <- names(which(colSums(d[,4:ncol(d)] > 0) >= 3)) # run the association tests my_glm <- rbindlist(pblapply(my_as, function(my_a) { f <- sprintf(\"case ~ %s\", my_a) glm(as.formula(f), data = d, family = \"binomial\") %>% parameters(exponentiate = TRUE) })) # look at the top hits my_glm %>% arrange(p) %>% filter(!Parameter %in% c(\"(Intercept)\")) %>% head #> Parameter Coefficient SE CI CI_low CI_high z #> #> 1: VARF37 3.9529448 2.3501312 0.95 1.35121582 14.6317263 2.311857 #> 2: VARY60 0.4269585 0.1790904 0.95 0.18053396 0.9458131 -2.028981 #> 3: VARK98 0.5739907 0.1603272 0.95 0.32635370 0.9824460 -1.987475 #> 4: VARS104 0.5739907 0.1603272 0.95 0.32635370 0.9824460 -1.987475 #> 5: VARQ96 0.3253919 0.1886793 0.95 0.08932709 0.9191775 -1.936226 #> 6: VARS179 0.6085247 0.1617164 0.95 0.35644231 1.0163414 -1.869106 #> df_error p #> #> 1: Inf 0.02078556 #> 2: Inf 0.04246025 #> 3: Inf 0.04686976 
#> 4: Inf 0.04686976 #> 5: Inf 0.05284007 #> 6: Inf 0.06160809"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"umap-embedding-of-hla-drb1-alleles","dir":"Articles","previous_headings":"","what":"UMAP embedding of HLA-DRB1 alleles","title":"hlabud usage examples","text":"many possibilities analysis one-hot encoding matrix. example, UMAP embedding HLA-DRB1 alleles encoded one-hot amino acid matrix 1658 columns, one amino acid position. color indicates 2-digit allele name. can highlight alleles aspartic acid (Asp D) position 57: can use color represent amino acid residue position 57:","code":"umap(a$onehot, n_epochs = 200, min_dist = 1, spread = 2)"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"get-hla-allele-frequencies-from-allele-frequency-net-database-afnd","dir":"Articles","previous_headings":"","what":"Get HLA allele frequencies from Allele Frequency Net Database (AFND)","title":"hlabud usage examples","text":"hlabud R package includes table HLA allele frequencies Allele Frequency Net Database (AFND). use data, please cite latest manuscript Allele Frequency Net Database: Gonzalez-Galarza FF, McCabe , Santos EJMD, Jones J, Takeshita L, Ortega-Rivera ND, et al. Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data new query tools. Nucleic Acids Res. 2020;48: D783–D788. doi:10.1093/nar/gkz1029 can use data plot frequency specific allele (e.g. 
DQB1*02:01) populations 1000 sampled individuals: See github.com/slowkow/allelefrequencies examples might use data.","code":"af <- hla_frequencies() af #> # A tibble: 123,502 × 7 #> group gene allele population indivs_over_n alleles_over_2n n #> #> 1 hla A A*01:01 Argentina Rosario To… 15.1 0.076 86 #> 2 hla A A*01:01 Armenia combined Reg… NA 0.125 100 #> 3 hla A A*01:01 Australia Cape York … NA 0.053 103 #> 4 hla A A*01:01 Australia Groote Eyl… NA 0.027 75 #> 5 hla A A*01:01 Australia New South … NA 0.187 134 #> 6 hla A A*01:01 Australia Yuendumu A… NA 0.008 191 #> 7 hla A A*01:01 Austria 27 0.146 200 #> 8 hla A A*01:01 Azores Central Islan… NA 0.08 59 #> 9 hla A A*01:01 Azores Oriental Isla… NA 0.115 43 #> 10 hla A A*01:01 Azores Terceira Isla… NA 0.109 130 #> # ℹ 123,492 more rows my_allele <- \"DQB1*02:01\" my_af <- af %>% filter(allele == my_allele) %>% filter(n > 1000) %>% arrange(-alleles_over_2n) ggplot(my_af) + aes(x = alleles_over_2n, y = reorder(population, alleles_over_2n)) + scale_y_discrete(position = \"right\") + geom_colh() + labs( x = \"Allele Frequency (Alleles / 2N)\", y = NULL, title = glue(\"Frequency of {my_allele} across {length(unique(my_af$population))} populations\"), caption = \"Data from AFND http://allelefrequencies.net\" )"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"compute-hla-divergence-with-the-grantham-distance-matrix","dir":"Articles","previous_headings":"","what":"Compute HLA divergence with the Grantham distance matrix","title":"hlabud usage examples","text":"Humans diploid, us two copies HLA gene. individual two highly dissimilar alleles can bind greater number different peptides homozygous individual (https://doi.org/10.1007/BF02918202): MHC class II allele capacity bind present specific set peptides processed antigens. inability specific class II allele bind present fragment derived processed antigen results loss immune responsiveness antigen individuals homozygous class II allele. 
amino acid distance matrix Grantham 1974 (https://doi.org/10.1126/science.185.4154.862) encodes information composition, polarity, molecular volume amino acid. can use matrix compute HLA divergence metric set individuals like : divergence homozygote equal zero, definition: hlabud includes R code divergence calculations translated original Perl code Pierini & Lenz 2018 (https://doi.org/10.1093/molbev/msy116). amino acid distance matrix easily accessible, provide two built-options “grantham” “uniform”:","code":"grantham #> amino c p v #> 1 Ser 1.42 9.2 32.0 #> 2 Arg 0.65 10.5 124.0 #> 3 Leu 0.00 4.9 111.0 #> 4 Pro 0.39 8.0 32.5 #> 5 Thr 0.71 8.6 61.0 #> 6 Ala 0.00 8.1 31.0 #> 7 Val 0.00 5.9 84.0 #> 8 Gly 0.74 9.0 3.0 #> 9 Ile 0.00 5.2 111.0 #> 10 Phe 0.00 5.2 132.0 #> 11 Tyr 0.20 6.2 136.0 #> 12 Cys 2.75 5.5 55.0 #> 13 His 0.58 10.4 96.0 #> 14 Gln 0.89 10.5 85.0 #> 15 Asn 1.33 11.6 56.0 #> 16 Lys 0.33 11.3 119.0 #> 17 Asp 1.38 13.0 54.0 #> 18 Glu 0.92 12.3 83.0 #> 19 Met 0.00 5.7 105.0 #> 20 Trp 0.13 5.4 170.0 my_genos <- c(\"A*23:01:12,A*24:550\", \"A*25:12N,A*11:27\", \"A*24:381,A*33:85\") hla_divergence(my_genos) #> A*23:01:12,A*24:550 A*25:12N,A*11:27 A*24:381,A*33:85 #> 0.5131579 3.4736842 5.1078947 hla_divergence(\"A*01:01,A*01:01\") #> A*01:01,A*01:01 #> 0 amino_distance_matrix(method = \"grantham\") #> A R N D C Q E G H I L K M F P S T W Y #> A 0 112 111 126 195 91 107 60 86 94 96 106 84 113 27 99 58 148 112 #> R 112 0 86 96 180 43 54 125 29 97 102 26 91 97 103 110 71 101 77 #> N 111 86 0 23 139 46 42 80 68 149 153 94 142 158 91 46 65 174 143 #> D 126 96 23 0 154 61 45 94 81 168 172 101 160 177 108 65 85 181 160 #> C 195 180 139 154 0 154 170 159 174 198 198 202 196 205 169 112 149 215 194 #> Q 91 43 46 61 154 0 29 87 24 109 113 53 101 116 76 68 42 130 99 #> E 107 54 42 45 170 29 0 98 40 134 138 56 126 140 93 80 65 152 122 #> G 60 125 80 94 159 87 98 0 98 135 138 127 127 153 42 56 59 184 147 #> H 86 29 68 81 174 24 40 98 0 94 99 32 87 100 77 89 47 115 83 #> I 
94 97 149 168 198 109 134 135 94 0 5 102 10 21 95 142 89 61 33 #> L 96 102 153 172 198 113 138 138 99 5 0 107 15 22 98 145 92 61 36 #> K 106 26 94 101 202 53 56 127 32 102 107 0 95 102 103 121 78 110 85 #> M 84 91 142 160 196 101 126 127 87 10 15 95 0 28 87 135 81 67 36 #> F 113 97 158 177 205 116 140 153 100 21 22 102 28 0 114 155 103 40 22 #> P 27 103 91 108 169 76 93 42 77 95 98 103 87 114 0 74 38 147 110 #> S 99 110 46 65 112 68 80 56 89 142 145 121 135 155 74 0 58 177 144 #> T 58 71 65 85 149 42 65 59 47 89 92 78 81 103 38 58 0 128 92 #> W 148 101 174 181 215 130 152 184 115 61 61 110 67 40 147 177 128 0 37 #> Y 112 77 143 160 194 99 122 147 83 33 36 85 36 22 110 144 92 37 0 #> V 64 96 133 152 192 96 121 109 84 29 32 97 21 50 68 124 69 88 55 #> V #> A 64 #> R 96 #> N 133 #> D 152 #> C 192 #> Q 96 #> E 121 #> G 109 #> H 84 #> I 29 #> L 32 #> K 97 #> M 21 #> F 50 #> P 68 #> S 124 #> T 69 #> W 88 #> Y 55 #> V 0"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"download-and-unpack-all-data-from-the-latest-imgthla-release","dir":"Articles","previous_headings":"","what":"Download and unpack all data from the latest IMGTHLA release","title":"hlabud usage examples","text":"want use hla_alignments(), don’t need install_hla() data files downloaded automatically needed cached future use. users might need access additional files present full data release. Run install_hla() download unpack latest IMGTHLA release. destination folder downloaded data files getOption(\"hlabud_dir\") (automatically tailored operating system thanks rappdirs package). examples download releases get list release names. 
Download latest release (default) specific release: Optionally, get set directory hlabud uses store data: installing releases, hlabud folder might look like :","code":"# Download all of the data (120MB) for the latest IMGTHLA release install_hla(release = \"latest\") # Download a specific release install_hla(release = \"3.51.0\") getOption(\"hlabud_dir\") #> [1] \"/home/username/.local/share/hlabud\" # Manually override the directory for hlabud to use options(hlabud_dir = \"/path/to/my/dir\") ❯ ls -lah \"/home/user/.local/share/hlabud\" total 207M drwxrwxr-x 3 user user 32 Apr 5 01:19 3.30.0 drwxrwxr-x 11 user user 4.0K Apr 7 19:31 3.40.0 drwxrwxr-x 12 user user 4.0K Apr 5 00:27 3.51.0 -rw-rw-r-- 1 user user 15K Apr 7 19:23 tags.json -rw-rw-r-- 1 user user 79M Apr 7 19:28 v3.40.0-alpha.tar.gz -rw-rw-r-- 1 user user 129M Apr 4 20:07 v3.51.0-alpha.tar.gz"},{"path":"https://slowkow.github.io/hlabud/articles/examples.html","id":"count-the-number-of-alleles-in-each-imgthla-release","dir":"Articles","previous_headings":"","what":"Count the number of alleles in each IMGTHLA release","title":"hlabud usage examples","text":"can get list release names: can get allele names release: Next, count many alleles release: plot number alleles line plot:","code":"releases <- hla_releases() releases #> [1] \"3.56.0\" \"3.55.0\" \"3.54.0\" \"3.53.0\" \"3.52.0\" \"3.51.0\" #> [7] \"3.50.0\" \"3.49.0\" \"3.48.0\" \"3.47.0\" \"3.46.0\" \"3.45.1\" #> [13] \"3.45.01\" \"3.45.0.1\" \"3.45.0\" \"3.44.1\" \"3.44.0\" \"3.43.0\" #> [19] \"3.42.0\" \"3.41.2\" \"3.41.0\" \"3.40.0\" \"3.39.0\" \"3.38.0\" #> [25] \"3.37.0\" \"3.36.0\" \"3.35.0\" \"3.34.0\" \"3.33.0\" \"3.32.0\" my_alleles <- rbindlist(lapply(releases, function(release) { retval <- hla_alleles(release = release) retval$release <- release return(retval) }), fill = TRUE) #> Warning in hla_alleles(release = release): unrecognized release name #> 'Allelelist.3451.txt' #> Warning in hla_alleles(release = release): unrecognized release 
name #> 'Allelelist.34501.txt' #> Warning in hla_alleles(release = release): unrecognized release name #> 'Allelelist.34501.txt' #> Warning in hla_alleles(release = release): unrecognized release name #> 'Allelelist.3441.txt' #> Warning in hla_alleles(release = release): unrecognized release name #> 'Allelelist.3412.txt' d <- my_alleles %>% count(release) %>% filter(n > 1) d #> release n #> #> 1: 3.32.0 18363 #> 2: 3.33.0 18955 #> 3: 3.34.0 20272 #> 4: 3.35.0 21683 #> 5: 3.36.0 22548 #> 6: 3.37.0 24093 #> 7: 3.38.0 25958 #> 8: 3.39.0 26512 #> 9: 3.40.0 27273 #> 10: 3.41.0 27980 #> 11: 3.42.0 28786 #> 12: 3.43.0 29417 #> 13: 3.44.0 30523 #> 14: 3.45.0 31552 #> 15: 3.46.0 32330 #> 16: 3.47.0 33552 #> 17: 3.48.0 34145 #> 18: 3.49.0 35077 #> 19: 3.50.0 36016 #> 20: 3.51.0 36625 #> 21: 3.52.0 37068 #> 22: 3.53.0 37619 #> 23: 3.54.0 38416 #> 24: 3.55.0 38909 #> 25: 3.56.0 39886 #> release n ggplot(d) + aes(x = release, y = n, group = 1) + geom_line() + geom_text(aes(label = release), hjust = 1) + labs(x = NULL, y = \"Number of alleles\", title = \"Each release has more HLA alleles\") + theme( axis.text.x = element_blank(), axis.ticks.x = element_blank(), ) d2 <- my_alleles %>% mutate(gene = str_split_fixed(Allele, \"\\\\*\", 2)[,1]) %>% count(release, gene) ggplot() + aes(x = release, y = n) + geom_line( data = d2, aes(group = gene, color = gene) ) + scale_color_discrete(guide = \"none\") + geom_text( data = d2 %>% filter(release == \"3.52.0\"), mapping = aes(label = gene), hjust = 0 ) + labs(x = NULL, y = \"Number of alleles\", title = \"Number of alleles per release and gene\") + scale_x_discrete(expand = expansion(mult = c(0.01, 0.1))) + scale_y_log10() + theme( panel.grid.major.y = element_line(), axis.text.x = element_blank(), axis.ticks.x = element_blank(), )"},{"path":"https://slowkow.github.io/hlabud/articles/numbering.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Numbering amino acid positions","text":"Kamil 
Slowikowski 2024-05-15 IMGTHLA provides Github repo alignments amino acid sequences nucleotide sequences thousands alleles HLA genes. IMGTHLA alignments define official numbering scheme, provide explanations conventions help page. hlabud R package provides easy access alignment data, hlabud follows official numbering scheme. examples help beginners visualize understand conventions work.","code":""},{"path":"https://slowkow.github.io/hlabud/articles/numbering.html","id":"alignment-files-on-the-imgthla-github-page","dir":"Articles","previous_headings":"","what":"Alignment files on the IMGTHLA Github page","title":"Numbering amino acid positions","text":"IMGTHLA Github page provides folder alignment files. examples vignette, use HLA-DRB1 gene. DRB1, can find three separate files: https://raw.githubusercontent.com/ANHIG/IMGTHLA/v3.56.0-alpha/alignments/DRB1_gen.txt https://raw.githubusercontent.com/ANHIG/IMGTHLA/v3.56.0-alpha/alignments/DRB1_nuc.txt https://raw.githubusercontent.com/ANHIG/IMGTHLA/v3.56.0-alpha/alignments/DRB1_prot.txt files contain different information: gen contains genomic DNA sequences. nuc contains nucleotide coding sequences (CDS). prot contains protein sequences (amino acids). Let’s consider DRB1_prot.txt file. file look like? plain text file header sequence alignments. alignment, line represents one sequence (allele), line 100 residues. first 100 residues alleles shown first block. , next block next 100 residues alleles, .","code":""},{"path":"https://slowkow.github.io/hlabud/articles/numbering.html","id":"numbering-conventions","dir":"Articles","previous_headings":"Alignment files on the IMGTHLA Github page","what":"Numbering conventions","title":"Numbering amino acid positions","text":"conventions used alignments (copied EBI): entry allele displayed respect reference sequences. identity reference sequence present base displayed hyphen (-). Non-identity reference sequence shown displaying appropriate base position. 
insertion deletion occurred represented period (.). sequence unknown point alignment, represented asterisk (*). protein alignments null alleles, ‘Stop’ codons represented hash (X). protein alignments, sequence following termination codon, marked appear blank. conventions used nucleotide protein alignments. ’s lot information! Let’s try work example illustrate works. first sequence alignment reference sequence. position numbering relative reference sequence. means deletions (.) reference sequence numbered. Notice numbering starts negative numbers. help page clarifies: Protein Sequence Numbering amino acid-based systems, start codon mature protein labeled codon 1. codon 5’ numbered -1. numbering based reference sequence. amino acid number 0.","code":""},{"path":"https://slowkow.github.io/hlabud/articles/numbering.html","id":"numbering-indels","dir":"Articles","previous_headings":"Alignment files on the IMGTHLA Github page","what":"Numbering indels","title":"Numbering amino acid positions","text":"alignment shows 100 residues displayed chunks 10: numbering convention says indels reference sequence numbered. clarify point, manually added additional numbers (11, 21, 30, 39, 49, 59) alignment : Notice move first chunk GDTRPRFLWQ next chunk LKFECHFFNG simply add 10 1 get 11 number L amino acid. , move TERVR.LLER, add 10 11 get 21 T amino acid. However, move CIYNQEE.SV rule “add 10” work. Instead labeling C position 31, label position 30. ? reason C 31, 30, indel (gap) reference sequence position 25_26 (notice . R.L). convention deletions reference sequence numbered. Let’s take closer look data hlabud. first amino acid positions first 4 sequences: hlabud numbers positions focusing example: hlabud using correct numbering, see: - T position 21 - C position 30 see positions 25, 26, 25_26? alignment file: result hlabud: , can see deletion positions 25 26 numbered like residues. Instead, gets special label (25_26) consists positions flanking indel (25 26). 
alleles observe position 25_26? three possibilities position 25_26: . indicates deletion 1 amino acid (absence amino position) * indicates sequence unknown position W indicates tryptophan position hope example helps explain numbering indels. notice discrepancy hlabud IMGT, please report .","code":"library(hlabud) a <- hla_alignments(\"DRB1\", release = \"3.56.0\") seqs <- substr(a$sequences[1:4], 30, 89) str_replace_all(seqs, \"(\\\\S{10})\", \"\\\\1 \") #> [1] \"GDTRPRFLWQ LKFECHFFNG TERVR.LLER CIYNQEE.SV RFDSDVGEYR AVTELGRPDA \" #> [2] \"---------- ---------- -----.---- -------.-- ---------- ---------- \" #> [3] \"---------- ---------- -----.---- -------.-- ---------- ---------- \" #> [4] \"---------- ---------- -----.---- -------.-- ---------- ---------- \" colnames(a$alleles)[50:70] #> [1] \"21\" \"22\" \"23\" \"24\" \"25\" \"25_26\" \"26\" \"27\" \"28\" #> [10] \"29\" \"30\" \"31\" \"32\" \"33\" \"34\" \"35\" \"36\" \"36_37\" #> [19] \"37\" \"38\" \"39\" a$alleles[1,\"21\"] #> [1] \"T\" a$alleles[1,\"30\"] #> [1] \"C\" a$alleles[1,\"25\"] #> [1] \"R\" a$alleles[1,\"26\"] #> [1] \"L\" a$alleles[1,\"25_26\"] #> [1] \".\" table(a$alleles[,\"25_26\"]) #> #> . * W #> 3658 12 1"},{"path":"https://slowkow.github.io/hlabud/articles/visualize-hla-structure.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Visualize HLA protein structures","text":"Kamil Slowikowski 2024-05-15 vignette, explore different methods visualizing molecular structure HLA proteins. First, ’ll look example use NGLVieweR R package show HLA protein structures. 
Next, ’ll use PyMOL thing.","code":""},{"path":"https://slowkow.github.io/hlabud/articles/visualize-hla-structure.html","id":"what-are-the-pdb-identifiers-for-each-hla-gene","dir":"Articles","previous_headings":"","what":"What are the PDB identifiers for each HLA gene?","title":"Visualize HLA protein structures","text":"list PDB identifiers might consider using represent HLA protein: Also try searching PDB website , e.g., \"HLA-DR\" see appropriate structure analysis.","code":"HLA-A 2xpg HLA-B 2bvp HLA-C 4nt6 HLA-DP 3lqz HLA-DQ 4z7w HLA-DR 3pdo"},{"path":"https://slowkow.github.io/hlabud/articles/visualize-hla-structure.html","id":"using-nglviewer","dir":"Articles","previous_headings":"","what":"Using NGLVieweR","title":"Visualize HLA protein structures","text":"Let’s try visualize amino acid PDB position 9 HLA-B protein structure. visualize structure 2bvp Protein Data Bank (PDB). example NGLVieweR R package Niels van der Velden: view , see blue peptide red HLA-B protein. tyrosine PDB position 9 highlighted ball+stick representation, also labeled text label. structure rotating can get better view. can use hlabud answer questions HLA-B amino acid sequence. first question need ask : IMGT position corresponds tyrosine PDB position 9? need open PDB Sequence Annotations tool order figure IMGT number corresponds PDB number 9. screenshot tool: Next, can view amino acid sequence numbering IMGT: eye, can see sequence YFYT starting PDB position 9 corresponds YFYT sequence IMGT position 3. , manually confirmed PDB position 9 matches IMGT position 3. Next, might ask HLA-B alleles Y3? fraction reported HLA-B alleles tyrosine IMGT position 3 (Y3)? 
turns , almost HLA-B alleles Y3.","code":"# devtools::install_github(\"nvelden/NGLVieweR\") # we need the latest version library(NGLVieweR) library(magrittr) my_sele <- \"9:A\" NGLVieweR(\"2bvp\") %>% stageParameters( backgroundColor = \"white\", zoomSpeed = 1, cameraFov = 80 ) %>% addRepresentation( type = \"cartoon\" ) %>% addRepresentation( type = \"ball+stick\", param = list( sele = my_sele ) ) %>% addRepresentation( type = \"label\", param = list( sele = my_sele, labelType = \"format\", labelFormat = \"[%(resname)s]%(resno)s\", # or enter custom text labelGrouping = \"residue\", # or \"atom\" (eg. sele = \"20:A.CB\") color = \"black\", fontFamiliy = \"sans-serif\", xOffset = 1, yOffset = 0, zOffset = 0, fixedSize = TRUE, radiusType = 1, radiusSize = 5.5, # Label size showBackground = TRUE # backgroundColor=\"black\", # backgroundOpacity=0.5 ) ) %>% zoomMove( center = my_sele, zoom = my_sele, duration = 0, # animation time in ms z_offSet = -20 ) %>% setSpin() library(hlabud) a <- hla_alignments(\"B\") library(stringr) a$alleles[which(str_detect(rownames(a$alleles), \"B*57:03\")),][1,1:50] #> n30 n29 n28 n27 n26 n25 n24 n23 #> \"M\" \"R\" \"V\" \"T\" \"A\" \"P\" \"R\" \"T\" #> n22 n22_n21 n21 n20 n19 n18 n17 n16 #> \"V\" \"......\" \"L\" \"L\" \"L\" \"L\" \"W\" \"G\" #> n15 n14 n13 n12 n11 n10 n9 n8 #> \"A\" \"V\" \"A\" \"L\" \"T\" \"E\" \"T\" \"W\" #> n7 n6 n5 n4 n3 n2 n1 1 #> \"A\" \"G\" \"S\" \"H\" \"S\" \"M\" \"R\" \"Y\" #> 2 3 4 5 6 7 8 9 #> \"F\" \"Y\" \"T\" \"A\" \"M\" \"S\" \"R\" \"P\" #> 10 11 12 13 14 15 16 17 #> \"G\" \"R\" \"G\" \"E\" \"P\" \"R\" \"F\" \"I\" #> 18 18_19 #> \"A\" \".....\" my_alleles <- names(which(a$onehot[,\"Y3\"] == 1)) length(my_alleles) #> [1] 7023 head(my_alleles, 20) #> [1] \"B*07:02:01:01\" \"B*07:02:01:02\" \"B*07:02:01:03\" \"B*07:02:01:04\" #> [5] \"B*07:02:01:05\" \"B*07:02:01:06\" \"B*07:02:01:07\" \"B*07:02:01:08\" #> [9] \"B*07:02:01:09\" \"B*07:02:01:10\" \"B*07:02:01:11\" \"B*07:02:01:12\" #> [13] \"B*07:02:01:13\" 
\"B*07:02:01:14\" \"B*07:02:01:15\" \"B*07:02:01:16\" #> [17] \"B*07:02:01:17\" \"B*07:02:01:18\" \"B*07:02:01:19\" \"B*07:02:01:20\" sum(a$onehot[,\"Y3\"] == 1) / nrow(a$onehot) #> [1] 0.711406"},{"path":"https://slowkow.github.io/hlabud/articles/visualize-hla-structure.html","id":"using-pymol","dir":"Articles","previous_headings":"","what":"Using PyMOL","title":"Visualize HLA protein structures","text":"PyMOL one favorite methods visualizing protein structures, allows us change residue existing protein visualize new mutated protein. takes lines PyMOL create nice figure. example, want quickly highlight positions 13 45 HLA-DQB1, snippet PyMOL code produce figure . Bash script : Write PyMOL script Run PyMOL script pymol command PyMOL script : Load structure Protein Data Bank (PDB). 7kei identifier published protein structure. Color HLA-DQA1 protein teal. Color HLA-DQB1 protein orange. Color peptide purple. color residues 13 45 HLA-DQB1 red. Label residues positions names. Write PNG file view structure. image , manually rotated structure mouse added text labels like \"PDB: 7kei\" saving file.","code":"#!/usr/bin/env bash # Write a pymol script cat << EOF > script.pml fetch 7kei show cartoon remove solvent remove chain D remove chain H color teal, chain A color orange, chain B color purple, chain C color red, chain B & resi 13 color red, chain B & resi 45 label n. CA and chain B & resi 13, \"%s %s\" % (resi, resn) label n. 
CA and chain B & resi 45, \"%s %s\" % (resi, resn) png 7kei.png, width=1200, height=800, dpi=300 EOF # On Linux, we can just use `pymol` without making an alias # On macOS, we need to make an alias alias pymol=/Applications/PyMOL.app/Contents/MacOS/PyMOL pymol -c script.pml"},{"path":"https://slowkow.github.io/hlabud/articles/visualize-hla-structure.html","id":"other-software-for-viewing-pdb-data","dir":"Articles","previous_headings":"","what":"Other software for viewing PDB data","title":"Visualize HLA protein structures","text":"ChimeraX: https://www.cgl.ucsf.edu/chimerax/ Python: https://github.com/nglviewer/nglview Javascript: https://www.rcsb.org/3d-view https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?mmdbid=7kei&bu=1 https://github.com/nglviewer/ngl https://github.com/biasmv/pv R: https://www.raymolecule.com","code":""},{"path":"https://slowkow.github.io/hlabud/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Kamil Slowikowski. Author, maintainer.","code":""},{"path":"https://slowkow.github.io/hlabud/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"J R, DJ B, X G, MA C, P F, SGE. M (2019). “IPD-IMGT/HLA Database.” Nucleic Acids Research, 48(D1), D948–D955. doi:10.1093/nar/gkz950. Slowikowski K (2023). hlabud: IMGTHLA Data R. 
doi:10.5281/zenodo.8183949, R package version 2.0.0, https://github.com/slowkow/hlabud.","code":"@Article{, author = {Robinson J and Barker DJ and Georgiou X and Cooper MA and Flicek P and Marsh SGE.}, title = {IPD-IMGT/HLA Database}, doi = {10.1093/nar/gkz950}, year = {2019}, month = {oct}, publisher = {Oxford University Press}, volume = {48}, number = {D1}, pages = {D948–D955}, journal = {Nucleic Acids Research}, } @Manual{, title = {{hlabud}: IMGTHLA Data from R}, author = {Kamil Slowikowski}, year = {2023}, note = {R package version 2.0.0}, doi = {10.5281/zenodo.8183949}, url = {https://github.com/slowkow/hlabud}, }"},{"path":"https://slowkow.github.io/hlabud/index.html","id":"hlabud-hla-analysis-in-r-","dir":"","previous_headings":"","what":"Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA","title":"Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA","text":"hlabud provides methods retrieve sequence alignment data IMGTHLA convert data convenient R matrices ready downstream analysis. See usage examples learn use data logistic regression dimensionality reduction. example, let’s consider simple question two HLA genotypes. amino acid positions different two genotypes? output, can conclude four positions (26, 28, 47, 86) distinguish two HLA-DRB1 alleles. 
see DRB1*03:01:05 Y position 26 DRB1*03:02:03 F.","code":"library(hlabud) a <- hla_alignments(\"DRB1\") dosage(a$onehot, c(\"DRB1*03:01:05\", \"DRB1*03:02:03\")) ## F26 Y26 D28 E28 F47 Y47 G86 V86 ## DRB1*03:01:05 0 1 1 0 1 0 0 1 ## DRB1*03:02:03 1 0 0 1 0 1 1 0"},{"path":"https://slowkow.github.io/hlabud/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA","text":"quickest way get hlabud install GitHub:","code":"# install.packages(\"devtools\") devtools::install_github(\"slowkow/hlabud\")"},{"path":"https://slowkow.github.io/hlabud/index.html","id":"examples","dir":"","previous_headings":"","what":"Examples","title":"Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA","text":"See usage examples get ideas use hlabud analyses. Get one-hot encoded matrix HLA-DRB1 alleles Convert genotypes dosage matrix Logistic regression association amino acid positions UMAP embedding 3,516 HLA-DRB1 alleles Get HLA allele frequencies Allele Frequency Net Database (AFND) Compute HLA divergence Grantham distance matrix Download unpack data latest IMGTHLA release Visualize 3D molecular structure HLA proteins highlight specific amino acid residues","code":""},{"path":"https://slowkow.github.io/hlabud/index.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA","text":"hlabud provides access data IMGT/HLA database. Therefore, use hlabud please cite IMGT/HLA paper: Robinson J, Barker DJ, Georgiou X, Cooper MA, Flicek P, Marsh SGE. IPD-IMGT/HLA Database. Nucleic Acids Res. 2020;48: D948–D955. doi:10.1093/nar/gkz950 hlabud also provides access data Allele Frequency Net Database (AFND). 
Therefore, use hlabud::hla_frequencies() please cite AFND paper: Gonzalez-Galarza FF, McCabe , Santos EJMD, Jones J, Takeshita L, Ortega-Rivera ND, et al. Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data new query tools. Nucleic Acids Res. 2020;48: D783–D788. doi:10.1093/nar/gkz1029 Additionally, can also cite hlabud package like : Slowikowski K. hlabud: methods access analysis human leukocyte antigen (HLA) gene sequence alignments IMGT/HLA. R package version 1.0.0.","code":""},{"path":"https://slowkow.github.io/hlabud/index.html","id":"related-work","dir":"","previous_headings":"","what":"Related work","title":"Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA","text":"recommend article anyone new HLA, beautiful figures help build intuition: La Gruta NL, Gras S, Daley SR, Thomas PG, Rossjohn J. Understanding drivers MHC restriction T cell receptors. Nat Rev Immunol. 2018;18: 467–478. Learn conventions HLA nomenclature: Marsh SGE, Albert ED, Bodmer WF, Bontrop RE, Dupont B, Erlich HA, et al. Nomenclature factors HLA system, 2010. Tissue Antigens. 2010;75: 291–455. case-control analysis HLA genotype data, consider BIGDAWG R package available CRAN. related article: Pappas DJ, Marin W, Hollenbach JA, Mack SJ. Bridging ImmunoGenomic Data Analysis Workflow Gaps (BIGDAWG): integrated case-control analysis pipeline. Hum Immunol. 
2016;77: 283–287.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/amino_distance_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Get a pairwise 20x20 distance matrix for all pairs of amino acids — amino_distance_matrix","title":"Get a pairwise 20x20 distance matrix for all pairs of amino acids — amino_distance_matrix","text":"default, return amino acid distance matrix Grantham 1974 (doi:10.1126/science.185.4154.862).","code":""},{"path":"https://slowkow.github.io/hlabud/reference/amino_distance_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get a pairwise 20x20 distance matrix for all pairs of amino acids — amino_distance_matrix","text":"","code":"amino_distance_matrix(method = \"grantham\")"},{"path":"https://slowkow.github.io/hlabud/reference/amino_distance_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get a pairwise 20x20 distance matrix for all pairs of amino acids — amino_distance_matrix","text":"method \"grantham\" Grantham 1974 matrix \"uniform\" matrix ones non-diagonal.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/amino_distance_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get a pairwise 20x20 distance matrix for all pairs of amino acids — amino_distance_matrix","text":"20x20 symmetric matrix positive numbers zeros diagonal.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/amino_distance_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get a pairwise 20x20 distance matrix for all pairs of amino acids — amino_distance_matrix","text":"","code":"# By default, the Grantham 1974 matrix amino_distance_matrix(\"grantham\") #> A R N D C Q E G H I L K M F P S T W Y #> A 0 112 111 126 195 91 107 60 86 94 96 106 84 113 27 99 58 148 112 #> R 112 0 86 96 180 43 54 125 29 97 102 26 91 97 103 110 71 101 
77 #> N 111 86 0 23 139 46 42 80 68 149 153 94 142 158 91 46 65 174 143 #> D 126 96 23 0 154 61 45 94 81 168 172 101 160 177 108 65 85 181 160 #> C 195 180 139 154 0 154 170 159 174 198 198 202 196 205 169 112 149 215 194 #> Q 91 43 46 61 154 0 29 87 24 109 113 53 101 116 76 68 42 130 99 #> E 107 54 42 45 170 29 0 98 40 134 138 56 126 140 93 80 65 152 122 #> G 60 125 80 94 159 87 98 0 98 135 138 127 127 153 42 56 59 184 147 #> H 86 29 68 81 174 24 40 98 0 94 99 32 87 100 77 89 47 115 83 #> I 94 97 149 168 198 109 134 135 94 0 5 102 10 21 95 142 89 61 33 #> L 96 102 153 172 198 113 138 138 99 5 0 107 15 22 98 145 92 61 36 #> K 106 26 94 101 202 53 56 127 32 102 107 0 95 102 103 121 78 110 85 #> M 84 91 142 160 196 101 126 127 87 10 15 95 0 28 87 135 81 67 36 #> F 113 97 158 177 205 116 140 153 100 21 22 102 28 0 114 155 103 40 22 #> P 27 103 91 108 169 76 93 42 77 95 98 103 87 114 0 74 38 147 110 #> S 99 110 46 65 112 68 80 56 89 142 145 121 135 155 74 0 58 177 144 #> T 58 71 65 85 149 42 65 59 47 89 92 78 81 103 38 58 0 128 92 #> W 148 101 174 181 215 130 152 184 115 61 61 110 67 40 147 177 128 0 37 #> Y 112 77 143 160 194 99 122 147 83 33 36 85 36 22 110 144 92 37 0 #> V 64 96 133 152 192 96 121 109 84 29 32 97 21 50 68 124 69 88 55 #> V #> A 64 #> R 96 #> N 133 #> D 152 #> C 192 #> Q 96 #> E 121 #> G 109 #> H 84 #> I 29 #> L 32 #> K 97 #> M 21 #> F 50 #> P 68 #> S 124 #> T 69 #> W 88 #> Y 55 #> V 0 # All ones, and zeros on the diagonal amino_distance_matrix(\"uniform\") #> A R N D C Q E G H I L K M F P S T W Y V #> A 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 #> R 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 #> N 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 #> D 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 #> C 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 #> Q 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 #> E 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 #> G 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 #> H 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 #> I 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 #> L 1 1 
1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 #> K 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 #> M 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 #> F 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 #> P 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 #> S 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 #> T 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 #> W 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 #> Y 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 #> V 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0"},{"path":"https://slowkow.github.io/hlabud/reference/dosage.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert a set of genotype names into a dosage matrix of each residue at each position — dosage","title":"Convert a set of genotype names into a dosage matrix of each residue at each position — dosage","text":"genotype name, return dosage matrix residue (amino acid nucleotide) position.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/dosage.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert a set of genotype names into a dosage matrix of each residue at each position — dosage","text":"","code":"dosage( mat, names, drop_constants = TRUE, drop_duplicates = FALSE, verbose = FALSE )"},{"path":"https://slowkow.github.io/hlabud/reference/dosage.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert a set of genotype names into a dosage matrix of each residue at each position — dosage","text":"mat one-hot encoded matrix one row per allele one column residue (amino acid nucleotide) position. names Input character vector one genotype individual. entries must present rownames(mat). drop_constants Filter constant amino acid positions. TRUE default. drop_duplicates Filter duplicate amino acid positions. FALSE default. 
verbose TRUE, print messages along way.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/dosage.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert a set of genotype names into a dosage matrix of each residue at each position — dosage","text":"matrix one row input genotype, one column residue position.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/dosage.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Convert a set of genotype names into a dosage matrix of each residue at each position — dosage","text":"genotype represented like \"HLA-*01:01,HLA-*01:01\" default, returned matrix filtered exclude: positions input genotypes allele","code":""},{"path":"https://slowkow.github.io/hlabud/reference/dosage.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert a set of genotype names into a dosage matrix of each residue at each position — dosage","text":"","code":"DRB1_file <- file.path( \"https://github.com/ANHIG/IMGTHLA/raw\", \"5f2c562056f8ffa89aeea0631f2a52300ee0de17\", \"alignments/DRB1_prot.txt\" ) a <- read_alignments(DRB1_file) genotypes <- c( \"DRB1*12:02:02:03,DRB1*12:02:02:03,DRB1*14:54:02\", \"DRB1*04:174,DRB1*15:152\", \"DRB1*04:56:02,DRB1*15:01:48\", \"DRB1*14:172,DRB1*04:160\", \"DRB1*04:359,DRB1*04:284:02\" ) dosage <- dosage(a$onehot, genotypes) dosage[,1:5] #> n29unk Mn29 n28unk Vn28 n27unk #> DRB1*12:02:02:03,DRB1*12:02:02:03,DRB1*14:54:02 1 2 1 2 1 #> DRB1*04:174,DRB1*15:152 2 0 2 0 2 #> DRB1*04:56:02,DRB1*15:01:48 2 0 2 0 2 #> DRB1*14:172,DRB1*04:160 2 0 2 0 2 #> DRB1*04:359,DRB1*04:284:02 2 0 2 0 2"},{"path":"https://slowkow.github.io/hlabud/reference/get_hlabud_dir.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the name of the folder for caching downloaded IMGTHLA files — get_hlabud_dir","title":"Get the name of the folder for caching downloaded IMGTHLA files — 
get_hlabud_dir","text":"function : Get folder name getOption(\"hlabud_dir\") else automatically choose appropriate folder operating system thanks rappdirs. Create folder automatically already exist. Set hlabud_dir option new folder.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/get_hlabud_dir.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the name of the folder for caching downloaded IMGTHLA files — get_hlabud_dir","text":"","code":"get_hlabud_dir()"},{"path":"https://slowkow.github.io/hlabud/reference/get_hlabud_dir.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the name of the folder for caching downloaded IMGTHLA files — get_hlabud_dir","text":"name folder.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/get_hlabud_dir.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Get the name of the folder for caching downloaded IMGTHLA files — get_hlabud_dir","text":"locations hlabud_dir folder operating system. Linux: Mac: Windows: set hlabud_dir option, please use:","code":"~/.local/share/hlabud ~/Library/Application Support/hlabud C:\\Documents and Settings\\{User}\\Application Data\\slowkow\\hlabud options(hlabud_dir = \"/my/favorite/path\")"},{"path":"https://slowkow.github.io/hlabud/reference/get_hlabud_dir.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the name of the folder for caching downloaded IMGTHLA files — get_hlabud_dir","text":"","code":"if (FALSE) { hlabud_dir <- get_hlabud_dir() }"},{"path":"https://slowkow.github.io/hlabud/reference/get_onehot.html","id":null,"dir":"Reference","previous_headings":"","what":"Make a one-hot encoded matrix from a dataframe of amino acid sequences. — get_onehot","title":"Make a one-hot encoded matrix from a dataframe of amino acid sequences. 
— get_onehot","text":"Make one-hot encoded matrix dataframe amino acid sequences.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/get_onehot.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Make a one-hot encoded matrix from a dataframe of amino acid sequences. — get_onehot","text":"","code":"get_onehot(sequences, n_pre, verbose = FALSE)"},{"path":"https://slowkow.github.io/hlabud/reference/get_onehot.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Make a one-hot encoded matrix from a dataframe of amino acid sequences. — get_onehot","text":"n_pre number amino acid sequences position 1. verbose Print messages along way. al dataframe columns allele, seq","code":""},{"path":"https://slowkow.github.io/hlabud/reference/grantham.html","id":null,"dir":"Reference","previous_headings":"","what":"Table 1 from Grantham 1974 — grantham","title":"Table 1 from Grantham 1974 — grantham","text":"Grantham R. Amino Acid Difference Formula Help Explain Protein Evolution. Science. 1974;185: 862–864. 
doi:10.1126/science.185.4154.862","code":""},{"path":"https://slowkow.github.io/hlabud/reference/grantham.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Table 1 from Grantham 1974 — grantham","text":"","code":"grantham"},{"path":"https://slowkow.github.io/hlabud/reference/grantham.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Table 1 from Grantham 1974 — grantham","text":"data frame 20 rows 5 columns: amino Amino acid c Composition c, atomic weight ratio noncarbon elements end groups rings carbons side chain p Polarity p published data v Volume v published data","code":""},{"path":"https://slowkow.github.io/hlabud/reference/grantham.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Table 1 from Grantham 1974 — grantham","text":"","code":"grantham #> amino c p v #> 1 Ser 1.42 9.2 32.0 #> 2 Arg 0.65 10.5 124.0 #> 3 Leu 0.00 4.9 111.0 #> 4 Pro 0.39 8.0 32.5 #> 5 Thr 0.71 8.6 61.0 #> 6 Ala 0.00 8.1 31.0 #> 7 Val 0.00 5.9 84.0 #> 8 Gly 0.74 9.0 3.0 #> 9 Ile 0.00 5.2 111.0 #> 10 Phe 0.00 5.2 132.0 #> 11 Tyr 0.20 6.2 136.0 #> 12 Cys 2.75 5.5 55.0 #> 13 His 0.58 10.4 96.0 #> 14 Gln 0.89 10.5 85.0 #> 15 Asn 1.33 11.6 56.0 #> 16 Lys 0.33 11.3 119.0 #> 17 Asp 1.38 13.0 54.0 #> 18 Glu 0.92 12.3 83.0 #> 19 Met 0.00 5.7 105.0 #> 20 Trp 0.13 5.4 170.0"},{"path":"https://slowkow.github.io/hlabud/reference/hla_alignments.html","id":null,"dir":"Reference","previous_headings":"","what":"Get sequence alignments from IMGTHLA — hla_alignments","title":"Get sequence alignments from IMGTHLA — hla_alignments","text":"conventions used alignments (EBI IMGT-HLA help page): entry allele displayed respect reference sequences. identity reference sequence present base displayed hyphen (-). Non-identity reference sequence shown displaying appropriate base position. insertion deletion occurred represented period (.). sequence unknown point alignment, represented asterisk (*). 
protein alignments null alleles, 'Stop' codons represented hash (X). protein alignments, sequence following termination codon, marked appear blank. conventions used nucleotide protein alignments.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_alignments.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get sequence alignments from IMGTHLA — hla_alignments","text":"","code":"hla_alignments( gene = \"DRB1\", type = \"prot\", release = \"latest\", verbose = FALSE )"},{"path":"https://slowkow.github.io/hlabud/reference/hla_alignments.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get sequence alignments from IMGTHLA — hla_alignments","text":"gene name gene like \"DRB1\" type type sequence, one \"prot\", \"nuc\", \"gen\" release Default \"latest\". release name like \"3.51.0\". verbose TRUE, print messages along way.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_alignments.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get sequence alignments from IMGTHLA — hla_alignments","text":"list character vector called sequences two matrices called alleles onehot. character vector sequences one sequence allele, names allele names. matrix alleles one row allele, one column position, values representing residues position allele. 
matrix onehot one-hot encoding variants distinguish alleles, one row allele one column amino acid position.","code":""},{"path":[]},{"path":"https://slowkow.github.io/hlabud/reference/hla_alignments.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get sequence alignments from IMGTHLA — hla_alignments","text":"","code":"# \\donttest{ a <- hla_alignments(\"DRB1\") head(a$sequences) #> DRB1*01:01:01:01 #> \"MVCLKLPGGSCMTALTVTLMVLSSPLALAGDTRPRFLWQLKFECHFFNGTERVR.LLERCIYNQEE.SVRFDSDVGEYRAVTELGRPDAEYWNSQKDLLEQRRAAVDTYCRHNYGVGESFTVQRR.VEPKVTVYPSKTQPLQHHNLLVCSVSGFYPGSIEVRWFRNGQEEKAGVVSTGLIQNGDWTFQTLVMLETVPRSGEVYTCQVEHPSVTSPLTVEWRARSESAQSKMLSGVGGFVLGLLFLGAGLFIYFRNQKGHSGLQPTGFLS\" #> DRB1*01:01:01:02 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" #> DRB1*01:01:01:03 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" #> DRB1*01:01:01:04 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" #> DRB1*01:01:01:05 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" #> DRB1*01:01:01:06 #> 
\"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" a$alleles[1:6,1:6] #> n29 n28 n27 n26 n25 n24 #> DRB1*01:01:01:01 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" #> DRB1*01:01:01:02 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" #> DRB1*01:01:01:03 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" #> DRB1*01:01:01:04 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" #> DRB1*01:01:01:05 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" #> DRB1*01:01:01:06 \"M\" \"V\" \"C\" \"L\" \"K\" \"L\" a$onehot[1:6,1:6] #> n29unk Mn29 n28unk Ln28 Vn28 n27unk #> DRB1*01:01:01:01 0 1 0 0 1 0 #> DRB1*01:01:01:02 0 1 0 0 1 0 #> DRB1*01:01:01:03 0 1 0 0 1 0 #> DRB1*01:01:01:04 0 1 0 0 1 0 #> DRB1*01:01:01:05 0 1 0 0 1 0 #> DRB1*01:01:01:06 0 1 0 0 1 0 # }"},{"path":"https://slowkow.github.io/hlabud/reference/hla_alleles.html","id":null,"dir":"Reference","previous_headings":"","what":"Get a table of allele names for a particular IMGTHLA release — hla_alleles","title":"Get a table of allele names for a particular IMGTHLA release — hla_alleles","text":"Download list allele names HLA genes particular IMGTHLA release.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_alleles.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get a table of allele names for a particular IMGTHLA release — hla_alleles","text":"","code":"hla_alleles(release = \"latest\", overwrite = FALSE, verbose = FALSE)"},{"path":"https://slowkow.github.io/hlabud/reference/hla_alleles.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get a table of allele names for a particular IMGTHLA release — hla_alleles","text":"release Default \"latest\". release name like \"3.51.0\". 
overwrite Overwrite existing alleles.json file Allelelist.{version}.txt file verbose TRUE, print messages along way.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_alleles.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get a table of allele names for a particular IMGTHLA release — hla_alleles","text":"data frame HLA allele ids names","code":""},{"path":[]},{"path":"https://slowkow.github.io/hlabud/reference/hla_alleles.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get a table of allele names for a particular IMGTHLA release — hla_alleles","text":"","code":"# \\donttest{ head(hla_alleles()) #> AlleleID Allele #> 1 HLA00001 A*01:01:01:01 #> 2 HLA02169 A*01:01:01:02N #> 3 HLA14798 A*01:01:01:03 #> 4 HLA15760 A*01:01:01:04 #> 5 HLA16415 A*01:01:01:05 #> 6 HLA16417 A*01:01:01:06 # }"},{"path":"https://slowkow.github.io/hlabud/reference/hla_divergence.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate HLA divergence for each individual — hla_divergence","title":"Calculate HLA divergence for each individual — hla_divergence","text":"First, convert allele name (e.g. *01:01) amino acid sequence. divergence sum distances pair amino acids position, divided total sequence length. 
amino acid distance matrix use one published Grantham 1974 (doi:10.1126/science.185.4154.862), based three physical properties amino acids (composition, polarity, molecular volume) correlated estimate relative substitution frequency.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_divergence.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate HLA divergence for each individual — hla_divergence","text":"","code":"hla_divergence( alleles = c(\"A*01:01,A*02:01\"), method = \"grantham\", release = \"latest\", verbose = FALSE )"},{"path":"https://slowkow.github.io/hlabud/reference/hla_divergence.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate HLA divergence for each individual — hla_divergence","text":"alleles character vector comma-delimited alleles individual. usually expect two alleles per individual, possible (fewer) copies due copy number alterations. function still works individual different number alleles. method pairwise amino acid matrix, method name: \"grantham\" \"uniform\" indicate pairwise amino acid distance matrix use. choose pass matrix, 20x20 symmetric matrix zeros diagonal, rownames colnames one-letter amino acid codes R N D C Q E G H L K M F P S T W Y V. release Default \"latest\". release name like \"3.51.0\". 
verbose TRUE, print messages along way.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_divergence.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate HLA divergence for each individual — hla_divergence","text":"dataframe divergence individual.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_divergence.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Calculate HLA divergence for each individual — hla_divergence","text":"code function translation original Perl code Tobias Lenz, published Pierini & Lenz 2018 MolBiolEvol (https://doi.org/10.1093/molbev/msy116). comparing two amino acid sequences, characters one 20 amino acids considered divergence calculation, gaps (characters) count.","code":""},{"path":[]},{"path":"https://slowkow.github.io/hlabud/reference/hla_divergence.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate HLA divergence for each individual — hla_divergence","text":"","code":"my_genos <- c(\"A*23:01:12,A*24:550\", \"A*25:12N,A*11:27\", \"A*24:381,A*33:85\", \"A*01:01:,A*01:01,A*02:01\") hla_divergence(my_genos, method = \"grantham\") #> A*23:01:12,A*24:550 A*25:12N,A*11:27 A*24:381,A*33:85 #> 0.5131579 3.4736842 5.1078947 #> A*01:01:,A*01:01,A*02:01 #> 3.9982456 # This is equivalent hla_divergence(my_genos, method = amino_distance_matrix(\"grantham\")) #> A*23:01:12,A*24:550 A*25:12N,A*11:27 A*24:381,A*33:85 #> 0.5131579 3.4736842 5.1078947 #> A*01:01:,A*01:01,A*02:01 #> 3.9982456"},{"path":"https://slowkow.github.io/hlabud/reference/hla_frequencies.html","id":null,"dir":"Reference","previous_headings":"","what":"Get HLA frequencies from Allele Frequency Net Database (AFND) — hla_frequencies","title":"Get HLA frequencies from Allele Frequency Net Database (AFND) — hla_frequencies","text":"Download read table HLA allele frequencies Allele Frequency Net Database 
(AFND).","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_frequencies.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get HLA frequencies from Allele Frequency Net Database (AFND) — hla_frequencies","text":"","code":"hla_frequencies(verbose = FALSE)"},{"path":"https://slowkow.github.io/hlabud/reference/hla_frequencies.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get HLA frequencies from Allele Frequency Net Database (AFND) — hla_frequencies","text":"verbose TRUE, print messages along way.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_frequencies.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get HLA frequencies from Allele Frequency Net Database (AFND) — hla_frequencies","text":"dataframe HLA allele frequencies genes.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_frequencies.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Get HLA frequencies from Allele Frequency Net Database (AFND) — hla_frequencies","text":"use data, please cite latest manuscript Allele Frequency Net Database: Gonzalez-Galarza FF, McCabe , Santos EJMD, Jones J, Takeshita L, Ortega-Rivera ND, et al. Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data new query tools. Nucleic Acids Res. 2020;48: D783–D788. 
doi:10.1093/nar/gkz1029","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_frequencies.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get HLA frequencies from Allele Frequency Net Database (AFND) — hla_frequencies","text":"","code":"# \\donttest{ hla_frequencies() #> # A tibble: 123,502 × 7 #> group gene allele population indivs_over_n alleles_over_2n n #> #> 1 hla A A*01:01 Argentina Rosario To… 15.1 0.076 86 #> 2 hla A A*01:01 Armenia combined Reg… NA 0.125 100 #> 3 hla A A*01:01 Australia Cape York … NA 0.053 103 #> 4 hla A A*01:01 Australia Groote Eyl… NA 0.027 75 #> 5 hla A A*01:01 Australia New South … NA 0.187 134 #> 6 hla A A*01:01 Australia Yuendumu A… NA 0.008 191 #> 7 hla A A*01:01 Austria 27 0.146 200 #> 8 hla A A*01:01 Azores Central Islan… NA 0.08 59 #> 9 hla A A*01:01 Azores Oriental Isla… NA 0.115 43 #> 10 hla A A*01:01 Azores Terceira Isla… NA 0.109 130 #> # ℹ 123,492 more rows # }"},{"path":"https://slowkow.github.io/hlabud/reference/hla_genes.html","id":null,"dir":"Reference","previous_headings":"","what":"Get HLA gene names from IMGTHLA — hla_genes","title":"Get HLA gene names from IMGTHLA — hla_genes","text":"Retrieve list txt files github.com/ANHIG/IMGTHLA/alignments return list gene names derived file names.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_genes.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get HLA gene names from IMGTHLA — hla_genes","text":"","code":"hla_genes(release = \"latest\", overwrite = FALSE, verbose = FALSE)"},{"path":"https://slowkow.github.io/hlabud/reference/hla_genes.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get HLA gene names from IMGTHLA — hla_genes","text":"release Default \"latest\". release name like \"3.51.0\". 
overwrite Overwrite existing genes.json file new one GitHub verbose TRUE, print messages along way.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_genes.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get HLA gene names from IMGTHLA — hla_genes","text":"tibble two columns: HLA gene names (\"\", \"DRB1\") types (\"nuc\", \"gen\", \"prot\").","code":""},{"path":[]},{"path":"https://slowkow.github.io/hlabud/reference/hla_genes.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get HLA gene names from IMGTHLA — hla_genes","text":"","code":"# \\donttest{ hla_genes() #> # A tibble: 107 × 2 #> gene type #> #> 1 A gen #> 2 A nuc #> 3 A prot #> 4 B gen #> 5 B nuc #> 6 B prot #> 7 C gen #> 8 C nuc #> 9 C prot #> 10 DMA gen #> # ℹ 97 more rows # }"},{"path":"https://slowkow.github.io/hlabud/reference/hla_releases.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the names of releases from IMGTHLA — hla_releases","title":"Get the names of releases from IMGTHLA — hla_releases","text":"Get tags github.com/ANHIG/IMGTHLA, save file called tags.json getOption(\"hlabud_dir\"), return release names file.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_releases.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the names of releases from IMGTHLA — hla_releases","text":"","code":"hla_releases(overwrite = FALSE)"},{"path":"https://slowkow.github.io/hlabud/reference/hla_releases.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the names of releases from IMGTHLA — hla_releases","text":"overwrite Overwrite existing tags.json file getOption(\"hlabud_dir\")","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_releases.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the names of releases from IMGTHLA — 
hla_releases","text":"character vector release names like \"3.51.0\"","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_releases.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Get the names of releases from IMGTHLA — hla_releases","text":"tags.json file automatically overwritten older 24 hours.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/hla_releases.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the names of releases from IMGTHLA — hla_releases","text":"","code":"# \\donttest{ hla_releases() #> [1] \"3.56.0\" \"3.55.0\" \"3.54.0\" \"3.53.0\" \"3.52.0\" \"3.51.0\" #> [7] \"3.50.0\" \"3.49.0\" \"3.48.0\" \"3.47.0\" \"3.46.0\" \"3.45.1\" #> [13] \"3.45.01\" \"3.45.0.1\" \"3.45.0\" \"3.44.1\" \"3.44.0\" \"3.43.0\" #> [19] \"3.42.0\" \"3.41.2\" \"3.41.0\" \"3.40.0\" \"3.39.0\" \"3.38.0\" #> [25] \"3.37.0\" \"3.36.0\" \"3.35.0\" \"3.34.0\" \"3.33.0\" \"3.32.0\" # }"},{"path":"https://slowkow.github.io/hlabud/reference/hlabud-package.html","id":null,"dir":"Reference","previous_headings":"","what":"hlabud: Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA — hlabud-package","title":"hlabud: Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA — hlabud-package","text":"Fetch sequence alignment data IMGTHLA database Robinson et al (2020) doi:10.1093/nar/gkz950 , automatically convert sequence alignments convenient R matrices ready downstream analysis. vignette shows examples using one-hot encoding data logistic regression dimensionality reduction. 
Data downloaded lazily, -needed, cached user-configurable folder.","code":""},{"path":[]},{"path":"https://slowkow.github.io/hlabud/reference/hlabud-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"hlabud: Methods for Access and Analysis of the Human Leukocyte Antigen (HLA) Gene Sequence Alignments from IMGTHLA — hlabud-package","text":"Maintainer: Kamil Slowikowski kslowikowski@gmail.com (ORCID)","code":""},{"path":"https://slowkow.github.io/hlabud/reference/install_hla.html","id":null,"dir":"Reference","previous_headings":"","what":"Download and unpack a tarball release from IMGTHLA — install_hla","title":"Download and unpack a tarball release from IMGTHLA — install_hla","text":"release tarball Github unpacked getOption(\"hlabud_dir\") folder.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/install_hla.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Download and unpack a tarball release from IMGTHLA — install_hla","text":"","code":"install_hla(release = \"latest\", overwrite = FALSE, verbose = FALSE)"},{"path":"https://slowkow.github.io/hlabud/reference/install_hla.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Download and unpack a tarball release from IMGTHLA — install_hla","text":"release Default \"latest\". release name like \"3.51.0\". overwrite TRUE, overwrite existing files release folder. 
verbose TRUE, print messages along way.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/install_hla.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Download and unpack a tarball release from IMGTHLA — install_hla","text":"Note latest releases 100 MB size, download might take slow connections.","code":""},{"path":[]},{"path":"https://slowkow.github.io/hlabud/reference/install_hla.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Download and unpack a tarball release from IMGTHLA — install_hla","text":"","code":"if (FALSE) { install_hla() install_hla(\"3.51.0\") install_hla(\"3.51.0\", verbose = TRUE) # Change the install directory options(hlabud_dir = \"path/to/my/dir\") install_hla() }"},{"path":"https://slowkow.github.io/hlabud/reference/one_to_three.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert one letter amino acid codes to three letter amino acid codes — one_to_three","title":"Convert one letter amino acid codes to three letter amino acid codes — one_to_three","text":"Convert one letter amino acid codes three letter amino acid codes","code":""},{"path":"https://slowkow.github.io/hlabud/reference/one_to_three.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert one letter amino acid codes to three letter amino acid codes — one_to_three","text":"","code":"one_to_three(aminos)"},{"path":"https://slowkow.github.io/hlabud/reference/pipe.html","id":null,"dir":"Reference","previous_headings":"","what":"Pipe operator — %>%","title":"Pipe operator — %>%","text":"See magrittr::%>% details.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/pipe.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pipe operator — %>%","text":"lhs value magrittr placeholder. 
rhs function call using magrittr semantics.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/pipe.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Pipe operator — %>%","text":"result calling rhs(lhs).","code":""},{"path":"https://slowkow.github.io/hlabud/reference/read_alignments.html","id":null,"dir":"Reference","previous_headings":"","what":"Read an alignment file *_(nuc|gen|prot).txt from IMGTHLA — read_alignments","title":"Read an alignment file *_(nuc|gen|prot).txt from IMGTHLA — read_alignments","text":"function reads txt files provided IMGTHLA.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/read_alignments.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read an alignment file *_(nuc|gen|prot).txt from IMGTHLA — read_alignments","text":"","code":"read_alignments(file)"},{"path":"https://slowkow.github.io/hlabud/reference/read_alignments.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read an alignment file *_(nuc|gen|prot).txt from IMGTHLA — read_alignments","text":"file File name txt file IMGTHLA like \"DQB1_prot.txt\"","code":""},{"path":"https://slowkow.github.io/hlabud/reference/read_alignments.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read an alignment file *_(nuc|gen|prot).txt from IMGTHLA — read_alignments","text":"list character vector called sequences two matrices alleles onehot. matrix alleles one row allele, one column position, values representing residues position allele. 
matrix onehot one-hot encoding variants distinguish alleles, one row allele one column amino acid position.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/read_alignments.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Read an alignment file *_(nuc|gen|prot).txt from IMGTHLA — read_alignments","text":"Consider using hla_alignments() instead function. already txt file want read, can read read_alignments(\"myfile.txt\"). sequences contained file: {gene}_prot.txt amino acid sequence HLA allele. {gene}_nuc.txt nucleotide sequence exons. {gene}_gen.txt genomic sequence exons introns.","code":""},{"path":"https://slowkow.github.io/hlabud/reference/read_alignments.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read an alignment file *_(nuc|gen|prot).txt from IMGTHLA — read_alignments","text":"","code":"my_file <- file.path( \"https://github.com/ANHIG/IMGTHLA/raw\", \"5f2c562056f8ffa89aeea0631f2a52300ee0de17\", \"alignments/DRB1_prot.txt\" ) a <- read_alignments(my_file) head(a$sequences) #> DRB1*01:01:01:01 #> \"MVCLKLPGGSCMTALTVTLMVLSSPLALAGDTRPRFLWQLKFECHFFNGTERVR.LLERCIYNQEE.SVRFDSDVGEYRAVTELGRPDAEYWNSQKDLLEQRRAAVDTYCRHNYGVGESFTVQRR.VEPKVTVYPSKTQPLQHHNLLVCSVSGFYPGSIEVRWFRNGQEEKAGVVSTGLIQNGDWTFQTLVMLETVPRSGEVYTCQVEHPSVTSPLTVEWRARSESAQSKMLSGVGGFVLGLLFLGAGLFIYFRNQKGHSGLQPTGFLS\" #> DRB1*01:01:01:02 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" #> DRB1*01:01:01:03 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" #> 
DRB1*01:01:01:04 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" #> DRB1*01:01:01:05 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" #> DRB1*01:01:01:06 #> \"------------------------------------------------------.-----------.----------------------------------------------------------.-----------------------------------------------------------------------------------------------------------------------------------------------\" a$alleles[1:5,1:5] #> n29 n28 n27 n26 n25 #> DRB1*01:01:01:01 \"M\" \"V\" \"C\" \"L\" \"K\" #> DRB1*01:01:01:02 \"M\" \"V\" \"C\" \"L\" \"K\" #> DRB1*01:01:01:03 \"M\" \"V\" \"C\" \"L\" \"K\" #> DRB1*01:01:01:04 \"M\" \"V\" \"C\" \"L\" \"K\" #> DRB1*01:01:01:05 \"M\" \"V\" \"C\" \"L\" \"K\" a$onehot[1:5,1:5] #> n29unk Mn29 n28unk Vn28 n27unk #> DRB1*01:01:01:01 0 1 0 1 0 #> DRB1*01:01:01:02 0 1 0 1 0 #> DRB1*01:01:01:03 0 1 0 1 0 #> DRB1*01:01:01:04 0 1 0 1 0 #> DRB1*01:01:01:05 0 1 0 1 0"},{"path":[]},{"path":"https://slowkow.github.io/hlabud/news/index.html","id":"bug-fixes-2-0-0","dir":"Changelog","previous_headings":"","what":"Bug fixes","title":"hlabud 2.0.0","text":"Fix incorrect position numbering, accounting insertions deletions indicated “.” character. Thanks Vinicius Stelet bringing attention issue #3. Instead discarding positions *, include label unk, example pos241_unk indicates unknown amino acid position 241. Thanks Sreekar Mantena reporting issue! Fix --one error. example, HLA-pos361_- colnames($onehot) reference allele instead -. now fixed. 
Thanks Sreekar Mantena reporting issue!","code":""},{"path":"https://slowkow.github.io/hlabud/news/index.html","id":"changes-2-0-0","dir":"Changelog","previous_headings":"","what":"Changes","title":"hlabud 2.0.0","text":"Change position names pos21_D D21. negative, posn21_D Dn21. Change dosage() take one-hot matrix first argument. Change dosage() return full allele names IMGT matching partial allele names like DRB1*03 DRB1*03:01. show messages indicating alleles matched verbose=TRUE. Automatically overwrite {hlabud_dir}/alleles.json older 24 hours. Automatically overwrite {hlabud_dir}/tags.json older 24 hours.","code":""},{"path":"https://slowkow.github.io/hlabud/news/index.html","id":"hlabud-100","dir":"Changelog","previous_headings":"","what":"hlabud 1.0.0","title":"hlabud 1.0.0","text":"Initial release. Added NEWS.md file track changes package.","code":""}]