-
Notifications
You must be signed in to change notification settings - Fork 19
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
5 changed files
with
2,022 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,140 @@ | ||
--- | ||
title: "Regex_정규표현식" | ||
author: "SangminJe" | ||
date: '2021 6 19 ' | ||
output: | ||
html_document: | ||
number_sections: true | ||
fig_caption: true | ||
toc: true | ||
fig_width: 5 | ||
fig_height: 4 | ||
theme: cosmo | ||
highlight: tango | ||
code_folding: show | ||
--- | ||
|
||
```{r setup, include=FALSE} | ||
knitr::opts_chunk$set(echo = TRUE, | ||
fig.align = "center") | ||
``` | ||
|
||
# REGEX {.tabset .tabset-fade} | ||
|
||
예시문 | ||
Hi there, Nice to meet you. And Hello there and hi. | ||
I love grey(gray) color not a gry, graay and graaay. | ||
Ya ya YaYaYa Ya | ||
abcdefghijklmnopqrstuvwxyz | ||
ABSCEFGHIJKLMNOPQRSTUVWZYZ | ||
1234567890 | ||
.[]{}()\^$|?*+ | ||
010-898-0893 | ||
010-405-3412 | ||
02-878-8888 | ||
[email protected] | ||
[email protected] | ||
[email protected] | ||
https://www.youtu.be/-ZClicWm0zM | ||
https://youtu.be/-ZClicWm0zM | ||
youtu.be/-ZClicWm0zM | ||
|
||
- [정규표현식 연습링크](https://regexr.com/5mhou) | ||
- [엘리의 드림코딩 - 정규표현식](https://www.youtube.com/watch?v=t3M6toIflyQ) | ||
|
||
|
||
## Groups and ranges | ||
|
||
- **\|** : OR | ||
- Hi|Hello : Hi나 Hello에 매칭되는 문자열을 반환 | ||
- **\(\)** : 그룹 | ||
- (Hi|Hello)|(And) :Hi와 Hello는 Group1에 매칭되며, And는 Group2에 매칭됨 | ||
- gr(e|a)y : grey나 gray 모두 매칭되어 반환 | ||
- gr(?:e|a)y : 매칭은 되지만 Group1, Group2는 지정이 되지 않음 | ||
- **\[\]** 괄호 안의 어떤 문자든 반환 | ||
- gr[abcdef]y / gr[a-f]y : 대괄호 안의 모든 문자열을 반환한다 | ||
- [a-zA-Z0-9] : a~z 까지의 문자와, A~Z까지의 문자와 0~9가 들어간 모든 숫자를 반환 | ||
- **\^** : NOT | ||
- [^a-zA-Z0-9] : 영어와 숫자 빼고 모든 문자를 반환 | ||
|
||
## Quantifier | ||
|
||
- **\?** : 없거나 있거나 | ||
- gra?y : gray, gry 모두 반환 | ||
- **\*** : 없거나 있거나 많거나 | ||
- gra*y : gray, graaay, gry 모두 반환 | ||
- **\+** : 한개 이상 | ||
- gra+y : gray, graaay만 추가 | ||
- **\{n\}** : n개만 특정해서 반환 | ||
- gra{2}y : graay만 반환 | ||
- **{min,max}** : min ~ max개 사이의 문자열을 반환 | ||
- gra{2,3}y : graay, graaay를 반환 | ||
- gra{2,}y : graay, graaay를 반환 (최소값만 지정하여 2개 이상인 것들을 반환) | ||
|
||
## Boundary type | ||
|
||
- **\\b** : 단어 경계 | ||
- \\bYa : 단어 앞에서 쓰이는 Ya를 반환함 | ||
- Ya\\b : 단어 뒤에서만 쓰이는 Ya를 반환 | ||
- **\\B** : 단어 경계의 반대 | ||
- \\BYa : 단어 앞에서 쓰이는 Ya는 빼고 반환함 | ||
- Ya\\B : 단어 뒤에서만 쓰이는 Ya는 빼고 반환 | ||
- **\^,\$** : 문장 경계 | ||
- ^Ya : 문장에서 시작하는 Ya 반환 | ||
- Ya$ : 문장에서 끝나는 Ya 반환 | ||
|
||
## Character Classes | ||
|
||
- **.** : 모든 문자열 | ||
- . : 모든 문자열을 선택 가능 | ||
- **\\** : Escape | ||
- **\\d** : 숫자를 모두 반환 | ||
- **\\D** : 숫자를 제외하고 모두 반환 | ||
- **\\w** : 모든 word | ||
- **\\W** : 모든 문자를 제외하고 반환 | ||
- **\\s** : space 반환 | ||
- **\\S** : space를 제외하고 반환 | ||
|
||
|
||
## Exercise | ||
|
||
```{r} | ||
library(tidyverse) | ||
``` | ||
|
||
|
||
1. 전화번호만 선택하기 | ||
|
||
```{r} | ||
sentence <- "Hi there, Nice to meet you. And Hello there and hi. I love grey(gray) color not a gry, graay and graaay. Ya ya YaYaYa Ya .[]{}()^$|?*+ 010-898-0893 010-405-3412 02-878-8888 [email protected] [email protected] [email protected] https://www.youtu.be/-ZClicWm0zM https://youtu.be/-ZClicWm0zM youtu.be/-ZClicWm0zM" | ||
print(sentence) | ||
str_extract_all(sentence, "[0-9]+[\\-|\\.][0-9]+[\\-|\\.][0-9]+") %>% unlist() | ||
str_extract_all(sentence, '\\d{2,3}[.-]\\d{3,4}[.-]\\d{3,4}') %>% unlist() | ||
``` | ||
|
||
2. 이메일 주소 가져오기 | ||
|
||
```{r} | ||
str_extract_all(sentence, | ||
"[a-zA-Z0-9._+-]+@[a-zA-Z0-9-_]+\\.[a-zA-Z0-9.]+") %>% | ||
unlist() | ||
``` | ||
|
||
3. 인터넷 주소 가져오기 | ||
|
||
```{r} | ||
# 유튜브 아이디 가져오기 | ||
str_extract_all(sentence, "(https?:\\/\\/)?(www.)?youtu.be\\/([a-zA-Z0-9-]+)") | ||
# 뒤에 유튜브 ID만 추출 | ||
str_match(sentence, "(?:https?:\\/\\/)?(?:www.)?youtu.be\\/([a-zA-Z0-9-]+)") %>% .[2] | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
일자,중식계,석식계 | ||
2021-01-27,0,0 | ||
2021-01-28,0,0 | ||
2021-01-29,0,0 | ||
2021-02-01,0,0 | ||
2021-02-02,0,0 | ||
2021-02-03,0,0 | ||
2021-02-04,0,0 | ||
2021-02-05,0,0 | ||
2021-02-08,0,0 | ||
2021-02-09,0,0 | ||
2021-02-10,0,0 | ||
2021-02-15,0,0 | ||
2021-02-16,0,0 | ||
2021-02-17,0,0 | ||
2021-02-18,0,0 | ||
2021-02-19,0,0 | ||
2021-02-22,0,0 | ||
2021-02-23,0,0 | ||
2021-02-24,0,0 | ||
2021-02-25,0,0 | ||
2021-02-26,0,0 | ||
2021-03-02,0,0 | ||
2021-03-03,0,0 | ||
2021-03-04,0,0 | ||
2021-03-05,0,0 | ||
2021-03-08,0,0 | ||
2021-03-09,0,0 | ||
2021-03-10,0,0 | ||
2021-03-11,0,0 | ||
2021-03-12,0,0 | ||
2021-03-15,0,0 | ||
2021-03-16,0,0 | ||
2021-03-17,0,0 | ||
2021-03-18,0,0 | ||
2021-03-19,0,0 | ||
2021-03-22,0,0 | ||
2021-03-23,0,0 | ||
2021-03-24,0,0 | ||
2021-03-25,0,0 | ||
2021-03-26,0,0 | ||
2021-03-29,0,0 | ||
2021-03-30,0,0 | ||
2021-03-31,0,0 | ||
2021-04-01,0,0 | ||
2021-04-02,0,0 | ||
2021-04-05,0,0 | ||
2021-04-06,0,0 | ||
2021-04-07,0,0 | ||
2021-04-08,0,0 | ||
2021-04-09,0,0 |
Oops, something went wrong.