<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<!-- iOS Safari -->
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">
<!-- Chrome, Firefox OS and Opera Status Bar Color -->
<meta name="theme-color" content="#FFFFFF">
<link rel="stylesheet" type="text/css" href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.11.1/katex.min.css">
<link rel="stylesheet" type="text/css"
href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.19.0/themes/prism.min.css">
<link rel="stylesheet" type="text/css" href="css/SourceSansPro.css">
<link rel="stylesheet" type="text/css" href="css/theme.css">
<link rel="stylesheet" type="text/css" href="css/notablog.css">
<!-- Favicon -->
<link rel="shortcut icon" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text text-anchor=%22middle%22 dominant-baseline=%22middle%22 x=%2250%22 y=%2255%22 font-size=%2280%22>📼</text></svg>">
<style>
:root {
font-size: 20px;
}
</style>
<title>Extracting Meanings of List of Words From A Website URL (Python Webscraping) | Projects</title>
<meta property="og:type" content="blog">
<meta property="og:title" content="Extracting Meanings of List of Words From A Website URL (Python Webscraping)">
<meta name="description" content="One day, I was preparing for the SAT, specifically the vocabulary. I wanted to memorize several hundred words (500, to be exact). However, I didn’t know the meanings of most of them, or even their roots. And when I tried to find the definitions, I had to look up the words one by one. Looking up hundreds of words one at a time is very tedious. For each word, I had to search for it, click a link to a website, copy the definition, and paste it back into Excel. Then I had to repeat the whole process for the next word. If I could automate this, I could do it for as many words as I wanted. So, I had an idea: why not put all the words in Excel, feed them into a Python program, and have the program output all the definitions at once? With this in mind, I created a function that takes a list of words, extracts their definitions, and puts the definitions back into Excel (explained in more depth below). I encourage you to use this program or modify it.">
<meta property="og:description" content="One day, I was preparing for the SAT, specifically the vocabulary. I wanted to memorize several hundred words (500, to be exact). However, I didn’t know the meanings of most of them, or even their roots. And when I tried to find the definitions, I had to look up the words one by one. Looking up hundreds of words one at a time is very tedious. For each word, I had to search for it, click a link to a website, copy the definition, and paste it back into Excel. Then I had to repeat the whole process for the next word. If I could automate this, I could do it for as many words as I wanted. So, I had an idea: why not put all the words in Excel, feed them into a Python program, and have the program output all the definitions at once? With this in mind, I created a function that takes a list of words, extracts their definitions, and puts the definitions back into Excel (explained in more depth below). I encourage you to use this program or modify it.">
<meta property="og:image" content="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text text-anchor=%22middle%22 dominant-baseline=%22middle%22 x=%2250%22 y=%2255%22 font-size=%2280%22>📖</text></svg>">
<style>
.DateTagBar {
margin-top: 1.0rem;
}
</style>
</head>
<body>
<nav class="Navbar">
<a href="index.html">
<div class="Navbar__Btn">
<span><img class="inline-img-icon" src="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text text-anchor=%22middle%22 dominant-baseline=%22middle%22 x=%2250%22 y=%2255%22 font-size=%2280%22>📼</text></svg>"></span>
<span>Home</span>
</div>
</a>
<span class="Navbar__Delim">·</span>
<a href="about.html">
<div class="Navbar__Btn">
<span><img class="inline-img-icon" src="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text text-anchor=%22middle%22 dominant-baseline=%22middle%22 x=%2250%22 y=%2255%22 font-size=%2280%22>🕵️</text></svg>"></span>
<span>About</span>
</div>
</a>
</nav>
<header class="Header">
<div class="Header__Cover">
<img src="https://images.unsplash.com/photo-1543165796-5426273eaab3?ixlib=rb-1.2.1&q=80&cs=tinysrgb&fm=jpg&crop=entropy">
</div>
<div class="Header__Spacer ">
</div>
<div class="Header__Icon">
<span><img class="inline-img-icon" src="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text text-anchor=%22middle%22 dominant-baseline=%22middle%22 x=%2250%22 y=%2255%22 font-size=%2280%22>📖</text></svg>"></span>
</div>
<h1 class="Header__Title">Extracting Meanings of List of Words From A Website URL (Python Webscraping)</h1>
<div class="DateTagBar">
<span class="DateTagBar__Item DateTagBar__Date">Posted on Mon, Jul 25, 2022</span>
<span class="DateTagBar__Item DateTagBar__Tag DateTagBar__Tag--red">
<a href="tag/Python.html">Python</a>
</span>
<span class="DateTagBar__Item DateTagBar__Tag DateTagBar__Tag--red">
<a href="tag/Software.html">Software</a>
</span>
</div>
</header>
<article id="https://www.notion.so/0c3190c88fd6421da3d980ed433d8cf7" class="PageRoot"><div id="https://www.notion.so/44522da6b5874eaa806de6cc06883b4c" class="ColorfulBlock ColorfulBlock--BgGray Callout"><div class="Callout__Icon"><div class="Icon">🔗</div></div><p class="Callout__Content"><span class="SemanticStringArray"><span class="SemanticString">Check the </span><span class="SemanticString"><a class="SemanticString__Fragment SemanticString__Fragment--Link" href="https://github.com/rishisim/Extracting-Meanings-of-List-of-Words-From-A-Website-URL-Python-Webscraping">source code</a></span><span class="SemanticString"> of the program on GitHub. The details below are based on that code. </span></span></p></div><h1 id="https://www.notion.so/ac19ca45ea22498a81cbee7e2d4e31ef" class="ColorfulBlock ColorfulBlock--ColorDefault Heading Heading--1"><a class="Anchor" href="#https://www.notion.so/ac19ca45ea22498a81cbee7e2d4e31ef"><svg width="16" height="16" viewBox="0 0 16 16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><span class="SemanticStringArray"><span class="SemanticString">Introduction</span></span></h1><div id="https://www.notion.so/1d746cbc5ca64f028076af8a1cce4b6a" class="ColorfulBlock ColorfulBlock--ColorDefault Text"><p class="Text__Content"><span class="SemanticStringArray"><span class="SemanticString">One day, I was preparing for the SAT, specifically the vocabulary. I wanted to memorize several hundred words (500, to be exact). However, I didn’t know the meanings of most of them, or even their roots. And when I tried to find the definitions, I had to look up the words one by one. 
Looking up hundreds of words one at a time is very tedious. For each word, I had to search for it, click a link to a website, copy the definition, and paste it back into Excel. Then I had to repeat the whole process for the next word. If I could automate this, I could do it for as many words as I wanted. So, I had an idea: why not put all the words in Excel, feed them into a Python program, and have the program output all the definitions at once? With this in mind, I created a function that takes a list of words, extracts their definitions, and puts the definitions back into Excel (explained in more depth below). I encourage you to use this program or modify it.</span></span></p></div><h1 id="https://www.notion.so/eadf2523e1744f9996428db5634dd931" class="ColorfulBlock ColorfulBlock--ColorDefault Heading Heading--1"><a class="Anchor" href="#https://www.notion.so/eadf2523e1744f9996428db5634dd931"><svg width="16" height="16" viewBox="0 0 16 16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><span class="SemanticStringArray"><span class="SemanticString">Overview - A Python Multiple Input Dictionary</span></span></h1><div id="https://www.notion.so/3d1d803d63464f85b6b5f95807285a0e" class="ColorfulBlock ColorfulBlock--ColorDefault Text"><p class="Text__Content"><span class="SemanticStringArray"><span class="SemanticString">This is a basic Python dictionary in JupyterLab that takes multiple words and outputs their definitions all at once. 
</span></span></p></div><div id="https://www.notion.so/64d1e93d84194eec8227c73ccb5a46cd" class="ColorfulBlock ColorfulBlock--ColorDefault Text"><p class="Text__Content"><span class="SemanticStringArray"></span></p></div><div id="https://www.notion.so/e1d1e08c025f4c3ab49dc08a6e9c1fae" class="ColorfulBlock ColorfulBlock--ColorDefault Text"><p class="Text__Content"><span class="SemanticStringArray"><span class="SemanticString">The dictionary was built with pandas, the BeautifulSoup library, and the website </span><span class="SemanticString"><a class="SemanticString__Fragment SemanticString__Fragment--Link" href="http://www.dictionary.com/">www.dictionary.com</a></span><span class="SemanticString">. The function </span><span class="SemanticString"><code class="SemanticString__Fragment SemanticString__Fragment--Code">ggl_search()</code></span><span class="SemanticString"> builds a URL from its word parameter and scrapes that page. From the page’s HTML, it extracts the first definition listed on the site and returns it as a string.</span></span></p></div><div id="https://www.notion.so/b56a819925e24761a7a9266b7fb4d369" class="ColorfulBlock ColorfulBlock--ColorDefault Text"><p class="Text__Content"><span class="SemanticStringArray"><span class="SemanticString">Below is a code snippet of the function.</span></span></p></div><pre id="https://www.notion.so/d6294ac25e164eacb483f699fe591663" class="Code Code--NoWrap"><code><span class="SemanticStringArray"><span class="SemanticString"><span><span class="token keyword">import</span> urllib<span class="token punctuation">.</span>parse

<span class="token keyword">import</span> requests
<span class="token keyword">from</span> bs4 <span class="token keyword">import</span> BeautifulSoup


<span class="token comment"># A function that looks up the word the user wants a definition for.</span>
<span class="token keyword">def</span> <span class="token function">ggl_search</span><span class="token punctuation">(</span>word<span class="token punctuation">)</span><span class="token punctuation">:</span>
    <span class="token comment"># Using the urllib module, a URL is created based on the word</span>
    <span class="token comment"># parameter passed to the function.</span>
    search_url <span class="token operator">=</span> <span class="token string">"https://www.dictionary.com/browse/{}"</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>urllib<span class="token punctuation">.</span>parse<span class="token punctuation">.</span>quote_plus<span class="token punctuation">(</span><span class="token builtin">str</span><span class="token punctuation">(</span>word<span class="token punctuation">)</span><span class="token punctuation">,</span> safe<span class="token operator">=</span><span class="token string">'/'</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
    <span class="token comment"># Fetches the page at that URL.</span>
    google_request <span class="token operator">=</span> requests<span class="token punctuation">.</span>get<span class="token punctuation">(</span>search_url<span class="token punctuation">)</span>
    <span class="token comment"># Parses the HTML.</span>
    soup <span class="token operator">=</span> BeautifulSoup<span class="token punctuation">(</span>google_request<span class="token punctuation">.</span>text<span class="token punctuation">,</span> <span class="token string">"html.parser"</span><span class="token punctuation">)</span>
    <span class="token comment"># Extracts the first definition of the word. If the page has no</span>
    <span class="token comment"># matching div, find() returns None and .text raises AttributeError.</span>
    results <span class="token operator">=</span> soup<span class="token punctuation">.</span>find<span class="token punctuation">(</span><span class="token string">'div'</span><span class="token punctuation">,</span> attrs <span class="token operator">=</span> <span class="token punctuation">{</span><span class="token string">'value'</span><span class="token punctuation">:</span><span class="token string">'1'</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">.</span>text
    <span class="token keyword">return</span> results</span></span></span></code></pre><div id="https://www.notion.so/bb15ca8eb5484fb3aa04175eff5cf45a" class="ColorfulBlock ColorfulBlock--ColorDefault Text"><p class="Text__Content"><span class="SemanticStringArray"></span></p></div><div id="https://www.notion.so/28e9e14e55c14704a20d2c22b6e3a674" class="ColorfulBlock ColorfulBlock--ColorDefault Text"><p class="Text__Content"><span class="SemanticStringArray"><span class="SemanticString">To get the definition of a word, call the function like this: </span><span class="SemanticString"><code class="SemanticString__Fragment SemanticString__Fragment--Code">ggl_search("word")</code></span></span></p></div><div id="https://www.notion.so/c1c64975c53a4d5586c0a370b914192b" class="ColorfulBlock ColorfulBlock--ColorDefault Text"><p class="Text__Content"><span class="SemanticStringArray"></span></p></div><div id="https://www.notion.so/9bfafa17c20949a8bb9c18887c7ed7c3" class="ColorfulBlock ColorfulBlock--ColorDefault Text"><p class="Text__Content"><span class="SemanticStringArray"><span class="SemanticString">The dictionary can also output one definition each for several words at once. The words can come from a CSV file (as in this repository) or from a plain Python list.</span></span></p></div><div id="https://www.notion.so/82e8d678188e44e098607426989cbee3" class="ColorfulBlock ColorfulBlock--ColorDefault Text"><p class="Text__Content"><span class="SemanticStringArray"></span></p></div><div id="https://www.notion.so/d701cfefa29d44178cd156f725b31509" class="ColorfulBlock ColorfulBlock--ColorDefault Text"><p class="Text__Content"><span class="SemanticStringArray"><span class="SemanticString">In both cases, the dictionary outputs each word and its definition in a DataFrame. 
Below is the DataFrame output if the code in the repository is run.</span></span></p></div><div id="https://www.notion.so/85b1f12b6c854ca8a701d2fb15bf9312" class="Image Image--PageWidth"><figure><a href="https://www.notion.so/signed/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F5518cb34-9aaa-4538-ba12-6bde4edd81c8%2FUntitled.png?width=720&table=block&id=85b1f12b-6c85-4ca8-a701-d2fb15bf9312"><img src="https://www.notion.so/signed/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F5518cb34-9aaa-4538-ba12-6bde4edd81c8%2FUntitled.png?width=720&table=block&id=85b1f12b-6c85-4ca8-a701-d2fb15bf9312" style="width:100%"/></a><figcaption><span class="SemanticStringArray"></span></figcaption></figure></div><div id="https://www.notion.so/037285ee3b754d0db0050f5bd3a5a6cd" class="ColorfulBlock ColorfulBlock--ColorDefault Text"><p class="Text__Content"><span class="SemanticStringArray"></span></p></div></article>
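<div class="ColorfulBlock ColorfulBlock--ColorDefault Text"><p class="Text__Content"><span class="SemanticStringArray"><span class="SemanticString">The multi-word step described above could look roughly like the sketch below. It is not the code from the repository: the helper names (read_words, define_all, fake_lookup) and the CSV column name "word" are assumptions made for illustration, and fake_lookup only stands in for ggl_search() so the example runs without a network connection.</span></span></p></div>
<pre class="Code Code--NoWrap"><code>
```python
import csv
import io


def read_words(csv_text, column="word"):
    """Return the non-empty entries of one column from CSV text.

    The column name "word" is an assumption about the input file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column].strip() for row in reader if row[column].strip()]


def define_all(words, lookup):
    """Pair every word with its definition, or None if the lookup fails."""
    rows = []
    for word in words:
        try:
            definition = lookup(word)
        except Exception:  # network error, missing entry, changed page layout
            definition = None
        rows.append({"word": word, "definition": definition})
    return rows


# Hypothetical stand-in for ggl_search() so the sketch runs offline.
SAMPLE_DEFINITIONS = {"abate": "to reduce in amount or intensity"}


def fake_lookup(word):
    return SAMPLE_DEFINITIONS[word]  # raises KeyError for unknown words


words = read_words("word\nabate\nzzzz\n")
rows = define_all(words, fake_lookup)
print(rows)
```
</code></pre>
<div class="ColorfulBlock ColorfulBlock--ColorDefault Text"><p class="Text__Content"><span class="SemanticStringArray"><span class="SemanticString">Passing ggl_search as the lookup argument would fetch real definitions, and pandas.DataFrame(rows) would turn the result into a two-column table. Catching the exception per word keeps one missing entry from stopping the whole batch.</span></span></p></div>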
<footer class="Footer">
<div>© Projects 2022</div>
<div>·</div>
<div>Powered by <a href="https://github.com/dragonman225/notablog" target="_blank"
rel="noopener noreferrer">Notablog</a>.
</div>
</footer>
</body>
</html>