This Python project scrapes data from Wikipedia for a user-specified topic and saves it to a text file.
Web scraping is a technique used to extract data from websites. In this project, we focus on scraping data from Wikipedia articles. Users provide a topic name, and the program fetches relevant content from the corresponding Wikipedia page.
- Python: Main programming language for the project.
- Requests: Used for making HTTP requests to fetch web pages.
- BeautifulSoup: A Python library for parsing HTML and XML documents.
- User Input: Prompts the user to enter a topic name.
- Web Scraping:
  - Utilizes the `requests` library to fetch the Wikipedia article URL based on the user-provided topic.
  - Employs BeautifulSoup (`bs4`) to parse the HTML content of the retrieved webpage.
- Data Extraction:
  - Iterates through paragraph (`<p>`) tags to accumulate factual content.
  - Removes unnecessary elements like bracketed references ([1], [2], etc.).
- File Creation:
  - Saves the extracted data (article title and factual content) into a text file named after the Wikipedia article title with a `.txt` extension.
  - Uses UTF-8 encoding for broader character compatibility.
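
The features listed above could be wired together roughly as follows. This is a minimal sketch, not the project's actual source: the function names (`fetch_html`, `extract_article`, `save_article`, `main`), the English Wikipedia base URL, the `User-Agent` string, and the choice to read the title from the page's first `<h1>` element are all assumptions made for illustration.

```python
import re
from pathlib import Path

import requests
from bs4 import BeautifulSoup

# Assumption: articles are fetched from English Wikipedia.
WIKI_BASE = "https://en.wikipedia.org/wiki/"


def fetch_html(topic: str) -> str:
    """Download the article page for a topic (spaces become underscores)."""
    url = WIKI_BASE + topic.strip().replace(" ", "_")
    # Hypothetical User-Agent value; identifying the client is good practice.
    headers = {"User-Agent": "wiki-scraper-example/0.1"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # raise on 404 and other HTTP errors
    return response.text


def extract_article(html: str) -> tuple[str, str]:
    """Return (title, body) built from the page's <h1> and <p> tags."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")
    title = heading.get_text(strip=True) if heading else "untitled"
    paragraphs = []
    for p in soup.find_all("p"):
        # Drop bracketed numeric references such as [1] or [23].
        text = re.sub(r"\[\d+\]", "", p.get_text()).strip()
        if text:  # skip empty paragraphs
            paragraphs.append(text)
    return title, "\n\n".join(paragraphs)


def save_article(title: str, body: str, directory: str = ".") -> Path:
    """Write the title and body to '<title>.txt' using UTF-8."""
    path = Path(directory) / f"{title}.txt"
    path.write_text(f"{title}\n\n{body}\n", encoding="utf-8")
    return path


def main() -> None:
    topic = input("Enter a topic: ")
    title, body = extract_article(fetch_html(topic))
    print(f"Saved {save_article(title, body)}")
```

Calling `main()` prompts for a topic and writes the file to the current directory. Keeping the parsing step separate from the network request means `extract_article` can be tried on any HTML string without going online. Titles containing path separators would need sanitizing before being used as file names.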
- Run the Python script.
- Input the desired topic name when prompted.
- The program fetches data from the Wikipedia page related to the topic and saves it in a text file.
Suppose the user enters "Artificial Intelligence" as the topic. The program will fetch data from the Wikipedia page on Artificial Intelligence and save it in a text file.
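
To make the example concrete, the snippet below sketches how the topic string might map to an article URL and an output file name. The helper names `article_url` and `output_filename` are hypothetical, the English Wikipedia URL pattern is assumed, and replacing characters that are invalid in file names is an added precaution that the project description does not mention.

```python
import re
from urllib.parse import quote


def article_url(topic: str) -> str:
    """Build an English Wikipedia URL for a topic (assumed URL pattern)."""
    slug = topic.strip().replace(" ", "_")
    return "https://en.wikipedia.org/wiki/" + quote(slug)


def output_filename(title: str) -> str:
    """Derive the .txt file name from an article title.

    Assumption: characters that are invalid in common file systems
    are replaced with underscores.
    """
    safe_title = re.sub(r'[\\/:*?"<>|]', "_", title)
    return f"{safe_title}.txt"


print(article_url("Artificial Intelligence"))
# https://en.wikipedia.org/wiki/Artificial_Intelligence
print(output_filename("Artificial intelligence"))
# Artificial intelligence.txt
```

Because the file is named after the article title rather than the user's input, the resulting file name can differ in capitalization from the topic that was typed.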