A React + Express app that takes any public website URL and deconstructs it to:
- Display URL components (protocol, hostname, path, query, etc.)
- Crawl the site and list all discovered internal sub-URLs
- Visually show loading progress and errors (if any)
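The URL-components view can be approximated with the standard WHATWG `URL` class, which is available globally in browsers and in Node.js. The following is a minimal sketch, not the app's actual code. The `UrlParts` shape and the `deconstructUrl` name are hypothetical.

```typescript
// Field names below are assumptions for this sketch, not the project's actual types.
interface UrlParts {
  protocol: string;
  hostname: string;
  port: string; // "" when the URL uses the scheme's default port
  pathname: string;
  query: Record<string, string>;
  hash: string;
}

// Split an absolute URL into its components with the built-in WHATWG URL parser.
// new URL() throws a TypeError for malformed or relative input.
function deconstructUrl(input: string): UrlParts {
  const url = new URL(input);
  const query: Record<string, string> = {};
  // Simplification: if a query key repeats, the last value wins.
  url.searchParams.forEach((value, key) => {
    query[key] = value;
  });
  return {
    protocol: url.protocol,
    hostname: url.hostname,
    port: url.port,
    pathname: url.pathname,
    query,
    hash: url.hash,
  };
}

const parts = deconstructUrl("https://news.ycombinator.com/item?id=1#top");
console.log(parts.hostname, parts.pathname, parts.query.id); // news.ycombinator.com /item 1
```

Relying on the built-in parser avoids hand-written regexes and gives the same results the browser itself would use when following the link.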
Features:
- ✅ Real-time URL parsing with validation
- ✅ Sub-URL crawling on the backend using axios + cheerio
- ✅ Friendly UI with status handling and live updates
- ✅ Proxy setup for smooth React ↔ Express communication
- ✅ CORS support and environment-based port configuration
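The crawling itself happens on the backend: axios fetches the page and cheerio parses the HTML. That code is not shown in this README, so the sketch below covers only the link-normalization step. It takes raw `href` strings, such as those collected from a cheerio selection like `$("a[href]")`, resolves them against the page URL, and keeps the internal ones. "Internal" is assumed here to mean same origin (scheme, host and port); the project may use a looser rule such as hostname matching. `collectInternalLinks` is a hypothetical name, not the project's API.

```typescript
// Hypothetical helper: turn raw href values into a sorted, de-duplicated
// list of absolute internal URLs.
function collectInternalLinks(pageUrl: string, hrefs: string[]): string[] {
  const base = new URL(pageUrl);
  const seen = new Set<string>();
  for (const href of hrefs) {
    let resolved: URL;
    try {
      // Resolve relative links ("/newest", "item?id=1") against the page URL.
      resolved = new URL(href, base);
    } catch {
      continue; // skip hrefs the URL parser rejects
    }
    // Drop mailto:, javascript:, etc., and anything on another origin.
    if (resolved.protocol !== "http:" && resolved.protocol !== "https:") continue;
    if (resolved.origin !== base.origin) continue;
    // "#section" fragments point at the same document, so strip them before de-duplicating.
    resolved.hash = "";
    seen.add(resolved.href);
  }
  return Array.from(seen).sort();
}

console.log(
  collectInternalLinks("https://news.ycombinator.com/news", [
    "item?id=1",
    "/newest",
    "https://github.com/",
    "mailto:someone@example.com",
    "/newest#top",
  ]),
);
```

Here the GitHub and `mailto:` links are filtered out and the two `/newest` variants collapse into a single entry.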
Tech stack:

| Layer | Tech |
|---|---|
| Frontend | React + TypeScript |
| Backend | Express + Axios + Cheerio |
| Styling | CSS Modules (vanilla) |
| Tooling | Concurrently, Dotenv |
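Dotenv loads variables from a `.env` file into `process.env`, which is what makes environment-based port configuration possible. The server code is not shown here, so the following is only a sketch. The `PORT` variable name, the fallback of 5000 and the `resolvePort` helper are all assumptions.

```typescript
// Hypothetical helper: choose the Express port from an environment map.
// In a server you might call resolvePort(process.env) after dotenv has loaded .env.
function resolvePort(
  env: Record<string, string | undefined>,
  fallback = 5000, // assumed default; the real project may use another port
): number {
  const raw = env.PORT?.trim();
  if (!raw) return fallback; // unset or empty: use the default
  const port = Number(raw);
  // Reject values like "abc", "3000.5" or "70000" instead of failing later in listen().
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT value: "${raw}"`);
  }
  return port;
}

console.log(resolvePort({ PORT: "4000" })); // 4000
console.log(resolvePort({})); // 5000
```

Taking the environment as a parameter rather than reading `process.env` directly keeps the helper easy to test.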
Prerequisites:
- Node.js
- An IDE (VS Code recommended)
Getting started:

1. Clone the repo: `git clone git@github.com:mahtabnejad90/web-url-deconstructor.git`
2. Change into the repo cloned in step 1 and stay there for the rest of the steps: `cd web-url-deconstructor`
3. Install dependencies: `npm install`
4. Run the frontend and backend concurrently: `npm run dev`
5. Enter a valid full URL, including the protocol (e.g. `https://news.ycombinator.com`).
6. Click Crawl URL.
7. View:
   - the deconstructed components of the URL
   - the discovered sub-URLs in a scrollable list
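A valid full URL is an absolute URL that includes the protocol. With a parser like the built-in `URL` class, input such as `news.ycombinator.com` is rejected because it has no scheme. Below is a sketch of such a frontend check using a hypothetical `isCrawlableUrl` helper; the app's real validation rules may differ. It lets the `URL` constructor do the parsing and then allows only http and https.

```typescript
// Hypothetical client-side check: accept only absolute http(s) URLs with a hostname.
function isCrawlableUrl(input: string): boolean {
  try {
    const url = new URL(input.trim());
    return (url.protocol === "http:" || url.protocol === "https:") && url.hostname !== "";
  } catch {
    // new URL() throws a TypeError for relative or malformed input.
    return false;
  }
}

console.log(isCrawlableUrl("https://news.ycombinator.com")); // true
console.log(isCrawlableUrl("news.ycombinator.com")); // false
```

Running this check on every keystroke would also support the "real-time validation" behaviour listed in the features.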
Try crawling:
- ✅ https://news.ycombinator.com
- ✅ https://en.wikipedia.org/wiki/Web_crawler
- ✅ https://playwright.dev
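⚠️ Avoid https://example.com (too simple: it contains almost no links) and JavaScript-heavy sites. The backend parses the raw HTML with cheerio and does not execute JavaScript, so links rendered on the client are not discovered.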
Roadmap:
- Add E2E testing via Playwright
- Add performance metrics
- Add a depth-level indicator
- Export discovered links to CSV
- Filter links by type (internal/external)
- Deploy the frontend on Vercel and the backend on Render
Built by Mahtab 🐧👩🏻‍💻