Don't return CSS files as results #147
Comments
@dev-nicolaos Many thanks for spotting this issue and taking the time to report it. I've modified the search query to restrict results by content_type, so only HTML and plain text results will be shown, i.e. no CSS, JS, JSON, XML, RSS or any kind of binary files.

For reference, the search index currently contains the following content_types:

There may be a case for blocking indexing of anything that isn't text/html or text/plain, to leave more space in the index for useful content. The exceptions might be application/json, text/xml, application/rss+xml, application/xml, application/xhtml+xml, application/atom+xml and application/feed+json, because these may contain useful data. If I decide to do this I'll log another ticket.

For reference, there are a couple of previous modifications to what type of content is shown:
And there's a related open ticket:
Now deployed. A search like https://searchmysite.net/search/?q=box+shadow+100%25+height still looks slightly problematic at first because most of the content snippets on the results page are just CSS, but it is better because (a) each result has a proper title, and (b) clicking into each result gets to a web page which includes CSS snippets in the content, i.e. expected behaviour.
Additional info: of the 101,032 pages currently in the system, 102 (around 0.1%) don't have a content_type set at all, so none of those 102 pages will be returned via the new filter. That's a small percentage, and from looking at some of the pages they're not likely to be useful search results, e.g. .opml, .webmanifest, .v, .ly, .py, .pub, .gpg, .xml, .awk, .bib, .hex etc. files, so this is not an issue. I think it does add weight to the case for cleaning up the index by only indexing known content types. I've raised #149 for this.
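As a rough illustration of the allow-list approach discussed in these comments, here is a minimal Python sketch. It is not the actual searchmysite implementation: the function names, the page dictionary shape, and the handling of parameters such as `; charset=utf-8` are all assumptions made for the example. Only the content type values themselves come from the comments above.

```python
from typing import Iterable, Optional

# Types shown in search results, per the comments above: HTML and plain text only.
DISPLAYABLE_TYPES = {"text/html", "text/plain"}

# Wider set suggested as possibly worth indexing because it may contain useful data.
INDEXABLE_TYPES = DISPLAYABLE_TYPES | {
    "application/json",
    "text/xml",
    "application/rss+xml",
    "application/xml",
    "application/xhtml+xml",
    "application/atom+xml",
    "application/feed+json",
}


def normalise(content_type: Optional[str]) -> Optional[str]:
    """Lowercase the media type and drop parameters (assumed format, e.g. 'text/html; charset=utf-8')."""
    if not content_type:
        return None
    return content_type.split(";", 1)[0].strip().lower() or None


def is_displayable(content_type: Optional[str]) -> bool:
    """True only for allow-listed types; a missing content_type is never displayable."""
    return normalise(content_type) in DISPLAYABLE_TYPES


def is_indexable(content_type: Optional[str]) -> bool:
    """True only for known types, so pages with no content_type would not be indexed."""
    return normalise(content_type) in INDEXABLE_TYPES


def filter_results(pages: Iterable[dict]) -> list:
    """Keep pages whose (hypothetical) 'content_type' field is displayable."""
    return [p for p in pages if is_displayable(p.get("content_type"))]
```

Because this is an allow-list rather than a block-list, pages with no content_type drop out automatically, which matches the behaviour described for the 102 pages above.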
It appears searchmysite is not filtering out CSS files from its results. This means if you perform a search that is filled with CSS keywords and values, a lot of the results page is just a bunch of random CSS files with no title.
Example: https://searchmysite.net/search/?q=box+shadow+100%25+height