I've been parsing my own robots.txt file with Python and found an interesting compatibility scenario:
If you create multiple Robot records with the same user agent, the generated robots.txt puts each record in its own group, separated by a blank line, and Python's RobotFileParser then ignores every group after the first one for that user agent (see https://github.com/python/cpython/blob/3.5/Lib/urllib/robotparser.py). I'm looking at django-robots v3 and Python 3.5. Is this something you'd want to change or document?
It seems that when generating the robots.txt file with django-robots, multiple Disallow directives for the same user agent end up in separate groups, with a blank line between them in the generated file. This causes compatibility problems with Python's RobotFileParser, which only applies the first group it finds for a given user agent and skips the rest.
To work around the issue, you can create a single record with all of the paths, as you mentioned. Alternatively, you can modify the get_robots_txt() function in django-robots so that all directives for one user agent are written as a single group: one User-agent line followed by consecutive Disallow lines, with no blank line in between. For example:
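A minimal sketch of that output format, written as a stand-alone helper rather than django-robots' actual code (the function name and the paths below are made up for illustration):

```python
from collections import OrderedDict

def render_robots_txt(rules):
    """Render (user_agent, disallowed_path) pairs as robots.txt text.

    All paths for the same user agent are merged into one group, so a
    blank line only ever separates groups for *different* user agents.
    """
    grouped = OrderedDict()
    for user_agent, path in rules:
        grouped.setdefault(user_agent, []).append(path)

    groups = []
    for user_agent, paths in grouped.items():
        lines = ["User-agent: %s" % user_agent]
        lines.extend("Disallow: %s" % path for path in paths)
        groups.append("\n".join(lines))
    return "\n\n".join(groups) + "\n"

# Two records for the same agent come out as a single group:
#   User-agent: *
#   Disallow: /admin/
#   Disallow: /private/
print(render_robots_txt([("*", "/admin/"), ("*", "/private/")]))
```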
Example robots.txt as currently generated (two Robot records with the same user agent):
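Something along these lines (the paths here are placeholders; the point is the repeated User-agent group split by a blank line):

```
User-agent: *
Disallow: /admin/

User-agent: *
Disallow: /private/
```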
The work-around is simple -- you create a single Robot record with both rules so that robots.txt has no blank line:
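With one record carrying both (placeholder) paths, the same rules come out as a single group, which RobotFileParser handles correctly:

```
User-agent: *
Disallow: /admin/
Disallow: /private/
```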
To reproduce:
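A self-contained check with just the standard library shows the difference between the two file shapes above (again with placeholder paths):

```python
from urllib.robotparser import RobotFileParser

# robots.txt as generated from two records with the same user agent:
split_groups = """\
User-agent: *
Disallow: /admin/

User-agent: *
Disallow: /private/
"""

# robots.txt from a single record holding both paths.
merged_group = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
"""

def private_is_fetchable(robots_txt):
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", "https://example.com/private/")

print(private_is_fetchable(split_groups))  # True  -- the second group's Disallow is dropped
print(private_is_fetchable(merged_group))  # False -- both Disallow rules are applied
```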