-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#93 Change xpath based on 'changed tvtropes' #94
Conversation
process_redirections(session, url, final_url, namespace, name) | ||
return True, tree, namespace, name, final_url | ||
else: | ||
(namespace, name) = map(lambda x: x.strip(), name.split(':')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems pretty complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is code is exactly same with below code.
namespace = name.split(':')[0].strip()
name = name.split(':')[1].strip()
or
(namespace, name) = name.split(':')
namespace = namespace.strip()
name = name.strip()
or
(namespace,name)=name.replace(r'₩s','').split(':')
Do you think that it is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could add a variable.
in line 137,
tvtropes_title = tree.xpath(name_path)[0].text.strip()
So I can write code like that
namespace = name.split(':')[0].strip()
name = name.split(':')[1].strip()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this?
namespace, name = re.match(r'^\s*([^:]+?)\s*:\s*(.+?)\s*$', name).groups()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is good. But before apply, you must complie to use.
name_pattern = re.compile(r'^\s*([^:]+?)\s*:\s*(.+?)\s*$')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both are available ideas. I will try it.
If you merge this pr, issue #93 will be resolved. |
1 similar comment
It should be reviewed by @tkiapril. |
@@ -133,23 +133,20 @@ def fetch_link(url, session, *, log_prefix=''): | |||
return False, None, None, None, final_url | |||
tree = document_fromstring(r.text) | |||
try: | |||
namespace = tree.xpath('//div[@class="pagetitle"]')[0] \ | |||
.text.strip()[:-1] | |||
name = (tree.find_class('article_title')[0]).text_content() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work? I inspected tvtropes, and it seems the structure is like below:
<div class="pagetitle">
<div class="article_title"><h1><span>Home Page</span></h1></div>
</div>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it works.
Sorry for being late for reviewing, I was like out of service for a few days after my trip. |
#93 Change xpath based on 'changed tvtropes'
I changed what to patching html, because tvtropes.org has been changed a lot.