Earlier runs completed without errors. The error now occurs in AmazonCrawler.java, at the line Document doc = Jsoup.connect(url).headers(headers).userAgent(USER_AGENT).timeout(100000).get(); The exception log and my code are below.
Exception in thread "main" java.lang.IllegalArgumentException: String must not be empty
    at org.jsoup.helper.Validate.notEmpty(Validate.java:92)
    at org.jsoup.nodes.Attribute.setKey(Attribute.java:51)
    at org.jsoup.parser.ParseSettings.normalizeAttributes(ParseSettings.java:54)
    at org.jsoup.parser.HtmlTreeBuilder.insert(HtmlTreeBuilder.java:185)
    at org.jsoup.parser.HtmlTreeBuilderState$7.process(HtmlTreeBuilderState.java:553)
    at org.jsoup.parser.HtmlTreeBuilder.process(HtmlTreeBuilder.java:113)
    at org.jsoup.parser.TreeBuilder.runParser(TreeBuilder.java:50)
    at org.jsoup.parser.TreeBuilder.parse(TreeBuilder.java:43)
    at org.jsoup.parser.HtmlTreeBuilder.parse(HtmlTreeBuilder.java:56)
    at org.jsoup.parser.Parser.parseInput(Parser.java:32)
    at org.jsoup.helper.DataUtil.parseByteData(DataUtil.java:136)
    at org.jsoup.helper.HttpConnection$Response.parse(HttpConnection.java:666)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:225)
    at io.bittiger.crawler.AmazonCrawler.GetAdBasicInfoByQuery(AmazonCrawler.java:167)
    at io.bittiger.crawler.CrawlerMain.main(CrawlerMain.java:54)
@xiayank
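This exception is thrown from inside the JSoup parser; Amazon may have returned a 50X page. If the error does not happen on every run, try wrapping the call in a try-catch and simply skipping the failed page.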
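The try-catch suggestion above could be sketched roughly as follows. The class name SafeFetch and the method tryFetch are hypothetical and not part of the original crawler. To keep the sketch self-contained, the fetch step is passed in as a Callable, so the Jsoup call itself appears only in a comment.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

public class SafeFetch {
    /**
     * Runs one fetch-and-parse step and returns Optional.empty() instead of
     * letting the exception end the whole crawl. Hypothetical helper.
     *
     * Assumed usage inside the crawler (requires jsoup on the classpath):
     *   Optional<Document> doc = SafeFetch.tryFetch(url, () ->
     *       Jsoup.connect(url).headers(headers).userAgent(USER_AGENT)
     *            .timeout(100000).get());
     */
    public static <T> Optional<T> tryFetch(String url, Callable<T> fetch) {
        try {
            return Optional.ofNullable(fetch.call());
        } catch (Exception e) {
            // Covers IOException from the network as well as the unchecked
            // IllegalArgumentException seen in the stack trace.
            System.err.println("Skipping " + url + ": " + e);
            return Optional.empty();
        }
    }
}
```

A caller would then check isPresent() and continue with the next query when the result is empty. Logging the URL together with the exception keeps skipped pages visible rather than silently dropped.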
Print url right before the request and check what value it actually holds.
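One way to act on this is to log the URL and run a basic sanity check on it before handing it to Jsoup. This is a minimal sketch using only java.net.URI; the class name UrlCheck and the method isFetchable are hypothetical, and the check is deliberately conservative.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlCheck {
    /** Returns true only for a non-blank, absolute http(s) URL with a host. */
    public static boolean isFetchable(String url) {
        if (url == null || url.trim().isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(url.trim());
            String scheme = uri.getScheme();
            return uri.getHost() != null
                    && ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme));
        } catch (URISyntaxException e) {
            // Raw spaces or other illegal characters, e.g. an unencoded query.
            return false;
        }
    }
}
```

In the crawler, printing the value as System.out.println("url = [" + url + "]") before the request makes empty strings and stray whitespace easy to spot. A URL rejected here because of spaces usually means the search query was not URL-encoded.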