This repository has been archived by the owner on Jul 15, 2019. It is now read-only.

HDFS file path containing ':' (colon) throws an exception #4

Open
stevegy opened this issue Jan 3, 2016 · 1 comment

Comments


stevegy commented Jan 3, 2016

I downloaded the whole source code and built it successfully. When I tried to run a crawl test:
bin/crawl urls/ TestCrawl/ http://localhost:8983/solr/nutch 2
I ran into this URI path name issue.
hadoop.log.zip

I have attached the log file. It seems the HDFS file path special-characters issue is still there.

2016-01-03 13:27:08,405 INFO fetcher.Fetcher - Fetcher: starting at 2016-01-03 13:27:08
2016-01-03 13:27:08,405 INFO fetcher.Fetcher - Fetcher: segment: TestCrawl/segments/drwxr-xr-xnn4nstevennstaffnn136nJannn3n13:24n20160103090925
2016-01-03 13:27:08,406 INFO fetcher.Fetcher - Fetcher Timelimit set for : 1451809628406
2016-01-03 13:27:08,631 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-01-03 13:27:08,677 ERROR fetcher.Fetcher - Fetcher: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: drwxr-xr-xnn4nstevennstaffnn136nJannn3n13:24n20160103090925
at org.apache.hadoop.fs.Path.initialize(Path.java:148)
at org.apache.hadoop.fs.Path.&lt;init&gt;(Path.java:126)
at org.apache.hadoop.fs.Path.&lt;init&gt;(Path.java:50)
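
The segment name in the log above looks like an entire directory-listing line rather than just the timestamped segment directory, and Hadoop's Path constructor rejects it: java.net.URI treats everything before the ':' in "13:24" as a URI scheme, which leaves a relative path in an absolute URI. A minimal sketch that appears to reproduce the same exception, assuming org.apache.hadoop.fs.Path from hadoop-common is on the classpath:

import org.apache.hadoop.fs.Path;

public class ColonPathRepro {
    public static void main(String[] args) {
        // The mangled segment name from the attached hadoop.log: since the
        // string contains ':' but no '/', Path parses everything before the
        // colon as a scheme and the remainder as a relative path.
        String segment = "drwxr-xr-xnn4nstevennstaffnn136nJannn3n13:24n20160103090925";

        // Expected: IllegalArgumentException wrapping
        // java.net.URISyntaxException: Relative path in absolute URI: ...
        Path p = new Path(segment);
        System.out.println(p);
    }
}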

Contributor

petarR commented Jan 18, 2016

Hi,

You could try the fix given in this thread.

Or simply use the following command to start the crawl:
runtime/local/bin/nutch crawl urls/ -solr http://localhost:8983/solr/ -dir TestCrawl -depth 3 -topN 50
