Mongo DB need support for HTTPS #19

Open
butchersoft1 opened this issue Jan 4, 2019 · 3 comments


@butchersoft1

Hello

Using the Azure Search committer, I wanted to use CosmosDB as a URL store rather than MongoDB.

Looking at the code for MongoCrawlDataStoreFactory, the only reason it does not work is that there is no support for HTTPS connections. If you add this option, you should find that it works OK.

@essiembre
Contributor

A new snapshot of Collector Core was made with these two new configuration options on MongoCrawlDataStoreFactory:

    <sslEnabled>[false|true]</sslEnabled>
    <sslInvalidHostNameAllowed>[false|true]</sslInvalidHostNameAllowed>

The latest snapshots of both the HTTP and Filesystem Collectors have also been updated to include the latest Collector Core dependency.

Please give it a try and let me know.
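To make the meaning of the two flags concrete, here is a minimal sketch of reading them from a configuration fragment. It is not the Collector's own loading code: the wrapper element name `crawlDataStoreFactory`, the class `SslOptionsSketch`, its helper methods, and the default of `false` for an omitted option are all assumptions made for illustration. Only the two option names come from the comment above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class SslOptionsSketch {

    /** Parses an XML string into a DOM document. */
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Returns the boolean value of the first element named tag, or false if it is absent. */
    public static boolean readFlag(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            return false; // assumed default when the option is omitted
        }
        return Boolean.parseBoolean(nodes.item(0).getTextContent().trim());
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical fragment; the wrapper element name is an assumption.
        String xml =
              "<crawlDataStoreFactory>"
            + "<sslEnabled>true</sslEnabled>"
            + "<sslInvalidHostNameAllowed>false</sslInvalidHostNameAllowed>"
            + "</crawlDataStoreFactory>";
        Document doc = parse(xml);
        System.out.println("sslEnabled=" + readFlag(doc, "sslEnabled"));
        System.out.println("sslInvalidHostNameAllowed="
                + readFlag(doc, "sslInvalidHostNameAllowed"));
    }
}
```

For a hosted endpoint with a valid certificate, enabling SSL while leaving invalid host names disallowed is presumably the safer combination.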

@butchersoft1
Author

butchersoft1 commented Jan 9, 2019

Hello

I can confirm that the committer now successfully connects to CosmosDB and works for a while, but it soon starts to fail. Initially I raised the number of RUs to compensate, but even when set to 10,000 (up from 1,000, 2,000 and 5,000), the system eventually no longer feeds any records to the DB.

It's possible that there is a limit on connections and that it is trying to establish a new connection each time without closing the old one, then running into a maximum limit.

WARN [SLF4JLogger] Got socket exception on connection [connectionId{localValue:5, serverValue:-96182621}] to dev.documents.azure.com:10255. All connections to dev.documents.azure.com:10255 will be closed.
INFO [SLF4JLogger] Closed connection [connectionId{localValue:5, serverValue:-96182621}] to dev.documents.azure.com:10255 because there was a socket exception raised by this connection.
INFO [SLF4JLogger] Closed connection [connectionId{localValue:3, serverValue:-265208872}] to dev.documents.azure.com:10255 because there was a socket exception raised on another connection from this pool.
INFO [SLF4JLogger] No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=UNKNOWN, connectionMode=SINGLE, serverDescriptions=[ServerDescription{address=dev.documents.azure.com:10255, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out
INFO [SLF4JLogger] Closed connection [connectionId{localValue:6, serverValue:116253926}] to dev.documents.azure.com:10255 because there was a socket exception raised on another connection from this pool.
INFO [SLF4JLogger] Closed connection [connectionId{localValue:7, serverValue:-1176914507}] to dev.documents.azure.com:10255 because there was a socket exception raised on another connection from this pool.
FATAL [AbstractCrawler$ProcessReferencesRunnable] UTS Crawler Setup: An error occured that could compromise the stability of the crawler. Stopping excution to avoid further issues...
com.mongodb.MongoSocketWriteException: Exception sending message
at com.mongodb.connection.InternalStreamConnection.translateWriteException(InternalStreamConnection.java:445)
at com.mongodb.connection.InternalStreamConnection.sendMessage(InternalStreamConnection.java:194)
at com.mongodb.connection.UsageTrackingInternalConnection.sendMessage(UsageTrackingInternalConnection.java:90)
at com.mongodb.connection.DefaultConnectionPool$PooledConnection.sendMessage(DefaultConnectionPool.java:427)
at com.mongodb.connection.CommandProtocol.sendMessage(CommandProtocol.java:182)
at com.mongodb.connection.CommandProtocol.execute(CommandProtocol.java:104)
at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:159)
at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:289)
at com.mongodb.connection.DefaultServerConnection.command(DefaultServerConnection.java:176)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:216)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:153)
at com.mongodb.operation.FindAndUpdateOperation$1.call(FindAndUpdateOperation.java:335)
at com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:424)
at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:415)
at com.mongodb.operation.FindAndUpdateOperation.execute(FindAndUpdateOperation.java:331)
at com.mongodb.Mongo.execute(Mongo.java:819)
at com.mongodb.Mongo$2.execute(Mongo.java:802)
at com.mongodb.MongoCollectionImpl.findOneAndUpdate(MongoCollectionImpl.java:435)
at com.norconex.collector.http.data.store.impl.mongo.MongoCrawlDataSerializer.getNextQueued(MongoCrawlDataSerializer.java:79)
at com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore.nextQueued(MongoCrawlDataStore.java:210)
at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:406)
at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:820)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketException: Software caused connection abort: socket write error
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(Unknown Source)
at java.net.SocketOutputStream.write(Unknown Source)
at sun.security.ssl.OutputRecord.writeBuffer(Unknown Source)
at sun.security.ssl.OutputRecord.write(Unknown Source)
at sun.security.ssl.SSLSocketImpl.writeRecordInternal(Unknown Source)
at sun.security.ssl.SSLSocketImpl.writeRecord(Unknown Source)
at sun.security.ssl.AppOutputStream.write(Unknown Source)
at com.mongodb.connection.SocketStream.write(SocketStream.java:74)
at com.mongodb.connection.InternalStreamConnection.sendMessage(InternalStreamConnection.java:191)
... 23 more
INFO [SLF4JLogger] Closed connection [connectionId{localValue:4, serverValue:-1426120187}] to dev.documents.azure.com:10255 because there was a socket exception raised on another connection from this pool.
INFO [SLF4JLogger] Closed connection [connectionId{localValue:8, serverValue:1160019703}] to dev.documents.azure.com:10255 because there was a socket exception raised on another connection from this pool.
INFO [SLF4JLogger] Closed connection [connectionId{localValue:9, serverValue:-823362857}] to dev.documents.azure.com:10255 because there was a socket exception raised on another connection from this pool.
INFO [SLF4JLogger] Closed connection [connectionId{localValue:2, serverValue:-574855167}] to dev.documents.azure.com:10255 because there was a socket exception raised on another connection from this pool.
INFO [CrawlerEventManager] CRAWLER_STOPPING

@butchersoft1
Author

Continued from above: I was able to configure the committer to work with the CosmosDB Developer version and hit the same error. After unchecking the "Rate Limiting" feature, it appears to be working.

CosmosDB is limited by RU/s. With this limit set to 1,000, everything works fine for a while, then slowly starts to error, eventually grinding to a stop. If I restart the crawl, the system crawls again as the RU limit is reached. Deleting the database and increasing the number, even to 5,000, does not help; it just delays the process. As the upper limit for RUs is 10,000, I don't think this is currently viable as a URL store.
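One common way for a client to cope with a rate-limited store is to retry failed operations with an increasing delay instead of failing on the first error. The sketch below illustrates that general technique only. Nothing in this thread says the Collector does this, and whether it would help in this case is untested. The class `BackoffRetrySketch`, the method `callWithBackoff`, and the simulated failure are hypothetical names and behavior chosen for illustration.

```java
import java.util.concurrent.Callable;

public class BackoffRetrySketch {

    /**
     * Runs op, retrying on any exception for up to maxAttempts attempts in total.
     * The wait starts at initialDelayMs and doubles after each failed attempt.
     * The last exception is rethrown if every attempt fails.
     */
    public static <T> T callWithBackoff(Callable<T> op, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // back off before the next attempt
                    delay *= 2;
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated store operation (an assumption): fails twice, then succeeds.
        String result = callWithBackoff(() -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("simulated throttled write");
            }
            return "succeeded on attempt " + calls[0];
        }, 5, 10L);
        System.out.println(result);
    }
}
```

A retry loop like this only smooths over short bursts of throttling. If the sustained request rate exceeds the provisioned RU/s, as the reports above suggest, it would delay the failure rather than remove it.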
