Support Downloads (edited title) #161

andynuss · 2021-10-21T17:27:42Z

When scraping a user's requestUrl with Playwright, I relied on Playwright to tell me whether the document was a PDF, because some URLs that do not end with the suffix '.pdf' ARE in fact PDFs and, in even rarer cases, some URLs that end with '.pdf' are actually text/html.

That is, I looked at the 'content-type' header of the Playwright page.goto() response and made sure it was 'text/html' before doing anything further with the visited document.
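For reference, the check described above can be sketched roughly as follows. This is a minimal illustration, not code from my project: `classifyContentType` and `DocumentKind` are hypothetical names, and the only assumption is that the raw header value is available as a string (or missing).

```typescript
// Hypothetical helper: classify a document by its Content-Type header value.
type DocumentKind = 'html' | 'pdf' | 'other';

function classifyContentType(contentType: string | undefined): DocumentKind {
  if (!contentType) return 'other';
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const mediaType = contentType.split(';')[0].trim().toLowerCase();
  if (mediaType === 'text/html' || mediaType === 'application/xhtml+xml') return 'html';
  if (mediaType === 'application/pdf') return 'pdf';
  return 'other';
}
```

With Playwright, this would be called with something like `response?.headers()['content-type']` on the value returned by `page.goto()`, and the scrape would only continue when the result is `'html'`.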

But when I use the agent.goto function to visit any pdf in secret-agent, I get something like the following exception:

Error: net::ERR_ABORTED
    at Page.navigate (/Users/andynuss/repos/stag-secret-agent/app/node_modules/puppet-chrome/lib/Page.ts:212:45)
    at runNextTicks (internal/process/task_queues.js:58:5)
    at processImmediate (internal/timers.js:434:9)
    at Timer.waitForPromise (/Users/andynuss/repos/stag-secret-agent/app/node_modules/commons/Timer.ts:56:20)
    at Tab.goto (/Users/andynuss/repos/stag-secret-agent/app/node_modules/core/lib/Tab.ts:369:5)
    at CommandRecorder.runCommandFn (/Users/andynuss/repos/stag-secret-agent/app/node_modules/core/lib/CommandRecorder.ts:73:16)
    at ConnectionToClient.executeCommand (/Users/andynuss/repos/stag-secret-agent/app/node_modules/core/server/ConnectionToClient.ts:324:14)
    at ConnectionToClient.handleRequest (/Users/andynuss/repos/stag-secret-agent/app/node_modules/core/server/ConnectionToClient.ts:70:14)
------REMOTE CORE---------------------------------
    at Function.reviver (/Users/andynuss/repos/stag-secret-agent/app/node_modules/commons/TypeSerializer.ts:208:26)
    at JSON.parse (<anonymous>)
    at Function.parse (/Users/andynuss/repos/stag-secret-agent/app/node_modules/commons/TypeSerializer.ts:24:17)
    at WebSocket.<anonymous> (/Users/andynuss/repos/stag-secret-agent/app/node_modules/client/connections/RemoteConnectionToCore.ts:67:42)
    at WebSocket.emit (events.js:315:20)
    at Receiver.receiverOnMessage (/Users/andynuss/repos/stag-secret-agent/app/node_modules/ws/lib/websocket.js:983:20)
    at Receiver.emit (events.js:315:20)
    at Receiver.dataMessage (/Users/andynuss/repos/stag-secret-agent/app/node_modules/ws/lib/receiver.js:517:14)
    at /Users/andynuss/repos/stag-secret-agent/app/node_modules/ws/lib/receiver.js:468:23
    at /Users/andynuss/repos/stag-secret-agent/app/node_modules/ws/lib/permessage-deflate.js:308:9
------CONNECTION----------------------------------
    at new Resolvable (/Users/andynuss/repos/stag-secret-agent/app/node_modules/commons/Resolvable.ts:17:18)
    at Object.createPromise (/Users/andynuss/repos/stag-secret-agent/app/node_modules/commons/utils.ts:68:10)
    at RemoteConnectionToCore.createPendingResult (/Users/andynuss/repos/stag-secret-agent/app/node_modules/client/connections/ConnectionToCore.ts:328:31)
    at RemoteConnectionToCore.internalSendRequestAndWait (/Users/andynuss/repos/stag-secret-agent/app/node_modules/client/connections/ConnectionToCore.ts:253:43)
    at RemoteConnectionToCore.sendRequest (/Users/andynuss/repos/stag-secret-agent/app/node_modules/client/connections/ConnectionToCore.ts:156:17)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at Object.cb (/Users/andynuss/repos/stag-secret-agent/app/node_modules/client/lib/CoreCommandQueue.ts:104:26)
    at Queue.next (/Users/andynuss/repos/stag-secret-agent/app/node_modules/commons/Queue.ts:82:19)
------CORE COMMANDS-------------------------------
    at Queue.run (/Users/andynuss/repos/stag-secret-agent/app/node_modules/commons/Queue.ts:35:19)
    at CoreCommandQueue.run (/Users/andynuss/repos/stag-secret-agent/app/node_modules/client/lib/CoreCommandQueue.ts:100:8)
    at CoreTab.goto (/Users/andynuss/repos/stag-secret-agent/app/node_modules/client/lib/CoreTab.ts:92:36)
    at Tab.goto (/Users/andynuss/repos/stag-secret-agent/app/node_modules/client/lib/Tab.ts:160:36)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at testUrl (/Users/andynuss/repos/stag-secret-agent/app/src/test.js:51:30)
    at /Users/andynuss/repos/stag-secret-agent/app/src/test.js:130:5


blakebyrnes · 2021-10-22T20:51:10Z

I'm guessing this is triggering a file download, which Chrome will sometimes redirect to load in the browser. I got halfway through the downloads PR, and then the dev who was interested in implementing it kind of disappeared.

andynuss · 2021-10-30T18:01:43Z

I noticed that I previously found a similar behavior with puppeteer, and that this puppeteer issue indicated it had something to do with chromium?

puppeteer/puppeteer#2794

blakebyrnes · 2021-10-30T18:26:03Z

I don't see any confirmation in the thread, but it sounds to me like this happens because headless Chrome triggers a download when it encounters a PDF. That makes sense, because headless Chrome has no "plugins" installed, and the plugins are what know how to render PDFs. Like I said, I need to finish (or get someone's help?!?! hint, hint) the PR I linked to above. I've been wrapped up in some things for the new Hero project, so I haven't been able to get to this.
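Until downloads are supported, one possible stopgap is to treat the aborted navigation as a hint that a download was triggered. The sketch below is only an illustration under assumptions: `gotoOrDetectDownload` and `isAbortedNavigation` are hypothetical names, `navigate` stands in for a call such as `() => agent.goto(url)`, and it assumes the error reaching the client is still an `Error` whose message contains `net::ERR_ABORTED`, as in the trace above. Note that `net::ERR_ABORTED` can have other causes, so this is a guess, not proof of a download.

```typescript
// Hypothetical wrapper; not part of SecretAgent.
type NavigationOutcome<T> =
  | { kind: 'loaded'; result: T }
  | { kind: 'probable-download'; error: Error };

// True when the error looks like the aborted navigation reported in this issue.
function isAbortedNavigation(error: unknown): error is Error {
  return error instanceof Error && error.message.includes('net::ERR_ABORTED');
}

async function gotoOrDetectDownload<T>(
  navigate: () => Promise<T>,
): Promise<NavigationOutcome<T>> {
  try {
    return { kind: 'loaded', result: await navigate() };
  } catch (error) {
    // Assumption: an aborted navigation probably means a download started.
    if (isAbortedNavigation(error)) return { kind: 'probable-download', error };
    // Anything else is rethrown unchanged.
    throw error;
  }
}
```

A caller could then skip the URL, or fetch it by other means, when `kind` is `'probable-download'`.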

blakebyrnes · 2022-10-14T14:11:06Z

Existing PR in SecretAgent

There's a PR that was mostly completed against SecretAgent, and most of it can be applied to the Agent repo. HOWEVER, I came away thinking that the best approach is actually to allow Downloads to behave like normal resources.

Request Interception

I think to achieve this, we might want "request interception" with an ability to "stream" the response body as it becomes available.
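To make the idea concrete, here is a rough sketch of what consuming a download as a streamed resource might look like. Everything in it is hypothetical: `StreamedResource` and `collectBody` are invented for illustration and are not an existing SecretAgent or Agent API. The only assumption is that the body arrives as an async sequence of byte chunks.

```typescript
// Hypothetical shape only: not an existing SecretAgent API.
interface StreamedResource {
  url: string;
  headers: Record<string, string>;
  // Body chunks are yielded as they become available.
  body: AsyncIterable<Uint8Array>;
}

// Reads the whole body into memory, failing if it grows past maxBytes.
async function collectBody(
  resource: StreamedResource,
  maxBytes: number,
): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of resource.body) {
    total += chunk.byteLength;
    if (total > maxBytes) throw new Error(`Body exceeds ${maxBytes} bytes`);
    chunks.push(chunk);
  }
  // Concatenate the chunks into a single buffer.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

With a shape like this, a PDF download would look the same to the caller as any other resource, and a large file could be piped to disk chunk by chunk instead of being buffered.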

blakebyrnes linked a pull request Oct 30, 2021 that will close this issue

feat: file downloads ulixee/secret-agent#332

Draft

9 tasks

andynuss mentioned this issue Oct 31, 2021

failure to substitute :method (with GET?) sometimes in MitmRequestAgent ulixee/secret-agent#368

Open

blakebyrnes linked a pull request Nov 15, 2021 that will close this issue

feat: file downloads ulixee/secret-agent#332

Draft

9 tasks

blakebyrnes transferred this issue from ulixee/secret-agent Oct 14, 2022

blakebyrnes changed the title ~~an error: NET::ABORTED is returned when visiting a pdf~~ Support Downloads (edited title) Oct 14, 2022
