Warp with versioned, enablemd5 true, ran into "We encountered an internal error. Please try again" PUT method errors #8524
Comments
@rkomandu the box is empty. any news about the logs?
reposting the box folder again --> https://ibm.ent.box.com/folder/293689458577; it had files from 13th Nov, as posted in the Slack screenshot
At the time these 4 errors happen, we see the following error:
looks like the issue happens during multipart upload
@rkomandu can you take full logs, so I can further investigate this:
@nadavMiz, you can check this; it had 8 errors from the Warp summary. Warp analyze showed PUT errors at the timestamps.
output is located here https://ibm.ent.box.com/folder/294550758268

./warp versioned --insecure --duration 180m --host :6443 --access-key KCxP4AN9937kVqoCrNIs --secret-key bIdwF/5nJtSnrHWXrhPOhkv1WqGjtayMk6D+aU/U --tls --obj.size 256M --bucket newbucket-warp-vers-mup-19nov
run started around "Nov 18 23:58:30"; the noobaa files in box are from hour 00 of Nov 19th, and there is nothing much in the older logs (as such, no errors from analyze either).

Nov 18 23:58:30 gui0-node [4006211]: [nsfs/4006211] [L1] core.endpoint.s3.s3_rest:: S3 REQUEST PUT /newbucket-warp-vers-mup-19nov op put_bucket request_id m3nzi3yy-4d47kz-m3d
@guymguym I talked with @nadavMiz about this bug today. Is there a reason we await the deletion of the parts? It seems to me this can be done in the background so we won't delay our response.
@romayalon if we don't wait for the deletion of the parts, we can't delete the multipart directory either. Why are we waiting so long anyway? If we want to speed it up, we can create a single fs_napi operation that deletes the entire directory and all the files in it, so we won't have to submit so many work items.
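A minimal sketch of that idea, assuming a hypothetical helper named `complete_multipart_upload` (names and directory layout are illustrative, not the actual noobaa code):

```js
const fs = require('fs/promises');
const path = require('path');

async function complete_multipart_upload(mp_dir, final_path) {
    // make the assembled object visible before cleanup starts
    await fs.rename(path.join(mp_dir, 'assembled_object'), final_path);

    // fire-and-forget: one recursive rm instead of a work item per unlink;
    // failures are logged instead of failing the client's request
    fs.rm(mp_dir, { recursive: true, force: true })
        .catch(err => console.error('multipart cleanup failed:', mp_dir, err));

    return final_path; // the response no longer waits for the unlinks
}
```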
@romayalon @guymguym the TODO mentioned is this:
looks like the function:

I am not sure why unlink is taking so long, but other operations occasionally take that long as well. For example:
@guymguym Why do we need to wait until the directory is deleted before sending the response?
we don't have to, but otherwise the multipart upload is still assumed to exist by other API calls
@guymguym Thanks Guy, that's what I thought. I think we can really improve this flow; maybe we can also await the rename of the folder and not wait for the unlinks.
@guymguym @romayalon even if we don't wait for the unlink, we might still take a long time uploading the file itself; that seems to be the reason we have the keepalive mechanism in the first place. Maybe we can somehow pass the version-id in the header before uploading the file and deleting the folder; that way we can wait as long as needed.
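A rough sketch of the rename-first idea from the two comments above (hypothetical names, not the noobaa implementation): once the multipart directory is renamed away, other API calls can no longer see the upload, so only the rename has to be awaited before responding:

```js
const fs = require('fs/promises');

async function hide_and_cleanup_mp_dir(mp_dir) {
    const trash_dir = `${mp_dir}.deleting-${Date.now()}`;

    // awaited: after the rename, list/complete/abort calls can no longer
    // find the upload, so the response can be sent right away
    await fs.rename(mp_dir, trash_dir);

    // not awaited: the per-part unlinks run in the background
    fs.rm(trash_dir, { recursive: true, force: true })
        .catch(err => console.error('background cleanup failed:', trash_dir, err));
}
```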
@nadavMiz regardless of the fix of the
@madhuthorat FS commands are taking very long in this run: the link command takes 10 seconds and each unlink command takes 230 ms:
I tested it on my cluster, and for a 30 GB file link took only 0.321312 ms, so it looks like the main issue is with the filesystem. Regardless, I have opened a new issue (#8569) to decide how to handle this scenario on our side in case there are file system problems; maybe we should just let it fail.
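For reference, a minimal micro-benchmark of the kind behind the numbers above (the path is a hypothetical mount point; adjust to the filesystem under test):

```js
const fs = require('fs/promises');

// time a single fs operation in milliseconds
async function time_fs_op(label, fn) {
    const start = process.hrtime.bigint();
    await fn();
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    console.log(`${label}: ${ms.toFixed(3)} ms`);
}

async function main() {
    const base = '/gpfs/fs1/timing-test'; // hypothetical test path
    await fs.writeFile(base, 'x');
    await time_fs_op('link', () => fs.link(base, `${base}.link`));
    await time_fs_op('unlink', () => fs.unlink(`${base}.link`));
    await fs.unlink(base); // clean up the test file
}

main().catch(console.error);
```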
Environment info
noobaa-core-5.17.1-20241104.el9.ppc64le (standalone)
Actual behavior
ERROR in the noobaa.log (snippet copied below) --> more in the logs
Nov 13 02:20:39 node-gui1 [3689895]: [nsfs/3689895] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR InternalError We encountered an internal error. Please try again. /newbucket-warp-vers-mup-13nov/n9pjzZ6F/5.7eWO9uYt8k5sK7Wn.rnd?uploadId=bd897211-764d-48a6-a572-73bf184e750e m3fjvnp1-1bh1z0-1nv POST /newbucket-warp-vers-mup-13nov/n9pjzZ6F/5.7eWO9uYt8k5sK7Wn.rnd?uploadId=bd897211-764d-48a6-a572-73bf184e750e {"host":"gpfs-p10-s3-ces.rtp.raleigh.ibm.com:6443","user-agent":"MinIO (linux; ppc64le) minio-go/v7.0.70 warp/(dev)","content-length":"1450","authorization":"AWS4-HMAC-SHA256 Credential=KCxP4AN9937kVqoCrNIs/20241113/us-east-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-checksum-crc32c;x-amz-content-sha256;x-amz-date, Signature=259d9b307e73f794210a9ea45b43da9818ba49cbca69c3ac409e0e1f76f499ce","content-type":"application/octet-stream","x-amz-checksum-crc32c":"DiTesA==","x-amz-content-sha256":"dbc6f5a50de4dfc013242d1332e6e6caf77d458a4040248a85a5d6b8ef56085a","x-amz-date":"20241113T071859Z"}
Error [ERR_HTTP_HEADERS_SENT]: Cannot set headers after they are sent to the client
    at ServerResponse.setHeader (node:_http_outgoing:652:11)
    at Object.post_object_uploadId [as handler] (/usr/local/noobaa-core/src/endpoint/s3/ops/s3_post_object_uploadId.js:43:13)
    at async handle_request (/usr/local/noobaa-core/src/endpoint/s3/s3_rest.js:161:19)
    at async Object.s3_rest [as handler] (/usr/local/noobaa-core/src/endpoint/s3/s3_rest.js:66:9)
Nov 13 02:20:39 node-gui1 [3689895]: [nsfs/3689895] [L0] core.endpoint.s3.s3_rest:: Sending error xml in body, but too late for headers...
...
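For context, the ERR_HTTP_HEADERS_SENT in the trace matches the keepalive pattern discussed in the comments: the 200 status and headers are flushed early so whitespace can stream while the parts are assembled, so a later failure can no longer change them. A standalone sketch of that failure mode (illustration only; `long_running_complete` is a hypothetical stand-in for the slow part deletion, not noobaa code):

```js
const http = require('http');

// hypothetical stand-in for the slow unlink/rename work done while
// completing the multipart upload
const long_running_complete = () =>
    new Promise((_, reject) =>
        setTimeout(() => reject(new Error('unlink took too long')), 100));

http.createServer(async (req, res) => {
    // status and headers are flushed early so whitespace can be streamed
    // to keep the client connection alive during the long operation
    res.writeHead(200, { 'Content-Type': 'application/xml' });
    res.write(' ');
    try {
        await long_running_complete();
        res.end('<CompleteMultipartUploadResult/>');
    } catch (err) {
        try {
            // too late: the headers are already on the wire
            res.setHeader('Content-Type', 'application/xml');
        } catch (header_err) {
            console.error(header_err.code); // ERR_HTTP_HEADERS_SENT
        }
        // all that is left is sending the error xml in the body
        res.end('<Error><Code>InternalError</Code></Error>');
    }
}).listen(6001);
```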
s3 configuration
ALLOW_HTTP : false
DEBUGLEVEL : default
ENABLEMD5 : true
ENDPOINT_FORKS : 2
ENDPOINT_PORT : 6001
ENDPOINT_SSL_PORT : 6443
Expected behavior
Why does the error crop up when there is no connection reset?
4 errors are observed, as shown below:
Operation: PUT (7391). Ran 3h0m4s. Size: 256000000 bytes. Concurrency: 20.
Errors: 4
First Errors:
Steps to reproduce
Run Warp as shown above with the listed options and enablemd5=true.
More information - Screenshots / Logs / Other output
Will upload the logs to the box folder and update here (start here --> https://ibm.box.com/s/5jsco2y3f50zcpwr39yt0f0scpo2tns8)