Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker build broken in current master #628

Closed
de-code opened this issue Aug 27, 2020 · 11 comments
Closed

Docker build broken in current master #628

de-code opened this issue Aug 27, 2020 · 11 comments
Labels
bug From Hemiptera and especially its suborder Heteroptera implemented The issue has been implemented

Comments

@de-code
Copy link
Collaborator

de-code commented Aug 27, 2020

Running the Docker build is currently broken in master and also in 0.6.1 (but not in 0.6.0).

Command:

docker build .

Error:

Step 20/37 : COPY --from=builder /opt/grobid-source/grobid-core/build/libs/grobid-core-*-onejar.jar ./grobid/grobid-core-onejar.jar
 ---> f50b25031de8
Step 21/37 : COPY --from=builder /opt/grobid-source/grobid-service/build/distributions/grobid-service-*.zip ./grobid-service.zip
COPY failed: no source files were specified

This seems to be caused by 683804d which disabled the creation of the distribution zip.

Might be good to add the Docker build as part of the CI build.

/cc @lfoppiano

@lfoppiano
Copy link
Collaborator

Yes, we would need to add a new task that creates the zip/tar file, which can be used for the docker, or we could modify the current docker builder to use the various jars (less clean).

@lfoppiano
Copy link
Collaborator

The quick fix is to set tasks.distZip.enabled to true in the grobid-service section:

    tasks.distZip.enabled = true

The correct fix is to create a new task that would do that, and call this task when building the new docker image. I've spent some time already without managing to have it work.

@kermitt2
Copy link
Owner

kermitt2 commented Sep 7, 2020

What about the following:

  • in gradle.build, line 333
tasks.distZip.enabled = true
  • in the docker file, line 40:
RUN ./gradlew distZip

and that's it, no distribution for the simple ./gradlew clean install, only when building a new docker image.

Step 22/38 : COPY --from=builder /opt/grobid-source/grobid-service/build/distributions/grobid-service-*.zip ./grobid-service.zip
 ---> dd1f627afab9
Step 23/38 : COPY --from=builder /opt/grobid-source/grobid-home/build/distributions/grobid-home-*.zip ./grobid-home.zip
 ---> 2511627a323c
Step 24/38 : RUN unzip -o ./grobid-service.zip -d ./grobid &&     mv ./grobid/grobid-service-* ./grobid/grobid-service
 ---> Running in 04dad7d7bb09

@kermitt2
Copy link
Owner

kermitt2 commented Sep 7, 2020

Might be good to add the Docker build as part of the CI build.

This would make each CI build at 40 minutes or more, so degrading a lot the interest of the CI.

@de-code
Copy link
Collaborator Author

de-code commented Sep 7, 2020

Might be good to add the Docker build as part of the CI build.

This would make each CI build at 40 minutes or more, so degrading a lot the interest of the CI.

40 minutes seem to be a bit high, why do you think that? The build for #629 (https://travis-ci.org/github/kermitt2/grobid/builds/721804752) ran for for over 7 minutes without Docker and over 8 minutes with Docker (although no tests in that case). The main difference is that caching artefacts across builds would be more difficult. I believe one way of doing it if it seems worth it, is to push / pull layers of the image (as the cloud CI servers don't usually cache layers out of the box). I get more information on that (a colleague of mine used it for some project I believe).

@de-code
Copy link
Collaborator Author

de-code commented Sep 7, 2020

What about the following:

  • in gradle.build, line 333
tasks.distZip.enabled = true
  • in the docker file, line 40:
RUN ./gradlew distZip

and that's it, no distribution for the simple ./gradlew clean install, only when building a new docker image.

Step 22/38 : COPY --from=builder /opt/grobid-source/grobid-service/build/distributions/grobid-service-*.zip ./grobid-service.zip
 ---> dd1f627afab9
Step 23/38 : COPY --from=builder /opt/grobid-source/grobid-home/build/distributions/grobid-home-*.zip ./grobid-home.zip
 ---> 2511627a323c
Step 24/38 : RUN unzip -o ./grobid-service.zip -d ./grobid &&     mv ./grobid/grobid-service-* ./grobid/grobid-service
 ---> Running in 04dad7d7bb09

I am not sure what the motivation of changing it to false was. I suspected it would have generated the zip as part of another build where it wasn't used? Just changing the tasks.distZip.enabled to true (the quick fix) would make this work already. So maybe do that initially and optimize later?

@kermitt2
Copy link
Owner

kermitt2 commented Sep 7, 2020

40 minutes seem to be a bit high, why do you think that? The build for #629 (https://travis-ci.org/github/kermitt2/grobid/builds/721804752) ran for for over 7 minutes without Docker and over 8 minutes with Docker (although no tests in that case). The main difference is that caching artefacts across builds would be more difficult. I believe one way of doing it if it seems worth it, is to push / pull layers of the image (as the cloud CI servers don't usually cache layers out of the box). I get more information on that (a colleague of mine used it for some project I believe).

You're right, a new build is much faster, I had 35mn for docker but it was with downloading gradle-6.5.1-all.zip and all the artefacts on maven central, and my internet connection is currently very slow. New build of the docker image takes less than 2mn with cached artefacts, so that could be indeed painless and well worth with cache.

@de-code
Copy link
Collaborator Author

de-code commented Sep 7, 2020

Regarding caching Docker layers, I asked my colleague @erkannt

For github actions you can look at this pipeline: https://github.com/libero/reviewer-client/blob/bcde2bfcf5464f07574ca389d11bf3e1b92e8c84/.github/workflows/ci.yml#L38https://github.com/libero/reviewer-client/blob/bcde2bfcf5464f07574ca389d11bf3e1b92e8c84/.github/workflows/ci.yml#L38
It uses a github action that takes care of setting up the caches that github actions need as they always run on fresh VMs.

For Travis CI:

the effectiveness of caching docker layers will depend strongly on their Dockerfile design. If they have a long build with rarely changing base layers they might be able to simply use --cache-from in their docker builld invocation: http://atodorov.org/blog/2017/08/07/faster-travis-ci-tests-with-docker-cache/http://atodorov.org/blog/2017/08/07/faster-travis-ci-tests-with-docker-cache/
I think this won't provide speedup if the Dockerfile uses multistage i.e. COPY --from. In that case one could either also push the intermediate images to a registry or save them to the Travis cache which will then load them into the build VM upon creation. The cache lives in S3 AFAICT. So depending on the time it takes to build the different layers building can be faster than copying from registry or loading from cache.

@lfoppiano
Copy link
Collaborator

I am not sure what the motivation of changing it to false was. I suspected it would have generated the zip as part of another build where it wasn't used? Just changing the tasks.distZip.enabled to true (the quick fix) would make this work already. So maybe do that initially and optimize later?

Yes, it is generating the zip when you run ./gradlew clean build. I remember discussing about the fact that the zip will slow down the build time, so therefore I set it to false.

@kermitt2
Copy link
Owner

kermitt2 commented Sep 8, 2020

Yes, it is generating the zip when you run ./gradlew clean build. I remember discussing about the fact that the zip will slow down the build time, so therefore I set it to false.

ah yes I remember! At the same time, ./gradlew clean build runs the tests so it's already a time-consuming process, no? In the documentation, we indicate ./gradlew clean install which will never create the distribution zip (personally I am only using ./gradlew clean install).

@lfoppiano
Copy link
Collaborator

Yes that was before we migrated to gradle 6, I think with the install it was creating the zip file if I'm not wrong. Anyway, it seem that this change would fix this two issues in one go. So 👍.

@lfoppiano lfoppiano added the implemented The issue has been implemented label Sep 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug From Hemiptera and especially its suborder Heteroptera implemented The issue has been implemented
Projects
None yet
Development

No branches or pull requests

3 participants