
efforts towards building aarch64 grobid #30

Open: wants to merge 1 commit into base: master

57 changes: 33 additions & 24 deletions Dockerfile.software
@@ -53,26 +53,26 @@ RUN rm -rf grobid-source
 # build runtime image
 # -------------------

 # use NVIDIA Container Toolkit to automatically recognize possible GPU drivers on the host machine
-FROM tensorflow/tensorflow:2.7.0-gpu
+FROM python:3.8-slim

 # setting locale is likely useless but to be sure
 ENV LANG C.UTF-8

 # update NVIDIA Cuda key (following a key rotation in April 2022)
-RUN apt-get install -y wget
-RUN apt-key del 7fa2af80
-RUN rm /etc/apt/sources.list.d/cuda.list
-RUN rm /etc/apt/sources.list.d/nvidia-ml.list
-RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
-RUN dpkg -i cuda-keyring_1.0-1_all.deb
+# RUN apt-get install -y wget
+# RUN apt-key del 7fa2af80
+# RUN rm /etc/apt/sources.list.d/cuda.list
+# RUN rm /etc/apt/sources.list.d/nvidia-ml.list
+# RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
+# RUN dpkg -i cuda-keyring_1.0-1_all.deb

 # install JRE, python and other dependencies
 RUN apt-get update && \
 apt-get -y --no-install-recommends install apt-utils build-essential gcc libxml2 libfontconfig unzip curl \
 openjdk-17-jre-headless openjdk-17-jdk ca-certificates-java \
 musl gfortran \
 python3 python3-pip python3-setuptools python3-dev \
 && export JAVA_HOME \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

@@ -82,8 +82,15 @@ COPY --from=builder /opt/grobid .

 RUN python3 -m pip install pip --upgrade

+RUN pip install tensorflow tensorflow-io
+
 # install DeLFT via pypi
-RUN pip3 install requests delft==0.3.3
+# RUN pip3 install requests delft==0.3.3
+RUN apt-get update && \
+apt-get -y --no-install-recommends install git \
+&& apt-get clean \
+&& rm -rf /var/lib/apt/lists/*
+RUN pip install git+https://github.com/jameshowison/delft
 # link the data directory to /data
 # the current working directory will most likely be /opt/grobid
 RUN mkdir -p /data \
@@ -98,18 +105,18 @@ WORKDIR /opt/grobid
 ENV JAVA_OPTS=-Xmx4g

 # install jep (and temporarily the matching JDK)
-ENV JDK_URL=https://download.java.net/java/GA/jdk17.0.2/dfd4a8d0985749f896bed50d7138ee7f/8/GPL/openjdk-17.0.2_linux-x64_bin.tar.gz
-RUN curl --fail --show-error --location -q ${JDK_URL} -o /tmp/openjdk.tar.gz
-RUN mkdir /tmp/jdk-17
-RUN tar xvfz /tmp/openjdk.tar.gz --directory /tmp/jdk-17 --strip-components 1 --no-same-owner
-RUN /tmp/jdk-17/bin/javac -version
-RUN JAVA_HOME=/tmp/jdk-17 pip3 install jep==4.0.2
-RUN rm -f /tmp/openjdk.tar.gz
-RUN rm -rf /tmp/jdk-17
-ENV LD_LIBRARY_PATH=/usr/local/lib/python3.8/dist-packages/jep:grobid-home/lib/lin-64:grobid-home/lib/lin-64/jep:${LD_LIBRARY_PATH}
+# ENV JDK_URL=https://download.java.net/java/GA/jdk17.0.2/dfd4a8d0985749f896bed50d7138ee7f/8/GPL/openjdk-17.0.2_linux-x64_bin.tar.gz
+# RUN curl --fail --show-error --location -q ${JDK_URL} -o /tmp/openjdk.tar.gz
+# RUN mkdir /tmp/jdk-17
+# RUN tar xvfz /tmp/openjdk.tar.gz --directory /tmp/jdk-17 --strip-components 1 --no-same-owner
+# RUN /tmp/jdk-17/bin/javac -version
+# RUN JAVA_HOME=/tmp/jdk-17 pip3 install jep==4.0.2
+# RUN rm -f /tmp/openjdk.tar.gz
+# RUN rm -rf /tmp/jdk-17
+# ENV LD_LIBRARY_PATH=/usr/local/lib/python3.8/dist-packages/jep:grobid-home/lib/lin-64:grobid-home/lib/lin-64/jep:${LD_LIBRARY_PATH}
 # remove libjep.so because we are providing our own version in the virtual env above
-RUN rm /opt/grobid/grobid-home/lib/lin-64/jep/libjep.so
-
+# RUN rm /opt/grobid/grobid-home/lib/lin-64/jep/libjep.so
+RUN JAVA_HOME="$(dirname $(dirname $(readlink -f $(which java))))" pip3 install jep==4.0.2
 # preload embeddings

 COPY --from=builder /opt/grobid-source/grobid-home/scripts/preload_embeddings.py .
@@ -122,9 +129,10 @@ COPY --from=builder /root/.m2/repository/org /opt/grobid/software-mentions/lib/o

 # install Pub2TEI
 WORKDIR /opt/
-RUN wget https://github.com/kermitt2/Pub2TEI/archive/refs/heads/master.zip
-RUN unzip master.zip
-RUN mv Pub2TEI-master Pub2TEI
+RUN git clone --depth 1 https://github.com/kermitt2/Pub2TEI.git
+# RUN wget https://github.com/kermitt2/Pub2TEI/archive/refs/heads/master.zip
+# RUN unzip master.zip
+# RUN mv Pub2TEI-master Pub2TEI

 WORKDIR /opt/grobid/software-mentions

@@ -138,7 +146,8 @@ RUN ./gradlew --version
 # install all the ML models
 RUN ./gradlew copyModels installModels && rm -rf resources/models && rm -f /opt/grobid/grobid-home/models/software/model.wapiti.gz && rm -f /opt/grobid/grobid-home/models/software-BERT-0.3.2.zip && rm -f /opt/grobid/grobid-home/models/context_bert-0.3.2.zip && rm -f /opt/grobid/grobid-home/models/context_used_bert-0.3.2.zip && rm -f /opt/grobid/grobid-home/models/context_shared_bert-0.3.2.zip && rm -f /opt/grobid/grobid-home/models/context_creation_bert-0.3.2.zip

-RUN ./gradlew clean assemble install --no-daemon --stacktrace --info -x test
+# removed --no-daemon from this as it was causing the docker build to hang. Seems opposite of what one would expect.
+RUN ./gradlew clean assemble install --stacktrace --info -x test

 CMD ["sh", "-c", "java --add-opens java.base/java.lang=ALL-UNNAMED -jar build/libs/software-mentions-0.8.0-SNAPSHOT-onejar.jar server resources/config/config.yml"]
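
Note on the jep hunk: the new `RUN JAVA_HOME="$(dirname $(dirname $(readlink -f $(which java))))" ...` line derives `JAVA_HOME` from whichever `java` the distribution package installed, so no architecture-specific JDK tarball (the old `linux-x64` URL) is needed. The same derivation can be sketched as a small shell function with quoting added. The function name `resolve_java_home` is hypothetical and not part of this PR; it assumes a `readlink` that supports `-f` (as GNU coreutils does) and the usual `<JAVA_HOME>/bin/java` layout.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): print the JDK/JRE home directory
# for a given java launcher path.
resolve_java_home() {
    # Follow symlinks (e.g. /usr/bin/java -> /etc/alternatives/java -> <home>/bin/java).
    real_path=$(readlink -f "$1") || return 1
    # Strip the trailing "bin/java" by going up two directory levels.
    dirname "$(dirname "$real_path")"
}
```

Inside a `RUN` step this could be used as `JAVA_HOME="$(resolve_java_home "$(command -v java)")" pip3 install jep==4.0.2`, which should behave like the one-liner in the diff while tolerating paths that contain spaces.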
