run_jetson.sh error #1706

Open
AdsonNAlves opened this issue Mar 10, 2023 · 23 comments

@AdsonNAlves commented Mar 10, 2023

Describe the bug

When testing the Jetson Nano example (https://github.com/adap/flower/tree/main/examples/embedded_devices) and running
$ ./run_jetson.sh --server_address=<SERVER_ADDRESS> --cid=0 --model=ResNet18, I get the following error:

Traceback (most recent call last):
File "./client.py", line 27, in
from flwr.common import (
ImportError: cannot import name 'NDArrays'

I added the flwr path:
/usr/local/lib/python3.6/dist-packages/flwr
/usr/local/lib/python3.6/dist-packages/flwr/common

Can anyone help?

Steps/Code to Reproduce

./run_jetson.sh --server_address=192.168.55.100 --cid=0 --model=ResNet18

Expected Results

Actual Results

=> => naming to docker.io/library/flower_client:latest 0.0s
Traceback (most recent call last):
File "./client.py", line 27, in
from flwr.common import (
ImportError: cannot import name 'NDArrays'

@Sanya000

This problem also appears in the Raspberry Pi part of that same example.
When running: ./run_pi.sh --server_address=<xxx.xxx.xx.xxx> --cid=0 --model=Net
The Docker image builds fine right up to the end and then also fails at:
File "./client.py", line 27, in <module>
from flwr.common import (
ImportError: cannot import name 'NDArrays'

@matteuscruz

I am currently facing the same problem while implementing federated learning with Flower on my Jetson Nano and Raspberry Pi. Unfortunately, I have not found a solution to this problem yet.

The error message is as follows:

Traceback (most recent call last):
File "./client.py", line 35, in <module>
import utils
File "/app/utils.py", line 46, in <module>
class Net(nn.Module):
File "/app/utils.py", line 69, in Net
def get_weights(self) -> fl.common.NDArrays:
AttributeError: module 'flwr.common' has no attribute 'NDArrays'

@cleong110

I've been running into the same issue here: the root cause seems to be this line:

RUN pip3 install flwr>=1.0.0

This pip install command lacks quotation marks, so the shell treats >=1.0.0 as an output redirection: it actually runs pip3 install flwr, which installs an old version of flwr, and it creates a file named "=1.0.0".

The proper command should be

pip3 install "flwr>=1.0.0"

@cleong110 commented May 25, 2023

This can be checked by running the image interactively, like so:

docker run -it --runtime nvidia --rm --entrypoint bash flower_client

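Once inside, a quick sanity check (a sketch; whether the stray "=1.0.0" file ends up in /app depends on the WORKDIR in effect when that RUN line executed):

# confirm which flwr version actually got installed
pip3 show flwr
# look for a stray file literally named "=1.0.0" left behind by the unquoted install
ls -la /app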

@cleong110

So we're logged into the Docker container itself, and we can now check which versions of things got installed:

root@924f9e620cca:/app# pip3 list
Package            Version
------------------ ---------------
appdirs            1.4.4
beautifulsoup4     4.12.2
Cython             0.29.21
dataclasses        0.6
decorator          4.4.2
flwr               0.18.0
future             0.18.2
google             2.0.3
grpcio             1.43.0
importlib-metadata 1.7.0
Mako               1.1.3
MarkupSafe         1.1.1
numpy              1.19.4
Pillow             8.0.1
pip                21.3.1
protobuf           3.19.6
pycuda             2020.1
pytools            2020.4.3
setuptools         51.0.0
six                1.15.0
soupsieve          2.3.2.post1
torch              1.6.0
torchaudio         0.6.0a0+f17ae39
torchvision        0.7.0a0+78ed10c
wheel              0.36.1
zipp               3.6.0

@cleong110

Which brings us to the next problem: it's not possible to install the latest version of flwr, because the Python version in the image is 3.6.9.
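
A quick way to confirm this inside the container (commands only; flwr 1.x requires Python 3.7 or newer, so the second command can't find a compatible release on 3.6):

python3 --version
pip3 install "flwr>=1.0.0"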

@cleong110

One workaround is to edit the Dockerfile to install Python 3.7 instead, and then either use python3.7 and python3.7 -m pip as your commands from then on, or find a way to set 3.7 as the default over 3.6.

@cleong110

Something like this, added before the pip update line:

RUN apt-get update && apt-get upgrade -y
RUN apt-get install python3.7 -y
RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 2
RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 1
RUN update-alternatives --config python3

# update pip
RUN pip3 install --upgrade pip
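
From that point on, it's safer to target the new interpreter explicitly when installing packages (a sketch, assuming python3.7 -m pip resolves to a working pip once 3.7 is the default):

RUN python3.7 -m pip install --upgrade pip
RUN python3.7 -m pip install "flwr>=1.0.0"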

@cleong110

Now I get this error:

ModuleNotFoundError: No module named 'torch'

@cleong110

This is because installing flwr doesn't install PyTorch. PyTorch was installed for 3.6, but not for 3.7.

@cleong110

So torch and torchvision must be pip installed as well.
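
For example (a sketch; a plain PyPI install, if an aarch64 wheel even exists for Python 3.7, is CPU-only on a Jetson, which is exactly the caveat in the next comment):

# CPU-only install against the 3.7 interpreter; GPU support on a Jetson needs
# an NVIDIA-built wheel matching the JetPack and Python version
RUN python3.7 -m pip install torch torchvision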

@cleong110

But then I'm not sure the new version of PyTorch has GPU access.

@rmouram commented Jun 26, 2023

@cleong110 I followed your recommendations to fix this error and I came across the following:

ModuleNotFoundError: No module named 'torch'

I installed torch via pip by placing the installation in a RUN command inside the Dockerfile:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

However, it then gave an error about not finding the PIL module, so I installed PIL (Pillow) via pip3 in a RUN command in the Dockerfile, and the following error appeared:
ImportError: cannot import name 'ParametersRes' from 'flwr.common' (/usr/local/lib/python3.7/dist-packages/flwr/common/__init__.py)

Is there any solution? Thanks.

@cleong110

You are starting down a rabbit hole, I'm afraid. In your case, my suspicion is that pip3 isn't the "right" pip, and is installing torch for the wrong python.

(I did confirm that you can apt install python3.8. We eventually found a GPU wheel of PyTorch somewhere that was compatible with Python 3.8, but it's not trivial.)

One helpful debugging tip: log in to the Docker container interactively with something like docker run -it --rm --runtime nvidia --entrypoint bash flower_client:latest (I think that's roughly the right command). Note how I override the entrypoint so that it doesn't run client.py but drops me into bash instead.

Then you can manually start a Python shell and try to import torch and see what's wrong.

Some things to watch out for:

  • are you using pip or pip3?
  • Are you using python or python3? Check things like python --version or python3 --version
  • If you used apt to install python3.8, do you need to call it explicitly, e.g. python3.8 <some command>?

In your case I would do something like:

# edit the Dockerfile
# rebuild the image, I think run_jetson.sh calls a docker build command
docker run -it --rm --runtime nvidia --entrypoint bash flower_client:latest 

#once you're IN the docker container...
python --version
python3 --version
python3.8 --version
pip --version
pip3 --version

# run python
python3.8 
>>> import torch
# etc

and make sure what's installed where.

Probably the best thing would be, instead of using pip3, to do:

python3.8 -m pip install <whatever>

which guarantees you are using the specific pip that installs packages Python 3.8 can use.

@cleong110

Regarding ParametersRes: that is actually not your fault at all, I believe it is a compatibility issue between versions of flwr

@cleong110

In one of the updates (#1214), ParametersRes was renamed.

@cleong110 commented Jun 26, 2023

So the embedded example seems to use the old syntax.

Your options:

  • rewrite/update the embedded example to use modern flwr syntax
  • install a version of flwr that works for the embedded example (from before Rename protobuf messages #1214 was merged); see the pinning sketch below
  • adapt the code from one of the updated examples and run that in the Docker instead (this is what I've been working on)

(Edit: another option is to buy one of the newer Jetsons, that can install a newer Jetpack, that can support newer versions of Pytorch and Flower)
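
For the second option, a minimal Dockerfile sketch (0.19.0 is used purely as an illustrative pre-1.0 release; pick whichever version the example was actually written against):

# pin flwr to a pre-1.0 release that still exposes the old names such as ParametersRes
RUN pip3 install "flwr==0.19.0"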

@WilliamLindskog

Hi @cleong110

Thanks for your comments on this issue. Is this a problem that you are still experiencing, or can we go ahead and close this issue? Have you tried the new example? https://github.com/adap/flower/tree/main/examples/embedded-devices

@cleong110

Ah, sorry, haven't looked at this in quite a while. Haven't tried the new example, no.

@cleong110

@WilliamLindskog the new example does look cool. I see it's designed for a Raspberry Pi. We were using various older Jetson modules; have you tested installing on those?

Other things to possibly check before closing this issue:

  • have the problems with the Dockerfiles been fixed, e.g. the lack of quotation marks in the pip install? (I see the file is removed in the latest version)
  • what Jetson devices, JetPacks, etc. has run_jetson.sh been tested with, or are you planning to support? I think something like "nothing older than a Xavier" might be the easiest answer

Given that the latest example doesn't seem to use Jetsons, and that older devices especially can be a bit of a pain, perhaps the answers to both of these could just be "N/A, none", and the issue can be closed.

Certainly it would be nice to have an example running on TX2s or Xaviers or Nanos or something, those are exactly the sort of device that it would be really interesting to network together and run Flower on.

@WilliamLindskog commented Mar 12, 2025

@cleong110 sorry for the late reply. I checked out this conversation and it seems it is possible to run on a Jetson Orin NX device: #4399

Regarding the quotation marks issue, I think that has been resolved in newer versions of Flower: https://github.com/adap/flower/tree/main/examples/embedded-devices

You're right that it would be nice to have instructions on how to run on these sorts of devices. Would you be interested in creating a PR for this? Just a .md file on how to run on the discussed devices?

@cleong110

I can't do a PR at this time, sorry. Good to hear that those issues have been solved and people have gotten it working! But yes, a .md might be the easiest!

Ideally it'd have: (1) install/setup instructions (2) instructions to run a very basic FL experiment on two or more devices!
