
Commit 6e26cf4

authored Aug 25, 2023
Setup (#250)
* allow hdf5:// prefix - #247
* fix flake8 errors
* use tcp by default in hsds app
* update test to run with quick stat
* add toml file
* catch login name not defined
* remove defunct rangeget test
* updated readme quick start
* bump aiohttp version to 3.8.5
* fix quick start instructions
1 parent 334bef8 commit 6e26cf4

23 files changed, +422 -387 lines

Dockerfile (+7 -2)

```diff
@@ -12,13 +12,18 @@ RUN mkdir /usr/local/src/hsds/ \
     /usr/local/src/hsds/hsds/util/ \
     /etc/hsds/
 
-COPY setup.py /usr/local/src/hsds/
+COPY pyproject.toml /usr/local/src/hsds/
+COPY setup.cfg /user/local/src/hsds/
 COPY hsds/*.py /usr/local/src/hsds/hsds/
 COPY hsds/util/*.py /usr/local/src/hsds/hsds/util/
 COPY admin/config/config.yml /etc/hsds/
 COPY admin/config/config.yml /usr/local/src/hsds/admin/config/
 COPY entrypoint.sh /
-RUN /bin/bash -c 'cd /usr/local/src/hsds; pip install -e ".[azure]" ; cd -'
+RUN /bin/bash -c 'cd /usr/local/src/hsds; \
+    pip install build;\
+    python -m build;\
+    pip install -v . ;\
+    cd -'
 
 EXPOSE 5100-5999
 ENTRYPOINT ["/bin/bash", "-c", "/entrypoint.sh"]
```

README.md (+16 -16)

```diff
@@ -5,27 +5,27 @@
 HSDS is a web service that implements a REST-based web service for HDF5 data stores.
 Data can be stored in either a POSIX files system, or using object-based storage such as
 AWS S3, Azure Blob Storage, or [MinIO](https://min.io).
-HSDS can be run a single machine using Docker or on a cluster using Kubernetes (or AKS on Microsoft Azure).
+HSDS can be run a single machine with or without Docker or on a cluster using Kubernetes (or AKS on Microsoft Azure).
 
 In addition, HSDS can be run in serverless mode with AWS Lambda or h5pyd local mode.
 
 ## Quick Start
 
-Make sure you have Python 3, Pip, and git installed, then:
-
-1. Clone this repo: `$ git clone https://github.com/HDFGroup/hsds`
-2. Go to the hsds directory: `$ cd hsds`
-3. Run install: `$ python setup.py install` OR install from pypi: `$ pip install hsds`
-4. Setup password file: `$ cp admin/config/passwd.default admin/config/passwd.txt`
-5. Create a directory the server will use to store data, and then set the ROOT_DIR environment variable to point to it: `$ mkdir hsds_data; export ROOT_DIR="${PWD}/hsds_data"` For Windows: `C:> set ROOT_DIR=%CD%\hsds_data`
-6. Create the hsds test bucket: `$ mkdir hsds_data/hsdstest`
-7. Start server: `$ ./runall.sh --no-docker` For Windows: `C:> runall.bat`
-8. In a new shell, set environment variables for the admin account: `$ export ADMIN_USERNAME=admin` and `$ export ADMIN_PASSWORD=admin` (adjust for any changes made to the passwd.txt file). For Windows - use the corresponding set commands
-9. Run the test suite: `$ python testall.py --skip_unit`
-10. (Optional) Post install setup (test data, home folders, cli tools, etc): [docs/post_install.md](docs/post_install.md)
-11. (Optional) Install the h5pyd package for an h5py compatible api and tool suite: https://github.com/HDFGroup/h5pyd
-
-To shut down the server, and the server was started with the --no-docker option, just control-C.
+Make sure you have Python 3 and Pip installed, then:
+
+1. Run install: `$ ./build.sh` from source tree OR install from pypi: `$ pip install hsds`
+2. Create a directory the server will use to store data, example: `$ mkdir ~/hsds_data`
+3. Start server: `$ hsds --root_dir ~/hsds_data`
+4. Run the test suite. In a separate terminal run:
+    - Set user_name: `$ export USER_NAME=$USER`
+    - Set user_password: `$ export USER_PASSWORD=$USER`
+    - Set admin name: `$ export ADMIN_USERNAME=$USER`
+    - Set admin password: `$ $export ADMIN_PASSWORD=$USER`
+    - Run test suite: `$ python testall.py --skip_unit`
+5. (Optional) Install the h5pyd package for an h5py compatible api and tool suite: https://github.com/HDFGroup/h5pyd
+6. (Optional) Post install setup (test data, home folders, cli tools, etc): [docs/post_install.md](docs/post_install.md)
+
+To shut down the server, and the server is not running in Docker, just control-C.
 
 If using docker, run: `$ ./stopall.sh`
 
```

build.sh (+12 -6)

```diff
@@ -27,11 +27,17 @@ if [ $run_pyflakes ]; then
     fi
 fi
 
-echo "running setup.py"
-python setup.py install
+pip install --upgrade build
 
-echo "clean stopped containers"
-docker rm -v $(docker ps -aq -f status=exited)
+echo "running build"
+python -m build
+pip install -v .
 
-echo "building docker image"
-docker build -t hdfgroup/hsds .
+command -v docker
+if [ $? -ne 1 ]; then
+    echo "clean stopped containers"
+    docker rm -v $(docker ps -aq -f status=exited)
+
+    echo "building docker image"
+    docker build -t hdfgroup/hsds .
+fi
```

entrypoint.sh (-4)

```diff
@@ -22,10 +22,6 @@ elif [ $NODE_TYPE == "head_node" ]; then
     echo "running hsds-headnode"
     export PYTHONUNBUFFERED="1"
     hsds-headnode
-elif [ $NODE_TYPE == "rangeget" ]; then
-    echo "running hsds-rangeget"
-    export PYTHONUNBUFFERED="1"
-    hsds-rangeget
 else
     echo "Unknown NODE_TYPE: " $NODE_TYPE
 fi
```

hsds/app.py (+63 -40)

```diff
@@ -15,24 +15,33 @@
 import sys
 import logging
 import time
-import uuid
 
 from .hsds_app import HsdsApp
+from . import config
 
-_HELP_USAGE = "Starts hsds a REST-based service for HDF5 data."
+_HELP_USAGE = "Starts HSDS, a REST-based service for HDF5 data."
 
 _HELP_EPILOG = """Examples:
 
+- with a POSIX-based storage using a directory: ./hsdata for storage:
+
+        hsds --root_dir ~/hsdata
+
+- with POSIX-based storage and config settings and password file:
+
+        hsds --root_dir ~/hsdata --password-file ./admin/config/passwd.txt \
+            --config_dir ./admin/config
+
 - with minio data storage:
 
         hsds --s3-gateway http://localhost:6007 --access-key-id demo:demo
                 --secret-access-key DEMO_PASS --password-file ./admin/config/passwd.txt
-                --bucket-name hsds.test
 
-- with a POSIX-based storage for 'hsds.test' sub-folder in the './data'
-  folder:
+- with AWS S3 storage and a bucket in the us-west-2 region:
+
+        hsds --s3-gateway http://s3.us-west-2.amazonaws.com --access-key-id ${AWS_ACCESS_KEY_ID} \
+                --secret-access-key ${AWS_SECRET_ACCESS_KEY} --password-file ./admin/config/passwd.txt
 
-        hsds --bucket-dir ./data/hsds.test
 """
 
 # maximum number of characters if socket directory is given
```
```diff
@@ -139,14 +148,13 @@ def main():
         epilog=_HELP_EPILOG,
     )
 
-    group = parser.add_mutually_exclusive_group(required=True)
-    group.add_argument(
+    parser.add_argument(
         "--root_dir",
         type=str,
         dest="root_dir",
         help="Directory where to store the object store data",
     )
-    group.add_argument(
+    parser.add_argument(
         "--bucket_name",
         nargs=1,
         type=str,
```
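Removing the required mutually exclusive group changes what the CLI accepts. The standalone sketch below rebuilds only the two options from this hunk with the standard-library `argparse` module (the helper names `build_parser` and `accepts` are illustrative, not HSDS functions) and contrasts the two behaviors: the old parser rejects an empty command line and rejects the two options together, while the new one accepts both and leaves any unset option as `None`.

```python
import argparse


def build_parser(exclusive: bool) -> argparse.ArgumentParser:
    """Rebuild only the --root_dir / --bucket_name options from the hunk.

    exclusive=True mimics the old code (required mutually exclusive group);
    exclusive=False mimics the new code (two independent, optional arguments).
    """
    parser = argparse.ArgumentParser(prog="hsds")
    target = parser.add_mutually_exclusive_group(required=True) if exclusive else parser
    target.add_argument("--root_dir", type=str, dest="root_dir")
    target.add_argument("--bucket_name", nargs=1, type=str, dest="bucket_name")
    return parser


def accepts(parser: argparse.ArgumentParser, argv: list) -> bool:
    """Return True if argparse accepts argv, False if it exits with a usage error."""
    try:
        parser.parse_args(argv)
    except SystemExit:
        return False
    return True


if __name__ == "__main__":
    both = ["--root_dir", "/tmp/hsds_data", "--bucket_name", "hsdstest"]
    print(accepts(build_parser(True), []))    # False: old CLI requires one option
    print(accepts(build_parser(True), both))  # False: old CLI forbids both together
    print(accepts(build_parser(False), []))   # True: new CLI, both optional
    args = build_parser(False).parse_args(both)
    print(args.root_dir, args.bucket_name)    # /tmp/hsds_data ['hsdstest']
```

Because `--bucket_name` is declared with `nargs=1`, argparse stores its value as a one-element list rather than a plain string.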
```diff
@@ -197,7 +205,7 @@ def main():
     )
     parser.add_argument(
         "--count",
-        default=1,
+        default=4,
         type=int,
         dest="dn_count",
         help="Number of dn sub-processes to create.",
```
```diff
@@ -241,16 +249,25 @@ def main():
         print(f"unsupported log_level: {log_level_cfg}, using INFO instead")
         log_level = logging.INFO
 
-    print("set logging to:", log_level)
+    print("set logging to::", log_level)
     logging.basicConfig(level=log_level)
 
     userConfig = UserConfig()
 
-    # set username based on command line, .hscfg, $USER, or $JUPYTERHUB_USER
+    login_username = None
+    try:
+        login_username = os.getlogin()
+    except OSError:
+        pass  # ignore
+
+    # set username based on command line, .hscfg, or login user
     if args.hs_username:
         username = args.hs_username
     elif "HS_USERNAME" in userConfig:
         username = userConfig["HS_USERNAME"]
+    elif not args.password_file:
+        # no password file, add the login name as user
+        username = login_username
     else:
         username = None
 
```

```diff
@@ -260,7 +277,7 @@ def main():
     elif "HS_PASSWORD" in userConfig:
         password = userConfig["HS_PASSWORD"]
     else:
-        password = "1234"
+        password = login_username
 
     if username:
         kwargs["username"] = username
```
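Taken together, the two hunks above define a precedence order for the credentials handed to `HsdsApp`. A minimal sketch of that order, assuming the password lookup starts with a command-line check that mirrors the username branch (its first condition falls outside the hunk) and using a plain mapping in place of `UserConfig`; `resolve_credentials` is a hypothetical name:

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_credentials(
    cli_username: Optional[str],
    cli_password: Optional[str],
    user_config: Mapping[str, str],
    password_file: Optional[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Mirror the precedence rules in the hunks above (sketch, not the HSDS code)."""
    login_username = None
    try:
        login_username = os.getlogin()
    except OSError:
        pass  # no controlling terminal: leave the login name unset

    # username: command line, then .hscfg, then login name (only without a password file)
    if cli_username:
        username = cli_username
    elif "HS_USERNAME" in user_config:
        username = user_config["HS_USERNAME"]
    elif not password_file:
        username = login_username
    else:
        username = None

    # password: assumed command-line check, then .hscfg, then the login name
    if cli_password:
        password = cli_password
    elif "HS_PASSWORD" in user_config:
        password = user_config["HS_PASSWORD"]
    else:
        password = login_username  # replaces the old hard-coded "1234"
    return username, password
```

Catching `OSError` matters because `os.getlogin()` raises it when the process has no controlling terminal; in that case both values fall back to `None` instead of aborting startup.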
```diff
@@ -271,38 +288,23 @@ def main():
             sys.exit(f"password file: {args.password_file} not found")
         kwargs["password_file"] = args.password_file
 
-    if args.host:
-        # use TCP connect
-        kwargs["host"] = args.host
+    # use unix domain socket if a socket dir is set
+    if args.socket_dir:
+        socket_dir = os.path.abspath(args.socket_dir)
+        if not os.path.isdir(socket_dir):
+            raise FileNotFoundError(f"directory: {socket_dir} not found")
+        kwargs["socket_dir"] = socket_dir
+    else:
+        # USE TCP connect
+        if args.host:
+            kwargs["host"] = args.host
+        else:
+            kwargs["host"] = "localhost"
         # sn_port only relevant for TCP connections
         if args.port:
             kwargs["sn_port"] = args.port
         else:
             kwargs["sn_port"] = 5101  # TBD - use config
-    else:
-        # choose a tmp directory for socket if one is not provided
-        if args.socket_dir:
-            socket_dir = os.path.abspath(args.socket_dir)
-            if not os.path.isdir(socket_dir):
-                raise FileNotFoundError(f"directory: {socket_dir} not found")
-        else:
-            if "TMP" in os.environ:
-                # This should be set at least on Windows
-                tmp_dir = os.environ["TMP"]
-                print("set tmp_dir:", tmp_dir)
-            else:
-                tmp_dir = "/tmp"
-            if not os.path.isdir(tmp_dir):
-                raise FileNotFoundError(f"directory {tmp_dir} not found")
-            rand_name = uuid.uuid4().hex[:8]
-            socket_dir = os.path.join(tmp_dir, f"hs{rand_name}")
-            print("using socket dir:", socket_dir)
-            if len(socket_dir) > MAX_SOCKET_DIR_PATH_LEN:
-                raise ValueError(
-                    f"length of socket_dir must be less than: {MAX_SOCKET_DIR_PATH_LEN}"
-                )
-            os.mkdir(socket_dir)
-        kwargs["socket_dir"] = socket_dir
 
     if args.logfile:
         logfile = os.path.abspath(args.logfile)
```
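The reworked block makes TCP the default transport and uses unix domain sockets only when a socket directory is passed explicitly. The sketch below condenses that decision into a pure function (hypothetical name `connection_kwargs`; its parameters stand in for `args.socket_dir`, `args.host`, and `args.port`), so the selection can be checked without starting any processes:

```python
import os
from typing import Any, Dict, Optional


def connection_kwargs(
    socket_dir: Optional[str], host: Optional[str], port: Optional[int]
) -> Dict[str, Any]:
    """Pick unix domain sockets or TCP the way the new main() does (sketch)."""
    kwargs: Dict[str, Any] = {}
    if socket_dir:
        # an explicit socket directory opts in to unix domain sockets
        socket_dir = os.path.abspath(socket_dir)
        if not os.path.isdir(socket_dir):
            raise FileNotFoundError(f"directory: {socket_dir} not found")
        kwargs["socket_dir"] = socket_dir
    else:
        # otherwise TCP is the default, bound to localhost unless a host is given
        kwargs["host"] = host if host else "localhost"
        kwargs["sn_port"] = port if port else 5101  # default SN port from the hunk
    return kwargs
```

One consequence visible in the diff: the removed branch used to generate a random directory under `$TMP` or `/tmp` and check it against `MAX_SOCKET_DIR_PATH_LEN`; with the new default nothing is created on disk unless a socket directory is supplied.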
```diff
@@ -329,6 +331,27 @@ def main():
     if args.dn_count:
         kwargs["dn_count"] = args.dn_count
 
+    if args.bucket_name:
+        bucket_name = args.bucket_name
+    else:
+        bucket_name = config.get("bucket_name")
+    if not bucket_name:
+        sys.exit("bucket_name not set")
+    if args.root_dir:
+        root_dir = args.root_dir
+    else:
+        root_dir = config.get("root_dir")
+    if not root_dir:
+        # check that AWS_S3_GATEWAY or AZURE_CONNECTION_STRING is set
+        if not config.get("aws_s3_gateway") and not config.get("azure_connection_string"):
+            sys.exit("root_dir not set (and no S3 or Azure connection info)")
+    else:
+        if not os.path.isdir(root_dir):
+            sys.exit(f"directory: {root_dir} not found")
+        bucket_path = os.path.join(root_dir, bucket_name)
+        if not os.path.isdir(bucket_path):
+            os.mkdir(bucket_path)
+
     app = HsdsApp(**kwargs)
     app.run()
 
```
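The new startup checks resolve a bucket name and a storage root before launching the app, and for POSIX storage they create the bucket as a sub-directory of `root_dir`. A minimal sketch of that flow, assuming a plain dict `cfg` in place of `hsds.config.get()` and taking the bucket name as a plain string; `prepare_storage` is a hypothetical helper that returns the bucket path, or `None` when object storage is configured instead:

```python
import os
import sys
from typing import Mapping, Optional


def prepare_storage(
    bucket_arg: Optional[str], root_dir_arg: Optional[str], cfg: Mapping[str, str]
) -> Optional[str]:
    """Sketch of the startup checks; `cfg` stands in for hsds.config.get()."""
    bucket_name = bucket_arg or cfg.get("bucket_name")
    if not bucket_name:
        sys.exit("bucket_name not set")
    root_dir = root_dir_arg or cfg.get("root_dir")
    if not root_dir:
        # without a root_dir, an S3 gateway or Azure connection string is required
        if not cfg.get("aws_s3_gateway") and not cfg.get("azure_connection_string"):
            sys.exit("root_dir not set (and no S3 or Azure connection info)")
        return None
    if not os.path.isdir(root_dir):
        sys.exit(f"directory: {root_dir} not found")
    # POSIX storage: the bucket is a sub-directory of root_dir, created on demand
    bucket_path = os.path.join(root_dir, bucket_name)
    if not os.path.isdir(bucket_path):
        os.mkdir(bucket_path)
    return bucket_path
```

This appears to be what lets the new Quick Start run `hsds --root_dir ~/hsds_data` without the old manual `mkdir hsds_data/hsdstest` step, provided `bucket_name` is set in the loaded config.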

hsds/basenode.py (+1 -1)

```diff
@@ -33,7 +33,7 @@
 from .util.k8sClient import getDnLabelSelector, getPodIps
 from . import hsds_logger as log
 
-HSDS_VERSION = "0.8.1"
+HSDS_VERSION = "0.8.2"
 
 
 def getVersion():
```

hsds/config.py (+11 -2)

```diff
@@ -42,13 +42,21 @@ def getCmdLineArg(x):
     # return value of command-line option
     # use "--x=val" to set option 'x' to 'val'
     # use "--x" for boolean flags
+
     option = "--" + x + "="
     for i in range(1, len(sys.argv)):
         arg = sys.argv[i]
+        if i < len(sys.argv) - 1:
+            next_arg = sys.argv[i + 1]
+        else:
+            next_arg = None
         if arg == "--" + x:
-            # boolean flag
             debug(f"got cmd line flag for {x}")
-            return True
+            if next_arg is None or next_arg.startswith("-"):
+                # treat as a boolean flag
+                return True
+            else:
+                return next_arg
         elif arg.startswith(option):
             # found an override
             nlen = len(option)
```
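With this change `getCmdLineArg()` accepts a value given as a separate token (`--x val`) in addition to `--x=val`, and still returns `True` for a bare flag. The sketch below restates the loop as a standalone function that takes an explicit `argv` for testing; the name `get_cmd_line_arg`, the `return arg[len(option):]` in the `elif` branch (the hunk is cut off after `nlen = len(option)`), and the final `return None` are assumptions:

```python
import sys
from typing import List, Optional, Union


def get_cmd_line_arg(x: str, argv: Optional[List[str]] = None) -> Union[str, bool, None]:
    """Sketch of the revised lookup: '--x=val', '--x val', or a bare '--x' flag."""
    argv = sys.argv if argv is None else argv
    option = "--" + x + "="
    for i in range(1, len(argv)):
        arg = argv[i]
        # look ahead one token, as the new code does
        next_arg = argv[i + 1] if i < len(argv) - 1 else None
        if arg == "--" + x:
            if next_arg is None or next_arg.startswith("-"):
                return True  # nothing usable follows: treat as a boolean flag
            return next_arg  # space-separated form: '--x val'
        elif arg.startswith(option):
            return arg[len(option):]  # '--x=val' form (assumed continuation)
    return None  # option not present (assumed)
```

The trade-off of the look-ahead is that a value which itself begins with `-` is read as a flag, so such values still need the `--x=val` form. The space-separated form appears to be what lets `--config_dir ./admin/config` from the new help epilog reach `_load_cfg()`.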
```diff
@@ -69,6 +77,7 @@ def _load_cfg():
     config_dir = getCmdLineArg("config_dir")
 
     if config_dir:
+        eprint("got command line arg for config_dir:", config_dir)
         config_dirs.append(config_dir)
     if not config_dirs and "CONFIG_DIR" in os.environ:
         config_dirs.append(os.environ["CONFIG_DIR"])
```

hsds/domain_sn.py (+1 -6)

```diff
@@ -1123,11 +1123,6 @@ async def PUT_Domain(request):
     else:
         is_toplevel = False
 
-    if is_toplevel and not is_folder:
-        msg = "Only folder domains can be created at the top-level"
-        log.warn(msg)
-        raise HTTPBadRequest(reason=msg)
-
     if is_toplevel and not isAdminUser(app, username):
         msg = "creation of top-level domains is only supported by admin users"
         log.warn(msg)
@@ -1164,7 +1159,7 @@ async def PUT_Domain(request):
             linked_json = await getDomainJson(app, l_d, reload=True)
             log.debug(f"got linked json: {linked_json}")
             if "root" not in linked_json:
-                msg = "Folder domains cannot ber used as link target"
+                msg = "Folder domains cannot be used as link target"
                 log.warn(msg)
                 raise HTTPBadRequest(reason=msg)
             root_id = linked_json["root"]
```

hsds/hsds_app.py (+2)

```diff
@@ -274,6 +274,8 @@ def run(self):
         pargs = [py_exe, cmd_path, "--node_type=sn", "--log_prefix=sn "]
         if self._username:
             pargs.append(f"--hs_username={self._username}")
+            # make this user admin
+            pargs.append(f"--admin_user={self._username}")
         if self._password:
             pargs.append(f"--hs_password={self._password}")
         if self._password_file:
```
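The added lines pass the configured user to the service node as `--admin_user` as well as `--hs_username`. Since the `PUT_Domain` hunk in `hsds/domain_sn.py` restricts top-level domain creation to admin users, this presumably lets a single-user `hsds` launch create top-level domains. A sketch of the argument assembly, with the password-file branch omitted because its body is not shown; `build_sn_args` is a hypothetical name:

```python
from typing import List, Optional


def build_sn_args(
    py_exe: str, cmd_path: str, username: Optional[str], password: Optional[str]
) -> List[str]:
    """Assemble service-node arguments as in the hunk (password-file branch omitted)."""
    pargs = [py_exe, cmd_path, "--node_type=sn", "--log_prefix=sn "]
    if username:
        pargs.append(f"--hs_username={username}")
        # added in this commit: the same user is also passed as the admin user
        pargs.append(f"--admin_user={username}")
    if password:
        pargs.append(f"--hs_password={password}")
    return pargs
```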

hsds/servicenode_lib.py (+4)

```diff
@@ -270,6 +270,10 @@ async def getObjectIdByPath(app, obj_id, h5path, bucket=None, refresh=False, dom
             # find domain object is stored under
             domain = link_json["h5domain"]
 
+            if domain.startswith("hdf5:/"):
+                # strip off prefix
+                domain = domain[6:]
+
             if bucket:
                 domain = bucket + domain
 
```

hsds/util/domainUtil.py (+4)

```diff
@@ -213,6 +213,10 @@ def getDomainFromRequest(request, validate=True, allow_dns=True):
     if not domain:
         raise ValueError("no domain")
 
+    if domain.startswith("hdf5:/"):
+        # strip off the prefix to make following logic easier
+        domain = domain[6:]
+
     if domain[0] != "/":
         # DNS style hostname
         if validate:
```
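Both `getDomainFromRequest()` and `getObjectIdByPath()` now accept domains written with the `hdf5:/` prefix (#247) by slicing off its six characters before any other processing. A minimal sketch of that normalization as a standalone function (`strip_hdf5_prefix` is a hypothetical name; the example paths are made up):

```python
def strip_hdf5_prefix(domain: str) -> str:
    """Remove a leading 'hdf5:/' so 'hdf5://home/x.h5' becomes '/home/x.h5'."""
    prefix = "hdf5:/"
    if domain.startswith(prefix):
        # slice off exactly len("hdf5:/") == 6 characters, like domain[6:] in the hunk
        domain = domain[len(prefix):]
    return domain
```

On Python 3.9 and later, `str.removeprefix("hdf5:/")` does the same thing. Note that the single-slash form `hdf5:/shared/x.h5` comes out without a leading slash, which the `domain[0] != "/"` check that follows in `getDomainFromRequest()` would treat as a DNS-style name.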

hsds/util/s3Client.py (+1)

```diff
@@ -145,6 +145,7 @@ def _get_client_kwargs(self):
         kwargs["endpoint_url"] = self._s3_gateway
         kwargs["use_ssl"] = self._use_ssl
         kwargs["config"] = self._aio_config
+        log.debug(f"s3 kwargs: {kwargs}")
         return kwargs
 
     def _renewToken(self):
```
