RFileOp PythonFileOp NotebookFileOp log stdout and stderr output to stdout always, S3 file put for run log output file made configurable #3227
Conversation
Great work with the feature.
Tested this; some suggestions:
The remaining thing, related to #3233, is to conditionally put the file run log output (stdout etc.) to S3 object storage (put_file_to_object_storage); this is independent of process_outputs for node output file argument processing, which is a different feature. Regarding subprocess.Popen in #3233, I think I can log to stdout with subprocess.run, as tested, and optionally also write to a file if necessary (i.e. if enable_generic_node_script_output_to_s3 is true). I will introduce an ENV variable for this, defaulting to true, and put it in the documentation as well, along the lines of the sketch below.
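A minimal sketch of what that gate could look like in the bootstrapper, assuming the env var name ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3 and the existing put_file_to_object_storage helper (signature assumed; not the final implementation):

```python
import os
import subprocess


def run_script(run_args, log_file_path):
    # Run the script and capture combined stdout/stderr so it can be forwarded.
    result = subprocess.run(
        run_args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    # Always echo the script output so it appears in the system/task logs.
    print(result.stdout)

    # Persist the run log to S3/COS only when the env var allows it (default: true).
    if os.getenv("ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3", "true").lower() == "true":
        with open(log_file_path, "w") as log_file:
            log_file.write(result.stdout)
        # put_file_to_object_storage(log_file_path)  # existing helper, signature assumed
```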
A good place to document this is at elyra-ai-examples. There, it says: "Elyra utilizes S3-compatible cloud storage to make data available. Containers in which the notebooks or scripts are executed don't share a file system. Elyra utilizes S3-compatible cloud storage to facilitate the transfer of files from the JupyterLab environment to the containers and between containers." I suggest also writing in elyra-ai-examples: "By default, Elyra also puts the STDOUT and STDERR output into a file when the env var ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3 is set to true, which is the default. If you prefer to only use S3-compatible storage for the transfer of files and not for logging information, either set the env var ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3 to false in runtime container builds or pass that env value explicitly in the env section of the pipeline editor, either at Pipeline Properties - Generic Node Defaults - Environment Variables or at …" What do you think? I could not find a good page for mentioning this here in the documentation folder.
@RHRolun requesting review and comments from you as well, since you were on this before.
The integration test failed with a yarn error that was not there previously. I wonder what the reason is: https://github.com/elyra-ai/elyra/actions/runs/10421458537/job/28863741895
Update: after a rebase off of the latest main, it worked again.
Hi @shalberd - this seems like a good addition. My only comment is regarding the use of print: it appears that the statement print("Processing file:", r_script) should be replaced with an appropriate logger method throughout. (I added comments to the first two instances, but there are several others.) Given you're using info for other "success" logging, that seems like a good replacement.
Thanks.
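For illustration, the kind of replacement being suggested; the logger name and the sample variable here are assumptions, the bootstrapper's actual logger attribute may differ:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("elyra")
r_script = "analysis.r"  # placeholder for the script being processed

# Before: plain print, bypasses the configured log format and level
# print("Processing file:", r_script)

# After: consistent with the existing info-level "success" logging
logger.info("Processing file: %s", r_script)
```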
@kevin-bates we never figured out why .log file output was written to S3 storage at all, so we kept it on by default via the new env variable. Outputs and output file exchange from one pipeline step to another via S3 are conceptually clear, but the reasoning for putting .log files for a pipeline task to S3 was not. I myself will set the env var to false in our runtime images and not put log files with stdout to S3.
Thanks for the updates.
Waiting for @harshad16 to use this in a KFP runtime image; he will let us know here, and after his hopefully positive review we can merge.
I tested this with Airflow 2.x and IBM COS. I can confirm that no .log files are written to S3 when the env variable ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3 is set to false, in my case either in the runtime image directly or in the env definition of the tasks, globally or per task. Also, the task status in Airflow is now red on exit code 1 (failed), as it should be. So logging, conditional S3 bucket .log file storage, and task status in Airflow are all working as intended.
Tested with a build of Elyra dev from main, installed from the whl file elyra-3.16.0.dev0-py3-none-any.whl in JupyterLab 3.6.5, Python 3.9, in an image based on PR 3240. Used Elyra together with GitLab (not GitHub). The runtime image for the tasks contains what is in requirements-elyra; it ran fine with both Python file and Jupyter notebook (ipynb) based tasks.
Thank you @harshad16. Is the env variable approach for putting logs to S3 okay for your team, with the default being true, i.e. putting logs to S3 if the env variable is not defined? I think I saw code on your branch and fork that commented out the S3 log upload, which is what I made conditional on the env value here.
@shalberd could you also create a quick PR to update the docs?
@harshad16 @kevin-bates @lresende docs added in the examples repo where S3 is mentioned: see PR elyra-ai/examples#123. Unsure whether I also need to mention this PR under 4.0 in this repo at https://github.com/elyra-ai/elyra/blob/main/docs/source/getting_started/changelog.md
I meant: document the new env var in https://github.com/elyra-ai/elyra/tree/main/docs. I believe we added a few without proper documentation.
I don't know where to put those aspects in the regular documentation. I added a new page. Please review. |
RFileOp and PythonFileOp log stdout and stderr output to stdout so it appears in system logs, catch non zero return/exit code in exception, no logging of output to file in S3 COS if env var ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3 is set to false Signed-off-by: shalberd <[email protected]>
/fixes #3222
The change as it now stands already solves the issue of no logs appearing in Airflow or KFP for R, Python, and IPython notebook generic pipeline nodes. Also, the success or failed (exit code) status of any file run now ripples through correctly to the Airflow GUI and to task and DAG metadata.
What changes were proposed in this pull request?
Hello! I have a problem with my Elyra pipelines not showing logs.
I have looked for issues on GitHub but couldn't find anything related, and I am not sure if I am missing some needed config or if this is a bug.
When I run an Elyra pipeline with two nodes, one a Jupyter notebook and one a Python script, only the Jupyter notebook one shows the logging. I am not sure why this is the case, as they are both managed the same way.
This error happens in both KFP and Airflow pipeline runs. It is related to the bootstrapper.py file that executes the scripts behind generic pipeline nodes; in this case PythonFileOp and RFileOp have similar syntax, subprocess.run, in both the KFP and the Airflow bootstrapper.py:
https://github.com/elyra-ai/elyra/blob/main/elyra/kfp/bootstrapper.py#L521
https://github.com/elyra-ai/elyra/blob/main/elyra/airflow/bootstrapper.py#L332
i.e.
```python
with open(r_script_output, "w") as log_file:
    subprocess.run(run_args, stdout=log_file, stderr=subprocess.STDOUT, check=True)
```
We only see the name of the processed file in stdout and no logs in the case of Python or R nodes.
This change uses the subprocess.run mechanism in such a way that all stderr and stdout output is written to a variable whose content (i.e. the script run output) is then logged.
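Conceptually, the new invocation looks roughly like the following sketch (simplified; logger name and function shape assumed, the actual bootstrapper code wraps this in more error handling):

```python
import logging
import subprocess

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("elyra")


def process_file(run_args):
    # Capture stdout and stderr into a variable instead of redirecting them to a file ...
    result = subprocess.run(
        run_args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    # ... and log the captured output so it shows up in the KFP / Airflow task logs.
    logger.info(result.stdout)
    if result.returncode != 0:
        # Raising here lets the failed status ripple through to the Airflow / KFP task.
        raise subprocess.CalledProcessError(result.returncode, run_args, output=result.stdout)
```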
Added a new env variable for runtime tasks, ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3, with default true if not set explicitly. Users who do not want log output in a file on S3 can set the env variable to false.
How was this pull request tested?
Added the changed bootstrapper.py file for Airflow to a runtime image and used it in conjunction with Elyra and Airflow 2.8.2 to see whether the logs now appear in the container log output / in the Airflow Logs tab.
Results were positive and as intended, documented here:
#3222 (comment)
There was an issue with log output for ipynb not showing in all cases, only when the log level was set to debug. That has been fixed here in the papermill execute call as well.
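For reference, papermill's public API supports forwarding cell output to its logger; the fix is along these lines (notebook paths are placeholders and the exact call site and arguments in bootstrapper.py may differ):

```python
import papermill

# log_output=True makes papermill log each cell's output as it executes,
# so notebook stdout/stderr shows up in the task logs without having to
# set the overall log level to debug.
papermill.execute_notebook(
    "notebook.ipynb",
    "notebook-output.ipynb",
    log_output=True,
)
```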