Function definition in Variable_List breaks PBS scheduler job parsing #7395

@SeanBryan51

Describe the bug

When working with some PBS-based schedulers, the Variable_List field produced by qstat -f may contain a bash function definition, which causes job parsing to raise the exception aiida.schedulers.scheduler.SchedulerParsingError: There are lines without equals sign.

In my case, the login environment on my HPC system unavoidably defines several exported functions, so they appear in the Variable_List field.

This issue seems to have also been raised here: https://aiida.discourse.group/t/aiida-not-getting-number-of-jobs-from-scheduler/176/3

Steps to reproduce

Running the following script reproduces the issue:

import aiida.plugins

PbsproScheduler = aiida.plugins.SchedulerFactory("core.pbspro")

retval = 0
stderr = ""
stdout = """Job Id: 68350.mycluster
    Variable_List = BASH_FUNC_my_func%%=() {  if true; then
 echo foo; else
 echo bar; fi
\t}

"""

scheduler = PbsproScheduler()
job_list = scheduler._parse_joblist_output(retval, stdout, stderr)

A more realistic test can be run with the following steps:

  1. Log in to a PBS Pro based cluster and export the following function in ~/.bashrc or ~/.bash_profile:

    my_func() {
        if true; then
            echo foo
        else
            echo bar
        fi
    }
    export -f my_func
    
  2. Submit a job either through AiiDA or directly with qsub -V ..., ensuring the login environment is exported to the job.

  3. Run the following Python script immediately after submitting to the queue to test the PBS job parser (replacing <computer> with a computer configured with a PBS-based scheduler). This should reproduce the exception:

import aiida
import aiida.orm
import aiida.plugins

PbsproScheduler = aiida.plugins.SchedulerFactory("core.pbspro")

_ = aiida.load_profile()

user = aiida.orm.User.collection.get_default()
computer = aiida.orm.load_computer("<computer>")
authinfo = computer.get_authinfo(user)
transport = authinfo.get_transport()
scheduler = PbsproScheduler()

with transport:
    scheduler.set_transport(transport)
    retval, stdout, stderr = transport.exec_command_wait(scheduler._get_joblist_command())

job_list = scheduler._parse_joblist_output(retval, stdout, stderr)

Expected behavior

No exception should be raised.

Your environment

  • Operating system: Linux
  • Python version: Python 3.10.20
  • aiida-core version: AiiDA version 2.8.0

Additional context

The exception seems to only occur if:

  1. The bash function definition spans multiple lines
  2. The parser is run while the job is still in the queue
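
To illustrate why only multi-line definitions trigger the error, here is a minimal sketch of a line-based check. It is not the actual aiida-core parser: the function name lines_without_equals and the rule that tab-indented lines are treated as continuations are assumptions made for this example. The sample text is the same stdout used in the reproduction script above.

```python
def lines_without_equals(stdout: str) -> list[str]:
    """Return lines that a naive 'key = value' parser could not interpret.

    Hypothetical helper for illustration only; aiida-core's real parser
    may apply different rules.
    """
    suspicious = []
    for line in stdout.splitlines():
        if not line.strip():
            continue  # skip blank lines between jobs
        if line.startswith("Job Id:"):
            continue  # job header line, not a key/value pair
        if line.startswith("\t"):
            continue  # assumed: tab-indented lines continue the previous field
        if "=" not in line:
            suspicious.append(line)
    return suspicious


sample_stdout = """Job Id: 68350.mycluster
    Variable_List = BASH_FUNC_my_func%%=() {  if true; then
 echo foo; else
 echo bar; fi
\t}

"""

# The body lines of the exported function contain no equals sign.
print(lines_without_equals(sample_stdout))
# → [' echo foo; else', ' echo bar; fi']
```

Under these assumptions, the body lines of the function are the ones that look like malformed fields, which is consistent with the error message reported above.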

My current workaround is to override the parsing behaviour with a custom scheduler plugin that uses the -F json flag of qstat and parses the job output as JSON. Would this be a useful contribution to aiida-core? If so, I am more than happy to open a pull request.
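
A rough sketch of the JSON approach, not the plugin itself: the function name parse_qstat_json is hypothetical, and the layout of the sample document (a top-level "Jobs" mapping keyed by job id, with a "job_state" attribute and Variable_List as a nested mapping) is an assumption about what qstat -f -F json emits, so it should be checked against real output from the target cluster.

```python
import json


def parse_qstat_json(stdout: str) -> dict[str, str]:
    """Map each job id to its state from JSON-formatted qstat output.

    Hypothetical helper; the "Jobs" and "job_state" keys are assumed.
    """
    data = json.loads(stdout)
    jobs = data.get("Jobs", {})  # assumed: key may be absent when the queue is empty
    return {job_id: attrs.get("job_state", "") for job_id, attrs in jobs.items()}


# Sample input built by hand for illustration (not captured from a real cluster).
sample_json = json.dumps(
    {
        "Jobs": {
            "68350.mycluster": {
                "job_state": "Q",
                "Variable_List": {
                    "BASH_FUNC_my_func%%": "() {  if true; then\n echo foo; else\n echo bar; fi\n}"
                },
            }
        }
    }
)

print(parse_qstat_json(sample_json))
# → {'68350.mycluster': 'Q'}
```

Because the function body arrives as an ordinary JSON string value, its embedded newlines no longer interfere with locating the other job attributes.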

Thanks 🙂
