
possible lock starvation in lock conflict scenario #1126

Open
skmprabhu252 opened this issue May 6, 2024 · 2 comments
Labels
Analyzing · Need Info (Need more information from the reporter)

Comments

@skmprabhu252

Let's say:

  1. Process-1 takes a blocking read lock and then sleeps for 5 seconds while holding it.
  2. Process-2 requests a blocking write lock, and the lock will be added to the blocked lock list since Ganesha finds a conflicting lock in the internal structure. The sbd_grant_type is set to STATE_GRANT_INTERNAL.
  3. Process-3 continuously sends read lock requests and unlocks in a loop.
  4. Process-4 from NFS client-2 continuously sends read lock requests and unlocks in a loop.

The problem is that the write lock request from Process-2 may starve for a long time if there is continuous conflict in the FSAL. I suggest upgrading STATE_GRANT_INTERNAL to STATE_GRANT_POLL after the first failed grant attempt.
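A minimal client-side reproduction sketch of this scenario, using POSIX fcntl() byte-range locks (the file path, role names, and timings here are my assumptions for illustration, not taken from the report; one instance per "process" above, all pointed at the same NFS-mounted file):

        /* Hypothetical reproduction sketch: "holder" = Process-1,
         * "writer" = Process-2, anything else = Process-3/4. */
        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>

        static int set_lock(int fd, short type, int cmd)
        {
                struct flock fl;

                memset(&fl, 0, sizeof(fl));
                fl.l_type = type;       /* F_RDLCK, F_WRLCK or F_UNLCK */
                fl.l_whence = SEEK_SET; /* l_start = l_len = 0: whole file */
                return fcntl(fd, cmd, &fl);
        }

        int main(int argc, char **argv)
        {
                int fd;

                if (argc < 2)
                        return 1;

                fd = open("/mnt/nfs/lockfile", O_RDWR); /* assumed path */
                if (fd == -1) {
                        perror("open");
                        return 1;
                }

                if (strcmp(argv[1], "holder") == 0) {
                        /* Process-1: blocking read lock, held 5 seconds */
                        set_lock(fd, F_RDLCK, F_SETLKW);
                        sleep(5);
                        set_lock(fd, F_UNLCK, F_SETLK);
                } else if (strcmp(argv[1], "writer") == 0) {
                        /* Process-2: blocking write lock; this is the
                         * request that starves under continuous conflict */
                        set_lock(fd, F_WRLCK, F_SETLKW);
                        set_lock(fd, F_UNLCK, F_SETLK);
                } else {
                        /* Process-3/4: tight read lock/unlock loop that
                         * keeps the file covered by shared locks */
                        for (;;) {
                                set_lock(fd, F_RDLCK, F_SETLKW);
                                set_lock(fd, F_UNLCK, F_SETLK);
                        }
                }
                return 0;
        }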

In try_to_grant_lock()

                blocked = lock_entry->sle_blocked;
                lock_entry->sle_blocked = STATE_GRANTING;
                if (lock_entry->sle_block_data->sbd_grant_type ==
                    STATE_GRANT_NONE)
                        lock_entry->sle_block_data->sbd_grant_type =
                            STATE_GRANT_INTERNAL;

                status = call_back(lock_entry->sle_obj,
                                   lock_entry);

                if (status == STATE_LOCK_BLOCKED) {
                        /* The lock is still blocked, restore its type and
                         * leave it in the list.
                         */
                        lock_entry->sle_blocked = blocked;
                        lock_entry->sle_block_data->sbd_grant_type =
                                                        STATE_GRANT_NONE;
                        LogEntry("Granting callback left lock still blocked",
                                 lock_entry);
                        return;
                }

In the above code, I guess we need to change

                        lock_entry->sle_block_data->sbd_grant_type =
                                                        STATE_GRANT_NONE;

to

                        lock_entry->sle_block_data->sbd_grant_type =
                                                        STATE_GRANT_POLL;
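
Put together, the STATE_LOCK_BLOCKED branch with the suggested change applied would read roughly as follows (a sketch against the snippet above, not a tested patch):

                if (status == STATE_LOCK_BLOCKED) {
                        /* The lock is still blocked. Instead of resetting
                         * the grant type to STATE_GRANT_NONE, fall back to
                         * polling so the blocked lock keeps being retried
                         * even when the internal grant path loses every
                         * race against the read lock/unlock loops.
                         */
                        lock_entry->sle_blocked = blocked;
                        lock_entry->sle_block_data->sbd_grant_type =
                                                        STATE_GRANT_POLL;
                        LogEntry("Granting callback left lock still blocked",
                                 lock_entry);
                        return;
                }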
@ffilz (Member) commented May 6, 2024

Hmm, I'd have to think about this. We haven't really addressed lock fairness, and there is a question of how much energy to put into NFSv3. What does GPFS itself do in this scenario? Also, is this an actual customer scenario or just a torture test that QA came up with?

@ffilz added the Analyzing and Need Info (Need more information from the reporter) labels on May 6, 2024
@skmprabhu252 (Author)

This is a customer scenario where performance is degraded due to frequent lock conflicts. Ganesha requests a non-blocking lock instead of a blocking lock, causing the GPFS FSAL to return an error. If it were a blocking lock, the request might be queued and granted with priority.
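
In POSIX terms, the distinction being described is roughly the following (illustrative only; the actual Ganesha-to-FSAL call path is different, and this helper is hypothetical):

        #include <errno.h>
        #include <fcntl.h>
        #include <string.h>

        /* Illustrative contrast between the two request styles; the real
         * GPFS FSAL interface is not fcntl(). */
        static int request_write_lock(int fd, int blocking)
        {
                struct flock fl;

                memset(&fl, 0, sizeof(fl));
                fl.l_type = F_WRLCK;
                fl.l_whence = SEEK_SET; /* whole file */

                if (blocking)
                        /* F_SETLKW queues the waiter until the conflict
                         * clears, so it can be granted ahead of later
                         * requests */
                        return fcntl(fd, F_SETLKW, &fl);

                /* F_SETLK fails immediately on conflict; the caller sees
                 * EAGAIN/EACCES and must retry from scratch, losing every
                 * race under continuous load */
                if (fcntl(fd, F_SETLK, &fl) == -1 &&
                    (errno == EAGAIN || errno == EACCES))
                        return -1;
                return 0;
        }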
