Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mutex for app token refresh #22457

Open
wants to merge 32 commits into
base: main
Choose a base branch
from
Open

Add mutex for app token refresh #22457

wants to merge 32 commits into from

Conversation

licitdev
Copy link
Member

@licitdev licitdev commented May 10, 2024

Scope

What's changed:

  • Added a mutex using local storage to prevent concurrent app token refreshes
  • expiresMs is added in case of browser crashes

Potential Risks / Drawbacks

  • Potential but small risk of concurrent refreshes the random sleep duration clashes, resulting in the mutex being read at the same nanosecond 👀

Review Notes / Questions

  • This PR does not address tokens that were already invalidated
  • Unit tests have not been added yet, feel free to add if necessary 🙏

Fixes #22360

Copy link

changeset-bot bot commented May 10, 2024

🦋 Changeset detected

Latest commit: 333ec52

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@directus/app Patch
@directus/api Patch
directus Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@licitdev licitdev marked this pull request as draft May 10, 2024 11:23
@licitdev licitdev marked this pull request as ready for review May 10, 2024 14:01
@licitdev licitdev changed the title Add localstorage mutex for app token refresh Add mutex for app token refresh May 10, 2024
@br41nslug br41nslug self-assigned this May 10, 2024
hanneskuettner
hanneskuettner previously approved these changes May 14, 2024
Copy link
Contributor

@hanneskuettner hanneskuettner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎸

I've still been able to provoke an invalid token error when using the localStorage fallback implementation, that was however only one time, after triggering ~100 refreshes very quickly by pulling a tab in an out of the Chrome browser. So I would assume this is super unlikely to be an issue in peoples setup.

.changeset/clever-jars-enjoy.md Outdated Show resolved Hide resolved
app/src/composables/use-mutex.ts Outdated Show resolved Hide resolved
app/src/composables/use-mutex.ts Show resolved Hide resolved
app/src/composables/use-mutex.ts Outdated Show resolved Hide resolved
app/src/composables/use-mutex.ts Outdated Show resolved Hide resolved
const internalKey = `directus-mutex-${key}`;
const useWebLock = !!navigator.locks;

async function acquireMutex(callback: (lock?: Lock | null) => Promise<any>): Promise<any> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a timeout if the lock should not be released for some reason, either here in useMutex (use expiresMs?) or on caller side

https://developer.mozilla.org/en-US/docs/Web/API/LockManager/request#signal_example

Copy link
Member

@br41nslug br41nslug May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially added an abort controller but in my tests it was causing more issues than it was solving. The issue i was seeing is that the "timeout to abort" is dependent on how many tabs you have open, if we set a timeout of 5 seconds and the first tab takes 2.5 seconds to refresh then 2 tabs will refresh and any beyond that will logout having passed the 5second timeout, ending up in the error state but is recoverable if you refresh on the login screen (that however is not intuitive i think). I have attempted to create a situation where the lock was not released but could only manage to do by adding a non-resolving promise in the code, the lock is released when the refresh request errors, times out or finishes.

For this reason and because I was able to confirm that a page refresh clears active locks (in the weblocks API) I think simply letting this time out will be a better user experience in the end.

app/src/composables/use-mutex.ts Outdated Show resolved Hide resolved
br41nslug

This comment was marked as outdated.

@br41nslug br41nslug requested a review from paescuj May 14, 2024 16:53
@br41nslug
Copy link
Member

@paescuj mind having a last look with the feedback resolved?

Copy link
Member Author

@licitdev licitdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I had started to be logged out very easily with the navigator.locks method. I was testing with 4 open tabs with concurrent fresh restarts from HMR by saving auth.ts.

I've tried adding some sleep in the callback for acquireMutex(), and it seems like there's some race condition elsewhere from a fresh full reload.

Lastly, merged PRs have been updated into this PR, which might be the reason for the behavioural change.

🔍🔍🔍

@br41nslug
Copy link
Member

br41nslug commented May 14, 2024

Similar behavior in production mode, with pinning 4 or more tabs

Edit: I found a "fix" not a perfect one but am curious what you think 🤔
By not refreshing tabs that are not currently visible we can greatly reduce the reliance on the locking mechanisms with larger numbers as this results in only 1 visible tab per window trying to acquire a lock
app/src/auth.ts
image
app/src/idle.ts
image

With these 2 changes the only the active tabs acquire locks and refresh, the inactive tabs will got to the "user is currently authenticated as" login screen.
Demo with 2 windows of 4 tabs:

tMWuHluVLu.mp4

My explanation for why this is happening with a lot of tabs is that each tab will try to refresh the token in the cookie in a waterfall fashion (if the locks are doing their job) however once the lock is released we start hydrating with a bunch of authenticated requests using that cookie so if while hydrating the next tab refreshes the token there is a short window where in-flight requests suddenly have a token with the old session reference.

With this in mind an alternative to my above suggested solution could be to acquire the lock for the full hydration, however thats not a perfect fix either as any authenticated request could theoretically race with a refresh from another tab 🤔

@hanneskuettner
Copy link
Contributor

hanneskuettner commented May 15, 2024

My explanation for why this is happening with a lot of tabs is that each tab will try to refresh the token in the cookie in a waterfall fashion (if the locks are doing their job) however once the lock is released we start hydrating with a bunch of authenticated requests using that cookie so if while hydrating the next tab refreshes the token there is a short window where in-flight requests suddenly have a token with the old session reference.

I can definitely see this being the reason for the log-outs. Good call!

Another possible solution could be to, instead of storing the current authentication state and expiry duration in the app, to store in local (or session) storage instead. This would prevent each tab/window from even having to do a request to /auth/refresh on the first refresh. This could be set and checked in the locked part of the refresh logic and short-circuit if another tab populated the info. What do you think about that? I personally would not consider the auth state and the expiry time to be sensitive info, so storing it in session storage is fine by me, but that is mostly a gut feeling.

@licitdev
Copy link
Member Author

@hanneskuettner Yes! I had the same thinking to use localstorage for the expiry as the other requests. As a bunch of "refresh" requests might be queued up, I believe this "skip refresh if fresh" logic needs to be added within the callback passed to acquireMutex().

On the other hand, I think we would need to clear the session cookie only for auth routes instead of all routes from #22459. 🤔

Example:

browser A -> refresh
browser B -> fetches data
browser A -> gets back new session
browser B -> errors as the session is incorrect and the session cookie is cleared

@alexchopin alexchopin added this to the Next Minor Release milestone May 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: 👀 In review
Development

Successfully merging this pull request may close these issues.

Race condition on refresh with multiple tabs
5 participants