Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Assistants] trending feature #938

Merged
merged 14 commits into from Mar 28, 2024
Merged

[Assistants] trending feature #938

merged 14 commits into from Mar 28, 2024

Conversation

mishig25
Copy link
Collaborator

@mishig25 mishig25 commented Mar 18, 2024

Description

There is a new field assistant.last24HoursCount that will be used to sort assistants in trending manner.

last24HoursCount:{
	count: 3356, // total number of requests an assistant received over the last 24 hours
	byHour: {
		0: 567, // number of requests an assistant received in the last 24 hours from 01:00-02:00
		1: 928, // number of requests an assistant received in the last 24 hours from 02:00-03:00
		2: 734,
		...
		23: 374, // number of requests an assistant received in the last 24 hours from 23:00-24:00
	}
}

The trick is: we created a 24-hour window rolling counter for assistants by running the function below every hour:

const hourExpired = new Date().getUTCHours() + 1;
// then in mongo database
assistant.last24HoursCount.count -= assistant.last24HoursCount[hourExpired]; // 24-hour window rolling counter
assistant.last24HoursCount[hourExpired] = 0; // reset

using semaphore/lock/migration helpers from #897 to make sure that this periodic function runs correctly & runs only on one instance (no double running)

Screenshots

Desktop Mobile
Screenshot 2024-03-18 at 11 46 02 AM Screenshot 2024-03-18 at 11 46 12 AM

Deployment plan

  1. Get this PR approved
  2. Hide the sort assistants UX/frontend while the data is being collected
  3. Deploy
  4. Enable/show sort assistants UX/frontend

@mishig25 mishig25 force-pushed the feature_trending_assistants branch 2 times, most recently from b12de57 to 9ee5861 Compare March 18, 2024 11:54
@mishig25 mishig25 marked this pull request as ready for review March 18, 2024 11:54
Copy link
Member

@coyotte508 coyotte508 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments:

  • You can use a dedicated collection for assistant analytics, with a unique key on date (truncated to the hour) + assistant's _id
await collections.assistantStats.updateOne({
  // or `"date.at": startOfHour(new Date()) - with `date-fns` 
  "date.at": {$dateTrunc: {date: new Date(), unit: "hour"}},
  assistantId: ...
}, {
  $inc: {count: 1}
}, {upsert: true});

This will be more robust

And periodically you can run a $merge aggregation query:

await collections.assistantStats.aggregate([..., {$merge: {into: "assistants"}})
await collections.assistants.updateMany({
  "last24HoursCount.count": {$gt: 0},
  "last24HoursCount.updatedAt": ...
}, {$set: {"last24HoursCount.count": 0}});
  • The "cron" shouldn't be in "migrations.ts". There should maybe be an env var to enable it. And you can reuse the collections exported from src/lib/server/database.ts, no need to create a separate DB connection.

eg

setInterval(refreshAssistantsStats, 3600_000)

async function refreshAssistantsStats() {
  if (!cantLockDb()) {
    return;
  }

  // run DB queries
}

Copy link
Collaborator

@gary149 gary149 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok for the UI

@nsarrazin nsarrazin added the assistants Related to the assistants feature label Mar 19, 2024
@mishig25 mishig25 force-pushed the feature_trending_assistants branch from f59003f to be1766e Compare March 26, 2024 09:10
Comment on lines 7 to 32
const migration: Migration = {
_id: new ObjectId("65f7ff14298f30c5060eb6ac"),
name: "Refresh assistants count for last 24 hours",
// eslint-disable-next-line no-shadow
up: async (client) => {
const { assistantStats } = getCollections(client);
const currentDate = new Date();
const twentyFourHoursAgo = new Date();
twentyFourHoursAgo.setDate(twentyFourHoursAgo.getDate() - 1); // subtract 24 hours (i.e. 1 day)

assistantStats.aggregate([
{ $match: { dateDay: { $gte: twentyFourHoursAgo, $lt: currentDate } } },
{
$group: {
_id: { _id: "$_id", dateWithHour: "$dateWithHour" },
totalCount: { $sum: "$count" },
},
},
{
$merge: {
into: "assistants",
on: "_id",
whenMatched: [
{
$set: {
"last24HoursCount.count": "$$ROOT.totalCount",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to put it in a Migration anymore?

Just to be clear, let's not mix up cron jobs & migrations in the migrations collection

I don't think we need to store cron job results in DB, but if we do, probably a separate collection

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was in assumption that multiple machines are running chat-ui (and the call below might be called my multiple machines within a same period). therefore, we would need to use Migration & locks. No ?

refreshAssistantsCounts();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

locks yes, but no need for Migration

We can create a lock on refresh-assistant-counts

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handled in 0803740

  1. Removed usage of Migration
  2. Also set those lines below to make sure that different machines will try to run the refresh "cron" job at the same time (i.e. synced even if the machines start at different times)

const now = new Date();
const nextHour = startOfHour(addHours(now, 1));
const delay = nextHour.getTime() - now.getTime();
const ONE_HOUR_MS = 3_600_000;
// wait until the next hour to start the hourly interval
setTimeout(() => {
refreshAssistantsCountsHelper();
setInterval(refreshAssistantsCountsHelper, ONE_HOUR_MS);

@mishig25
Copy link
Collaborator Author

based on #938 (review), I've committed 93c9ebc

Description:

  1. Created a new collection called AssistantStats. Fields are: _id: Assistant["_id"], dateWithHour: Date, count: number
  2. dateWithHour is a date field. However, the value should have minutes, seconds, ms truncated. let date = new Date(); date.setUTCMinutes(0, 0, 0) so that we can use this field dateWithHour to increment count field everytime an assistant gets a visit within same hour
  3. Every hour, there is a "cron-like" aggregation query that does: for the last 24 hours, group on _id: Assistant["_id"] & sum count: number and afterwards using $merge.whenMatched pipeline update assistant.last24HoursCount field

@coyotte508
Copy link
Member

Did several backend changes

  • For stats collection, the _id shouldn't be assistantId (otherwise we can create only one document per assistant, while we should have 1 / hour / assistant)
  • added date.at and date.span fields to be consistent with other stat collections - also in case we want to be more granular in the future (instead of 1 doc/hour, 1 doc/10 minutes for example)
  • Reworked the locking mechanism
    • Made lockKey mandatory
    • Return the lockId, to make sure we return / release a lock we own
  • Maintain the lock for the cron job during the whole process's lifespan - no need to sync job start
  • Changed the aggregation to use $unionWith - a nice trick originating from @Kakulukian I think. No need for a lastUpdatedAt field with this!
  • Fixed a bug on usage limits for assistant creation

Maybe the lock mechanism (maintainLock) can be more standardized if we have more cron jobs

@mishig25
Copy link
Collaborator Author

@nsarrazin wdyt ?

@nsarrazin
Copy link
Collaborator

Nice! Doing one last pass

Copy link
Collaborator

@nsarrazin nsarrazin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! If I can help with the deployment let me know 😁

@mishig25
Copy link
Collaborator Author

checking one more time from my end & will ping you on deployment

@mishig25
Copy link
Collaborator Author

as written in the dscription

Deployment plan

  1. Get this PR approved
  2. Hide the sort assistants UX/frontend while the data is being collected
  3. Deploy
  4. Enable/show sort assistants UX/frontend

I've hidden the UI 4dd7e65

Ready to merge & deploy @nsarrazin

@nsarrazin nsarrazin merged commit a4c3fca into main Mar 28, 2024
3 checks passed
@nsarrazin nsarrazin deleted the feature_trending_assistants branch March 28, 2024 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
assistants Related to the assistants feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants