feat: populate the table of the last known event for an email from past events (#896)

## Description

🎸 Collect `User.last_login`, `Event`, `DSP`, `UpVote`,
`ForumRating`, anonymous and authenticated `Post`, and visited `Notification`
records (#891)
🎸 Deduplicate emails, keeping the most recent event
🎸 Ignore emails already recorded in `EmailLastSeen`, on the
assumption that the `EmailLastSeen` record is the most recent
🎸 Insert everything into `EmailLastSeen`
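
The deduplication step above can be illustrated with a minimal, framework-free sketch. It assumes each collected record is an `(email, seen_at, kind)` tuple; the function name `keep_most_recent`, the sample addresses, and the kind strings are hypothetical and are not taken from the commit.

```python
from datetime import datetime, timezone


def keep_most_recent(records):
    """Map each email to its most recent (email, seen_at, kind) record."""
    latest = {}
    for email, seen_at, kind in records:
        current = latest.get(email)
        # Keep the record only if it is the first one seen or strictly newer.
        if current is None or seen_at > current[1]:
            latest[email] = (email, seen_at, kind)
    return latest


# Hypothetical sample data: two events for the same address, one for another.
records = [
    ("a@example.com", datetime(2025, 1, 10, tzinfo=timezone.utc), "LOGGED"),
    ("a@example.com", datetime(2025, 1, 20, tzinfo=timezone.utc), "POST"),
    ("b@example.com", datetime(2025, 1, 5, tzinfo=timezone.utc), "VISITED"),
]
latest = keep_most_recent(records)
```

A single pass with a dictionary avoids sorting the whole batch; the command in this commit reaches the same result by sorting on `(email, date)` and letting the last tuple per email win.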

## Type of change

🚧 technical

### Points of attention

🦺 `test_collect_clicked_notifs` is expected to fail until #891 lands
🦺 prerequisites: #892 & #894

### Simulation on data from January 27, 2025

```
$ python manage.py populate_emaillastseen
INFO 2025-02-11 11:20:09,215 ./lacommunaute/users/management/commands/populate_emaillastseen.py : starting to populate EmailLastSeen table
INFO 2025-02-11 11:20:09,215 ./lacommunaute/users/management/commands/populate_emaillastseen.py : processing users
INFO 2025-02-11 11:20:09,493 ./lacommunaute/users/management/commands/populate_emaillastseen.py : will process 19353 emails
INFO 2025-02-11 11:20:09,577 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 18353
INFO 2025-02-11 11:20:09,693 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 17353
INFO 2025-02-11 11:20:09,777 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 16353
INFO 2025-02-11 11:20:09,862 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 15353
INFO 2025-02-11 11:20:09,947 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 14353
INFO 2025-02-11 11:20:10,095 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 13353
INFO 2025-02-11 11:20:10,454 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 12353
INFO 2025-02-11 11:20:10,739 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 11353
INFO 2025-02-11 11:20:10,838 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 10353
INFO 2025-02-11 11:20:10,975 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 9353
INFO 2025-02-11 11:20:11,075 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 8353
INFO 2025-02-11 11:20:11,172 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 7353
INFO 2025-02-11 11:20:11,271 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 6353
INFO 2025-02-11 11:20:11,369 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 5353
INFO 2025-02-11 11:20:11,464 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 4353
INFO 2025-02-11 11:20:11,553 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 3353
INFO 2025-02-11 11:20:11,647 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 2353
INFO 2025-02-11 11:20:11,743 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 1353
INFO 2025-02-11 11:20:11,873 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 353
INFO 2025-02-11 11:20:11,912 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 0
INFO 2025-02-11 11:20:11,912 ./lacommunaute/users/management/commands/populate_emaillastseen.py : processing events
INFO 2025-02-11 11:20:11,921 ./lacommunaute/users/management/commands/populate_emaillastseen.py : will process 309 emails
INFO 2025-02-11 11:20:11,937 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 0
INFO 2025-02-11 11:20:11,937 ./lacommunaute/users/management/commands/populate_emaillastseen.py : processing DSP
INFO 2025-02-11 11:20:11,994 ./lacommunaute/users/management/commands/populate_emaillastseen.py : will process 3087 emails
INFO 2025-02-11 11:20:12,085 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 2087
INFO 2025-02-11 11:20:12,179 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 1087
INFO 2025-02-11 11:20:12,278 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 87
INFO 2025-02-11 11:20:12,298 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 0
INFO 2025-02-11 11:20:12,298 ./lacommunaute/users/management/commands/populate_emaillastseen.py : processing upvotes
INFO 2025-02-11 11:20:12,319 ./lacommunaute/users/management/commands/populate_emaillastseen.py : will process 837 emails
INFO 2025-02-11 11:20:12,367 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 0
INFO 2025-02-11 11:20:12,367 ./lacommunaute/users/management/commands/populate_emaillastseen.py : processing forum ratings
INFO 2025-02-11 11:20:12,380 ./lacommunaute/users/management/commands/populate_emaillastseen.py : will process 236 emails
INFO 2025-02-11 11:20:12,409 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 0
INFO 2025-02-11 11:20:12,409 ./lacommunaute/users/management/commands/populate_emaillastseen.py : processing posts
INFO 2025-02-11 11:20:12,562 ./lacommunaute/users/management/commands/populate_emaillastseen.py : will process 4777 emails
INFO 2025-02-11 11:20:12,654 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 3777
INFO 2025-02-11 11:20:12,741 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 2777
INFO 2025-02-11 11:20:12,880 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 1777
INFO 2025-02-11 11:20:13,019 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 777
INFO 2025-02-11 11:20:13,077 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 0
INFO 2025-02-11 11:20:13,077 ./lacommunaute/users/management/commands/populate_emaillastseen.py : processing clicked notifications
INFO 2025-02-11 11:20:13,082 ./lacommunaute/users/management/commands/populate_emaillastseen.py : will process 82 emails
INFO 2025-02-11 11:20:13,100 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 0
INFO 2025-02-11 11:20:13,100 ./lacommunaute/users/management/commands/populate_emaillastseen.py : that's all folks!
```
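
The `remaining` counters in the log above drop by 1000 per line, which is consistent with processing in batches of 1000. The sketch below shows one way such counters could be computed; it is an illustration under that assumption, not the command's code. The local `batched` helper stands in for `itertools.batched` (available from Python 3.12), and `remaining_after_each_batch` is a hypothetical name.

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive tuples of at most `size` items (stand-in for itertools.batched)."""
    iterator = iter(iterable)
    while batch := tuple(islice(iterator, size)):
        yield batch


def remaining_after_each_batch(total, size=1000):
    """Return the number of items left to process after each batch."""
    remaining = total
    counts = []
    for batch in batched(range(total), size):
        remaining -= len(batch)
        counts.append(remaining)
    return counts


# 3087 is the DSP email count reported in the log.
dsp_counts = remaining_after_each_batch(3087)
```

With 3087 items and a batch size of 1000, the counters are 2087, 1087, 87 and 0, matching the DSP lines of the log.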
vincentporte authored Feb 12, 2025
1 parent 9aa316c commit 28c4cb2
Showing 2 changed files with 311 additions and 0 deletions.
154 changes: 154 additions & 0 deletions lacommunaute/users/management/commands/populate_emaillastseen.py
@@ -0,0 +1,154 @@
from itertools import batched
from logging import getLogger

from django.core.management.base import BaseCommand
from django.db.models import Case, F, Value, When

from lacommunaute.event.models import Event
from lacommunaute.forum.models import ForumRating
from lacommunaute.forum_conversation.models import Post
from lacommunaute.forum_upvote.models import UpVote
from lacommunaute.notification.models import Notification
from lacommunaute.surveys.models import DSP
from lacommunaute.users.enums import EmailLastSeenKind
from lacommunaute.users.models import EmailLastSeen, User


logger = getLogger("commands")


def collect_users_logged_in():
    qs = (
        User.objects.exclude(last_login=None)
        .annotate(last_seen_kind=Value(EmailLastSeenKind.LOGGED))
        .values_list("email", "last_login", "last_seen_kind")
    )
    return tuple(qs)


def collect_event():
    qs = (
        Event.objects.all()
        .annotate(last_seen_kind=Value(EmailLastSeenKind.LOGGED))
        .values_list("poster__email", "created", "last_seen_kind")
    )
    return tuple(qs)


def collect_DSP():
    qs = (
        DSP.objects.all()
        .annotate(last_seen_kind=Value(EmailLastSeenKind.LOGGED))
        .values_list("user__email", "created", "last_seen_kind")
    )
    return tuple(qs)


def collect_upvote():
    qs = (
        UpVote.objects.exclude(voter=None)
        .annotate(last_seen_kind=Value(EmailLastSeenKind.LOGGED))
        .values_list("voter__email", "created_at", "last_seen_kind")
    )
    return tuple(qs)


def collect_forum_rating():
    qs = (
        ForumRating.objects.exclude(user=None)
        .annotate(last_seen_kind=Value(EmailLastSeenKind.LOGGED))
        .values_list("user__email", "created", "last_seen_kind")
    )
    return tuple(qs)


def collect_post():
    qs = Post.objects.annotate(
        email=Case(
            When(poster__isnull=False, then=F("poster__email")),
            When(poster__isnull=True, then=F("username")),
            default=Value(""),
        ),
        kind=Value(EmailLastSeenKind.POST),
    ).values_list("email", "created", "kind")
    return tuple(qs)


def collect_clicked_notifs():
    qs = (
        Notification.objects.exclude(visited_at=None)
        .annotate(last_seen_kind=Value(EmailLastSeenKind.VISITED))
        .values_list("recipient", "visited_at", "last_seen_kind")
    )
    return tuple(qs)


def collect_existing_email_last_seen(emails):
    qs = EmailLastSeen.objects.filter(email__in=emails).values_list("email", "last_seen_at", "last_seen_kind")
    return tuple(qs)


def keep_most_recent_tuple(last_seen):
    return {tup[0]: tup for tup in sorted(last_seen, key=lambda tup: (tup[0], tup[1]))}


def insert_last_seen(emails):
    obj = [
        EmailLastSeen(email=email, last_seen_at=last_seen_at, last_seen_kind=last_seen_kind)
        for email, last_seen_at, last_seen_kind in emails.values()
    ]
    return EmailLastSeen.objects.bulk_create(
        obj,
        update_fields=["last_seen_at", "last_seen_kind"],
        update_conflicts=True,
        unique_fields=["email"],
        batch_size=1000,
    )


def iterate_over_emails(collected_emails, size=1000):
    logger.info("will process %s emails", len(collected_emails))

    proceeded = 0
    for batch_emails in batched(collected_emails, size):
        existing_emails = collect_existing_email_last_seen([email for email, _, _ in batch_emails])
        most_recent = keep_most_recent_tuple(batch_emails + existing_emails)
        insert_last_seen(most_recent)
        logger.info("proceeded %s emails", proceeded := proceeded + len(batch_emails))


class Command(BaseCommand):
    help = "hydratation de la table EmailLastSeen avec la date de dernière visite des emails connus"

    def handle(self, *args, **options):
        logger.info("starting to populate EmailLastSeen table")

        logger.info("processing users")
        users = collect_users_logged_in()
        iterate_over_emails(users)

        logger.info("processing events")
        events = collect_event()
        iterate_over_emails(events)

        logger.info("processing DSP")
        dsp = collect_DSP()
        iterate_over_emails(dsp)

        logger.info("processing upvotes")
        upvotes = collect_upvote()
        iterate_over_emails(upvotes)

        logger.info("processing forum ratings")
        forum_ratings = collect_forum_rating()
        iterate_over_emails(forum_ratings)

        logger.info("processing posts")
        posts = collect_post()
        iterate_over_emails(posts)

        logger.info("processing clicked notifications")
        clicked_notifs = collect_clicked_notifs()
        iterate_over_emails(clicked_notifs)

        logger.info("that's all folks!\n")
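
`insert_last_seen` relies on `bulk_create` with `update_conflicts=True`, which asks the database to update the row when the unique `email` already exists instead of raising an integrity error. The sketch below reproduces roughly that behavior with the standard-library `sqlite3` module; the table name `email_last_seen`, the sample rows, and the kind strings are assumptions for illustration, not the project's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the three fields used by the command.
conn.execute(
    "CREATE TABLE email_last_seen ("
    "email TEXT PRIMARY KEY, "
    "last_seen_at TEXT NOT NULL, "
    "last_seen_kind TEXT NOT NULL)"
)
# One pre-existing row that the upsert below will overwrite.
conn.execute(
    "INSERT INTO email_last_seen VALUES (?, ?, ?)",
    ("a@example.com", "2025-01-01T00:00:00", "VISITED"),
)

rows = [
    ("a@example.com", "2025-01-20T00:00:00", "LOGGED"),  # conflicts on email
    ("b@example.com", "2025-01-15T00:00:00", "POST"),  # new email
]
# Insert new emails; on a conflicting email, update the two other columns.
conn.executemany(
    "INSERT INTO email_last_seen (email, last_seen_at, last_seen_kind) "
    "VALUES (?, ?, ?) "
    "ON CONFLICT(email) DO UPDATE SET "
    "last_seen_at = excluded.last_seen_at, "
    "last_seen_kind = excluded.last_seen_kind",
    rows,
)

result = dict(
    conn.execute("SELECT email, last_seen_kind FROM email_last_seen ORDER BY email").fetchall()
)
```

Because the upsert overwrites unconditionally, the caller has to decide beforehand which record is the most recent, which is why the command merges each batch with the existing rows before inserting.
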
157 changes: 157 additions & 0 deletions lacommunaute/users/tests/tests_management_commands.py
@@ -0,0 +1,157 @@
from datetime import datetime

from dateutil.relativedelta import relativedelta
from django.core.management import call_command
from django.utils import timezone

from lacommunaute.event.factories import EventFactory
from lacommunaute.forum.factories import ForumFactory, ForumRatingFactory
from lacommunaute.forum_conversation.factories import AnonymousTopicFactory, TopicFactory
from lacommunaute.forum_upvote.factories import UpVoteFactory
from lacommunaute.notification.factories import NotificationFactory
from lacommunaute.surveys.factories import DSPFactory
from lacommunaute.users.enums import EmailLastSeenKind
from lacommunaute.users.factories import EmailLastSeenFactory, UserFactory
from lacommunaute.users.management.commands.populate_emaillastseen import (
    collect_clicked_notifs,
    collect_DSP,
    collect_event,
    collect_existing_email_last_seen,
    collect_forum_rating,
    collect_post,
    collect_upvote,
    collect_users_logged_in,
    insert_last_seen,
    iterate_over_emails,
    keep_most_recent_tuple,
)
from lacommunaute.users.models import EmailLastSeen


def test_collect_users_logged_in(db):
    logged_user = UserFactory(last_login=timezone.make_aware(datetime(2024, 10, 22)))
    UserFactory(last_login=None)
    assert collect_users_logged_in() == ((logged_user.email, logged_user.last_login, EmailLastSeenKind.LOGGED),)


def test_collect_event(db):
    event = EventFactory()
    assert collect_event() == ((event.poster.email, event.created, EmailLastSeenKind.LOGGED),)


def test_collect_DSP(db):
    dsp = DSPFactory()
    assert collect_DSP() == ((dsp.user.email, dsp.created, EmailLastSeenKind.LOGGED),)


def test_upvote(db):
    upvote = UpVoteFactory(content_object=ForumFactory(), voter=UserFactory())
    assert collect_upvote() == ((upvote.voter.email, upvote.created_at, EmailLastSeenKind.LOGGED),)


def test_forum_rating(db):
    ForumRatingFactory(user=None)
    forum_rating = ForumRatingFactory(user=UserFactory())
    assert collect_forum_rating() == ((forum_rating.user.email, forum_rating.created, EmailLastSeenKind.LOGGED),)


def test_collect_post(db):
    topic = TopicFactory(with_post=True)
    anonymous_topic = AnonymousTopicFactory(with_post=True)

    assert collect_post() == (
        (topic.first_post.poster.email, topic.first_post.created, EmailLastSeenKind.POST),
        (anonymous_topic.first_post.username, anonymous_topic.first_post.created, EmailLastSeenKind.POST),
    )


def test_collect_clicked_notifs(db):
    notification = NotificationFactory(visited_at=timezone.now())

    assert collect_clicked_notifs() == ((notification.recipient, notification.visited_at, EmailLastSeenKind.VISITED),)


def test_collect_existing_email_last_seen(db):
    emails = ["[email protected]", "[email protected]", "[email protected]"]
    for email in emails[:2]:
        EmailLastSeenFactory(email=email)

    collected = list(collect_existing_email_last_seen(emails[1:]))
    expected = list(
        EmailLastSeen.objects.filter(email=emails[1]).values_list("email", "last_seen_at", "last_seen_kind")
    )
    assert collected == expected


def test_keep_most_recent_tuple():
    emails = ["[email protected]", "[email protected]"]
    now = timezone.now()
    tuples = [
        (emails[0], now, EmailLastSeenKind.LOGGED),
        (emails[0], now - relativedelta(days=5), EmailLastSeenKind.VISITED),
        (emails[0], now + relativedelta(days=10), EmailLastSeenKind.POST),
        (emails[1], now, EmailLastSeenKind.LOGGED),
    ]
    expected = {
        emails[0]: tuples[2],
        emails[1]: tuples[3],
    }
    assert keep_most_recent_tuple(tuples) == expected


def test_insert_last_seen(db):
    emails = ["[email protected]", "[email protected]"]
    EmailLastSeenFactory(email=emails[0], last_seen_kind=EmailLastSeenKind.VISITED)
    datas_to_insert = {
        emails[0]: (emails[0], timezone.now(), EmailLastSeenKind.LOGGED),
        emails[1]: (emails[1], timezone.now(), EmailLastSeenKind.POST),
    }
    insert_last_seen(datas_to_insert)

    assert EmailLastSeen.objects.count() == 2
    assert EmailLastSeen.objects.filter(email=emails[0], last_seen_kind=EmailLastSeenKind.LOGGED).exists()
    assert EmailLastSeen.objects.filter(email=emails[1], last_seen_kind=EmailLastSeenKind.POST).exists()


def test_iterate_over_emails(db):
    size = 2
    EventFactory.create_batch(size * 2 + 1)
    iterate_over_emails(collect_event(), size=size)
    assert EmailLastSeen.objects.count() == size * 2 + 1


def test_populate_emaillastseen_command(db):
    user = UserFactory(last_login=timezone.make_aware(datetime(2024, 10, 22)))
    event = EventFactory()
    dsp = DSPFactory()
    upvote = UpVoteFactory(content_object=ForumFactory(), voter=UserFactory())
    forum_rating = ForumRatingFactory(user=UserFactory())
    topic = TopicFactory(with_post=True)
    anonymous_topic = AnonymousTopicFactory(with_post=True)
    clicked_notification = NotificationFactory(visited_at=timezone.now())

    # existing email last seen
    EmailLastSeenFactory(email=dsp.user.email)

    # duplicated email
    EventFactory(poster__email=event.poster.email)

    call_command("populate_emaillastseen")

    assert EmailLastSeen.objects.count() == 8
    assert EmailLastSeen.objects.filter(email=user.email, last_seen_kind=EmailLastSeenKind.LOGGED).exists()
    assert EmailLastSeen.objects.filter(email=event.poster.email, last_seen_kind=EmailLastSeenKind.LOGGED).exists()
    assert EmailLastSeen.objects.filter(email=dsp.user.email, last_seen_kind=EmailLastSeenKind.LOGGED).exists()
    assert EmailLastSeen.objects.filter(email=upvote.voter.email, last_seen_kind=EmailLastSeenKind.LOGGED).exists()
    assert EmailLastSeen.objects.filter(
        email=forum_rating.user.email, last_seen_kind=EmailLastSeenKind.LOGGED
    ).exists()
    assert EmailLastSeen.objects.filter(
        email=topic.first_post.poster.email, last_seen_kind=EmailLastSeenKind.POST
    ).exists()
    assert EmailLastSeen.objects.filter(
        email=anonymous_topic.first_post.username, last_seen_kind=EmailLastSeenKind.POST
    ).exists()
    assert EmailLastSeen.objects.filter(
        email=clicked_notification.recipient, last_seen_kind=EmailLastSeenKind.VISITED
    ).exists()
