-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for 116 #236
Fix for 116 #236
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a question.
if (!emails.contains(contact.email)) { | ||
def existingContact = old.getContacts().find { | ||
(it.contact.email && !it.contact.email.isEmpty() && it.contact.email == contact.email) || | ||
(it.contact.firstName == contact.firstName && it.contact.lastName == contact.lastName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with the context so this might be misinformed.
The firstName/lastName comparison may be a problem. For example:
- A contact has a new email (same firstName and lastName). This no longer adds a new contact with the new email. I would always want to add the new email.
- A contact has same firstName and lastName as an old contact. The new contact is not added. While unlikely, I do not want prevent two contacts with the same name from existing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's problematic, but as email is not mandatory and we don't have a primary key, we should use the names.
For instance, this is the IPT:
Duplicated contacts but with different names or emails results in duplicate contacts. The IPT only fix these duplicates if you use exactly the same names and email.
I was thinking that probably a better solution for us to maintain contacts in sync, is to always remove all contact_for
from a data resource and re-add with the new contacts instead of trying to compare and update. In this case we can have "orphans" old contacts without any dr, so maybe we can also check this and remove the orphans contacts (imagine a contact without email or a misspelled name that is corrected in a new version of a dr).
I was thinking of adding in a future PR and doing this PR something similar to what we have but improved and less drastic and easy to review.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this PR, when merging new information, it must not exclude more new contacts that it already does. I now understand it is already excluding those without an email or those that share an email (given that it is not the primary key).
At the very least, this name comparison line needs a change. I'll leave it to you if you want to address it in this PR or remove this line and address it a new PR.
I agree with your suggestion. It is preferable to:
- Add all new contacts. There will be a comparison of all contact fields to determine what is new.
- Remove old contacts (the ContactFor) that are no longer in the IPT source. This makes the assumption no contact information, for an IPT resource, will be manually altered in the collectory.
- Deleting of contacts that are orphaned (those without a ContactFor)
Having looked at the code a little longer I have another question. Previously the creator contact was added as a primary=true
. The additional dataset.contacts
and dataset.associatedParties
are also added as primary=true
. Should these be primary=false
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we have a list of primary contacts and we check if it should be marked as primary or not. See below the part:
boolean isPrimaryContact = update.primaryContacts.contains(contact)
I think is better to continue with the rest of contactFor
part in a new PR to simplify the review.
This PR already addresses the original issues described in #116. Currently the drs with contacts without email are all assigned (incorrectly) to the first contact without email. In our case:
I think we can refactor this code a bit like this for better results: First the
In the With this I think we'll have the contacts exactly with the same info as the EML and the IPT, and contacts not longer linked to a dr will be removed. I think I'm not missing anything, but... |
I added a first part that update existing contacts too taking into consideration if the contact is primary or not. I can continue with the rest or do it in a new PR in the near future. This PR solves a lot of issues (the most severe is that currently the first contact without email get associated all drs of contacts without emails, that is a big list for one contact). |
def contact = addOrUpdateContact(it) | ||
if (contact){ | ||
contacts << contact | ||
primaryContacts << contact |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change to only use eml.dataset.contact
s as primaryContacts
is OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what my data manager suggested.
boolean hasEmail = emlElement?.electronicMailAddress?.text()?.trim()?.isEmpty() == false | ||
boolean hasName = emlElement?.individualName?.surName?.text()?.trim()?.isEmpty() == false | ||
|
||
if (!contact && (hasEmail || hasName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inclusion of && (hasEmail || hasName)
might cause null contact
exceptions further along, if it is ever false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the case in our 500 contacts. Looking to the eml expecification:
The only case I can imagine is a contact with only organizationName and no email and no individualName that seems to me something quite useless for a "contact" (Are you suggesting that we should take into account the case of "contact: organizationName: CSIRO"?).
} | ||
return null | ||
} else { | ||
return null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adam-collins I added this to prevent the NPE if a contact does not have name or email
This PR fix #116. Basically:
Contact.findByEmail(empty)
was picked in this case.