We currently only support analyzing one file per submission; however, for larger projects it would be useful to allow multiple files per submission.
A temporary workaround is to concatenate all files of a submission into one. This is not ideal, since the analysis then does not show which file a plagiarized fragment belongs to.
Changes needed
A rough overview of the changes that need to be done to implement this feature:
dolos-core / dolos-lib
To make plagiarism detection work over multiple files, we could essentially let Dolos perform the concatenation in a smart way. The changes in the library are mostly bookkeeping:
Rename File to Submission, such that Pair, SharedFingerprint, etc. now have references to submissions instead of files
Make a new File class that represents a single file; a Submission would then hold a list of Files instead of a single content string
Adapt winnowing and indexing to handle multiple files. It does not make sense to create k-grams that span multiple files.
Adapt the aggregation to handle matches that can occur in multiple files. Region would probably need to be changed to also include the file that has been matched.
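A minimal sketch of this bookkeeping, assuming a data model along the lines described above. All names here (SourceFile, Submission, Region, kgramsOfFile, winnow, fingerprints) are hypothetical and not the current dolos-lib API; the single-file type is called SourceFile only to avoid clashing with the DOM's built-in File type. For brevity it builds character k-grams with a toy hash instead of tokenizing first, but it shows the key point: k-grams are generated and winnowed per file, and every Region carries the file it came from.

```typescript
// Hypothetical data model; field names are assumptions for illustration.
interface SourceFile {
  path: string;
  content: string;
}

interface Submission {
  id: string;
  files: SourceFile[];
}

// A Region now records which file it belongs to, with offsets local to that file.
interface Region {
  file: SourceFile;
  start: number; // inclusive offset into file.content
  end: number;   // exclusive offset into file.content
}

interface KGram {
  hash: number;
  region: Region;
}

// Simple 32-bit polynomial hash, a stand-in for Dolos's own hashing.
function hashString(s: string): number {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Character k-grams of a single file (Dolos works on tokens; characters keep this short).
function kgramsOfFile(file: SourceFile, k: number): KGram[] {
  const kgrams: KGram[] = [];
  for (let i = 0; i + k <= file.content.length; i++) {
    kgrams.push({
      hash: hashString(file.content.slice(i, i + k)),
      region: { file, start: i, end: i + k },
    });
  }
  return kgrams;
}

// Winnowing: keep the rightmost minimal hash of every window of w consecutive k-grams.
function winnow(kgrams: KGram[], w: number): KGram[] {
  const selected: KGram[] = [];
  if (kgrams.length === 0 || w < 1) return selected;
  const size = Math.min(w, kgrams.length); // short files form a single window
  let lastIndex = -1;
  for (let start = 0; start + size <= kgrams.length; start++) {
    let minIndex = start;
    for (let j = start; j < start + size; j++) {
      if (kgrams[j].hash <= kgrams[minIndex].hash) minIndex = j;
    }
    if (minIndex !== lastIndex) {
      selected.push(kgrams[minIndex]);
      lastIndex = minIndex;
    }
  }
  return selected;
}

// Fingerprints of a submission: each file is winnowed separately, so neither
// a k-gram nor a window ever spans the boundary between two files.
function fingerprints(submission: Submission, k: number, w: number): KGram[] {
  const result: KGram[] = [];
  for (const file of submission.files) {
    result.push(...winnow(kgramsOfFile(file, k), w));
  }
  return result;
}
```

Because each file is processed on its own, no fingerprint can cover the end of one file and the start of the next, which is exactly the artefact a naive concatenation would introduce.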
dolos-cli
Implement options to enable this feature
Implement how to detect which files belong to the same submission. Similar to #1584 (Support multiple submissions per student), this could be achieved using directories or the CSV file (with a submission_id column grouping files).
Make changes to the output files to be able to communicate the different files to the front-end:
Add a submissions.csv that includes the information currently contained in files.csv, except the file contents
Change files.csv to include a reference to submissions.csv
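A sketch of how the CLI could group input files into submissions and produce rows for both CSV files. The input shape, the column names and the fallback rule (use the top-level directory when no submission_id is given) are assumptions for illustration, not existing Dolos options; file contents are left out of the rows for brevity.

```typescript
// Hypothetical input entry, e.g. one row of the CSV passed to the CLI.
interface FileEntry {
  filename: string;      // path relative to the analysis root, "/"-separated
  submissionId?: string; // optional submission_id column
}

// Use the explicit submission_id when present, otherwise the top-level directory.
// A file with neither becomes its own submission, as in the current behaviour.
function submissionKey(entry: FileEntry): string {
  if (entry.submissionId) return entry.submissionId;
  const parts = entry.filename.split("/");
  return parts.length > 1 ? parts[0] : entry.filename;
}

// Group entries by submission, preserving first-seen order (a Map keeps insertion order).
function groupBySubmission(entries: FileEntry[]): Map<string, FileEntry[]> {
  const groups = new Map<string, FileEntry[]>();
  for (const entry of entries) {
    const key = submissionKey(entry);
    const group = groups.get(key);
    if (group) {
      group.push(entry);
    } else {
      groups.set(key, [entry]);
    }
  }
  return groups;
}

// Rows for submissions.csv and files.csv; each file row references its submission by id.
interface SubmissionRow { id: number; name: string; fileCount: number }
interface FileRow { id: number; submissionId: number; path: string }

function toCsvRows(groups: Map<string, FileEntry[]>): { submissions: SubmissionRow[]; files: FileRow[] } {
  const submissions: SubmissionRow[] = [];
  const files: FileRow[] = [];
  let fileId = 0;
  groups.forEach((entries, name) => {
    const submissionId = submissions.length;
    submissions.push({ id: submissionId, name, fileCount: entries.length });
    for (const entry of entries) {
      files.push({ id: fileId++, submissionId, path: entry.filename });
    }
  });
  return { submissions, files };
}
```

Keeping the grouping separate from the CSV output means the directory-based and CSV-based detection strategies can share the same export code.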
dolos-web
Change the API stores to parse and store the new format correctly
Extend the submissions page with a way to browse a submission's files
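On the web side, the stores would need to join files.csv back to submissions.csv. A minimal sketch, assuming both files have already been parsed into rows of string values and assuming hypothetical column names (id, name, submission_id, path, content):

```typescript
// Hypothetical parsed CSV row: every value is a string, as CSV parsers typically yield.
type CsvRow = Record<string, string>;

interface WebFile { id: number; path: string; content: string }
interface WebSubmission { id: number; name: string; files: WebFile[] }

// Build submissions from submissions.csv, then attach each file from files.csv
// to the submission it references.
function linkFiles(submissionRows: CsvRow[], fileRows: CsvRow[]): Map<number, WebSubmission> {
  const submissions = new Map<number, WebSubmission>();
  for (const row of submissionRows) {
    const id = Number(row.id);
    submissions.set(id, { id, name: row.name, files: [] });
  }
  for (const row of fileRows) {
    const owner = submissions.get(Number(row.submission_id));
    if (!owner) {
      // A dangling reference means the two CSV files are out of sync.
      throw new Error(`file ${row.id} references unknown submission ${row.submission_id}`);
    }
    owner.files.push({ id: Number(row.id), path: row.path, content: row.content });
  }
  return submissions;
}
```

Failing loudly on an unknown reference seems preferable to silently dropping files, since a missing file would skew what the user sees.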
Pairwise Comparison
This part is probably the most complex, as this page already has a lot going on and there is (to my knowledge) no out-of-the-box support for multiple files in the code editor we use (Monaco).
I think the easiest way is to sort each submission's files by name (we expect similar file names across submissions, so similar files end up close together), concatenate them in the browser, and add a "file separator" marker between subsequent files.
For a more advanced version, it should be possible to add a small file browser on each side, with a similarity percentage next to each file. This would require calculating a similarity per file, which might have some caveats of its own. I do not think we want to implement this advanced version right away, but it is a good idea to keep it in mind while implementing.
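A sketch of this browser-side concatenation. The separator text and all names are hypothetical; the important part is remembering where each file starts in the combined text, so that a region reported relative to one file can be translated into a position in the single editor. If the editor needs line/column positions rather than offsets, Monaco's ITextModel.getPositionAt can convert an offset into a position.

```typescript
interface ViewFile { path: string; content: string }

// Where a file's content lives inside the combined text (end is exclusive).
interface Segment { path: string; start: number; end: number }

// Hypothetical separator line; its exact form is a UI choice.
function separatorFor(path: string): string {
  return `===== ${path} =====\n`;
}

// Sort by path (deterministic code-unit order), then concatenate with separators.
function concatenateForView(files: ViewFile[]): { text: string; segments: Segment[] } {
  const sorted = files.slice().sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
  let text = "";
  const segments: Segment[] = [];
  for (const file of sorted) {
    text += separatorFor(file.path);
    const start = text.length;
    text += file.content;
    segments.push({ path: file.path, start, end: text.length });
    // Make sure the next separator starts on its own line.
    if (!file.content.endsWith("\n")) text += "\n";
  }
  return { text, segments };
}

// Translate an offset inside one file to an offset inside the combined text.
function toCombinedOffset(segments: Segment[], path: string, offset: number): number {
  const segment = segments.find(s => s.path === path);
  if (!segment) throw new Error(`unknown file ${path}`);
  return segment.start + offset;
}
```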
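One possible per-file similarity for such a file browser, sketched under the assumption that the winnowed fingerprint hashes of every file are available: the fraction of a file's fingerprints that also occur anywhere in the other submission. This is a swapped-in definition for illustration, not how Dolos computes pair similarity, and the names are hypothetical.

```typescript
// Hypothetical per-file view of the fingerprints selected by winnowing.
interface FileFingerprints { path: string; hashes: number[] }

// For each file, the fraction of its fingerprints that also occur in the other submission.
function perFileSimilarity(
  files: FileFingerprints[],
  otherSubmission: FileFingerprints[],
): Map<string, number> {
  const other = new Set<number>();
  for (const file of otherSubmission) {
    for (const h of file.hashes) other.add(h);
  }
  const result = new Map<string, number>();
  for (const file of files) {
    if (file.hashes.length === 0) {
      result.set(file.path, 0); // empty or tiny file: avoid dividing by zero
      continue;
    }
    const shared = file.hashes.filter(h => other.has(h)).length;
    result.set(file.path, shared / file.hashes.length);
  }
  return result;
}
```

This illustrates some of the caveats: the measure is asymmetric (a small file copied into a large one scores high on one side only), files with very few fingerprints give noisy percentages, and shared boilerplate inflates every file equally.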