Hi,
Description
The mapping from raw user IDs to internal IDs is incorrect when the dataset contains more than 25,000 rows. I tried loading the ratings both from a file and from a dataframe, and in both cases the user IDs are mapped incorrectly. I tested several datasets.
The code below saves the training set, with its internal IDs, to SupriseTrainingSet.csv. I then compare Train.txt with SupriseTrainingSet.csv.
Steps/Code to Reproduce
from surprise import Dataset, KNNBasic, Reader
import pandas as pd
import csv
train_file = files_dir + folder + "Train.txt"
reader = Reader(line_format="user item rating", sep="\t")
data = Dataset.load_from_file(train_file, reader=reader)
trainset = data.build_full_trainset()  # creates the training set from the whole dataset
with open(files_dir + folder + "SupriseTrainingSet.csv", 'w', newline='') as file:
    writer = csv.writer(file)
    # write each row of data to the CSV file
    for row in trainset.all_ratings():
        writer.writerow(row)
algo = KNNBasic()
algo.fit(trainset)
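Since `all_ratings()` yields internal IDs only, the exported file is hard to line up with Train.txt by eye. A minimal sketch of an export that writes the raw IDs next to the internal ones is shown here. The helper name `write_ratings_with_raw_ids` and the column names are hypothetical; the only Surprise calls it relies on are `all_ratings()`, `to_raw_uid()` and `to_raw_iid()` on the trainset.

```python
import csv


def write_ratings_with_raw_ids(trainset, out_file):
    """Write one CSV row per rating with both internal and raw IDs.

    trainset: any object exposing all_ratings(), to_raw_uid() and
              to_raw_iid(), as a Surprise Trainset does.
    out_file: an open text file, ideally opened with newline=''.
    """
    writer = csv.writer(out_file)
    # Hypothetical header names, chosen for this illustration only.
    writer.writerow(["inner_uid", "inner_iid", "raw_uid", "raw_iid", "rating"])
    for uid, iid, rating in trainset.all_ratings():
        # Translate each internal ID back to the ID found in the input file.
        writer.writerow(
            [uid, iid, trainset.to_raw_uid(uid), trainset.to_raw_iid(iid), rating]
        )
```

With the script above, this could be called as `write_ratings_with_raw_ids(trainset, file)` inside the `with open(...)` block, so each row can be matched against Train.txt without relying on row order.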
Expected Results
#### Original dataset
User Item Rating
1 225 2
1 154 5
1 73 3
1 43 4
1 199 4
1 34 2
1 227 4
1 94 2
1 74 1
1 76 4
1 181 5
1 105 2
1 253 5
1 200 3
1 61 4
1 93 5
1 272 3
1 53 3
1 174 5
1 193 4
1 161 4
1 129 5
1 195 5
1 9 5
1 156 4
1 262 3
1 99 3
1 21 1
1 35 1
1 123 4
1 104 1
1 148 2
1 184 4
1 249 4
1 54 3
1 66 4
1 107 4
1 8 1
1 145 2
1 102 2
1 134 4
1 125 3
1 165 5
1 49 3
1 114 5
1 32 5
1 252 2
1 209 4
1 153 3
1 26 3
1 137 5
1 133 4
1 217 3
1 245 2
1 24 3
2 286 4
2 292 4
2 313 5
2 272 5
2 290 3
2 10 2
2 312 3
2 280 3
2 281 3
2 14 4
2 296 3
2 1 4
2 279 4
3 332 1
3 339 3
3 350 3
3 319 2
3 352 2
3 260 4
3 336 1
3 348 4
3 345 3
3 271 3
3 346 5
4 327 5
4 357 4
4 329 5
4 288 4
4 300 5
5 457 1
5 2 3
#### Internal IDs of Surprise
User Item Rating
0 0 2
0 1 5
0 2 3
0 3 4
0 4 4
0 5 2
0 6 4
0 7 2
0 8 1
0 9 4
0 10 5
0 11 2
0 12 5
0 13 3
0 14 4
0 15 5
0 16 3
0 17 3
0 18 5
0 19 4
0 20 4
0 21 5
0 22 5
0 23 5
0 24 4
0 25 3
0 26 3
0 27 1
0 28 1
0 29 4
0 30 1
0 31 2
0 32 4
0 33 4
0 34 3
0 35 4
0 36 4
0 37 1
0 38 2
0 39 2
0 40 4
0 41 3
0 42 5
0 43 3
0 44 5
0 45 5
0 46 2
0 47 4
0 48 3
0 49 3
0 50 5
0 51 4
0 52 3
0 53 2
0 54 3
1 369 4
1 533 5
1 503 3
1 451 1
1 239 4
1 314 4
1 110 4
1 956 4
1 714 4
1 134 4
1 674 4
1 227 5
1 471 1
2 180 5
2 382 5
2 264 4
2 213 3
2 517 1
2 86 1
2 351 5
2 162 5
2 272 2
2 410 4
2 822 2
3 1328 1
3 401 5
3 807 3
3 84 3
3 1074 5
4 415 5
4 589 4
Actual Results
#### Original dataset
User Item Rating
1 225 2
1 154 5
1 73 3
1 43 4
1 199 4
1 34 2
1 227 4
1 94 2
1 74 1
1 76 4
1 181 5
1 105 2
1 253 5
1 200 3
1 61 4
1 93 5
1 272 3
1 53 3
1 174 5
1 193 4
1 161 4
1 129 5
1 195 5
1 9 5
1 156 4
1 262 3
1 99 3
1 21 1
1 35 1
1 123 4
1 104 1
1 148 2
1 184 4
1 249 4
1 54 3
1 66 4
1 107 4
1 8 1
1 145 2
1 102 2
1 134 4
1 125 3
1 165 5
1 49 3
1 114 5
1 32 5
1 252 2
1 209 4
1 153 3
1 26 3
1 137 5
1 133 4
1 217 3
1 245 2
1 24 3
2 286 4
2 292 4
2 313 5
2 272 5
2 290 3
2 10 2
2 312 3
2 280 3
2 281 3
2 14 4
2 296 3
2 1 4
2 279 4
3 332 1
3 339 3
3 350 3
3 319 2
3 352 2
3 260 4
3 336 1
3 348 4
3 345 3
3 271 3
3 346 5
4 327 5
4 357 4
4 329 5
4 288 4
4 300 5
5 457 1
5 2 3
#### Internal IDs of Surprise
User Item Rating
0 0 2
0 1 5
0 2 3
0 3 4
0 4 4
0 5 2
0 6 4
0 7 2
0 8 1
0 9 4
0 10 5
0 11 2
0 12 5
0 13 3
0 14 4
0 15 5
0 16 3
0 17 3
0 18 5
0 19 4
0 20 4
0 21 5
0 22 5
0 23 5
0 24 4
0 25 3
0 26 3
0 27 1
0 28 1
0 29 4
0 30 1
0 31 2
0 32 4
0 33 4
0 34 3
0 35 4
0 36 4
0 37 1
0 38 2
0 39 2
0 40 4
0 41 3
0 42 5
0 43 3
0 44 5
0 45 5
0 46 2
0 47 4
0 48 3
0 49 3
0 50 5
0 51 4
0 52 3
0 53 2
0 54 3
0 369 4
0 533 5
0 503 3
0 451 1
0 239 4
0 314 4
0 110 4
0 956 4
0 714 4
0 134 4
0 674 4
0 227 5
0 471 1
0 180 5
0 382 5
0 264 4
0 213 3
0 517 1
0 86 1
0 351 5
0 162 5
0 272 2
0 410 4
0 822 2
0 1328 1
0 401 5
0 807 3
0 84 3
0 1074 5
0 415 5
0 589 4
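A comparison like the one above can also be automated rather than done by inspecting two files. The sketch below translates every rating in the trainset back to raw IDs and compares the result, as a multiset, with the rows of the input file. The function name `find_mapping_mismatches` is hypothetical, and it assumes the trainset exposes `all_ratings()`, `to_raw_uid()` and `to_raw_iid()`, as a Surprise `Trainset` does. IDs are compared as strings and ratings as floats so that type differences do not show up as mismatches.

```python
from collections import Counter


def find_mapping_mismatches(trainset, raw_rows):
    """Compare trainset ratings, mapped back to raw IDs, with the source rows.

    trainset: object exposing all_ratings(), to_raw_uid() and to_raw_iid().
    raw_rows: iterable of (raw_user, raw_item, rating) read from the input.
    Returns (missing, unexpected), two Counters of (user, item, rating).
    """
    expected = Counter((str(u), str(i), float(r)) for u, i, r in raw_rows)
    recovered = Counter(
        (str(trainset.to_raw_uid(uid)), str(trainset.to_raw_iid(iid)), float(r))
        for uid, iid, r in trainset.all_ratings()
    )
    missing = expected - recovered      # in the input but not in the trainset
    unexpected = recovered - expected   # in the trainset but not in the input
    return missing, unexpected
```

If both returned counters are empty, every (user, item, rating) triple survived the round trip, whatever order the rows were written in. A non-empty result lists the exact triples that differ.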
Versions
Windows-10-10.0.22621-SP0
Python 3.8.3 (default, Jul 2 2020, 17:30:36) [MSC v.1916 64 bit (AMD64)]
surprise 1.1.3