I am experimenting with data.world and it gave some interesting ideas for validation:
16 duplicate rows detected
Row 1630 is a duplicate of the one above it.
Row 29643 is a duplicate of the one above it.
Row 65945 is a duplicate of the one above it.
Row 77711 is a duplicate of the one above it.
+ 12 similar issues
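As a rough illustration of the "duplicate of the one above it" check, here is a minimal Python sketch. It assumes the listings are read as CSV rows and that only adjacent rows are compared; the function name `find_adjacent_duplicates` and the sample data are hypothetical, not taken from data.world or from the existing validator.

```python
import csv
import io


def find_adjacent_duplicates(rows):
    """Return 1-based numbers of rows identical to the row just above them."""
    duplicates = []
    previous = None
    for number, row in enumerate(rows, start=1):
        if previous is not None and row == previous:
            duplicates.append(number)
        previous = row
    return duplicates


# Hypothetical sample: the second data row repeats the first.
sample = io.StringIO("title,alt\nCafe,A\nCafe,A\nBar,B\n")
reader = csv.reader(sample)
next(reader)  # skip the header line
adjacent_duplicates = find_adjacent_duplicates(reader)
```

Comparing only neighbouring rows keeps memory use constant; catching duplicates that are far apart would need a set of already-seen rows instead.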
158 blank cells detected
Cell at row 1345, column title appears blank.
Cell at row 1457, column title appears blank.
Cell at row 1949, column title appears blank.
Cell at row 2214, column title appears blank.
+ 154 similar issues
Numeric(114)
114 numeric values outside standard deviation detected
Value 1.1102015E7 at row 66099, column checkin is more than 4 standard deviations of 646327.73 from the mean of 45586.41.
Value 1.4102015E7 at row 66099, column checkout is more than 4 standard deviations of 778967.24 from the mean of 50554.01.
Value -1800.0 at row 109847, column hours is more than 4 standard deviations of 248.63 from the mean of 8.14.
Value 1130.0 at row 171909, column hours is more than 4 standard deviations of 248.63 from the mean of 8.14.
+ 110 similar issues
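The numeric check above could be sketched like this in Python. This is only a guess at the rule, based on the wording of the messages: flag any value more than 4 standard deviations from the column mean. The name `find_outliers`, the use of the population standard deviation, and the sample `hours` values are all assumptions.

```python
import statistics


def find_outliers(values, threshold=4.0):
    """Return (row, value) pairs lying more than `threshold` standard
    deviations from the mean. Rows are numbered from 1."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population std dev; an assumption
    if stdev == 0:
        return []  # all values identical, nothing can be an outlier
    return [
        (row, value)
        for row, value in enumerate(values, start=1)
        if abs(value - mean) > threshold * stdev
    ]


# Hypothetical column: thirty plausible values and one suspicious one.
hours = [8.0] * 30 + [1130.0]
hours_outliers = find_outliers(hours)
```

Note that extreme values inflate both the mean and the standard deviation (the `checkin` mean of 45586.41 above is a sign of that), so a median-based rule might be more robust if this gets implemented.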
Noise(146)
2 non-numeric characters in number field detected
Value at row 110516, column latitude does not appear to be numeric, but column is numeric.
Value at row 110516, column longitude does not appear to be numeric, but column is numeric.
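A check for non-numeric values in a numeric column might look like the sketch below. It assumes the raw cell values are strings, that "numeric" means "parseable by Python's `float()`", and that blank cells are reported by a separate check; `non_numeric_rows` and the sample latitudes are hypothetical.

```python
def non_numeric_rows(values):
    """Return 1-based row numbers whose non-blank value is not a number."""
    bad = []
    for number, raw in enumerate(values, start=1):
        if raw.strip() == "":
            continue  # blank cells are left to the blank-cell check
        try:
            float(raw)
        except ValueError:
            bad.append(number)
    return bad


# Hypothetical latitude column with one malformed entry.
latitudes = ["48.8584", "2.2945", "48.85N", ""]
bad_latitudes = non_numeric_rows(latitudes)
```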
1 possible placeholder number detected
Value 5555 at row 185359, column title appears to be a placeholder.
143 possible placeholder text values detected
Value VVV at row 5521, column alt appears to be a placeholder.
Value *** at row 12105, column alt appears to be a placeholder.
Value CCC at row 39923, column alt appears to be a placeholder.
Value *** at row 104112, column alt appears to be a placeholder.
+ 139 similar issues
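I do not know which heuristic data.world uses for placeholders, but the examples above (`5555`, `VVV`, `***`, `CCC`) are all a single character repeated three or more times. A minimal Python sketch under that assumption, with the hypothetical name `looks_like_placeholder`:

```python
import re

# One character repeated at least three times, and nothing else.
REPEATED_CHAR = re.compile(r"(.)\1{2,}")


def looks_like_placeholder(value):
    """Return True if the trimmed value is a single repeated character."""
    return REPEATED_CHAR.fullmatch(value.strip()) is not None


placeholder_flags = [looks_like_placeholder(v) for v in ["VVV", "***", "5555", "Cafe"]]
```

This would also flag legitimate values such as a house number `111`, so it probably makes more sense as a warning than as an error.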
Text(1,945)
1,945 text values outside standard deviation detected
Text value 95 at row 508, column address has length more than 4 standard deviations away from the mean.
Text value 88 at row 591, column address has length more than 4 standard deviations away from the mean.
Text value 162 at row 599, column address has length more than 4 standard deviations away from the mean.
Text value 144 at row 742, column address has length more than 4 standard deviations away from the mean.
+ 1,941 similar issues
Date(3)
3 dates detected far in the future
Date is far in the future at column lastedit, row 77420.
Date is far in the future at column lastedit, row 130787.
Date is far in the future at column lastedit, row 205038.
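The future-date check could be as simple as the Python sketch below. It assumes `lastedit` is an ISO `YYYY-MM-DD` string and that "far" means more than a year ahead; the function name `is_far_in_future`, the `margin_days` parameter, and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def is_far_in_future(value, today=None, margin_days=365):
    """Return True if the ISO date string is more than `margin_days`
    after `today` (which defaults to the current date)."""
    if today is None:
        today = date.today()
    return date.fromisoformat(value) > today + timedelta(days=margin_days)


# Fixed reference date so the example gives the same result on every run.
reference = date(2017, 1, 1)
future_flags = [is_far_in_future(d, today=reference) for d in ["2016-05-01", "2150-01-01"]]
```

For `lastedit` specifically, any date after today is arguably wrong, so `margin_days=0` may be the more useful setting.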
Feel free to split this issue into separate issues, and implement only the ones you want.
For your information, here are the current validations: http://wvpoi.batalex.ru/download/listings/wikivoyage-listings-en-latest.validation-report.html