Don't call all records deleted if record count is not set#4
pramsey wants to merge 1 commit into OSGeo:master
Conversation
Some DBF files are written without a record count (because, say, the SHP file or SHX file already give a record count). In that case, checking the shape index being > number of records always returns deleted, which is not correct.
 /* Verify selection. */
 /* -------------------------------------------------------------------- */
-    if( iShape < 0 || iShape >= psDBF->nRecords )
+    if( iShape < 0 || (iShape > 0 && iShape >= psDBF->nRecords) )
Shouldn't it rather be the following? (I changed the iShape > 0 to psDBF->nRecords > 0 since that matches your rationale better, and added a comment.)
-    if( iShape < 0 || (iShape > 0 && iShape >= psDBF->nRecords) )
+    /* Some DBF files are written without a record count (because, say */
+    /* the SHP file or SHX file already give a record count). In that */
+    /* case, checking the shape index being > number of records always */
+    /* returns deleted, which is not correct. */
+    if( iShape < 0 || ( psDBF->nRecords > 0 && iShape >= psDBF->nRecords) )
I didn't check the DBF spec, but I don't think a DBF header with nRecords == 0 would be valid?
And as far as I can see, DBFReadAttribute() would have the same issue, and quite a few other places too. How much is your proposed change tested? ;-)
Ha ha. Well, in theory it's been tested widely in PostGIS forever, but I only had to make the one change to pass regression... and your change is correct, the comment is right and my code is not. I think our test files are so simple we aren't even seeing the DBFReadAttribute case because we have no attributes in our DBF files.
Is there a point in trying to support such broken DBF files, unless you see them in great use in the wild ? I don't have memories of issues reported in GDAL regarding such cases
It's possible we are generating them ourselves... leave this PR open, I will check tomorrow... somehow we got these shape files in the very early days and added them to our regression suite... https://github.com/postgis/postgis/tree/master/regress/loader
Either we generated them ourselves with early versions of shapelib or maybe created them with ArcView 3.
maybe regenerate them with a recent ogr2ogr ?
So, doing some testing with our regression files... I loaded up the point file, which looks like this: So, a table with only a point column. Then I dumped it, both with ogr2ogr and with pgsql2shp. The DBF file from ogr2ogr looks like this: It looks like ogr2ogr adds in a FID column during the dump, so ogrinfo returns this: The DBF file from pgsql2shp, on the other hand: So, this is generated using shapelib, naturally, and it's got the record count all zeroed out. I'm guessing that this is a result of the dumper not trying to write any tuples because there aren't any fields. Or maybe writing tuples but with no fields added. Have to confirm still.
Implication being, we've been generating DBF files with pgsql2shp that have a zero record count in the case of geometry-only tables, and doing so for 18 years... so there are definitely some in the wild. Hm.
OK, so if we want to support such files in shapelib (and then in GDAL), I would suggest a different approach that will be less invasive than patching all the sites where we look at num_records: if the num_records header field is read as 0 at file opening, compute its value from the file size and the record size.
The size of the DBF file seems invariant to the number of rows... Looking into the dumper, it only calls ...
I don't get this. Normally, file_size (roughly) = fixed_header_size + number_of_fields * field_description_size + number_of_rows * record_size.
In that case, OGR creates a dummy "FID" field so that there's at least one field.
Closing this stalled PR |
Some DBF files are written without a record count (because, say,
the SHP file or SHX file already give a record count). In that
case, checking the shape index being > number of records always
returns deleted, which is not correct.