
Don't call all records deleted if record count is not set #4

Open · wants to merge 1 commit into master

Conversation

@pramsey commented Dec 2, 2019

Some DBF files are written without a record count (because, say,
the SHP file or SHX file already gives a record count). In that
case, checking whether the shape index exceeds the number of records
always returns deleted, which is not correct.

@@ -1744,7 +1744,7 @@ int SHPAPI_CALL DBFIsRecordDeleted( DBFHandle psDBF, int iShape )
/* -------------------------------------------------------------------- */
/* Verify selection. */
/* -------------------------------------------------------------------- */
-    if( iShape < 0 || iShape >= psDBF->nRecords )
+    if( iShape < 0 || (iShape > 0 && iShape >= psDBF->nRecords) )
Member

Shouldn't that be the following? (I changed the iShape > 0 to psDBF->nRecords > 0 since that matches your rationale better, and added the comment.)

Suggested change
-    if( iShape < 0 || (iShape > 0 && iShape >= psDBF->nRecords) )
+    /* Some DBF files are written without a record count (because, say */
+    /* the SHP file or SHX file already gives a record count). In that */
+    /* case, checking the shape index being > number of records always */
+    /* returns deleted, which is not correct. */
+    if( iShape < 0 || ( psDBF->nRecords > 0 && iShape >= psDBF->nRecords) )

I didn't check the DBF spec, but I don't think a DBF header with nRecords == 0 would even be valid?
And as far as I can see, DBFReadAttribute() would have the same issue, and quite a few other places too. How much has your proposed change been tested? ;-)

@pramsey (Author) Dec 2, 2019

Ha ha. Well, in theory it's been tested widely in PostGIS forever, but I only had to make the one change to pass regression... and your change is correct, the comment is right and my code is not. I think our test files are so simple we aren't even seeing the DBFReadAttribute case because we have no attributes in our DBF files.

Member

Is there a point in trying to support such broken DBF files, unless you see them in wide use in the wild? I don't recall any issues reported in GDAL regarding such cases.

Author

It's possible we are generating them ourselves... leave this PR open, I will check tomorrow... somehow we got these shape files in the very early days and added them to our regression suite... https://github.com/postgis/postgis/tree/master/regress/loader

Author

Either we generated them ourselves with early versions of shapelib, or maybe we created them with ArcView 3.

Member

Maybe regenerate them with a recent ogr2ogr?

@pramsey (Author) commented Dec 3, 2019

So, doing some testing with our regression files... I loaded up the point file, which looks like this:

CREATE TABLE "point" (gid serial);
ALTER TABLE "point" ADD PRIMARY KEY (gid);
SELECT AddGeometryColumn('','point','geom','0','POINT',2);
INSERT INTO "point" (geom) VALUES ('01010000000000000000000000000000000000F03F');
INSERT INTO "point" (geom) VALUES ('01010000000000000000002240000000000000F0BF');
INSERT INTO "point" (geom) VALUES ('01010000000000000000002240000000000000F0BF');

So, a table with only a point column. Then I dumped it, both with ogr2ogr and with pgsql2shp. The DBF file from ogr2ogr looks like this:

Sparrow:/tmp pramsey$ hexdump foo.dbf 
0000000 03 77 0c 03 03 00 00 00 41 00 0c 00 00 00 00 00
0000010 00 00 00 00 00 00 00 00 00 00 00 00 00 57 00 00
0000020 46 49 44 00 00 00 00 00 00 00 00 4e 00 00 00 00
0000030 0b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000040 0d 20 20 20 20 20 20 20 20 20 20 20 30 20 20 20
0000050 20 20 20 20 20 20 20 20 31 20 20 20 20 20 20 20
0000060 20 20 20 20 32 1a                              
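
For reference, decoding the fixed header fields of that dump (my annotation, using the standard dBASE III header layout): bytes 1-3 are the last-update date, 77 0c 03 = 2019-12-03; bytes 4-7 are the record count, 03 00 00 00 = 3; bytes 8-9 the header length, 41 00 = 65 (32 fixed bytes + one 32-byte field descriptor + the 0x0d terminator); bytes 10-11 the record length, 0c 00 = 12 (1 deletion flag + an 11-character FID value).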

It looks like ogr2ogr adds in a FID column during the dump, so ogrinfo returns this:

OGRFeature(foo):0
  FID (Integer64) = 0
  POINT (0 1)

The DBF file from pgsql2shp on the other hand:

0000000 03 5f 07 1a 00 00 00 00 21 00 01 00 00 00 00 00
0000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000020 0d                                             

So, this is generated using shapelib, naturally, and it has the record count all zeroed out. I'm guessing this is because the dumper never tries to write any tuples, since there aren't any fields. Or maybe it writes tuples but with no fields added. I still have to confirm.
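
The same header fields in the pgsql2shp dump (again my decoding): record count (bytes 4-7) = 00 00 00 00 = 0, header length (bytes 8-9) = 21 00 = 33, record length (bytes 10-11) = 01 00 = 1. The file is exactly 33 bytes, i.e. the 32-byte header plus its 0x0d terminator, with no record bytes after it.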

@pramsey (Author) commented Dec 3, 2019

The implication being that we've been generating DBF files with pgsql2shp that have a zero record count for geometry-only tables, and have been doing so for 18 years... so there are definitely some in the wild. Hm.

@rouault (Member) commented Dec 3, 2019

so there are definitely some in the wild.

OK, so if we want to support such files in shapelib (and then in GDAL), I would suggest a different approach that would be less invasive than patching every site that looks at num_records: if the num_records header field reads as 0 when the file is opened, compute it from the file size and the record size.
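
A minimal sketch of that fallback at open time (my illustration, assuming shapelib's DBFInfo members nRecords, nHeaderLength and nRecordLength and its sHooks file API; not a tested patch):

    if( psDBF->nRecords == 0 && psDBF->nRecordLength > 0 )
    {
        SAOffset nFileSize;

        /* measure the file, then infer how many fixed-size records follow the header */
        psDBF->sHooks.FSeek( psDBF->fp, 0, 2 );
        nFileSize = psDBF->sHooks.FTell( psDBF->fp );
        if( nFileSize > (SAOffset) psDBF->nHeaderLength )
            psDBF->nRecords = (int) ((nFileSize - psDBF->nHeaderLength)
                                     / psDBF->nRecordLength);
    }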

@pramsey (Author) commented Dec 3, 2019

The size of the DBF file seems invariant to the number of rows...

Looking into the dumper, it only calls DBFWriteAttributeDirectly when the number of non-geometry fields is non-zero, so we go through the dumping process without ever writing anything to the DBF beyond the header emitted when we open the file. I wonder what the correct workflow is for the DBF when it has no fields?
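
For illustration only, the loop shape being described is roughly this (hypothetical names, not the actual pgsql2shp code); with num_dbf_fields == 0 the inner call never happens, so nothing is ever appended to the DBF and the header's record count stays 0:

    for( row = 0; row < num_rows; row++ )
    {
        /* never entered when there are no attribute fields */
        for( field = 0; field < num_dbf_fields; field++ )
            DBFWriteAttributeDirectly( dbf, row, field, values[row][field] );
    }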

@rouault (Member) commented Dec 3, 2019

The size of the DBF file seems invariant to the number of rows...

I don't get this. Normally file_size (roughly) = fixed_header_size + number_of_fields * field_description_size + number_of_rows * record_size
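
Cross-checking that formula against the two dumps above (my arithmetic, assuming the usual 32-byte fixed header, 32 bytes per field descriptor, a 1-byte 0x0d terminator and an optional trailing 0x1a byte):

ogr2ogr:   32 + 1*32 + 1 = 65 header bytes, plus 3 * 12 record bytes, plus the 0x1a byte = 102 bytes, exactly the size of the dump
pgsql2shp: 32 + 0*32 + 1 = 33 header bytes, and nothing after them

So the pgsql2shp file contains no record bytes at all; the dumper never writes a record, which is why its size does not vary with the number of rows.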

@rouault (Member) commented Dec 3, 2019

I wonder what the correct workflow is for the DBF in the case where it has no fields?

In that case, OGR creates a dummy "FID" field so that there's at least one field
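
For reference, the equivalent with shapelib's public API could look roughly like this (a sketch, not OGR's actual code; the 11-character width simply mirrors the FID field visible in the ogr2ogr dump above):

    DBFHandle dbf = DBFCreate( "foo.dbf" );
    if( DBFGetFieldCount( dbf ) == 0 )
        /* dummy field so that records (and hence a record count) get written */
        DBFAddField( dbf, "FID", FTInteger, 11, 0 );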
