
Don't call all records deleted if record count is not set #4

Open · wants to merge 1 commit into master

Conversation

@pramsey commented Dec 2, 2019

Some DBF files are written without a record count (because, say,
the SHP file or SHX file already gives a record count). In that
case, checking whether the shape index exceeds the number of records
always returns deleted, which is not correct.

@@ -1744,7 +1744,7 @@ int SHPAPI_CALL DBFIsRecordDeleted( DBFHandle psDBF, int iShape )
/* -------------------------------------------------------------------- */
/* Verify selection. */
/* -------------------------------------------------------------------- */
-    if( iShape < 0 || iShape >= psDBF->nRecords )
+    if( iShape < 0 || (iShape > 0 && iShape >= psDBF->nRecords) )
Member

Shouldn't that be the following? (I changed the iShape > 0 to psDBF->nRecords > 0 since that matches your rationale better, and added the comment.)

Suggested change
-    if( iShape < 0 || (iShape > 0 && iShape >= psDBF->nRecords) )
+    /* Some DBF files are written without a record count (because, say */
+    /* the SHP file or SHX file already gives a record count). In that */
+    /* case, checking the shape index being > number of records always */
+    /* returns deleted, which is not correct. */
+    if( iShape < 0 || ( psDBF->nRecords > 0 && iShape >= psDBF->nRecords) )

I didn't check the DBF spec, but I don't think a DBF header with nRecords == 0 would even be valid?
And as far as I can see, DBFReadAttribute() would have the same issue, and quite a few other places too. How much has your proposed change been tested? ;-)

@pramsey (Author) Dec 2, 2019

Ha ha. Well, in theory it's been tested widely in PostGIS forever, but I only had to make the one change to pass regression... and your change is correct, the comment is right and my code is not. I think our test files are so simple we aren't even seeing the DBFReadAttribute case because we have no attributes in our DBF files.

Member

Is there a point in trying to support such broken DBF files, unless you see them in wide use in the wild? I don't recall any issues reported in GDAL regarding such cases.

Author

It's possible we are generating them ourselves... leave this PR open, I will check tomorrow... somehow we got these shape files in the very early days and added them to our regression suite... https://github.com/postgis/postgis/tree/master/regress/loader

Author

Either we generated them ourselves with early versions of shapelib, or maybe we created them with ArcView 3.

Member

Maybe regenerate them with a recent ogr2ogr?

@pramsey (Author) commented Dec 3, 2019

So, doing some testing with our regression files... I loaded up the point file, which looks like this:

CREATE TABLE "point" (gid serial);
ALTER TABLE "point" ADD PRIMARY KEY (gid);
SELECT AddGeometryColumn('','point','geom','0','POINT',2);
INSERT INTO "point" (geom) VALUES ('01010000000000000000000000000000000000F03F');
INSERT INTO "point" (geom) VALUES ('01010000000000000000002240000000000000F0BF');
INSERT INTO "point" (geom) VALUES ('01010000000000000000002240000000000000F0BF');

So, a table with only a point column. Then I dumped it, both with ogr2ogr and with pgsql2shp. The DBF file from ogr2ogr looks like this:

Sparrow:/tmp pramsey$ hexdump foo.dbf 
0000000 03 77 0c 03 03 00 00 00 41 00 0c 00 00 00 00 00
0000010 00 00 00 00 00 00 00 00 00 00 00 00 00 57 00 00
0000020 46 49 44 00 00 00 00 00 00 00 00 4e 00 00 00 00
0000030 0b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000040 0d 20 20 20 20 20 20 20 20 20 20 20 30 20 20 20
0000050 20 20 20 20 20 20 20 20 31 20 20 20 20 20 20 20
0000060 20 20 20 20 32 1a                              
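
For reference, decoding the fixed header fields of that dump (my annotation, using the standard dBASE III header layout): bytes 1-3 are the last-update date, 77 0c 03 = 2019-12-03; bytes 4-7 are the record count, 03 00 00 00 = 3; bytes 8-9 the header length, 41 00 = 65 (32 fixed bytes + one 32-byte field descriptor + the 0x0d terminator); bytes 10-11 the record length, 0c 00 = 12 (1 deletion flag + an 11-character FID value).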

It looks like ogr2ogr adds in a FID column during the dump, so ogrinfo returns this:

OGRFeature(foo):0
  FID (Integer64) = 0
  POINT (0 1)

The DBF file from pgsql2shp on the other hand:

0000000 03 5f 07 1a 00 00 00 00 21 00 01 00 00 00 00 00
0000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000020 0d                                             

So, this is generated using shapelib, naturally, and it has the record count all zeroed out. I'm guessing this is because the dumper never tries to write any tuples, since there aren't any fields. Or maybe it writes tuples but with no fields added. I still have to confirm.
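
The same header fields in the pgsql2shp dump (again my decoding): record count (bytes 4-7) = 00 00 00 00 = 0, header length (bytes 8-9) = 21 00 = 33, record length (bytes 10-11) = 01 00 = 1. The file is exactly 33 bytes, i.e. the 32-byte header plus its 0x0d terminator, with no record bytes after it.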

@pramsey (Author) commented Dec 3, 2019

The implication being that we've been generating DBF files with pgsql2shp that have a zero record count for geometry-only tables, and have been doing so for 18 years... so there are definitely some in the wild. Hm.

@rouault (Member) commented Dec 3, 2019

so there are definitely some in the wild.

OK, so if we want to support such files in shapelib (and then in GDAL), I would suggest a different approach that would be less invasive than patching every site that looks at num_records: if the num_records header field reads as 0 when the file is opened, compute it from the file size and the record size.
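
A minimal sketch of that fallback at open time (my illustration, assuming shapelib's DBFInfo members nRecords, nHeaderLength and nRecordLength and its sHooks file API; not a tested patch):

    if( psDBF->nRecords == 0 && psDBF->nRecordLength > 0 )
    {
        SAOffset nFileSize;

        /* measure the file, then infer how many fixed-size records follow the header */
        psDBF->sHooks.FSeek( psDBF->fp, 0, 2 );
        nFileSize = psDBF->sHooks.FTell( psDBF->fp );
        if( nFileSize > (SAOffset) psDBF->nHeaderLength )
            psDBF->nRecords = (int) ((nFileSize - psDBF->nHeaderLength)
                                     / psDBF->nRecordLength);
    }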

@pramsey (Author) commented Dec 3, 2019

The size of the DBF file seems invariant to the number of rows...

Looking into the dumper, it only calls DBFWriteAttributeDirectly when the number of non-geometry fields is non-zero, so we go through the dumping process without ever writing anything to the DBF beyond the header emitted when we open the file. I wonder what the correct workflow is for the DBF when it has no fields?
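
For illustration only, the loop shape being described is roughly this (hypothetical names, not the actual pgsql2shp code); with num_dbf_fields == 0 the inner call never happens, so nothing is ever appended to the DBF and the header's record count stays 0:

    for( row = 0; row < num_rows; row++ )
    {
        /* never entered when there are no attribute fields */
        for( field = 0; field < num_dbf_fields; field++ )
            DBFWriteAttributeDirectly( dbf, row, field, values[row][field] );
    }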

@rouault (Member) commented Dec 3, 2019

The size of the DBF file seems invariant to the number of rows...

I don't get this. Normally file_size (roughly) = fixed_header_size + number_of_fields * field_description_size + number_of_rows * record_size
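
Cross-checking that formula against the two dumps above (my arithmetic, assuming the usual 32-byte fixed header, 32 bytes per field descriptor, a 1-byte 0x0d terminator and an optional trailing 0x1a byte):

ogr2ogr:   32 + 1*32 + 1 = 65 header bytes, plus 3 * 12 record bytes, plus the 0x1a byte = 102 bytes, exactly the size of the dump
pgsql2shp: 32 + 0*32 + 1 = 33 header bytes, and nothing after them

So the pgsql2shp file contains no record bytes at all; the dumper never writes a record, which is why its size does not vary with the number of rows.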

@rouault (Member) commented Dec 3, 2019

I wonder what the correct workflow is for the DBF in the case where it has no fields?

In that case, OGR creates a dummy "FID" field so that there's at least one field
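
For reference, the equivalent with shapelib's public API could look roughly like this (a sketch, not OGR's actual code; the 11-character width simply mirrors the FID field visible in the ogr2ogr dump above):

    DBFHandle dbf = DBFCreate( "foo.dbf" );
    if( DBFGetFieldCount( dbf ) == 0 )
        /* dummy field so that records (and hence a record count) get written */
        DBFAddField( dbf, "FID", FTInteger, 11, 0 );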
