Querying Nouveau index with >3 sort params results in badarith error #5453

Open
jkuester opened this issue Feb 28, 2025 · 9 comments · May be fixed by #5463

Comments

@jkuester

Description

The docs for querying a Nouveau index state that:

You can use a single string to sort by one field or an array of strings to sort by several fields in the same order as the array.

Unfortunately, when I try to use more than 3 fields in the sort array on a query request, an error occurs in my Couch instance.

Steps to Reproduce

I have created a design doc with a Nouveau index that has the following code:

function (doc) {
  var types = [ 'district_hospital', 'health_center', 'clinic', 'person' ];
  if (types.indexOf(doc.type) === -1) {
    return;
  }

  index('string', 'contact_type', doc.type);
  index('string', 'contact_type_index', String(types.indexOf(doc.type)));
  index('string', 'dead', String(!!doc.date_of_death));
  index('string', 'muted', String(!!doc.muted));
  var name = doc.name && typeof doc.name === 'string' ? doc.name.toLowerCase() : '';
  index('string', 'name', name);
}

Then, I am querying this index by POSTing the following body:

{
  "q":"*:*",
  "sort":[
    "dead",
    "muted",
    "contact_type_index",
    "name"
  ]
}

Unfortunately, the server responds with a 500 error and:

{
  "error": "unknown_error",
  "reason": "badarith",
  "ref": 3197294518
}

In the logs from my CouchDB instance I see:

[notice] 2025-02-28T21:00:01.823920Z [email protected] <0.10807.50> 87d80e15f1 localhost:5984 172.19.0.1 medic POST /medic/_design/medic/_nouveau/test_index 500 ok 9
2025-02-28T21:00:15.086077441Z [error] 2025-02-28T21:00:15.085810Z [email protected] <0.10928.50> c47dc7a1e9 req_err(3197294518) unknown_error : badarith
2025-02-28T21:00:15.086103849Z     [<<"erlang:bsr/2">>,<<"base64:encode_list/2 L94">>,<<"nouveau_httpd:handle_search_req/6 L109">>,<<"nouveau_httpd:handle_search_req/3 L57">>,<<"chttpd:handle_req_after_auth/2 L428">>,<<"chttpd:process_request/1 L406">>,<<"chttpd:handle_request_int/1 L341">>,<<"mochiweb_http:headers/6 L140">>]

There are no errors in the Nouveau logs (only 200 successes).

If I repeat the request with only 3 entries in the sort array, it succeeds (regardless of which field I remove from the array).

Expected Behaviour

I expect the sort array to support a reasonable number of sort parameters (>3) without erroring.

Your Environment

{
  "couchdb": "Welcome",
  "version": "3.4.2",
  "git_sha": "6e5ad2a5c",
  "uuid": "326d5bb8-96c4-47d4-a5c0-bade13a2c24c",
  "features": [
    "nouveau",
    "access-ready",
    "partitioned",
    "pluggable-storage-engines",
    "reshard",
    "scheduler"
  ],
  "vendor": {
    "name": "The Apache Software Foundation"
  }
}

CouchDB Docker image built FROM couchdb:3.4.2
Nouveau Docker image built FROM couchdb:3.4.2-nouveau

Additional Context

I am new to Nouveau indexes and have been trying to follow the docs closely, but perhaps I missed something and am doing this in an improper way.

@rnewson
Member

rnewson commented Mar 5, 2025

Thank you for the report. Unfortunately, I cannot reproduce the error locally. The badarith seems to come from the base64 encoding step, which implies something interesting about what is being encoded. I don't think it's about the number of items in the sort order; that just happens to trigger a bug elsewhere.

Could you provide some sample docs? I made one up based on your index function and I don't see how the details could matter, but clearly there's a difference between my setup and yours.

curl -g 'http://foo:bar@localhost:15984/db1/_design/foo/_nouveau/foo?q=*:*&sort=["dead","muted","contact_type_index","name"]' | jq
{
  "update_latency": 3,
  "total_hits_relation": "EQUAL_TO",
  "total_hits": 1,
  "ranges": null,
  "hits": [
    {
      "order": [
        {
          "value": "true",
          "@type": "string"
        },
        {
          "value": "false",
          "@type": "string"
        },
        {
          "value": "2",
          "@type": "string"
        },
        {
          "value": "foo",
          "@type": "string"
        },
        {
          "value": "doc1",
          "@type": "string"
        }
      ],
      "id": "doc1",
      "fields": {}
    }
  ],
  "counts": null,
  "bookmark": "W1t7InZhbHVlIjoidHJ1ZSIsIkB0eXBlIjoic3RyaW5nIn0seyJ2YWx1ZSI6ImZhbHNlIiwiQHR5cGUiOiJzdHJpbmcifSx7InZhbHVlIjoiMiIsIkB0eXBlIjoic3RyaW5nIn0seyJ2YWx1ZSI6ImZvbyIsIkB0eXBlIjoic3RyaW5nIn0seyJ2YWx1ZSI6ImRvYzEiLCJAdHlwZSI6InN0cmluZyJ9XV0="
}
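As a side note on what is being encoded: the bookmark in this response looks like plain base64 over a JSON array of `order` values, which matches the hit shown above. Here is a minimal Node.js sketch that decodes it. The helper name `decodeBookmark` is made up for illustration, and the bookmark layout is inferred from this one response, not from any documented format.

```javascript
// Bookmark string copied verbatim from the response above.
const bookmark =
  "W1t7InZhbHVlIjoidHJ1ZSIsIkB0eXBlIjoic3RyaW5nIn0seyJ2YWx1ZSI6ImZhbHNlIiwiQHR5cGUiOiJzdHJpbmcifSx7InZhbHVlIjoiMiIsIkB0eXBlIjoic3RyaW5nIn0seyJ2YWx1ZSI6ImZvbyIsIkB0eXBlIjoic3RyaW5nIn0seyJ2YWx1ZSI6ImRvYzEiLCJAdHlwZSI6InN0cmluZyJ9XV0=";

// Hypothetical helper: base64-decode, then parse the result as JSON.
function decodeBookmark(b64) {
  const json = Buffer.from(b64, "base64").toString("utf8");
  return JSON.parse(json);
}

const decoded = decodeBookmark(bookmark);

// Each inner array is one set of sort values; print the values only.
for (const order of decoded) {
  console.log(order.map((entry) => entry.value).join(", "));
}
```

The decoded entries carry the four requested sort fields followed by the document id, the same as the `order` array of the hit.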

@jkuester
Author

jkuester commented Mar 5, 2025

I am testing with docs that all have this set of fields (with random values):

{
  "_id": "03476254-7ca0-457c-b39f-7266516e8717",
  "_rev": "1-0cd9bf8a77a686db365d58c41ccf61f7",
  "type": "person",
  "name": "malcolm",
  "short_name": "Alex",
  "date_of_birth": "1992-3-5",
  "date_of_birth_method": "",
  "ephemeral_dob": {
    "dob_calendar": "1992-3-5",
    "dob_method": "",
    "dob_approx": "1992-04-03T17:44:27.928Z",
    "dob_raw": "1992-3-5",
    "dob_iso": "1992-3-5"
  },
  "sex": "male",
  "phone": "+254755818055",
  "phone_alternate": "",
  "role": "patient",
  "external_id": "",
  "notes": "",
  "meta": {
    "created_by": "medic",
    "created_by_person_uuid": "",
    "created_by_place_uuid": ""
  },
  "reported_date": 1739891502334
}
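To make the sort values for such a doc concrete, here is a rough sketch that runs the index function from the issue description outside CouchDB. The `index` stub and the `collectIndexedFields` wrapper are hypothetical stand-ins for what the Nouveau indexer provides; only the calls inside are taken from the original function, and the sample doc is trimmed to the two fields the function reads.

```javascript
// Hypothetical harness: records every index(type, name, value) call for one doc.
function collectIndexedFields(doc) {
  const fields = {};
  const index = (type, name, value) => {
    fields[name] = { type, value };
  };

  // Logic of the index function from the issue description; the early return
  // is adapted to hand back the (empty) collected fields.
  var types = ['district_hospital', 'health_center', 'clinic', 'person'];
  if (types.indexOf(doc.type) === -1) {
    return fields;
  }

  index('string', 'contact_type', doc.type);
  index('string', 'contact_type_index', String(types.indexOf(doc.type)));
  index('string', 'dead', String(!!doc.date_of_death));
  index('string', 'muted', String(!!doc.muted));
  var name = doc.name && typeof doc.name === 'string' ? doc.name.toLowerCase() : '';
  index('string', 'name', name);

  return fields;
}

// Trimmed copy of the sample doc above.
const sampleDoc = { type: 'person', name: 'malcolm' };
const indexed = collectIndexedFields(sampleDoc);
console.log(indexed);
```

Every matching doc yields the same five short string fields, so the sample data itself gives no obvious reason for the failure.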

Playing around with this some more, I think the error is related to the number of hits found for the query and the limit for the query.

With an empty DB, the query works fine. As I start adding docs with the above structure one by one, the query still works. However, once I pass 25 docs, I start getting that badarith error. There is some variability in the behavior: I have seen the error with fewer than 25 docs, but I have never had a successful query with more than 25 hits. (I suspect the variability is caused by my rough test setup. I am just purging docs between tests, not actually destroying and recreating the whole db...)

So, once I have >25 hits in the DB, the query always fails. However, if I specify an actual limit on the query that is <24, the query works fine no matter how many hits are in the DB! But if I set a limit that is >=24, I get the error again.
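For anyone wanting to try the same experiment, the request with an explicit limit can be sketched as below (Node.js 18+ for the built-in `fetch`). The URL is assembled from the host and path in the CouchDB log earlier in this issue, the `authHeader` argument is a placeholder, and passing `limit` in the POST body is an assumption about how the limit was specified. `searchWithLimit` is only defined here, not called, so the snippet runs without a server.

```javascript
// Same query body as in the issue description, plus an explicit limit.
function buildSearchBody(limit) {
  return {
    q: "*:*",
    sort: ["dead", "muted", "contact_type_index", "name"],
    limit,
  };
}

// Assumed endpoint, pieced together from the log line in the description.
const SEARCH_URL =
  "http://localhost:5984/medic/_design/medic/_nouveau/test_index";

// Hypothetical helper: POST the query and return the status and parsed body.
async function searchWithLimit(limit, authHeader) {
  const res = await fetch(SEARCH_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: authHeader,
    },
    body: JSON.stringify(buildSearchBody(limit)),
  });
  return { status: res.status, body: await res.json() };
}

// Print the body for a limit below the threshold reported above.
console.log(JSON.stringify(buildSearchBody(23), null, 2));
```

Comparing the status for a limit of 23 against a limit of 24 on the same data should show whether the threshold described above reproduces elsewhere.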

@rnewson
Member

rnewson commented Mar 6, 2025

That's interesting. I still can't repro, even when adding documents. However, I am not purging docs, and I wonder if that's somehow related. Are you willing and able to modify the CouchDB install to gather more information? I could propose a patch with additional logging.

@rnewson
Member

rnewson commented Mar 6, 2025

Can you also state your CouchDB and Erlang versions, please?

@rnewson
Member

rnewson commented Mar 6, 2025

derp, nvm, the image should tell me all that

@rnewson
Member

rnewson commented Mar 6, 2025

Wondering if this is bookmark-related: base64 vs. URL-safe base64. Dreyfus uses couch_util:encode/decodeBase64Url but Nouveau doesn't (though I think it will have to...)

@rnewson
Member

rnewson commented Mar 6, 2025

think I figured it out.

@rnewson
Member

rnewson commented Mar 6, 2025

1> base64:encode(<<"foo">>).
<<"Zm9v">>
2> base64:encode([<<"foo">>]).
** exception error: an error occurred when evaluating an arithmetic expression
     in operator  bsr/2

This line:

base64:encode(jiffy:encode(maps:values(UnpackedBookmark))).

only works when jiffy:encode returns a binary. base64:encode accepts a binary or a flat list of bytes, not a nested iolist.

So the sorting/limit stuff is incidental; the output of jiffy for those inputs just happens to be represented as an iolist().

@rnewson
Member

rnewson commented Mar 6, 2025

dreyfus uses the b64url NIF instead which always returns a binary.

rnewson added a commit that referenced this issue Mar 6, 2025
dreyfus used b64url to encode its bookmark which always returns a binary, so
let's do that. also makes bookmarks easier to pass around.

closes #5453
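On the "easier to pass around" point: standard base64 can contain `+`, `/` and `=`, all of which need escaping in a query string, while the URL-safe alphabet avoids them. Here is a small Node.js sketch of the difference, using Node's built-in `base64url` encoding (Node 15.7+/14.18+) as a stand-in. Whether its output matches CouchDB's b64url byte for byte, for example in padding, is not checked here.

```javascript
// Three bytes chosen so that the standard alphabet needs '+' and '/'.
const bytes = Buffer.from([0xfb, 0xff, 0xfe]);

const standard = bytes.toString("base64");    // "+//+"
const urlSafe = bytes.toString("base64url");  // "-__-"
console.log(standard, urlSafe);

// The same comparison on a bookmark-shaped payload (values are made up).
const payload = JSON.stringify([[{ value: "doc1", "@type": "string" }]]);
const stdBookmark = Buffer.from(payload, "utf8").toString("base64");
const urlBookmark = Buffer.from(payload, "utf8").toString("base64url");

// Only the standard form may change when placed in a query string.
console.log(encodeURIComponent(stdBookmark) === stdBookmark);
console.log(encodeURIComponent(urlBookmark) === urlBookmark);

// Decoding the URL-safe form gives back the original JSON text.
const roundTrip = Buffer.from(urlBookmark, "base64url").toString("utf8");
console.log(roundTrip === payload);
```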
@rnewson rnewson linked a pull request Mar 6, 2025 that will close this issue