Reduce memory usage or build index size? #573

Open
H4ad opened this issue Dec 8, 2023 · 0 comments


Current Behavior

After #441 was merged, the index size when saved to .json dropped by ~34%.

However, memory usage increased. To measure it, I rewrote the test to generate truly random data, so that it does not benefit from the V8 string cache:

import { faker } from "@faker-js/faker";
import { writeFileSync } from "fs";

// create fake data
const data = Array.from({length: 100000}, () => ({
  id: faker.string.uuid(),
  name: faker.person.firstName() + '_' + Math.random().toString(16).slice(2),
  surname: faker.person.lastName() + '_' + Math.random().toString(16).slice(2),
  fiscalCode: faker.string.alphanumeric({length: 16, casing: "uppercase"}),
  season: faker.number.int({min: 2010, max: 2020}),
}));

writeFileSync('./large-object.json', JSON.stringify(data, null, 2));

Then I measured the memory usage with this script:

import { readFileSync } from "fs";
import { resolve } from "path";
import { fileURLToPath } from "url";
import { create, insertMultiple, search } from "./dist/index.js";

// function to print the used memory
function printUsedMemory() {
  const used = process.memoryUsage().heapUsed / 1024 / 1024;
  console.log(`The script uses approximately ${Math.round(used * 100) / 100} MB`);
}

const __dirname = resolve(fileURLToPath(import.meta.url), '..');

const data = JSON.parse(
  readFileSync(__dirname + '/large-object.json', 'utf8'),
);

printUsedMemory();

// create index and add data
const db = await create({
  schema: {
    id: "string",
    name: "string",
    surname: "string",
    fiscalCode: "string",
    season: "number",
  },
});

await insertMultiple(db, data);
console.log("Index created");

printUsedMemory();

// search the index
const results = await search(db, {
  term: "john",
  properties: "*",
});

printUsedMemory();
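
The script above samples `heapUsed` without forcing a garbage collection, so each reading may include objects that are no longer reachable. As a hedged variation, the helper below forces a collection first when Node is started with `--expose-gc`, and otherwise falls back to a plain reading. The name `heapUsedMB` is made up for this sketch and was not used to produce the numbers in this issue.

```javascript
// Returns heapUsed in MB, rounded to two decimals.
// If Node was started with --expose-gc, a collection is forced first
// so that unreachable objects are less likely to be counted.
function heapUsedMB() {
  if (typeof globalThis.gc === "function") {
    globalThis.gc();
  }
  const used = process.memoryUsage().heapUsed / 1024 / 1024;
  return Math.round(used * 100) / 100;
}

console.log(`Heap in use: ${heapUsedMB()} MB`);
```

Running the benchmark as `node --expose-gc script.mjs` (file name assumed) would make the readings before and after indexing easier to compare.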

The results with internalId are:

The script uses approximately 44.61 MB
Index created
The script uses approximately 546.78 MB
The script uses approximately 549.3 MB

But if we disable or remove that feature:

The script uses approximately 44.65 MB
Index created
The script uses approximately 502.32 MB
The script uses approximately 504.65 MB

So disabling the feature reduces heap usage after indexing by ~8% (546.78 MB vs. 502.32 MB).
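
For reference, the ~8% figure can be reproduced from the two "after indexing" readings reported above. The function name `percentReduction` is only illustrative.

```javascript
// Heap usage after indexing, in MB, copied from the two runs above.
const withInternalIds = 546.78;
const withoutInternalIds = 502.32;

// Relative reduction, in percent, when going from `before` to `after`.
function percentReduction(before, after) {
  return ((before - after) / before) * 100;
}

const reduction = percentReduction(withInternalIds, withoutInternalIds);
console.log(`${reduction.toFixed(1)}% less heap`); // prints "8.1% less heap"
```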

New Behavior

Can we support both scenarios, where we store the IDs as internal IDs and then remap them? Or is that too much work?
Could we expose a flag that lets the user choose between a smaller index size and lower memory usage?
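
To make the trade-off concrete, here is a minimal sketch of what "internal IDs with remapping" could look like. It is not Orama's actual implementation: `createIdStore`, `getInternalId`, and `getExternalId` are hypothetical names chosen for illustration. The idea is that the index structures hold small integers instead of repeating long string IDs, which shrinks the serialized output, while the two lookup tables are the extra state that has to stay in memory.

```javascript
// Hypothetical sketch, not Orama's code: maps external string IDs to
// compact integer IDs and back.
function createIdStore() {
  const idToInternal = new Map(); // external string ID -> internal integer
  const internalToId = [];        // position (internalId - 1) -> external string ID

  return {
    // Returns the existing internal ID, or assigns the next integer (starting at 1).
    getInternalId(externalId) {
      let internalId = idToInternal.get(externalId);
      if (internalId === undefined) {
        internalToId.push(externalId);
        internalId = internalToId.length;
        idToInternal.set(externalId, internalId);
      }
      return internalId;
    },
    // Remaps an internal integer back to the original string ID.
    getExternalId(internalId) {
      return internalToId[internalId - 1];
    },
  };
}

const store = createIdStore();
const first = store.getInternalId("3f2c9a0e-aaaa-bbbb-cccc-000000000001");
const second = store.getInternalId("3f2c9a0e-aaaa-bbbb-cccc-000000000002");
console.log(first, second);              // prints "1 2"
console.log(store.getExternalId(first)); // prints the first string ID again
```

A user-facing flag, if one were added, could decide whether a store like this is created at all. Whether that fits the library's architecture is the open question of this issue.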
