This repository was archived by the owner on Aug 6, 2025. It is now read-only.

Whole document embedding #152

@aalloul

Hi there,

I was wondering whether it makes sense to "trick" LASER into treating a whole document, made up of multiple sentences, as a single sentence. That way I'd get a whole-document embedding and wouldn't need to devise any aggregation method.

I know there's a limit of 12000 tokens on sentences, as per

parser.add_argument('--max-tokens', type=int, default=12000,

but let's forget this for now please :)
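
To make the two options concrete, here is a minimal sketch of what I mean. It does not call LASER: `toy_encode` is a hypothetical stand-in for a sentence encoder (it just returns deterministic pseudo-random vectors), and `mean_pool` is only one possible aggregation method, assumed here for illustration. Swapping a real encoder in for `toy_encode` would let the two resulting document vectors be compared.

```python
import numpy as np


def toy_encode(sentences, dim=8):
    """Hypothetical stand-in for a sentence encoder (NOT the LASER API).

    Maps each input string to a deterministic pseudo-random vector so the
    example runs without any model files.
    """
    vectors = []
    for sentence in sentences:
        # Seed derived from the characters so the same text gives the same vector.
        seed = sum(ord(ch) for ch in sentence)
        rng = np.random.default_rng(seed)
        vectors.append(rng.standard_normal(dim))
    return np.vstack(vectors)  # shape: (number of inputs, dim)


def mean_pool(sentence_embeddings):
    """One possible aggregation: L2-normalise each row, then average the rows."""
    norms = np.linalg.norm(sentence_embeddings, axis=1, keepdims=True)
    normalised = sentence_embeddings / np.clip(norms, 1e-12, None)
    return normalised.mean(axis=0)


document = [
    "LASER embeds sentences.",
    "A document is made of many sentences.",
]

# Option A: embed each sentence separately, then aggregate.
doc_vec_pooled = mean_pool(toy_encode(document))

# Option B (the "trick"): join the sentences and embed them as one input.
doc_vec_single = toy_encode([" ".join(document)])[0]

print(doc_vec_pooled.shape, doc_vec_single.shape)
```

With the toy encoder the two vectors are unrelated, so the sketch only shows the shape of the two pipelines, not which one gives better document embeddings.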
