feat: #641 - Extend document chunker transform to support fixed-size token window chunker with overlap #642

juancappi · 2024-09-29T13:04:06Z

Why are these changes needed?

Partially implements #641

This is a draft for the pure Python runtime only. The Ray runtime is still pending.
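
For readers unfamiliar with the technique named in the title, here is a minimal sketch of fixed-size token-window chunking with overlap. It is not the code in this PR. The function name `chunk_tokens`, its parameters, and the use of whitespace-separated words as "tokens" are assumptions made for illustration only; a real tokenizer would likely produce different boundaries.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split `tokens` into windows of at most `chunk_size` tokens.

    Consecutive windows share `overlap` tokens. Hypothetical helper,
    not part of the transform in this PR.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input, so the
        # tail is not emitted again as a shorter, fully overlapping chunk.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    # Whitespace splitting stands in for a real tokenizer (assumption).
    words = "a b c d e f g".split()
    for chunk in chunk_tokens(words, chunk_size=4, overlap=2):
        print(chunk)
```

With a window of 4 and an overlap of 2, each window starts 2 tokens after the previous one, so the last 2 tokens of one chunk reappear at the start of the next.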

Signed-off-by: Juan Cappi <[email protected]>

juancappi · 2024-10-03T15:19:24Z

@touma-I would you please take a look to confirm I'm on the right track?

to better reflect the new chunker is also leveraging a Llama Index chunker

Signed-off-by: Juan Cappi <[email protected]>

IBM#641

Signed-off-by: Juan Cappi <[email protected]>

juancappi · 2024-10-04T14:57:48Z

@dolfim-ibm this is ready for review

Signed-off-by: Juan Cappi <[email protected]>

dolfim-ibm

Nice. LGTM

juancappi marked this pull request as draft September 29, 2024 13:04

feat: IBM#641 - first draft implementation, python only

242fad1

Signed-off-by: Juan Cappi <[email protected]>

juancappi force-pushed the feat/641-fixed-size-token-chunking branch from 43d36f1 to 242fad1 Compare September 29, 2024 13:05

touma-I requested a review from dolfim-ibm October 3, 2024 15:44

juancappi added 2 commits October 3, 2024 13:50

fix: change naming

c481c5c

to better reflect the new chunker is also leveraging a Llama Index chunker

Signed-off-by: Juan Cappi <[email protected]>

IBM#641

fix: adjust documentation - IBM#641

6d21ef3

Signed-off-by: Juan Cappi <[email protected]>

juancappi marked this pull request as ready for review October 3, 2024 18:26

juancappi added 2 commits October 4, 2024 14:18

fix: add missing metadata.json as expected file in test fixture

73e35d7

Signed-off-by: Juan Cappi <[email protected]>

fix: comment extra config line

137fb2d

Signed-off-by: Juan Cappi <[email protected]>

dolfim-ibm approved these changes Oct 7, 2024

touma-I changed the title ~~feat: #641 - first draft implementation, python only~~ feat: #641 - Extend document chunker transform to support fixed-size token window chunker with overlap Oct 7, 2024

touma-I merged commit d04454e into IBM:dev Oct 7, 2024
6 checks passed
