merge main
alnutile committed Jul 23, 2024
2 parents 379b011 + ce63d13 commit a12f32d
Showing 4 changed files with 107 additions and 17 deletions.
61 changes: 44 additions & 17 deletions app/Domains/Documents/Transformers/CSVTransformer.php
@@ -14,6 +14,8 @@

class CSVTransformer
{
public $keysFound = [];

protected Document $document;

protected TypesEnum $mimeType = TypesEnum::CSV;
@@ -33,6 +35,7 @@ public function handle(Document $document): array

$chunks = [];


/**
* Going to turn into a document then chunks
*/
@@ -47,41 +50,44 @@ public function handle(Document $document): array
})->implode("\n");

/**
* @TODO
* We have the text but what does the user want to do with the text
* 1) Here we should have a source with a chat_id or make the chat id
* 2) this becomes a message (that is a lot of them?)
* 3) then the LLM gets the sources prompt and sees what the user wants to do with the data.
* 4) Example "Take these dates and save them to the document start and end data then save the content to the document as an event"
* Then tag the document by the Region seen in the data (or hard coded in the prompt)
* 5) The Prompt using OrchestrateV2 should take the Chat and Message and start building out the results
* this will update or create a document
* this will find start_date and end_date new fields in a document
* this will tag the document Region: Foobar
* NOTE: We already have date_range so bummer it is created_at
* @NOTE
* Row number is tricky
* going to introduce a Key to the meta_data
* in this case i will hard code key to see it work
* then it will establish a key to update
* BUT by saving the keys we can find any documents not updated
* and delete those
*/
-$file_name = 'row_'.$rowNumber.'_'.str($document->file_path)->beforeLast('.')->toString().'.txt';
+if (collect($row)->has('key')) {
+$rowNumber = collect($row)->get('key');
+}
+
+$this->keysFound[] = $rowNumber;
+
+$file_name = $this->getFileName($rowNumber, $document->file_path);

Storage::disk('collections')
->put((string) $document->collection->id.'/'.$file_name, $content);

$documentRow = Document::updateOrCreate([
'collection_id' => $document->collection_id,
'file_path' => $file_name,
'type' => $this->mimeType,
], [
'status' => StatusEnum::Pending,
'summary' => $content,
'meta_data' => $row,
'type' => $this->mimeType,
'original_content' => $content,
-'subject' => "Row $rowNumber import from ".$document->file_path,
+'subject' => "Key or Row $rowNumber import from ".$document->file_path,
]);

$size = config('llmdriver.chunking.default_size');

$chunked_chunks = TextChunker::handle($content, $size);

-if ($documentRow->wasRecentlyCreated) {
+if ($documentRow->wasRecentlyCreated || $documentRow->wasChanged([
+'original_content',
+])) {
foreach ($chunked_chunks as $chunkSection => $chunkContent) {

$guid = md5($chunkContent);
@@ -90,10 +96,10 @@ public function handle(Document $document): array
[
'document_id' => $documentRow->id,
'sort_order' => $rowNumber,
'section_number' => $chunkSection,
],
[
'guid' => $guid,
'section_number' => $chunkSection,
'content' => $chunkContent,
'meta_data' => $row,
'original_content' => $content,
@@ -120,8 +126,29 @@ public function handle(Document $document): array

Log::info($this->mimeType->name.':Transformer:handle', ['chunks' => count($chunks)]);

$this->cleanUpDeletedRows();

$document->delete();

return $chunks;
}

protected function cleanUpDeletedRows(): void
{
Document::where('collection_id', $this->document->collection_id)
->whereNotIn('file_path', $this->getKeysWithFileName())
->delete();
}

protected function getKeysWithFileName(): array
{
return collect($this->keysFound)->map(function ($rowNumber) {
return $this->getFileName($rowNumber, $this->document->file_path);
})->toArray();
}

protected function getFileName(int $rowNumber, string $filePath): string
{
return 'row_'.$rowNumber.'_'.str($filePath)->beforeLast('.')->toString().'.txt';
}
}
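
The cleanup added in this file can be read as a set difference: build the expected file name for every key seen during the import, then remove any stored document whose `file_path` is not in that list. Below is a minimal sketch of that idea in plain PHP, without Eloquent. The helpers `rowFileName()` and `stalePaths()` and the sample key `999` are hypothetical names and data introduced for illustration; only the `row_<key>_<basename>.txt` naming scheme comes from the diff.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for CSVTransformer::getFileName(); it mirrors the
 * naming scheme shown in the diff.
 */
function rowFileName(int|string $key, string $sourcePath): string
{
    // Drop everything from the last dot onward (the file extension).
    $dot = strrpos($sourcePath, '.');
    $base = $dot === false ? $sourcePath : substr($sourcePath, 0, $dot);

    return 'row_'.$key.'_'.$base.'.txt';
}

/**
 * Return the stored paths that no key from the latest import maps to.
 *
 * @param  list<string>  $storedPaths  file paths already saved for the collection
 * @param  list<int|string>  $keysFound  keys (or row numbers) seen in this import
 * @return list<string>
 */
function stalePaths(array $storedPaths, array $keysFound, string $sourcePath): array
{
    $expected = array_map(
        fn ($key) => rowFileName($key, $sourcePath),
        $keysFound
    );

    // array_diff keeps the original indexes, so reindex the result.
    return array_values(array_diff($storedPaths, $expected));
}

// Assumed sample data: key 999 was imported earlier but is gone from the CSV.
$stored = [
    'row_111_strategies_with_keys.txt',
    'row_555_strategies_with_keys.txt',
    'row_999_strategies_with_keys.txt',
];

$stale = stalePaths($stored, [111, 222, 333, 444, 555], 'strategies_with_keys.csv');

print_r($stale);
```

Note that the query in `cleanUpDeletedRows()` filters only on `collection_id`, so as written it also removes documents in the same collection that did not come from this CSV. `test_cleans_up` appears to rely on that behaviour.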
51 changes: 51 additions & 0 deletions tests/Feature/CSVTransformerTest.php
@@ -5,6 +5,7 @@
use App\Domains\Documents\Transformers\CSVTransformer;
use App\Domains\Documents\Transformers\XlsxTransformer;
use App\Domains\Documents\TypesEnum;
use App\Models\Document;
use Illuminate\Support\Facades\File;
use Tests\TestCase;

@@ -53,6 +54,56 @@ public function test_import_csv(): void
$this->assertDatabaseCount('document_chunks', 5);
}

public function test_cleans_up(): void
{

$file = 'strategies_with_keys.csv';

$document = $this->setupFile($file);

$document->update([
'type' => TypesEnum::CSV,
]);

Document::factory()->count(5)->create([
'collection_id' => $document->collection_id,
'file_path' => '6666'.$file,
]);

$results = (new CSVTransformer())->handle($document);

$this->assertCount(5, $results);

$this->assertDatabaseCount('documents', 5);
$this->assertDatabaseCount('document_chunks', 5);
}

public function test_updates_keys(): void
{

$file = 'strategies_with_keys.csv';

$document = $this->setupFile($file);

$document->update([
'type' => TypesEnum::CSV,
]);

$testingDocument = Document::factory()->create([
'collection_id' => $document->collection_id,
'file_path' => 'row_555_strategies_with_keys.txt',
]);

$results = (new CSVTransformer())->handle($document);

$this->assertCount(5, $results);

$this->assertDatabaseCount('documents', 5);
$this->assertDatabaseCount('document_chunks', 5);

$this->assertStringContainsString('For the front end we focus on simplicity', $testingDocument->refresh()->original_content);
}

protected function tearDown(): void
{
if (! File::exists($this->document->pathToFile())) {
6 changes: 6 additions & 0 deletions tests/example-docs/strategies_with_keys.csv
@@ -0,0 +1,6 @@
KEY,CATEGORY,ORIGINAL QUESTION,RESPONSES,OTHER
111,ABOUT THE AGENCY,AGENCY HISTORY,Sundance Solutions owner Alfred Nutile has been building Laravel Applications for 14+ years including for companies as large as Pfizer and as small as RuffTools. ,
222,ABOUT THE AGENCY,CAPABILITIES OVERVIEW,"Truly full stack but more importantly your CTO who will help you clearly define your vision and goals each step of the way. Using the build, measure learn philosophy he will make sure you are building the right thing for your vision and customers and not just the next thing. It truly is a partnership",
333,CAPABILITIES,STRATEGY,From the start your project will start with a foundation that is solid and flexible so NO TIME is wasted on the tech all of it focusing on your business goals. Using the 90-10 strategy we focus on the 10% of your idea that is the unique part and prove it works. The other 90 we most likely have it covered in our foundation Laravel Stack,We help you build the right things quickly
444,CAPABILITIES,STRATEGY,From dashboards to user management when your needs fit Filament we use it to save time and provide for you an easy to support code base. ,
555,CAPABILITIES,STRATEGY,For the front end we focus on simplicity with Inertia and Vue. Inertia is a Laravel first class citizen and allows us to build dynamic websites without the complexity most javascript heavy applications bring to a project!,
6 changes: 6 additions & 0 deletions tests/fixtures/sample_data/strategies_with_keys.csv
@@ -0,0 +1,6 @@
KEY,CATEGORY,ORIGINAL QUESTION,RESPONSES,OTHER
111,ABOUT THE AGENCY,AGENCY HISTORY,Sundance Solutions owner Alfred Nutile has been building Laravel Applications for 14+ years including for companies as large as Pfizer and as small as RuffTools. ,
222,ABOUT THE AGENCY,CAPABILITIES OVERVIEW,"Truly full stack but more importantly your CTO who will help you clearly define your vision and goals each step of the way. Using the build, measure learn philosophy he will make sure you are building the right thing for your vision and customers and not just the next thing. It truly is a partnership",
333,CAPABILITIES,STRATEGY,From the start your project will start with a foundation that is solid and flexible so NO TIME is wasted on the tech all of it focusing on your business goals. Using the 90-10 strategy we focus on the 10% of your idea that is the unique part and prove it works. The other 90 we most likely have it covered in our foundation Laravel Stack,We help you build the right things quickly
444,CAPABILITIES,STRATEGY,From dashboards to user management when your needs fit Filament we use it to save time and provide for you an easy to support code base. ,
555,CAPABILITIES,STRATEGY,For the front end we focus on simplicity with Inertia and Vue. Inertia is a Laravel first class citizen and allows us to build dynamic websites without the complexity most javascript heavy applications bring to a project!,
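
To show how the `KEY` column in this fixture could feed the transformer's `keysFound` list, here is a rough sketch in plain PHP. It assumes the importer normalizes headings to lowercase snake case (which would explain why the transformer checks for `key` while the file uses `KEY`) and that rows without a key fall back to a 1-based row position. Both points are assumptions, and the two shortened rows below are illustrative data, not the fixture itself.

```php
<?php

declare(strict_types=1);

// Shortened, illustrative rows in the same shape as the fixture.
$csv = <<<'CSV'
KEY,CATEGORY,ORIGINAL QUESTION,RESPONSES,OTHER
111,ABOUT THE AGENCY,AGENCY HISTORY,Short response,
555,CAPABILITIES,STRATEGY,"Another response, with a comma",
CSV;

$lines = explode("\n", $csv);

// Assumed normalization: "ORIGINAL QUESTION" becomes "original_question".
$header = array_map(
    fn (string $heading) => strtolower(str_replace(' ', '_', trim($heading))),
    str_getcsv(array_shift($lines), ',', '"', '')
);

$keysFound = [];

foreach ($lines as $index => $line) {
    $row = array_combine($header, str_getcsv($line, ',', '"', ''));

    // Prefer the explicit key; otherwise fall back to the 1-based row position.
    $key = $row['key'] ?? $index + 1;

    $keysFound[] = (int) $key;
}

print_r($keysFound);
```

With a stable key per row, reordering or inserting rows in the spreadsheet would not change which stored document a row maps to, whereas a positional row number would shift.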
