GoBD compliance #16

Open
timschneider opened this issue Apr 22, 2022 · 4 comments

The goal of this issue is to provide a manual and a working setup of Papermerge that is GoBD compliant.

A basic installation of Papermerge already meets a lot of the requirements of the German GoBD.

Bitkom e.V. published a guide, essentially a checklist, for checking whether a software product and its surrounding process can be GoBD compliant.

In my humble opinion there is currently one major blocker, namely 2.2.1 c) requirement 17, which reads:

Completeness of data and documents over the entire retention period

  • No possibility of deleting data before the end of the retention period.
  • No automated deletion of data and documents after the end of the retention period (e.g. always deleting all data or documents older than X years).
  • Deletion must require an explicit organisational approval, to account for the fact that, according to § 147 Abs. 3 S. 3 AO, the retention period does not expire as long as and to the extent that the documents are relevant for tax purposes and their assessment period has not yet expired (suspension of expiry).
  • Note: Non-tax regulations may require a longer retention period.

This delete lock ("no possibility of deleting data before the end of the retention period") also applies to privileged access such as root / admin. One way to implement such a mechanism is MinIO Retention, which turns your S3 bucket into a WORM (Write Once Read Many) storage backend. Cohasset Associates, Inc. has already assessed MinIO for deploying such an S3 storage in a way that is SEC 17a-4(f), FINRA 4511(c) and CFTC 1.31(c)-(d) compliant. We can consider GoBD and SEC 17a-4(f), FINRA 4511(c) and CFTC 1.31(c)-(d) comparable, as both deal with storing tax-relevant data on digital media.
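
To illustrate the mechanism, here is a minimal sketch of how S3 Object Lock retention could be configured against a MinIO endpoint using boto3. The endpoint URL, credentials, bucket name and the 10-year period are placeholder assumptions, not part of any Papermerge configuration:

```python
# Sketch: turning a MinIO bucket into WORM storage via S3 Object Lock (boto3).
# Endpoint, credentials, bucket name and the retention period are assumptions.
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.example.com:9000",  # assumed MinIO endpoint
    aws_access_key_id="MINIO_ACCESS_KEY",
    aws_secret_access_key="MINIO_SECRET_KEY",
)

# Object Lock must be enabled when the bucket is created.
s3.create_bucket(Bucket="papermerge-worm", ObjectLockEnabledForBucket=True)

# Default retention: COMPLIANCE mode cannot be shortened or removed,
# not even by root/admin, which matches the GoBD "delete lock" requirement.
s3.put_object_lock_configuration(
    Bucket="papermerge-worm",
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": 10}},
    },
)

# Individual uploads may also carry an explicit retain-until date.
s3.put_object(
    Bucket="papermerge-worm",
    Key="documents/invoice-2022-0001.pdf",
    Body=b"%PDF-1.7 ...",
    ObjectLockMode="COMPLIANCE",
    ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=3650),
)
```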

From version 2.1 onwards, Papermerge is moving towards a Kubernetes-ready architecture and uses RWX file storage to store data.
The RWX storage is also used to share data between the app and the worker nodes. This development, however, moved Papermerge a bit further away from the S3 backend.

A further goal of this issue is to adapt Papermerge to use an S3 WORM storage backend (storing only relevant data, including any intermediate steps from the original data to the processed data, but nothing more, since this data is kept for at least 10 years on that WORM storage).

To achieve this goal we need to adapt not only the core but also other parts of the Papermerge project. We should link all adaptations required for GoBD compliance to this issue so that we can track the development.


timschneider commented Apr 22, 2022

What I have tried so far is to update the provided S3 storage backend to match the 2.1 architecture: papermerge/s3#1

This allows me to connect to an S3 bucket, and papermerge-core uploads V1 into that bucket, but no further versions (e.g. V2, with OCR) are stored in the S3 bucket.

@timschneider

I also enabled TLS / HTTPS and needed to adapt papermerge.js to connect via a secure WebSocket (wss):
papermerge/papermerge.js#7


ciur commented Apr 23, 2022

Thank you very much for taking the time to open this issue and the pull requests.

I merged both of your pull requests. Thank you again for your contributions!

Regarding GoBD compliance:
The equivalent feature name in English is data retention, and different countries have different requirements on how long which documents must be retained. In plain English, the feature enables the user to configure how long specific data is kept before being deleted.

The data retention feature will be implemented. However, it won't be part of papermerge-core; it will be provided as a plugin (Django app). There are two main reasons why data retention will be implemented as a plugin:

  1. Data retention is not a feature everybody needs.
  2. A monolithic architecture is something to be avoided (in the long term it leads to very high maintenance costs).

Papermerge-Core will contain only features wanted by the vast majority of users; everything else will be provided as plugins.
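
As a rough illustration of what such a plugin could look like (model names and the "Document" reference are hypothetical, not an existing papermerge-core API), a Django app could attach a retention date to documents and block deletion before that date via a pre_delete signal:

```python
# Hypothetical sketch of a data-retention plugin as a Django app.
# Model names and the Document reference are assumptions, not papermerge-core API.
from django.core.exceptions import PermissionDenied
from django.db import models
from django.db.models.signals import pre_delete
from django.dispatch import receiver
from django.utils import timezone


class RetentionPolicy(models.Model):
    """Retention date attached to a document by the (hypothetical) plugin."""
    document_id = models.IntegerField(unique=True)  # would be a FK to the core document model
    retain_until = models.DateTimeField()


@receiver(pre_delete)
def block_deletion_under_retention(sender, instance, **kwargs):
    """Refuse to delete any core document that is still under retention."""
    if sender.__name__ != "Document":               # assumed core model name
        return
    policy = RetentionPolicy.objects.filter(document_id=instance.pk).first()
    if policy and policy.retain_until > timezone.now():
        raise PermissionDenied(
            f"Document {instance.pk} is under retention until {policy.retain_until}."
        )
```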

I will keep this ticket open for reference and GoBD links.

@ciur ciur self-assigned this Apr 23, 2022

timschneider commented Apr 26, 2022

That sounds good! But I don't know if it is worth the effort to code a storage layer that provides the data retention required by the GoBD. Putting data retention into a separate plugin, with a frontend (set retention time) and a backend (prohibit deletion of files under retention), is the way to go in my opinion. But actually performing the task of archiving and storing the data under retention (preventing data loss, ensuring data integrity, preventing admin/root access and modifications, keeping track of ANY modifications, and so on over the whole storage time) is way too complex for a simple plugin and should be done by the filesystem/storage layer.

I think you have already developed part of the solution with the storage class under core.lib.storage (by the way, I like the concept). As far as I understand it, the storage class is the place where all file I/O is handled. So a retention plugin could provide an additional storage class which is, for example, backed by MinIO S3 with retention enabled; a folder with retention enabled would then use that storage class while other folders use the default one, or the default storage class could be set to a retention-enabled storage.
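
A very rough sketch of what such a retention-aware storage class might look like (the interface shown here, copy_doc/delete_doc, is an assumption for illustration, not the actual core.lib.storage API):

```python
# Hypothetical retention-aware storage class; the method names are assumptions
# about the core.lib.storage interface, not its real API.
from datetime import datetime, timedelta, timezone

import boto3


class S3RetentionStorage:
    """Stores document files in an Object-Lock-enabled MinIO/S3 bucket."""

    def __init__(self, bucket, endpoint_url, retention_days=3650):
        self._s3 = boto3.client("s3", endpoint_url=endpoint_url)
        self._bucket = bucket
        self._retention_days = retention_days

    def copy_doc(self, src_path, dst_key):
        """Upload a file in COMPLIANCE mode so it cannot be deleted early."""
        retain_until = datetime.now(timezone.utc) + timedelta(days=self._retention_days)
        with open(src_path, "rb") as f:
            self._s3.put_object(
                Bucket=self._bucket,
                Key=dst_key,
                Body=f,
                ObjectLockMode="COMPLIANCE",
                ObjectLockRetainUntilDate=retain_until,
            )

    def delete_doc(self, key):
        """Deletion is simply refused; the Object-Lock bucket enforces it anyway."""
        raise PermissionError(f"{key} is under retention and cannot be deleted")
```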

Is this the way to go?

I just had a quick look at the source code from this perspective and found some other places where file I/O is performed. Is there a plan to move this file I/O into the storage class, or is there any blocker to doing so?
