-
-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement encoding
in .rules
#2319
base: master
Are you sure you want to change the base?
Conversation
e30d9ad
to
5966c8a
Compare
encoding
for -f *.rules
encoding
in .rules
Thanks for exploring this. It's probably a good idea. But what if rules files made it easier to run a preprocessor, eg
That requires a little more expertise from the user, I guess ? |
The case of reading CSV from stdin might also be worth thinking about. |
I like the idea, though implementing it with the other cases is a bit wierd, since we ignore We could split it into two options with the other being
Is it even possible to read a csv from stdin?
Though if they would work, they would be handled the same as the cases mentioned above. |
Is it even possible to read a csv from stdin?
Like so: `-f csv:- --rules-file=foo.rules`
|
Hi @jokesper, some more thoughts. On mac, it doesn't install cleanly for me:
I have brew's icu4c installed:
I haven't been able to test on windows (I have only an arm vm, which GHC doesn't support yet). |
But it can run x86_64 ghc. And yes, it's similar on Windows: text-icu isn't an easy thing to install.
|
Should I instead switch to something like |
Yes that looks pretty good. Converting from Text to String won't affect anyone noticeably I predict, though if it can be avoided when unnecessary, all the better.
|
5966c8a
to
88749fd
Compare
I just switched the part I already implemented to use I noticed Should I document this anywhere, as it's already documented in the Arch wiki. |
88749fd
to
3fd57bb
Compare
I implemented the remaining two cases. To do this I had to change the arguments to I think this is ready for review. |
3fd57bb
to
0512800
Compare
Writing tests for this would require quite a bit of rework in how they are treated. |
Hi @jokesper. Sorry for the slow reply. If this would make hledger installers need to use special build flags, that would be a problem unfortunately. |
I have inquired at dmwit/encoding#23 |
Instead of `text-icu` as [recommended by `text`](https://hackage.haskell.org/package/text-2.1.2/docs/Data-Text-Encoding.html): > To gain access to a much larger family of encodings, use the `text-icu` package. we use `encoding`, since `text-icu` requires an external library. `encoding` does require a custom setup, which is currently affected by a [bug in Cabal](haskell/cabal#6505) but it doesn't require an external runtime dependency, which makes it easier to distribute on macos and windows. I also changed the `rReadFn` to get a `Handle` rather than a `Text`. Such that we can use non standard functions for reading it. This is kind of sideways, since the one place we care about it, we don't use it at all. Though it still caused issues, since the input wasn't parseable as text. I think not using the handle has something to do with how `-` is treated as stdin but I didn't do any further tests on that, since this is out of the scope of this PR.
While trying to fix the CI, I noticed you already needed to supply Though I would still prefer if we didn't need a flag just to build the core |
2c19779
to
0b6ce73
Compare
This makes sure we don't have to build a `build-type: Custom` package
Since Though the new |
Thanks @jokesper . Adding encoding-0.9 to stack*.yaml should fix the current CI failure. |
(Or in the case of stack.yaml, bumping it to the latest nightly-2025-02-11.) |
85deea2
to
d775b01
Compare
Thanks for the info. It's fixed now |
Thanks @jokesper. 26 files changed encoding >0.9 dependency added Main commit description
Github AI summaryThis pull request includes several changes to the Updates to journal reading functions:
Import and helper function updates:
Changes in specific readers:
Test updates:
Miscellaneous:
These changes improve the flexibility and robustness of the journal reading functions by allowing them to handle different input sources and text encodings. SM summary/doc draftPreviously, hledger could read CSV files containing non-ascii characters only if they are UTF8-encoded. Actions
|
I see ImplicitParams is added, was that intended ? |
Yes, this is due to the interface of |
Edited my recent comment a little (re commit message). I see. I have used ImplicitParams in the past and don't like it much, it always confuses me. But it seems a requirement here.. and maybe a benefit. Do you have any insight into what it does for us here ? |
I see: https://hackage.haskell.org/package/encoding-0.9/docs/System-IO-Encoding.html
An encoding can be assigned to |
Alternatively we could inline the body of |
I kicked off the multi-platform CI builds. encoding is not building on Windows unfortunately, because of:
Here's related discussion. It sounds like this file wasn't and perhaps still isn't present on a stock Windows. And dmwit/encoding#14 mentions a build flag that can work around it. |
I opened dmwit/encoding#26 |
I applied the workaround. Though I don't know whether this fixes the issue for cabal on windows. Is this (cabal+windows) even part of our target developer base? |
Yes, being easy to build with stack or cabal on any platform is a hledger requirement. Better if we can fix this upstream if possible. Here's another note for the doc: The following encodings are supported (case-insensitive, can also be written with inner spaces or hyphens): ASCII |
I added the list to the manual. With a link to the cases for the alternative spellings. |
@jokesper, beyond the scope of this PR but do you have any thoughts about wider support for text encodings - would we ideally support this for all input files ? Output files ? Or is it best left to external tools, with CSV reading being a special case because we are more often dealing with CSV from diverse sources ? |
I think having it just for CSVs is fine, since this is the common use case. It's unlikely to get a journal file in a different encoding, though what would be possible, since we changed from Expanding it to other files wouldn't be that hard, except for the design choice of where to document the encoding (command line parameter?). Though in my opinion it's really not needed at all for any other cases. What could be useful is in an csv export, though I don't quite know if this is really relevent. If anyone wants to use it in an CSV export please open an issue. |
An implementation for
encoding
in rules file:The cases are:
-f *.rules
-f *.csv
-f *.csv --rules *.rules
The second and third case are effectively the same. The problem is that for them we call a
Reader
srReadFn
value taking the contents rather then the filepath / handle.So we either would have to change
Reader
to take aFilePath
instead ofText
, which would break how we do Tests,or we add an extra field
rGetContents :: FilePath -> IO Text
.I would rather get some feedback on this before I implement a potentially bigger change.
If implemented this PR should: