Avoid resource leaking in scanners

# Clarification and motivation

## Problems

This is an issue that I kept in mind for a long time, and since xrefcheck is becoming a serious tool, I think we should have this issue handled.

It is a known minor issue of `readFile` that until the full produced lazy bytestring/text is processed, the file descriptor remains open.

An open file descriptor is a resource, and OSes tend to have a limit on the number of file descriptors that can be opened, and maybe a separate limit per process too. Meaning that we should be cautious to not exceed these limits.

In the markdown scanner, we use `readFile` and seemingly do not enforce the computation (we apply parsing, but its result is not forced), meaning that the file descriptor won't be closed before the result of scanning being requested, and in case of large repositories this may be a severe issue.

---

Another issue similar to that: our markdown parser works on a strict `Text`, meaning that during parsing the entire file's content will be kept in memory, and memory can be treated as a resource too.

How much severe this issue is? I'm not sure, documentation usually does not take much space, probably even if it is auto-generated. Documentation size is fundamentally limited by how much a human being can read, so we do not expect dozen of gigabytes here, let's further throw this concern away. 

## Proposed solution

Conceptually, we want two things:
1. Minimize the time between `readFile` start and full parsing, as this is the time when we hold the resource (first the opened descriptor, then file content in memory).
2. Make parsing of various files as parallel as possible, in theory.

So from a `ScanAction` object we probably expect that it reads the file as quickly as possible, maybe putting its full content in the memory, but is free to return a not fully evaluated result (`(FileInfo, [ScanError])` pair). Forcing it is the caller's responsibility. 

## What should be done in code

First, let's document the expectations from `ScanAction`, add them to haddock of this type.

Next, we should make sure that after `readFile` call finishes, the file is fully read.

Since we do not parallelize file reads (in case of I/O against the filesystem this AFAIU does not make much sense), this will automatically resolve our issue with too many opened file descriptors.

# Acceptance criteria

* Documentation of `ScanAction` is updated to include the mentioned comments about laziness of each inner action (file read and parsing).
* The current implementation of `ScanAction` is updated respectively.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Avoid resource leaking in scanners #220

Clarification and motivation

Problems

Proposed solution

What should be done in code

Acceptance criteria

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Avoid resource leaking in scanners #220

Description

Clarification and motivation

Problems

Proposed solution

What should be done in code

Acceptance criteria

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions