Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement FileDescriptor.memoryMap, FileDescriptor.memoryUnmap, and FileDescriptor.memorySync #68

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jsflax
Copy link

@jsflax jsflax commented Nov 7, 2021

Add Swift implementations for mmap, munmap, and msync:

let swiftString = "swift"
let map = try fd.memoryMap(length: swiftString.count, pageOffset: 0, kind: .shared, protection: [.write])
let ptr = map.assumingMemoryBound(to: String.self)
// Update the mapping
ptr.pointee = swiftString
// Flush changes to the file system
try fd.memorySync(memoryMap: map, length: swiftString.count, kind: .synchronous)
// Delete the mapping
try fd.memoryUnmap(memoryMap: map, length: swiftString.count)

// Read in the file and validate the string was written correctly
let readBytes = try Array<CChar>(unsafeUninitializedCapacity: swiftString.count + 1) { (buf, count) in
    count = try fd.read(into: UnsafeMutableRawBufferPointer(buf))
}
XCTAssertEqual(String(validatingPlatformString: readBytes), swiftString)

This PR adds all three methods because the ability to sync and delete mappings are sensible requirements of being able to create them.

Add FileDescriptor.MemoryProtection to describe the desired memory protection of a mapping. The memoryMap method accepts these values as an array to be bitwise OR'd by the implementation.

Add shared and private for FileDescriptor.MemoryMapKind. There are more C flags available, but this seemed like the simplest starting point.

Add synchronous and asynchronous FileDescriptor.MemorySyncKind. The memorySync method accepts either one of these values and an invalidateOtherMappings boolean that is bitwise AND'd in the implementation to represent the appropriate C flag.

Add internal constants for flags/values related to mmap, munmap, and msync.

Add additional methods for more generic mocking and error handling. This seemed like a good addition for system calls that do not return CInt.

This PR does not include a Windows implementation.

Open question: Should memory mappings be a unique type?

Arguments in favor:

  1. Memory mappings can outlive FileDescriptors.
  2. Unmapping an UnsafeMutableRawPointer leaves you with an unusable instance with no way to ascertain that it has been unmapped.

Arguments against:

  1. For the most elegant API for this, it would add a series of structs/overhead: probably a MemoryMapRaw and MemoryMap<T> that have a very similar API to UnsafeMutableRawPointer and UnsafeMutablePointer<T>. The added methods would be sync, unmap, and isUnmapped. This is more precise than what currently exists in this PR at the cost of overhead.

Copy link
Member

@lorentey lorentey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very much welcome. System definitely needs mmap -- but we should expose its full power!

I did an initial review pass and noted many suggestions for improvements.

To correctly use mmap, we probably also need to expose a way to retrieve the page size (via getpagesize, sysconf, sysctl, or one of a myriad other ways I'm sure). This could technically be deferred, but without a known page size, some of mmap's use cases would be a bit limited.

@_alwaysEmitIntoClient
private init(_ raw: CInt) { self.init(rawValue: raw) }

/// Pages may not be accessed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs should specify the original C names of these constants (and types, where appropriate), to help orientate people who are already familiar with them, and to aid searching.

Suggested change
/// Pages may not be accessed.
/// Pages may not be accessed.
///
/// The corresponding C constant is `PROT_NONE`.

(This applies to all of these.)

@_alwaysEmitIntoClient
public static var `private`: MemoryMapKind { MemoryMapKind(rawValue: _MAP_PRIVATE) }

// TODO: There are several other MemoryMapKinds.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should default to exposing every flag that is present in the underlying system headers. (This will be different on every supported platform.)

E.g., on Darwin, we have:

#define MAP_FIXED        0x0010 /* [MF|SHM] interpret addr exactly */
#define MAP_RENAME       0x0020 /* Sun: rename private pages to file */
#define MAP_NORESERVE    0x0040 /* Sun: don't reserve needed swap area */
#define MAP_RESERVED0080 0x0080 /* previously unimplemented MAP_INHERIT */
#define MAP_NOEXTEND     0x0100 /* for MAP_FILE, don't change file size */
#define MAP_HASSEMAPHORE 0x0200 /* region may contain semaphores */
#define MAP_NOCACHE      0x0400 /* don't cache pages for this mapping */
#define MAP_JIT          0x0800 /* Allocate a region that will be used for JIT purposes */
#define MAP_FILE        0x0000  /* map from file (default) */
#define MAP_ANON        0x1000  /* allocated from memory, swap space */
#define MAP_ANONYMOUS   MAP_ANON
#define MAP_RESILIENT_CODESIGN  0x2000 /* no code-signing failures */
#define MAP_RESILIENT_MEDIA     0x4000 /* no backing-store failures */
#define MAP_32BIT       0x8000          /* Return virtual addresses <4G only */
#define MAP_TRANSLATED_ALLOW_EXECUTE 0x20000 /* allow execute in translated processes */
#define MAP_UNIX03       0x40000 /* UNIX03 compliance */

/// visible to other processes mapping the same region, and whether
/// updates are carried through to the underlying file. This
/// behavior is determined by exactly one flag.
public struct MemoryMapKind: RawRepresentable, Hashable, Codable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

POSIX calls these "flags"; we should follow the same terminology.

Suggested change
public struct MemoryMapKind: RawRepresentable, Hashable, Codable {
public struct MemoryMapFlags: RawRepresentable, Hashable, Codable {

Like MemoryProtection, this needs to also be an OptionSet.

public static var write: MemoryProtection { MemoryProtection(rawValue: _PROT_WRITE) }
/// Pages may be executed.
@_alwaysEmitIntoClient
public static var executed: MemoryProtection { MemoryProtection(rawValue: _PROT_EXEC) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static var executed: MemoryProtection { MemoryProtection(rawValue: _PROT_EXEC) }
public static var execute: MemoryProtection { MemoryProtection(rawValue: _PROT_EXEC) }

/// mapping (and must not conflict with the open mode of the file).
/// Flags can be the bitwise OR of one or more of each case.
@frozen
public struct MemoryProtection: RawRepresentable, Hashable, Codable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be an OptionSet!

/// Specifies that an update be scheduled, but the call
/// returns immediately.
@_alwaysEmitIntoClient
public static var asynchronous: MemorySyncKind { MemorySyncKind(rawValue: _MS_ASYNC) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also want MS_INVALIDATE here.

On Darwin, we probably also want to expose MS_KILLPAGES and MS_DEACTIVATE.

Comment on lines +498 to +500
public func memoryMap(
length: Int, pageOffset: Int, kind: MemoryMapKind, protection: [MemoryProtection]
) throws -> UnsafeMutableRawPointer {
Copy link
Member

@lorentey lorentey Nov 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs significant changes.

  • I think mapMemory would be a better name for this function.
  • MemoryProtection needs to be an OptionSet. The protection argument must not be an array.
  • The term pageOffset (and its documentation above) does not match what mmap's offset argument means -- it's an offset into the file object referenced by the file descriptor.
  • The type of the offset must be Int64, to match FileDescriptor.seek.
  • If we want to reorder arguments this way, then the offset ought to precede the length, not vice versa.
  • We must expose mmap's addr parameter (as an optional UnsafeMutableRawPointer, with a nil default).
  • The original mmap's fd parameter is optional. (MAP_ANONYMOUS mappings are very important, and they do not take a file descriptor.) This means that we must either move this method out of FileDescriptor, or we must introduce a second, top-level mapMemory that's dedicated to the anonymous case.
    Note that for anonymous maps, Darwin uses the fd argument as an extra information channel (see VM_FLAGS_PURGABLE, VM_MAKE_TAG()) -- this may be an argument in favor of spinning of a separate function for that use case.
  • mmap can technically be used to map memory at address zero, i.e., the null pointer, which isn't representable by an UnsafeMutableRawPointer. This is an extremely niche edge case, but I think it would make sense to change the return type to an optional to reflect this.

Here is one way to embrace these suggestions:

public struct MemoryProtection { ... }
public struct MemoryMapFlags { ... }

extension FileDescriptor {
  public func mapMemory(
    at address: UnsafeMutableRawPointer? = nil,
    from offset: Int64,
    length: Int,
    protection: MemoryProtection,
    flags: MemoryMapFlags,
  ) throws -> UnsafeMutableRawPointer?
}

public struct MachVMFlags: RawRepresentable {
  public var rawValue: CInt
  public init?(rawValue: CInt)
  public static var purgeable: Self { /* VM_FLAGS_PURGABLE */ }
  public static func tag(_ value: UInt8) -> Self
}

public func mapAnonymousMemory(
  at address: UnsafeMutableRawPointer? = nil,
  length: Int,
  protection: MemoryProtection,
  flags: MemoryMapFlags,
  machVMFlags: MachVMFlags? = nil
) throws -> UnsafeMutableRawPointer?

(If we go in this direction, then both public variants must forward to the same @usableFromInline implementation.)

Given that either MAP_SHARED or MAP_PRIVATE must always be present in the flags, it may make sense to model these as something other than mere flags -- perhaps going as far as splitting mapMemory into two variants, mapSharedMemory and mapPrivateMemory. Then again, this would mean that we'd also need to split the anonymous variant, and this treads into slightly dangerous territory -- we could end up painting ourselves into a combinatorial explosion in case a platform decides to introduce new flavors in addition to these two.

/// - memoryMap: The memory map to unmap
/// - length: Amount in bytes to unmap.
@_alwaysEmitIntoClient
public func memoryUnmap(memoryMap: UnsafeMutableRawPointer, length: Int) throws {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this belongs under FileDescriptor. If we don't introduce a namespace for these, then this needs to be a top-level function.

I think unmapMemory would be a better name for this. The pointer argument needs to be optional and I think it would make sense to label it at.

Suggested change
public func memoryUnmap(memoryMap: UnsafeMutableRawPointer, length: Int) throws {
public func unmapMemory(
at address: UnsafeMutableRawPointer?,
length: Int
) throws

Comment on lines +553 to +558
public func memorySync(
memoryMap: UnsafeMutableRawPointer,
length: Int,
kind: MemorySyncKind,
invalidateOtherMappings: Bool = false
) throws {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this belongs under FileDescriptor. If we don't introduce a namespace for these, then this needs to be a top-level function.

I don't think we should split the MS_INVALIDATE flag out of the rest -- there are additional (system-specific) flags that we also need to model, and it makes sense to keep things simple/consistent.

Suggested change
public func memorySync(
memoryMap: UnsafeMutableRawPointer,
length: Int,
kind: MemorySyncKind,
invalidateOtherMappings: Bool = false
) throws {
public func syncMemory(
at address: UnsafeMutableRawPointer,
length: Int,
flags: MemorySyncFlags
) throws {

We can also have this function take an UnsafeMutableRawBuffer instead of the first two arguments. (I'm not sure if that would be more or less convenient though. If we take this route, we probably need to adjust the return value of the mmap wrappers accordingly, as well as the parameters of munmap etc.

/// Determines whether memory sync should be
/// synchronous, asynchronous.
@frozen
public struct MemorySyncKind: RawRepresentable, Hashable, Codable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like MemoryMapKind, I think this should be modeling the original (open-ended) flags. It ought to be called MemorySyncFlags, and it needs to be an OptionSet, not an enum-like struct.

@lorentey
Copy link
Member

There is this whole family of address space-related functionality that we can't currently group under a specific type. This includes the anonymous flavor of mmap, as well as munmap, msync, madvise etc. (It would also be great if we could expose a way to enumerate existing memory mappings, but I have no idea if there are public APIs for that, or even if that would be a good fit for System.)

@milseman are you okay with just pulling these in as top-level functions?

We could choose to instead define a namespacing enum, and add all of these as members:

enum Memory {
  static var pageSize: Int { get }
  // properties for getting the cache line size, cache sizes etc.

  func map(at:, length:, protection:, flags:, from:, offset:) throws -> UMRP
  func unmap(at:, length:) throws
  func protect(at:, length:, protection:) throws
  func sync(at:, length: flags:) throws

  // BSD/Darwinisms
  func advise(at:, length:, advice:) throws
  func inherit(at:, length:, inherit:) throws
  func isInCore(at:, length:) throws -> ??? // mincore (BSD)

  // Realtime stuff
  func lock(at:, length:) throws
  func unlock(at:, length:) throws
  func lockAll(flags:) throws
  func unlockAll() throws
}

This could be an attractive option, if for no other reason but to move these really low-level functions out of highly visible namespaces, such as FileDescriptor. (I believe (hope?) the number of Swift programs that have to use file descriptors is likely to heavily outweigh those who need to deal with memory mappings. I could be wrong, though.)

@milseman
Copy link
Contributor

For sysconf, we have #33, which we can pull stuff from any time. Those sketches also have the general need for a namespace for global functions. Should we have many or just one? E.g. enum FileSystem {}, enum SystemConfig {}, ...

@lorentey
Copy link
Member

Oh, good point! I forgot we wanted to introduce SystemConfig for that. The page size etc. information feels like a good fit for it.

I think if we add one of these, we may as well add as many as we need.

(I remember being worried about adding file operations like copy/rename/remove into a FileSystem namespace rather than having them directly on FilePath, because that is such a convenient place for them. But then again, perhaps those convenience functions would be better defined in a separate module, with System restricting itself to basic syscalls like creat/unlink/rmdir/rename etc, which could just as well go under a namespace.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants