From 4001d18ac340c9361e78a50bba7e90d089c08a31 Mon Sep 17 00:00:00 2001 From: Dimitri B Date: Tue, 20 Dec 2022 10:56:39 +0100 Subject: [PATCH] Fix grammar, formatting, and make shorter sentences in Chapter 5 --- Chapter 5: Memory Management.md | 207 ++++++++++++++++---------------- 1 file changed, 103 insertions(+), 104 deletions(-) diff --git a/Chapter 5: Memory Management.md b/Chapter 5: Memory Management.md index e1fd2c2..b3e117a 100644 --- a/Chapter 5: Memory Management.md +++ b/Chapter 5: Memory Management.md @@ -6,110 +6,110 @@ author: 'Mohammed Q. Hussain' # Chapter 5: Memory Management {#ch-memory-management} ## Introduction -It's well-known to us right now that one of most important aspects in modern operating systems is protecting the memory in a way that doesn't allow a process to access or write to the memory of another process, furthermore, the memory of the kernel should be protected from the running processes, that is, they should be prevented from accessing directly the memory of the kernel or writing to the memory of the kernel. When we use the term *memory of the kernel* or *memory of the process*, we mean the region of the main memory that is being used by the kernel or the process and all of its data or code is stored in this region of the memory. +By now, we're very well aware that one of the most important aspects in modern operating systems is to protect the memory in a way that doesn't allow a process to read from or write to the memory of another process. Furthermore, the kernel memory should be protected from the running processes, that is, they should be prevented from directly reading from it or writing to it. When we use the term *kernel memory* or *process memory*, we mean the region of the main memory that is being used by the kernel or the process where all of its data and code is stored. -In chapter \ref{ch-x86}, we have presented the distinction between the logical view and physical view of the memory and one of the logical views of the memory has been presented on the same chapter, this logical view was segmented-memory model. We have seen how the hardware has employed the protection techniques to provide memory protection and protect the segments from each other. In the same chapter, we have presented another logical view of the memory, it is flat-memory model, which is exactly same as the physical view of the memory. In this view, the memory is a big bunch of contiguous bytes and each byte has its unique address that can be used to refer to this byte in order to read it or to write to it. +In chapter \ref{ch-x86}, we have presented the distinction between the logical view and the physical view of the memory. One of the logical views of the memory has been presented in the same chapter, this logical view was the segmented-memory model. We've seen how to employ the hardware protection techniques to provide memory protection and protect the segments from each other. In the same chapter, we've presented another logical view of the memory, the flat-memory model, which is exactly the same as the physical view of the memory. In this view, the memory was a big chunk of contiguous bytes and each byte had its unique address that could be used to refer to this byte in order to read from it or to write to it. -We know that modern operating systems use the flat-memory model and based on that we decided to use this model on 539kernel instead of the segmented-memory model. Deciding which model to use is the job of the kernelist. However, unlike segmentation, when we introduced the flat-memory model, we haven't shown how the memory can be protected in it, in this chapter we present one of the methods that can be used to implement memory protection in flat-memory model. This technique is known as *paging*, it is a well-known technique that is used widely by modern operating systems and it has a hardware support in x86 architecture. +We know that modern operating systems use the flat-memory model and based on that we decided to use this model in the 539kernel instead of the segmented-memory model. Deciding which model to use is the job of the kernelist. However, unlike for segmentation, when we introduced the flat-memory model, we haven't shown how the memory can be protected in this mode. In this chapter we will present one of the techniques that can be used to implement memory protection in the flat-memory model. This technique is known as *paging*. It is a well-known technique that is used widely by modern operating systems and it has hardware support in the x86 architecture (and many other architectures). ## Paging in Theory -In paging, the memory of the process (before being loaded to the physical memory) is divided into a number of fixed size blocks known as *pages*, in the same manner, the physical memory is divided into blocks with the same fixed size, these blocks of physical memory are known as *page frames*. Figure @fig:28092021_0 shows an example of pages and page frames, as you can see in the figure, process `A` is divided into `n` pages and the main memory is divided into `n` page frames, please note that the both `n`s shouldn't necessarily be equal. Because both page and page frame have the same size, for example `4KB` ^[That is, each page is of size `4KB` and each page frame is of size `4KB`,], each page can be loaded exactly into one page frame. To load process `A` into the memory, each of its pages should be loaded into a page frame. A page can be loaded into any page frame, for example, let's assume we are loading page `0` of process `A` and the first free page frame that we found is page frame `30`, then, the page `0` can be loaded into page frame `30`. Of course, the pages of more than one process can be loaded into the page frames. +In paging, the memory of the process (before being loaded to the physical memory) is divided into a number of fixed size blocks known as *pages*. In the same manner, the physical memory is divided into blocks with the same fixed size, these blocks of physical memory are known as *page frames*. Figure @fig:28092021_0 shows an example of pages and page frames. As you can see in the figure, process `A` is divided into `n` pages and the main memory is divided into `n` page frames. Please note that the two `n`s don't necessarily have to be equal. Because both, pages and page frames have the same size, for example `4KB` ^[That is, each page is of size `4KB` and each page frame is of size `4KB`.], each page can be loaded exactly into one page frame. To load process `A` into the memory, each of its pages should be loaded into a page frame. A page can be loaded into any page frame. For example, let's assume we are loading page `0` of process `A` and the first free page frame that we've found is the page frame `30`. Then, the page `0` can be loaded into page frame `30`. Of course, different page frames may contain pages of different processes but a single page frame may only contain the page of a single process. ![An Example that Shows Pages and Page Frames](Figures/memory-ch/Fig28092021_0.png){#fig:28092021_0 width=35%} -A data structure known as *page table* is used to maintain these information about the mapping between pages and their corresponding page frame. Each process has its own page table, in our example of process `A`, the information that tells the processor that page `0` can be found in page frame `30` is stored in process `A`'s page table. In paging, any memory address generated by the process to read or write some data to the memory will be a logical memory address ^[In x86, this logical memory address is known as *linear memory address* as we have discussed earlier in chapter \ref{ch-x86}.], that is, a not real not physical memory address, it has a meaning for the process itself, but for the processor it should be translated to the corresponding physical memory address. The page table of the process is used to perform this translation. +A data structure known as the *page table* is used to maintain the information about the mapping between pages and their corresponding page frames. Each process has its own page table, in our example of process `A`, the information that tells the processor that page `0` can be found in page frame `30` is stored in process `A`'s page table. In the paging model, any memory address generated by the process to read some data from or to write some data to the memory will be a logical memory address ^[In x86, this logical memory address is known as the *linear memory address* as we have discussed earlier in chapter \ref{ch-x86}.], that is, not a real physical memory address. It will have a meaning for the process itself, but for the processor to understand it, it should be translated to the corresponding physical memory address. The page table of the process is used to perform this translation. -Basically, every generated logical memory address of a process running in a paging-enabled environment is composed of the following parts: the page number and the offset. For example, assume a system which employs paging with length of `2` bytes for memory addresses ^[In 32-bit x86 architecture, the length of memory address is `4 bytes = 32 bits`, that is, `2^32 bytes = 4GB` are addressable. An example of a memory address in this environment is `FFFFFFFFh` which is of length `4 bytes` and refers to the last byte of the memory.]. In this hypothetical system, the format of logical address is the following, the first byte represents the page number and the second byte represents the offset. Process `B` is a process that runs in that system, assume that it performed an instruction to read data from the following memory address `0150h`, this is a logical memory address that needs to be translated to the physical address to be able to get the required content. Based on the format of the logical memory addresses in this system, the first byte of the generated memory address which is `01h` represents the page, that means that the required data is stored in page `01h = 1d` of the process `B`, but where exactly? According the the generated address, it is on the offset `50h = 80d` of that page. +Basically, every generated logical memory address of a process running in a paging-enabled environment is composed of the following parts: the page number and the offset into the page. For example, consider a system which employs paging with length of `2` bytes for memory addresses ^[In the `32-bit` x86 architecture, the length of a memory address is `4 bytes = 32 bits`, that is, `2^32 bytes = 4GB` are addressable at maximum. An example of a memory address in this environment is `FFFFFFFFh` which is of length `4 bytes` and refers to the last byte of the memory.]. In this hypothetical system, the format of logical address is the following: the first byte represents the page number and the second byte represents the offset. Process `B` is a process that runs on that system. Assume that it performs an instruction to read data from the memory address `0150h`, this is a logical memory address that needs to be translated into a physical address for the processor to be able to get the required contents. Based on the aforementioned format of the logical memory addresses in this system, the first byte of the generated memory address, which is `01h` in our example, represents the page, that means that the requested data is stored in page `01h = 1d` of the process `B`, but where exactly? According the the generated address, it is in the offset `50h = 80d` of that page. -To perform the translation and get the physical memory address, we need to know in which page frame the page `1` of process `B` is loaded. To answer this question the page table of process `B` should be consulted. For each page, there is an entry in the page table that contains necessary information, and of course one of those information is the page frame that this page is stored on. This information can be the page frame number or the base memory address of the page frame, it doesn't matter since we can get the base memory address of the page frame by knowing its number and the size of page frames. After getting the base memory address, we can combine it with the required offset to get the physical memory address of the data in question. The hardware that is responsible for the process of memory address translation is known as *memory management unit* (MMU). +To perform the translation and get the physical memory address, we need to know in which page frame the page `1` of the process `B` is loaded. To answer this question the page table of the process `B` should be consulted. For each page, there is an entry in the page table that contains the necessary information, and of course part of this information is the page frame that this page is loaded in. This information can be the page frame number or the base memory address of the page frame. It doesn't matter since we can get the base memory address of the page frame by knowing its number and the size of the page frames. After getting the base memory address, we can combine it with the required offset to get the physical memory address of the data in question. The hardware that is responsible for the process of memory address translation is known as the *memory management unit* (`MMU`). -Sometimes, the page table is divided into more than one level. For example, in two-level page table, the entries of the main page table refers to an entry on another page table that contains the the base address of the page frame, x86 architecture uses this design, so we are going to see it on details later on. The reason of using such design is the large size of page tables for a large main memory. As you know, the page table is a data structure that should reside in the main memory itself, and for each page there is an entry in the page table, in x86 for example, the size of this entry is `8` bytes. Furthermore, the size a page tend to be small, `4KB` is a real example of page size. So, if `4GB` is needed to be represented by a page table with `8` bytes of entry size, then `8MB` is needed for this page table which is not a small size for a data structure needed for each process in the system. +Sometimes, the page table is divided into more than one level. For example, in a two-level page table, the entries of the main page table refer to entries in an other page table that contain the the base addresses of the page frames. The x86 architecture uses this design, so we are going to examine it in detail later on. The reason for using such design is the large size of the page tables for a large main memory. As you know, the page table is a data structure that should reside in the main memory itself, and for each page there must an entry in the page table. In x86 for example, the size of this entry is `8` bytes. Furthermore, the size of a page tends to be small, `4KB` is a real-world example of common page size. So, if `4GB` of memory must be represented by a page table that has entries of `8` bytes size, then `8MB` of memory is needed for this page table. Needless to say that this is not a small size for a data structure that needs to be created for each process in the system. -It should be clear by now how paging provides memory protection. Any memory address that is generated by the process will be translated to the physical memory by the hardware, there is no way for the process to access the data of any other process since it knows nothing about the physical memory and how to reach it. Consider process `C` that runs on the same hypothetical system that we have described above, in the memory location that's represented by the physical memory address `A1 9Bh` there is some important data which is stored by the kernel and process `C` wishes to read it. If process `C` tries the normal way to read from the memory address `A1 9Bh` the `MMU` of the system is going to consider it as a logical memory address, so, the page table of process `C` is used to identify in which page frame that page `00A1h` of process `C` is stored. As you can see, the process knows nothing about the outside world and cannot gain this knowledge, it thinks it is the only process in the memory, and any memory address it generates belongs to itself and it cannot interfere the translation process or modify its own page table. +It should be clear by now how paging provides memory protection. Any memory address that is generated by the process will be translated to the physical memory by the hardware, there is no way for the process to access the data of any other process since it knows nothing about the physical memory layout and how to reach parts of it. Consider the process `C` that runs on the same hypothetical system that we have described above. At the memory location that's represented by the physical memory address `A1 9Bh` there is some important data which is stored by the kernel and process `C` wishes to read it. If process `C` tries to go the normal way to read from the memory address `A1 9Bh` the `MMU` of the system is going to interpret it as a logical memory address, so, the page table of process `C` is used to identify in which page frame the page `00A1h` of the process `C` is stored. As you can see, the process knows nothing about the outside world and cannot gain this knowledge, it thinks it is the only process in the memory, and any memory address it generates belongs to it, it can't interfere with the translation process or modify its own page table. ## Virtual Memory -In multitasking system, beside the need of memory protection, also, the main memory should be utilized as much as we can. In such environment, multiple processes should reside in the main memory and at some point of time the main memory will become full and the kernel will not be able to create any new process before stopping a currently running process to use its space in the main memory. +In multitasking systems, besides the need for memory protection, the main memory should also be utilized as much as possible. In such environments, multiple processes will reside in the main memory and at some point in time the main memory will become full and the kernel won't be able to create new processes before stopping a currently running process to use its space in the main memory. -There are many situations where the current processes are occupying a space from the main memory but doesn't really use this space, that wastes this space since it can be used to load a process that really needs this space. An example of these situations is when the process is idle, that is, doing nothing but waiting for some external action (e.g. a button click), in this case the only active code of this process that should be in the main memory is the code that makes the process waits for an event. Furthermore, modern software tend to be too large, there are a lot of routines in a code of modern software that might not be called at all during executing that software, loading the code of those routines into the main memory wastes the occupied space, the routines will be there in the memory, taking some space that can be used for more useful purposes and they will never be called. +There are many situations where a running process can occupy space from the main memory without really using it, that wastes this space since it can be used more effectively to load data for a process that really needs this space. An example of such a situation is when the process is idle, that is, doing nothing but waiting for some external action (e.g. a button click). In this case the only active code of this process that should be in the main memory is the code that makes the process wait for an event. Furthermore, modern programs tend to be too large to fit entirely into the main memory, there are a lot of routines in the code of modern programs that might not be called at all during the runtime of a program. Loading the code of those routines into the main memory wastes the occupied space, unused routines will be loaded into memory, occupying space that can be better utilized for more useful purposes. -Virtual memory is a memory management technique that can be used to utilize the main memory. You might noticed in modern operating systems, you can open any number of software in a given time and you never get a message from the operating system that tells you that there is no enough space in the main memory although the software that you are running need a large space of memory (modern web browsers are obvious example), how can that be achieved? Well, by using virtual memory which depends on paging that we have discussed earlier. +Virtual memory is a memory management technique that can be used to better utilize the main memory. You might have noticed that in modern operating systems, you can open any number of programs at the same time and you never get a message from the operating system that tells you that there is not enough space in the main memory although the programs that you're running need a lot of space in the memory (modern web browsers are an obvious example), how can this be achieved? Well, by using virtual memory which depends on paging that we've discussed earlier. -Regarding to paging, we may ask ourselves an important question, should all process' pages be loaded into the memory? In fact, not really. As we have said, the binary code of the software may have a lot of routines that may not be called, so, the pages that contain these routines should not be loaded into the memory since they will not be used, instead, this space can be used for another pages that should really be on the memory. To realize that, when the software is loaded for the first time, only the page the contains the entry code of the software (e.g. `main` function in C) is loaded into the memory, not any other page of that software. When some instruction in the entry code tries to read data or call a routine that doesn't exist on the loaded page, then, the needed page will be loaded into the main memory and that piece which was not there can be used after this loading, that is, any page of the process will not be loaded into a free page frame unless it's really needed, otherwise, it will be waiting on the disk, this is known as *demand paging*. +With regard to paging, we may ask ourselves an important question: should all process' pages be loaded into memory? In fact, it's not really necessary. As we've said, a program's binary code may have a lot of routines that may never be called, so, the pages that contain these routines should not be loaded into memory since they will not be used, instead, this space can be kept unused until another process actually needs them. To realize that, when the program is loaded for the first time, only the page that contains the entry code of the program (e.g. the `main` function in C) is loaded into memory, not any other page of that program. If some instruction in the entry code tries to read data or call a routine that doesn't exist in the loaded page, then the needed page will be loaded into the main memory and the previously missing piece will then be available to be used by the program. This means that only pages that are actually used by a process will be loaded into page frames, all the other pages will wait on the disk until they are required by the program flow, this is known as *demand paging*. -By employing demand paging, virtual memory saves a lot of memory space. Furthermore, virtual memory uses the disk for two things, first, to store the pages that are no demanded yet, they should be there so anytime one of them is needed, it can be loaded from the disk to the main memory. Second, the disk is used to implement an operation known as *swapping*. +By employing demand paging, virtual memory saves a lot of memory space. Furthermore, virtual memory uses the disk for two other things. First, it stores pages from the hard disk that weren't demanded yet in a cache by utilizing the unused page frames, they are kept there so anytime one of them is needed, it can be mapped from the cache into the process' address space. Second, the disk is used to implement an operation known as *swapping*. -Even with demand paging, at some point of time, the main memory will become full, in this situation, when a page is needed to be loaded the kernel that implements virtual memory should load it, even if the memory is full! How? The answer is by using the swapping operation, one of page frames should be chosen to be removed from the main memory, this frame in this case is known as *victim frame*, the content of this frame is written into the disk, it is being *swapped out*, and its place in the main memory is used for the new page that should be loaded. The swapped out page is not in the main memory anymore, so, when it is needed again, it should be reloaded from the disk to the main memory. +Even with demand paging, at some point in time, the main memory will become full. In a situation where a page is needed but there are no free page frames left, how should a kernel that implements virtual memory proceed? The answer is: by using the swapping operation. One of the occupied page frames should be chosen to be removed from the main memory, this frame is then called the *victim frame*. The contents of this frame are written to disk, which means that it's being *swapped out*, and its place in the main memory is then used for the new page that should be loaded. The swapped out page is not in the main memory anymore, so, when it's needed again, it should be reloaded from the disk to the main memory. -The problem of which victim frame should be chosen is known as *page replacement* problem, that is, when there is no free page frame and a new page should be loaded, which page frame should we make free to be able to load the new page. Of course, there are many page replacement algorithms out there, one of them is *first-in first-out* in which the page frame that was the first one to be loaded among the current page frames is chosen as a victim frame. Another well-known algorithm is *least recently used* (LRU), in this algorithm, everytime the page is accessed, the time of access is stored, when a victim frame is needed, then it will be the oldest one that has been accessed. +The problem of choosing the victim frame is known as the *page replacement* problem, that is, when there are no free page frames left and a new page should be loaded, we need to decide which page frame should be freed up to be able to load the new page. Of course, there are many page replacement algorithms out there, one of them is *first-in first-out* in which the page frame that was the first one to be loaded among the set of currently occupied page frames is chosen as a victim frame. Another well-known algorithm is *least recently used* (LRU), with this algorithm, every time the page is accessed, the access time is stored, when a victim frame is needed, then the page frame with the oldest access time will be chosen. -The page table can be used to store a bunch of information that are useful for virtual memory. First, a page table usually has a flag known as *present*, by using this flag, the processor can tell if the page that the process tries to access is loaded into the memory or not, if it is loaded, then a normal access operation is performed, but when the present flag indicates that this page is not in the memory, what should be done? For sure, the page should be loaded from the disk to the memory. Usually, the processor itself doesn't perform this loading operation, instead, it generates an exception known as *page fault* and makes the kernel deal with it. A page fault tells the kernel that one of the processes tried to access a not-loaded page, so it needs to be loaded. As you can see, page faults help in implementing demand paging, anytime a page needs to be loaded into the memory then a page fault will be generated. +The page table can be used to store different information that is useful for virtual memory. First, a page table entry usually has a flag known as *present*, by using this flag, the processor can tell if the page that the process tries to access is loaded into the memory or not. If it's loaded, then a normal access operation is performed, but when the present flag indicates that this page isn't in the main memory, what should be done then? For sure, the page should be loaded from the disk to the memory. Usually, it's not the processor itself that performs this loading operation, instead, it generates an exception known as a *page fault* and makes the kernel deal with it. A page fault tells the kernel that one of the processes tried to access a not-loaded page, so it has to make sure that this page is loaded. As you can imagine, page faults help in implementing demand paging, anytime a page needs to be loaded into the memory, a page fault will be generated. -With this mechanism that virtual memory uses to manage the memory, we can make a process to own a size of memory that is not even available on the system. For example, in x86 architecture with systems that employ virtual memory, each process thinks that it owns `4GB` of main memory, even if the system has only `2GB` of RAM for instance. This is possible due to demand paging and page replacements. Of course, a large size of memory being available for the process, makes it easier for the programmers to write their code. +By using this mechanism to manage the virtual memory, we can present the processes on the system to have much more memory than is actually physically available. For example, in the x86 architecture with systems that employ virtual memory, each process thinks that it has `4GB` of main memory available to it, even if the system has only `2GB` of RAM for instance. This is possible due to demand paging and page replacements/swapping. Of course, a large size of available process memory makes it easier for the programmers to write their code. ## Paging in x86 -In x86, unlike segmentation which is enabled by default, paging is disabled by default, also, paging is not available on real mode, in `32-bit` environment it can only be used in protected-mode ^[In 64-bit architecture paging is available in both protected-mode and long-mode.]. If paging is intended to be used, the kernel should switch to protected-mode first, then, enables paging through a special register in x86 known as `CR0` which is one of *control registers* of x86 architecture. The last bit of `CR0` is the one that decides if paging is enabled, when its value is `1`, or disabled when its value is `0`. +In x86, paging is disabled by default, unlike segmentation which is enabled by default. Also, paging isn't available in real mode, in a `32-bit` environment it can only be used in protected-mode ^[In the `64-bit` architecture paging is available in both, the protected-mode and the long-mode.]. If paging is intended to be used, the kernel should switch to protected-mode first, then enable paging through a special register in x86 known as `CR0` which is one of the *control registers* of the x86 architecture. The last bit of `CR0` is the one that decides if paging is enabled, it's enabled when its value is `1`, or disabled when its value is `0`. -There are three *paging modes* in x86 a kernelist can chooses from, the difference between these three modes is basically related to the size of memory addresses and the available sizes of a page. These modes are *32-bit paging*, *PAE paging* (PAE stands for "Physical Address Extension") and *4-level paging* which is available for `64-bit` environment only. +There are three *paging modes* in x86 a kernelist can choose from. The difference between these three modes is basically related to the size of the memory addresses and the available sizes of a page. These modes are *`32-bit` paging*, *PAE paging* (PAE stands for "Physical Address Extension") and *4-level paging* which is available for the `64-bit` environment only. -Beside the last bit in `CR0`, there are another two bits that can be used to decide the current paging mode. The first one is known as *PAE bit* which is the fifth bit of the control register `CR4`. When the value of this bit is `1` that means PEA mode is enabled, while `0` means otherwise. The second bit is known as `LME` in a register known as `IA32_EFER`, setting the value of this register to `1` makes the processor to switch from the protected-mode (`32-bit` environment) to the long-mode (`64-bit` environment) and when the value of `PAE` bit is `1`, then 4-level mode will be enabled. +Beside the last bit in `CR0`, there are another two bits that can be used to decide the current paging mode. The first one is known as the *`PAE` bit* which is the fifth bit of the control register `CR4`. When the value of this bit is `1` it means that the PAE mode is enabled, while `0` means otherwise. The second relevant bit is known as `LME` in a register known as `IA32_EFER`, setting the value of this bit to `1` makes the processor switch from the protected-mode (`32-bit` environment) into the long-mode (`64-bit` environment) and when the value of the `PAE` bit is `1`, then the 4-level mode will be enabled. -In our next discussions, we are going to focus on `32-bit` paging mode which is the most basic one that is available for `32-bit` environment. In this mode, there are two available sizes for a page `4KB` and `4MB`, also, `4GB` of memory is addressable in this mode. +In our next discussions, we are going to focus on the `32-bit` paging mode which is the most basic one that is available for a `32-bit` environment. In this mode, there are two available sizes for a page: `4KB` and `4MB`. Also, overall, `4GB` of memory are addressable in this mode. ### The Structure of Linear Memory Address -Previously, we have discussed a part of the translation process of memory addresses in x86. To sum what we have already discussed up, any memory address that is generated by an executing code in x86 is known as a logical address, which is not the real memory address that contains the required data. This logical address need to be translated to get the real address. The first step of this translation process is to use segment descriptors to translate a logical address to a linear address by using the mechanism that we have already mentioned in chapter \ref{ch-x86}. When paging is disabled, the resulted linear address will be the physical (real) address that can be sent to the main memory to get the required data. On the other hand, when paging is enabled, the linear address needs a further translation step to obtain the physical memory address by using paging mechanism. To be able to perform this step of translation, a page table is used with the parts that compose the linear address. +Previously, we've discussed a part of the translation process of memory addresses in x86. To sum up what we've already discussed, any memory address that is generated by the executing code in x86 is known as a logical address, which is not the real memory address that contains the required data. This logical address needs to be translated to get the physical address. The first step of this translation process is to use the segment descriptors to translate a logical address to a linear address by using the mechanism that we've already mentioned in chapter \ref{ch-x86}. When paging is disabled, the resulting linear address will be the physical (real) address that can be sent to the main memory to get the requested data. On the other hand, when paging is enabled, the linear address needs a further translation step to obtain the physical memory address by using the paging mechanism. To be able to perform this step of the translation process, a page table is used for the parts that compose the linear address. -![The Structure of Linear Address](Figures/memory-ch/Fig091021_0.png){#fig:10062021_3 width=35%} +![The Structure of A Linear Address](Figures/memory-ch/Fig091021_0.png){#fig:10062021_3 width=35%} -Figure @fig:10062021_3 shows the structure of a linear address and its parts. As you can see, the size of a linear address is `32-bit` which is divided into three parts. The bits `22` to `31` represent a *page directory* entry, the bits `12` to `21` represent a page table entry and the bits `0` to `11` represent an offset that contains the required data within a page frame. For example, assume a linear address which is composed of the following: page directory entry `x`, page table entry `y` and offset `z`. That means that this linear address needs to read the offset `z` from a page that is represented by the entry `y` in the page table, and this page table is represented by the page directory entry `x`. +Figure @fig:10062021_3 shows the structure of a linear address and its parts. As you can see, the size of a linear address is `32-bit` which is divided into three parts. The bits `22` to `31` represent a *page directory* entry, the bits `12` to `21` represent a page table entry and the bits `0` to `11` represent an offset that contains the referenced data within a page frame. For example, assume a linear address which is composed of the following: page directory entry `x`, page table entry `y` and offset `z`. That means that this linear address needs to read the offset `z` from a page that is represented by the entry `y` in the page table, and this page table is represented by the page directory entry `x`. -As you can see here, unlike our previous discussion of page table, the one which is implemented in x86 is a two-level page table, the first level is known as *page directory* which is used to point to the second level which is a page table and each page table, as we know, points to a page frame. As we mentioned before, the reason of using multi-level page tables is to save some memory since the size of page tables tend to have relatively large sizes in modern architecture and given that each process needs its own page table, then, its better to use multi-level page table which allows us to load just the needed parts of a page table (in a way similar to paging) instead of loading the whole page table into the memory. +As you can see here, unlike our previous discussion of the page table, the page table that is implemented in x86 is a two-level page table. The first level is known as the *page directory* which is used to point to the second level which is a page table and each page table, as we know, points to a page frame. As we've mentioned before, the reason of using multi-level page tables is to save some memory since the the page tables tend to have relatively large sizes in modern architectures and given that each process needs its own page table, then its better to use multi-level page tables which allow us to load just the needed parts of a page table (in a way similar to paging) instead of loading the whole page table into the memory. ### Page Directory -The page directory in x86 can hold up to `1024` entries. Each entry points to a page table and each one of those page tables can hold up to `1024` entries which represent a process's pages. In other words, we can say that, for each process, there are more than one page table, and each one of those page tables is loaded in a different part of the main memory and the page directory of the process helps us in locating the page tables of a process. +The page directory in x86 can hold up to `1024` entries. Each entry points to a page table and each one of those page tables can hold up to `1024` entries which represent a process' pages. In other words, we can say that, for each process, there is more than one page table, and each one of those page tables is loaded in a different part of the main memory. The page directory of the process helps us in locating the page tables of a process. -As we have mentioned before, the page directory is the first level of x86's page table and each process has its own page directory. How the processor can find the current page directory, that is, the page directory of the current process? This can be done by using the register `CR3` which stores the base physical memory address of the current page directory. The first part of a linear address is an offset within the page directory, when an addition operation is performed between the first part of a linear address and the value on `CR3` the result will be the base memory address of the entry that represents a page table that contains an entry which represents the page that contains the required data. +As we have mentioned before, the page directory is the first level of x86's page table and each process has its own page directory. How can the processor find the current page directory, that is, the page directory of the current process? This can be done by using the register `CR3` which stores the base physical memory address of the current page directory. The first part of a linear address (bits `31` to `22`) is an offset within the page directory. The physical address in `CR3` is added to the first part of the linear address resulting in the physical memory address of the page directory entry that contains the physical base memory address of the page table. This page table then contains an entry which references the page that contains the required data. #### The Structure of a Page Directory Entry -The size of an entry in the page directory is `4` bytes (`32` bits) and its structure is shown in the figure @fig:10102021_0. The bits from `12` to `31` contain the physical memory address of the page table that this entry represent. Not all page tables that a page directory points to should be loaded into the main memory, instead, only the needed page tables, the rest are stored in a secondary storage until they are needed then they should be loaded. To be able to implement this mechanism, we need some place to store a flag that tells us whether the page table in question is loaded into the main memory or not, and that's exactly the job of bit `0` of a page directory entry, this bit is known as *present bit*, when its value is `1` that means the page table exists in the main memory, while the value `0` means otherwise. When an executing code tries to read a content from a page frame that its page table is not in the memory, the processor generates a page fault that tells the kernel to load this page table because it is needed right now. +The size of an entry in the page directory is `4` bytes (`32` bits) and its structure is shown in the figure @fig:10102021_0. The bits from `12` to `31` contain the physical memory address of the page table that this entry represents. Not all page tables that a page directory points to should be loaded into the main memory. Instead, only the needed ones are actually loaded, all the others are stored in the secondary storage until they are needed, only then should they be loaded. To be able to implement this mechanism, we need some place to store a flag that tells us whether the page table in question is loaded into the main memory or not, and that's exactly the job of bit `0` of a page directory entry. This bit is known as *present bit*, when its value is `1` it means the page table exists in the main memory, while the value `0` means otherwise. When the executing code tries to read contents from a page frame whose page table is not in memory, the processor generates a page fault that tells the kernel to load this page table because it is needed right now. -![The Structure of Page Directory Entry](Figures/memory-ch/Fig10102021_0.png){#fig:10102021_0 width=35%} +![The Structure of the Page Directory Entry](Figures/memory-ch/Fig10102021_0.png){#fig:10102021_0 width=35%} -When we have discussed segment descriptors, we have witnessed some bits that aim to provide additional protection for a segment. Paging in x86 also has bits that help in providing additional protection. Bit `1` in a page directory entry decides whether the page table that the entry points to is read-only when its value is `0` or if its writable `1`. Bit `2` decides whether the access to the page table that this entry points to is restricted to privileged code, that is, the code that runs on privilege level `0`, `1` and `2` when the bit's value is `0` or that the page table is also accessible by a non-privileged code, that is, the code that runs on privilege level `3`. +When we've discussed the segment descriptors, we've witnessed some bits that were aimed to provide additional protections for a segment. Paging in x86 also has bits that help in providing additional protections. Bit `1` in a page directory entry decides whether the page table that the entry points to is read-only when its value is `0` or if its writable when the value is `1`. Bit `2` decides whether the access to the page table that this entry points to is restricted to privileged code, that is, the code that runs in privilege level `0`, `1` and `2` when the bit's value is `0` or that the page table is also accessible by non-privileged code, that is, the code that runs in privilege level `3`. -Generally in computing, *caching* is a well-known technique. When caching is employed in a system, some data are fetched from a source and stored in a place which is faster to reach if compared to the source, these stored data are known as *cache*. The goal of caching is to make a frequently accessed data faster to obtain. Think of your web browser as an example of caching, when use visit a page ^[Please do not confuse a web page with a process page in this example.] in a website, the web browser fetches the images of that page from the source (the server of the website) and stores it in your own machine's storage device which is definitely too much faster to access if compared to a web server, when you visit the same website later, and the web browser encounters an image to be shown, it searches if it's cached, if so, this is known as *cache hit*, the image will be obtained from your storage device instead of the web server, if the image is not cached, this is known as *cache miss*, the image will be obtained from the web server to be shown and cached. +Generally in computing, *caching* is a well-known technique. When caching is employed in a system, some data is fetched from a source that's slow to reach and stored in a place which is faster to reach when compared to the source, such a storage is known as a *cache*. The goal of caching is to make frequently accessed data faster to obtain. Think of your web browser as an example of caching, when you visit a web page ^[Please do not confuse a web page with a process page in this example.], the web browser fetches the images of that page from the source (the server of the website) and stores it in your own machine's storage device which is definitely much faster to access when compared to a web server. When you visit the same website later, and the web browser encounters an image to be shown, it searches if it's cached, and if so, the image will be obtained from your storage device instead of the web server. This is known as a *cache hit*. If the image is not cached, the image will be obtained from the web server to be shown and cached. This is known as a *cache miss*. -The processor is not an exception, it also uses cache to make things faster. As you may noticed, the entries of page directories and page tables are frequently accessed, in the code of software a lot of memory accesses happen and with each memory access both page directory and pages tables need to be accessed. With this huge number of accesses to page table and given the fact that the main memory is too much slower than the processor, then some caching is needed, and that exactly what is done in x86, a part of the page directory and page tables are cached in an internal, small and fast memory inside the processor known as *translation lookaside buffer* (`TLB`), each time an entry of page table of directory is needed, this memory is checked first, if the needed entry is on it, that is, we got a cache hit, then it will be used. +The processor is not an exception, it also uses caches to make things faster. As you may have noticed, the entries of page directories and page tables are frequently accessed, the code of a program usually makes a lot of memory accesses and with each memory access both, the page directory and a page table needs to be accessed. With this huge number of accesses to page tables and given the fact that the main memory is lots of times slower than the processor, some caching is needed. And that's exactly what is done in x86, some parts of the page directory and some page tables are cached in an internal, small and fast memory inside of the processor known as the *translation lookaside buffer* (`TLB`). Each time a page table entry pointed to by a page directory entry is needed, this memory is checked first, if this memory contains the needed entry, which means we've got a cache hit, then it will be used. -In x86 paging, caching is controllable, say that for some reason, you wish to disable caching for a given entry, that can be done with bit `4` in a page directory entry. When the value of this bit is `1`, then the page table that is represented by this entry will not be cached by the processor, but the value `0` in this bit means otherwise. +In x86 paging, caching is controllable. Say that for some reason, you wish to disable caching for a given entry, that can be done with the bit `4` in a page directory entry. When the value of this bit is `1`, then the page table that is pointed to by this entry will not be cached by the processor, the value `0` in this bit means otherwise. -Unlike web browsers, the cached version of page table can be written to, for example, assume that page table `x` has been cached after using it in the first time and there is a page in this page table, call it `y`, that isn't loaded into the memory. We decided to load the page `y` which means present bit of the entry that represents this page should be changed in the page table `x`. To make things faster, instead of writing the changes to the page table `x` in the main memory (the source), these changes will be written to the cache which makes a difference between the cached data and the source data, that is, they are not identical anymore. This inconsistency between the two places that store the data should be resolved somehow, the obvious thing to do is to write these changes later also on the source. +Unlike with web browsers, the cached version of the page table can be written to. For example, assume that page table `x` has been cached after using it for the first time and there is a page in this page table, call it `y`, that isn't loaded into memory. We decide to load the page `y` which means the present bit of the entry that represents this page should be changed in the page table `x`. To make things faster, instead of writing the changes to the page table `x` in the main memory (the source), these changes will be written to the cache which introduces a difference between the cached data and the source data in memory, that is, they are not identical anymore. This inconsistency between the two places that store the data must be resolved somehow, the obvious thing to do is to also write these changes a little bit later into the source. -In caching context, the timing of writing the changes to the source is known as *write policy* and there are two available policies in x86 for page tables and directory caches, the first one is known as *write-through*, in this policy, the new data is written on both the cache and the source at same time. The second policy is known as *write-back*, in which the writing process is performed only on the cache, while writing the changes on the source is performed later, for example when we decide to clear the cache. Bit `3` of the page directory entry decides which write policy will be used for the cached data, the value `1` means write-through policy will be used, while the value `0` means write-back policy will be used. +In the context of caching, the timing of writing the changes in the cache back to the source is known as the *write policy* and there are two available policies in x86 for page tables and directory caches. The first one is known as the *write-through* policy, with this policy, the new data is written in both, the cache and the source, at the same time. The second policy is known as the *write-back* policy, with which the writing process is performed only in the cache, while writing the changes to the source is performed later, for example when we decide to clear the cache. Bit `3` of the page directory entry decides which write policy will be used for the cached data, the value `1` means the write-through policy will be used, while the value `0` means the write-back policy will be used. -As in segment descriptors, when a page table which is referred by a given page directory entry is accessed, there is a bit in the directory entry known as *access bit* which is the fifth bit in the entry. The processor sets the value `1` automatically when the page table is accessed. Setting the value to `0` for any reason is the responsibility of the kernel. +As in segment descriptors, when a page table which is referred to by a given page directory entry is accessed, there is a bit in the directory entry known as the *access bit* which is the fifth bit in the entry. The processor sets the value to `1` automatically when the page table is accessed. Setting the value to `0` for any reason is the responsibility of the kernel. -We have said earlier that `32-bit` paging in x86 provides us with two possible options for the size of a page, either `4KB` page or `4MB` page. The bit `7` in a page directory entry decides the size of the pages, when its value is `0` then the page size will be `4KB` while the value `1` means that the page size is `4MB`. There is a major difference between the two options. When the size of the page is `4MB`, the page table will be a normal one-level page table, which means that the page directory will not refer to a page table anymore, but it is going to refer to a page frame. When the size of the page is `4KB`, the two-level hierarchy will be employed. That makes sense, the number of entries that are needed to represent `4KB` pages are way more than the number of entries that are needed to represent `4MB` pages. However, in our discussion, we have focused (and will focus) on the case of `4KB` pages. Finally, the bits `6`, `8`, `9`, `10` and`11` in the page directory entry are ignored. +We've said earlier that `32-bit` paging in x86 provides us with two possible options for the size of a page, it's either a `4KB` page or a `4MB` page. The bit `7` in a page directory entry decides the size of the pages, when its value is `0` then the page size will be `4KB` while the value `1` means that the page size is `4MB`. There is a major difference between the two options. When the size of the page is `4MB`, the page table will be a normal one-level page table, which means that the page directory will not refer to a page table anymore, but it is going to refer directly to a page frame. When the size of the page is `4KB`, the two-level hierarchy will be employed. That makes sense, the number of entries that are needed to represent `4KB` pages is way higher than the number of entries that are needed to represent `4MB` pages. However, in our discussion, we've focused (and will focus) on the case of `4KB` pages. Finally, the bits `6`, `8`, `9`, `10` and `11` in the page directory entry are ignored. ### Page Table -In `4KB` pages environment, a page table is referred to by an entry in the page directory. As mentioned earlier, each page table can hold `1024` entries. After finding the base memory address of the page table in question by consulting the page directory, this base memory address will be used with the second part of the linear address to figure out which page table entry should the processor consult to reach the required data in the physical memory. Of course, the most important information that a page table entry stores is the base physical memory address of the page frame, this memory address will be used with the third part of the linear address (offset) to get the final physical memory address. +In an environment with `4KB` pages, a page table is referred to by an entry in the page directory. As mentioned earlier, each page table can hold `1024` entries. After finding the base memory address of the page table in question by consulting the page directory, this base memory address will be used with the second part of the linear address (bits `21` to `12`) to figure out which page table entry the processor should consult to reach the required data in the physical memory. Of course, the most important information that a page table entry stores is the base physical memory address of the page frame, this memory address will be used with the third part of the linear address (offset, bits `11` to `0`) to get the final physical memory address. -The entry of a page table is exactly same as the entry of a page directory, its size is `4` bytes. Though, there are some simple differences, the first difference is bit `7`, which was used to decide the page size in page directory, is ignored in the entry of a page table. The second difference is in bit `6`, which was ignored in the entry of page directory, in page tables this bit is known as *dirty bit*. +The structure of a page table entry is almost the same as the structure of a page directory entry, with some minor differences. Just like the page directory entry, the size of the page table entry is is `4` bytes. The first difference is bit `7`, which was used to decide the size of a page in the page directory, this bit is ignored in the entry of a page table. The second difference is in bit `6`, which was ignored in the entry of page directory, in a page table entry this bit is known as the *dirty bit*. -In our previous discussion on virtual memory we know that at some point of time, a victim frame may be chosen. This frame is removed from the main memory to free up some space for another page that we need to load from the disk. When the victim frame is removed from the main memory, its content should be written to the disk since its content may have been changed while it was loaded into the memory. Writing the content of the victim frame to the disk and loading the new page also from disk, given that the disk is really too slow compared to the processor, is going to cause some performance penalty. +From our previous discussion on virtual memory we know that sometimes, a victim frame needs to be chosen. This frame is removed from the main memory to free up some space for another page that we need to load from the disk. When the victim frame is removed from the main memory, its contents should be written to the disk since its contents may have been changed while it was loaded into the memory. Writing the contents of the victim frame to the disk and loading the new page also from the disk is going to cause some performance penalty, given that the disk is much slower than the main memory which is again much slower than the processor. -To make the matter a little bit better, we should write the content of the victim frame only if there is a real change in its content compared to the version which is already stored in the disk. If the victim frame version which is on the main memory and the version on the disk are identical, there is no need to waste valuable resource on writing the same content on the disk, for example, page frames that contain only code will most probably be the same all the time, so their versions on disk and main memory will be identical. The dirty bit is used to indicate whether the content of the page frame has been changed and has differences with the disk version, that is, the page (when the value of the bit `1`) or the two versions are identical (value `0`). +To make the matter a little bit better, we should write the contents of the victim frame only if there was a change in its contents compared to the version already stored on the disk. If the victim frame version in the main memory and the version on the disk are identical, there's no need to waste valuable resources on writing the same contents to the disk. For example, page frames that contain only code will most probably stay the same all the time, so their versions on the disk and in the main memory will be identical. The dirty bit is used to indicate whether the contents of the page frame have been changed and whether the page has differences with the disk version, it has differences when the value of the bit is `1` or the two versions are identical when the value is `0`. -## Paging and Dynamic Memory in 539kernel -The last result of this section is version `G` of 539kernel which contains the basic stuff that are related to the memory. Previously, we have seen that we have no way in 539kernel to allocate memory dynamically, due to that, the allocation of entries of processes table and the process control block was a static allocation. Making dynamic allocation possible is an important thing since a lot of kernel's objects need to be allocated dynamically. Therefore, the first memory-related thing to implement is a way to allocate memory dynamically. The other major part of version `G` is implementing paging by using x86 architecture's support. Since there is no way yet in 539kernel to access the hard disk, virtual memory cannot be implemented yet. However, basic paging can be implemented and this can be used as basis for further development. +## Paging and Dynamic Memory in the 539kernel +The final result of this section will be version `G` of the 539kernel which contains the basic implementation of things related to memory. Previously, we have seen that we have no way to allocate memory dynamically in the 539kernel. Due to that, the allocation of entries in the process table and the process control block was a static allocation. Making dynamic allocation possible is an important thing since a lot of the kernel's objects need to be allocated dynamically. Therefore, the first memory-related thing to implement is a way to allocate memory dynamically. The other major part of version `G` is implementing paging by using x86 architecture's support. Since there's no way yet in the 539kernel to access the hard disk, virtual memory cannot be implemented yet. However, basic paging can be implemented and this can be used as a basis for further development. ### Dynamic Memory Allocation -As we have mentioned earlier, in our normal process of developing applications by using programming languages that don't employ garbage collection, we are responsible for allocating spaces from memory. When we need to store data in memory, a free space in memory should be available for this data to put this data in. The process of telling that we need `n` bytes from memory to store some data is known as memory allocation. There are two possible ways to allocate memory, statically or dynamically. +As we've mentioned earlier, in our normal process of developing applications by using programming languages that don't employ garbage collection, we are responsible for allocating space from memory. When we need to store data in memory, some free space should be available for this data in memory to put this data in. The process of telling that we need `n` bytes from memory to store some data is known as *memory allocation*. There are two possible ways to allocate memory, statically or dynamically. -Usually, a static memory allocation is used when we know the size of data at compile time, that is, before running the application that we are developing. Dynamic memory allocation is used when the size of data will be known at run time. Static memory allocation is the responsibility of the compiler of the language that we are using, while the dynamic memory allocation is the responsibility of the programmer ^[Not in all cases though.], also, the regions that we have allocated dynamically should be freed manually ^[This holds true in the case of programming languages like C. New system programming languages such as Rust for example may have different ways to deal with the matter. However, what we are discussing here is the basis, depending on this basis more sophisticated concepts (e.g. Rust) can be built.]. +Usually, a static memory allocation is used when we know the size of data at compile time, that is, before running the application that we are developing. Dynamic memory allocation is used when the size of the data will only be known at run time. Static memory allocation is the responsibility of the compiler of the language that we are using, while dynamic memory allocation is the responsibility of the programmer ^[Not in all cases though.]. Also, the regions that we've allocated dynamically should be freed manually ^[This holds true in the case of programming languages like C. New system programming languages such as Rust, for example, may have different ways to deal with the matter. However, what we are discussing here is the basis, depending on this basis more sophisticated concepts (e.g. Rust) can be built.]. -As we have seen, there are multiple region of a running process's memory and each region has a different purpose, we already discussed run-time stack which is one of those region. The other data region of a process that we also discussed previously is the run-time heap. When we allocate memory dynamically, the memory region that we have allocated is a part of the run-time heap, which is a large region of process memory that is used for dynamic allocation, in C, for example, the most well-known way to allocate bytes dynamically, that is, from the run-time heap is to use the function `malloc` which implements an algorithm known as *memory allocator*. The run-time heap need to be managed, due to that, this kind of algorithms use data structures that maintain information about the allocated space and free space. +As we've seen, there are multiple regions of a running process' memory and each region has a different purpose. We've already discussed the run-time stack which is one of those regions. The other data region of a process that we also mentioned previously is the run-time heap. When we allocate memory dynamically, the memory region that we've allocated is a part of the run-time heap, which is a large region of the process memory that is used for dynamic allocations. In C, for example, the most well-known way to allocate bytes dynamically, that is, from the run-time heap, is to use the function `malloc` which implements an algorithm known as *memory allocator*. The run-time heap needs to be managed, due to that, memory allocator algorithms use data structures that maintain information about the allocated space and free space. -A need of dynamic memory allocation have shown up previously in 539kernel. Therefore, in the current version 539kernel we are going to implement the most basic memory allocator possible. Through a new function `kalloc` (short for *kernel allocate*), which works in a similar way as `malloc`, a bunch of bytes can be allocate from the kernel's run-time heap, the starting memory address of this allocated region will be returned by the function, after that, the region can be used to store whatever we wish. The stuff that are related to the kernel's run-time heap will be defined in a new file `heap.c` and its header file `heap.h`, let's start with the latter which is the following. +A need of dynamic memory allocation have shown up previously in the 539kernel. Therefore, in the current version of the 539kernel we are going to implement the most basic memory allocator possible. It will be implemented through a new function called `kalloc` (short for *kernel allocate*), which works in a similar way as `malloc`. It will allocate a bunch of bytes from the kernel's run-time heap, the starting memory address of this allocated region will be returned by the function, after that, the region can be used to store whatever data we wish. The code that is related to the kernel's run-time heap will be defined in a new file `heap.c` and its header file `heap.h`, let's start with the latter whose contents are the following. ```{.c} unsigned int heap_base; @@ -118,9 +118,9 @@ void heap_init(); int kalloc( int ); ``` -A global variable known as `heap_base` is defined, this variable contains the memory address that the kernel's run-time heap starts from, and starting from this memory address we can allocate user's needed bytes through the function `kalloc` which its prototype is presented here. +A global variable named `heap_base` is defined, this variable contains the memory address that the kernel's run-time heap starts from, and starting from this memory address we can allocate the bytes requested by the user by using the function `kalloc` whose prototype is presented here. -As usual, with each subsystem in 539kernel, there is an initialization function that sets the proper values and does whatever needed to make this subsystem ready to use, as you may recall, these functions are called right after the kernel starts in protected mode, in our current case `heap_init` is the initialization function of the kernel's run-time heap. We can now start with `heap.c`, of course, the header file `heap.h` is needed to be included in `heap.c`, and we begin with the code of `heap_init`. +As usual, with each subsystem in the 539kernel, there is an initialization function that sets the proper initial values and does whatever is needed to make this subsystem ready to use. As you may recall, these functions are called right after the kernel starts in protected mode, in our current case `heap_init` is the initialization function of the kernel's run-time heap. We can now start with `heap.c`. Of course, the header file `heap.h` must be included in `heap.c`. Then we can begin with the code of `heap_init`. ```{.c} #include "heap.h" @@ -131,7 +131,7 @@ void heap_init() } ``` -As you can see, the function `heap_init` is too simple. It sets the value `0x100000` to the global variable `heap_base`. That means that kernel's run-time heap starts from the memory address `0x100000`. In `main.c` we need to call this function in the beginning to make sure that dynamic memory allocation is ready and usable by any other subsystem, so, we first add `#include "heap.h"` in including section of `main.c`, then we add the call line `heap_init();` in the beginning of `kernel_main` function. Next is the code of `kalloc` in `heap.c`. +As you can see, the function `heap_init` is very simple. It sets the value `100000h` to the global variable `heap_base`. That means that kernel's run-time heap starts from the memory address `100000h`. In `main.c` we need to call this function at the beginning to make sure that dynamic memory allocation is ready and usable by any other subsystem, so, we first add `#include "heap.h"` in the include section of `main.c`, then we add the line `heap_init();` at the beginning of the `kernel_main` function. Next comes the code of `kalloc` in `heap.c`. ```{.c} int kalloc( int bytes ) @@ -144,22 +144,22 @@ int kalloc( int bytes ) } ``` -Believe it or not! This is a working memory allocator that can be used for dynamic memory allocation. It's too simple, though, it has some disadvantages but in our case it is more than enough. It receives the number of bytes that the caller needs to allocate from the memory through a parameter called `bytes`. +Believe it or not, this is a working memory allocator that can be used for dynamic memory allocation. It's very primitive, though, it has some disadvantages but in our case it's more than enough. It receives the number of bytes that the caller wants to allocate from memory through a parameter called `bytes`. -In the first step of `kalloc`, the value of `heap_base` is copied to a local variable named `new_object_address` which represents the starting memory address of newly allocated bytes, this value will be returned to the caller so the latter can start to use the allocated memory region starting from this memory address. +In the first step of `kalloc`, the value of `heap_base` is copied to a local variable named `new_object_address` which represents the starting memory address of the newly allocated bytes, this value will be returned to the caller so that the latter can start to use the allocated memory region starting from this memory address. -The second step of `kalloc` adds the number of allocated bytes to `heap_base`, that means the next time `kalloc` is called, it starts with a new `heap_base` that contains a memory address which is right after the last byte of the memory region that has been allocated in the previous call. For example, assume we called `kalloc` for the first time with `4` as a parameter, that is, we need to allocate four bytes from kernel's run-time heap, the base memory address that will be returned is `0x100000`, and since we need to store four bytes, we are going to store them on the memory address `0x100000`, `0x100001`, `0x100002` and `0x100003` respectively. Just before returning the base memory address, `kalloc` added `4`, which is the number of required bytes, to the base of the heap `heap_base` which initially contained the value `0x100000`, the result is `0x100004` which will be stored in `heap_base`. Next time, when `kalloc` is called, the base memory address of the allocated region will be `0x100004` which is, obviously, right after `0x100003`. +The second step of `kalloc` adds the number of allocated bytes to `heap_base`. This means that the next time `kalloc` is called, it starts with a new `heap_base` that contains a memory address which comes right after the last byte of the memory region that has been allocated in the previous call. For example, assume we called `kalloc` for the first time with `4` as the parameter, that is, we need to allocate four bytes from kernel's run-time heap. The base memory address that will be returned is `0x100000`, and since we need to store four bytes, we are going to store them in the memory addresses `0x100000`, `0x100001`, `0x100002`, and `0x100003`. Just before returning the base memory address, `kalloc` added `4` to it, which is the number of requested bytes, to the base of the heap `heap_base` which initially contained the value `0x100000` now contains the value `0x100004`. Next time when `kalloc` is called, the returned base memory address of the allocated region will be `0x100004` which comes, obviously, right after `0x100003`. -As you can see from the allocator's code, there is no way to implement `free` function, usually, this function takes a base memory address of a region in run-time heap and tells the memory allocator that the region which starts with this base address is free now and can be used for other allocations. Freeing memory regions when the code finishes from using them helps in ensuring that the run-time heap is not filled too soon, when an application doesn't free up the memory regions that are not needed anymore, it causes a problem known as *memory leak*. +As you can see from the allocator's code, there is no way to implement the `free` function. Usually, this function takes the base memory address of a region in the run-time heap and tells the memory allocator that the region which starts with this base address is now free and can be used for other allocations. Freeing memory regions when the code finishes using them helps in ensuring that the run-time heap doesn't contain unused data and is not filled too soon. When an application doesn't free up the memory regions that are not needed anymore, it causes a problem known as a *memory leak*. -In our current memory allocator, the function `free` cannot be implemented because there is no way to know how many bytes to free up given the base address of a memory region, returning to the previous example, the region of run-time heap which starts with the base address `0x100000` has the size of `4` bytes, if we want to tell the memory allocator to free this region, it must know what is the size of this region which is requested to be freed, that of course means that the memory allocator needs to maintain a data structure that can be used at least when the user needs to free a region up, one simple way to be able to implement `free` in our current memory allocator is to modify `kalloc` and make it uses, for example, a linked-list, whenever `kalloc` is called to allocate a region, a new entry is created and inserted into the linked-list, this entry can be stored right after the newly allocated region and contains the base address of the region and its size, after that, when the user request to free up a region by giving its base memory address, the `free` function can search in this linked-list until it finds the entry of that region and put on the same entry that this region is now free and can be used for future allocation, that is, the memory which was allocated once and freed by using `free` function, can be used later somehow. +In our current memory allocator, the function `free` cannot be implemented because there is no way to know how many bytes to free up given the base address of a memory region. Returning to the previous example, the region of the run-time heap which starts with the base address `0x100000` has the size of `4` bytes. If we want to tell the memory allocator to free up this region, it must know the size of the region which is requested to be freed up, that of course means that the memory allocator needs to maintain a data structure that can be used at least when the user needs to free up a region. One simple to implement `free` in our current memory allocator is to modify `kalloc` and make it use a linked-list, for example. Then, whenever `kalloc` is called to allocate a region, a new entry is created and inserted into the linked-list, this entry is stored right after the newly allocated region and contains the base address of the region and its size. After that, when the user requests to free up a region by providing its base memory address, the `free` function can search in this linked-list until it finds the entry of that region and use this same entry to store that this region is now free and can be used for future allocations. That is, the memory which was allocated once and freed by using the `free` function can now be used later somehow. -Our current focus is not on implementing a full memory allocator, so, it is up to you as a kernelist to decide how your kernel's memory allocator works, of course, there are a bunch of already exist algorithm as we have mentioned earlier. +Our current focus is not on implementing a full memory allocator, so, it is up to you as a kernelist to decide how your kernel's memory allocator works. Of course, many algorithms already exist out there for you to pick as we've mentioned earlier. -#### Using The Allocator with Process Control Block -To make sure that our memory allocator works fine, we can use it when a new process control block is created. It also can be used for processes table, as you may recall, the processes table from version `T` is an array which is allocated statically and its size is `15`, instead, the memory allocator can be used to implement a linked-list to store the list of processes. However, for the sake of simplicity, we will stick here with creating PCB dynamically as an example of using `kalloc`, while keeping the processes table for you to decide if it should be a dynamic table or not and how to design it if you decide that it should be dynamic. +#### Using the Allocator for the Process Control Block +To make sure that our memory allocator works fine, we can use it when a new process control block is created. It also can be used for the process table. As you may recall, the process table from version `T` is an array which is allocated statically and has space for `15` entries. Instead, the memory allocator can be used to implement a linked-list to store the list of processes. However, for the sake of simplicity, here we will stick with only creating the PCBs dynamically as an example of how to use `kalloc`, while keeping the process table static so that you to decide if it should be a dynamic table or not and how to design it if you decide that it should be dynamic. -The first thing we need to do in order to allocate PCBs dynamically is to change the parameters list of the function `process_create` in both `process.h` and `process.c`. As you may recall, in version `T`, the second parameter of this function called `process` and it was the memory address that we will store the PCB of the new process on it. We had to do that since dynamic memory allocation wasn't available, so, we were creating local variables in the caller for each new PCB, then we pass the memory address of the local variable to `process_create` to be used for the new PCB. This second parameter is not needed anymore since the region of the new PCB will be allocated dynamically by `kalloc` and its memory address will be returned by the same function. So, the prototype of the function `process_create` will be in `process.h` and `process.c` respectively as the following. +The first thing we need to do in order to allocate PCBs dynamically is to change the parameters list of the function `process_create` in both `process.h` and `process.c`. As you may recall, in version `T`, the second parameter of this function was called `process` and it was the memory address where we would store the PCB of the newly created process. We had to do this since dynamic memory allocation wasn't available, so, we were creating local variables in the caller for each new PCB, and would pass the memory address of the local variables to `process_create` to be used for the new PCB. This second parameter is not needed anymore since the memory needed to store the new PCB will be allocated dynamically by `kalloc` and this memory address will then be returned by `process_create`. So, the prototype of the function `process_create` must be changed in `process.h` and `process.c` respectively to the following. ```{.c} process_t *process_create( int * ); @@ -169,15 +169,15 @@ process_t *process_create( int * ); process_t *process_create( int *base_address ) ``` -You can also notice that the function now returns a pointer to the newly created PCB, in version `T` it was returning nothing. The next changes will be in the code of `process_create`. The name of the eliminated parameter of `process_create` was `process` and it was a pointer to the type `process_t`. We substitute it with the following line which should be in the beginning of `process_create`. +You may have noticed that the function now returns a pointer to the newly created PCB, in version `T` it was returning nothing. The next changes will be in the code of `process_create`. The name of the eliminated parameter of `process_create` was `process` and it was a pointer to the type `process_t`. We substitute it with the following line which should be at the beginning of `process_create`. ```{.c} process_t *process = kalloc( sizeof( process_t ) ); ``` -Simply, we used the same variable name `process` but instead of getting it as a parameter we define it as a local variable, we call the memory allocator to allocate a region that has the same size of the type `process_t` from the kernel's run-time heap, exactly as we do in user-space applications development, so, the new memory region can be used to store the new PCB and its memory address is stored in the local variable `process`. In the last of `process_create` we should add the line `return process;` to return the memory address for the newly created PCB for the new process. +Here, we simply introduced a local variable named `process` instead of getting it as a parameter. We call the memory allocator to allocate a region of memory that has the size of the type `process_t` from the kernel's run-time heap, exactly as we do in user-space application development. The new memory region can then be used to store the new PCB and its memory address is stored in the local variable `process`. At the end of `process_create` we should add the line `return process;` to return the memory address for the newly created PCB for the new process. -In version `T` we have called `process_create` in `main.c` to create four processes, we need to change the calls by omitting the second parameter, also the line `process_t p1, p2, p3, p4;` in `main.c` which was allocating memory for the PCBs can be removed since we don't need them anymore. The calls of `process_create` will be as the following. +In version `T` we've called `process_create` in `main.c` to create four processes, we need to change the calls by omitting the second parameter, also the line `process_t p1, p2, p3, p4;` in `main.c` which was allocating memory for the PCBs can be removed since we don't need this static allocation anymore. The calls to `process_create` must be changed to the following. ```{.c} process_create( &processA ); @@ -187,11 +187,11 @@ process_create( &processD ); ``` ### Paging -In this section we are going to implement a basic paging for 539kernel. To do that, a number of steps should be performed. A valid page directory should be initialized and its address should be loaded in the register `CR3`. Also, paging should be enabled by modifying the value of `CR0` to tell the processor to start using paging and translate linear memory addresses by using the page tables instead of consider those linear addresses as physical addresses. We have mentioned earlier, for each process we should define a page table, however, in this section we are going to define the page table of the kernel itself since this is the minimum requirement to enable paging. +In this section we are going to implement basic paging for the 539kernel. To do that, a number of steps should be performed. A valid page directory should be initialized and its address should be loaded in the register `CR3`. Also, paging should be enabled by modifying the value of `CR0` to tell the processor to start using paging and translate linear memory addresses by using the page tables instead of considering those linear addresses as physical addresses. We have mentioned this earlier, a page table should be defined for each process, however, in this section we are going to define the page table of the kernel itself since this is the minimum requirement to enable paging. -The page size in 539kernel will be `4KB`, that means we need a page directory that can point to any number of page tables up to `1024` page table. The mapping itself will be *one-to-one mapping*, that is, each linear address will be translated to a physical address and both are identical. For example, in one-to-one mapping the linear address `0xA000` refers to the physical address `0xA000`. This choice has been made to make things simple, more advanced designs can be used instead. We already know the concept of page frame, when the page size is `4KB` that means page frame `0` is the memory region that starts from the memory address `0` to `4095d`. One-to-one mapping is possible, we can simply define the first entry of the first page table ^[The first page table is the one which is pointed to by the first entry in the page directory.] to point to page frame `0` and so on. The memory allocator will be used when initializing the kernel's page directory and page tables, we can allocate them statically as we have done with `GDT` for example, but that can increase the size of kernel's binary file. +The page size in the 539kernel will be `4KB`, that means we need a page directory that can have directory entries for up to `1024` page tables. The mapping itself will be a *one-to-one mapping*, that is, each linear address will be translated to a physical address and both will be identical. For example, in a one-to-one mapping the linear address `0xA000` refers to the physical address `0xA000`. This choice has been made to make things simple, more advanced designs can be used when desired. We already know the concept of a page frame, when the page size is `4KB` it means that the page frame `0` is the memory region that starts from the memory address `0` to `4095d`. To make a one-to-one mapping possible, we can simply define the first entry of the first page table ^[The first page table is the one which is pointed to by the first entry in the page directory.] to point to page frame `0` and so on. The memory allocator will be used when initializing the kernel's page directory and page tables, we could have allocated them statically as we've done with the `GDT` for example, but that would increase the size of the kernel's binary file. -Before getting started with the details two new files are needed to be created: `paging.h` and `paging.c` which will contain the stuff that are related to paging. The content of `paging.h` is the following. +Before getting started with the details of the implementation, two new files need to be created: `paging.h` and `paging.c` which will contain everything related to paging. The contents of `paging.h` are the following. ```{.c} #define PDE_NUM 3 @@ -206,12 +206,12 @@ void paging_init(); int create_page_entry( int, char, char, char, char, char, char, char, char ); ``` -The part `PDE` in the name of the macro `PDE_NUM` means page directory entries, so this macro represents the number of the entries that will be defined in the kernel's page directory. Any page directory may hold `1024` entries but in our case not all of these entries are needed so only `3` will be defined instead, that means only three page tables will be defined for the kernel. How many entries will be defined in those page tables is decided by the macro `PTE_NUM` which `PTE` in its name means page table entries, its value is `1024` which means there will be `3` entries in the kernel's page directory and each one of them points to a page table which has `1024` entries. The total entries will be `3 * 1024 = 3072` and we know that each of these entries map a page frame of the size `4KB` then `12MB` of the physical memory will be mapped in the page table that we are going to define, and since our mapping will be one-to-one, that means the reachable physical memory addresses start at `0` and ends at `12582912`, any region beyond this range, based on our setting, will not be reachable by the kernel and it is going to cause a page fault exception. It is your choice to set the value of `PDE_NUM` to the maximum (`1024`), this will make a `4GB` of memory addressable. +The part `PDE` in the name of the macro `PDE_NUM` means page directory entries, so this macro represents the number of the entries that will be defined in the kernel's page directory. Any page directory may hold up to `1024` entries but in our case not all of these entries are needed so only `3` will be defined instead, that means only three page tables will be defined for the kernel. How many entries will be defined in those page tables is decided by the macro `PTE_NUM`. `PTE` in the name of this macro stands for page table entries. Its value is `1024` which means there will be `3` entries in the kernel's page directory and each one of them will point to a page table which has `1024` entries. The total count of entries will be `3 * 1024 = 3072`. We know that each of these entries maps a page frame of the size of `4KB`. This means that `12MB` of physical memory will be mapped in the page table that we are going to define. Since our mapping will be one-to-one, it means the reachable physical memory addresses will start at `0` and end at `12582912`. Any region beyond this range, based on our settings, will not be reachable by the kernel and will cause page fault exceptions when accessed by the kernel. It is up to you to set the value of `PDE_NUM` to the maximum allowed value (`1024`), this will make `4GB` of memory accessible by the kernel. -Getting back to the details of `paging.h`, both `load_page_directory` and `enable_paging` are external functions that will be defined in assembly and will be used in `paging.c`. The first function loads the address of the kernel's page directory in the register `CR3`, this address can be found in the global variable `page_directory` but of course, its value will be available after allocating the needed space by `kalloc`. The second function is the one that modifies the register `CR0` to enable paging in x86, this should be called after finishing the initialization of kernel's page directory and loading it. +Getting back to the details of `paging.h`, both `load_page_directory` and `enable_paging` are external functions that will be defined in assembly and will be used in `paging.c`. The first function loads the address of the kernel's page directory in the register `CR3`, this address can be found in the global variable `page_directory` but of course, its value will be available after allocating the needed memory by `kalloc`. The second function is the one that modifies the register `CR0` to enable paging in x86, it should be called after the kernel's page directory is initialized and loaded. -#### Initializing Kernel's Page Directory and Tables -From our previous encounter with the structure of page directory/table entry, we know that the size of this entry is `4` bytes and has a specific arrangement of the bits to indicate the properties of the entry being pointed to. The function `create_page_entry` helps in constructing a value that can be stored in a page directory/table entry based on the properties that should be enabled and disabled, this value will be returned to the caller. As you can see from `paging.h`, it returns an integer and that makes sense, as we know, the size of integer in `32-bit` architecture C is `4` bytes, exactly same as the size of an entry. The following is the code of `create_page_entry` that should be defined in `paging.c`, don't forget to include `paging.h` inside it. +#### Initializing Kernel's Page Directory and Page Tables +From our previous encounter with the structure of a page directory/table entry, we know that the size of an entry is `4` bytes and has a specific arrangement of bits to indicate the properties of the entry being pointed to. The function `create_page_entry` helps in constructing a value that can be stored in as a page directory/table entry based on the properties that should be enabled and disabled, the constructed value will be returned to the caller. As you can see in `paging.h`, it returns an integer and that makes sense, because as we know, the size of an integer in `32-bit` architecture C is `4` bytes, exactly same as the size of an entry. The following is the code of `create_page_entry` that should be defined in `paging.c`. Don't forget to include `paging.h` in `paging.c`. ```{.c} int create_page_entry( int base_address, char present, char writable, char privilege_level, char cache_enabled, char write_through_cache, char accessed, char page_size, char dirty ) @@ -231,17 +231,17 @@ int create_page_entry( int base_address, char present, char writable, char privi } ``` -As you can see, each parameter of `create_page_entry` represents a field in the entry of page directory/table, the possible values of all of them but `base_address` are either `0` or `1`, the meaning of each value depends on the flag itself and we already have covered them. By using bitwise operations we put each flag in its correct place. +As you can see, each parameter of `create_page_entry` represents a field in the entry of a page directory/table, the possible values of each one of them but the `base_address` are either `0` or `1`. The meaning of each value depends on the flag itself and we already have covered them. By using bitwise operations we put each flag in its correct place. -The base address represents the base memory address of a page table in case we are creating a page directory entry, while it represents the base memory address of a page frame in case we are creating a page table entry. This base address will be `OR`red with the value that is generated to represent the properties of the entity that the current entry is pointing to, we will discuss more details about the base memory address when we start talking about page-aligned entries. +The base address represents the base memory address of a page table in case we are creating a page directory entry, while it represents the base memory address of a page frame in case we are creating a page table entry. This base address is then `OR`red with the value that is generated to represent the properties of the entity that the current entry is pointing to, we will discuss the base memory address in more detail when we start talking about page-aligned entries. -Now we can use `create_page_entry` to implement the function `paging_init` which should reside in `paging.c`. This function will be called when the kernel switches to protected-mode, as the usual with initialization functions, its job is creating the kernel's page directory and kernel's page tables that implement one-to-one map based on the sizes that defined in the macros `PDE_NUM` and `PTE_NUM`. The code of `paging_init` is the following. +Now we can use `create_page_entry` to implement the function `paging_init` which should reside in `paging.c`. This function will be called when the kernel switches to protected-mode. As usual for initialization functions, its job is to create the kernel's page directory and kernel's page tables that implement a one-to-one map based on the sizes that are defined in the macros `PDE_NUM` and `PTE_NUM`. The code of `paging_init` is the following. ```{.c} void paging_init() { - // PART 1: - + // PART 1: + unsigned int curr_page_frame = 0; page_directory = kalloc( 4 * 1024 ); @@ -255,51 +255,51 @@ void paging_init() page_directory[ currPDE ] = create_page_entry( pagetable, 1, 0, 0, 1, 1, 0, 0, 0 ); } - - // ... // - - // PART 2 - + + // ... // + + // PART 2 + load_page_directory(); enable_paging(); } ``` -For the sake of simpler discussion, I have divided the code of the function into two parts and each part is indicated by a heading comment. The job of the first part is to create the page directory and the page tables. Based on the default values of `PDE_NUM` and `PTE_NUM`, three entries will be defined in the page directory, each one of them points to a page table that contains `1024` entries. +To simplify the discussion, we've divided the code of the function into two parts, each part is indicated by a heading comment. The job of the first part is to create the page directory and the page tables. Based on the default values of `PDE_NUM` and `PTE_NUM`, three entries will be defined in the page directory, each one of them points to a page table that contains `1024` entries. -First, we allocate `4 * 1024` from the kernel's heap for the page directory, that's because the size of each entry is `4` bytes, as you can see, while we need only three entries for the page directory, we are allocating memory for `1024` entries instead, the reason of that is the following: the base memory address of a page table should be page-aligned, also, the base memory address of a page frame should be page-aligned. When the page size is `4KB`, then a memory address that we can describe as a *page-aligned memory address* is the one that is a multiple of `4KB`, that is, a multiple of `4096`. In other words, it should be dividable by `4096` with no remainder. The first six multiples of `4KB` are `0 = 4096 * 0`, `4096 = 4096 * 1`, `8192 = 4096 * 2` (`8KB`), `12288 = 4096 * 3` (`12KB`), `16384 = 4096 * 4` (`16KB`), `20480 = 4096 * 5` (`20KB`) and so on. Each one of those value can be considered as a page-aligned memory address when the page size is `4KB`. +First, we allocate `4 * 1024` from the kernel's heap for the page directory, that's because the size of each entry is `4` bytes. As you can see, while we only need three entries for the page directory, we are allocating memory for `1024` entries instead. The reason for that is the following: the base memory address of a page table should be page-aligned, also, the base memory address of a page frame should be page-aligned. When the page size is `4KB`, then a memory address that we can describe as a *page-aligned memory address* is the one that is a multiple of `4KB`, that is, a multiple of `4096`. In other words, it should be divisible by `4096` with no remainder. The first six multiples of `4KB` are `0 = 4096 * 0`, `4096 = 4096 * 1`, `8192 = 4096 * 2` (`8KB`), `12288 = 4096 * 3` (`12KB`), `16384 = 4096 * 4` (`16KB`), `20480 = 4096 * 5` (`20KB`) and so on. Each one of those value can be considered as a page-aligned memory address when the page size is `4KB`. -Let's get back to the reason of allocating `4 * 1024` bytes for the page directory instead of `4 * 3` bytes. We know that memory allocator sets the base of the heap from the memory address `0x100000`, also, we know, based on the code order of the kernel that `paging_init` will be the first code ever that calls `kalloc`, that is, `kalloc` will be called the first time in 539kernel when we allocate a region for kernel's page directory in the line `page_directory = kalloc( 4 * 1024 );` which means that the memory address of kernel's page directory will be `0x100000` (`1048576d`) which is a page-aligned memory address since `1048576 / 4096 = 256` (in hexadecimal: 0x100000 / 0x1000 = 0x100) with no remainders. +Let's get back to the reason of allocating `4 * 1024` bytes for the page directory instead of `4 * 3` bytes. We know that memory allocator sets the base of the heap starting from the memory address `0x100000`. Also, we know, based on the order of calls in `kernel_main` that `paging_init` will be the first code that ever calls `kalloc`, that is, `kalloc` will be called the first time in the 539kernel when we allocate a region for kernel's page directory in the line `page_directory = kalloc( 4 * 1024 );` which means that the memory address of kernel's page directory will be `100000h` (`1048576d`) which is a page-aligned memory address since `1048576d / 4096d = 256d` (in hexadecimal: `100000h` / `1000h` = `100h`) with no remainders. -When we allocate `4 * 1024` bytes for the page directory (the first case), the next memory address that will be used by the memory allocator for the next allocation will be `1048576 + ( 4 * 1024 ) = 1052672` (`0x101000`) which is also a page-aligned memory address. The second case is when we allocate `4 * 3` bytes for the page directory instead, the next memory address that the memory allocator will use for the next allocation will be `1048576 + ( 4 * 3 ) = 1048588` (`0x10000C`) which is not a page-aligned memory address and cannot be used as a base memory address for a page table. +Consider for the fist case that we'd allocate `4 * 1024` bytes for the page directory. In this case, the next memory address that would be used by the memory allocator for the next allocation would be `1048576d + ( 4d * 1024d ) = 1052672d` (`101000h`) which is also a page-aligned memory address. Now, consider for the second case that we'd allocate `4 * 3` bytes for the page directory instead. Then the next memory address that the memory allocator would use for the next allocation would be `1048576d + ( 4d * 3d ) = 1048588d` (`10000Ch`) which is not a page-aligned memory address and cannot be used as a base memory address for a page table. -If you continue reading the function `paging_init` you will see that the next thing that will be allocated via `kalloc` after that page directory is the first page table which should be in a page-aligned memory address, due to that, we have used the first case which ensures that the next call of `kalloc` is going to return a page-aligned memory address instead of the second case which will not, of course, this is a quick and dirty solution. +If you continue reading the function `paging_init` you will see that the next thing that will be allocated via `kalloc` after that page directory is the first page table which should be in a page-aligned memory address, too. Due to that, we've followed the first case approach which ensures that the next call of `kalloc` is going to return a page-aligned memory address instead of the second case approach which would return a memory address that's not page-aligned. Of course, this is a brittle and a quick and dirty solution. -Getting back to the first part of `paging_init`, as you can see, it is too simple, it allocates regions from the kernel's heap for the page directory and the entries of the three page tables. Then each entry in both page table and page directory is being filled by using the function `create_page_entry`. Let's start with the line which defines entries in a page table. +Getting back to the first part of `paging_init`, as you can see, it allocates regions from the kernel's heap for the page directory and the entries of the three page tables. Then each entry in both, the page tables and the page directory is being filled by using the function `create_page_entry`. Let's start with the line which defines entries in a page table. ```{.c} create_page_entry( curr_page_frame * 4096, 1, 0, 0, 1, 1, 0, 0, 0 ) ``` -Given that the size of a page is `4KB`, then, page frame number `0` which is the first page frame starts at the physical memory address `0` and ends at physical memory address `4095`, in the same way, page frame `1` starts at the physical memory address `4096` and ends at the physical memory address `8191` and so on. In general, with one-to-one mapping, given `n` is the number of a page frame and the page size is `4KB`, then `n * 4096` is the physical memory address that this page frame starts at. We use this equation in the first parameter that we pass to `create_page_entry` when we create the entries that point to the page frames, that is, page tables entries. The local variable `curr_page_frame` denotes the current page frame that we are defining an entry for, and this variable is increased by `1` with each new page table entry. In this way we can ensure that the page tables that we are defining use a one-to-one map. +Given that the size of a page is `4KB`, then, the page frame number `0` which is the first page frame starts at the physical memory address `0` and ends at the physical memory address `4095`. In the same way, page frame `1` starts at the physical memory address `4096d` and ends at the physical memory address `8191d` and so on. In general, with a one-to-one mapping, given `n` is the number of a page frame and the page size is `4KB`, then `n * 4096` is the physical memory address that this page frame starts at. We use this calculation in the first parameter that we pass to `create_page_entry` when we create the entries that point to the page frames, that is, the page table entries. The local variable `curr_page_frame` denotes the current page frame index that we are defining an entry for, and this variable is increased by `1` with each new page table entry. This is how we can ensure that the page tables that we are defining use a one-to-one mapping. -As you can see from the rest of the parameters, for each entry in the page table, we set that the page frame is present, its cache is enabled and write-through policy is used. Also, the page frame belongs to supervisor privilege level and the page size is `4KB`. +As you can see from the rest of the parameters, for each entry in the page table, we specify that the page frame is present, its cache is enabled and the write-through policy is used. Also, the page frame belongs to the supervisor privilege level and the page size is `4KB`. -The code which define a new entry in the page directory is similar to the one which define an entry in a page table, the main difference is, of course, the base address which should be the memory address of the page table that belongs to the current entry of the page directory. When we allocate a memory region for the current page table that we are defining, its base memory address will be returned by `kalloc` and stored in the local variable `pagetable` which is used as the first parameter when we define an entry in the page directory. +The code which defines a new entry in the page directory is similar to the one which defines an entry in a page table, the main difference is, of course, the base address which should be the memory address of the page table that pointed to by the current page directory entry. When we allocate a memory region for the current page table, its base memory address is returned by `kalloc` and stored in the local variable `pagetable` which is used as the first parameter when we define an entry in the page directory. -##### The Need of Page-aligned Memory Addresses -In the previous section we have discussed the meaning of a page-aligned memory address, and we stated the fact that any base memory address that is defined in a page directory/table entry should be a page-aligned memory address. Why? You may ask. +##### The Need of Page-Aligned Memory Addresses +In the previous section we have discussed the meaning of a page-aligned memory address, and we stated the fact that any base memory address that is defined in a page directory/table entry should be a page-aligned memory address. Why, you may ask? -Recalling the structure of page directory/table entry, it is known that the number of bits that are dedicated for the base memory address are `20` bits (`2.5` bytes or `2` bytes and a nibble), also, we know that in `32-bit` architecture, the size of the largest memory address (`0xFFFFFFFF`) is of size `32` bits. +Recalling the structure of a page directory/table entry, we know that the number of bits that are dedicated for the base memory address are `20` bits (`2.5` bytes or `2` bytes and a nibble), also, we know that in the `32-bit` architecture, the largest memory address (`FFFFFFFFh`) has the size of `32` bits. -Now, assume that we want to define a page table entry that points to the last possible page frame which its base address is `0xFFFFF000`. To store this full address `32` bits are needed ^[Remember, each hexadecimal digit represents a nibble. One byte consists of two nibbles.] but only `20` bits are available for base memory address in the page table entry, so, how can we point to this page frame since we can't store its full address in the entry? +Now, assume that we want to define a page table entry that points to the last possible page frame with the base address `FFFFF000h`. To store this full address `32` bits are needed ^[Remember, each hexadecimal digit represents a nibble. One byte consists of two nibbles.] but only `20` bits are available for base memory address in the page table entry, so, how can we point to this page frame since we can't store its full address in the entry? -The numbers that we have defined previously as page-aligned numbers, in other words, the multiples of `4096`, have an interesting property when they are represented in hexadecimal format, they always end with three zeros! ^[And that makes sense, the first one of them after zero is `0x1000` (`4096d`) and to get the next one you need to add `0x1000` on the previous one and so on.] In our current example of the last possible page frame, we need to store `0xFFFFF000` as a base memory address, you can see that it ends with three zeros which means that this number is a page-aligned number. Removing the last three zeros of the example memory address gives us the value `0xFFFFF` which exactly needs `20` bits to be stored, so, due to that the base address the is stored in page directory/table should be a page-aligned memory address which makes it possible to remove the last three zeros from it and make its size `20` bits and later on the processor will be able to get the correct full base address from the `20`bits in the entry, simply, by appending three zeros to it. In `create_page_entry` the place of these three zeros were used to store the properties of the entry when we `OR`red the base address with the value that has been constructed to represent the properties. +The numbers that we have defined previously as page-aligned numbers, in other words, the multiples of `4096d`, have an interesting property when they are represented in hexadecimal format, they always end with three zeros! ^[And that makes sense, the first one of them after zero is `1000h` (`4096d`) and to get the next one you need to add `1000h` to the previous one and so on.] In our current example of the last possible page frame, we need to store `FFFFF000h` as the base memory address, you can see that it ends with three zeros which means that this number is a page-aligned number. Removing the last three zeros of the example memory address gives us the value `FFFFFh` which needs exactly `20` bits to be stored. So, due to that the base address that is stored in the page directory/table should be a page-aligned memory address which makes it possible to remove the last three zeros from it and reduce its size to `20` bits. Later on the processor will be able to get the correct full base address from the `20` bits in the entry, simply by appending three zeros to it (multiplying it with `1000h`). In `create_page_entry` the place of these three zeros was used to store the flags of the entry when we `OR`red the base address with the value that has been constructed to represent the flags. #### Loading Kernel's Page Directory and Enabling Paging -The second part of the function `paging_init` performs two operations, the first one is loading the content of the global variable `page_directory` in the register `CR3`, that is, loading the kernel's page directory so that the processor can use it when the second operation, which enables the paging, is performed. +The second part of the function `paging_init` performs two operations. The first one is loading the contents of the global variable `page_directory` into the register `CR3`, that is, loading the kernel's page directory so that the processor can use it when the second operation, which enables the paging, is performed. -Because both of these functions need to access the registers directly, they will be written in assembly in the file `starter.asm`. Till now, it is the first time that we define a function in assembly and use it in C code, to do that we need to add the following lines in the beginning of `starter.asm` after `extern run_next_process`. +Because both of these functions need to access the registers directly, they will be written in assembly in the file `starter.asm`. It's the first time that we define a function in assembly and use it in C code, to do that we need to add the following lines at the beginning of `starter.asm` after `extern run_next_process`. ```{.asm} extern page_directory @@ -308,9 +308,9 @@ global load_page_directory global enable_paging ``` -There is nothing new in the first line. We are telling NASM that there is a symbol named `page_directory` that will be used in the assembly code, but it isn't defined in it, instead it's defined in a place that the linker is going to tell you about in the future. As you know, `page_directory` is the global variable that we have defined in `paging.h` and holds the memory address of the kernel's page directory, it will be used in the code of `load_page_directory`. +There is nothing new in the first line. We are telling NASM that there is a symbol named `page_directory` that will be used in the assembly code, but it isn't defined in it, instead it's defined in a place that the linker is going to know about in the future. As you know, `page_directory` is the global variable that we've defined in `paging.h` and holds the memory address of the kernel's page directory, it will be used in the code of `load_page_directory`. -The last two lines are new, what we are telling NASM here is that there will be two labels in current assemble code named `load_page_directory` and `enable_paging`, both of them should be global, that is, they should be reachable by places other than the current assembly code, in our case, it's the C code of the kernel. The following is the code of those functions, they reside in `starter.asm` below the line `bits 32` since they are going to run in `32-bit` environment. +The last two lines are new, what we are telling NASM here is that there will be two labels in current assembly code named `load_page_directory` and `enable_paging`, both of them should be global, that is, they should be reachable by places other than the current assembly code file. In our case, they will be used by the C code of the kernel. The following is the code of those functions, they reside in `starter.asm` below the line `bits 32` since they are going to run in the `32-bit` environment. ```{.asm} load_page_directory: @@ -327,30 +327,29 @@ enable_paging: ret ``` -There is nothing new here. In the first function we load the content of `page_directory` into the register `CR3` and in the second function we use bitwise operation to modify bit `31` in `CR0` and sets its value to `1` which means enable paging. Finally, `paging_init` should be called by `kernel_main` right after `heap_init`, the full list of calls in the proper order is the following. +There's nothing new in here. In the first function we load the contents of the variable `page_directory` into the register `CR3` and in the second function we use the bitwise operation `OR` to modify the bit `31` in `CR0` and set its value to `1` which enables paging. Finally, `paging_init` should be called by `kernel_main` right after `heap_init`, the full list of calls in the proper order is the following. ```{.c} heap_init(); -paging_init(); +paging_init(); screen_init(); process_init(); scheduler_init(); ``` ### Finishing up Version `G` -And now version `G` of 539kernel is ready. It contains a basic memory allocator and a basic paging. The following is its `Makefile` which adds the new files to the compilation list. +And now version `G` of the 539kernel is ready. It contains a basic memory allocator and a basic paging. The following is its `Makefile` which adds the new files to the compilation list. ```{.makefile} ASM = nasm CC = gcc BOOTSTRAP_FILE = bootstrap.asm -SIMPLE_KERNEL = simple_kernel.asm INIT_KERNEL_FILES = starter.asm KERNEL_FILES = main.c -KERNEL_FLAGS = -Wall -m32 -c -ffreestanding -fno-asynchronous-unwind-tables -fno-pie +KERNEL_FLAGS = -Wall -m32 -c -ffreestanding -fcommon -fno-asynchronous-unwind-tables -fno-pie -fno-stack-protector KERNEL_OBJECT = -o kernel.elf -build: $(BOOTSTRAP_FILE) $(KERNEL_FILE) +build: $(BOOTSTRAP_FILE) $(INIT_KERNEL_FILES) $(KERNEL_FILES) screen.c process.c scheduler.c heap.c paging.c $(ASM) -f bin $(BOOTSTRAP_FILE) -o bootstrap.o $(ASM) -f elf32 $(INIT_KERNEL_FILES) -o starter.o $(CC) $(KERNEL_FLAGS) $(KERNEL_FILES) $(KERNEL_OBJECT)