Operating Systems


fork

Under Linux, fork() is implemented using copy-on-write pages, so the only penalty that it incurs is the time and memory required to duplicate the parent's page tables, and to create a unique task structure (pcb) for the child.
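
As a minimal sketch (mine, not from the article) of the call itself: fork() returns twice, 0 in the child and the child's PID in the parent, and until either side writes to memory both share the same copy-on-write pages.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();          /* duplicate the calling process */
    if (pid < 0) {
        perror("fork");          /* fork failed */
        return 1;
    } else if (pid == 0) {
        /* child: shares copy-on-write pages with the parent until either writes */
        printf("child:  pid=%d ppid=%d\n", getpid(), getppid());
    } else {
        /* parent: pid holds the child's process id */
        waitpid(pid, NULL, 0);   /* reap the child */
        printf("parent: pid=%d child=%d\n", getpid(), pid);
    }
    return 0;
}
```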

pcb

  • Some important information stored in the PCB includes:
  1. The process identifier. Every executing process typically has a unique identifier in every OS. For example, this identifier is referred to as pid in Linux.
  2. The identifier and pointers to its parent and child processes.
  3. The current state of the process (running / blocked etc.).
  4. Pointer to the kernel stack of the process for execution in kernel mode. The PCB may also hold pointers to various data structures on the kernel stack, such as the process context, consisting of the values in its program counter and other CPU registers. This information is not updated in the PCB every time the registers change, but is only stored in the PCB when the process context has to be saved, e.g., when moving from user mode to kernel mode, or during a context switch.
  5. Information related to scheduling the process (e.g., its priority for scheduling).
  6. Information related to the memory management of the process (e.g., pointers to page tables).
  7. Information about the list of open files, file descriptors, and other information pertaining to I/O activity.

Can a child process access the parent's global variables?

blog.nowcoder.net/n/a6cdab4f1…

  • If the parent and child processes only read a global variable, they share the same copy of it in memory.
  • As soon as either the parent or the child modifies the variable, a copy is made in memory and the modification happens on that copy, so from then on the two processes see different data.
  • Why do the parent and child print the same address for the global variable?
    Because the printed address is a virtual address, and the child is created by duplicating the parent's virtual address space, the variable has the same virtual address in both. (The addresses we print are always virtual addresses; physical memory addresses cannot be accessed directly.) A small demo follows below.
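
A small demo of this (my sketch, not from the linked post): both processes print the same virtual address for the global, yet after the child writes to it they observe different values, because the write triggers copy-on-write of the underlying physical page.

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int g = 1;   /* global variable inherited by the child */

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        g = 100;                           /* write triggers copy-on-write */
        printf("child : &g=%p g=%d\n", (void *)&g, g);
    } else {
        sleep(1);                          /* crude ordering for the demo */
        printf("parent: &g=%p g=%d\n", (void *)&g, g);  /* still prints 1 */
        wait(NULL);
    }
    return 0;
}
```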

page fault

www.geeksforgeeks.org/page-fault-…

A page fault occurs when a program attempts to access data or code that is in its address space, but is not currently located in the system RAM.


Maximum number of processes in Linux

/proc/sys/kernel/pid_max is the maximum value a PID can take.

ulimit -u is the maximum number of processes for the current user.

Note: when a new process is created, it is assigned the next available number from the kernel's process counter. When the counter reaches pid_max, the kernel wraps it around and continues numbering from 300.

How many threads can a single process create at most?


On a 32-bit system, the kernel space occupies 1 GB at the very top of the address space, and the remaining 3 GB is user space.
On a 64-bit system, kernel space and user space are each 128 TB, occupying the highest and lowest parts of the address space respectively, with the middle part left undefined.

Two things are involved:
The upper bound of the process's virtual address space: creating a thread requires the OS to allocate a stack for it, so the more threads there are, the more stack space is needed and the more virtual memory is consumed.
System parameter limits: although Linux has no kernel parameter that directly caps how many threads a single process may create, there are system-wide parameters that cap the total number of threads in the system.

Let's first look at how much virtual memory creating one thread inside a process consumes.

You can run ulimit -a to check the default stack size allocated when a thread is created; on this server, for example, the default stack size per thread is 8 MB.

The size of the following kernel parameters also affects the upper limit on thread creation:

/proc/sys/kernel/threads-max: the maximum number of threads the system supports; the default value is 14553.

/proc/sys/vm/max_map_count: the maximum number of memory map areas a process may have; if this value is small, thread creation can also fail. The default value is 65530.

A brief summary (see the sketch below):
On a 32-bit system, user-space virtual memory is only 3 GB; if each thread's stack is 10 MB, a process can create at most around 300 threads.
On a 64-bit system, user-space virtual memory is as large as 128 TB; at 10 MB of stack per thread, in theory 128 TB / 10 MB threads could be created, i.e. more than ten million, so the practical limit comes from system parameters and performance rather than from virtual memory.
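
A hedged sketch of probing that limit: it pins the per-thread stack at 10 MB with pthread_attr_setstacksize and creates threads until pthread_create fails (the count you get depends on the machine and the parameters above).

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *idle_thread(void *arg) {
    (void)arg;
    pause();              /* keep the thread alive so its stack stays mapped */
    return NULL;
}

int main(void) {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 10 * 1024 * 1024);  /* 10 MB per thread */

    int count = 0;
    for (;;) {
        pthread_t tid;
        if (pthread_create(&tid, &attr, idle_thread, NULL) != 0)
            break;        /* EAGAIN: virtual memory or thread limit reached */
        count++;
    }
    printf("created %d threads before failure\n", count);
    return 0;
}
```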

ulimit

It is used to view, set, or limit the resource usage of the current user.

admin@192 ~ % ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8176
-c: core file size (blocks)         0
-v: address space (kbytes)          unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       2666
-n: file descriptors                256

What are soft limits and hard limits in Linux? The soft limit is the value actually enforced for a process or user, while the hard limit is the ceiling up to which the soft limit may be raised.
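
For illustration (a sketch, not from the article), a process can read its own limits with getrlimit and raise the soft limit up to, but not beyond, the hard limit with setrlimit; RLIMIT_NOFILE corresponds to the -n line above.

```c
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;
    getrlimit(RLIMIT_NOFILE, &rl);             /* open file descriptors (-n) */
    printf("soft=%llu hard=%llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);

    rl.rlim_cur = rl.rlim_max;                 /* raise soft limit up to the hard limit */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        perror("setrlimit");
    return 0;
}
```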

Memory reordering

The order in which memory is actually accessed at run time does not necessarily match the order written in the program code; this is memory reordering. It exists to improve run-time performance. Memory reordering happens mainly at two stages:

  1. At compile time, compiler optimizations reorder memory accesses (instruction reordering).
  2. At run time, interaction between multiple CPUs causes memory accesses to be observed out of order.

memory barrier

  • Memory barriers are required to ensure the correct order of cross-CPU memory updates.

  • Two kinds of memory barriers are common: write barriers and read barriers.
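
A minimal sketch using C11 atomics (my illustration): the release fence on the writer side and the acquire fence on the reader side order the plain write to data relative to the ready flag across CPUs.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

int data = 0;
atomic_int ready = 0;

static void *producer(void *arg) {
    (void)arg;
    data = 42;                                         /* plain write */
    atomic_thread_fence(memory_order_release);         /* write barrier */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ;                                              /* spin until the flag is set */
    atomic_thread_fence(memory_order_acquire);         /* read barrier */
    printf("data = %d\n", data);                       /* guaranteed to see 42 */
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```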

Zero copy

The traditional path involves four copies:
first, a system call reads the file data into a kernel buffer (DMA copy);
then the application reads the kernel buffer into a user-space buffer (CPU copy);
next, when the user program sends the data through a socket, the user buffer is copied back into a kernel buffer (CPU copy);
finally, the data is copied to the NIC buffer by DMA. Four mode/context switches accompany this.
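
In code, the traditional path looks roughly like this (a sketch; in_fd is assumed to be a file and out_fd a socket): every read() copies kernel to user, and every write() copies user back to kernel.

```c
#include <unistd.h>

/* Traditional copy: DMA to kernel buffer, CPU copy to user buffer,
 * CPU copy back to the socket's kernel buffer, DMA to the NIC. */
ssize_t copy_file_to_socket(int in_fd, int out_fd) {
    char buf[4096];
    ssize_t n, total = 0;
    while ((n = read(in_fd, buf, sizeof buf)) > 0) {   /* kernel -> user */
        if (write(out_fd, buf, n) != n)                /* user -> kernel */
            return -1;
        total += n;
    }
    return n < 0 ? -1 : total;
}
```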

mmap+write

mmap : map files or devices into memory

mmap: memory mapping; the file is mapped into the kernel buffer, and user space can read and write that kernel-space data directly, reducing the number of copies between kernel space and user space.
(If it is a disk file mapping) When the process accesses the mapped address range, a page fault is raised and the data is copied from disk into physical memory. User space can then read and write this physical memory in kernel space directly, eliminating the copy between user space and kernel space; the system automatically writes dirty pages back to the corresponding file on disk.
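
A sketch of the mmap + write variant (descriptors assumed already open; error handling trimmed): the file is mapped once and write() sends data straight from the mapped page cache, so no read() into a user buffer is needed.

```c
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* mmap + write: map the file into the address space and write the mapping
 * to the socket; no separate user-space buffer is filled by read(). */
ssize_t send_with_mmap(int file_fd, int sock_fd) {
    struct stat st;
    if (fstat(file_fd, &st) < 0)
        return -1;

    void *addr = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, file_fd, 0);
    if (addr == MAP_FAILED)
        return -1;

    ssize_t n = write(sock_fd, addr, st.st_size);  /* page cache -> socket buffer */
    munmap(addr, st.st_size);
    return n;
}
```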

sendfile

Since kernel 2.4, Linux has provided zero copy through the sendfile system call.

After the data has been copied into the kernel buffer by DMA,
it is copied directly to the NIC buffer by DMA, with no CPU copy.

  • Besides reducing data copies,
  • it also reduces system calls and context switches.

Kafka uses mmap + write to persist data and sendfile to send it.
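
A sketch of the sendfile path (Linux-specific; file_fd and sock_fd assumed already open): the data moves from the file to the socket entirely inside the kernel.

```c
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Zero copy with sendfile: data goes file -> socket inside the kernel. */
ssize_t send_with_sendfile(int file_fd, int sock_fd) {
    struct stat st;
    if (fstat(file_fd, &st) < 0)
        return -1;

    off_t offset = 0;
    return sendfile(sock_fd, file_fd, &offset, st.st_size);
}
```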

What happens if an opened file is never closed?

  1. The operating system has a limit on the maximum number of open files; once it is exceeded, further opens fail with an error.

  2. If a process opens a file and never closes it, the operating system automatically reclaims and releases it when the program exits.

Zombie processes and orphan processes

Zombie process: a process uses fork to create a child; if the child exits but the parent never calls wait or waitpid to collect the child's status, the child's process descriptor stays in the system. Such a process is called a zombie process (see the reaping sketch below).

Orphan process: if a parent exits while one or more of its children are still running, those children become orphan processes. Orphans are adopted by the init process (PID 1), which collects their exit status on their behalf.
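
A sketch of the parent doing its part: calling waitpid() collects the child's exit status, after which the kernel can discard the child's process descriptor instead of leaving a zombie behind.

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        _exit(0);                      /* child exits immediately */
    }

    /* Without this wait, the child would remain a zombie until the
     * parent itself exits and init adopts and reaps it. */
    int status;
    waitpid(pid, &status, 0);
    if (WIFEXITED(status))
        printf("child %d exited with %d\n", pid, WEXITSTATUS(status));
    return 0;
}
```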

Registers

baskent.edu.tr/~tkaracay/e…

  • AX, BX, CX, DX,
  • CS, DS, ES, SS,
  • SI, DI
  • SP, BP, IP,
  • Flags

They are all 16 bits wide. You can treat them as if they were word (or unsigned integer) variables. However, each register has its own use.

  • AX, BX, CX, and DX are general purpose registers. They can be assigned any value you want; of course, you need to adapt that to your needs.
  • AX is usually called the accumulator register, or just the accumulator. Most arithmetic operations are done with AX. Sometimes other general purpose registers, such as DX, are also involved in arithmetic.
  • BX is usually called the base register. Its common use is array operations. BX often works together with other registers, most notably SP, to point into the stack.
  • CX is commonly called the counter register. It is used for counting purposes; this is what makes looping possible.
  • DX is the data register. It is usually used to hold data values.

The registers CS, DS, ES, and SS are called segment registers. You should not fiddle with these registers; you can only use them in the correct ways.

  • CS is called the code segment register. It points to the segment of the running program. We may NOT modify CS directly.
  • DS is called the data segment register. It points to the segment of the data used by the running program. You can point it anywhere you want, as long as it contains the desired data.
  • ES is called the extra segment register. It is usually used together with DI for pointer operations.
    • The pairs DS:SI and ES:DI are commonly used for string operations.
  • SS is called the stack segment register. It points to the stack segment.

The registers SI and DI are called index registers. They are usually used to process arrays or strings. SI is the source index and DI is the destination index. As the names suggest, SI always points to the source array and DI always points to the destination. They are typically used to move blocks of data, such as records (or structures) and arrays. These registers are commonly coupled with DS and ES.

The registers BP, SP, and IP are called pointer registers. BP is the base pointer, SP is the stack pointer, and IP is the instruction pointer. BP is usually used to reserve space for and reference local variables. SP points to the current top of the stack. Although SP can be modified easily, you must be cautious, because doing the wrong thing with this register can ruin your program. IP denotes the current instruction of the running program. It is always coupled with CS and is NOT modifiable, so the pair CS:IP is a pointer to the current instruction of the running program. You cannot access CS or IP directly.

The flags register stores the current status of the processor. It holds values the programmer may need to check, for example whether the last arithmetic operation produced a zero result or an overflow. The flags can only be modified indirectly, for example through the stack.

The general registers AX, BX, CX, and DX are 16-bit, but each is composed of two smaller registers. Take AX: the high 8 bits are called AH, and the low 8 bits are called AL. Both AH and AL can be accessed directly; since together they make up AX, modifying AH modifies the high 8 bits of AX, and modifying AL modifies the low 8 bits.


keleshev.com/eax-x86-reg…

interrupt

What exactly is an interrupt? It is just what its name says: it interrupts processes. Upon an interrupt request, the processor usually saves only the CS:IP and the flags of the running program, then jumps to the interrupt routine. After handling the interrupt, the processor restores the saved state and resumes the program. There are three kinds of interrupts: hardware interrupts (from devices other than the CPU), software interrupts, and CPU-generated interrupts.

Hardware interrupts occur when a piece of hardware inside your computer needs immediate attention. Delaying the handling could have unpredictable, or even catastrophic, effects. A keyboard interrupt is one example: when you press a key, you generate an interrupt; the keyboard controller notifies the processor that it has a character to send. Imagine if the processor ignored the request and went on with its own business: your keystroke would never be processed.

Software interrupts occur when the running program requests to be interrupted so that something else can be done, for example waiting for keyboard input from the user, or asking the graphics driver to switch into graphics mode.

CPU-generated interrupts occur when the processor detects that something is wrong with the running code; they are mostly there for crash protection. If your program contains an instruction the processor does not recognize, the processor interrupts your program. The same happens if you divide a number by 0.

Interrupt handling steps

  • Interrupt request
  • Interrupt acknowledgement
  • Saving the current context
  • Locating the interrupt service routine
  • Interrupt handling
  • Return from the interrupt

[What is the difference between an interrupt and polling?]

  • Polling: the CPU asks each device in turn. Interrupt: a specific event notifies the CPU.

Sign-magnitude, ones' complement, two's complement, and excess codes

Ones' complement appeared to solve the problem of doing subtraction with sign-magnitude numbers. (Intuition for complements: on a 12-hour clock, going forward 4 hours is the same as going back 8 hours.)

With ones' complement, subtraction produces the correct magnitude; the only problem left is the special value 0. Although to people +0 and -0 are the same thing, a signed zero is meaningless, and there end up being two encodings, [0000 0000] and [1000 0000], both representing 0.

Two's complement then solved both the sign problem of 0 and the problem of 0 having two encodings:
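
A small 8-bit worked example (my illustration): in two's complement, -x is encoded as the bitwise complement of x plus 1, so subtraction becomes plain addition and there is only a single encoding of zero (1000 0000 now means -128, not -0).

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* 8-bit two's complement: -3 is ~3 + 1 = 1111 1101 */
    int8_t a = 5, b = -3;
    /* 0000 0101 + 1111 1101 = (1)0000 0010, carry discarded -> 2 */
    printf("5 + (-3) = %d\n", (int8_t)(a + b));

    /* Only one zero: 0000 0000; 1000 0000 is -128, not -0 */
    printf("0x80 as int8_t = %d\n", (int8_t)0x80);
    return 0;
}
```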

Virtual memory


[How do a process switch and a thread switch proceed?]

A process switch takes two steps:

1. Switch the virtual address space
2. Switch the kernel stack and the hardware context

For Linux, the biggest difference between a thread and a process is the address space: for a thread switch, step 1 is not needed, while step 2 has to be done for both process and thread switches.

When a thread context switch happens, execution moves from user mode to kernel mode, and the important register values of the outgoing thread (for example the stack pointer SP), its state, and other information are recorded in the operating system's Thread Control Block. When switching to the next thread to run, its important CPU register values are loaded and execution returns from kernel mode to user mode. If the threads involved in the switch belong to different processes, additional state and the memory address space must be updated as well, and the new page tables must be loaded.

[Why is switching the virtual address space relatively expensive?]

Each process has its own virtual address space, while threads share the address space of the process they belong to, so switching between threads of the same process involves no address-space switch. Translating a virtual address into a physical address requires walking the page table, which is slow, so a TLB (Translation Lookaside Buffer) is used to cache translations and speed up the lookup. When a process switch happens the page tables are switched too, which invalidates the TLB; virtual-to-physical translation then becomes slower and the program visibly slows down. A thread switch does not invalidate the TLB because no address-space switch is needed, and this is why thread switches are generally said to be faster than process switches.

The biggest cost of a context switch between processes is the cache invalidation caused by switching the memory address space (for example the TLB in the CPU, which caches virtual-to-physical mappings), so switching between different processes is significantly slower than switching between threads of the same process. Modern CPUs use rapid context switch techniques to mitigate the cache invalidation caused by switching between processes.

Common Linux signals

www.man7.org/linux/man-p…

  • SIGINT: interrupt signal. Pressing Ctrl+C while a program is running generates this signal.
  • SIGTERM: termination signal. Running kill <pid> in the shell sends this signal.
  • SIGKILL: forcible kill signal. Running kill -9 <pid> in the shell sends this signal.
  • SIGHUP: Hangup detected on controlling terminal or death of controlling process
  • SIGCHLD: Child stopped or terminated
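
A minimal sketch of installing handlers for two of these (note that SIGKILL can be neither caught nor ignored):

```c
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

static void on_sigint(int sig)  { (void)sig; write(1, "got SIGINT\n", 11); }
static void on_sigchld(int sig) { (void)sig; while (waitpid(-1, NULL, WNOHANG) > 0) {} }

int main(void) {
    signal(SIGINT,  on_sigint);    /* Ctrl+C */
    signal(SIGCHLD, on_sigchld);   /* reap terminated children */
    for (;;)
        pause();                   /* sleep until a signal arrives */
}
```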

Paging

The logical address space of every process is divided into fixed-size (e.g., 4KB) chunks called pages. The physical memory is divided into fixed size chunks called frames, which are typically the same size as pages.

How are virtual addresses mapped to physical addresses with paging

The virtual address is first split into a page number and an offset within the page. The page number is mapped to the physical frame number by looking up the page table of the process. The physical address is then obtained from the physical frame number and the offset within the frame. Who does this translation from CPU-generated logical addresses to physical addresses? The OS takes the responsibility of constructing and maintaining page tables during the lifecycle of a process; the PCB contains a pointer to the page table. The OS also maintains a pointer to the page table of the current process in a special CPU register (CR3 in x86), and updates this pointer during context switches. A specialized piece of hardware called the memory management unit / MMU then uses the page table to translate logical addresses requested by the CPU to physical addresses using the logic described above.

  • The page table records the mapping from page numbers to physical frame numbers.
  • Accessing data in a paging system takes two memory accesses (the first reads the page table in memory to find the physical frame number, which combined with the in-page offset gives the actual physical address; the second accesses the data at that physical address). A sketch of the translation follows below.
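
A sketch of that split for 4 KB pages (12 offset bits), with a hypothetical single-level page_table[] array standing in for the real page-table structure the MMU walks:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                       /* 4 KB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Hypothetical single-level page table: index = page number, value = frame number. */
uint64_t page_table[1024];

uint64_t translate(uint64_t vaddr) {
    uint64_t page   = vaddr >> PAGE_SHIFT;          /* virtual page number */
    uint64_t offset = vaddr & (PAGE_SIZE - 1);      /* offset within the page */
    uint64_t frame  = page_table[page];             /* lookup (the MMU does this in hardware) */
    return (frame << PAGE_SHIFT) | offset;          /* physical address */
}

int main(void) {
    page_table[2] = 7;                              /* map page 2 -> frame 7 */
    printf("0x%llx\n", (unsigned long long)translate(0x2ABC));  /* -> 0x7ABC */
    return 0;
}
```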

Segmentation

Segmentation is another way of allocating physical memory to a process. With segmentation, the process memory image is divided into segments corresponding to code, data, stack, and so on. Each segment is then given a contiguous chunk of physical memory. A segment table stores the mapping between the segments of a process and the base/limit addresses of that segment. Most modern operating systems, however, use paging, or a combination of segmentation and paging. Unix-like operating systems make minimal use of segmentation.

swap space

When memory runs low, Linux moves the contents of some pages to an area on the hard disk to free up memory. That area on disk is called swap space, and the process is called swapping.

Uses:

  • When physical memory runs low, some rarely used pages can be swapped out to free memory for the system.
  • Many pages are used only for initialization at program startup and are never needed again; they can be swapped out.

Virtual memory

With a separation of virtual address space and physical address space in modern operating systems, each process can have a large virtual address space. In fact, the combined virtual address space of all processes can be much larger than the physical memory available on the machine, and logical pages can be mapped to physical frames only on a need basis. This concept is called demand paging, and is quite common in modern operating systems. With demand paging, the memory allocated to a process is also called virtual memory, because not all of it corresponds to physical memory in hardware.

[What page replacement algorithms are there?]

While a program runs, if a page it needs to access is not in memory, a page fault occurs and the page is brought into memory. If there is no free memory at that point, the system must evict some page from memory to the swap area on disk to make room.

  • FIFO: the page chosen for eviction is the one that entered memory first. This can evict pages that are accessed frequently, which raises the page fault rate.
  • Optimal: the page chosen for eviction is the one that will not be accessed for the longest time, which usually guarantees the lowest page fault rate. It is a purely theoretical algorithm, because there is no way to know how long it will be before a page is accessed again.
  • LRU: although future accesses cannot be known, past accesses can. LRU evicts the page that has gone unused for the longest time. To implement it, a linked list of all pages in memory is maintained: when a page is accessed it is moved to the head of the list, so the page at the tail is always the least recently used. Because the list has to be updated on every access, LRU implemented this way is expensive.
  • Clock algorithm (second chance): when the OS needs a free page, it runs through the queue of pages, much as in FIFO. However, if a page's reference bit is set, the OS skips that page and clears the bit (for the future). The OS walks through the list of pages, clearing set bits, until a page whose reference bit is not set is found (see the sketch below).
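
A compact sketch of that clock/second-chance scan (the frames array and clock_evict helper are hypothetical names for illustration):

```c
#include <stdbool.h>

#define NFRAMES 8

struct frame {
    int  page;        /* page currently held, -1 if free */
    bool referenced;  /* set by the hardware on each access */
};

struct frame frames[NFRAMES];
int hand = 0;         /* the clock hand */

/* Pick a victim frame: skip frames whose reference bit is set,
 * clearing the bit as we pass (giving the page a second chance). */
int clock_evict(void) {
    for (;;) {
        if (!frames[hand].referenced) {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        frames[hand].referenced = false;   /* second chance used up */
        hand = (hand + 1) % NFRAMES;
    }
}
```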

Buffer overflow

A buffer overflow happens when the data written into a buffer exceeds the buffer's capacity, so the overflowing data overwrites adjacent, legitimate data.

It is harmful in two ways:

1. The program crashes, causing denial of service.

2. Execution jumps to and runs a piece of malicious code.

Cause: the main cause of buffer overflows is that the program does not carefully check user input.
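
The classic illustration of that cause (do not write code like the first function): strcpy copies whatever the caller supplies, so input of 16 bytes or more overruns buf and overwrites adjacent stack memory, including the saved return address.

```c
#include <string.h>

void vulnerable(const char *input) {
    char buf[16];
    strcpy(buf, input);          /* no length check: overflows if input >= 16 bytes */
}

void safer(const char *input) {
    char buf[16];
    strncpy(buf, input, sizeof buf - 1);   /* bounded copy */
    buf[sizeof buf - 1] = '\0';            /* ensure termination */
}
```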

[What is the difference between a hard link and a symbolic link?]

  • A hard link is a directory entry that records a file name together with an inode number, and that inode is the inode of the source file. Deleting any single entry leaves the file intact as long as the link count is not 0. Hard links have restrictions: they cannot cross file systems and cannot be made for directories.
  • A symbolic link stores the absolute path of the source file and is resolved to the source file when read; think of it as a Windows shortcut. Once the source file is deleted, the link can no longer be opened. Because it only records a path, a symbolic link can also be made for a directory.
 ln [options] [source file or directory] [target file or directory]

Cache hierarchy


Why are memory reads and writes necessarily faster than disk?

Memory is an electronic storage medium and can complete reads and writes at the nanosecond level (10^-9 s). A disk is a mechanical device that has to seek and move its read/write head, which causes much higher latency; reads and writes typically complete at the millisecond level (10^-3 s) or slower.

Memory chips sit on the motherboard very close to the processor, so the data path is short. A disk is usually attached to the motherboard through a cable, so the data path is longer.

Memory works by random access: any data can be reached directly through its address, so read/write speed does not depend on where the data is located. A disk has to read data in the order it is laid out on the physical tracks, so reading large amounts of non-contiguous data shows a clear performance drop.

Big-endian and little-endian

Big-endian and little-endian are the two byte orderings found in computer architecture.


Why does little-endian byte order exist?

The answer is that the computer's circuitry processes the low-order byte first, which is more efficient because computation starts from the low-order end. That is why computers use little-endian byte order for their internal processing.

Humans, however, are used to reading and writing in big-endian order. So apart from a computer's internal processing, almost all other settings use big-endian byte order, for example network transmission and file storage.
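
A quick sketch to check which byte order the machine you are on uses:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t x = 0x01020304;
    unsigned char *p = (unsigned char *)&x;
    /* Little-endian stores the least significant byte at the lowest address. */
    printf("%s-endian (first byte = 0x%02x)\n",
           p[0] == 0x04 ? "little" : "big", p[0]);
    return 0;
}
```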

Concurrency and parallelism

Three levels

1. Thread-level concurrency. On a single-core processor it is achieved by switching between processes; on a multi-core processor:

Hyper-threading, sometimes called simultaneous multi-threading, is a technique that allows one CPU to execute multiple flows of control. It involves having multiple copies of some of the CPU hardware, such as the program counter and the register file, while other parts of the hardware exist only once, such as the unit that performs floating-point arithmetic. A conventional processor needs roughly 20,000 clock cycles to switch between threads, whereas a hyper-threaded processor can decide which thread to execute on a cycle-by-cycle basis. This lets the CPU make better use of its processing resources: for instance, if one thread must wait for data to be loaded into the cache, the CPU can go on executing another thread. As an example, the Intel Core i7 lets each core run two threads, so a 4-core system can actually run 8 threads in parallel.

2. Instruction-level parallelism. The property that modern processors can execute several instructions at once is called instruction-level parallelism. Each instruction actually takes about 20 cycles or more from start to finish, but processors use a great many clever tricks to keep as many as 100 instructions in flight at once. In Chapter 4 we will study the use of pipelining, in which the work needed to execute an instruction is divided into steps and the processor hardware is organized as a series of stages, each performing one step; the stages can operate in parallel, working on different parts of different instructions. We will see that a fairly simple hardware design can get close to an execution rate of one instruction per clock cycle. A processor that can sustain a rate faster than one instruction per cycle is called a superscalar processor, and most modern processors support superscalar operation.

3. Single-instruction, multiple-data parallelism. At the lowest level, many modern processors have special hardware that allows a single instruction to produce multiple operations that can be performed in parallel, a mode known as single-instruction, multiple-data (SIMD) parallelism. For example, recent generations of Intel and AMD processors have instructions that add 8 pairs of single-precision floating-point numbers (C type float) in parallel. These SIMD instructions are provided mostly to speed up applications that process images, sound, and video. Although some compilers try to extract SIMD parallelism from C programs automatically, a more reliable method is to write the program using the special vector data types supported by the compiler, such as those supported by GCC (see the sketch below).
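
As an illustration of those GCC vector types (a sketch; requires GCC or Clang): eight single-precision additions are written as one vector operation, which the compiler lowers to SIMD instructions where available.

```c
#include <stdio.h>

/* GCC/Clang vector extension: a vector of 8 floats (32 bytes). */
typedef float v8sf __attribute__((vector_size(32)));

int main(void) {
    v8sf a = {1, 2, 3, 4, 5, 6, 7, 8};
    v8sf b = {8, 7, 6, 5, 4, 3, 2, 1};
    v8sf c = a + b;                    /* one vector add: 8 float additions */
    for (int i = 0; i < 8; i++)
        printf("%.0f ", c[i]);
    printf("\n");
    return 0;
}
```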