UCB CS162 Operating Systems Notes (6)
P19:Lecture 19: Filesystems 1 Performance (Con't), Queueing Theory, File System - RubatoTheEmber - BV1L541117gr
Okay, everybody。
Welcome back。 Again, so we were talking last time about disks and SSDs。 And among other things。
this is an example of a disk with multiple platters, two sides each.
A cylinder is the set of all tracks, which are concentric circles, at the same position moving all the way down through the stack. The reason we think of a cylinder as a thing is that the heads all move together to a given cylinder, and then you rotate to actually read the sectors. Okay.
And the reason the heads are all tied together and have to move together is that heads are really expensive, complicated technology, and a disk is a commodity device, so you can't afford to have independent heads. So we said there was a model of performance here: seek time, which is the time to move the head to the right cylinder; rotational latency, which is the time to get the right sector underneath; and then transfer time,
which is the time to transfer the blocks off. So a total model for disk latency has seek time plus rotation time plus transfer time, but it also has some other elements like queueing delay, which we'll talk about today, and hardware controller time, which is just the time for the controller to tell the disk what to do.
Okay. We also talked about typical numbers. For instance, if you look at a Seagate 18-terabyte disk, that's over one terabit per square inch, with four to six milliseconds of seek time, and the rotation rates you're going to see on a commodity product are somewhere between 3,600 RPM and 7,200 RPM, but server drives can go up to 15,000 or even 20,000 RPM. Okay.
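To make that model concrete, here is a minimal back-of-the-envelope sketch. The seek time and RPM are within the ranges just quoted, but the transfer rate and block size are assumed values for illustration, not figures from the lecture; queueing delay and controller time are deliberately left out since they come later.

```python
# Back-of-the-envelope disk access time: seek + rotation + transfer.
# Transfer rate and block size below are assumed for illustration.

seek_ms = 4.0                      # average seek time (within the 4-6 ms range above)
rpm = 7200                         # commodity rotation speed
transfer_rate_mb_s = 200.0         # sustained transfer rate (assumed)
block_kb = 4.0                     # size of one block read (assumed)

rotation_ms = (60_000 / rpm) / 2   # average rotational latency = half a revolution
transfer_ms = (block_kb / 1024) / transfer_rate_mb_s * 1000

total_ms = seek_ms + rotation_ms + transfer_ms
print(f"seek={seek_ms:.2f} ms  rotation={rotation_ms:.2f} ms  "
      f"transfer={transfer_ms:.3f} ms  total={total_ms:.2f} ms")
```

Notice how the seek and rotation dominate a small transfer; that is exactly the locality argument that follows.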
So were there any questions about disks before I move on? So if you think about it。
the model there of moving the disk head into the right track。
and then rotating and then finding the data means that there's a significant advantage。
using locality to our advantage when we're starting to build file systems。
And so we're going to talk about that a little bit today and definitely next time。
Okay。 We also talked about SSDs that are made out of flash memory。
And I wanted to put this slide up, which I didn't last time to give you another idea。
of how flash works. This is silicon, doped silicon; you've probably all seen that in one of your EECS 16-series classes. Normally there's a single gate on top and a control line, and you get a transistor: when there's a high voltage, current flows, and when there isn't, it doesn't. What you get with flash is you actually put two gates on there with an insulator in between them. That floating gate in the middle is insulated from both sides, so you can either store a bunch of electrons on it or not, and if you store electrons, it changes the properties of the transistor enough that you can detect it. Okay. All right.
And the reason things wear out with flash is because basically those electrons get stuck。
every now and then embedded in the insulation。 And so eventually when you get enough of those in there。
then it just doesn't work properly, anymore。 Okay。 And so the summary of SSD。
which is kind of where we were at last time, was really that。
the pros are really low latency and high throughput。 There's no moving parts。
which is a big advantage for reliability。 And you can read things essentially at memory speed。
The cons used to be that these devices were much smaller than hard drives and much more expensive, except that none of that's true anymore. Remember, I showed you that you could easily get a 15-terabyte SSD, no problem. It's a little more expensive, but getting the capacity is not a problem.
Some other interesting problems that show up and we're not going to have a lot of time。
in this class to discuss the file systems for flash that are unique from disk。
But there is this weird notion that you have to erase a whole group of blocks at a time。
And so there's a lot more management that has to be done down in the controller of the。
SSD to make that work well。 Okay。 And also the SSD has to realize that things wear out。
But things are changing rapidly. As an amusing thing, remember last time I showed you that 100-terabyte drive that you can put into a regular computer. Of course it was $40,000, worth more than several computers, but you know, you do this in the cloud perhaps.
Okay。 The last thing I wanted to mention is if you have any interest in persistent memory, it's。
always kind of fun to see what's coming up。
Right。 So for instance, when flash originally showed up, it had several different ways of using, it。
some of which were just like a memory card。 Others of which went into the SSD and had controllers and everything。
Similarly, there's a bunch of persistent memory technologies。
This is one of my favorite ones that doesn't quite exist yet, which is made out of nanotube memory. It has a crosshatch pattern and it's three dimensional for storing bits. The difference between a one and a zero is roughly whether the nanotubes are aligned or not, and you can detect that difference as a resistance.
And the cool thing about nanotube memory is it doesn't wear out。 Okay。
And it's also potentially as fast as DRAM。 So and SSD is not that fast。 So someday。
there's a whole slew of possible technologies out there。
And it may be the fact that when we teach this class in five years or whatever, it'll。
be all about the fact that the memory is persistent。 There is no disk。
And the problem is when you reboot, it's got everything it had before。
So rebooting no longer is a good way to get rid of bugs。 Right。 So that's on the horizon。
That's not that far off。
Okay。 All right。 So let's change gears now。 Let's talk about ways of measuring performance before we get into the file systems themselves。
Okay。 And so you can measure things like times and rates。 You can measure latency。
which is the time to complete a task。 That's easy。 Like if I wanted to get a block off the disk。
how long would it take? And that's measured in second units, right? Seconds, milliseconds。
microseconds, hours, years. The other metric, which is similar, is response time, which is the time to initiate an operation and get its response back; that can be different from latency if you're thinking about things like overhead and so on. Then, in a different category, there are throughput and bandwidth: how many accesses can I do per unit time, or how many bytes per second can I get? And then there's overhead, which is the time to actually start an operation. Typically you go to send the controller a request, and there's some fixed amount of time, probably microseconds or milliseconds, that needs to elapse before the disk even gets the request.
And that's going to be pure overhead. Okay. So most I/O operations are roughly linear like this. This is a very simple model that often works: the latency as a function of the number of bits b (remember, little b is bits) is the overhead plus the bits divided by the transfer capacity, which is how many bits per second you can get. So Latency(b) = Overhead + b / TransferRate. Very simple model, which we're going to show you something about in a moment.
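A minimal sketch of that linear model as code, with placeholder numbers (the 1 ms overhead and 1 Gb/s rate here are just assumed values to show the shape of the curve):

```python
def io_latency(bits, overhead_s, transfer_bits_per_s):
    """Simple linear model: latency = fixed overhead + transfer time."""
    return overhead_s + bits / transfer_bits_per_s

# Assumed values: 1 ms of overhead, 1 Gb/s transfer rate.
for size_bits in (8_000, 8_000_000, 80_000_000):
    print(size_bits, io_latency(size_bits, 1e-3, 1e9))
# 8000      -> 0.001008 s  (tiny transfer: almost all overhead)
# 8000000   -> 0.009 s     (8 Mb: overhead is ~11% of the total)
# 80000000  -> 0.081 s     (80 Mb: overhead nearly negligible)
```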
Performance is ultimately what people care about, but what does it really mean? Okay. (By the way, a question came up about why that quantity is written in bytes; that's a typo on the slide.)
So what does it actually mean here to have performance? Does it mean faster?
Does it mean lower latency? Does it mean higher throughput? So whenever somebody says。
does this have higher performance? Your question ought to be, what do you mean?
That ought to be your first question if somebody says it performs more。 Okay。
Because you got to even know what you're measuring。 All right。 Now, here's an example here。
So suppose we have a link that's a gigabit per second。
That's a pretty standard ethernet link these days in low end places。
And that actually means that our, we can get 125 megabytes per second。 Okay。 Out of that link。
I just took one gigabit and divided by eight。 All right。 And the startup cost, let's say。
is a millisecond just for the sake of argument。 And so we can build something that looks like this。
which is the length in bits on the bottom, that we want to transfer in a packet。
and then we can have latency in blue and bandwidth, in red。
So I'm putting two different scales on the same graph。 I hope you don't mind that too much。
But what can we learn here? Well, latency is that startup cost plus the number of bits divided by the maximum rate. You do have to normalize the units: if you're working with bits, use bits throughout; if bytes, use bytes. So latency(b) = S + b/B is the thing that looks like a straight line, because it's linear in b. That's our blue curve. And notice that to transfer zero bits there's still a millisecond; that's why the intercept is at a millisecond, or a thousand microseconds, which, by the way, since last Tuesday's lecture, you've all got these units figured out now, right? Are we dealing with powers of 10 or powers of two here? Powers of 10, because we're dealing with bandwidth. Okay. So the effective bandwidth for transmitting b bits is b / (S + b/B): the total number of bits I transfer divided by the latency, which gives bits per second. That's the red curve. So notice that even though in the best case I can get 125 megabytes per second, it takes a very large packet to overcome that overhead and get that full bandwidth. Okay. Questions.
Does that make sense? All right. So now we can talk about this key dotted line, which is the half-power point. This is the value of b for which I get half of my peak bandwidth. The half-power point, that's what this dotted line is, is an interesting one because it means you're not overwhelmed by the startup overhead anymore; you're getting real throughput rather than mostly paying overhead. A question came up about what a link is here: we're talking about a network, so a link is an ethernet cable or something that's transmitting. Okay. So for this example,
the half-power bandwidth occurs when b is 125 kilobytes. (So b, in answer to that previous question, is actually a variable, and we're looking at the units here; it's kilobytes in this case.) Where does that come from? We find the value of b that gives us half of the full bandwidth, and that's 125 kilobytes. Now, what's interesting to me is that if we make the overhead much larger, like for a disk, say 10 milliseconds, then by the same kind of reasoning the half-power point is now a transfer that's 1.25 megabytes in size. Before, our half-power point was 125 kilobytes; that says that once you transmit a packet that's at least 125 kilobytes in size, you're going to get at least half of your bandwidth in use. Here, the 10-millisecond startup cost says that if we don't transfer at least 1.25 megabytes at a time, we don't get half of our full bandwidth. Okay. So when the overhead gets high, you've got to figure out ways of getting rid of the overhead: either by making transfers arbitrarily large, which isn't always practical, or, for instance in the case of a disk, by making sure you don't seek very much, which reduces the overhead. Okay.
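As a sanity check on those numbers, here's a small sketch that plugs the lecture's figures (a 125 MB/s peak rate with a 1 ms or 10 ms startup cost) into the effective-bandwidth model and solves for the half-power point:

```python
# Effective bandwidth model: E(b) = b / (S + b/B)
# S = startup overhead (seconds), B = peak rate (bytes/s), b = transfer size (bytes).

def effective_bw(b_bytes, S, B):
    return b_bytes / (S + b_bytes / B)

B = 125e6  # peak rate: 1 Gb/s link = 125 MB/s

for S in (1e-3, 10e-3):            # 1 ms (network) and 10 ms (disk-like) overhead
    b_half = S * B                 # solving E(b) = B/2 gives b_half = S * B
    print(f"S = {S*1e3:.0f} ms -> half-power point = {b_half/1e3:.0f} KB "
          f"(E = {effective_bw(b_half, S, B)/1e6:.1f} MB/s)")
# S = 1 ms  -> half-power point = 125 KB   (E = 62.5 MB/s)
# S = 10 ms -> half-power point = 1250 KB  (E = 62.5 MB/s)
```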
Questions? Yeah. So, I'm sorry, I said this incorrectly earlier: the lowercase b is a variable, it's the amount you're transmitting, and I'm showing it in bits down here. Capital B is a constant, the peak bandwidth, and here it's in bytes. What I really ought to do with these slides is get rid of the small b that's in the equation in that bullet; maybe that makes things less confusing. Okay, I'll do that.
All right。 Now, so what determines the peak bandwidth? So remember the peak bandwidth is this。
this point that you would finally reach with an, arbitrarily large packet。
You'd hit the peak bandwidth and that's the guaranteed not to exceed bandwidth。 Okay。
So a one gigabit network link, the guaranteed not to exceed bandwidth is a gigabit, right?
But what determines that? Well, in the case of ethernet, it's the protocol。
If it's a gigabit ethernet, you're not going to get better than a gigabit out of it。
But there are lots of things in systems that set that high point. In the case of buses, for example, PCI-X is a very old bus at roughly a gigabyte per second, and Thunderbolt is another one that a lot of you have on Intel machines, which gets you 40 gigabits per second. So it varies widely.
So when you're sort of trying to figure out, am I going to get my full bandwidth, you have。
to start by figuring out what is the thing that's the bottleneck bandwidth here。
And so the device transfer bandwidth is another thing that could be setting a bottleneck。
So for instance, the rotational speed of the disk: if the disk spins twice as fast, you're going to get twice the transfer rate off of that disk, because the bits are going under the head twice as fast. Okay. And that's why high-performance systems have 15,000 or 20,000 RPM disks in them. Okay. Whatever is the bottleneck in the path tends to set the peak bandwidth.
Yes。 No, if we increase the disk speed, you mean how fast it's spinning? That's going to。
it's going to do a couple of things。 Okay。 It's going to not only decrease the overhead of rotating to find your sector。
So that's going to go down。 That's overhead。 But the other thing is when you're reading off the disk。
the bits come off twice as fast。 Okay。 So it's, so the case of spinning the disk faster is actually speeding up a couple of things。
Now that latency for moving the head in and out, that's a challenge to make faster because。
that's a physical thing。 I mean, the nice thing about spinning is you spin it up to a certain speed and it stays。
there essentially。 Okay。 In terms of moving the head around。
it depends on how fast you can actually accurately move, the head。 And that's。
that's a harder one to improve。 It's been improving over time。
It used to be that the access latency was like 15 milliseconds or 20 milliseconds and。
now we're getting down to four and six。 So it's been improving, but it's not improving rapidly。
Okay。 So that's kind of pure overhead。 Any other questions? Yeah。
So the half power bandwidth is what packet size, in this case it's, but packet size do。
I send such that I get half of the bandwidth out of my system。 Okay。
And the problem is that when I go to send a packet or take something off the disk, there's。
the raw overhead and then I get to get my full bandwidth out of it。
And so that because of the overhead, you can't just, what you need to do is figure out what。
is the largest packet that I need so that I get half of my bandwidth out of it。
That's the half power point。 Okay。 Okay。 All right。 Any other questions? Now。
so the question here again, what does it mean exactly to get half the bandwidth?
It means that this red curve, which you say, here's a packet size, how much bandwidth am, I getting?
What is the packet size here where the bandwidth I get is half of the maximum?
That's the half-power point here. Okay. Clear? And as the previous slide showed, it's really the effective bandwidth that has to take that overhead into account. If there were zero overhead, S equal to zero, the effective bandwidth would be exactly a gigabit and there wouldn't be any issue of a half-power point, but we have overhead here; that's why we have to take it into account. Okay. Now, the higher the bandwidth the better. Yes, you can't always get higher bandwidth, but you'll try to get what you can.
So let's look at a couple other things in terms of modeling。
So let's suppose we have some operation a server is going to do and it takes L latency and it。
always takes exactly L latency。 And we would get something that looks like this, okay。
where we do the first thing, then, we do the second thing, then we do the third thing。
They're all equal length in time. And if that were true, which is never going to happen by the way, and say L was 10 milliseconds, then B (the script B here), which is the number of operations we can do per second, is 100, because one over 10 milliseconds is 100 ops per second. That's just the inverse of the latency. Now if the latency were two years, I don't know what this represents, maybe growing an orchid or something, then the number of orchids you get would be half an orchid per year. One over L. So this is like the simplest scenario you could possibly have, and it never happens this
well。 Okay。 But let's just, this is sort of 61C material。 I'm reminding you of this。
So for instance, and this applies to processors, disk drives, whatever。 Here's a pipeline。
Suppose you've got your item that takes L but you can divide it into three separate。
pieces that can be done independent of each other。 Remember that from 61C everybody, right?
So we could have blue, gray, and green be the different pipeline stages that together make up that full L, right? We first do a blue thing, then a gray thing, then a green thing, and we're done. And so it would look like this. Here's the first item. Notice the individual item still took a whole L to do, but it was done by doing the first third, then the second third, then the third third. And that means if we pipeline it like this, while we're doing the blue stage for the next operation, we can be doing the gray stage for the current one and the green stage for the previous one. So we're pipelining.
Does that look familiar to everybody? So now we can just talk about the rates. For instance, if L is 10 milliseconds total and there are, say, four pipeline stages, that means that instead of 100 ops per second we can actually get 400 of them coming through, because we've got the pipeline going. You could also say that each one of these little stages is a quarter of L, so basically we get four completions per L of time. All right. Okay. Now,
so there's lots of systems pipelines。 So the reason we talk about performance a bit with you guys is so that you can start。
thinking if I've got a system, what are my bottlenecks for bandwidth? What's the limiting factor?
What are the overheads, et cetera? So you can imagine here, for instance, you've got a user program that makes a system call to access a file, and then it's got to go through the file system software, and then maybe through the upper half of the driver and the lower half of the driver, and ultimately get to disk. And so we can look at pipelines where one process is making a system call while another one's working on the file system, while another one's in the upper part of the device driver and another one's in the lower part of the device driver. So these pipelines kind of show up everywhere,
not just in processor pipelines like in 61c。 All right。
So anything with queues behaves roughly pipeline-like。 All right。
And those queues are going to be a problem for us。
We're going to have to talk about what's the actual queuing behavior。 Okay。 Now。
the other thing that you can do, which you also heard about in 61C, is I could have a bunch of L-length things and, if I put a multi-core at them, have a bunch of them happening in parallel. All right. And I can spread them across the cores as well. So if I have four cores and each thing takes 10 milliseconds, I can also get 400 ops coming out at once, but that's because they're now being done in parallel. Okay. You see the difference: with pipelining, I split each operation into pieces and I've got a pipeline going; with parallelism, each operation is kept whole and multiple of them go through at once. Those are just different ways of structuring systems, and of course in 61C you saw this as multi-core. Okay. So lots of parallelism again,
where now we can have a bunch of user processes all submitting syscalls at once, and maybe there's a lot of parallelism in the file system and the drivers, and there are multiple disks. So we could have multiple things happening at once going through the system.
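To tie the last two pictures together, here's a tiny sketch of the throughput arithmetic, using the lecture's numbers (a 10 ms operation, four pipeline stages or four cores); the formulas are just the straightforward rate calculations:

```python
# Throughput from the three structures discussed above.
L = 0.010           # latency of one operation: 10 ms
stages = 4          # pipeline depth (or number of cores for parallelism)

serial_ops    = 1 / L              # one at a time: 100 ops/s
pipelined_ops = stages / L         # a completion every L/stages: 400 ops/s
parallel_ops  = stages * (1 / L)   # four independent copies at once: 400 ops/s

print(serial_ops, pipelined_ops, parallel_ops)   # 100.0 400.0 400.0
```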
Okay。 So, could anybody tell me that if we wanted to exploit parallelism like this, what we're。
likely to run into? Can anybody come up with what are the biggest issues that might show up here?
Synchronization problem。 Great。 Yes。 Okay。 Because if there's one file system and lots of people are accessing it。
you could imagine, you don't know anything about file systems quite yet。
but you could imagine that they're, all using shared data structures and if we don't have good synchronization。
it gets, all screwed up。 So parallelism, just like when we talked about thread level parallelism earlier in the term。
in the IO system, we're going to have to make sure our parallelism doesn't get in the way。
of correctness。 Okay。 This is like this class, you never lose the stuff we teach you early on。
you just have, to apply it to different topics。 Okay。 So。
let's go back to sort of a model of IO for a moment。
And here's a thread from a user and it is going to make generically some sort of IO, request。
I'm going to forget about file systems and stuff for a moment, say it's accessing the, disk。
So what happens is that request gets put into a queue。 Okay。
It could be a queue in front of the controller。 It could be a queue in the operating system or in the device driver。
And then there's a controller that pulls things off the queue and sends them off to the IO, device。
Okay。 And so the response time is going to typically be a queue time plus an IO device service time。
And so this is important。 That green thing is in the main path to getting your satisfied response。
right? Because your request has to work all the way through the queue and then through the controller。
and get back。 So you could imagine that if the queue was completely full that this controller could。
be spitting things out really rapidly, but when you put your request in there, it still。
got to drain the queue down so you're in the front and then get to be processed。 Okay。
So a large queue translates into long latency。 Okay。 This is the McDonald's analogy, right?
You really want that McFlurry or whatever it is you're going after。 You go into the McDonald's。
there's a huge line and you got to wait through the line before。
you can even get to the counter to get served. Okay. And so that queue, believe it or not, has a lot more impact on latency than is usually recognized.
So when we're looking at the performance of an IO system, you know, we have our metrics。
like response time and throughput。 We can start talking about effective bandwidth here where we have the number of operations。
we're trying to do。 There might be some overhead and then there's something which is the time per operation。
That looks exactly like that simple model I gave you earlier, right? Because it's pretty universal。
it shows up all over the place。 What's going to be different about this is we're going to start having to address things。
that look like this. Okay. The earlier curve had a linearity to it, or a slight concavity downward; this one is something else entirely. Notice how I've labeled the bottom axis 0% to 100%. What this represents is the utilization of the device: 100% utilization means the device is going all the time, it's completely busy, and 0% means it's idle.
We have an overhead, but notice this behavior where as you get closer and closer to 100% of。
the capacity, the latency goes greater and greater。
And there will be many things you encounter in your career at Berkeley that all have that。
response behavior。 Okay。 And you need today, we're going to talk about where some of that comes from。
but also you, need to be aware that this is a general engineering pattern。
And there's a couple of things to think about here。
If you design a bridge and you compute its maximum capacity for handling weight, call, that 100%。
Do you want to run the bridge at 100% capacity? No, right。 That sounds very non clever。 Okay。
And so 100% anything once you've identified your 100% points becomes very important。
But the other thing that happens with queues, typically because of simple random behavior, which we'll talk about in a moment, is that as you get closer and closer to that 100%, the latency goes higher and higher toward infinity. Okay. Now any real system, of course, can't go to infinity, but it's going to get arbitrarily large. Can anybody guess what is happening at the point where we're growing without bound? The queue is growing. Great, that's exactly right. And the queue is growing in a way where there's much larger growth for a little tiny bit of change in utilization. Okay. And that's generic. So, contributing factors to latency:
if we think about the user thread, it submits a request and then gets a response back. There's the software, which is loosely modeled by a queue, the hardware controller, and the I/O device service time. All of these contribute, but the queueing behavior is the one that's especially important. And I don't know if any other classes that you've taken have focused on queues. We're not going to do a lot of it, but I want to give you some simple ideas about queues just so we can have some back-of-the-envelope ways of talking about performance. Okay. Now,
let's start with something very simple. Here's a simple deterministic world; unfortunately the world is not deterministic, but it's always nice to pretend it is for a few slides. So what do I mean here? I have a queue in front of a server. And that server, by the way, could be a controller plus a disk; it could be McDonald's, whatever. So imagine that items arrive exactly every T_a seconds, and the server takes only T_s seconds to serve each one. Notice that T_s is what? It's shorter than T_a, significantly shorter. Okay. So what's going to happen here? Well, an arrival will show up, it'll get put on the queue, but it basically gets taken off immediately because the queue is empty. Then we serve it for T_s, and after we're done the queue is empty again and the server sits idle. Okay.
So from the standpoint of how busy that server is, it's clearly not 100%。 In fact。
it's less than half kind of the way I've got this diagram here, right?
So that server is mostly idle. Okay. And in this particular idealized situation we're assuming determinism, so all requests come at regular intervals, there's a fixed time to process each one, and there's plenty of time in between. We can start talking about a service rate, which is how fast the server serves things: it's mu = 1/T_s, just like 1/L was our rate a couple of slides ago. And the arrival rate is how fast things are arriving: lambda = 1/T_a. What can you tell me about lambda versus mu? Yeah, lambda = 1/T_a is smaller than mu = 1/T_s, so this seems like a good situation. Okay. And typically when we talk about utilization, it's lambda over mu. The fact that lambda is less than mu means we're not at that 100% point, and we're okay.
All is good。 Okay。 Because the rate at which things are coming in is less than the rate at which they can go。
out. Now, in this ideal deterministic world, if we plot offered load from zero to one, with the 100% point at one, then as long as the offered load T_s/T_a is low, we can serve everything immediately and the queue never builds up. At most it has one item on it that gets taken off immediately. So this is great. (It never happens, but this is great.) However, notice the scenario where we try to offer more than 100% load. What's going to happen? Well, we're going to start saturating. The device is going to be spitting things out as fast as it can, but items are coming in faster than the device can handle them, and so the queue is going to grow. Okay. So when we're in the mode where T_s/T_a is greater than one, the queue keeps getting longer and longer. You could say this is: you're at that McDonald's, but there's a bunch of tour buses that keep showing up and dumping people at the door, and they're all going for that McFlurry. Okay. Now, at the end of the day when the buses stop coming, the queue will drain, and we get back to being able to get our ice cream quickly. Okay. Now, what does the queue wait time look like? It grows unbounded, at a rate determined by T_s/T_a.
Let's look at reality。 So reality, nothing's deterministic。 Okay。 I hate to break this to you。
If you don't know that already, maybe this is a sad day for you, but nothing in the world。
is deterministic。 Okay。 Even at the quantum level。 So what happens?
So let's look at a situation where we're going to have the same average rate of arrivals。 Okay。
The average rate of arrivals is still going to be 1/T_a, but we're going to have bunches of them. So there'll be a bunch of people, then a lull, then a bunch of other people. Okay, that's bursty. All right. And this is a much better comparison with the bus, because a bus shows up, people get off, the bus leaves, another bus shows up, a bunch more people get off. So look what happens here. The first item arrives, it's immediately pulled off the queue, and it starts being served.
Okay。 So that so far looks exactly like the previous slide we had。
Then another one shows up at the next bursty time slot and that's white and now white is。
going to be sitting in the queue until the blue one is serviced。 Okay。 So this。
this white rectangle here is unhappy because they're not, they're going to take。
longer to get their ice cream than they would have expected if there's nobody in line。
And then orange comes along and orange。 Notice how they're each coming at these little ticks here。
So orange shows up。 It's in the queue, whites in the queue, in front of it。
And then gray comes along。 Okay。 And now we're at the end of a burst。
but notice we've got three items in the queue。 So the queue is no longer empty。
It's got things in it。 And as soon as blue finishes。
the great thing is that now white can get in the server and。
now orange is in the front and gray is next, right? You guys see what's happening here。
And then we go down one more and then eventually the powder blue one gets serviced and we're。
good to go。 And the queue even has a period of time where it's empty。 Okay。
But we could set this up so that the rate in which they arrive over on average is exactly。
the same as the deterministic case。 But notice the difference here。
The difference is that there are some folks in here that are really unhappy。
Like look at this powder blue guy。 He arrived just a little bit after the dark blue one did。
But look at all the time they're stuck waiting in the queue。 Okay。
So queues in a bursty scenario start filling up. Questions? (A student observes that this sounds a lot like office hours.) Yes, very good. Today I'm presenting a model of office hours. Yeah, I apologize about that.
Are there any questions other than how close this is to the office hour situation?
So the moment we start getting burstiness or non-determinism, then we start having problems。
And the question is really, how do we model burstiness?
And there's a mathematical framework that works pretty well。
If you start with the exponential distribution like this, which says the probability density for the time until the next arrival is f(x) = λe^(−λx), then the average interarrival time is 1/λ, so λ is the arrival rate. This is called memoryless. Okay. And the curve looks like this, where I'm plotting the time between now and the next arrival on the x-axis. This is a probability density function: it tells me, roughly, what's the probability that I'll see an interarrival interval of two. Okay.
Does anybody know what the deal with memoryless arrivals are? Yes。 Perfect。
So the conditional probability, if I already know I've waited two seconds, what's the probability。
of the time to keep waiting? It turns out that if you do conditional probability for those of you that remember that and you。
rescale the graph, it looks exactly the same。 So this model is a great model for burst buses in Berkeley。
right? You're sitting, waiting at the bus stop, and it doesn't matter how long you've waited。
The distribution of how long you're going to wait is the same when you rescale it。 Okay。
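Here's a quick numerical check of that claim, conditioning an exponential wait on having already waited a while. The 10-minute mean between buses and the 5-minute elapsed wait are just assumed numbers for the example:

```python
import random

random.seed(0)
mean_wait = 10.0                      # assumed mean time between buses, in minutes
waits = [random.expovariate(1 / mean_wait) for _ in range(1_000_000)]

# Unconditional average wait vs. average *remaining* wait given 5 minutes have passed.
overall   = sum(waits) / len(waits)
remaining = [w - 5 for w in waits if w > 5]
print(overall, sum(remaining) / len(remaining))   # both ~10.0: memoryless
```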
It's memoryless。 That's what this means。 Okay。 Now。
the good thing about a memoryless distribution is that it's easy to plug into things and use。
as a model。 The question you might ask is, is it realistic? Well。
it turns out that a lot of physical processes when you combine a bunch of them。
together tend to have what looks like a memoryless arrival rate that comes out。
So oftentimes what will happen is if you don't really know what the actual distribution is。
oftentimes people will say, let's assume it's memoryless, let's compute the, let's compute。
the rate。 Okay。 Lambda。 And that will be, we'll use a memoryless distribution and we'll plug that into our formulas。
Now, there's lots of arguments that you can run into and in 70 and 170, they'll talk about。
this a lot。 You know, is that a fair characterization? Not always。
But it is one that people use a lot。 Okay。 So what am I saying?
I'm saying that if you don't know anything better, you assume that they're all independent。
of each other and have a memoryless distribution and there's only lambdas, the one parameter。
you have to find。 Okay。 So what does this mean for this? So if we had a burst。
a memoryless bursty behavior, what this really is, is there's some。
arrivals that are very close to each other and then there's some long tail arrivals。
So if you look here, there's a lot of short ones and there's some occasional long ones。
So we could talk about a queue with a memoryless arrival。 Okay。 And there's an average arrival time。
just like we talked about earlier, which is one over, Lambda。 So lots of short, a few long。 Okay。
So I will tell you that a lot of folks who are not trying to get a hyper accurate model。
will think about things in terms of memorylessness。 So we're going to use this moving forward。 Okay。
So let's quickly remind you of some general random distributions。
So you all know about if the server spends time t with customers, we could look at the。
distribution of service times。 So what you think about this is you get to the counter at McDonald's and this is the。
distribution of how long it takes to get your McFlurry。 Okay。
And there's an average in the middle and some distribution。 Okay。
So the mean is what you get when you sum up, over all the service times T, the probability of T times T. The variance is what you get when you sum up the probability times (T minus the mean) squared. Okay. So,
and you're familiar with the square root of variance, which is the standard deviation。
So this is the way you think about midterms one and two, right? Where the mean is like 52。
unfortunately, and the standard deviation was, I don't know, 12 or something, right?
So the squared coefficient of variance is something you probably haven't seen too often。
And that is the variance, which is sigma squared divided by the mean squared。
And what's great about C is C has no units。 So we can come up with some equations for queuing that are unitless。
And if we know what C is, we can plug them in there and have a good, have a good approximation。
of what's going on。 So for instance, important values of C here are if C is zero。
we're back to determinism。 Why is that? Well, if C is zero。
the only way that's going to happen is sigma squared is zero, which means。
there's no standard deviation, which everything takes exactly the same amount of time。
Or deterministic. Another one is the good old memoryless case, remember, that's how you talk about buses in Berkeley: C is one. So typically, when C is one, it's often because it's a memoryless situation. And then finally, disk response times: a lot of people have measured how disks perform, and they have a C that's a little bigger than one, around 1.5. So it's not memoryless, and a C of 1.5 means that the majority of the seeks are shorter than the average.
Can anybody come up with a reason why the majority of the seeks might be less than average latency?
Remember, a seek is moving the head in and out, and it's an expensive move. Yeah, say that again. Okay, because a lot of things are on the same track. Yeah, exactly: file systems that try to exploit locality try to make sure that you don't move the head
very much。 So if you do a really good job with your file system。
then you're going to have more locality, and most of your seeks are going to be less than the average。
And that's why oftentimes people will talk about C equal to 1.5. Okay. Now, of course, that only works if you build a good file system; if you don't build a good file system, you're not going to get that. Okay. Questions? Right. This is all stuff you know, right?
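Before moving on, here's a quick numerical sanity check of those C values, estimating the squared coefficient of variation from samples. The deterministic and exponential (memoryless) cases are the two named above; the 20 ms value and the sample size are arbitrary:

```python
import random

def squared_coeff_of_variation(samples):
    m = sum(samples) / len(samples)
    var = sum((t - m) ** 2 for t in samples) / len(samples)
    return var / (m * m)          # C = sigma^2 / mean^2, unitless

random.seed(0)
deterministic = [0.020] * 100_000                                   # every service takes 20 ms
memoryless    = [random.expovariate(1 / 0.020) for _ in range(100_000)]

print(squared_coeff_of_variation(deterministic))   # 0.0  (deterministic)
print(squared_coeff_of_variation(memoryless))      # ~1.0 (memoryless)
```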
So now let's talk a little bit about queuing。 Okay。 So queuing theory is a whole topic。
Anybody here taking any classes on queuing theory or had queuing theory talked about? No。 All right。
We're the first。 I feel privileged。 So queuing theory is a very complicated topic that all sorts of。
you can take a whole class, on it, but I'm going to give you a very simple model here that works well for some of the。
things we talk about in this class。 And that model is that we have a queue in front of say a controller and a disk。
And we're going to look at the whole queuing system, which is what I've got in cyan here。
is having some arrivals that come in and some departures that go out。
If you remember detailed balance from high school chemistry, you get to a point where the rate of the forward reaction and the rate of the reverse reaction match each other, and that's equilibrium. Similarly, in queueing theory, arrivals happen at the same rate as departures on average, and in that case you get a steady state. All right.
So the queuing theory we talk about in this class in this lecture is steady state queuing, theory。
This is not something that deals with transient startup behavior。
That's a whole much more complicated topic。 Okay。 So when we talk about coming up with a queuing model for something。
we're talking about steady, state behavior。 All right。 By the way。
there's a series of books by somebody named Kleinrock。 Great books on queuing theory。
So if you feel like you want to learn a lot more about it, go for it。 Okay。 But for today。
we're steady state。 Okay。 And arrivals are characterized by some probability distribution。
We showed you one earlier that's possibly memoryless, right? That's an option. Okay.
And departures are characterized by some probabilistic distribution。
And the trick is what is the latency from the point that your request arrives to when。
you get your answer and also how long is the queue on average?
Those are two questions we care about。 Time and queue size。 Okay。
And so I want to introduce you to Little's Law。 Okay。 This is just a little law。 It's a good one。
Okay. Little was his last name. The idea is: if you have a situation with arrivals coming into some system and departures going out, the arrivals come in at some rate lambda, and L is the average latency a job spends in the system, then what can you say? Well, in any stable system, as we just said, the arrival rate is equal to the departure rate, and the average number of jobs in the system, N, is equal to lambda times L, period. (I'm sorry, I said earlier that L was the number of jobs; L is the latency that you wait, and N is the number of jobs.)
So if you know what lambda is and you know the average latency it takes to get through, the system。
then you know how many jobs are stacking up there。
And what's interesting about this is this doesn't matter whether you have memoryless。
arrivals or anything you like。 It works under all circumstances。 Okay。
so regardless of the structure, bursts variation, instantaneous variations, whatever。
it all washes out in the average. Okay, so if you know what lambda is and you know the time to do a request, you can figure out how many jobs are sitting in the system. So this is going to be interesting here. (By the way, a question from before: why does memoryless imply C equal to one? You just have to plug it in: for the exponential distribution, sigma squared equals the mean squared, so sigma squared over the mean squared is one.) Okay,
so here's a simple example. Look, if the latency for us to get through the system is L equal to five seconds and there's an item arriving every second, then it's pretty easy to see that there are five jobs in the system at a time. Anybody want to disagree with that? Seems pretty simple, right? If they're arriving one a second and it takes five seconds to get through, then there are on average five jobs.
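Here's that example as a one-liner, just plugging the lecture's numbers into N = lambda × L:

```python
# Little's Law: average jobs in system = arrival rate * average time in system.
lam = 1.0    # arrivals per second (lambda)
L   = 5.0    # average latency through the system, in seconds

N = lam * L
print(N)     # 5.0 jobs in the system on average
```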
Here's how you think about that。 You go to McDonald's。
you're getting a lot of free advertising for me today。 I don't even like McDonald's, but anyway。
you go, you come to the door and you look in, the McDonald's and there's a bunch of people in line in front of you。
Okay? And you walk to the counter and by the time you get to the counter, if you turn around。
and look behind you, there's the same number of people in line。 Okay?
That's what it means to be steady state。 All right?
And so that's why this equation works because if you look at the rate that people are coming, in。
all right, and you look how long you took and you turn around and you look, you know。
that in that time that you were going through the line, people were coming in at that rate。
You take the time you were there times the rate that gives you how many people are in, line。 Okay?
Very simple。 Now, that's little's law。 And here's a very quick sketch。 You ready? One, two, three。
put your put your proof sketch hat on。 If you look at a bunch of items。
each of which takes L time and they're varying all over。
the place and there's T time that we're looking at the system。
So we're going to average over capital T。 How do we know what the average number of jobs are in the system?
Okay? It's very easy。 We say, well, here is the number of jobs at this one slice of time。
How do I know that? There's one, two, three, four of them。
And what I want is what's the average number of jobs in the system?
So can anybody think how I do this? Yeah, very good。 Area。 We compute the area。
So what we're going to do is make each one of these stripes equal to size one。
And if we compute the area and divide by the time, we're going to figure out the average。
number of folks in here。 Okay? Exactly。 So here we go。
We're going to say the area contributed by item i is L_i times one, where one is the height of its stripe and L_i is its length. So we just add up all the areas: the total area S is L_1 + L_2 + L_3 + L_4, the sum of the latencies. Now we take the total area, divide by T, and that gives us the average number of people in the system. So the average number in the system is S/T, which is the sum of all the L's over T. We can rewrite that as the total number of jobs N_total divided by T, times the sum of the L's divided by N_total. The first factor is the average arrival rate and the second is the average latency. So the average number of people in the system equals lambda average times L average. Okay?
Little's law。 Something to remember, it might actually show up on midterm three。 I suspect it might。
But it's universal and think of it as this is the McDonald's law。 You look in the door。
you have a rate of people coming in, you look when you get to the counter, you look back。
That tells you how many people are in line。 Okay, any questions? All right。
So now, when you apply it to just the queue, what do you get? If you know the average time waiting in the queue and you know the average arrival rate, you just multiply them together and that gives you the average length of the queue. So basically this is going to be universally useful for going from the time spent in the queue to the average length of the queue. Okay? All right. So here we go.
We'll do all of this and then take a brief break. So here's the little bit of queueing theory that we expect you'll know something about. Assumptions: one, the system is in equilibrium, with no limit on queue size; two, the time between successive arrivals is random and memoryless, so we're going to have an arrival process that's memoryless. Okay? The arrival rate is lambda. The service distribution, which is how long the server takes, can be an arbitrarily complex thing, so we're not going to require that to be memoryless. And you can see why that might be, right? If it's a DRAM, it probably takes a pretty deterministic amount of time to do a read, so deterministic service times for servers are not out of the question, but arrivals typically come in bursts. Okay? So here are the parameters. Lambda, the service time T_ser, and C, the squared coefficient of variation, are the things that act like independent variables. If you know them, you can derive a bunch of other things, like mu, the service rate, which is one over T_ser; that's just the average number of things the server can do per unit time. The utilization of the server is just lambda over mu, like we said before, which also comes out to lambda times T_ser. What do we know? We know the utilization has got to be between zero and one, or we're in trouble: if it's greater than one, we know the queue is going to grow without bound. So any stable system always has u less than one. And so here are our results.
Now in times past a number of years ago, I used to derive these for you。
At least the first one is very easy to derive; I won't subject you to that. But notice that the time in the queue, T_q, is equal to the average service time times u over one minus u: T_q = T_ser × u / (1 − u). Now the general case, where we have a generalized service distribution, not just memoryless, looks exactly the same except that there's this one-half times (1 + C) factor: T_q = T_ser × ½(1 + C) × u / (1 − u). Notice, by the way, that if C equals one, one plus one is two, divided by two is one, so the middle factor drops out when C equals one. Okay? So the memoryless service distribution is just a special case of the general one. The way we talk about the first one is as an M/M/1 queue: memoryless arrivals, memoryless service, one server. The bottom one is M/G/1: memoryless arrivals, general service (it could be any distribution), one server.
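As a reference, here's a small sketch of those two formulas as code, using the notation above (T_ser, u, C). The 20 ms service time is just an illustrative value; the loop shows how the queueing delay blows up as u approaches one:

```python
# Time spent waiting in the queue (not counting the service itself).
def tq_mm1(t_ser, u):
    # M/M/1: memoryless arrivals, memoryless service, one server
    return t_ser * u / (1 - u)

def tq_mg1(t_ser, u, c):
    # M/G/1: memoryless arrivals, general service with squared coefficient C
    return t_ser * 0.5 * (1 + c) * u / (1 - u)

t_ser = 0.020    # 20 ms average service time (assumed)
for u in (0.2, 0.5, 0.9, 0.99):
    print(f"u={u:.2f}  Tq={tq_mm1(t_ser, u)*1000:.1f} ms")
# u=0.20  Tq=5.0 ms
# u=0.50  Tq=20.0 ms
# u=0.90  Tq=180.0 ms
# u=0.99  Tq=1980.0 ms
```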
And so the question is, why does the response delay grow unboundedly even though utilization is less than one? Well, notice that the closer and closer you get to u equal to one, the denominator blows up. See that? The closer you get to one, the more we're on the steep part of the curve. Now why is that?
Anybody want to hazard a guess? Yeah. So first of all, right,
we're assuming the queue is of infinite size because otherwise, this math doesn't work out, right?
Okay, yep。 Why does it go up like this? Let me show you。
So in fact actually I'll show you in a second。 Let's take a brief break。 We'll come back。
I'll show you why it goes up。 All right。 Stand up, shake it out。
This is a rough lecture because there's a lot of mathematics in it。
So the question that was on the table was, what's causing this behavior?
Now you can derive these equations given some reasonable assumptions and you happen to get。
this u over 1 minus u factor。 So you could say if I were being particularly annoying that。
well it's just in the math, right? The math causes it to go like that, end of story。 Okay。
but that's somehow a little unsatisfying, right? I mean, I would find that a little unsatisfying. So let's see if we can do something else. Okay, if we build this plot where the maximum utilization is at one, and the y-axis is the rate at which things are being served, then what we can see is that as we raise our request rate, this green curve kind of represents what we get out of the server, and there might be a little bit of overhead that keeps it below the ideal. We know, for instance, that we can't actually do anything greater than mu max, because then utilization would be greater than one, and that's bad. Now, if we look at this queueing result we just came up with, why does the latency blow up? Because the queue builds up on every burst. The fact that we have a memoryless input says there's randomness on the input, which is going to cause burstiness, which is going to cause the queue to build up. Okay,
and so if you look at it, we really have something latency wise that looks like。
this and latency is really what matters to us because it's how long do I have to wait。
from when I submit a request to get the response back。
And we know that this curve has a proportionality of utilization over one minus utilization, so it's going to blow up as we get closer and closer to that point. Okay, so the half-power idea here says that we may want to find an operating point where the utilization is about 50%, because that'll be a nice medium between keeping things busy and not getting this latency blow-up. So that half-power notion we talked about earlier is useful here too.
But let's look at why do we get an unbounded response time。 So here we go。
Remember that if we have determinism in the arrival, then we can actually lay things out, like this。
Oops。 And we can use up 100% of our service time because we do the first service and then the。
second service and the third service and things arrive exactly at the right rate to be used。
by the disk or whatever。 So there's never any slop here, it's perfect。 Something arrives。
we process it。 The next thing arrives, we process it。 We effectively keep the queue empty。 Okay。
now what did I say about generalized processes where a bunch of stuff is arriving?
It's probably not deterministic。 So let's look at what happens if we add some burstiness here。 Okay。
so now we have a stochastic, bursty arrival process, and what happens is things arrive in a burst, here they are, and then they get processed in order by the server, and then there might be a gap, and then another burst arrives and we get some more processing, and so on. But notice, if we keep the same average arrival rate we had earlier, having a bunch of bursts means we also have these blank spots, in order for the average to work out. It turns out the idle time never gets reclaimed: with bursts, the queue grows, and as you're trying to empty it out there's never a long enough quiet period for it to fully empty, and that's kind of why this curve goes up. So the moment we start adding burstiness, a bunch of requests arrive and start getting processed, then there might be a few long gaps, then a bunch more arrive, and the net effect is that the waiting just adds up. Okay.
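You can see that effect with a toy FIFO simulation. This is only a sketch: the 40 ms average interarrival time and 20 ms average service time are assumed values (giving u = 0.5), and the only point is that bursty arrivals at the *same average rate* produce waiting where the deterministic schedule produces none:

```python
import random

def avg_wait(interarrivals, service_times):
    """Average time spent waiting in a FIFO queue in front of one server."""
    t, free_at, total_wait = 0.0, 0.0, 0.0
    for gap, svc in zip(interarrivals, service_times):
        t += gap                       # this request's arrival time
        start = max(t, free_at)        # it waits if the server is still busy
        total_wait += start - t
        free_at = start + svc
    return total_wait / len(service_times)

random.seed(1)
n, ta, ts = 200_000, 0.040, 0.020      # arrivals every 40 ms, 20 ms service (u = 0.5)

# Deterministic world: the queue never builds up.
print(avg_wait([ta] * n, [ts] * n))                                # 0.0

# Bursty (memoryless) world with the same average rates.
arr = [random.expovariate(1 / ta) for _ in range(n)]
svc = [random.expovariate(1 / ts) for _ in range(n)]
print(avg_wait(arr, svc))   # ~0.020, matching Tq = Tser * u/(1-u) = 20 ms
```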
And that's kind of an intuition of what's happening here。 Okay, so let's give you a little example。
So let's suppose that a user makes 10 eight-kilobyte disk I/Os per second. So 10 per second is the request rate. Suppose that requests and service are exponentially distributed, which means first and foremost that C is one and we can use the simpler equation, and assume that the average service time is 20 milliseconds. Okay. Now, what if we want to ask some questions like, how utilized is the disk?
But what we're going to do is we're going to compute the utilization which it turns。
out is lambda times the service time。 And once we've got that。
then we can start asking questions like what's the average time, spent in the queue。
what's the average length of the queue。 And this last one which I think you should look at carefully is interesting is what's。
the average response time overall for the disk request。
And what that means is it's the time you spend in the queue plus the time it takes to get。
your service done, right? So it's two pieces。 So T system is the queue time plus the service time。
And so now you can imagine a computation like this showing up somewhere that you might run。
into in a few weeks. For instance, you might ask, what's lambda? Lambda is 10 per second. Why? Because it's 10 I/Os per second. Okay. What's the service time? That's the average time to service a customer, 20 milliseconds, because the problem says it's 20 milliseconds, and that includes a bunch of things like controller time plus seek plus rotation plus transfer. You could imagine that the way you'd get this service time is by doing a computation based on those other things, but fortunately for this slide I just gave it to you: 20 milliseconds, right? And 20 milliseconds is how many seconds? 0.02. You guys are all very good at that translation now because of that magic slide from last week.
And so now we can say, what's the utilization of the server? Well, it's lambda times T_ser, which is 10 per second times 0.02 seconds, which is 0.2. Notice that the per-seconds and the seconds cancel; this is very helpful. You learned this in high school chemistry as well: you always cancel the units, and if your units don't cancel, you've got a problem. Okay. And notice that our utilization is 0.2. Is that bigger than 1? Good, people are paying attention, I'm glad. No, 0.2 is not bigger than 1. So now, what's the time in the queue? Well, we use the T_ser times u over (1 minus u) version of the equation, because that's the M/M/1 queue: memoryless in, memoryless out, one server. Okay. So we just plug it in: the queueing equation gives 20 ms times 0.2 over (1 minus 0.2), which is 20 times 0.25, which is 5 milliseconds. What is that 5 milliseconds? It's the time you spend in the queue.
Now notice because of the parameters we've got here, that's only a fraction of the total。
disk service time。 That's good。 It means we're not building our queue up。 Okay。
If we had different numbers here, such that the utilization got to like 0.8 or 0.9, this would be considerably more time spent in the queue. Okay. And that's the point at which it builds up and you've got problems. But let's finish this example. Now, if I'm a user and I put a request into this queue, how long until I get my answer? Well, first of all, how long is the queue? The queue length is lambda times T_q; Little's Law gives us 0.05 items in the queue on average, so the queue length is really not a factor here, right? But if we look at the total system time, it's T_q plus T_ser, and it says that when I take queueing into account, rather than taking 20 milliseconds, a request actually takes 25 milliseconds. So there's a little extra time in the queue. All right. Now, as an exercise for you guys: if you were to adjust this so that lambda was higher and this 0.2 was closer to 1, you would see this number go arbitrarily
large。 In fact, it could be easy to come up with 100 milliseconds or 200 milliseconds purely because。
of the queuing time having nothing to do with the fact that the disk always takes 20 milliseconds。
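Here's the whole worked example as a short script, using the lecture's numbers (10 requests/s, 20 ms service time) and the M/M/1 formulas from above. The 45-requests-per-second case at the end is just an assumed illustration of what happens when you push utilization toward one:

```python
lam   = 10.0      # arrival rate: 10 disk I/Os per second
t_ser = 0.020     # average service time: 20 ms

u     = lam * t_ser            # utilization = 0.2
t_q   = t_ser * u / (1 - u)    # M/M/1 queueing time ~= 5 ms
l_q   = lam * t_q              # Little's Law: average queue length ~= 0.05
t_sys = t_q + t_ser            # total response time ~= 25 ms

print(u, t_q * 1000, l_q, t_sys * 1000)   # ~0.2  ~5.0  ~0.05  ~25.0

# Push utilization toward 1 (45 req/s -> u = 0.9) and the queue dominates:
lam2 = 45.0
u2 = lam2 * t_ser                          # 0.9
print(t_ser * u2 / (1 - u2) * 1000)        # ~180 ms in the queue alone
```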
Okay。 And so the reason I go, I subject you guys to learning a little bit about queuing theory。
is I want you to realize that the queue can become easily the most significant contributor。
to latency easily if you're running too close to that 100% utilization point。
So whenever you're doing a back of the envelope calculation, you might try to figure out
what a utilization of one means in terms of whatever operations per second: how
close am I to one? Okay。 And I would say the best advice you can get out of this class, as engineers in general,
is you never run anything at 100%。 If you're running anything at 100%, something's going to break。
Right。 It's much better to run at that half power point or maybe a little bit larger than that。
but never get close to 100%。 Either in the weight capacity of a bridge, that's a bad idea。
or the number of requests。
per unit time you're trying to get out of a system。 Okay。 Now。
there's a bunch of resources we have on the resources page you can take a look at。
that have some things from the Hennessy and Patterson book and so on。
And you should definitely assume that back of the envelope queuing theory like we've been。
talking here is fair game for mid-term three。 Okay。
But now that we know what the queue could do to us, we can start asking questions about。
how do we optimize IO performance。 Well, response time is really queue plus other stuff。
And so how do we improve performance? Well, we could make everything faster。 Okay。
I have a little smiley face there, but that's not a bad solution if we let you do that right。
on the mid-term。 Maybe we don't let you do that。 But always consider if something's a bottleneck。
what did we talk about scheduling? Scheduling only matters if there's not enough of something。
right? Queuing theory gets in the way when there's not enough of something。
So always consider maybe making things faster。 The other is maybe more parallel。 Okay。
If I have a bunch of disks and they're spread out, then maybe I could send requests to all。
of the disks and now everything is faster by parallelism。 Okay。
And we're going to actually toward the end of the term, we will be talking about distributed。
systems and parallelism that you can get out of things that are spread through many servers。
in the network。 That could be a way of improving queuing time by spreading things out for many queues。
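As a rough back-of-the-envelope sketch of why spreading requests over many queues helps (this assumes the load splits evenly and each disk behaves like its own M/M/1 queue, using the earlier example's numbers):

$$
N = 2 \text{ disks: } \lambda_{per\ disk} = 5\ \text{req/s}, \quad u = 5 \times 0.02 = 0.1, \quad T_q = 20\ \text{ms} \times \frac{0.1}{0.9} \approx 2.2\ \text{ms}
$$

So halving the arrival rate at each queue more than halves the queuing delay, because the u over 1 minus u term is nonlinear。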
Okay。 Now, we could optimize the bottleneck to increase the service rate。 Okay。
So maybe we'd get a faster controller or a slightly faster device here。
We could accept the queue and do something else。 We compute a few more digits of pi while we're waiting for a response。
So there are other things we can do。 But queues are useful。 Okay。
Now, one response to this could be, well, if the queue is the problem, let's remove the queue。
Who needs a queue? Okay。 But remember, the reason the queue was there in the first place was to absorb bursts, because
bursts are the real world。 And if you have a queue that can absorb some bursts。
then the things that are generating, the requests don't have to wait around just to put their thing on the queue。
So that's why we have a queue。 So queues are very important to smoothing the overall behavior of the system as a whole。
So you can't really get rid of queues, but you need to know they're there。 All right。 Now。
and then for finite queues, obviously, you need to do some pushback on people putting。
stuff on the queue。 So that's admission control。 So when is the disk performance the highest? Okay。
Thinking back for a moment: when there are big sequential reads, or there's so much work to do that requests can be piggybacked
and you can reorder the queue。 It's okay to be inefficient if things are mostly idle。
So if you don't have so many things that your queues are filling up, maybe you're less, efficient。
It's only when your queues are filling up that it's very important to be extremely efficient。
at that disk。 Okay。 So bursts are a threat and an opportunity。
They're a threat because they cause the queue to grow。
They're an opportunity because they guarantee there's always something there
for you to satisfy, and therefore you can keep the system busy。 That could actually be a good thing,
right? Because when you're in this, I should say that before we lose this whole thought process。
When I'm in this point here, where I'm close to 100% utilization and the average latency。
is really high, what's happening with the disk? If you just look at it from the standpoint of this queue。
it looks bad。 But what's happening with the disk? It's working, right?
The disk is efficiently doing stuff all the time。 That could actually be a good thing, right?
If you put in an expensive resource, you want to keep it busy。
So you've got to be able to go back and forth between those two views: the latency of any one request grows
without bound, but when the queue is full, we know the expensive resource is busy。
So keep that in mind。 Okay。 So this is why we talk these through with you。
And so other opportunities, which we'll talk a little bit about toward the end of the term。
and in the next couple of lectures too, is we could use user level device drivers to make。
things faster。 We could reduce the impact of I/O delays by doing other useful work。
These are all about making overhead smaller, doing something else when you have to wait, for things。
These are all, you know these things because we've been talking about them all term。
One process goes, or one thread goes to sleep, but another one works。
And now today you learned about reducing the overhead as well。 Okay。
So I'm going to pick up with disk scheduling next time。 But let's in conclusion。
I'll let you guys go, disk performance is really queuing time。
plus controller and then plus seek plus rotational plus transfer time, those five elements。
And remember the rotational latency, the time to get to the right sector once you've gotten
to the right track, is on average half a rotation, right?
And then the transfer time is a disk spec, excuse me。
We talked about complex interactions with the queue and so for hard disk drives, you've。
got queuing time plus controller plus seek plus rotation plus transfer。
SSDs are simpler because it's just the controller time plus the transfer time。
We talked about systems being designed to optimize performance and reliability and bursts。
and high utilization give you lots of queuing delays。 So we introduced the queuing latency equation。
And the thing to keep in mind is we gave you two equations, but they're really the same equation。
If you look at what we've got here, if you set C to one, this thing collapses down to
the M/M/1 queue。 If you set C to something non-one, then this is the M/G/1 queue。 Okay。
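Written out, the two equations being referred to look like this, where C is the squared coefficient of variation of the service time as used on the lecture slides (this is just a summary of the formulas above, not a derivation):

$$
T_q = T_{ser} \times \frac{1}{2}(1 + C) \times \frac{u}{1-u} \quad \text{(M/G/1)} \qquad\qquad
C = 1 \ \Rightarrow\ T_q = T_{ser} \times \frac{u}{1-u} \quad \text{(M/M/1)}
$$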
And so with that, I'm going to say have a great rest of your Tuesday and we'll see you, on Thursday。
All right。
P2:Lecture 2: Four Fundamental OS Concepts - RubatoTheEmber - BV1L541117gr
Okay everybody, welcome back to 162。 I'm going to be giving lecture two, finishing some of。
the things we were starting last time。 And if you remember last time we talked about。
operating systems pretty much in general and we asked ourselves what it was exactly that。
an operating system was。 And I tried to indicate to you that it's, there are lots of different。
operating systems and different people would disagree with each other on this。 But these。
three functions of referee, illusionist and glue are pretty common across a wide variety。
of operating systems and many of them have all three。 Okay, where the referee is actually。
managing resources, the illusionist is providing that illusion of infinite memory and perfect。
hardware resources。 And the glue consists of a whole series of common services like file, systems。
etc。 that are there to help make programming the machine much better。 We also。
talked, started talking about protection in general, that's going to be one of our first。
major detail topics but we're going to touch on it lightly today。 And here what I show。
is that there's the hardware underneath, the operating system up above is basically providing。
that virtual machine view to the processes。 Okay, and the processes which we're going。
to talk in more detail today and in additional detail as we go on are really these virtual。
containers that have a view of perfect hardware underneath them and they think that they have。
the whole machine。 So here I have a brown and a green process, the brown process thinks。
it has all of the memory, all of the file system, all of the sockets and threads, the。
green one thinks similarly and it's up the operating system to really basically provide。
that illusion。 And in terms of protection, of course the important part here is for instance。
this green process, while it's running, could attempt to access the memory of the brown, process。
it could attempt to access OS memory, it could attempt to access parts of the storage。
that it's not supposed to。 And in all of those cases, what ends up happening is the operating。
system essentially stops that from happening and then causes a segmentation fault and basically。
boots the process out。 So that's the protection piece。 And we're going to talk a lot about。
that as we go on the next couple of weeks。 And there'll be many different ways to do that。
kind of protection, but I'll show you a fairly simple first thoughts at that today even。
The thing that we didn't quite get to last time and I wanted to mention now is really。
the complexity of all of this。 So you saw that picture of the world as a single machine, you know。
single huge computer that I showed last time。 And that's a lot of hardware that。
somehow has to be tamed。 And if you look at applications, they really have a variety。
of software modules, they run on a bunch of different devices or machines, they implement。
different hardware architectures, they run competing applications, they fail in unexpected, ways。
they might be under attack。 And really that complexity of both what the applications。
are trying to do and all the underlying hardware is tremendous。 Okay, and it's not feasible。
to test all the combinations。 I mean, how could you possibly test an application against。
the machine with a one terabyte SSD and a two terabyte spinning storage and six gigabytes。
of memory and a hundred gigabytes of memory。 You just can't do that and all combinations。
are just not possible。 And so we're really going are going to have to figure out how。
to design things correctly from the beginning。 And, you know, let's accept it now。 It's not。
a question about whether they're bugs or not。 There will always be bugs。 It's a question。
about how serious they are and, you know, what type of bugs they are。 And we're going to。
try to do bug management in a sense as well as we go out throughout the term。 And one of。
the things that leads to complexity is parallelism。 And this is why, of course, we're going to。
spend a bunch of time talking about synchronization primitives in a couple of weeks。 But what I。
wanted to mention, here's a good example from 2017。 The Intel Skylake can
have 28 cores, with up to 56 threads, so there's two threads per
core, and lots of different security instructions and graphics instructions。 There's even a 56-core version。 There are 28 megabytes
of cache at the L2 level。 There's 38.5 megabytes of cache at the L3 level。 Directory-based cache。
coherence。 There's lots of different types of networks, including a mesh network on chip。
and fast off-chip networks, DRAM connections up to 1.5 terabytes。 And so this is complex。
even of itself。 And this is like a single node。 So when we tie all that together, things。
get really interesting, right? But parallelism is fundamental these days。 And in addition to。
that parallelism, a modern chip typically has a chipset that goes with it。 So what you。
see here at the top is an example of an Intel chip family processor with a bunch of cores, on it。
It has direct connections to memory and PCI express for high-speed communication。
And then there's typically a direct media interface connection to the chip sets underneath。
And the chip set potentially handles all the other interesting I/O。 So from the standpoint。
of the processor, we have high bandwidth memory channels。 We've got really high-speed IO for。
graphics。 We've got this direct media interface down to this secondary chipset used to be。
called the Southbridge, for instance。 But not anymore。 But anyway, off of that, we have。
PCI Express。 We've got SATA for disk。 We've got USB for other types of I/O。 We have Ethernet I/O,
PCIe, RAID, et cetera。 All sorts of really interesting things all tied into that one CPU。
So if you look at, you know, this itself is interesting。 This is very interesting。 Lots。
of complexity。 And, you know, I like this graph。 You guys should all take a look。 You。
can go to informationisbeautiful.net/visualizations/million-lines-of-code and take a look at some of the code
counts over the years。 Okay。 And what's interesting here is if you look at a newer version of something。
so like Linux 2.2 versus Linux 3.1, it's always bigger。 So there's a lot more software for each。
next generation。 Right。 And so more and more memory, more and more complexity with each, generation。
A car is starting to get very complicated。 100 million lines of code。 Actually。
it would be interesting to see what a Tesla is like。 And so, you know, that complexity。
leaks in to the operating system if you don't design it correctly。 And you get blue screens。
Okay。 Mouse base pairs。 The question is what's a mouse base pair? So this is actually the。
DNA of a mouse。 Okay。 So we're not getting all that far away。 Okay。 From that complexity。
The third party device drivers, which are the parts of an operating system that access。
the outside world, are some of the most unreliable parts of operating systems。 Okay。 And the reason。
for that is they're not written well, necessarily they're written quickly to support a new device。
and they're not written by Microsoft or Apple or whoever your operating system comes from。
And as a result, they tend to be the things that crash the system。 Okay。 And basically。
there are clean interfaces from the operating system to the device。 And that's an attempt。
to provide a clean interface so that third parties can write this type of code。 And that。
ironically can lead to more crashes under some circumstances。 And we're going to spend。
a lot of time talking about device drivers later in the term。 And there's all sorts of, holes。 Okay。
See, if you don't have enough complexity from everything that's working properly。
now you've got security holes。 Okay。 And a great example from 2017 was the infamous meltdown。
bug where it was discovered that despite all of the protection in the hardware and the。
proper use in the operating system of that hardware, you ended up with the ability for。
user level programs to drain secure information out of the kernel。 Okay。 And so if you think。
about that, it's like, well, I did everything I could。 And there was this weird hole in the。
hardware that nobody knew about。 And even surprised a bunch of famous computer architects。
like Dave Patterson and John Hennessy。 So complexity is always there。 And at best, we。
have to manage it。 Okay。 And then things like version skew on libraries can lead to all
sorts of problems。 Okay。 Data breaches, denial of service attacks, timing channels。 There
was the Heartbleed SSL bug, and so on。 So all of these exploits are there if you're not careful。 Okay。
And so, and I see a comment in the chat that it was a really cool exploit。 And yes,
Meltdown was cool as an exploit。 And we'll actually talk about that a little later in the term。
So back to what the OS is supposed to do。 So we have all of this complexity。 And really。
the OS is out there trying to tame the underlying hardware and provide a clean virtual machine。
abstraction。 And so here's the hardware underneath。 We have the physical machine interfaces, which。
are, you know, what they are: buses, interfaces to disks。 We have an
operating system on top of it, which then turns all of these imperfect physical interfaces。
or limited physical interfaces into a nice clean programming abstraction so that the。
abstract machine interface up top can be used by applications。 Okay。 And I gave you the。
simplest one to think about is this illusion of infinite memory。 So maybe you only have。
16 gigabytes of memory on your laptop, but the operating system gives you the illusion。
that's that there's a lot more memory。 And it does that through a paging and various other。
virtual memory techniques, which we will talk about。 But that's the function of the operating。
system is provide this virtual or abstract machine interface that's more perfect than, the hardware。
Okay。 So the underlying processor becomes a clean thread。 The underlying memory
becomes a clean address space。 The disks and SSDs, which are just block-based storage, become files。
Okay。 Networks lead to sockets, which give you sort of the ability to have a stream
that's sent perfectly from one part of the world to another。 Machines become processes。 Okay。
And so all sorts of interesting things here, but this is basically the OS as an illusionist。
to remove the software hardware quirks and give us a better, a better abstraction。 Okay。
And you pick any OS area and there are many of them and we're going to have an interesting。
sample this term like file systems, virtual memory, networking, scheduling。 And you can。
ask the question of what's the hardware interface that we need to handle。 That's the physical。
reality。 And what's the software interface we want to provide? And that's the nicer abstraction。
And so we will play with that hardware interface versus nicer abstraction idea throughout
the term。 Okay。 So today we have four fundamental OS concepts, which we want to get across just。
to dive in and start going。 Okay。 One of them is this idea of a thread。 And a thread is。
a virtual execution context that fully describes the program state of an executing program。 It's
got the program counter。 It's got registers。 It's got execution flags。 It's got a stack。 Okay。 And。
this thread, however, is a virtual entity as you'll see in a bit。 This is not necessarily。
running on a CPU at all times and it's not even running necessarily on the same CPU。
So this thread is an entity of itself。 Okay。 And hopefully we'll get to where we'll be thinking。
in terms of threads rather than CPUs。 And we have some interesting discussion on on。
the Piazza actually after lecture one about that idea。 We'll get some there of that today。
So another idea which we're going to want to talk about is the address space with or。
without translation。 And an address space is essentially the set of memory addresses that。
the program sees for reading and writing。 And it may actually be distinct from the physical。
machine。 So once again, the address space is a virtualized idea。 And a third thing is。
now going to be a process。 And so a process is a combination of a protected address space。
and one or more threads。 Okay。 And so a process is really this executing instance of a program。
in its own protected environment with multiple potentially things running。 Okay。 And then。
finally we're going to introduce some hardware。 And this hardware idea is dual mode operation。
which leads to protection and that dual mode operation is really that there are certain。
things that can only be done by the system。 Okay。 And so the way we distinguish that running。
in the system versus not is going to be a bit at least possibly more bits but at least。
one bit in the hardware that says whether we're in system mode or user mode。 Okay。 And。
we're going to show you more about how that works。 But the simple idea there is that when。
you're in system mode, the hardware will allow more access to things than when you're。
in user mode and that will lead us to be able to provide a nice clean virtual abstraction。
We call that dual mode because there's two modes there。 Okay。 Now, so let's look at the。
bottom line。 What's the bottom line? Well, the OS is helping us run programs。 That's。
our important aspect here, right? And so here's, you know, here's Joe。 He's typing away。 And。
he comes up with a program。 He's going to type a bunch of stuff into his editor。 And。
then the compiler is actually going to produce a binary version of these instructions that。
is going to be executable on a CPU。 And it's going to have data。 It's going to have instructions。
in binary mode。 And typically it's in a file called a.out, which is the result of the compiler。
And potentially the linker。 Okay。 And once that exists and that might sit on your sit on, your disk。
then when we want to execute it, we take this executable and it gets loaded, into memory。
And that's the point at which it becomes executing。 Okay。 So a program is。
the potential for execution。 Once it gets loaded into a process, it becomes an executing, process。
Okay。 And we can have many instances of the same executable running at the same。
time in different processes。 And you'll see that as we go along as well。 But if you notice, here。
here's a typical address space for a process where the address space is kind of, remember。
I said it was all the addresses that can be accessed。 And it's sort of from, zero up to FFFF。
Typically, there are instructions at the low part of memory, the low addresses。
and then data on top of that, heap, etc。 And then at the top, there's the OS space, which。
is protected。 And then we grow down from there for our stack。 Now, there's a question about。
where the linker is。 I haven't shown you here, but this, think of this as a combination compiling。
and linking to produce the final executable。 And the linker is really taking individual。
things that you compiled plus some libraries and putting them together into a single entity。
And we'll see a lot more of that as we go on。 All right。 So once we've loaded things。
into memory and we're in a process, we create the stack and the heap。 So the stack
grows down to give us the ability to have recursive procedure calls。 And the heap grows up
for allocating memory。 And so as a result, we will basically have a completely executing。
process。 Okay。 And how does it become executing? Well, we load the program counter
in the processor to point at a starting instruction in the process。 And then we tell。
the processor to go。 And at that point, it will start executing。 And we typically don't。
do this in system mode。 We make sure that at the time we say go, it's in user mode。 And。
as a result, this will be a nice protected entity。 And this notion of when we're in user, mode。
when we're in system mode, just don't worry about the exact details because we'll。
give you more and more details as we go on。 We're trying to get the high level idea here。
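To make the program-versus-process distinction concrete, here is a minimal user-level sketch of launching an executable using the standard POSIX fork and exec calls (the path ./a.out is just a placeholder; this is the user's-eye view of the idea, not how the kernel's loader is implemented):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();                       /* create a new process */
    if (pid == 0) {
        /* Child: ask the kernel to replace this process's address space
         * with the a.out image: load code and data, set up stack and heap,
         * point the program counter at the entry point, drop to user mode. */
        execl("./a.out", "a.out", (char *)NULL);
        perror("execl");                      /* only reached if exec fails */
        exit(1);
    }
    waitpid(pid, NULL, 0);                    /* parent: wait for the child */
    return 0;
}
```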
So are there questions? So how much memory is allocated per process? It's a good question。
So the answer is the bare minimum on most and most good operating systems only allocates。
as little as it needs。 So it basically allocates instructions in data。 There's typically no。
actual DRAM allocated for heap or stack。 But the address space is here。 Okay。 So that's。
where you got to start thinking about the virtual versus the physical。 So physically。
we don't get much memory when we start up。 Virtually, we have all of these addresses。
And what will happen is as the program tries to use parts of the address space that aren't。
backed by real memory, they'll be trapped into the operating system and the operating。
system will allocate some more memory for it and then return to the user。 And so therefore。
with a modern operating system, we can start by giving processes as little as we need to。
start with。 And then it'll automatically adapt as the program starts running。 All right。
And of course, the OS once everything is running will provide services through the notion of。
system calls, which we'll talk about a little bit later。 Okay。 Good。 Any other questions?
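As a tiny example of that "services through system calls" idea: even an ordinary write to the screen ends up as a controlled transfer into the kernel。 A minimal sketch using the POSIX write call (the library function is a thin wrapper around the actual system call):

```c
#include <string.h>
#include <unistd.h>

int main(void) {
    const char *msg = "hello from user mode\n";
    /* write() traps into the kernel, which performs the privileged I/O
     * on our behalf and then returns control to this user-mode program. */
    write(STDOUT_FILENO, msg, strlen(msg));
    return 0;
}
```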
And Anthony is actually giving some good details about linkage there。 So there's static。
linking and dynamic linking。 We'll talk about those also later。 Okay。 So now let's pull back。
out of the depth of your memory from 61 C。 So if you remember what's inside a processor。
So system calls, by the way, are that transfer of control into the kernel, a controlled transfer。
And again, we will talk more about those as we go on。 So if you look here, we typically。
have a program counter, okay, which is a register inside the processor。
And then we have instructions, in memory, which is in the address space of the process and data as well。
And the program, counter points at the next instruction to execute。
And it's going to be up to the actual processor, to pull that instruction out of memory and decode it and decide what to do with it。
Okay。 And so for instance, once we've fetched the instruction, then it'll be decoded。 And that。
decoded instruction will then work on the data path that might pull things out of registers。
and it tells the ALU to do a multiply or whatever。 And the results might be stored into memory。
or memory might be pulled out, data might be pulled out of memory。 And then we go and execute。
the next instruction。 Okay。 And so this is the continuous loop of fetch, decode, execute,
memory, write back。 Okay。 And this hopefully will remind you a little bit of 61C。 There。
is a question here about what happens if there's a memory safety violation in a program。 So。
we're going to need to get much more detail about exactly what that means。 But I showed。
you in a couple of previous slides that idea where the green process tried to access memory。
of the brown process。 And that memory access violation was flagged with a segmentation fault。
And the green process was dumped。 Okay。 And that'll be one of many responses that we can, get。 So。
you know, and basically the processor is going to walk the PC through a series of。
instructions as the execution occurs。 And that's how we get a program to run。 Okay。
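As a toy illustration of that fetch-decode-execute loop, here is a tiny interpreter written in C。 The opcodes and memory layout are made up purely for illustration; a real processor does this in hardware against a real ISA:

```c
#include <stdint.h>
#include <stdio.h>

enum { OP_HALT, OP_ADD, OP_JMP };                /* invented toy opcodes */

int main(void) {
    /* toy "memory": each instruction is a pair {opcode, operand} */
    uint32_t mem[][2] = { {OP_ADD, 5}, {OP_ADD, 7}, {OP_JMP, 3}, {OP_HALT, 0} };
    uint32_t pc = 0;                             /* program counter */
    uint32_t acc = 0;                            /* one register for intermediate values */

    for (;;) {
        uint32_t op = mem[pc][0], arg = mem[pc][1];   /* fetch */
        pc++;                                         /* default: go to the next instruction */
        switch (op) {                                 /* decode and execute */
        case OP_ADD:  acc += arg; break;
        case OP_JMP:  pc = arg;   break;              /* change control flow */
        case OP_HALT: printf("acc = %u\n", acc); return 0;
        }
    }
}
```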
Now the first concept here that I mentioned was the thread of control。 And a thread is。
really a single unique execution context consisting of a program counter, registers, execution。
flags, a stack, memory state。 Okay。 And you need to think of this as a virtual version, of this。
So the 61 C idea was there's this processor thing and it's executing instructions。 Okay。
But excuse me, just having that processor thing executing instructions is too low level。
to build a modern environment on top of it。 And so instead we're going to virtualize that。
idea with a thread。 So a thread is something that has its own program counter。 So it knows
where its next instructions are coming from。 But because it's virtual, it could actually be。
unloaded from the physical processor for a little while and then loaded back and start。
executing again。 And so the thread maintains its identity even when it doesn't have the。
view of the CPU。 And that's going to be helpful for us。 Okay。 And a thread is basically executing。
on a processor or core when it's resident。 Okay。 When it's in the processor registers。
Now by the way, let me clear one thing up here for the first several weeks。 We are going。
to be talking about processors or cores independently。
And we're going to be thinking about machines, with exactly one core or processor for now。 Okay。
So don't worry about multi core。 Okay。 We want to understand single core first。
And so I'm going to use the word processor。 I might, use the word core。
These are going to be essentially interchangeable in the next several weeks。 Okay。 But anyway。
the thread is actually executing when it's running in the processor registers。
So if you look back here, you could say, well, this could be thought of as a thread that's。
running right now because it's program counters in the real program counter of the processor。 Okay。
And what resident really means here is the registers hold all the state, the root。
state or context of the thread。 The registers have the program counter loaded。 It's currently。
executing instructions from there。 The program counter points at the next instruction in memory。
All the instructions are stored in memory so that as the processor or core starts executing。
and pulling instructions, it can pull them out of memory。 Okay。 So that's those instructions。
we want actually are in memory。 And it includes intermediate values for ongoing computations。
in the actual registers。 So we might have added two things together and the result is。
in a register or there are pointers to places in memory where the results are going。 And。
the stack pointer in the physical hardware is actually holding the address of the top。
of stack for the thread, which is in memory。 And everything else is in memory。 So this is。
an executing thread。 And so if you want to think about 61C again for a moment, an executing。
thread or a loaded thread or a resident thread is an example of something running like you。
thought of in 61C。 Okay。 A thread is suspended or not executing when。
its state isn't in the processor。 Okay。 And so this is like if you took the thing that。
from 61C and you just unloaded it all and put it aside in memory somewhere and what we'll。
call a thread control block, it's still a thread。 It's just not running。 Okay。 And at, that point。
something else is running。 Okay。 So the processor state is pointing somewhere。
else at a different thread。 Okay。 Is that clear to everybody here? Okay。 So again, back。
to your 61C。 So we are going to keep this virtual machine idea in our brain。 I just wanted。
to show this other slide of here's a processor or a core。 We're going to do the fetch execute, loop。
We got a bunch of registers loaded。 The execution sequence fetches the instruction, at the PC。
decodes it, executes it, writes the results back and goes to the next instruction。
So then repeat: this is a rinse-and-repeat kind of scenario。 So for instance, if the program。
counter here is pointing at instruction zero, we'll execute it, then we'll go on to instruction。
one, we'll go on to instruction two, three, four。 Okay。 And keep in mind, so there's a。
question here: in this class, is a core the processor, or an execution unit within the processor? You
know, I know Anthony gave one answer to that。 Let's not be too confused about that for now。
A processor could be a bunch of cores, and that's sometimes how people look at it。
We're going to talk about a processor as executing only one core and having only one core。
So there's no confusion there for now。 Okay。 So just think whenever I say core, processor。
I'm thinking about something with exactly one execution pipeline in it。 Okay。
And the other thing I wanted to say here, and don't we're getting in depth here about。
suspending here and I don't want to quite get there yet。 Anthony can answer that if he likes。
But let's hold that question there for a couple of weeks。 Okay, make a lot more sense。
So here's our execution sequence。 The PC is busy executing。 There's a set of registers。
that represent the thread and are loaded at any given time。 And this is the view, for, instance。
from 61C, which is like a simple RISC processor where there's a set of registers
that are very straightforward and there's many of them。 If you look, that's kind of like
RISC-V。 Okay。 So we have a bunch of registers, some of which are the next program counter。
some of which are the stack pointer and so on。 And the set of instructions that can be。
run are simple。 There are ads and subtracts and so on。 This class, we're going to bite。
off the complexity of a modern machine that's more likely to be in your laptop。 Okay。 Like, an x86。
And so if you notice, some interesting things about the x86 is it's got a lot more。
different types of registers and a lot fewer of them。 Okay。 So there are registers for segments。
There are registers for control and tags and so on。 There's a bunch of other registers。
And so we're going to we're going to look at some of these going on because the environment。
in which the Pintos operating system that we're using in this class operates is an x86。
environment。 Okay。 And section is going to cover the architecture a lot more detail。 Okay。
And I will too as time goes on。 But what I want to focus on right now, so let's pull。
everybody's mind back to this, is how can we possibly have a single core or a single。
processor but have many threads that look like they're running at the same time。 Okay。
And we're going to get this view of a bunch of virtual CPUs or threads all sharing the, same memory。
And this is going to be the programmer's view of a process。 Okay。 So assume, a single core。 Now。
the question is how do we provide this illusion? Well, we multiplex。
And so think of threads as virtual cores。 And here's time moving to the right。 And what。
we're going to do is we're going to run virtual CPU one for a little while。 And then number。
two for a little while, number three for a little while。 And then we're going to go back。
to number one and so on。 And we're going to basically load the contents of a thread。
into a processor。 We're going to run it for a little while, then we're going to unload, it。
We're going to load the next one。 And we're just going to keep doing this over and over, again。
And if we do this with a fine enough granularity, then essentially it's going to。
appear that these are all running simultaneously, even though only one of them is actually running。
at any given time。 Okay。 So the contents of a virtual core here is what we've been calling。
a thread。 Program counter, stack pointer, registers, etc。 Where is the thread? Well, it's。
on the real physical core, like here during magenta time, CPU one or thread one is actually。
loaded in the physical core or saved in a chunk of memory when it's not running called。
the thread control block。 Okay。 And it's going to be up to the operating system that's doing。
the scheduling to be swapping these guys in and out over and over again in order to give。
us that illusion。 Okay。 And I'm going to just, I'm repeating this multiple times just to。
make sure we're all on the same page, because this is fairly simple。 But if you don't quite。
catch what's going on here, then what we do afterwards is going to be confusing。 So any。
other questions? There's a great question in the Slack here: is the execution time for
each thread the same? And the answer is, it doesn't have to be。 In fact, we're going to have a。
whole unit on schedulers, where we're going to vary the amount of time a thread gets based。
on priorities, or we're going to vary it based on real time requirements。 So the simplest。
thing you could do is just give every thread the same amount of time。 But that's absolutely。
not required。 And just to give you a good idea why that might not even work is it's possible。
that CPU one runs for a little while, and then it has to talk to a disk or network, which。
is going to take, you know, milliseconds or seconds。 It can't run。 And so it makes no。
sense to waste the CPU waiting for all that stuff externally。 So that will be a good reason。
to switch to another one。 Okay。 And what does it mean for the CPU to be idle? What it means
is there are no threads that are runnable right now because they're all waiting for I/O。 Okay,
that would be a good example of an idle CPU。 Okay, does that make sense? Now, I'm assuming here。
by the way, by this slide that we have three runnable threads。
And so that's why I'm showing three of them alternating。 Now, so this illusion, let's continue。
this for a moment。 So consider, for instance, at time one, virtual CPU one's on the real core。
CPU two's back in memory。 At time two, virtual CPU one is back in memory and CPU
two is on the core, etc。 Okay。 So what happens to go from T one to T two? Well, if you think。
about it, something had to take over and unload thread one and load thread two。 What is that? Well。
it's the operating system。 And so between thread one and thread two at this boundary。
the OS ran somehow。 And it saved the PC, stack pointer, all the stuff of CPU one's thread
out to memory, and then loaded the stuff from CPU two's thread back into the processor。 All right。 And。
you might, if you're thinking about this carefully, start asking yourself, well, how the heck。
did the OS get control because the user's running here, right? Suppose that CPU one's。
doing my favorite thing of all time, which is it's busy computing the last digit of pi。
then it's not going to give the OS the time of day because it's working。
on that last digit。 And so really, what has to happen is some intervention here。 Okay。
And we'll get, we'll learn a lot more about that。 But the simplest intervention is a timer。
goes off into the OS。 And the OS then grabs control away from T one and gives it to T two。 Okay。
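Here is a small user-level analogy to what that switch does, using the POSIX ucontext calls: one execution context's registers and stack pointer get saved, another's get loaded, and later the first one resumes exactly where it stopped。 This is only an analogy; the real kernel switch happens on a timer interrupt, in privileged mode, against thread control blocks the user can't touch:

```c
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];          /* stack for the second context */

static void worker(void) {
    printf("worker: running\n");
    swapcontext(&worker_ctx, &main_ctx);      /* save my state, resume main */
    printf("worker: resumed where I left off\n");
}

int main(void) {
    getcontext(&worker_ctx);                  /* initialize the worker's context */
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &main_ctx;           /* where to go when worker returns */
    makecontext(&worker_ctx, worker, 0);

    swapcontext(&main_ctx, &worker_ctx);      /* save main's state, run worker */
    printf("main: back in control\n");
    swapcontext(&main_ctx, &worker_ctx);      /* let the worker finish */
    printf("main: done\n");
    return 0;
}
```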
And by the way, as an important part of 162, you have to know why computing the。
last digit of pi is important。 Okay。 And that was basically to remove the malicious entity。
from the enterprise computer back in the original days。 The only way that Spock could get the。
computer back was he started all the memory banks working on computing the last digit。
of pi and that saved the day。 So you guys need to remember that。 That's your history, for today。
All right。 Now, does the timer have its own thread or process? That's a good question。
And the answer is it doesn't have to。 It only has to have something that takes over to run。
when the timer goes off。 So most of the time, it may not be running at all。 And it's probably。
when it starts running, it'll be running in an interrupt context, not necessarily a full-blown。
thread。 And we will get into that in a lot more detail。 Okay。 So keep that answer in the。
back of your brain and ask me more in a little bit, well, not today, but a couple of lectures。 Okay。
So what triggered this switch? So I just talked about a timer, but there's other things。
like CPU one could have voluntarily yielded the CPU。 And a great example is a system call。 Hi。
I want to do I/O。 Okay。 I'm yielding the CPU because the I/O is going to go on, and the timer,
not the timer, the scheduler, then takes over and loads CPU two while that I/O is going on。
So that's a voluntary, that's IO。 There's also the ability to say, okay, give somebody。
else a chance。 Other things we'll discuss。 Okay。 But we have to have some way to take over, control。
Now, the question is, can I repeat how the OS takes back control? And if you look。
down here at the very last bullet, it gives you all the ways the OS can take control。 The。
simplest for you to remember from today is a timer goes off inside the OS and it grabs。
control away from thread one。 Okay。 And because the OS is running at an interrupt level and。
the thread is running at user level, then the answer is no, the OS is able to steal it away。
and the thread one can't prevent that from happening。 And that's important。 So multiple。
threads of control now, let's talk more about that。 So we call that typically multi programming。
And if you look, the, here we have a bunch of processes, for instance, running on top, of the OS。
And those processes each have their own little chunk of the DRAM。 And when they're, running。
they need to have the illusion that they have 100% of the memory。 And so we're。
going to have to do something to this DRAM to give that illusion that, for instance, when。
the green process is running, it has all of the memory because clearly there are lots。
of different processes in memory。 Okay。 And good question here in the chat, which is basically。
since the threads like a virtual CPU, why do modern CPUs have a specific number of threads。
in the spec。 And the answer is that the number of threads quoted in a CPU spec are the number。
of hardware threads。 And that is the number of simultaneous things that we can, simultaneous。
threads of control that can be running at the same time, literally in the hardware。 Okay。 So again。
for now, let's not worry about the hardware threads or multiple cores and we'll, get back to that。
I promise。 Okay。 Now, if you look, the thread control block holds the。
contents of the registers when the thread's not running, what other information? Well。
it's going to have things like the registers, it's going to have things like the program, counter。
as I mentioned, stack pointers, all of that stuff。 Where's the thread control, block stored?
It's going to be in the kernel for now。 In Pintos, which is the operating
system that we're going to be programming and working with and modifying, you can start
doing things by reading, for instance, thread.h and thread.c and the process control。
structures when you start your first project, project zero。 So you get a chance to actually。
start looking at that code very soon and you can start seeing how some of the things。
I'm talking about are implemented。 Okay。 The question about what's user level or interrupt, level。
why don't we hold off for a moment on too many detailed questions there。 Just think。
of user level is less priority than kernel level or interrupt level。 Okay。 We'll get to。
these in a lot more detail later。 So now let's talk about a little administrivia。 All of you,
both those who are in the class and those who are still on the wait list ought to be doing。
homework zero。 Okay。 It's due next Wednesday and it, you know, so you need to be going on, it。
And it's important because it's going to help you get your VM environment set up。
and get familiar with tools and some of the other things, both homework zero and project, zero。
which is going to be next Monday。 So all of these things are important for getting。
going right away because when the projects, the real projects start up and your groups。
are ready to go, which is going to happen in the third week, you need to have gotten。
all this preliminary stuff done。 So even if you're thinking you can still get in the class。
and kind of waiting, you should be doing this homework。 Unless you've withdrawn from
the class and are not on the wait list and certainly not in the class, you should be
doing this homework。 Okay。 And moving forward, we finally got our sections sorted out。 So all。
of our sections starting next week are going to be on Friday。 I know that we had this weirdness。
with Wednesdays and Thursdays and so on。 But the reason it's important to have everything。
on Friday and they're going to be spread throughout the day is that we can make sure that the。
sections can cover details from the two lectures in the week and that everybody gets the same。
lectures before section。 So that's going to make things much clearer。 Okay。 And watch。
for updates on Piazza and on our website。 Okay。 Slip days。 So you get some number of slip days。
Good question there。 Let me back up。 So there's a question of do we sign up。
for sections or can we go to any first two weeks? You're welcome to go to any offered, sections。
Okay。 When you sign up for groups, which is going to be the beginning of the, third week。
we will assign sections based on preferences from you guys such that all。
four of your group members have to be in the same section or at least with the same TA。
It's going to be important so that that TA can know about you and your group members。 Okay。
And so you're going to actually sign up for groups and give some preferences and then。
we're going to assign you sections。 Okay。 But for now, first two weeks, any section is, fine。
So back to slip days。 So you have three slip days for homeworks and four slip days。
for projects which you can use any way you want。 But don't use them up right away because。
there's no credit for late homeworks or projects when you start running out of slip days。 Okay。
So that's very important。 These are not meant to make up for starting really late, although。
you could do that occasionally。 They're there for kind of contingencies。 If something doesn't。
quite work out or somebody has a problem or gets injured or gets sick。 Okay。 And Saturday。
is an optional review。 Okay。 And the Zoom link is either TBA or on Piazza, maybe recorded。 Okay。
Friday is the drop day and that's a week from this Friday, week from tomorrow。
Please drop before then if you're going to drop。 That way we can make sure that everybody。
who wants to get in and could get in can get in。 Okay。 Because it's going to be hard after。
the drop day to drop it。 So please drop sooner rather than later。 All right。 Any questions?
Let's see。 Do I have it up? That was the last one。 And by the way, our final class size。
has been set。 We're not going to be adding any more sections。 We have 11 of them now。 And。
we're not going to go for any larger class sizes either。 So, you know, I think we're getting。
down toward our final enrollment。 So we are going for all Friday sections。 Okay。 So we're。
no longer doing Thursday。 And again, that's important from a course content standpoint。 All right。
Once again, I want to remind you of the collaboration policy。 Explaining a concept to somebody in another group? Okay。 Discussing algorithms
conceptually? Okay。 Discussing approaches to debugging without specific details? Might be okay。
Searching online for generic algorithms like hash tables? Okay。 All right。 These are
the kinds of allowed collaborations。 Sharing code for test cases with another group? Not okay。
Copying or reading another group's code or test cases? Not okay。 Copying or reading your
friend's code? Not okay。 Okay。 You can't search for online code or test cases from prior years。
We actually test for that。 Okay。 We have running, you know, we have running checks to make sure。
that people are not doing that。 So don't do it because if you get snared in, you know。
collaboration, failure here, you're going to, it's going to be problematic for everybody。
And we just don't want to deal with it。 Okay。 Helping somebody in another group to debug。
their code。 Not okay。 Okay。 So we're comparing all the project submissions against prior years。
and online solutions and we'll take actions if we find matches。 And don't, you know, don't。
put a friend in a bad position by asking them to help you with something。 Okay。 Don't。
ask them for their code。 That just, it turns out bad for both of you。 Okay。 All right。 Okay。
So let's look at the second concept of address space here。 And are there any testing frameworks。
that you'll use throughout the semester? That's good question。 We're going to start by making。
sure you learn GDB。 And we may, we may give you some more unit testing options as time。
goes on here。 That's a good question。 And we'll see what we can do。 Okay。 But definitely。
want to get good at GDB。 So I mentioned earlier the address space。 And here's the example。
of an address space with 32-bit addresses, let's say。 So that would mean that there's
32 bits of address here。 These are hex, so that would be eight hex digits。 And from 0x00000000
to 0xFFFFFFFF, that's an address space。 Okay。 It's all the addresses that a 32-bit processor。
could conceivably access。 And notice that this is a virtual idea。 Okay。 So just because。
they can access all 32 bits doesn't mean that all 32 bits are valid or even backed by DRAM。 Okay。
So this is the view of a thread as it's running。 Okay。 And so an address space set。
of accessible addresses in state。 For a 32-bit processor, it's about 4 billion addresses。
For a 64-bit processor, it's considerably bigger。 Okay。 It's 18 quintillion addresses。
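For reference, the arithmetic behind those two numbers (it's just 2 raised to the number of address bits):

$$
2^{32} = 4{,}294{,}967{,}296 \approx 4 \times 10^{9} \text{ addresses} \qquad
2^{64} = 18{,}446{,}744{,}073{,}709{,}551{,}616 \approx 1.8 \times 10^{19} \text{ addresses}
$$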
And what happens when a thread that's running reads or writes from an address? And the answer is,
well, maybe it'll act like regular memory and you could read or write to memory。
Perhaps you write to it but nothing happens because it's read-only and it's set
up to ignore your writes。 Okay。 Perhaps it causes an I/O operation。 It's possible to read。
or write from an address that'll actually cause something to appear on a screen。 Okay。
That's memory-mapped I/O。 We'll talk a lot about that。 Perhaps it causes an exception or a fault。
Okay。 And that would be you're trying to access parts of the memory space。
that you're not allowed to and you get a page fault。 It maybe it communicates with another, program。
Okay。 So there, and that's an interesting thing we'll call shared memory in a moment。
or actually in a couple of lectures where you read and write and address and it appears。
in somebody else's address space。 Okay。 So the address space is the set of addresses。
you can address and what happens when you try to read or write them is up to the operating。
system and how it's configured things。 And we'll talk a lot more about how you do that。 Okay。
But so in a picture, let me show you。 Here's the processor registers。 Here's the, program counter。
It points to some address instruction in the address space。 Here's, the stack pointer。
It's pointing to the stack at the bottom of the stack。 Okay。 And what's, in the code segment? Well。
instructions。 What's in the static data segment? Well, data that, was in your C program statically。
Okay。 What's in the stack segment? Well, hopefully, and, we'll talk more about this next time。
but you remember what a stack is。 So the stack is, the set of local variables。
And typically when you call a procedure, you allocate a new stack。
segment to handle all the local variables。 And when you return from the procedure, the。
stack is popped off。 And what I'm showing you here is the typical mode in which the stack。
is growing downward。 So the stack pointer starts at a FFFF。 And as you push things on, the stack。
it grows downward。 And as you pop them off the stack, it grows up。 Okay。 What's, in the heap? Well。
the heap is dynamically allocated memory。 Okay。 So how is it allocated? Well,
typically by calling something like malloc。 How big is it? Well, it depends on
how much allocation you've done。 You know, how many times have you called malloc, which
will ultimately call sbrk in the underlying system。 But, you know, how much is in the。
heap depends on how much you ask for。 And then there's this hole in the middle, okay。 And。
that hole is going to be kind of interesting to us in a few lectures because, you know。
what happens when you try to push something on the stack and there's no physical memory, underneath。
that's going to cause a page fault。 And the OS can do a couple of things。
like it could add more memory or it could cause a segmentation fault。 So we're going。
to talk a lot more about that as well。 Okay。 Questions? So good question。 What is stack overflow?
Well, stack overflow is going to be when you allocate, so much stack, because you go push, push。
push, push, push, push, that the OS decides, that it can't give you anymore。
And then you'd say that the stack is overflowed。 And if the, OS catches it properly。
the worst that would happen is your program, your process would, be killed。
What's a little less fortunate is if the stack grows into the heap and the。
right protection isn't there。 And then you silently overwrite some of your data。 And the。
reason that's worse is because you don't actually get a fault out of it。 You just get。
really weird behavior。 Okay。 We'll talk more again about these ideas。 Okay。 Can the heap。
overflow too? Yes。 So let's talk about our previous discussion of threads。 So very simple。
multi programming was what we did there。 Okay。 All the virtual CPUs share the same memory。
I/O devices, et cetera。 They're all in the same address space。 The magenta thread
can overwrite or look at the cyan thread's values。 Okay。 And each thread is therefore not
protected from the others。 Can it overwrite the OS? Well, that's a good question。 Not in a。
lot of operating systems, but certainly in the original ones it did。 Okay。 Is this scenario。
unusable because it's possible for threads to overwrite each other or the operating system? Well。
this approach was used with no protection in the early days of computing。 It's often。
used still in embedded applications。 Some of the early Mac OS versions or Windows 3.1 or。
Windows 95 actually had this level of protection where there was just very simple multiplexing。
there wasn't any protection。 And as a result, it was possible for threads to overwrite parts。
of the address space that contain the operating system。 Risky。 Okay。 There was that question。
earlier about how does the OS make sure that it gets control again? Well, if you have no。
protection, it's possible for a user thread to overwrite the code that was supposed to do, that。
And as a result, basically that timer interrupt might go off and the wrong things。
might happen and it wouldn't take control back。 So that's an issue。 So simple multiplexing。
that I told you about has no protection, but the operating system really has to protect, itself。
as you might imagine, for lots of reasons。 Like reliability。 We know if we're compromising。
the operating system that generally causes it to crash at minimum, security, if you limit。
the scope of what threads can do, then they can't steal information from other threads。
that are supposed to really be separate processes。 Privacy: could one running program steal keys
from another one? For fairness: could a thread exceed its appropriate share of system resources, and
limit the share other threads get, by overwriting the operating system and basically
never giving up the CPU? So all of these things mean that the operating system needs something。
better than simple multiplexing。 Okay。 And it also must protect user programs from one。
another so that different programs can't overwrite each other。 So what can the hardware do? Well。
it can start adding some hardware。 Okay, we can start adding some hardware to prevent。
violations of privacy and security。 And here's a particularly simple one, which we're just。
going to call base and bound。 And the idea is this address space I'm showing you is the
set of all DRAM, physical DRAM addresses。 And there is the operating system, which is in gray here,
and a particular thread that we want to be running in a protected mode down here, which。
is the yellow piece。 And what the base and bound idea is is that there's a special register。
called the base that points to the lowest address。 Now I've got this reverse from earlier, by。
the way, so smaller addresses are up。 But this, the base is pointing to the lowest address。
that yellow is allowed to access。 And the bound is pointing at the highest part that the。
yellow is allowed to access。 And as a result, if we could somehow make this prevent things。
running in user mode from accessing the gray thing at all, then we can prevent the yellow。
thing from overwriting the operating system。 And so, for instance, here's an example where。
we have program addresses pointing at data。 And if we have the CPU try to access, say, address 1010,
what we're going to do is we're going to ask ourselves: is it larger than the base?
And if you notice, 1010 is larger than the base, so the answer is yes。 Okay, is it lower than the bound?
The answer is yes, in which case we go ahead and actually allow the access。
So just by having these two registers and a couple of hardware comparison things that。
cause exceptions to happen, suddenly we can protect。 Okay, very simple protection base, and bound。
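A minimal sketch of the check the hardware makes on every access under base and bound, written as C purely for illustration (in reality it's a pair of comparators on the memory path, and a failed check raises an exception into the OS; the struct and function names here are invented):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical protection registers, set by the OS each time it
 * starts running a user thread. */
struct base_bound {
    uintptr_t base;    /* lowest address the thread may touch       */
    uintptr_t bound;   /* one past the highest address it may touch */
};

/* True if the access is allowed; otherwise the hardware would trap. */
bool access_ok(const struct base_bound *r, uintptr_t addr) {
    return addr >= r->base && addr < r->bound;
}
```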
Okay, so however, if you notice the simple version of this program on the disk。
actually sort of thinks it's got address zero for the code and some higher address for stack。
stuff and really an address for static data goes 0010 on the file system。 And so when。
we load it into memory, we're going to have to dynamically translate all of the addresses。
that were here into their new position, where notice the code was at all zeros here, but
now it's at a one followed by zeros。 And so there's a dynamic loader that changes all
the addresses inside the code to be consistent to operate at that base, a one followed by zeros。
And if we do that, then we can use base and bound to allow execution。 So if you look here。
so how can the address space live in the or how can the OS live in an address space when。
the OS itself is the one providing the abstract of address spaces。 Well, I just showed you, here。
It's the magic of protection。 So here the OS is running here。 It's going to be all。
the code down here where the gray is。 And I've prevented whenever I'm in user mode。 Notice。
I don't have user mode yet。 But when I'm in user mode, the base and bound registers are。
going to be set and enforced。 And as a result, the user code won't be able to mess up the。
operating system。 And so it's protected。 And so these base and bound registers as is asked。
in the in the chat are dynamically set。 Each time a thread is started executing。 So when。
we decide a thread is ready to execute, we set the base and bound registers then。 Okay。
So this protects the OS and isolates the program and requires a relocating loader, however。
which can somehow translate the addresses as they were on disk into the way they are in, memory。
And notice by the way, this will make sense in a moment。 There's nothing special。
like addition or whatever on the hardware path。 For those of you who have taken a hardware class, there's no delay introduced when we try to read from the heap by this particular mechanism, because really the address goes straight out to memory。 Okay。 We'll have
a better type of protection, which is going to add a little bit of hardware latency there。
in just a few slides。 So what is relocation? So you might remember this from 61C。 So here's。
an example of an assembly language instruction, jump and link, where that gets translated into binary, where the opcode, 000 000 011, is at the top。 And the address of the printf gets put into the lower part of the binary instruction。 And when we compile it, it looks like this on disk, with the X's replaced by binary。 But when we finally load it,
at that point it needs to be translated and linked properly to go where it belongs in, memory。
And so at that point, we do what's called relocation, which is we relocate all。
of these addresses that were filled in either at the time that we do loading or at the time。
that we do linking。 Those are two possibilities。 Okay。 And the addresses in the executable on。
disk are as if everything was loaded in memory at 000。 And when we link it or load it into, memory。
at that point we're going to translate to its new position。 Okay。
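As a rough sketch of what such a relocating loader does, assume (hypothetically) that the executable carries a list of offsets of every word that was filled in as if the image started at address zero; the loader just adds the new load address to each of them。 Real formats like ELF are more involved。

```c
#include <stddef.h>
#include <stdint.h>

/* Patch every absolute address in the loaded image by the new base.
 * 'fixups' holds byte offsets of the words that were filled in as if
 * the image started at address zero; both the table and this layout
 * are invented for illustration. */
void relocate_image(uint32_t *image, const size_t *fixups,
                    size_t nfixups, uint32_t load_base) {
    for (size_t i = 0; i < nfixups; i++) {
        image[fixups[i] / sizeof(uint32_t)] += load_base;
    }
}
```
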
So let me show you something that's a little bit less complicated for the software by putting a little complexity into the hardware。 And if you notice, if you think back to what I showed you earlier, the
only thing I've done here is I've added an adder here。 Okay。 And this is a hardware adder。
And what it means is that when the program addresses come out of the CPU, they go through。
an adder before they go to DRAM。 Okay。 And if you think about that, I'll set a base address still of 1000000。 But now when the CPU tries to access address 0010000, we add the base
to it and now we access address 1010000。 And suddenly we've translated the CPU's view to。
the physical view。 So this is actually a type of virtual memory。 Okay。 Where we do translation。
on the fly。 Okay。 And this is going to give us an illusion of memory。 It's not going to。
be a very good illusion of infinite memory yet, but it does give us an illusion。 Okay。
And hopefully this makes sense to everybody。 So now the way that the binary looks on disk。
and the way it looks in memory is the same。 We've just loaded this off of disk and put it。
into memory。 And we can execute it as if it were at zero, because the program, or the CPU, thinks it's at zero。 And we address it on the fly in its real spot by adding a base
address to it。 Okay。 This is hardware relocation rather than the software case where the software。
had to change its location。 Now can the program touch the OS here? And the answer is no because。
it would have to give a negative address that would go below yellow and it can't do that。
Or it would have to give an address that's so big that it wraps around。 It's also not。
allowed to do that because it's trapped between the base and the bound。 Okay。
So this is still protecting the OS into this little yellow chunk of physical memory。
Can it touch other programs? Well the answer is that's going to be no too because if you。
imagine a green one up here this CPU can only use addresses that when added to the base go。
between one zero zero and one one zero zero and nothing else is going to be touchable。
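Here's a minimal sketch, in C, of what that adder plus bound check does on every access; in real hardware this is a circuit on the path to DRAM, and the names here are just for illustration。

```c
#include <stdbool.h>
#include <stdint.h>

/* The CPU issues a program (virtual) address; the hardware adds the
 * base and checks the result against the bound before touching DRAM. */
typedef struct {
    uint32_t base;   /* where this program's memory starts physically */
    uint32_t bound;  /* first physical address beyond its region      */
} reloc_regs_t;

bool translate_and_check(const reloc_regs_t *r, uint32_t vaddr,
                         uint32_t *paddr) {
    uint32_t candidate = r->base + vaddr;            /* the hardware adder */
    if (candidate < r->base || candidate >= r->bound)
        return false;                                /* protection fault   */
    *paddr = candidate;
    return true;
}
```

The wraparound case from above is caught by the first comparison: if adding the base overflows, the result lands below the base and the access is refused。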
So all the other programs and the operating system are all protected。 Now the good question。
here on the chat is: is this a hardware delay? Yes。 But in a modern type of CMOS this extra adder is essentially unnoticeable; there are so many other things in there。 So this extra little hardware delay is not going to change the cycle time in any reasonable way。 And even if it did slow the cycle time down, the benefit of this is huge。 Okay。
Because it means that the same code from the disk can be loaded into memory and directly。
executed。 And by the way we're not going to worry about these kind of hardware delays。
very much in this class。 We'll mention them occasionally。 That's more for 152。 Now the。
x86 is a very interesting architecture with a whole bunch of what are called segments。 Okay。
And segment registers to define them。 For instance the code segment has basically。
a pointer to the beginning of one of these chunks and a limit。 So it's very much like base and bound, except that instead of just one base and bound we have potentially many with
different functionalities like code, data, stack and so on。 Okay。 And we'll get much。
more into segments as we go。 But just keep in mind that the idea there is lots of base and bounds, each with a sort of function that it's allowed to have。 Okay。
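As a cartoon of that idea in C (the names are invented and this is not the real x86 descriptor format), each segment is its own base and limit, selected by the kind of access:

```c
#include <stdbool.h>
#include <stdint.h>

/* Many base-and-bound pairs, one per kind of access. */
typedef struct {
    uint32_t base;    /* start of the segment in physical memory */
    uint32_t limit;   /* length of the segment                   */
} segment_t;

enum { SEG_CODE, SEG_DATA, SEG_STACK, SEG_COUNT };

bool seg_translate(const segment_t segs[SEG_COUNT], int seg,
                   uint32_t offset, uint32_t *paddr) {
    if (offset >= segs[seg].limit)
        return false;                 /* would raise a protection fault */
    *paddr = segs[seg].base + offset;
    return true;
}
```
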
Now a slightly different idea from that simple one I introduced to you here is fully general address translation。
And here what happens is the addresses come out of the processor and they're translated。
to physical addresses。 And by the way you could say that happened here right? Came out of。
the processor got translated to physical addresses。 But we're much more general whereas in the。
previous case they were linearly arranged from the base to the bound。 Here we can put an arbitrary translator in here, and so the virtual addresses can be all over the place in physical memory。
And so that's going to bring us to another 61C idea which you got briefly which was the。
paged virtual address space。 And so this translator, in the case of paging, is a very specific one where it moves things around, but it does so at the granularity of one page。
A very good question here that I think just showed up in the chat was whether the registers in RISC-V and the ones in x86 are the same physical registers in the CPU but assigned to different functions。 You can't really think of it that way。 An x86 CPU and a RISC-V CPU
are totally different。 And there's no simple way for you just to flip a switch and one becomes。
the other。 So they both have registers they both have similar functionality but they're。
very different in their implementation and it's not like those registers of the x86 became。
registers in the RISC-V by flipping a switch。 So, paged virtual address space, getting back to this idea here。 The question was, is this software? No, it's hardware translation。
It's up to the software to set up the translation but then the hardware does it。 Okay。 And all。
the pages here are the same size so it's easy to place each page in memory。 The hardware。
translates addresses using a page table。 Okay。 Every page has a base and bound but they're。
all the same size。 Okay。 And a special hardware register points to the page table。 We're going to treat the memory as a whole bunch of page-sized frames。 Okay。 And this is another quick 61C review。 And the idea here is the translator that I just
told you about is really called a page table。 And what will happen is an address will come。
out of the processor and we'll take that address and we'll look up in the page table where that。
address is in physical memory。 And so for a given address there'll be a frame address。
here that's in blue that will point to a particular page in DRAM and we'll basically。
divide that virtual address into both a page number and an offset。 And that page number。
having been translated into the blue we then take the offset and that tells us how far。
into the blue。 Okay。 So hopefully this is ringing a bell for people。 But notice for。
instance that addresses here are in the same order as in the virtual address space。 That。
order, once it's translated to physical addresses, doesn't have to be anything like the same ordering。
So the blue frame could be here in physical memory and the green frame up there。
The processor running in virtual space thinks that everything is pulled together in a nice。
clean order but in fact the page table can scramble them all over in memory and it turns。
out this is going to be very helpful for management。 Okay。 So this is actually a really, good thing。
Okay。 So instructions operate on virtual addresses。 Instruction addresses are virtual; load and store data addresses are virtual。 They're translated through the page table to physical addresses, and the physical addresses are the ones that are looked up in DRAM。
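Here's a minimal sketch of that lookup in C, assuming a single-level table and 4 KiB pages; real page tables are multi-level and the entry format is richer, so treat the names as illustrative。

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                      /* assume 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

typedef struct {
    uint32_t frame;   /* physical frame number                  */
    bool     valid;   /* does this virtual page have a mapping? */
} pte_t;

/* Split the virtual address into page number and offset, look up the
 * frame in the page table the OS set up, and glue the offset back on. */
bool page_translate(const pte_t *page_table, uint32_t vaddr,
                    uint32_t *paddr) {
    uint32_t vpn    = vaddr >> PAGE_SHIFT;
    uint32_t offset = vaddr & (PAGE_SIZE - 1);
    if (!page_table[vpn].valid)
        return false;                      /* hardware raises a page fault */
    *paddr = (page_table[vpn].frame << PAGE_SHIFT) | offset;
    return true;
}
```

Notice that the offset passes through unchanged; only the page number gets remapped, which is exactly why the frames can be scattered anywhere in DRAM。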
Okay。 And any page of the address space can be in any place in memory。 So this is an。
alternative to the base and bound I just showed you。 Okay。 And the question is will a locality。
be compromised in any way? Let's hold off that question a little bit because there's a question。
even of what you mean by locality。 Just keep in mind that, for the processors we're talking about, accessing here versus here makes no difference in speed unless the memory isn't loaded。 Okay。 And we're not going to
get into two-level virtual memory right now either; hold off on that question, please。 Good that you know to ask it, but we'll deal with it later。 Okay。 All right。 And there's a special
register called a page table address that points to the base of the page table。 So notice what's。
cool about this is if different processes have different address spaces then what I can do is。
the following: when I'm ready to switch from process magenta to process cyan, in addition to saving and loading registers, I just change the page table base address, and
suddenly that new thread that we just swapped in has a completely different set of physical。
addresses available to it。 Okay。 So that's going to lead to the third concept of a process which is。
going to be an execution environment with restricted rights。 Okay。
There's a protected address space with one or more threads in it。 It owns some memory, which is its address space; it owns file descriptors, file system context, etc。 It's going to encapsulate one or
more threads sharing process resources。 Okay。 So a program when it's loaded into memory and。
starts executing is a process。 Okay。 And so complex applications can actually fork themselves。
into multiple processes that's one possibility。 Okay。 And that would have multiple of these things。
all working on behalf of the process each of which has its own protected address space。
Furthermore a single protected address space can have one or more threads。 Okay。 So why are。
processes useful they're protected from each other。 So think back to the slides I showed you。
at the beginning of the lecture: brown, green。 The brown process is protected from the green process。 Why? They each have their own address space, either worked out through base and bound or, more likely in a modern processor, by a page table。 They each have their page table in memory, and so when we
switch from brown to green and back again we're altering the page table address in addition to。
loading registers in。 And as a result when we go from brown to green and back again they're fully。
protected in their own address space where they can't mess up each other unless we decide to let them。
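A rough sketch of the interesting part of that switch, with made-up names for the process control block and for the privileged write to the page-table base register (satp on RISC-V, CR3 on x86):

```c
#include <stdint.h>

typedef struct {
    uint64_t saved_regs[32];    /* this process's register state      */
    uint64_t page_table_base;   /* physical address of its page table */
} pcb_t;

/* Stand-ins for the real assembly routines and the privileged write
 * to the page-table base register. */
static void save_registers(uint64_t *regs)          { (void)regs; }
static void restore_registers(const uint64_t *regs) { (void)regs; }
static void set_page_table_base(uint64_t base)      { (void)base; }

/* Changing the page-table base is what swaps in a completely
 * different set of virtual-to-physical mappings for the next process. */
void switch_to(pcb_t *prev, pcb_t *next) {
    save_registers(prev->saved_regs);
    set_page_table_base(next->page_table_base);
    restore_registers(next->saved_regs);
}
```
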
Okay。 All right。 Does the OS build the page table? Yes。 Okay。 And the page table is a long-lived thing that exists from the creation of the process to the death of the process。 And for now we're gonna say absolutely the page table is in the operating system。 Okay。
You guys will know when you can ask me that question again; I might give you a different answer then, but it won't be for a month or so。 Okay。 So the OS is definitely controlling the page table。 And if
you notice why do we want processes well because the processes are protected from each other the OS。
is protected from them。 Okay。 Why is that? Well let's go back to this。
If the only parts of memory I can access are the parts that are in the page table, then if something doesn't have a pointer in the page table, there's no way for this processor, running in that process, to go through the page table and touch a part of memory that's supposed to be accessible only to the OS。 Okay。 So the reason the OS is
protected is there just aren't any pointers that are accessible to the virtual process that can。
access OS。 Okay。 Another good question are the threads in a process protected from each other or。
not? No。 That's a feature。 Okay。 So the threads inside of a process are not protected from each。
other and that's good because they're sharing concurrency and memory with each other to get some。
job done rapidly。 And so they're not protected from each other, but that's by design。
And the reason we, don't think of that as a security violation is because we assume that you wrote your program to。
work properly and you wrote it to use those threads and have those threads collaborate with each other。
properly and therefore they aren't protected from one another but boy they're protected from somebody。
else's threads。 Okay。 Now here's a view that I just said of a single threaded and a multi threaded。
process。 So for instance a single threaded process has a code and data and files in it and。
registers and stack and a single thread of control。
And so when that process gets put to sleep there's。
only one thread running that needs to be put to sleep。
A multi threaded process has a bunch of threads。 Okay。
Each of which has registers and stack of their own。 That's the part that makes it uniquely a。
thread but they also have code and data that's shared。 So typically a multi threaded process has a。
huge chunk of code and the threads are running around inside that code but that code is all part。
of the same linked image。 Okay。 So threads encapsulate the concurrency aspect。 Okay。
They're the active, component, the executing component of a process。
The address space encapsulates protection。 It's the, passive component or the box。 Okay。
I like to think of the address space is the box that we shove, these things in。
The threads are busy running around in that box。 Okay。 And that box that protection。
prevents bad programs from crashing the system。 Good question: is the heap shared by different threads? Yes。 So typically there's one heap。 Okay。 So code, data, files, and the heap are all shared; one heap。
The reason we need to have multiple stacks is because each thread is busy running its own sort of recursive routines and so on。 And so it needs its own stack。
And of course it has its own registers because it's, doing actual execution。 Okay。
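To make the shared-heap, separate-stack point concrete, here's a small example using POSIX threads (not anything specific to this lecture's slides): both threads see the same heap allocation, while each has its own stack variable。

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Both threads see the same heap allocation; each has its own stack. */
static int *shared_counter;              /* heap memory, shared by all threads */

static void *worker(void *arg) {
    int amount = (int)(long)arg;         /* lives on this thread's own stack */
    __sync_fetch_and_add(shared_counter, amount);
    return NULL;
}

int main(void) {                         /* build with: cc -pthread ... */
    pthread_t t1, t2;
    shared_counter = malloc(sizeof *shared_counter);
    *shared_counter = 0;
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared_counter = %d\n", *shared_counter);   /* prints 3 */
    free(shared_counter);
    return 0;
}
```
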
And how are multiple stacks set up? Well, process creation will give you at least one set of registers and a stack。
And then when you make new threads, there'll be more of them。
And all of that is done typically at user level, or it can be done in the kernel。
We'll talk about both of those as an option。 Okay。 So we'll get more into how to build a
process and make it run as we go on。 So protection and isolation together are important。
So why do we need processes? Just to remind you: for reliability, because bugs can only overwrite the memory of the process they're in。 So that's good。 For security and privacy: malicious or compromised processes can't look at other processes' data。 Okay。
And fairness to some extent。 So by confining threads to a process。
and those processes are protected and manipulated by the operating system, it means the operating。
system can make sure that no one process can steal all of the resources and prevent other。
processes from running。 Okay。 So the protection environment of the process with proper scheduling in the。
OS is what lets us build a modern operating system where we can guarantee that even malicious things。
that are running can't prevent others from running or steal information。 Okay。 Now the
mechanisms to make this all work, we've already showed you this idea of address translation。 But if。
you think about it, why can't we let a process change the page table pointer on its own? Okay。 Well。
can anybody figure that out? So why can't we let the process that's running change its own。
page table? And Rizal, I'll get to your question in a second here。 Can anybody think about why we。
can't let a process change its own page table? Yeah。 If it could change its own page table, it's not protected, right? It could point it at the OS or another process。 Okay。 And so clearly,
when we're running inside the process, it can't touch the page table。 Okay。 And so that means that。
we have to have something about the hardware that has those two modes I mentioned that make sure。
that when you're running as a process, you can't mess with the page table and only can mess with。
the page table when you go into the operating system, which is kernel code that's been vetted。
and is known to be not malicious, at least we hope。 Okay。 Now the question here about which I want。
to answer briefly, what's the advantage of multi-threading in a single core?
And the answer is one of concurrency。 Okay。 So it's not about getting performance out of parallelism。
but it's allowing you to have, many things that are starting and stopping and waiting on events。
Okay。 And you'll be much more, sophisticated about that as you go forward。
So there's a lot of use for multi-threading, even when there's only a single core。
And it's not about making things faster, but about dealing with。
the fact that you want to overlap computation and IO in an important way。 Okay。
And also making sure, that when one thread is waiting, it's not blocking everybody else。
And so the fourth concept for today, which we're going to end up with quickly here。
is that hardware has to provide at least two modes: kernel mode (also called supervisor mode) and user mode。 Okay。 Certain operations are prohibited when you're in user mode, like changing the page table pointer, etc。 When you're in user mode, you can't do those things。 And there have to be very carefully controlled transitions from user mode to kernel mode。
So in that case, system calls, interrupts, exceptions, are all examples of。
transitioning from user mode into kernel mode in a controlled way, which is not only going to go。
to kernel mode, but it's also going to make sure that the only code that's allowed to run is code。
that's vetted and belongs in the operating system。
So we can't obviously allow the user code to just。
go to kernel mode because then it can run anything it wants。
And so we have to control the transitions, into kernel mode。
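One way to picture that rule, as a sketch only, since real hardware bakes the check into the privileged instructions themselves and these names are invented:

```c
#include <stdbool.h>
#include <stdint.h>

/* The mode bit gates privileged operations. */
typedef struct {
    bool     kernel_mode;       /* true = kernel mode, false = user mode */
    uint64_t page_table_base;   /* privileged register                   */
} cpu_state_t;

/* A user-mode attempt does not succeed; it would trap into the kernel. */
bool write_page_table_base(cpu_state_t *cpu, uint64_t new_base) {
    if (!cpu->kernel_mode)
        return false;           /* illegal in user mode */
    cpu->page_table_base = new_base;
    return true;
}
```
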
So one example we've mentioned already today was the system call idea, which is。
the user process is running along and user mode。 And then it says, hey。
I need to do a read from disk。 So it's going to make a system call, which is going to do two things。
One, it's going to enter a very, well-defined piece of code, which represents the read system call。
And two, it's also going to transition to kernel mode。
So I'm showing it here where there's a single bit, the user mode bit, which is a register in the processor。 And we're running up here with the user mode bit set to one。 But then when we do a system call, we transition to the bit being set to zero。 And so now
the code that's only kernel code is running with a lot of privileges。
and it can do whatever it needs, to, including maybe altering the page tables。
And then when it returns to the user code, which is like, a return from a function call。
but it's a special function call, we will then go back to user mode and we're back inside the user process。 Okay。
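For example, from the user side a read looks like an ordinary function call。 Here's a small C program where open(), read(), and close() each make a system call, trap into the kernel, and come back; the file path is just an example。

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* open(), read(), and close() each trap into the kernel and come back. */
int main(void) {
    char buf[64];
    int fd = open("/etc/hostname", O_RDONLY);    /* path is just an example */
    if (fd < 0)
        return 1;
    ssize_t n = read(fd, buf, sizeof buf - 1);   /* trap -> kernel mode -> back */
    if (n > 0) {
        buf[n] = '\0';
        printf("read %zd bytes: %s", n, buf);
    }
    close(fd);
    return 0;
}
```
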
So for example, we can now start talking about a diagram like this。 Here's the typical Unix system structure。 And by the way, I'm running a tiny bit
late。 If you guys will hold on for another five minutes, I want to make sure that we keep a couple。
of things going here。 Okay。 But if you look, we can imagine things running at user mode。 These are。
pieces of applications and standard libraries that are all up here running without special privilege。
And they can be built by you, linked by you, or other users, any way you want。
Kernel mode things are things that are running with high privilege。 And this represents things that have to
be done perfectly。 Okay, they have to be done in a way that's not buggy, doesn't allow people to。
have security violations, whatever。 So that's important。 Okay。 And so we need to make sure this。
code is all perfectly correct。 Okay。 And if you look now at the hardware, okay, that's the lowest。
level。 And typically, kernel mode, only the kernel mode things are allowed to actually access。
the hardware。 And that's partially because they have the kernel mode bit set。 And the hardware will。
only talk to things when the kernel mode bit is set。 Okay。 So we can start looking at the structure。
of a typical system this way。 Okay。 So for instance, here now let's look at what we've got。
We've got。
hardware as this brick wall。 Here's our software。 So we have the core things that are running in the carefully vetted kernel mode。 User mode is all the stuff that you produced or regular users
produced。 And what happens is the kernel will exec a process。 You'll learn how to do this when you do the shell。 It execs it, which means it gets loaded into memory off of disk and then starts running in user mode in the startup routine of that file。 Okay。 That's main, typically。 Okay。 And it runs for a while。 And later, when it's done, the process will exit。 And that process will be shut down and we'll be back in kernel mode。
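Here's a minimal sketch of that lifecycle from the parent's point of view, using the same fork, exec, wait pattern you'll use in the shell project; the command and arguments are just an example。

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();                   /* create a child process     */
    if (pid == 0) {
        char *argv[] = {"ls", "-l", NULL};
        execvp("ls", argv);               /* replace the child's image  */
        perror("execvp");                 /* only reached if exec fails */
        _exit(127);
    }
    int status;
    waitpid(pid, &status, 0);             /* parent waits for the child */
    printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}
```
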
Okay。 Now, meanwhile, from the beginning to the end, things can happen, like, well。
we could make a system call into kernel mode, and that will get returned back to user。
mode when we're done。 Or we could have an interrupt。
So an interrupt is an event coming from outside, like a timer, that forces a transition from user mode into kernel mode, where stuff will happen。 It might access the hardware, for instance, and then eventually return after the interrupt's done。
So that could happen if a network packet comes in, etc。 Okay。 And then finally, an exception might。
be an example where you try to divide by zero, in which case we might enter the kernel。 And if。
it's an unrecoverable one, like divide by zero, then we might kill off the process。
If it's a different, type of exception, like a page fault, which we'll learn a lot about later。
then it will return。
properly for execution。 Okay。 All right。 So there are different additional layers of protection in。
modern systems。 And so for instance, there can be something even higher privilege than the operating。
system like the hypervisor。 Okay。 These are additional layers of protection where we have what are。
typically called virtual machines that let you run an operating system on top of them, where the。
operating system thinks it's got full control of the machine。 But the hypervisor is multiplexing。
underneath。 Okay。 And that's what you're actually playing with right now, is you're getting set up。
with your virtual machines。 Okay。 Now, is the OS running in user mode by default? So again。
that's a good question to get close to finishing on。 If you look at this for a moment。
we have stuff, kernel mode things, which might be the OS, which run at kernel mode。
And so there's the code that, touches the hardware and does highly privileged things。
Then we have user mode, which is your user, programs that are not running that special code。
So the one answer to "is the OS running in user mode" is that the OS is really always running in kernel mode。 Okay。 It may not always be actually getting cycles out of the CPU, though, because it may be in this instance here, where we're running in user mode。 And if there's only one CPU, then the OS isn't running at all, right, isn't running。 The moment we have a system call into the kernel, now we transition into kernel mode。
And now the user mode code isn't running。 And the kernel mode code is running, where running means。
has the CPU。 Okay。 Hopefully that's helpful。 We'll get more of this as we go on。 So I think。
I'll give you this last example here。 Okay, I promised five minutes。 So just let's look at this。
For instance, let me just show you。 Let's use base and bound。 Here's the operating system。
Here's two processes。 And we're going to have this illusion of processes。 And notice that when。
we're running the operating system, we're in system mode。 That's why this is on。
So this is not a user mode bit; this is a system mode bit。 The program counter is pointing into the operating system。 The
stack pointer is pointing into the stack and the OS kernel mode is running。
Now what we're going to do, is we want to start running something else like the yellow code。
So we're about to return from interrupt。 And to do that。
we set up pointers to the yellow code in special registers so that when we do a, return to user。
what happens is the PC gets switched over to the yellow code。 The base and bound will now be enforced because the system mode bit is zero。 So only when the system mode bit is zero, in other words in user mode, does base and bound get used, whereas when we're in kernel mode, the kernel is free to do whatever it wants。 Okay。 All right。 So next time I'm going to talk about
different types of kernel mode transfers。 So in conclusion, today we talked about four fundamental。
OS concepts。 Thread, which is a virtualized CPU。 It's an execution context that fully describes a program's state: program counter, registers, execution flags, stack。 And it can be multiplexed。 So there
can be many more threads than there are actual CPUs or cores。 An address space, with or without
translation is a protected box。 It's the set of all memory addresses accessible to the program for。
read or write。 So for a 32-bit CPU, for instance, it's four billion values。 Okay。
And it may be distinct, from the physical machine。 That's when we start having translations。
We talked about a process, which combines threads and address spaces into a useful idea。
which is a protected address space, with one or more threads。
And then we also finally talked about the need for hardware to have two。
modes system and user so that we can enforce that certain hardware can't be touched by the user。
in order to give us our full protection。 All right。 I have gone over。 Thank you for bearing with me。
I, hope that everybody has a good rest of their evening and stay safe。