【Raywenderlich】Timer Profiler

570 阅读13分钟

这篇文章是记录了 Raywenderlich Timer Profiler 视频。Youtube 自动生成的英文不是很准确,自动翻译的中文更不准确,因此笔者花了两天时间把视频中说的话从头到尾整理好几遍,感兴趣的同学在学习这个视频的时候可以对照着我整理的内容一起看。

视频地址:www.youtube.com/watch?v=Qsd…

Time Profiler

Hey, what's up everyone. Wecome back to our course on practical instruments.

In this video , it's time to break out time profiler and get a look at the innards of our app, Catstagram. By the end of this video, you will solve the biggest bottleneck the app has, making it run a lot more smoothly.

If you're running of this app on a newer device, it probably doesn't really feel all that bad. If you pay attention, you should definitely see some little hiccup as the images come on screen. If you really want to see some frames getting dropped, hop on an iPhone 5, or even better, an iPad 5g, these devices are noticeably slower, and a way worse experience with this app.

While it's easy to look at the list of available instruments and get a little overwhelmed, at the end of the day, you'll probably end up coming back to Time Profile of the majority of time. While other tools may give you a little better perspective as to what's causing a certain problem in your app . Usually you'll be able to see the same problems in time profiler as well.

# 翻译
get a little overwhelmed # 有点不知所措
you'll probably end up coming back # 你可能最后又回来了
may give you a little better perspective as to # 可能会让你更好地了解

Dtrace

Like many of the instruments, Time Profiler uses the powerful DTrace tool under the hood, to dynamically insert itself into your app while it's running. DTrace itself is a little bit beyond the scope of this series, but it's actually a command line tool made by Sun Microsystems to get a detailed overview of everything going on in a running program. If you think this profiling stuff is super cool, you can even make your own instruments by writting custom Dtrace scripts. Anyway, Time Profiler works by capturing your app's call stack at a constant interval. While this can technically influence the results of performance traces, by introducing a little bit of overhead, You can rest assured that the vast majority of the time the results you get from Time Profiler will be more than accurate enough to clue you in to the problems in your app.

# 翻译
under the hood # 可以理解为在后台
beyond the scope of this series # 超出了这个系列的范围
overhead # 开销
rest assured # 放心
clue # [klu:] vt. 为...提供线索

Details Panel

In the details panel, you'll see that time profiler builds up a call tree from the call stacks that collects.

THE CALL TREE

Let's look at a simple run of an app, to see how exactly it does this. Let's say 比方说 we have a relatively simple app, it'll only consist of a few methods, I don't know, something like foo() an bar() , I don't know why that feels right.

Here we have the sample 样本 run of your app. As time moves along, Time Profiler will grab 攫取 snapshots of your current call stack, at a constant interval. By default, each of these snapshots will happen one millisecond apart.

In this case, about a millisecond into the run, a sample is taken, where the stack has main calling foo() which is then calling bar().

So, we'll save that off into the current call tree along with 以及 the count of how many times each method has been seen in this position in the call stack.

A millisecond latter, time profiler takes another sample.

This time the call stack was the same , so the count of each method is just increased by one .

At the third millsecond, the stack is captured, and we have main() calling bar() directly.

This means, will increase mains count like before, but this time we'll make a new branch in our call tree since sometimes main() calls foo(), and then bar(), and then other times it just calls bar().

In our final call stack, we have another variation where main() calls foo() , which then calls a method named baz.

Like last time, we'll update the counts of main() and foo(), and then the call to baz() means we have yet another branch in our tree.

One thing you may have noticed, is that baz() was actually already called twice, and the profiler just happend not to take samples when it was being called. This is a feature not a bug. The thing to remember about Time Profiler, is that it's not actually collecting the duration of each method your app was calling . Instead, it collects these stack traces and then counts the number of times it sees these methods in the samples.

Time profiler 默认以1毫秒的间隔捕获 App 的调用栈,然后计算它在样本中看到的方法的次数。

What you're seeing in the output is more of an average how often a particular method shows up at any given time, when your code is running. This means that Time Profiler can’t actually tell the difference between large methods that take a long time, and short methods that get called more often. This does mean that if the method is short enough, and get called infrequently enough, it may not show up at all.

Time profiler 记录的只是一个平均的方法出现频率,并不能区分花费时间长的方法和花费时间短的方法。如果一个短方法的调用频次很少,那它可能都不会出现在 Time profiler 中。

Demo

Alright , so here we are in the Catstogram app, as we can see, it's got a pretty sweet icon, similar to some other apps, but, whatever. And if we start the app up, we'd notice that as we scroll, it tends to be a little, actually really laggy. So we're go ahead and fire up instruments and see if we can figure out what's going on.

So to get started , you can go ahead and hit Command + I to start instruments.

Command + I 启动 instruments

So here we have the template chooser and we can go ahead and just pick time profiler and then hit choose. And then like you saw in the last video you can come up here and hit record and that will start the trace.

And then to collect some stack traces, we're just going to mess around with the app, right? Scroll it, interact with it like we normally would. And that way, we can really like gather information about what's happening in our app.

So once you've scrolled around a bit, you can go ahead and come up here and hit stop.

Alright, so now that we've collected enough information to have a good call tree, let's go ahead and look thorugh here and see what information we see.

Weight Column

The first thing to notice is the Weight column over here. This is how much of the work is being done in a particuler subtree.

So as we can see on our Main Thread here, we have 46.6% of the work happening.

Self Weight Column

And then, we have the Self Weight column. So the self weight column, is how much work is happening directly in that method, not in me thods called by that method.

how much work is happening directly in that method, not in methods called by that method

直接在该个方法中发生了多少工作,而不是该方法调用的方法中发生了多少工作

So the Self Weight column is going to be what you'll use to find where the work is really happening. Because there's a lot of things that happen between UIApplicationMain starting, and where the actual work happens in your app.

Go on

So if we click on these disclosure triangles, we can kind of drill down 深入了解, and see.

So we have our call to main -> UIApplicationMain, and as we drill down, we can find that there is a lot of system stuff going on, and it maybe a little bit overwhelming 不知所措 at first, and it's kind of just one of those things, where you have to just kind immerse 使...沉浸 yourself in it for a while, before you get comfortable in the kind of things you're going to see in here.

So, a lot of times you'll see this CFRunLoop, this is maybe one of the most common and kind of most important things to be comfortable with seeing in here.

So hitting these disclosure triangles is fine, but it's actually kind of not the fastest way to do things. There's a nice trick 技巧 on your Mac, for any, like in Finder, anywhere that has these disclosure triangles, if you hold Option and click on the trangle, so when you do this Option click, this is known as a smart disclosure, and what ends up happening is, it will drill down until there's actually relevant 相关的,切题的 information. So you'll see that these calls here, are where we actually start getting a lot of the big, we have like 1.2 seconds worth of work, 500 milliseconds worth of work, so these branches in the call tree are where actual like big chunks of 大块的,大量的 work start happening.

So this is a pretty common example of what will happen, you'll do a smart disclosure, it'll come down, and you'll have the CFRunLoopRun. And basically the Runloop is the thing that's reacting to user events and drawing things to the screen. So a lot of code that gets run in your app, is going to be run through these Runloop events.

Another thing to notice is these little icons over here. These are describing what kind of code is being run.

  • So as you can see, there's like a little mug here, that means Cocoa, clever.

    mug 杯子图标意味着 Cocoa

  • We have the gear, which is system calls, like OS system calls.

    gear 齿轮 是系统调用

  • We have the person, which is our app itself。

人 表示我们的app它自己

  • The briefcase always seems to be GraphicServices.

    briefcase 公文包看起来是图形服务

Anyway, now the next important thing to look at is down here this Call Tree options.

  • Separate by State that's like whether the thread is running currently or not.

    这就像当前线程是否在运行一样?

  • Seprate by Thread, which we have by default, it's a pretty nice option, because usually we wanna look at the main thread.

我们默认的很好的选项,因为通常我们想看主线程

  • Invert Call Tree will flip everything around so that the leaf functions at the bottom, or I guess you could say at the top of the call stack gets flipped back up to the top of the call tree.

  • Hide System Libraries will turn off all the Apple stuff and just show your code, which is pretty nice if you're looking for something that you're doing and you're drowning in 淹没在 CFRunLoop events. But it can also be a little misleading 误导, so you kinda have to be careful not awlays be hiding system libraries.

    不要总是隐藏系统库。

Now that we've done a quick introduction in your challenge for the moment, is to go ahead and pause me. And just kind of adventure through here, click on these disclosure icons and see if you can't find any of our code that you think might be contributing to the slowdowns. Pause me for a couple minutes, and then come back, and we'll look into it together.

Alright, so hopefully you looked around a little bit. Let's go ahead and dive into this and see what exactly is going on.

So like I said, a lot of times, we're going to see this list of _CFRunLoopDoObservers, _CFRunLoopDoSource1, _CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE_. When I'm doing profiling, I like to be at this spot , because you can drill down in these guys and kind of quickly get to what your biggest bottlenecks are in your app.

I like to be at this spot 我喜欢呆在这个地方

你可以在 CFRunLoop 这里深入研究,并快速找到你 App 最大的瓶颈

Now, if you were messing around 瞎折腾 a moment ago and had chosen to do the Hide System Libraries, you might've gotten a call tree that looks like this, right? And, like I said, it does only show your code which can be nice, but in this case, it shows our app didFinishLaunching method as being the biggest bottleneck in our app. Looks like allocating the CatFeedViewController was the biggest thing. And in reality, this is not the most work that was done in our app. So this is an example of when this can be kind of misleading, so let's go ahead and turn that off.

这展示了一个例子:Hide System Libraries 会误导我们

Now to close this tree , we can go ahead and hold Option and click Main Thread again, and that'll close everything and then hold Option and click it again and it'll open us up back to our CFRunLoop stuff.

So let's go ahead and drill down into our top result in the _CFRunLoopRun section.

And as we do, we see a lot of CoreAnimation stuff, which is easy when you see this kind of stuff. It's easy to think: "ah, I don't really know what I would, I can't fix core animation, right?".

So we see :

- CA::Transaction::observer_callback()
	- CA::Transaction::commit()
		- CA::Context::commit_transaction()
			- CA::Layer::prepare_commit()
				- CA::Render::prepare_image()
					- CA::Render::copy_image()
						- CA::Render::create_image()
			...
			...

So looks like we have a lot of CoreAnimation and image stuff. As we drill down further, we have Apple JPEG. And you may not have even had a drill down this far to see this.

If you go back up here and just click on the Main Thread, over here we have this nice Heaviest Stack Trace, which if you scroll down, you'll see, the same stuff we were just looking at, CA::Transaction, transaction, render render, and then down here we have the Apple JPEG decode, and all this stuff.

And then finally , if you came up here and you saw that there is these strategies. You may have clicked on this strategy and chosen them to get a really good look at what's going on in the Main Thread.

So, just for an example of that, let's zoom in a little bit. And here see our really big chunks of work, right? We have this , this one here, so if you wanna zoom in on this, you can click and drag to filter your call tree just to be this area.

And then, we can do our old trusty smart disclosure, come down here.

And this gives us the same thing, our biggest chunk of work in this big chunk of work is the Apple JPEG image decoding stuff.

So this is a case where it's helpful to know a lot about what's going on under the hood.

If you come over here to this big Stack Trace, one thing we find is this CA::Layer::prepare_commit() call. So there's a WWDC video that I'll link to in the challenge document, but if you've seen this, the presenter gose through and explains CoreAnimation CATransactions, and then he explains four phases of a transaction. And this prepare phase, he gose and explicitly says, this is when ImageViews will decode their JPEGs, in order to be able to display them. So if you are familiar with that, you may have noticed this prepare here, and realized that.

CATransactions 有四个阶段,CA::Layer::prepare_commit() 是 ImageView 将要解码它的 JPEGs 的阶段。

Ok, well, this is just all my compressed JPEGs being uncompressed so they can be displayed. The thing about this step of the CATransaction is that doesn't necessarily have to happen on the main thread. You can do this decompression asynchronously, and that will free up the main thread to handle scrolling and stuff like that. So that's how we'll go ahead and fix this big bottleneck.

图片解码可以在子线程中做,这样可以解放主线程,让它处理像"滚动"这样的事情。

To do so, let's go back over to the Catstagram app, if we come into the CatPhotoTableViewCell, we see we have two ImageViews, the userAvatarImageView and the photoImageView.

So instead of using an ImageView, we're going to go ahead and make our own AsynchronousImageView. To do that, we can hit Command+N just open our new file, we'll do a new Cocoa Touch Class. It will be a UIView, it'll be named AsyncImageView and will write in Swift.

All right, the first thing we're going to do is make this ImageView class able to accept an image. So delete that. First, we'll make a private image variable. That way we can write a public interface with custom getters and setters. In the setter, we'll do the normal thing where we set our image to be the new value.

Then we'll asynchronously decode that image, and apply it to our view. First, we'll set our layers contents to be nil. Then we'll guard against an nil image. And then, we'll kick off our Async job. We'll dispatch this to a global queue with a pretty high quality of service, let's say "userInitiated". And so, before we can use method, we actually have to write it, so let's comment this out for now.

Then come down here ,and this is where we'll actually write our method to decode images and return them. First, we'll grab out CGImage from the past in UIImage. Next, we'll create a graphics context, which we'll render this into.Alright, now that we have our context and color space set up, all we have to do is render it into the context. To do that, first we call the draw method on the context, and then after that we can just call the makeImage() method. Now, if we successfully got an image, we return it, otherwise, we return nil. I went up here we see we accidentally forgot to return something , so just return nil in this case . And, here we forgot some parentheses . Alright, now that we're actually creating a decoded image and returning it., we can uncomment this.

And the next step, is to jump back to the main thread, after we've created our image on a background thread, we go back to main and we can assign it to our layer's contents .

Alright, now that we have our new class, let's go back to our cell and use it.

So here we can just swap this out for an AsyncImageView. And since(由于) it accepts an image in the same way in ImageView does, that should be all we need to do .

Let's go ahead and run instruments again, take another measurement and see if this helped with our big bottleneck .

Alright, so now if we stop our trace again, and go back into instruments mode.

We can do we did before, do an Option click on the mian thread. Drill down on our top results. and here we see we still have a CA::Transaction::commit(), but after drill down we see that it's just laying out sublayers. And if we drill down far enough, we see that our biggest bottleneck right now is just our cellForAt:IndexPath. So this is a lot better than having all that image decoding happening on the main thread.

总结

If you're wondering why doing the same amount of work, but with some of in the background, really sped things up, it's time to take a step back.

A crucial element to improving performance on iOS apps is focusing on the main thread.

提升 iOS App 性能的关键因素是专注主线程

The main thread is the thread that's in charge of two things:

First, it takes care of user input. Taps can be cached by backward D and can tolerate a main threads stalls of something like 50 to 100 milliseconds before event will be lost. Continuous gestures like swipes are much more sensitive, and can only tolerate 5 to 10 milliseconds.

Then, the main thread is also in charge of displaying results to the user on screen. Just as touches can be dropped when the main thread is unresponsive , the drawing of individual frames can be lost as well. The main thread draws at a constant interval, where each time to draw is called the vsync. If the main thread hadn't finished its work in time, it'll skip drawing that frame. Your main goal when profiling is to get your app running at 60 frames per seconds. This ensures a smooth pleasant experience for your users. In fact, Apple has repeatedly mentioned that it should be a top priority for app developers to achieve. 60 frames per second equates to something like sixteen point 16.67 milliseconds for each frame to do its work. Any big chunk of work that takes more than 16 milliseconds will almost guarantee an opportunity to draw frame is missed aka a frame drop .

stalls of something like 50 to 100 millisecond 停滞大约50到100毫秒

sensitive 敏感的

can be dropped 可以删除/遗弃

individual 个别的;独特的;个人的

That's it for out introduction to time profiler, and as always, we like to leave you with a challenge. This time around, you'll make another improvement to our new AsynchronousImageView by adding layer of caching , so that it's not doing so much redundant work .

As always, I hope you enjoyed thie video. Thanks for watching and we'll see you next time .