今天收到一个BUG,扫码枪在某款设备上输入异常,大部分时候只能输入条码前面的几个字符。比如条码SF1008778973288,使用扫码枪只能输入SF10或SF18。
扫码枪是啥?
扫码枪是一种键盘类型的输入设备,通过光学原理读取条码上的图像信息,然后解码得到数字信号,再把数字信号输入设备。扫码枪不同于键盘的一大特点就是在一瞬间输入大量的字符。
这是一个什么问题
首先要确定扫码枪是一个什么类型的设备,Android/Linux系统把扫码枪识别为一个什么类型的设备。通过下面的命令查看设备信息,扫码枪是一个键盘类型的输入设备。
xxx:/ # cat /proc/bus/input/devices
I: Bus=0003 Vendor=0808 Product=0606 Version=0110
N: Name="BLM103Virtual COM Port "
P: Phys=usb-XXXX-1/input0
S: Sysfs=/devices/XXXX-ehci-2/usb1/1-1/1-1:1.0/0003:0808:0606.0001/input/input1
U: Uniq=
H: Handlers=kbd event1
B: PROP=0
B: EV=120013
B: KEY=e080ffdf 1cfffff ffffffff fffffffe
B: MSC=10
B: LED=1f
Android系统的输入流程涉及:driver -> IMS -> (PhoneWindowManager) -> APP,一个清晰的分层结构。首先判断问题发生在那一层?
通过getevent查看driver是否抛按键事件上来了
xxxx# getevent
add device 3: /dev/input/event1
name: "BLM103Virtual COM Port "
/dev/input/event1: 0004 0004 00070023
/dev/input/event1: 0001 0007 00000001
/dev/input/event1: 0000 0000 00000000
/dev/input/event1: 0004 0004 00070023
/dev/input/event1: 0001 0007 00000000
/dev/input/event1: 0000 0000 00000000
/dev/input/event1: 0004 0004 00070026
/dev/input/event1: 0001 000a 00000001
/dev/input/event1: 0000 0000 00000000
... ...
/dev/input/event1: 0004 0004 00070023
/dev/input/event1: 0001 0007 00000001
/dev/input/event1: 0000 0000 00000000
/dev/input/event1: 0004 0004 00070023
/dev/input/event1: 0001 0007 00000000
/dev/input/event1: 0000 0000 00000000
/dev/input/event1: 0004 0004 00070028
/dev/input/event1: 0001 001c 00000001
/dev/input/event1: 0000 0000 00000000
/dev/input/event1: 0004 0004 00070028
/dev/input/event1: 0001 001c 00000000
/dev/input/event1: 0000 0000 00000000
driver有抛按键事件上来,并且完整无异常,首先(基本)排除driver。
然后看PhoneWindowManager是否收到正常的按键。通过放开interceptKeyBeforeDispatching()里面的原生打印信息
public long interceptKeyBeforeDispatching(WindowState win, KeyEvent event, int policyFlags) {
if (DEBUG_INPUT) {
Log.d(TAG, "interceptKeyTi keyCode=" + keyCode + " down=" + down + " repeatCount="
+ repeatCount + " keyguardOn=" + keyguardOn + " mHomePressed=" + mHomePressed
+ " canceled=" + canceled);
}
... ...
}
01-01 09:06:42.323 1858 2013 D WindowManager: interceptKeyTi keyCode=59 down=true repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.324 1858 2013 D WindowManager: interceptKeyTi keyCode=47 down=true repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.332 1858 2013 D WindowManager: interceptKeyTi keyCode=47 down=false repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.341 1858 2013 D WindowManager: interceptKeyTi keyCode=34 down=true repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.349 1858 2013 D WindowManager: interceptKeyTi keyCode=34 down=false repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.363 1858 2013 D WindowManager: interceptKeyTi keyCode=59 down=false repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.373 1858 2013 D WindowManager: interceptKeyTi keyCode=8 down=true repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.385 1858 2013 D WindowManager: interceptKeyTi keyCode=8 down=false repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.566 1858 2013 D WindowManager: interceptKeyTi keyCode=15 down=true repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.576 1858 2013 D WindowManager: interceptKeyTi keyCode=15 down=false repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.583 1858 2013 D WindowManager: interceptKeyTi keyCode=66 down=true repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.589 1858 2013 D WindowManager: interceptKeyTi keyCode=66 down=false repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
通过日志,可以发现这里只收到了部分按键,也就是说inputflinger(IMS)出了问题。
小节一下:扫码枪是一个键盘类型的输入设备,driver会一次性抛很多按键事件给IMS,inputflinger在处理按键事件的过程中发生了异常,导致按键丢失了,表现出来就是应用里面输入不完整的条码字符。
调试inputflinger
首先放开inputflinger的原生打印信息,看能不能有啥发现
Detected input event buffer overrun for device BLM103Virtual COM Port
inputflinger果然遇到了异常,字面意思是:这个设备的输入事件buffer overrun。看一下代码发生的位置:
frameworks/native/services/inputflinger/InputReader.cpp
void InputDevice::process(const RawEvent* rawEvents, size_t count) {
size_t numMappers = mMappers.size();
for (const RawEvent* rawEvent = rawEvents; count != 0; rawEvent++) {
if (mDropUntilNextSync) {
... ...
} else if (rawEvent->type == EV_SYN && rawEvent->code == SYN_DROPPED) {
ALOGI("Detected input event buffer overrun for device %s.", getName().string());
mDropUntilNextSync = true;
reset(rawEvent->when);
} else {
for (size_t i = 0; i < numMappers; i++) {
InputMapper* mapper = mMappers[i];
mapper->process(rawEvent);
}
}
--count;
}
}
首先要知道一个知识点:type==EV_SYN code==SYN_DROPPED类型的按键事件是driver写入buffer的,这不是一个正常的按键,表示buffer写满了,出现了覆盖的情况,以此来通知上面一层需要特殊处理。
每个输入设备都有一个固定大小的环形buffer,driver负责向里面写入输入事件,inputreader线程负责从buffer读取事件,当读取的速度小于写入的速度时,在输入事件很多的情况下,就可能会出现环形buffer被写满覆盖的情况,这个时候就发生了输入事件丢失。也就是本BUG中,看到的通过扫码枪扫码只能输入部分字符。
当然也可以通过增大设备的环形buffer来解决问题。实际测试有一点效果,可能是buffer调整的不够大。因为其它平台的设备是好的,所以开始怀疑或许这个平台修改太多又不注重质量,把framework给玩坏了。
IMS的inputflinger是没有修改过的,相关的修改就是inputpolicy,特别是PhoneWindowManager的interceptKeyBeforeQueueing/interceptKeyBeforeDispatching两个方法,多人多手修改,简直已经不忍直视。
inputpolicy的修改会影响到inputreader的就是PhoneWindowManager.interceptKeyBeforeQueueing(),如果这个方法中增加了耗时的操作,就会导致inputreader线程预处理事件变慢。
inputpolicy
frameworks/base/services/core/java/com/android/server/policy/PhoneWindowManager.java
public int interceptKeyBeforeQueueing(KeyEvent event, int policyFlags) {
int keyCode = event.getKeyCode();
if ((allowBurnModeBlockKeys(event, down) == true)) {
Log.d(TAG, "XXX=====>>>interceptKeyBeforeQueueing,BurnModeBlockKeys");
return 0;
}
... ...
}
private boolean allowBurnModeBlockKeys(KeyEvent event, boolean isKeydown) {
int keyCode = event.getKeyCode();
if(CustomerManager.getInstance() != null && (CustomerManager.getInstance().isSpecialCustomer("CUSTOMER_A")
|| CustomerManager.getInstance().isSpecialCustomer("CUSTOMER_B")
|| CustomerManager.getInstance().isSpecialCustomer("CUSTOMER_C")
|| CustomerManager.getInstance().isSpecialCustomer("CUSTOMER_D")
|| CustomerManager.getInstance().isSpecialCustomer("CUSTOMER_E")) && isBurnModeOnOff()){
if(keyCode != MKeyEvent.XXXX_HOT_KEY
&& keyCode != KeyEvent.KEYCODE_BACK
&& keyCode != KeyEvent.XXXX_BURN_MODE
&& keyCode != KeyEvent.KEYCODE_TV_INPUT){
return true;
}
}
return false;
}
public boolean isBurnModeOnOff(){
int mode = 0;
Cursor cursor = mContext.getContentResolver().query(Uri.parse("content://xxxx.tv.factory/factoryextern"),
null, null, null, null);
if (null != cursor && 0 != cursor.getCount() && cursor.moveToFirst()) {
mode = cursor.getInt(cursor.getColumnIndex("m_bAgingMode"));
Log.w(TAG, "Burn Mode : " + mode);
cursor.close();
}
return (1 == mode) ? true : false;
}
果然,问题就发生在这里。客制化增加了allowBurnModeBlockKeys(),每个输入事件都要跑这个方法。这个方法里面有大量的文件操作和数据库操作。实际测试,数据库操作耗时占大头。
至此,找到了BUG产生的原因,也定位到了具体的代码段。
调试过程中的一些数据
native层调试interceptKeyBeforeQueueing()方法执行耗时
void InputDispatcher::notifyKey(const NotifyKeyArgs* args) {
... ...
int start = clock();
mPolicy->interceptKeyBeforeQueueing(&event, /*byref*/ policyFlags);
ALOGD("intercept key before queueing execute time:%d", clock() - start);
... ...
if (needWake) {
mLooper->wake();
}
}
interceptKeyBeforeQueueing()方法耗时
理想状态0.06ms-4ms,平均值在0.1ms
01-01 08:34:40.395 1858 2014 D InputDispatcher: intercept key before queueing execute time:68
01-01 08:34:40.397 1858 2014 D InputDispatcher: intercept key before queueing execute time:141
01-01 08:34:40.926 1858 2014 D InputDispatcher: intercept key before queueing execute time:145
01-01 08:34:40.932 1858 2014 D InputDispatcher: intercept key before queueing execute time:4195
01-01 08:34:40.934 1858 2014 D InputDispatcher: intercept key before queueing execute time:135
01-01 08:34:40.934 1858 2014 D InputDispatcher: intercept key before queueing execute time:68
01-01 08:34:40.935 1858 2014 D InputDispatcher: intercept key before queueing execute time:154
01-01 08:34:40.936 1858 2014 D InputDispatcher: intercept key before queueing execute time:108
01-01 08:34:40.937 1858 2014 D InputDispatcher: intercept key before queueing execute time:122
01-01 08:34:40.939 1858 2014 D InputDispatcher: intercept key before queueing execute time:130
问题状态3ms-32ms,平均值5ms(也可能是10ms)
01-01 08:20:54.876 1878 1987 D InputDispatcher: intercept key before queueing execute time:3828
01-01 08:20:55.107 1878 1987 D InputDispatcher: intercept key before queueing execute time:5728
01-01 08:20:55.519 1878 1987 D InputDispatcher: intercept key before queueing execute time:3490
01-01 08:20:55.748 1878 1987 D InputDispatcher: intercept key before queueing execute time:4817
01-01 08:20:56.185 1878 1987 D InputDispatcher: intercept key before queueing execute time:3430
01-01 08:20:56.408 1878 1987 D InputDispatcher: intercept key before queueing execute time:3552
01-01 08:28:38.316 1878 1987 D InputDispatcher: intercept key before queueing execute time:20926
01-01 08:28:38.345 1878 1987 D InputDispatcher: intercept key before queueing execute time:31657
01-01 08:28:38.372 1878 1987 D InputDispatcher: intercept key before queueing execute time:25512
01-01 08:28:38.396 1878 1987 D InputDispatcher: intercept key before queueing execute time:31734
01-01 08:28:38.419 1878 1987 D InputDispatcher: intercept key before queueing execute time:15861
01-01 08:28:41.329 1878 1987 D InputDispatcher: intercept key before queueing execute time:11205
01-01 08:28:41.354 1878 1987 D InputDispatcher: intercept key before queueing execute time:20078
01-01 08:28:41.381 1878 1987 D InputDispatcher: intercept key before queueing execute time:30350
01-01 08:28:41.405 1878 1987 D InputDispatcher: intercept key before queueing execute time:32071
01-01 08:28:41.428 1878 1987 D InputDispatcher: intercept key before queueing execute time:28276
后记
片段逻辑大部分人都能看得懂,也可以修改。
有时候,我们(管理者)看到的只是表面,他实现了这个需求/功能。
论一个管理者/技术负责人的格局。是否只能看到表面?是否能看的更深?是否能看的全面?
美国西海岸一只蝴蝶扇动了下翅膀,论影响有多大?