[解BUG系列] 从扫码枪输入异常看Android Input系统

2,378 阅读10分钟

今天收到一个BUG,扫码枪在某款设备上输入异常,大部分时候只能输入条码前面的几个字符。比如条码SF1008778973288,使用扫码枪只能输入SF10或SF18。

扫码枪是啥?

扫码枪是一种键盘类型的输入设备,通过光学原理读取条码上的图像信息,然后解码得到数字信号,再把数字信号输入设备。扫码枪不同于键盘的一大特点就是在一瞬间输入大量的字符。

这是一个什么问题

首先要确定扫码枪是一个什么类型的设备,Android/Linux系统把扫码枪识别为一个什么类型的设备。通过下面的命令查看设备信息,扫码枪是一个键盘类型的输入设备。

xxx:/ # cat /proc/bus/input/devices                                        
I: Bus=0003 Vendor=0808 Product=0606 Version=0110
N: Name="BLM103Virtual COM Port  "
P: Phys=usb-XXXX-1/input0
S: Sysfs=/devices/XXXX-ehci-2/usb1/1-1/1-1:1.0/0003:0808:0606.0001/input/input1
U: Uniq=
H: Handlers=kbd event1 
B: PROP=0
B: EV=120013
B: KEY=e080ffdf 1cfffff ffffffff fffffffe
B: MSC=10
B: LED=1f

Android系统的输入流程涉及:driver -> IMS -> (PhoneWindowManager) -> APP,一个清晰的分层结构。首先判断问题发生在那一层?

通过getevent查看driver是否抛按键事件上来了

xxxx# getevent
add device 3: /dev/input/event1
  name:     "BLM103Virtual COM Port  "
/dev/input/event1: 0004 0004 00070023
/dev/input/event1: 0001 0007 00000001
/dev/input/event1: 0000 0000 00000000
/dev/input/event1: 0004 0004 00070023
/dev/input/event1: 0001 0007 00000000
/dev/input/event1: 0000 0000 00000000
/dev/input/event1: 0004 0004 00070026
/dev/input/event1: 0001 000a 00000001
/dev/input/event1: 0000 0000 00000000
... ...
/dev/input/event1: 0004 0004 00070023
/dev/input/event1: 0001 0007 00000001
/dev/input/event1: 0000 0000 00000000
/dev/input/event1: 0004 0004 00070023
/dev/input/event1: 0001 0007 00000000
/dev/input/event1: 0000 0000 00000000
/dev/input/event1: 0004 0004 00070028
/dev/input/event1: 0001 001c 00000001
/dev/input/event1: 0000 0000 00000000
/dev/input/event1: 0004 0004 00070028
/dev/input/event1: 0001 001c 00000000
/dev/input/event1: 0000 0000 00000000

driver有抛按键事件上来,并且完整无异常,首先(基本)排除driver。

然后看PhoneWindowManager是否收到正常的按键。通过放开interceptKeyBeforeDispatching()里面的原生打印信息

public long interceptKeyBeforeDispatching(WindowState win, KeyEvent event, int policyFlags) {

    if (DEBUG_INPUT) {
        Log.d(TAG, "interceptKeyTi keyCode=" + keyCode + " down=" + down + " repeatCount="
                + repeatCount + " keyguardOn=" + keyguardOn + " mHomePressed=" + mHomePressed
                + " canceled=" + canceled);
    }
    ... ...
}
01-01 09:06:42.323  1858  2013 D WindowManager: interceptKeyTi keyCode=59 down=true repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.324  1858  2013 D WindowManager: interceptKeyTi keyCode=47 down=true repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.332  1858  2013 D WindowManager: interceptKeyTi keyCode=47 down=false repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.341  1858  2013 D WindowManager: interceptKeyTi keyCode=34 down=true repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.349  1858  2013 D WindowManager: interceptKeyTi keyCode=34 down=false repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.363  1858  2013 D WindowManager: interceptKeyTi keyCode=59 down=false repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.373  1858  2013 D WindowManager: interceptKeyTi keyCode=8 down=true repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.385  1858  2013 D WindowManager: interceptKeyTi keyCode=8 down=false repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.566  1858  2013 D WindowManager: interceptKeyTi keyCode=15 down=true repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.576  1858  2013 D WindowManager: interceptKeyTi keyCode=15 down=false repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.583  1858  2013 D WindowManager: interceptKeyTi keyCode=66 down=true repeatCount=0 keyguardOn=false mHomePressed=false canceled=false
01-01 09:06:42.589  1858  2013 D WindowManager: interceptKeyTi keyCode=66 down=false repeatCount=0 keyguardOn=false mHomePressed=false canceled=false

通过日志,可以发现这里只收到了部分按键,也就是说inputflinger(IMS)出了问题。

小节一下:扫码枪是一个键盘类型的输入设备,driver会一次性抛很多按键事件给IMS,inputflinger在处理按键事件的过程中发生了异常,导致按键丢失了,表现出来就是应用里面输入不完整的条码字符。

调试inputflinger

首先放开inputflinger的原生打印信息,看能不能有啥发现

Detected input event buffer overrun for device BLM103Virtual COM Port  

inputflinger果然遇到了异常,字面意思是:这个设备的输入事件buffer overrun。看一下代码发生的位置:

frameworks/native/services/inputflinger/InputReader.cpp

void InputDevice::process(const RawEvent* rawEvents, size_t count) {
    size_t numMappers = mMappers.size();
    for (const RawEvent* rawEvent = rawEvents; count != 0; rawEvent++) {
        if (mDropUntilNextSync) {
            ... ...
        } else if (rawEvent->type == EV_SYN && rawEvent->code == SYN_DROPPED) {
            ALOGI("Detected input event buffer overrun for device %s.", getName().string());
            mDropUntilNextSync = true;
            reset(rawEvent->when);
        } else {
            for (size_t i = 0; i < numMappers; i++) {
                InputMapper* mapper = mMappers[i];
                mapper->process(rawEvent);
            }
        }
        --count;
    }
}

首先要知道一个知识点:type==EV_SYN code==SYN_DROPPED类型的按键事件是driver写入buffer的,这不是一个正常的按键,表示buffer写满了,出现了覆盖的情况,以此来通知上面一层需要特殊处理。

每个输入设备都有一个固定大小的环形buffer,driver负责向里面写入输入事件,inputreader线程负责从buffer读取事件,当读取的速度小于写入的速度时,在输入事件很多的情况下,就可能会出现环形buffer被写满覆盖的情况,这个时候就发生了输入事件丢失。也就是本BUG中,看到的通过扫码枪扫码只能输入部分字符。

当然也可以通过增大设备的环形buffer来解决问题。实际测试有一点效果,可能是buffer调整的不够大。因为其它平台的设备是好的,所以开始怀疑或许这个平台修改太多又不注重质量,把framework给玩坏了。

IMS的inputflinger是没有修改过的,相关的修改就是inputpolicy,特别是PhoneWindowManager的interceptKeyBeforeQueueing/interceptKeyBeforeDispatching两个方法,多人多手修改,简直已经不忍直视。

inputpolicy的修改会影响到inputreader的就是PhoneWindowManager.interceptKeyBeforeQueueing(),如果这个方法中增加了耗时的操作,就会导致inputreader线程预处理事件变慢。

inputpolicy

frameworks/base/services/core/java/com/android/server/policy/PhoneWindowManager.java

public int interceptKeyBeforeQueueing(KeyEvent event, int policyFlags) {
    int keyCode = event.getKeyCode();
    if ((allowBurnModeBlockKeys(event, down) == true)) {
        Log.d(TAG, "XXX=====>>>interceptKeyBeforeQueueing,BurnModeBlockKeys");
        return 0;
    }
    ... ...
}

private boolean allowBurnModeBlockKeys(KeyEvent event, boolean isKeydown) {
    int keyCode = event.getKeyCode();
    if(CustomerManager.getInstance() != null && (CustomerManager.getInstance().isSpecialCustomer("CUSTOMER_A")
        || CustomerManager.getInstance().isSpecialCustomer("CUSTOMER_B")
        || CustomerManager.getInstance().isSpecialCustomer("CUSTOMER_C")
        || CustomerManager.getInstance().isSpecialCustomer("CUSTOMER_D")
        || CustomerManager.getInstance().isSpecialCustomer("CUSTOMER_E")) && isBurnModeOnOff()){
        if(keyCode != MKeyEvent.XXXX_HOT_KEY
            && keyCode != KeyEvent.KEYCODE_BACK
                && keyCode != KeyEvent.XXXX_BURN_MODE
                && keyCode != KeyEvent.KEYCODE_TV_INPUT){
            return true;
        }
    }
    return false;
}

public boolean isBurnModeOnOff(){
    int mode = 0;
    Cursor cursor = mContext.getContentResolver().query(Uri.parse("content://xxxx.tv.factory/factoryextern"),
        null, null, null, null);

    if (null != cursor && 0 != cursor.getCount() && cursor.moveToFirst()) {
    mode = cursor.getInt(cursor.getColumnIndex("m_bAgingMode"));
    Log.w(TAG, "Burn Mode : " + mode);
    cursor.close();
    }
    return (1 == mode) ? true : false;
}

果然,问题就发生在这里。客制化增加了allowBurnModeBlockKeys(),每个输入事件都要跑这个方法。这个方法里面有大量的文件操作和数据库操作。实际测试,数据库操作耗时占大头。

至此,找到了BUG产生的原因,也定位到了具体的代码段。

调试过程中的一些数据

native层调试interceptKeyBeforeQueueing()方法执行耗时

void InputDispatcher::notifyKey(const NotifyKeyArgs* args) {
    ... ...
    int start = clock();
    mPolicy->interceptKeyBeforeQueueing(&event, /*byref*/ policyFlags);
    ALOGD("intercept key before queueing execute time:%d", clock() - start);
    ... ...
    if (needWake) {
        mLooper->wake();
    }
}

interceptKeyBeforeQueueing()方法耗时

理想状态0.06ms-4ms,平均值在0.1ms

01-01 08:34:40.395  1858  2014 D InputDispatcher: intercept key before queueing execute time:68
01-01 08:34:40.397  1858  2014 D InputDispatcher: intercept key before queueing execute time:141
01-01 08:34:40.926  1858  2014 D InputDispatcher: intercept key before queueing execute time:145
01-01 08:34:40.932  1858  2014 D InputDispatcher: intercept key before queueing execute time:4195
01-01 08:34:40.934  1858  2014 D InputDispatcher: intercept key before queueing execute time:135
01-01 08:34:40.934  1858  2014 D InputDispatcher: intercept key before queueing execute time:68
01-01 08:34:40.935  1858  2014 D InputDispatcher: intercept key before queueing execute time:154
01-01 08:34:40.936  1858  2014 D InputDispatcher: intercept key before queueing execute time:108
01-01 08:34:40.937  1858  2014 D InputDispatcher: intercept key before queueing execute time:122
01-01 08:34:40.939  1858  2014 D InputDispatcher: intercept key before queueing execute time:130

问题状态3ms-32ms,平均值5ms(也可能是10ms)

01-01 08:20:54.876  1878  1987 D InputDispatcher: intercept key before queueing execute time:3828
01-01 08:20:55.107  1878  1987 D InputDispatcher: intercept key before queueing execute time:5728
01-01 08:20:55.519  1878  1987 D InputDispatcher: intercept key before queueing execute time:3490
01-01 08:20:55.748  1878  1987 D InputDispatcher: intercept key before queueing execute time:4817
01-01 08:20:56.185  1878  1987 D InputDispatcher: intercept key before queueing execute time:3430
01-01 08:20:56.408  1878  1987 D InputDispatcher: intercept key before queueing execute time:3552
01-01 08:28:38.316  1878  1987 D InputDispatcher: intercept key before queueing execute time:20926
01-01 08:28:38.345  1878  1987 D InputDispatcher: intercept key before queueing execute time:31657
01-01 08:28:38.372  1878  1987 D InputDispatcher: intercept key before queueing execute time:25512
01-01 08:28:38.396  1878  1987 D InputDispatcher: intercept key before queueing execute time:31734
01-01 08:28:38.419  1878  1987 D InputDispatcher: intercept key before queueing execute time:15861
01-01 08:28:41.329  1878  1987 D InputDispatcher: intercept key before queueing execute time:11205
01-01 08:28:41.354  1878  1987 D InputDispatcher: intercept key before queueing execute time:20078
01-01 08:28:41.381  1878  1987 D InputDispatcher: intercept key before queueing execute time:30350
01-01 08:28:41.405  1878  1987 D InputDispatcher: intercept key before queueing execute time:32071
01-01 08:28:41.428  1878  1987 D InputDispatcher: intercept key before queueing execute time:28276

后记

片段逻辑大部分人都能看得懂,也可以修改。

有时候,我们(管理者)看到的只是表面,他实现了这个需求/功能。

论一个管理者/技术负责人的格局。是否只能看到表面?是否能看的更深?是否能看的全面?

美国西海岸一只蝴蝶扇动了下翅膀,论影响有多大?