A problem caused by high concurrency

In a project that pays users a bonus after they complete certain tasks, I used the Redis command SET key val EX seconds NX to implement a mutex lock, so that requests arriving at the same time could not claim the same bonus more than once.
Under high concurrency, however, some bonuses were still paid out repeatedly. Why?
The code looked roughly like this. Here is the Lock function, which implements the mutex with Redis:

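// Lock tries to acquire a mutex by running SET key 1 EX ttl NX.
// It returns true only if the key did not exist and was set by this call.
func Lock(key string, ttl int) (bool, error) {
    // a real service should reuse one shared client instead of creating one per call
    rdb := redis.NewClient(&redis.Options{})
    err := rdb.Do("SET", key, 1, "ex", ttl, "nx").Err()
    if err == redis.Nil {
        // nil reply: the key already exists, so another caller holds the lock
        return false, nil
    }
    if err != nil {
        return false, err
    }
    return true, nil
}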

And here is the main logic:

islocked, err := Lock("test", 3)
if !islocked || err != nil {
    failJson(...)
    return 
}

rdb := redis.NewClient(...)
val := rdb.GetBit(...).Val() 
// 1 means the task has been completed 
if val == 1 {
    successJson("done")
    return
}
pipeline := rdb.Pipeline()
pipeline.SetBit(seq, 1)
pipeline.ExpireAt(key...)
_, err = pipeline.Exec() // err is already declared above, so assign with = rather than :=
if err != nil {
    failJson(...)
    return
}
// send bonus to user 
SendBonus()
...

To reproduce the problem, I added logs around the critical parts of the logic to see what happened under high concurrency. The cause turned out to be Redis command latency: under heavy load, commands became very slow, so the first goroutine to acquire the lock was blocked in the GetBit() call. After 3 seconds the lock expired, and another of the many waiting goroutines acquired it and blocked in GetBit() as well. With some probability, both goroutines then read 0 from GetBit().Val(), and both called SendBonus(), so the bonus was paid twice.
Finding the cause is the first and most important step in solving a problem. After I changed the code as follows, the problem was resolved.

// increase the TTL of the lock key to 10 seconds
islocked, err := Lock("test", 10)
if !islocked || err != nil {
    failJson(...)
    return 
}

rdb := redis.NewClient(...)
val := rdb.GetBit(...).Val() 
// 1 means the task has been completed 
if val == 1 {
    successJson("done")
    return
}

pipeline := rdb.Pipeline()
pipeline.SetBit(seq, 1)
pipeline.ExpireAt(key...)
cmds, err := pipeline.Exec()
if err != nil {
    failJson(...)
    return
}
// SETBIT returns the previous value of the bit; check it so that only one goroutine can call SendBonus()
if len(cmds) > 0 {
    // cmds[0] is the SetBit command queued first in the pipeline
    oldVal := cmds[0].(*redis.IntCmd).Val()
    // 1 means the task has already been completed
    if oldVal == 1 {
        successJson("done")
        return
    }
}

// send bonus to user 
SendBonus(body)
...
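The key point of the fix is that SETBIT writes the new bit and returns the old one in a single atomic step, so the check no longer depends on the lock still being held. A minimal sketch of that idea follows, using a hypothetical in-memory `bitmap` type in place of Redis; the offset 7 and the count of 100 callers are arbitrary assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// bitmap is a hypothetical in-memory stand-in for a Redis bitmap.
type bitmap struct {
	mu   sync.Mutex
	bits map[int64]int
}

// setBit stores val at offset and returns the previous value in one atomic
// step, mirroring the reply of the Redis SETBIT command.
func (b *bitmap) setBit(offset int64, val int) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	old := b.bits[offset]
	b.bits[offset] = val
	return old
}

// countWinners runs n concurrent callers against the same offset and returns
// how many of them saw the old value 0, i.e. how many would send the bonus.
func countWinners(n int) int {
	b := &bitmap{bits: make(map[int64]int)}
	var wg sync.WaitGroup
	var mu sync.Mutex
	winners := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if b.setBit(7, 1) == 0 {
				mu.Lock()
				winners++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return winners
}

func main() {
	fmt.Println("callers that would send the bonus:", countWinners(100))
}
```

However many callers race, exactly one of them observes the old value 0, and that caller is the only one allowed to proceed to SendBonus().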

To improve reliability further, the consumer process also takes a global lock whose key is md5(body) and whose expiration time is very long, so that a given body is consumed only once even when many messages with the same body have been sent to the queue.
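The consumer-side deduplication could look roughly like the sketch below. It is an illustration under assumptions, not the project's actual code: the `consume:` key prefix, the `seen` type and the sample message bodies are hypothetical, and the in-memory set stands in for a Redis SET ... EX ... NX with a long expiration.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"sync"
)

// dedupKey derives the lock key from the message body.
// The "consume:" prefix is a hypothetical naming choice.
func dedupKey(body []byte) string {
	sum := md5.Sum(body)
	return "consume:" + hex.EncodeToString(sum[:])
}

// seen is a hypothetical in-memory stand-in for keys set with NX in Redis.
type seen struct {
	mu   sync.Mutex
	keys map[string]struct{}
}

// firstTime reports whether key has not been recorded before, and records it.
func (s *seen) firstTime(key string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.keys[key]; ok {
		return false
	}
	s.keys[key] = struct{}{}
	return true
}

func main() {
	s := &seen{keys: make(map[string]struct{})}
	// sample bodies are made up for illustration; the second is a duplicate
	bodies := []string{`{"uid":1,"task":"a"}`, `{"uid":1,"task":"a"}`, `{"uid":2,"task":"a"}`}
	for _, body := range bodies {
		if s.firstTime(dedupKey([]byte(body))) {
			fmt.Println("consume:", body)
		} else {
			fmt.Println("skip duplicate:", body)
		}
	}
}
```

Hashing the body keeps the key short and fixed-length regardless of message size; two messages are treated as duplicates only when their bodies are byte-for-byte identical.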