Redis Distributed Locks


Redisson source code: github.com/redisson/re…

This article uses Redisson as its example.

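I. Introduction

1.1 Background

  • As business logic grows more complex, a single-machine service architecture can no longer meet the demands placed on internet services.
  • As a result, nearly all internet companies now run their services on a distributed architecture.

1.2 The problem

  • When several machines run the same piece of business logic at the same time, multiple threads may process the same data concurrently, operations run in the wrong order, and the data ends up incorrect.
    • On a single machine, this kind of concurrency problem can be solved with the JDK's synchronized keyword or the locks under the Lock interface.
  • In a distributed architecture, however, machines do not share memory, so the JDK's locks are no longer enough.
  • Redis distributed locks exist to solve exactly this kind of problem.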

II. Usage

The examples below use the Redisson framework.

A reference demo:

public void example_1() {
    RedissonClient redissonClient = Redisson.create();
    RLock lock = redissonClient.getLock("test_lock");

    // Acquire before the try block so that finally only runs once the lock is held.
    lock.lock(); // can be replaced with another locking method
    try {
        // do something
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        lock.unlock();
    }
}

2.1 With an expiration time

if (!lock.tryLock(2000, 1000, TimeUnit.MILLISECONDS)) {
    System.out.println("Try lock fail, lock has been locked");   
    return;                                                      
} 


// tryLock method
/**
* Tries to acquire the lock with defined <code>leaseTime</code>.
* Waits up to defined <code>waitTime</code> if necessary until the lock became available.
*
* Lock will be released automatically after defined <code>leaseTime</code> interval.
*
* @param waitTime the maximum time to acquire the lock
* @param leaseTime lease time
* @param unit time unit
* @return <code>true</code> if lock is successfully acquired,
*          otherwise <code>false</code> if lock is already set.
* @throws InterruptedException - if the thread is interrupted
*/
boolean tryLock(long waitTime, long leaseTime, TimeUnit unit) throws InterruptedException;
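  • Parameters
    • waitTime: the maximum time to wait for the lock
    • leaseTime: the time after which the lock expires automatically
    • unit: the time unit of waitTime and leaseTime
  • With an expiration time set, the lock still expires even if the business logic has not finished and has not released it. Another thread can then acquire the lock, so several threads may modify the same resource at once and produce incorrect data.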

2.2 Without an expiration time

if (!lock.tryLock()) {                                           
    System.out.println("Try lock fail, lock has been locked");   
    return;                                                      
}  
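  • Without an expiration time, the [lock renewal mechanism] keeps the lock alive until it is released explicitly
    • The lock renewal mechanism is covered later
  • If the business code never releases the lock explicitly, the [lock renewal mechanism] means that
    • other threads can never operate on the critical resource
    • Redisson keeps renewing the lock until the machine is destroyed, which pushes Redis OPS up, drives up CPU on the Redis server, and risks an outage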

III. How it works

3.1 Native implementation

3.1.1 An example
public void test() {
    String key = "myLock";
    // getId() returns a long, so convert it explicitly
    String threadId = String.valueOf(Thread.currentThread().getId());

    try {
        // the lock is released automatically after 1000 ms
        String result = redisHelper.set(key, threadId, "NX", "PX", 1000L);
        if (!"OK".equals(result)) {
            return;
        }
        
        // do something
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        redisHelper.del(key);
    }
}

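Note:

From Redis 2.6.0 onward, SETNX and EXPIRE can be combined atomically in a Lua script. Since 2.6.12, the SET command accepts the EX, PX, NX and XX options, so the expiration time can be set in the same command.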

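  • This code looks fine at first glance; it is roughly equivalent to lock.tryLock(0, 1000, TimeUnit.MILLISECONDS)
  • The problem
    • Suppose each thread needs 2000 ms to finish its work
    • The lock lease, as in the sample code, is 1000 ms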
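Time (ms)   A                    B                        C
0           runs SET NX PX       -                        -
1000        lock auto-released   -                        -
1001        -                    acquires lock            -
2000        del                  (??? my lock deleted?)   -
2001        -                    -                        acquires lock
3001        -                    del                      (??? my lock deleted?)
4001        -                    -                        del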
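The first rows of the timeline can be replayed without a Redis server. The sketch below is only an illustration: FakeRedis is a hypothetical in-memory stand-in with a manual clock, not part of any Redis client, and it models just the three commands the example uses.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for Redis, driven by a manual clock (ms). */
final class FakeRedis {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();
    long now = 0;

    /** Mimics SET key value NX PX ttl: succeeds only if the key is absent or expired. */
    boolean setNxPx(String key, String value, long ttlMillis) {
        if (get(key) != null) {
            return false;
        }
        values.put(key, value);
        expiresAt.put(key, now + ttlMillis);
        return true;
    }

    /** Mimics GET, treating an expired key as missing. */
    String get(String key) {
        Long deadline = expiresAt.get(key);
        if (deadline != null && now >= deadline) {
            del(key);
        }
        return values.get(key);
    }

    /** Mimics DEL: removes the key regardless of who set it. */
    void del(String key) {
        values.remove(key);
        expiresAt.remove(key);
    }
}

final class BlindDeleteTimeline {
    /** Replays the first rows of the table; returns the holder before and after A's DEL. */
    static String[] replay() {
        FakeRedis redis = new FakeRedis();
        redis.setNxPx("myLock", "A", 1000);   // t=0: A acquires, lease 1000 ms
        redis.now = 1001;
        redis.setNxPx("myLock", "B", 1000);   // t=1001: A's lease has expired, B acquires
        redis.now = 2000;
        String before = redis.get("myLock");  // B still holds the lock
        redis.del("myLock");                  // A finishes and deletes unconditionally
        String after = redis.get("myLock");   // B's lock is gone
        return new String[] {before, after};
    }

    public static void main(String[] args) {
        String[] holders = replay();
        System.out.println("before A's del: " + holders[0] + ", after: " + holders[1]);
    }
}
```

Running main shows B as the holder just before A's del and no holder afterwards, which is the "my lock deleted?" moment in the table.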
3.1.2 After adding an ownership check
public void test() {
    String key = "myLock";
    String threadId = String.valueOf(Thread.currentThread().getId());
    
    String result = redisHelper.set(key, threadId, "NX", "PX", 1000L);
    if (!"OK".equals(result)) {
        return;
    }
    try {
        // do something
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if(threadId.equals(redisHelper.get(key))){
            redisHelper.del(key);
        }
    }
}
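  • Done? No
  • The problem
    • In finally, (1) the equality check and (2) the del are not one atomic operation. There is a gap between them, so the wrong lock can still be deleted
    • For example: if the key expires after the get but before the del, and another thread acquires the lock in that gap, the del removes that thread's lock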
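The interleaving described above can be forced deterministically. This is a minimal sketch under stated assumptions: the class is a hypothetical in-memory model with a manual clock, and the thread names "A" and "B" are illustrative values, not real thread ids.

```java
import java.util.HashMap;
import java.util.Map;

final class CheckThenDeleteRace {
    // Minimal in-memory stand-in for Redis with a manual clock (ms).
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiry = new HashMap<>();
    private long now = 0;

    private String get(String key) {
        Long deadline = expiry.get(key);
        if (deadline != null && now >= deadline) {
            del(key); // expired keys behave as if they were never there
        }
        return values.get(key);
    }

    private boolean setNxPx(String key, String value, long ttlMillis) {
        if (get(key) != null) {
            return false;
        }
        values.put(key, value);
        expiry.put(key, now + ttlMillis);
        return true;
    }

    private void del(String key) {
        values.remove(key);
        expiry.remove(key);
    }

    /** Returns true if B's lock survives A's release; with GET-then-DEL it does not. */
    static boolean bKeepsLock() {
        CheckThenDeleteRace r = new CheckThenDeleteRace();
        r.setNxPx("myLock", "A", 1000);                  // t=0: A acquires
        r.now = 999;
        boolean aOwnsIt = "A".equals(r.get("myLock"));   // t=999: A's check passes
        r.now = 1000;                                    // A's key expires
        r.setNxPx("myLock", "B", 1000);                  // B acquires in the gap
        if (aOwnsIt) {
            r.del("myLock");                             // A acts on a stale check
        }
        return "B".equals(r.get("myLock"));
    }

    public static void main(String[] args) {
        System.out.println("B still holds the lock: " + bKeepsLock());
    }
}
```

The check and the delete are each correct on their own; the bug is only that time passes between them.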
3.1.3 Final version
  • Use a Lua script to make the unlock operation atomic
public void test() {
    String key = "myLock";
    String threadId = String.valueOf(Thread.currentThread().getId());

    String result = redisHelper.set(key, threadId, "NX", "PX", 1000L);
    if (!"OK".equals(result)) {
        return;
    }
    try {
        // do something
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        String luaScript = "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del',KEYS[1]) else return 0 end";
        redisHelper.eval(luaScript, Collections.singletonList(key), Collections.singletonList(threadId));
    }
}
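3.1.4 Summary
  • Summary
    • Without a component such as Redisson, we have to implement the distributed lock ourselves in the raw way shown above, which is fairly tedious. Writing our own utility class is of course an option.
  • Recommendation
    • Use the distributed lock that Redisson already provides
    • Redisson also provides automatic lock renewal, which makes the distributed lock more reliable and prevents the critical resource from being left unprotected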
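For readers who do want a utility class of their own, a rough shape might look like the sketch below. Everything here is an assumption made for illustration: LockStore, InMemoryLockStore and SimpleRedisLock are hypothetical names, and a real LockStore would issue SET ... NX PX and run the Lua compare-and-delete script from 3.1.3. The sketch also swaps the thread id for a random token, since thread ids are only unique within one JVM.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical abstraction over the two atomic Redis operations the lock needs. */
interface LockStore {
    /** Like SET key value NX PX ttl: true if the key was set. */
    boolean setIfAbsent(String key, String value, long ttlMillis);

    /** Like the Lua compare-and-delete script: true if the key held this value and was removed. */
    boolean deleteIfEquals(String key, String value);
}

/** In-memory LockStore for demonstration only; it ignores the TTL. */
final class InMemoryLockStore implements LockStore {
    private final ConcurrentMap<String, String> map = new ConcurrentHashMap<>();

    @Override
    public boolean setIfAbsent(String key, String value, long ttlMillis) {
        return map.putIfAbsent(key, value) == null;
    }

    @Override
    public boolean deleteIfEquals(String key, String value) {
        return map.remove(key, value); // atomic compare-and-remove
    }
}

final class SimpleRedisLock {
    private final LockStore store;
    private final String key;
    // Unique per lock instance; a bare thread id can collide across machines.
    private final String token = UUID.randomUUID().toString();

    SimpleRedisLock(LockStore store, String key) {
        this.store = store;
        this.key = key;
    }

    /** Tries once to take the lock with the given lease. */
    boolean tryLock(long leaseMillis) {
        return store.setIfAbsent(key, token, leaseMillis);
    }

    /** Releases the lock only if this instance still owns it. */
    boolean unlock() {
        return store.deleteIfEquals(key, token);
    }

    public static void main(String[] args) {
        LockStore store = new InMemoryLockStore();
        SimpleRedisLock a = new SimpleRedisLock(store, "myLock");
        SimpleRedisLock b = new SimpleRedisLock(store, "myLock");
        System.out.println("a locks: " + a.tryLock(1000));
        System.out.println("b locks: " + b.tryLock(1000));
        System.out.println("b unlocks a's lock: " + b.unlock());
        System.out.println("a unlocks: " + a.unlock());
    }
}
```

This sketch has no renewal and no reentrancy, which is part of why the article recommends Redisson instead.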

3.2 tryLock() & automatic renewal

3.2.1 Main source of tryLock()
public boolean tryLock() {
    return get(tryLockAsync());
}

public RFuture<Boolean> tryLockAsync() {
    return tryLockAsync(Thread.currentThread().getId());
}

public RFuture<Boolean> tryLockAsync(long threadId) {
    return tryAcquireOnceAsync(-1, -1, null, threadId);
}


// Core logic
private RFuture<Boolean> tryAcquireOnceAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId) {
    RFuture<Boolean> acquiredFuture;
    // With tryLock(), waitTime and leaseTime are both -1, so the else branch is taken
    if (leaseTime > 0) {
        acquiredFuture = tryLockInnerAsync(waitTime, leaseTime, unit, threadId, RedisCommands.EVAL_NULL_BOOLEAN);
    } else {
        // the expiration time falls back to the default of 30 * 1000 ms
        acquiredFuture = tryLockInnerAsync(waitTime, internalLockLeaseTime,
                                           TimeUnit.MILLISECONDS, threadId, RedisCommands.EVAL_NULL_BOOLEAN);
    }

    CompletionStage<Boolean> f = acquiredFuture.thenApply(acquired -> {
        // lock acquired
        if (acquired) {
            if (leaseTime > 0) {
                internalLockLeaseTime = unit.toMillis(leaseTime);
            } else {
                // tryLock() ends up in this branch, which requires [automatic renewal]
                scheduleExpirationRenewal(threadId);
            }
        }
        return acquired;
    });
    return new CompletableFutureWrapper<>(f);
}



<T> RFuture<T> tryLockInnerAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command) {
    return evalWriteAsync(getRawName(), LongCodec.INSTANCE, command,
                          "if (redis.call('exists', KEYS[1]) == 0) then " +
                          "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                          "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                          "return nil; " +
                          "end; " +
                          "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                          "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                          "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                          "return nil; " +
                          "end; " +
                          "return redis.call('pttl', KEYS[1]);",
                          Collections.singletonList(getRawName()), unit.toMillis(leaseTime), getLockName(threadId));
}
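// What this Lua script does:
// If the key does not exist, it creates a hash under the key (for example "testLock"), sets lockName (built from id + threadId) as the field with value 1, sets the expiration time, and returns nil to signal success
// If the key exists, it checks whether the hash has a field for the current thread's lockName. If so, it runs hincrby and resets the expiration time; if not, it returns the remaining TTL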
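To make the three branches of the script easier to follow, here is a plain-Java model of the same decision logic. It is a sketch only: LockScriptModel is a hypothetical class, a manual clock replaces Redis TTLs, and the lock names such as "uuid:1" are made-up examples of the id + threadId form.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of the acquire script, with a manual clock instead of Redis TTLs. */
final class LockScriptModel {
    private final Map<String, Integer> holds = new HashMap<>(); // hash field -> reentrancy count
    private long expiresAt = 0; // absolute expiry of the whole key, in ms
    long now = 0;               // simulated current time, in ms

    /** Returns null when the lock is acquired, otherwise the remaining TTL in ms. */
    Long tryLockInner(long leaseMillis, String lockName) {
        if (now >= expiresAt) {
            holds.clear();                          // key expired: 'exists' would return 0
        }
        if (holds.isEmpty() || holds.containsKey(lockName)) {
            holds.merge(lockName, 1, Integer::sum); // hincrby KEYS[1] ARGV[2] 1
            expiresAt = now + leaseMillis;          // pexpire KEYS[1] ARGV[1]
            return null;
        }
        return expiresAt - now;                     // pttl KEYS[1]
    }

    /** How many times lockName currently holds the lock (0 if it does not). */
    int holdCount(String lockName) {
        return holds.getOrDefault(lockName, 0);
    }

    public static void main(String[] args) {
        LockScriptModel lock = new LockScriptModel();
        System.out.println("first acquire: " + lock.tryLockInner(30_000, "uuid:1"));
        System.out.println("reentrant acquire: " + lock.tryLockInner(30_000, "uuid:1"));
        lock.now = 10_000;
        System.out.println("other thread sees TTL: " + lock.tryLockInner(30_000, "uuid:2"));
    }
}
```

The counter in the hash is what makes the lock reentrant: the same lockName can acquire repeatedly, and each acquire resets the lease.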
3.2.2 Source of the automatic renewal
  • Continuing from the tryLock source above
protected void scheduleExpirationRenewal(long threadId) {
    ExpirationEntry entry = new ExpirationEntry();
    ExpirationEntry oldEntry = EXPIRATION_RENEWAL_MAP.putIfAbsent(getEntryName(), entry);
    if (oldEntry != null) {
        oldEntry.addThreadId(threadId);
    } else {
        entry.addThreadId(threadId);
        try {
            renewExpiration();
        } finally {
            if (Thread.currentThread().isInterrupted()) {
                cancelExpirationRenewal(threadId);
            }
        }
    }
}



private void renewExpiration() {
    ExpirationEntry ee = EXPIRATION_RENEWAL_MAP.get(getEntryName());
    if (ee == null) {
        return;
    }

    // All renewals are run from a single timer-task thread
    Timeout task = commandExecutor.getConnectionManager().newTimeout(new TimerTask() {
        @Override
        public void run(Timeout timeout) throws Exception {
            ExpirationEntry ent = EXPIRATION_RENEWAL_MAP.get(getEntryName());
            if (ent == null) {
                return;
            }
            Long threadId = ent.getFirstThreadId();
            if (threadId == null) {
                return;
            }

            CompletionStage<Boolean> future = renewExpirationAsync(threadId);
            future.whenComplete((res, e) -> {
                if (e != null) {
                    log.error("Can't update lock " + getRawName() + " expiration", e);
                    EXPIRATION_RENEWAL_MAP.remove(getEntryName());
                    return;
                }

                if (res) {
                    // still holding the lock: renew again in another 10 s
                    // reschedule itself
                    renewExpiration();
                } else {
                
                    cancelExpirationRenewal(null);
                }
            });
        }
    }, internalLockLeaseTime / 3, TimeUnit.MILLISECONDS);
    // internalLockLeaseTime defaults to 30 * 1000 ms, so the timer task runs every 10 s

    ee.setTimeout(task);
}

// Resets the expiration time to internalLockLeaseTime
protected CompletionStage<Boolean> renewExpirationAsync(long threadId) {                   
    return evalWriteAsync(getRawName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,    
            "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +                    
                    "redis.call('pexpire', KEYS[1], ARGV[1]); " +                          
                    "return 1; " +                                                         
                    "end; " +                                                              
                    "return 0;",                                                           
            Collections.singletonList(getRawName()),                                       
            internalLockLeaseTime, getLockName(threadId));                                 
}                                                                                          
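The self-rescheduling pattern in renewExpiration() can be sketched with only the JDK. This is not Redisson's implementation: it swaps the Netty-style timer for a ScheduledExecutorService, and RenewalWatchdog, the renew hook and renewsAtLeast are hypothetical names introduced for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class RenewalWatchdog {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final long leaseMillis;
    private final BooleanSupplier renew; // hypothetical hook: true while the lock is still held
    private ScheduledFuture<?> pending;
    private boolean cancelled;

    RenewalWatchdog(long leaseMillis, BooleanSupplier renew) {
        this.leaseMillis = leaseMillis;
        this.renew = renew;
    }

    /** A third of the lease, e.g. 10 s for a 30 s lease. */
    static long renewalInterval(long leaseMillis) {
        return leaseMillis / 3;
    }

    /** Arms a one-shot timer; the task re-arms itself only after a successful renewal. */
    synchronized void schedule() {
        if (cancelled) {
            return;
        }
        pending = timer.schedule(() -> {
            if (renew.getAsBoolean()) {
                schedule();
            }
        }, renewalInterval(leaseMillis), TimeUnit.MILLISECONDS);
    }

    /** Stops renewing, as an unlock must do. */
    synchronized void cancel() {
        cancelled = true;
        if (pending != null) {
            pending.cancel(false);
        }
        timer.shutdownNow();
    }

    /** Demo helper: true if the watchdog renews at least `times` times before the timeout. */
    static boolean renewsAtLeast(int times, long leaseMillis, long timeoutMillis) {
        CountDownLatch latch = new CountDownLatch(times);
        RenewalWatchdog watchdog = new RenewalWatchdog(leaseMillis, () -> {
            latch.countDown();
            return true;
        });
        watchdog.schedule();
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            watchdog.cancel();
        }
    }

    public static void main(String[] args) {
        System.out.println("interval for a 30 s lease: " + renewalInterval(30_000) + " ms");
        System.out.println("renewed 3 times: " + renewsAtLeast(3, 30, 5_000));
    }
}
```

Because each run schedules the next one, nothing stops the chain unless the renewal reports failure or cancel() is called, which is why a missing unlock keeps the renewal going.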

3.3 unlock()

3.3.1 Source
@Override                                                                          
public void unlock() {                                                             
    try {                                                                          
        get(unlockAsync(Thread.currentThread().getId()));                          
    } catch (RedisException e) {                                                   
        if (e.getCause() instanceof IllegalMonitorStateException) {                
            throw (IllegalMonitorStateException) e.getCause();                     
        } else {                                                                   
            throw e;                                                               
        }                                                                          
    }                          
}                                                                                  
                                                                                   

@Override                                                                                                                                                                 
public RFuture<Void> unlockAsync(long threadId) {  
    // release the lock through a Lua script
    RFuture<Boolean> future = unlockInnerAsync(threadId);                                                                                                                 
                                                                                                                                                                          
    CompletionStage<Void> f = future.handle((opStatus, e) -> {                                                                                                            
        // cancel the automatic renewal
        cancelExpirationRenewal(threadId);                                                                                                                                
                                                                                                                                                                          
        if (e != null) {                                                                                                                                                  
            throw new CompletionException(e);                                                                                                                             
        }                                                                                                                                                                 
        if (opStatus == null) {                                                                                                                                           
            IllegalMonitorStateException cause = new IllegalMonitorStateException("attempt to unlock lock, not locked by current thread by node id: "                     
                    + id + " thread-id: " + threadId);                                                                                                                    
            throw new CompletionException(cause);                                                                                                                         
        }                                                                                                                                                                 
                                                                                                                                                                          
        return null;                                                                                                                                                      
    });                                                                                                                                                                   
                                                                                                                                                                          
    return new CompletableFutureWrapper<>(f);                                                                                                                             
}


protected RFuture<Boolean> unlockInnerAsync(long threadId) {                                                                          
    return evalWriteAsync(getRawName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,                                               
            "if (redis.call('hexists', KEYS[1], ARGV[3]) == 0) then " +                                                               
                    "return nil;" +                                                                                                   
                    "end; " +                                                                                                         
                    "local counter = redis.call('hincrby', KEYS[1], ARGV[3], -1); " +                                                 
                    "if (counter > 0) then " +                                                                                        
                    "redis.call('pexpire', KEYS[1], ARGV[2]); " +                                                                     
                    "return 0; " +                                                                                                    
                    "else " +                                                                                                         
                    "redis.call('del', KEYS[1]); " +                                                                                  
                    "redis.call('publish', KEYS[2], ARGV[1]); " +                                                                     
                    "return 1; " +                                                                                                    
                    "end; " +                                                                                                         
                    "return nil;",                                                                                                    
            Arrays.asList(getRawName(), getChannelName()), LockPubSub.UNLOCK_MESSAGE, internalLockLeaseTime, getLockName(threadId));  
}   

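// unlockAsync() above passes the unlocking thread's id; the renewal task passes null once the renewal script reports the lock is no longer held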
protected void cancelExpirationRenewal(Long threadId) {
    
    ExpirationEntry task = EXPIRATION_RENEWAL_MAP.get(getEntryName());         
    if (task == null) {     
        // nothing to do if the renewal map no longer has an entry for this lock
        return;                                                                
    }                                                                          
                                                                               
    if (threadId != null) {                                                    
        task.removeThreadId(threadId);                                         
    }                                                                          

    // taken when threadId is null or no thread still holds the lock
    if (threadId == null || task.hasNoThreads()) {                             
        Timeout timeout = task.getTimeout();                                   
        if (timeout != null) {   
            // cancel the timer task if there is one
            timeout.cancel();                                                  
        }    
        // and remove the entry from the renewal map (no further renewals will happen)
        EXPIRATION_RENEWAL_MAP.remove(getEntryName());                         
    }                                                                          
}                                                                              
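The release script has three possible results, which unlockAsync() then interprets. The plain-Java model below is a sketch of that tri-state logic only: UnlockScriptModel is a hypothetical class, its lock() helper skips the ownership checks of the real acquire script, and TTL handling and the publish call are reduced to comments.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of the release script's three outcomes. */
final class UnlockScriptModel {
    private final Map<String, Integer> holds = new HashMap<>(); // hash field -> reentrancy count

    /** Simplified acquire used only to set up state for the demo. */
    void lock(String lockName) {
        holds.merge(lockName, 1, Integer::sum);
    }

    /**
     * null  = caller does not hold the lock (unlockAsync turns this into IllegalMonitorStateException),
     * false = one reentrant hold released, lock still held,
     * true  = last hold released, key deleted.
     */
    Boolean unlockInner(String lockName) {
        Integer count = holds.get(lockName);
        if (count == null) {
            return null;                     // hexists == 0
        }
        if (count > 1) {
            holds.put(lockName, count - 1);  // hincrby -1 leaves counter > 0; the script also refreshes the TTL
            return false;
        }
        holds.remove(lockName);              // del, then publish the unlock message to waiters
        return true;
    }

    public static void main(String[] args) {
        UnlockScriptModel lock = new UnlockScriptModel();
        lock.lock("uuid:1");
        lock.lock("uuid:1");
        System.out.println("first unlock: " + lock.unlockInner("uuid:1"));
        System.out.println("second unlock: " + lock.unlockInner("uuid:1"));
        System.out.println("third unlock: " + lock.unlockInner("uuid:1"));
    }
}
```

A reentrant holder therefore has to call unlock once per acquire before the key is actually deleted.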

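IV. Pitfalls

4.1 Not unlocking

  • Calling tryLock() and never unlocking keeps the renewal running indefinitely
    • Redis metrics such as OPS and CPU climb, and they grow incrementally
      • Because the growth is incremental, taking the machine out of rotation so that no new traffic arrives does not help: locks that are already held stay in the renewal loop
      • The only way to fix this is to restart the machine, which destroys all the renewal timer tasks that are already running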
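One way to avoid this pitfall is to route every locked section through a helper that always unlocks. The sketch below is an illustration, not a Redisson API: LockGuard and runIfAcquired are hypothetical names. It is written against java.util.concurrent.locks.Lock, which RLock extends, so an RLock can be passed in; the demo uses a ReentrantLock so that it runs without a Redis server.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

final class LockGuard {
    /** Runs body if the lock can be taken immediately; always releases it afterwards. */
    static boolean runIfAcquired(Lock lock, Runnable body) {
        if (!lock.tryLock()) {
            return false;      // someone else holds it; nothing to release
        }
        try {
            body.run();
            return true;
        } finally {
            lock.unlock();     // runs even if body throws, so renewal can stop
        }
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock(); // stand-in for an RLock
        boolean ran = runIfAcquired(lock, () -> System.out.println("working under the lock"));
        System.out.println("ran: " + ran + ", still locked: " + lock.isLocked());
    }
}
```

Keeping tryLock() outside the try block matters: if acquisition fails, finally must not run, otherwise unlock() is called on a lock the thread does not hold.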

References:

  1. juejin.cn/post/693301…