Curator - 分布式锁的实现原理 & 如何使用

464 阅读8分钟

Curator分布式锁 - 基本使用

我的代码全部来自于 github.com/apache/cura…

这个是官方提供的例子, 我觉得挺好的, 人家代码写的也不错.

我们先定义一个同步的资源. 也就是这个过程它必须是线程同步的, 不然一定会出现异常. 所以这里使用了一个cas操作 , 如果失败则异常.

public class FakeLimitedResource {
    private final AtomicBoolean inUse = new AtomicBoolean(false);

    public void use() throws InterruptedException {
        // in a real application this would be accessing/manipulating a shared resource
        if (!inUse.compareAndSet(false, true)) {
            throw new IllegalStateException("Needs to be used by one client at a time");
        }

        try {
            // 模拟真实操作时长.
            Thread.sleep((long) (100 * Math.random()));
        } finally {
            // 最后我们重置状态量.
            inUse.set(false);
        }
    }
}

我们的锁.

public class ExampleClientThatLocks {
    private final InterProcessMutex lock;
    private final FakeLimitedResource resource;
    private final String clientName;

    // 创建这个排它锁.
    public ExampleClientThatLocks(CuratorFramework client, String lockPath, FakeLimitedResource resource, String clientName) {
        this.resource = resource;
        this.clientName = clientName;
        // 这就是 curator提供的 排它锁. 
        lock = new InterProcessMutex(client, lockPath);
    }

    public void doWork(long time, TimeUnit unit) throws Exception {
        // 这里就是去获取锁, curator的排它锁必须设置超时时间. 根据业务需求设置.
        if (!lock.acquire(time, unit)) {
            // 超时抛出异常.
            throw new IllegalStateException(clientName + " could not acquire the lock");
        }
        try {
            // 使用资源.
            System.out.println(clientName + " has the lock");
            resource.use();
        } finally {
            // 释放锁.
            System.out.println(clientName + " releasing the lock");
            lock.release(); // always release the lock in a finally block
        }
    }
}

主程序

public class LockingExample {
    private static final int QTY = 5;
    private static final int REPETITIONS = QTY * 10;

    private static final String PATH = "/examples/locks";

    public static void main(String[] args) throws Exception {
        // all of the useful sample code is in ExampleClientThatLocks.java

        // FakeLimitedResource simulates some external resource that can only be access by one process at a time
        // 我们要求同步的资源.
        final FakeLimitedResource resource = new FakeLimitedResource();

        // 多线程并发操作.
        ExecutorService service = Executors.newFixedThreadPool(QTY);
        // 这个是一个zk的测试服务器. 我没有使用.
//        final TestingServer server = new TestingServer();
        try {
            for (int i = 0; i < QTY; ++i) {
                final int index = i;
                Callable<Void> task = new Callable<Void>() {
                    @Override
                    public Void call() throws Exception {
                        // 过程很简单 . 就是创建客户端.
                        CuratorFramework client = CuratorFrameworkFactory.newClient("192.168.58.131:2181", new ExponentialBackoffRetry(1000, 3));
                        try {
                            // 启动
                            client.start();

                            // 我们去创建锁.
                            ExampleClientThatLocks example = new ExampleClientThatLocks(client, PATH, resource, "Client " + index);
                            // 然后再执行业务逻辑. (分布式锁)
                            for (int j = 0; j < REPETITIONS; ++j) {
                                example.doWork(10, TimeUnit.MINUTES);
                            }
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        } catch (Exception e) {
                            e.printStackTrace();
                            // log or do something
                        } finally {
                            // 最后记得释放客户端
                            CloseableUtils.closeQuietly(client);
                        }
                        return null;
                    }
                };
                // 提交task.
                service.submit(task);
            }

            // 优雅关闭
            service.shutdown();
            service.awaitTermination(10, TimeUnit.MINUTES);
        } finally {
//            CloseableUtils.closeQuietly(server);
        }
    }
}

打印一下信息

Client 0 has the lock
Client 0 releasing the lock
Client 3 has the lock
Client 3 releasing the lock
Client 4 has the lock
Client 4 releasing the lock
Client 1 has the lock
Client 1 releasing the lock
Client 2 has the lock
Client 2 releasing the lock

Curator实现分布式锁的原理

lock.acquire(time, unit)

lock.acquire(time, unit) 如下 :

public boolean acquire(long time, TimeUnit unit) throws Exception
{
    return internalLock(time, unit);
}

internalLock(time, unit); :

private boolean internalLock(long time, TimeUnit unit) throws Exception
{
    /*
       Note on concurrency: a given lockData instance
       can be only acted on by a single thread so locking isn't necessary
    */

    Thread currentThread = Thread.currentThread();

	// 可重入的前提建立在单线程上. 他主要是给当前线程的状态量+1 , 先会判断当前的数据是不是空.
    LockData lockData = threadData.get(currentThread);
    if ( lockData != null )
    {
        // re-entering
        lockData.lockCount.incrementAndGet();
        return true;
    }

	// 空, 就去执行lock.
    String lockPath = internals.attemptLock(time, unit, getLockNodeBytes());
    if ( lockPath != null )
    {
    // 这里就是记录一下状态 , 实现可重入. 同时记录当前节点信息.
        LockData newLockData = new LockData(currentThread, lockPath);
        threadData.put(currentThread, newLockData);
        return true;
    }

    return false;
}

internals.attemptLock(time, unit, getLockNodeBytes());

String attemptLock(long time, TimeUnit unit, byte[] lockNodeBytes) throws Exception
{
// 一些定义的变量  . 基本文字就可以看懂啥意思. 
    final long      startMillis = System.currentTimeMillis();
    final Long      millisToWait = (unit != null) ? unit.toMillis(time) : null;
    final byte[]    localLockNodeBytes = (revocable.get() != null) ? new byte[0] : lockNodeBytes;
    int             retryCount = 0;

    String          ourPath = null;
    boolean         hasTheLock = false;
    boolean         isDone = false;
    while ( !isDone )
    {
        isDone = true;

        try
        {
        // 在这里其实是创建一个子 节点. zk创建 EPHEMERAL_SEQUENTIAL节点, 本身就是不用考虑并发的.
            ourPath = driver.createsTheLock(client, path, localLockNodeBytes);
            // 这里是真正的业务逻辑.
            hasTheLock = internalLockLoop(startMillis, millisToWait, ourPath);
        }
        catch ( KeeperException.NoNodeException e )
        {
            // gets thrown by StandardLockInternalsDriver when it can't find the lock node
            // this can happen when the session expires, etc. So, if the retry allows, just try it all again
            if ( client.getZookeeperClient().getRetryPolicy().allowRetry(retryCount++, System.currentTimeMillis() - startMillis, RetryLoop.getDefaultRetrySleeper()) )
            {
                isDone = false;
            }
            else
            {
                throw e;
            }
        }
    }

// 如果拿到锁, 就return了.
    if ( hasTheLock )
    {
        return ourPath;
    }

    return null;
}

org.apache.curator.framework.recipes.locks.StandardLockInternalsDriver#createsTheLock

@Override
public String createsTheLock(CuratorFramework client, String path, byte[] lockNodeBytes) throws Exception
{
// 这个过程很简单 , 就是一个创建一个 EPHEMERAL_SEQUENTIAL节点 . 然后创建就好了. 
    String ourPath;
    if ( lockNodeBytes != null )
    {
        ourPath = client.create().creatingParentContainersIfNeeded().withProtection().withMode(CreateMode.EPHEMERAL_SEQUENTIAL).forPath(path, lockNodeBytes);
    }
    else
    {
        ourPath = client.create().creatingParentContainersIfNeeded().withProtection().withMode(CreateMode.EPHEMERAL_SEQUENTIAL).forPath(path);
    }
    return ourPath;
}

其次就是第二步 hasTheLock = internalLockLoop(startMillis, millisToWait, ourPath); (核心步骤)

private boolean internalLockLoop(long startMillis, Long millisToWait, String ourPath) throws Exception
{
// 状态
    boolean     haveTheLock = false;
    boolean     doDelete = false;
    try
    {
        if (revocable.get() != null )
        {
            client.getData().usingWatcher(revocableWatcher).forPath(ourPath);
        }

// 如果当前的状态是启动成功的话. 同时也没有拥有锁. 这是一个循环. 
        while ( (client.getState() == CuratorFrameworkState.STARTED) && !haveTheLock )
        {
            List<String>        children = getSortedChildren();
            String              sequenceNodeName = ourPath.substring(basePath.length() + 1); // +1 to include the slash

// driver其实就是一个handler . 具体看下面解释. 就是判断是否拿到锁了, 同时返回一个前置节点.
            PredicateResults    predicateResults = driver.getsTheLock(client, children, sequenceNodeName, maxLeases);
            if ( predicateResults.getsTheLock() )
            {
                haveTheLock = true;
            }
            else
            {
            // 这里就是前置节点.
                String  previousSequencePath = basePath + "/" + predicateResults.getPathToWatch();

                synchronized(this)
                {
                    try 
                    {
                        // use getData() instead of exists() to avoid leaving unneeded watchers which is a type of resource leak
                        // 获取前置节点.同时监听此节点. 
                        // 好处是, 如果前置节点删除/或者修改数据, 此时可以通知 notify,停止wait. 
                        // 为啥要用 getData 
                        // 使用getData()而不是exist()以避免留下不必要的观察者,这是一种资源泄漏
                        // 其实这里有问题的, 如果我们在监听此节点,如果此节点被删除了会怎么办呢 ?  并发下一定会出现这种情况. 我们拿到前置节点的瞬间,前置节点已经被释放锁, 被删除了.所以后面这个catch啥也没做,继续重试.
                        // watcher的目的就是为了被notify. 如果前置节点改动了, 我一定会收到信息, 此时notify就可以了.
                        client.getData().usingWatcher(watcher).forPath(previousSequencePath);
                        // 有一个超时判断, 超时则删除当前节点.
                        if ( millisToWait != null )
                        {
                            millisToWait -= (System.currentTimeMillis() - startMillis);
                            startMillis = System.currentTimeMillis();
                            if ( millisToWait <= 0 )
                            {
                                doDelete = true;    // timed out - delete our node
                                break;
                            }

// 否则则 wait超时时间.
                            wait(millisToWait);
                        }
                        else
                        {
                        // 没有指定超时时间, 就是永久的等待.
                            wait();
                        }
                    }
                    catch ( KeeperException.NoNodeException e ) 
                    {
                        // it has been deleted (i.e. lock released). Try to acquire again
                    }
                }
            }
        }
    }
    catch ( Exception e )
    {
        ThreadUtils.checkInterrupted(e);
        doDelete = true;
        throw e;
    }
    finally
    {
    // 最后, 如果需要删除(这里显然是超时的话会触发这个操作.). 则删除节点.
        if ( doDelete )
        {
            deleteOurPath(ourPath);
        }
    }
    return haveTheLock;
}

getSortedChildren 其实是为了 实现公平性. 也不能说公平, 但实际上是很公平的, 但不是绝对的公平, 因为看zk-server端创建节点如何实现的.

我们可以看看我们的lock节点 . 以为系统创建顺序节点, 这个是zk 原生提供的.

_c_258cf713-62d2-45bd-8967-963eac169d4a-lock-0000000188
_c_4d17dd48-1e07-4434-94a4-d3412eba1d47-lock-0000000187
_c_b3f9ab7c-e863-40bf-aa2c-c6aa16700e73-lock-0000000185
_c_b7d9b167-063a-401f-a7f7-9d9399c31296-lock-0000000184
_c_e6abf5a9-f593-429f-b6e7-7d2563680ff0-lock-0000000186
public static List<String> getSortedChildren(CuratorFramework client, String basePath, final String lockName, final LockInternalsSorter sorter) throws Exception
{
    try
    {
    // 获取子节点.
        List<String> children = client.getChildren().forPath(basePath);
        List<String> sortedList = Lists.newArrayList(children);
        Collections.sort
        (
            sortedList,
            new Comparator<String>()
            {
                @Override
                public int compare(String lhs, String rhs)
                {
                    return sorter.fixForSorting(lhs, lockName).compareTo(sorter.fixForSorting(rhs, lockName));
                }
            }
        );
        return sortedList;
    }
    catch ( KeeperException.NoNodeException ignore )
    {
        return Collections.emptyList();
    }
}

driver.getsTheLock(client, children, sequenceNodeName, maxLeases); 核心 , 重要

@Override
public PredicateResults getsTheLock(CuratorFramework client, List<String> children, String sequenceNodeName, int maxLeases) throws Exception
{
// 1.拿到一个子节点. 比如[0,1,2,3,4] , 当前节点是1 , 那么返回的index则是1 , 此时
    int             ourIndex = children.indexOf(sequenceNodeName);
    validateOurIndex(sequenceNodeName, ourIndex);

// 2.去比较.最多可以获取几个, 这里默认是maxLeases=1(因为排它锁么) , 所以  1<1 false. 则返回false.
    boolean         getsTheLock = ourIndex < maxLeases;
// 3.获取失败, 然后监听它的前一个节点.(后续解释为什么.)
    String          pathToWatch = getsTheLock ? null : children.get(ourIndex - maxLeases);

    return new PredicateResults(pathToWatch, getsTheLock);
}

总结

所以 zk实现可重入的机制其实是 和AQS 差不多,

利用zk的EPHEMERAL_SEQUENTIAL 节点, 提供了AQS默认的队列机制. 线程安全的.

再其次, 获取锁的操作,是执行 查找子节点 (排序一下 , 根据EPHEMERAL_SEQUENTIAL的顺序进行排列) , 此时会根据当前节点的位置,做判断, 是否是拥有的节点. 如果不是则继续. 如果是则代表当前节点可以拥有锁 , 否则继续重复操作.

这里涉及到一个并发问题 : 第一点 , 比如我拿到的我当前位置, 比如此时获取的子节点顺序为 [0,1,2,3,4,5] , 我此时是索引1的位置, 此时我不可以拿到锁, 但是就在我比较的过程中, 此时前置节点已经释放锁了. 也就是删除了. 那么此时我就是第一个, 但是我当前的状态趋势未拥有锁. 所以我会去监听我的前置节点 , 也就是索引为0的节点. 此时监听失败. 我会继续重复一开始的操作. 这时候判断, 我是第一个位置. 哈哈哈. 拿到锁了.

所以基本就是个这 循环往复的过程.

lock.release();

我们看看 释放锁的过程 : lock.release();

public void release() throws Exception
{
    /*
        Note on concurrency: a given lockData instance
        can be only acted on by a single thread so locking isn't necessary
     */

    Thread currentThread = Thread.currentThread();
    LockData lockData = threadData.get(currentThread);
    // 如果此时释放锁, 此时拿到线程却未拥有, 则抛出异常,所以获取锁和释放锁必须在同一个线程内执行.
    if ( lockData == null )
    {
        throw new IllegalMonitorStateException("You do not own the lock: " + basePath);
    }

// 获取当前的 状态.  然后减一. 如果不为0 , 还需要释放. 其实就是一个可重入锁. 但是我觉得这里没必要用cas, 本来就是单线程. 哈哈哈. 不懂为啥这里还要用. 
    int newLockCount = lockData.lockCount.decrementAndGet();
    if ( newLockCount > 0 )
    {
        return;
    }
    if ( newLockCount < 0 )
    {
        throw new IllegalMonitorStateException("Lock count has gone negative for lock: " + basePath);
    }
    try
    {
    // 最后如果到0了 , 完全释放掉了, 再执行释放节点. 这里主要做的是释放资源.然后删除当前节点
        internals.releaseLock(lockData.lockPath);
    }
    finally
    {
    // 释放map资源, 防止内存泄漏.
        threadData.remove(currentThread);
    }
}