
Curator Distributed Lock - Basic Usage
All of the code below comes from github.com/apache/cura…
These are the official examples; I think they are quite good and the code is well written.
First we define a resource that must be synchronized: only one caller may use it at a time, otherwise it is an error. A CAS operation enforces this and an exception is thrown if the CAS fails.
public class FakeLimitedResource {
    private final AtomicBoolean inUse = new AtomicBoolean(false);

    public void use() throws InterruptedException {
        // in a real application this would be accessing/manipulating a shared resource
        if (!inUse.compareAndSet(false, true)) {
            throw new IllegalStateException("Needs to be used by one client at a time");
        }
        try {
            // simulate the time a real operation would take
            Thread.sleep((long) (100 * Math.random()));
        } finally {
            // finally, reset the flag
            inUse.set(false);
        }
    }
}
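To see why the lock is needed at all, here is a minimal sketch of my own (not from the Curator examples) that calls use() from several plain threads without any lock; with enough repetitions it will typically hit the IllegalStateException above.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NoLockDemo {
    public static void main(String[] args) throws InterruptedException {
        FakeLimitedResource resource = new FakeLimitedResource();
        ExecutorService pool = Executors.newFixedThreadPool(5);
        for (int i = 0; i < 5; i++) {
            pool.submit(() -> {
                try {
                    for (int j = 0; j < 10; j++) {
                        resource.use(); // no lock: overlapping calls will throw IllegalStateException
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}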
Our lock wrapper.
public class ExampleClientThatLocks {
    private final InterProcessMutex lock;
    private final FakeLimitedResource resource;
    private final String clientName;

    // create the mutex
    public ExampleClientThatLocks(CuratorFramework client, String lockPath, FakeLimitedResource resource, String clientName) {
        this.resource = resource;
        this.clientName = clientName;
        // this is the exclusive (mutex) lock provided by Curator
        lock = new InterProcessMutex(client, lockPath);
    }

    public void doWork(long time, TimeUnit unit) throws Exception {
        // try to acquire the lock within the given timeout; pick the timeout to suit your business.
        // (there is also a no-argument acquire() that blocks until the lock is obtained.)
        if (!lock.acquire(time, unit)) {
            // timed out without getting the lock
            throw new IllegalStateException(clientName + " could not acquire the lock");
        }
        try {
            // use the shared resource
            System.out.println(clientName + " has the lock");
            resource.use();
        } finally {
            // release the lock
            System.out.println(clientName + " releasing the lock");
            lock.release(); // always release the lock in a finally block
        }
    }
}
The main program
public class LockingExample {
    private static final int QTY = 5;
    private static final int REPETITIONS = QTY * 10;
    private static final String PATH = "/examples/locks";

    public static void main(String[] args) throws Exception {
        // all of the useful sample code is in ExampleClientThatLocks.java
        // FakeLimitedResource simulates some external resource that can only be accessed by one process at a time
        // the resource we want to synchronize on
        final FakeLimitedResource resource = new FakeLimitedResource();
        // several threads operating concurrently
        ExecutorService service = Executors.newFixedThreadPool(QTY);
        // the original example uses an embedded test ZooKeeper server; I am not using it here
        // final TestingServer server = new TestingServer();
        try {
            for (int i = 0; i < QTY; ++i) {
                final int index = i;
                Callable<Void> task = new Callable<Void>() {
                    @Override
                    public Void call() throws Exception {
                        // the flow is simple: create a client...
                        CuratorFramework client = CuratorFrameworkFactory.newClient("192.168.58.131:2181", new ExponentialBackoffRetry(1000, 3));
                        try {
                            // ...start it...
                            client.start();
                            // ...create the lock wrapper...
                            ExampleClientThatLocks example = new ExampleClientThatLocks(client, PATH, resource, "Client " + index);
                            // ...then run the business logic under the distributed lock.
                            for (int j = 0; j < REPETITIONS; ++j) {
                                example.doWork(10, TimeUnit.MINUTES);
                            }
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        } catch (Exception e) {
                            e.printStackTrace();
                            // log or do something
                        } finally {
                            // remember to close the client at the end
                            CloseableUtils.closeQuietly(client);
                        }
                        return null;
                    }
                };
                // submit the task
                service.submit(task);
            }
            // graceful shutdown
            service.shutdown();
            service.awaitTermination(10, TimeUnit.MINUTES);
        } finally {
            // CloseableUtils.closeQuietly(server);
        }
    }
}
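If you do not have a ZooKeeper server at hand, the commented-out TestingServer from the curator-test module can stand in for one. A minimal sketch of that variant, assuming the FakeLimitedResource and ExampleClientThatLocks classes above are on the classpath:
// requires the curator-test dependency
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.curator.test.TestingServer;
import org.apache.curator.utils.CloseableUtils;

public class EmbeddedServerDemo {
    public static void main(String[] args) throws Exception {
        TestingServer server = new TestingServer(); // starts an in-process ZooKeeper on a random port
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                server.getConnectString(), new ExponentialBackoffRetry(1000, 3));
        try {
            client.start();
            new ExampleClientThatLocks(client, "/examples/locks", new FakeLimitedResource(), "Client X")
                    .doWork(10, java.util.concurrent.TimeUnit.SECONDS);
        } finally {
            CloseableUtils.closeQuietly(client);
            CloseableUtils.closeQuietly(server);
        }
    }
}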
Sample output:
Client 0 has the lock
Client 0 releasing the lock
Client 3 has the lock
Client 3 releasing the lock
Client 4 has the lock
Client 4 releasing the lock
Client 1 has the lock
Client 1 releasing the lock
Client 2 has the lock
Client 2 releasing the lock
How Curator Implements the Distributed Lock
lock.acquire(time, unit)
lock.acquire(time, unit) looks like this:
public boolean acquire(long time, TimeUnit unit) throws Exception
{
    return internalLock(time, unit);
}
internalLock(time, unit):
private boolean internalLock(long time, TimeUnit unit) throws Exception
{
    /*
       Note on concurrency: a given lockData instance
       can be only acted on by a single thread so locking isn't necessary
    */
    Thread currentThread = Thread.currentThread();
    // reentrancy is per thread: if this thread already holds the lock, just bump its counter.
    // first check whether there is existing lock data for the current thread.
    LockData lockData = threadData.get(currentThread);
    if ( lockData != null )
    {
        // re-entering
        lockData.lockCount.incrementAndGet();
        return true;
    }
    // no existing data, so actually try to acquire the lock.
    String lockPath = internals.attemptLock(time, unit, getLockNodeBytes());
    if ( lockPath != null )
    {
        // record the state for reentrancy, together with the path of the node we created.
        LockData newLockData = new LockData(currentThread, lockPath);
        threadData.put(currentThread, newLockData);
        return true;
    }
    return false;
}
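As a quick illustration of the reentrancy described above, here is a minimal sketch of my own (client is the CuratorFramework from the earlier example): the same thread can call acquire() twice, and must then call release() the same number of times before the ZooKeeper node is actually deleted.
InterProcessMutex mutex = new InterProcessMutex(client, "/examples/locks");
mutex.acquire();            // creates the lock node, lockCount = 1
mutex.acquire();            // same thread re-enters, lockCount = 2, no new ZK node
try {
    // ... do work ...
} finally {
    mutex.release();        // lockCount back to 1, node still exists
    mutex.release();        // lockCount reaches 0, the ZK node is deleted
}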
internals.attemptLock(time, unit, getLockNodeBytes());
String attemptLock(long time, TimeUnit unit, byte[] lockNodeBytes) throws Exception
{
    // a few local variables; the names are self-explanatory.
    final long      startMillis = System.currentTimeMillis();
    final Long      millisToWait = (unit != null) ? unit.toMillis(time) : null;
    final byte[]    localLockNodeBytes = (revocable.get() != null) ? new byte[0] : lockNodeBytes;
    int             retryCount = 0;

    String          ourPath = null;
    boolean         hasTheLock = false;
    boolean         isDone = false;
    while ( !isDone )
    {
        isDone = true;

        try
        {
            // create a child node here. ZooKeeper creates the EPHEMERAL_SEQUENTIAL node atomically,
            // so no extra concurrency handling is needed on the client side.
            ourPath = driver.createsTheLock(client, path, localLockNodeBytes);
            // this is where the real acquisition logic lives.
            hasTheLock = internalLockLoop(startMillis, millisToWait, ourPath);
        }
        catch ( KeeperException.NoNodeException e )
        {
            // gets thrown by StandardLockInternalsDriver when it can't find the lock node
            // this can happen when the session expires, etc. So, if the retry allows, just try it all again
            if ( client.getZookeeperClient().getRetryPolicy().allowRetry(retryCount++, System.currentTimeMillis() - startMillis, RetryLoop.getDefaultRetrySleeper()) )
            {
                isDone = false;
            }
            else
            {
                throw e;
            }
        }
    }

    // if we got the lock, return the node path.
    if ( hasTheLock )
    {
        return ourPath;
    }

    return null;
}
org.apache.curator.framework.recipes.locks.StandardLockInternalsDriver#createsTheLock
@Override
public String createsTheLock(CuratorFramework client, String path, byte[] lockNodeBytes) throws Exception
{
    // this step is straightforward: create an EPHEMERAL_SEQUENTIAL node and return its path.
    String ourPath;
    if ( lockNodeBytes != null )
    {
        ourPath = client.create().creatingParentContainersIfNeeded().withProtection().withMode(CreateMode.EPHEMERAL_SEQUENTIAL).forPath(path, lockNodeBytes);
    }
    else
    {
        ourPath = client.create().creatingParentContainersIfNeeded().withProtection().withMode(CreateMode.EPHEMERAL_SEQUENTIAL).forPath(path);
    }
    return ourPath;
}
The second step, hasTheLock = internalLockLoop(startMillis, millisToWait, ourPath), is the core of the acquisition:
private boolean internalLockLoop(long startMillis, Long millisToWait, String ourPath) throws Exception
{
    // state flags
    boolean     haveTheLock = false;
    boolean     doDelete = false;
    try
    {
        if ( revocable.get() != null )
        {
            client.getData().usingWatcher(revocableWatcher).forPath(ourPath);
        }

        // loop as long as the client is started and we do not yet own the lock.
        while ( (client.getState() == CuratorFrameworkState.STARTED) && !haveTheLock )
        {
            List<String>        children = getSortedChildren();
            String              sequenceNodeName = ourPath.substring(basePath.length() + 1); // +1 to include the slash

            // the driver acts as a handler (see below): it decides whether we own the lock
            // and, if not, which predecessor node we should watch.
            PredicateResults    predicateResults = driver.getsTheLock(client, children, sequenceNodeName, maxLeases);
            if ( predicateResults.getsTheLock() )
            {
                haveTheLock = true;
            }
            else
            {
                // the predecessor node we have to wait on.
                String  previousSequencePath = basePath + "/" + predicateResults.getPathToWatch();

                synchronized(this)
                {
                    try
                    {
                        // use getData() instead of exists() to avoid leaving unneeded watchers which is a type of resource leak
                        // read the predecessor node and set a watcher on it at the same time.
                        // if the predecessor is deleted (or its data changes), the watcher fires, we get notified,
                        // and the wait() below ends so we can re-check.
                        // note the race here: between listing the children and setting the watch, the predecessor
                        // may already have released the lock and been deleted. In that case getData() throws
                        // NoNodeException, the catch below swallows it, and we simply loop and try again.
                        client.getData().usingWatcher(watcher).forPath(previousSequencePath);
                        // if a timeout was given, check how much time is left; if none, delete our node and give up.
                        if ( millisToWait != null )
                        {
                            millisToWait -= (System.currentTimeMillis() - startMillis);
                            startMillis = System.currentTimeMillis();
                            if ( millisToWait <= 0 )
                            {
                                doDelete = true;    // timed out - delete our node
                                break;
                            }

                            // otherwise wait for at most the remaining time.
                            wait(millisToWait);
                        }
                        else
                        {
                            // no timeout given: wait until we are notified.
                            wait();
                        }
                    }
                    catch ( KeeperException.NoNodeException e )
                    {
                        // it has been deleted (i.e. lock released). Try to acquire again
                    }
                }
            }
        }
    }
    catch ( Exception e )
    {
        ThreadUtils.checkInterrupted(e);
        doDelete = true;
        throw e;
    }
    finally
    {
        // finally, delete our node if needed (this is mainly the timeout case above).
        if ( doDelete )
        {
            deleteOurPath(ourPath);
        }
    }
    return haveTheLock;
}
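The wait()/notify pairing above is plain Java monitor usage: the watcher that Curator registers wakes the thread blocked in wait() when the watched node changes, so it can go around the loop and re-check the children. Below is a minimal sketch of that pattern, my own simplified illustration rather than Curator's actual watcher code:
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

public class WaitNotifySketch {
    private final Object monitor = new Object();

    // a watcher like the one registered via getData().usingWatcher(...):
    // any event on the predecessor node wakes the waiting thread
    private final Watcher watcher = new Watcher() {
        @Override
        public void process(WatchedEvent event) {
            synchronized (monitor) {
                monitor.notifyAll();   // wake the waiter so it re-checks the sorted children
            }
        }
    };

    public void waitForPredecessor() throws InterruptedException {
        synchronized (monitor) {
            // in the real code this wait follows a failed getsTheLock() check;
            // here we simply block until the watcher notifies us
            monitor.wait();
        }
    }
}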
getSortedChildren is what gives the lock its fairness: waiters are ordered by the sequence numbers ZooKeeper assigned to their nodes, so the lock is granted roughly in arrival order. It is not absolute FIFO fairness, though, since the ordering ultimately depends on how the ZK server assigns sequence numbers when the nodes are created.
Here is what the lock's child nodes look like; the sequential suffix is assigned by ZooKeeper itself when it creates the sequence node:
_c_258cf713-62d2-45bd-8967-963eac169d4a-lock-0000000188
_c_4d17dd48-1e07-4434-94a4-d3412eba1d47-lock-0000000187
_c_b3f9ab7c-e863-40bf-aa2c-c6aa16700e73-lock-0000000185
_c_b7d9b167-063a-401f-a7f7-9d9399c31296-lock-0000000184
_c_e6abf5a9-f593-429f-b6e7-7d2563680ff0-lock-0000000186
public static List<String> getSortedChildren(CuratorFramework client, String basePath, final String lockName, final LockInternalsSorter sorter) throws Exception
{
    try
    {
        // fetch the children and sort them.
        List<String> children = client.getChildren().forPath(basePath);
        List<String> sortedList = Lists.newArrayList(children);
        Collections.sort
        (
            sortedList,
            new Comparator<String>()
            {
                @Override
                public int compare(String lhs, String rhs)
                {
                    return sorter.fixForSorting(lhs, lockName).compareTo(sorter.fixForSorting(rhs, lockName));
                }
            }
        );
        return sortedList;
    }
    catch ( KeeperException.NoNodeException ignore )
    {
        return Collections.emptyList();
    }
}
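The sorter effectively compares the node names by their trailing sequence number rather than by the random protection prefix. The following is a small sketch of that idea; how I strip the prefix here is my own assumption about what fixForSorting achieves, not the actual implementation:
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortBySequenceDemo {
    // strip everything up to and including "-lock-" so only the 10-digit sequence remains
    static String sequencePart(String nodeName) {
        int idx = nodeName.lastIndexOf("-lock-");
        return idx < 0 ? nodeName : nodeName.substring(idx + "-lock-".length());
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList(
                "_c_258cf713-62d2-45bd-8967-963eac169d4a-lock-0000000188",
                "_c_b3f9ab7c-e863-40bf-aa2c-c6aa16700e73-lock-0000000185",
                "_c_4d17dd48-1e07-4434-94a4-d3412eba1d47-lock-0000000187");
        children.stream()
                .sorted(Comparator.comparing(SortBySequenceDemo::sequencePart))
                .forEach(System.out::println); // prints ...185, then ...187, then ...188
    }
}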
driver.getsTheLock(client, children, sequenceNodeName, maxLeases) is the core check:
@Override
public PredicateResults getsTheLock(CuratorFramework client, List<String> children, String sequenceNodeName, int maxLeases) throws Exception
{
    // 1. find our position among the sorted children. e.g. for [0,1,2,3,4], if our node is "1" then ourIndex is 1.
    int             ourIndex = children.indexOf(sequenceNodeName);
    validateOurIndex(sequenceNodeName, ourIndex);

    // 2. compare against maxLeases, which defaults to 1 for the mutex (it is an exclusive lock).
    //    in the example above, 1 < 1 is false, so getsTheLock is false.
    boolean         getsTheLock = ourIndex < maxLeases;
    // 3. if we did not get the lock, watch the node maxLeases positions before ours (explained below).
    String          pathToWatch = getsTheLock ? null : children.get(ourIndex - maxLeases);

    return new PredicateResults(pathToWatch, getsTheLock);
}
Summary
The overall design is quite similar to AQS: the reentrancy is tracked with a per-thread counter, and the EPHEMERAL_SEQUENTIAL nodes play the role of AQS's wait queue. ZooKeeper creates those nodes atomically, so the queue itself is safe under concurrency.
To acquire the lock, a client lists the children (sorted by their EPHEMERAL_SEQUENTIAL sequence numbers), checks its own position, and if it is at the head it owns the lock; otherwise it waits on its predecessor and repeats the check.
There is one race worth spelling out. Suppose the sorted children are [0,1,2,3,4,5] and my node is at index 1, so the check says I do not get the lock. While I am doing this check, the holder at index 0 releases the lock and its node is deleted, which actually makes me first in line even though my check said otherwise. I then try to watch my predecessor (the old index 0), but it no longer exists, so the getData() call fails with NoNodeException; the catch block does nothing, I loop back, re-list the children, find myself at index 0, and acquire the lock.
So acquisition is basically this check-wait-recheck loop, repeated until the lock is obtained or the timeout expires.
lock.release();
Now let's look at releasing the lock: lock.release();
public void release() throws Exception
{
    /*
        Note on concurrency: a given lockData instance
        can be only acted on by a single thread so locking isn't necessary
    */
    Thread currentThread = Thread.currentThread();
    LockData lockData = threadData.get(currentThread);
    // if the calling thread has no lock data, it does not own the lock, so throw;
    // this is why acquire and release must happen on the same thread.
    if ( lockData == null )
    {
        throw new IllegalMonitorStateException("You do not own the lock: " + basePath);
    }

    // decrement the reentrancy count. If it is still positive, the lock is still held (it is a reentrant lock).
    // (I am not convinced the atomic decrement is necessary here, since the data is per-thread anyway.)
    int newLockCount = lockData.lockCount.decrementAndGet();
    if ( newLockCount > 0 )
    {
        return;
    }
    if ( newLockCount < 0 )
    {
        throw new IllegalMonitorStateException("Lock count has gone negative for lock: " + basePath);
    }
    try
    {
        // once the count reaches 0 the lock is fully released: release the internals and delete our node.
        internals.releaseLock(lockData.lockPath);
    }
    finally
    {
        // remove the entry from the map to avoid a memory leak.
        threadData.remove(currentThread);
    }
}
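As a final illustration of the "same thread" rule above, here is a minimal sketch of my own (client is the CuratorFramework from earlier): releasing from a thread other than the one that acquired the mutex fails with IllegalMonitorStateException.
InterProcessMutex mutex = new InterProcessMutex(client, "/examples/locks");
mutex.acquire(); // acquired on the main thread

Thread other = new Thread(() -> {
    try {
        mutex.release(); // throws IllegalMonitorStateException: this thread has no entry in threadData
    } catch (Exception e) {
        System.out.println("release from another thread failed: " + e);
    }
});
other.start();
other.join();

mutex.release(); // releasing on the owning thread works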