Apache Sentry: All You Need Is Here! (Part 4) Now That You Know Sentry's Database Design, How Would You Design the Validation Logic?

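Now that you know Sentry's database design, how would you handle it if you were asked to design the validation logic?

Mind you, this is not the usual RBAC model.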

How SQL Privilege Validation Works

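The validation logic is the same in Apache Sentry and Apache Ranger, because both have to solve the same problem.

If you were asked to design the validation logic, how would you handle it?

In traditional access control, once a user initiates an operation, the system compares the permission to be checked (user, resource, action) against what is stored in the database. Put abstractly, it is an equals test on a piece of information.

SQL validation, however, is more complicated.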

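SQL has many operation types, including DDL, DML, and Query

A single SQL statement can touch a lot: several databases, tables, columns, and even functions

SQL privilege levels are complex, including server, database, table, and connection, and they are not peers of one another: they can contain and overlap each other

The principal behind a SQL statement can be identified in several ways: as a user, a role, or a group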

...

So SQL validation is less about an equals test and more about a contains relationship between one set and another.
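To make the contrast concrete, here is a minimal sketch that puts the two styles side by side. The class name, the "user|resource|action" encoding, and the "entity:action" strings are all made up for illustration and are not Sentry's data model.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only: none of these names come from Sentry.
final class EqualsVsContains {

    // Traditional check: a single (user, resource, action) tuple, matched by equality.
    static boolean tupleCheck(Set<String> stored, String user, String resource, String action) {
        return stored.contains(user + "|" + resource + "|" + action);
    }

    // SQL-style check: every privilege the statement needs must be in the granted set.
    static boolean setCheck(Set<String> granted, Set<String> required) {
        return granted.containsAll(required);
    }

    public static void main(String[] args) {
        Set<String> stored = new HashSet<>(Arrays.asList("alice|report|read"));
        System.out.println(tupleCheck(stored, "alice", "report", "read")); // true

        // One statement that reads sales.orders and writes tmp.out needs both privileges.
        Set<String> granted = new HashSet<>(Arrays.asList("sales.orders:select", "sales.users:select"));
        Set<String> required = new HashSet<>(Arrays.asList("sales.orders:select", "tmp.out:insert"));
        System.out.println(setCheck(granted, required)); // false, the write privilege is missing
    }
}
```

The sketch ignores the hierarchy between levels (a grant on a database covering its tables), which is what makes the real problem harder than a plain containsAll.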

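Too abstract? Let's put it more simply.

During SQL validation, the statement is first parsed completely. Because of this prerequisite, the validation logic usually sits right after the engine's SQL parsing step, which guarantees that the parse is complete. The SQL is parsed into two lists, an input list and an output list, which indicate the entities the statement will read and the entities it will write.

Once reads and writes are separated, the user name of the session (the established connection) through which the SQL was submitted tells us whether it is acting through a role or as the user itself. With that, the full set of privileges the user holds is fetched from the database where privileges are stored. To speed things up, Apache Sentry even caches them on the client side, so that not every check has to pull the privilege information from the server again. Finally, the parsed input/output lists are compared against the full privilege data, and validation passes only when every item has been confirmed.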
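The flow just described can be sketched in a few lines. This is a simplified model under stated assumptions, not Sentry's client: CachedSqlAuthorizer, serverLookup, and the "entity:select" / "entity:insert" strings are hypothetical, and the cache never expires.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: parse result in, cached privileges looked up, every entity checked.
final class CachedSqlAuthorizer {
    private final Function<String, Set<String>> serverLookup; // stands in for the call to the server
    private final Map<String, Set<String>> cache = new HashMap<>();
    int serverCalls = 0; // counts how often the "server" was actually asked

    CachedSqlAuthorizer(Function<String, Set<String>> serverLookup) {
        this.serverLookup = serverLookup;
    }

    // Pull the user's privileges from the server once, then serve them from the local cache.
    private Set<String> privilegesOf(String user) {
        return cache.computeIfAbsent(user, u -> {
            serverCalls++;
            return serverLookup.apply(u);
        });
    }

    // Passes only if every input is readable and every output is writable.
    boolean authorize(String user, List<String> inputs, List<String> outputs) {
        Set<String> granted = privilegesOf(user);
        for (String entity : inputs) {
            if (!granted.contains(entity + ":select")) {
                return false;
            }
        }
        for (String entity : outputs) {
            if (!granted.contains(entity + ":insert")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CachedSqlAuthorizer authorizer = new CachedSqlAuthorizer(user ->
                "alice".equals(user)
                        ? new HashSet<>(Arrays.asList("sales.orders:select", "tmp.out:insert"))
                        : Collections.<String>emptySet());

        // INSERT INTO tmp.out SELECT ... FROM sales.orders
        System.out.println(authorizer.authorize("alice",
                Arrays.asList("sales.orders"), Arrays.asList("tmp.out"))); // true
        // SELECT ... FROM sales.users
        System.out.println(authorizer.authorize("alice",
                Arrays.asList("sales.users"), Collections.<String>emptyList())); // false
        System.out.println(authorizer.serverCalls); // 1, the second check hit the cache
    }
}
```

A real client would also need to invalidate or refresh the cache when grants change. The sketch leaves that out.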

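A Walk Through Apache Sentry's Validation Source Code

The description above sounds simple enough, doesn't it? Now let's take a brief look at the source code.

We will start with how Sentry validates on HiveServer2.

To use Sentry with Hive, the main thing is to configure one parameter: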

hive.server2.session.hook

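Following this parameter leads to HiveAuthzBindingHook, where Sentry does its validation: it uses the Hive hook mechanism to check SQL privileges before Hive's own authorization runs. The idea is quite different from Apache Ranger's. Ranger replaces Hive's own authorization logic and steps, but it does not cut into the main execution flow with a custom interruption. On this point I find Ranger the more elegant of the two.

The code below comes from the Apache Sentry repository on GitHub, branch-1.5.1 (parts of it are elided; read the source for the details).

As the method's Javadoc shows, this is the main validation method. I have added comments at some key steps to make it easier to read.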

/**
 * Convert the input/output entities into authorizables. generate
 * authorizables for cases like Database and metadata operations where the
 * compiler doesn't capture entities. invoke the hive binding to validate
 * permissions
 *
 * @param context
 * @param stmtAuthObject
 * @param stmtOperation
 * @throws AuthorizationException
 */
private void authorizeWithHiveBindings(HiveSemanticAnalyzerHookContext context,
    HiveAuthzPrivileges stmtAuthObject, HiveOperation stmtOperation) throws AuthorizationException {
  // The input list mentioned above, produced by parsing the SQL
  Set<ReadEntity> inputs = context.getInputs();
  // The output list mentioned above, produced by parsing the SQL
  Set<WriteEntity> outputs = context.getOutputs();
  // The inputs are about to be converted: they only hold fixed database or table information,
  // whereas a complete SQL privilege covers the whole entity hierarchy (Server, Db, Table, ...),
  // and the conversion fills that hierarchy into this variable
  List<List<DBModelAuthorizable>> inputHierarchy = new ArrayList<List<DBModelAuthorizable>>();
  // Same as above, for the outputs
  List<List<DBModelAuthorizable>> outputHierarchy = new ArrayList<List<DBModelAuthorizable>>();

  if(LOG.isDebugEnabled()) {
    LOG.debug("stmtAuthObject.getOperationScope() = " + stmtAuthObject.getOperationScope());
    LOG.debug("context.getInputs() = " + context.getInputs());
    LOG.debug("context.getOutputs() = " + context.getOutputs());
  }

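  // Fill in the entity details according to the privilege scope (Operation Scope) this operation needs.
  // The scope follows from what the SQL actually means and is looked up in a large map from
  // operation to required privileges. For example, show tables needs the DATABASE scope.
  // In the Sentry source this lookup goes through HiveAuthzPrivilegesMap, keyed by Hive's HiveOperation enum.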
  switch (stmtAuthObject.getOperationScope()) {

  case SERVER :
    ...
    break;
  case DATABASE:
    ...
    break;
  case TABLE:
    ...
    break;
  case FUNCTION:
    ...
    break;
  case CONNECT:
    ...
    break;

  default:
    throw new AuthorizationException("Unknown operation scope type " +
        stmtAuthObject.getOperationScope().toString());
  }

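  // By this point the picture is clear: everything above parses and then enriches the entities,
  // and this is where the privileges are actually checked.
  // hiveAuthzBinding is the object that holds the Thrift connection to the Sentry Server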
  hiveAuthzBinding.authorize(stmtOperation, stmtAuthObject, getCurrentSubject(context),
      inputHierarchy, outputHierarchy);
}
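The enrichment step, turning a bare database or table name into a full Server, Db, Table chain, can be pictured with the sketch below. It uses plain strings instead of DBModelAuthorizable, and HierarchyBuilder, toHierarchies, and the "db.table" input format are assumptions made for illustration. The real method branches on the operation scope, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: expand each parsed entity into its full hierarchy, starting at the server.
final class HierarchyBuilder {

    // Each entity is "db" or "db.table"; the result holds one hierarchy per entity.
    static List<List<String>> toHierarchies(String server, List<String> entities) {
        List<List<String>> result = new ArrayList<>();
        for (String entity : entities) {
            List<String> hierarchy = new ArrayList<>();
            hierarchy.add("server=" + server); // every hierarchy starts at the server level
            String[] parts = entity.split("\\.");
            hierarchy.add("db=" + parts[0]);
            if (parts.length > 1) {
                hierarchy.add("table=" + parts[1]);
            }
            result.add(hierarchy);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toHierarchies("server1", Arrays.asList("sales.orders", "tmp")));
        // [[server=server1, db=sales, table=orders], [server=server1, db=tmp]]
    }
}
```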

Next, let's step into hiveAuthzBinding.authorize and look at the validation logic.

As the method's Javadoc shows, this is the core of it.

/**
 * Validate the privilege for the given operation for the given subject
 */
public void authorize(HiveOperation hiveOp, HiveAuthzPrivileges stmtAuthPrivileges,
    Subject subject, List<List<DBModelAuthorizable>> inputHierarchyList,
    List<List<DBModelAuthorizable>> outputHierarchyList)
        throws AuthorizationException {
  // Check that the connection to the Sentry Server is up. Note that establishing it requires sentry-site.xml
  if (!open) {
    throw new IllegalStateException("Binding has been closed");
  }
  boolean isDebug = LOG.isDebugEnabled();
  if(isDebug) {
    LOG.debug("Going to authorize statement " + hiveOp.name() +
        " for subject " + subject.getName());
  }

 
  Map<AuthorizableType, EnumSet<DBModelAction>> requiredInputPrivileges =
      stmtAuthPrivileges.getInputPrivileges();
  
  Map<AuthorizableType, EnumSet<DBModelAction>> requiredOutputPrivileges =
      stmtAuthPrivileges.getOutputPrivileges();
  

  boolean found = false;
  // Start with the inputs (the input list) and check read privileges; the nested for loops implement the contains test between the two sets
  for(AuthorizableType key: requiredInputPrivileges.keySet()) {
    for (List<DBModelAuthorizable> inputHierarchy : inputHierarchyList) {
      if (getAuthzType(inputHierarchy).equals(key)) {
        found = true;
        if (!authProvider.hasAccess(subject, inputHierarchy, requiredInputPrivileges.get(key), activeRoleSet)) {
          // Fail as soon as a single read privilege check does not pass
          throw new AuthorizationException("User " + subject.getName() +
              " does not have privileges for " + hiveOp.name());
        }
      }
    }
    // For read privileges, a few special cases need separate handling
    if(!found && !(key.equals(AuthorizableType.URI)) &&  !(hiveOp.equals(HiveOperation.QUERY))
        && !(hiveOp.equals(HiveOperation.CREATETABLE_AS_SELECT))) {
      //URI privileges are optional for some privileges: anyPrivilege, tableDDLAndOptionalUriPrivilege
      //Query can mean select/insert/analyze where all of them have different required privileges.
      //CreateAsSelect can has table/columns privileges with select.
      //For these alone we skip if there is no equivalent input privilege
      //TODO: Even this case should be handled to make sure we do not skip the privilege check if we did not build
      //the input privileges correctly
      throw new AuthorizationException("Required privilege( " + key.name() + ") not available in input privileges");
    }
    found = false;
  }
  // Then move on to the outputs (the output list) and check write privileges; the nested for loops again implement the contains test between the two sets
  for(AuthorizableType key: requiredOutputPrivileges.keySet()) {
    for (List<DBModelAuthorizable> outputHierarchy : outputHierarchyList) {
      if (getAuthzType(outputHierarchy).equals(key)) {
        found = true;
        if (!authProvider.hasAccess(subject, outputHierarchy, requiredOutputPrivileges.get(key), activeRoleSet)) {
          // Fail as soon as a single write privilege check does not pass; notice how symmetric the code is
          throw new AuthorizationException("User " + subject.getName() +
              " does not have privileges for " + hiveOp.name());
        }
      }
    }
    // Likewise, for write operations a few special cases need separate handling
    if(!found && !(key.equals(AuthorizableType.URI)) &&  !(hiveOp.equals(HiveOperation.QUERY))) {
      //URI privileges are optional for some privileges: tableInsertPrivilege
      //Query can mean select/insert/analyze where all of them have different required privileges.
      //For these alone we skip if there is no equivalent output privilege
      //TODO: Even this case should be handled to make sure we do not skip the privilege check if we did not build
      //the output privileges correctly
      throw new AuthorizationException("Required privilege( " + key.name() + ") not available in output privileges");
    }
    found = false;
  }

}
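The listing delegates the real decision to authProvider.hasAccess, which is not shown. As a rough mental model only, the sketch below checks whether a granted privilege covers a requested hierarchy, using strings in the style of Sentry's policy-file syntax (server=...->db=...->action=...). The class and its matching rules are a simplification written for this article and should not be read as Sentry's actual provider logic.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of an implication check; not Sentry's implementation.
final class PrivilegeImplication {

    // A grant covers a request if it matches level by level and is no narrower than the request.
    static boolean implies(String granted, List<String> hierarchy, String action) {
        String[] parts = granted.split("->");
        int last = parts.length - 1;
        if (!parts[last].startsWith("action=")) {
            return false; // this sketch only understands grants that end in action=...
        }
        if (last > hierarchy.size()) {
            return false; // the grant is narrower than the requested entity
        }
        for (int i = 0; i < last; i++) {
            String[] grantedKv = parts[i].split("=", 2);
            String[] requestedKv = hierarchy.get(i).split("=", 2);
            if (grantedKv.length < 2 || requestedKv.length < 2) {
                return false;
            }
            boolean sameLevel = grantedKv[0].equalsIgnoreCase(requestedKv[0]);
            boolean sameName = grantedKv[1].equals("*") || grantedKv[1].equalsIgnoreCase(requestedKv[1]);
            if (!sameLevel || !sameName) {
                return false;
            }
        }
        String grantedAction = parts[last].substring("action=".length());
        return grantedAction.equals("*") || grantedAction.equalsIgnoreCase(action);
    }

    // Access is granted if any single grant covers the requested hierarchy and action.
    static boolean hasAccess(Set<String> grants, List<String> hierarchy, String action) {
        for (String grant : grants) {
            if (implies(grant, hierarchy, action)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> grants = new HashSet<>(Arrays.asList("server=server1->db=sales->action=select"));
        List<String> table = Arrays.asList("server=server1", "db=sales", "table=orders");
        System.out.println(hasAccess(grants, table, "select")); // true, the db grant covers its tables
        System.out.println(hasAccess(grants, table, "insert")); // false, the action differs
    }
}
```

This is where the set-contains idea meets the hierarchy: a grant at the database level counts as containing every table beneath it.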

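Closing Thoughts

Over this series of articles we have worked through several aspects of Sentry's design:

The database design
How grants made through Hive are synchronized to HDFS and to other components
How failure cases are handled
How the core validation is implemented in the source

From here you can try designing a small SQL validation system of your own. The creator of the next open source project could be you!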