Best way to join related but dissimilar MySQL tables


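Two dissimilar tables, table a (t_a) and table b (t_b), need to be linked from Python code in a Django project. Table a (t_a) has the columns id, name, last, first, email, state, country. Table b (t_b) has the columns id, sn, given, nick, email, l, c. The goal is to link the rows of the two tables. The link must be 1-to-1 and must satisfy the following matching conditions: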

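  • A minimum match count, min_match, must be defined. If the number of matching fields between a row of t_a and a row of t_b is less than min_match, the pair is not considered a valid match.
  • The two tables may have different numbers of rows, but the link must still be 1-to-1: many-to-one pairings are not allowed.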
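
The min_match rule above can be sketched in plain Python. This is only an illustration: the helper names (FIELD_PAIRS, count_matches, is_valid_match) are hypothetical, the sample rows are made up, and the column pairing is an assumption that mirrors the pairing used in the Python query later in this article.

```python
# Assumed pairing of t_a columns with t_b columns (not confirmed by the schema).
FIELD_PAIRS = [
    ("email", "email"),
    ("name", "sn"),
    ("last", "given"),
    ("first", "nick"),
    ("state", "l"),
    ("country", "c"),
]


def count_matches(row_a, row_b, field_pairs=FIELD_PAIRS):
    """Count the column pairs whose values are present and equal."""
    matches = 0
    for col_a, col_b in field_pairs:
        value_a = row_a.get(col_a)
        value_b = row_b.get(col_b)
        # Missing or empty values carry no information, so they never match.
        if value_a and value_b and value_a == value_b:
            matches += 1
    return matches


def is_valid_match(row_a, row_b, min_match):
    """A pair is a valid candidate only if at least min_match fields agree."""
    return count_matches(row_a, row_b) >= min_match


# Made-up sample rows, one from each table.
row_a = {"id": 1, "name": "n1", "last": "l1", "first": "f1",
         "email": "x@example.com", "state": "CA", "country": "US"}
row_b = {"id": 10, "sn": "n1", "given": "l1", "nick": "zz",
         "email": "x@example.com", "l": "NY", "c": "US"}

print(count_matches(row_a, row_b))      # 4 (email, name/sn, last/given, country/c)
print(is_valid_match(row_a, row_b, 3))  # True
print(is_valid_match(row_a, row_b, 5))  # False
```

This covers only the per-pair threshold. The 1-to-1 requirement is a separate constraint across pairs and is not enforced here.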

Ideally the merged result should come directly from MySQL, either through an SQL function or by creating a new table.
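
To make the "new table" idea concrete, here is a minimal sketch that materialises the candidate pairs into a table. It makes several assumptions: it runs against an in-memory SQLite database purely so that it works without a MySQL server, the table name t_match and the helper build_match_table are hypothetical, the column pairing is assumed, and the sample rows are made up. The INSERT ... SELECT statement uses only constructs that MySQL also accepts, but it has not been verified against the author's schema.

```python
import sqlite3

# Sum of per-column equality tests. Each comparison yields 1, 0 or NULL;
# IFNULL turns NULL into 0. Email is the join key, so it contributes the
# leading 1. The column pairing is an assumption.
SCORE_SQL = """
    1
    + IFNULL(t_a.name = t_b.sn, 0)
    + IFNULL(t_a.last = t_b.given, 0)
    + IFNULL(t_a.first = t_b.nick, 0)
    + IFNULL(t_a.state = t_b.l, 0)
    + IFNULL(t_a.country = t_b.c, 0)
"""


def build_match_table(conn, min_match):
    """Fill t_match with every (a_id, b_id) pair that has enough matching fields."""
    conn.execute("DROP TABLE IF EXISTS t_match")
    conn.execute(
        "CREATE TABLE t_match (a_id INTEGER, b_id INTEGER, score INTEGER)"
    )
    # SCORE_SQL is a fixed constant, so interpolating it is safe;
    # the threshold is passed as a bound parameter.
    conn.execute(
        f"""
        INSERT INTO t_match (a_id, b_id, score)
        SELECT t_a.id, t_b.id, {SCORE_SQL}
        FROM t_a
        INNER JOIN t_b ON t_a.email = t_b.email
        WHERE {SCORE_SQL} >= ?
        """,
        (min_match,),
    )
    conn.commit()


# Made-up demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_a (id INTEGER PRIMARY KEY, name TEXT, last TEXT, first TEXT,
                      email TEXT, state TEXT, country TEXT);
    CREATE TABLE t_b (id INTEGER PRIMARY KEY, sn TEXT, given TEXT, nick TEXT,
                      email TEXT, l TEXT, c TEXT);
    INSERT INTO t_a VALUES (1, 'n1', 'l1', 'f1', 'x@example.com', 'CA', 'US');
    INSERT INTO t_a VALUES (2, 'n2', 'l2', 'f2', 'y@example.com', 'NY', 'US');
    INSERT INTO t_b VALUES (10, 'n1', 'l1', 'zz', 'x@example.com', 'CA', 'US');
    INSERT INTO t_b VALUES (20, 'qq', 'rr', 'ss', 'y@example.com', NULL, 'US');
""")

build_match_table(conn, min_match=3)
matches = conn.execute(
    "SELECT a_id, b_id, score FROM t_match ORDER BY a_id"
).fetchall()
print(matches)  # [(1, 10, 5)]
```

The second pair shares only the email and the country, so its score of 2 falls below the threshold and it is left out. The table holds candidate pairs only; it does not by itself guarantee a 1-to-1 result.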

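2. Solution

Option 1: Using a stored procedure and a cursor

A stored procedure with a cursor can handle this task. The following is an example outline of such a procedure: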

DELIMITER $$
CREATE PROCEDURE `proc_name`(IN min_pairs INT)
BEGIN
  -- min_pairs is a placeholder parameter for the threshold that the
  -- original outline left unspecified (<some_min_value>).
  DECLARE done INT DEFAULT 0;
  DECLARE a_id BIGINT UNSIGNED;
  DECLARE b_id BIGINT UNSIGNED;
  DECLARE x_count INT;

  -- Candidate pairs: rows of t_a and t_b that share an email address.
  DECLARE cur1 CURSOR FOR
    SELECT t_a.id, t_b.id FROM t_a INNER JOIN t_b ON t_a.email = t_b.email;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  SELECT COUNT(*) INTO x_count
    FROM t_a INNER JOIN t_b ON t_a.email = t_b.email;

  IF x_count > min_pairs THEN

    OPEN cur1;

    REPEAT
      FETCH cur1 INTO a_id, b_id;
      IF NOT done THEN

        -- do something here like update rows, remove rows, etc.
        -- a_id and b_id hold the two id values for the two tables which
        -- I assume to be primary keys

      END IF;
    UNTIL done END REPEAT;

    CLOSE cur1;

  END IF;
END
$$

DELIMITER ;

Option 2: Using Python code

The task can also be done in Python. The following is example Python code:

import mysql.connector

def connect_to_database():
  """Establishes a connection to the MySQL database."""
  connection = mysql.connector.connect(
      host="localhost",
      user="username",
      password="password",
      database="database_name"
  )
  return connection

def get_matched_rows(min_match):
  """Retrieves the matched rows from the two tables based on the given minimum match."""
  connection = connect_to_database()
  cursor = connection.cursor()

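  # Email equality is required by the JOIN and counts as one matching field.
  # Each IFNULL(x = y, 0) adds 1 when the two columns are equal and 0
  # otherwise (including when either side is NULL). The column pairings are
  # kept from the original query; adjust them to your schema.
  query = """
    SELECT t_a.id AS t_a_id, t_b.id AS t_b_id
    FROM t_a
    INNER JOIN t_b ON t_a.email = t_b.email
    WHERE 1
      + IFNULL(t_a.name = t_b.sn, 0)
      + IFNULL(t_a.last = t_b.given, 0)
      + IFNULL(t_a.first = t_b.nick, 0)
      + IFNULL(t_a.state = t_b.l, 0)
      + IFNULL(t_a.country = t_b.c, 0) >= %s
  """
  cursor.execute(query, (min_match,))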
  matched_rows = cursor.fetchall()

  cursor.close()
  connection.close()

  return matched_rows

def main():
  """Gets the matched rows and prints them."""
  min_match = 3
  matched_rows = get_matched_rows(min_match)

  for row in matched_rows:
    print("t_a_id: {}, t_b_id: {}".format(row[0], row[1]))

if __name__ == "__main__":
  main()
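
A query of this kind returns candidate pairs, but the same t_a row can appear with several t_b rows and vice versa. One possible way to enforce the 1-to-1 requirement is a greedy pass over the candidates, sketched below. It assumes each candidate is a hypothetical (a_id, b_id, score) triple, where score is the number of matching fields. The code above returns only the two ids, so the score column would have to be added to the SELECT. Greedy selection is simple but is not guaranteed to maximise the total number of pairs.

```python
def pick_one_to_one(scored_pairs):
    """Greedily keep the best-scoring pairs so that no id is used twice."""
    used_a, used_b = set(), set()
    chosen = []
    # Highest score first; ties are broken by ids so the result is deterministic.
    ordered = sorted(scored_pairs, key=lambda p: (-p[2], p[0], p[1]))
    for a_id, b_id, score in ordered:
        if a_id in used_a or b_id in used_b:
            continue  # one side is already paired, so skip this candidate
        used_a.add(a_id)
        used_b.add(b_id)
        chosen.append((a_id, b_id, score))
    return chosen


# Made-up candidates: (t_a id, t_b id, number of matching fields).
candidates = [
    (1, 10, 5),
    (1, 11, 3),
    (2, 10, 4),
    (2, 12, 3),
    (3, 11, 3),
]
print(pick_one_to_one(candidates))
# [(1, 10, 5), (2, 12, 3), (3, 11, 3)]
```

Here row 2 of t_a loses its best candidate (10) to row 1, which scored higher, and falls back to its next candidate (12).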

Depending on your needs, you can use either the stored procedure or the Python code to accomplish this task.