题目描述
表: Employee
+--------------+---------+
| 列名 | 类型 |
+--------------+---------+
| id | int |
| name | varchar |
| salary | int |
| departmentId | int |
+--------------+---------+
在 SQL 中,id是此表的主键。
departmentId 是 Department 表中 id 的外键(在 Pandas 中称为 join key)。
此表的每一行都表示员工的 id、姓名和工资。它还包含他们所在部门的 id。
表: Department
+-------------+---------+
| 列名 | 类型 |
+-------------+---------+
| id | int |
| name | varchar |
+-------------+---------+
在 SQL 中,id 是此表的主键列。
此表的每一行都表示一个部门的 id 及其名称。
查找出每个部门中薪资最高的员工。
按 任意顺序 返回结果表。
查询结果格式如下例所示。
示例 1:
输入:
Employee 表:
+----+-------+--------+--------------+
| id | name | salary | departmentId |
+----+-------+--------+--------------+
| 1 | Joe | 70000 | 1 |
| 2 | Jim | 90000 | 1 |
| 3 | Henry | 80000 | 2 |
| 4 | Sam | 60000 | 2 |
| 5 | Max | 90000 | 1 |
+----+-------+--------+--------------+
Department 表:
+----+-------+
| id | name |
+----+-------+
| 1 | IT |
| 2 | Sales |
+----+-------+
输出:
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT | Jim | 90000 |
| Sales | Henry | 80000 |
| IT | Max | 90000 |
+------------+----------+--------+
解释: Max 和 Jim 在 IT 部门的工资都是最高的,Henry 在销售部的工资最高。
解题思路
我们可以使用子查询和聚合函数来解决这个问题。首先,我们将表按照部门分组,并计算每个部门的最高工资。然后,我们使用这个结果作为子查询,并与原始表进行连接,以获取对应的员工姓名。
代码实现
MySQL 实现
SELECT
e.DepartmentId,
e.MaxSalary,
e.Name AS EmployeeName
FROM
(SELECT
DepartmentId, MAX(Salary) AS MaxSalary
FROM
Employee
GROUP BY DepartmentId) e
JOIN
Employee em ON e.DepartmentId = em.DepartmentId
AND e.MaxSalary = em.Salary;
SELECT
Department.name AS 'Department',
Employee.name AS 'Employee',
Salary
FROM Employee
JOIN Department
ON Employee.DepartmentId = Department.Id
# join之后过滤
WHERE (Employee.DepartmentId, Salary) IN
(SELECT DepartmentId, MAX(Salary)
FROM Employee
GROUP BY DepartmentId)
;
Pandas 实现
import pandas as pd
def department_highest_salary(employee: pd.DataFrame, department: pd.DataFrame) -> pd.DataFrame:
# 合并表以及重命名
df = employee.merge(department, left_on='departmentId', right_on='id', how='left')
df.rename(columns={'name_x': 'Employee', 'name_y': 'Department', 'salary': 'Salary'}, inplace=True)
# 选择工资等于部门最高工资的员工
max_salary = df.groupby('Department')['Salary'].transform('max')
df = df[df['Salary'] == max_salary]
return df[['Department', 'Employee', 'Salary']]