Spark SQL和Hive中的函数(三):时间函数

6 阅读3分钟

更多干货抢先看:大数据干货合集

1. current_date / current_timestamp

获取当前时间

select current_date;
select current_timestamp;

2. 从日期时间中提取字段/格式化时间

1)year、month、day、dayofmonth、hour、minute、second

-- 20
select day("2020-12-20");

2)dayofweek(1 = Sunday, 2 = Monday, ..., 7 = Saturday)、dayofyear

-- 7
select dayofweek("2020-12-12");

3)weekofyear(date)

/**
   * Extracts the week number as an integer from a given date/timestamp/string.
   *
   * A week is considered to start on a Monday and week 1 is the first week with more than 3 days,
   * as defined by ISO 8601
   *
   * @return An integer, or null if the input was a string that could not be cast to a date
   * @group datetime_funcs
   * @since 1.5.0
   */
  def weekofyear(e: Column): Column = withExpr { WeekOfYear(e.expr) }

-- 50
select weekofyear("2020-12-12");

4)trunc

截取某部分的日期,其他部分默认为01。第二个参数: YEAR、YYYY、YY、MON、MONTH、MM

-- 2020-01-01
select trunc("2020-12-12", "YEAR");

-- 2020-12-01
select trunc("2020-12-12", "MM");

5)date_trunc

参数:YEAR、YYYY、YY、MON、MONTH、MM、DAY、DD、HOUR、MINUTE、SECOND、WEEK、QUARTER

-- 2012-12-12 09:00:00
select date_trunc("HOUR" ,"2012-12-12T09:32:05.359");

6)date_format

按照某种格式格式化时间

-- 2020-12-12
select date_format("2020-12-12 12:12:12", "yyyy-MM-dd");

3. 日期时间转换

1)unix_timestamp

返回当前时间的unix时间戳。

select unix_timestamp();
-- 1609257600
select unix_timestamp("2020-12-30", "yyyy-MM-dd");

2)from_unixtime

将unix epoch(1970-01-01 00:00:00 UTC)中的秒数转换为以给定格式表示当前系统时区中该时刻的时间戳的字符串。

select from_unixtime(1609257600, "yyyy-MM-dd HH:mm:ss");

3)to_unix_timestamp

将时间转化为时间戳。

-- 1609257600
select to_unix_timestamp("2020-12-30", "yyyy-MM-dd");

4)to_date / date

将时间字符串转化为date。

-- 2020-12-30
select to_date("2020-12-30 12:30:00");
select date("2020-12-30");

5)to_timestamp

将时间字符串转化为timestamp。

select to_timestamp("2020-12-30 12:30:00");

6)quarter

从给定的日期/时间戳/字符串中提取季度。

-- 4
select quarter("2020-12-30");

4. 日期、时间计算

1)months_between(end, start)

返回两个日期之间的月数。参数1为截止时间,参数2为开始时间

-- 3.94959677
select months_between("1997-02-28 10:30:00", "1996-10-30");

2)add_months

返回某日期后n个月后的日期。

-- 2020-12-28
select add_months("2020-11-28", 1);

3)last_day(date)

返回某个时间的当月最后一天

-- 2020-12-31
select last_day("2020-12-01");

4)next_day(start_date, day_of_week)

返回某时间后the first date基于specified day of the week。

参数1:开始时间。

参数2:Mon、Tue、Wed、Thu、Fri、Sat、Sun。

-- 2020-12-07
select next_day("2020-12-01", "Mon");

5)date_add(start_date, num_days)

返回指定时间增加num_days天后的时间

-- 2020-12-02
select date_add("2020-12-01", 1);

6)datediff(endDate, startDate)

两个日期相差的天数

-- 3
select datediff("2020-12-01", "2020-11-28");

7)关于UTC时间

-- to_utc_timestamp(timestamp, timezone) - Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.


select to_utc_timestamp("2020-12-01", "Asia/Seoul") ;


-- from_utc_timestamp(timestamp, timezone) - Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.


select from_utc_timestamp("2020-12-01", "Asia/Seoul");

关联文章:

经典的SparkSQL/Hive-SQL/MySQL面试-练习题

更多干货抢先看:大数据干货合集