KDB文档 - Queries: q-sql (Part 1)

616 阅读6分钟

Overview

The first important difference is that a q table has ordered rows and columns. This is particularly useful when dealing with the situation where records arrive in time order. Appending them to a table ensures that they enter – and stay – in order. Subsequent select operations always retrieve the records in order without any need for sorting.

A second difference is that a q table is stored physically as a collection of column lists. This means that operations on column data are vector operations. Moreover, for simple column lists, atomic, aggregate and uniform functions applied to columns are especially simple and fast since they reduce to direct memory addressing.

A third difference is that q-sql provides upsert semantics. Recall that upsert semantics on a dictionary mean that when a key-value pair is applied with , and the key is present, the value is updated; otherwise the pair is inserted. In the context of tables and keyed tables, which are both dictionaries, this has far-reaching consequences for many common operations, including joins. Upsert semantics permeate q-sql.

Inserting Records

Append Using Assign

q)t:([] name:`symbol$(); iq:`int$())
q)t,:`name`iq!(`Beeblebrox; 42)
q)t,:`name`iq!(`Dent; 98.0)
'type

q)t,:(`Prefect; 126)


q)kt:([eid:`long$()] name:`symbol$(); iq:`long$())
q)kt,:(1002; `Dent; 98)
q)kt
eid | name id
----| -------
1002| Dent 98

Basic insert

q)`kt insert (1001; `Maven; 150)
q)`kt insert (1001; `Maven; 150)
q)kt
eid | name  id 
----| ---------
1002| Dent  98 
1001| Maven 150

You can also use insert in prefix form, possibly with the table name projected. 

q)insert[`kt;] (1003; `Peter; 120)
q)kt
eid | name  id 
----| ---------
1002| Dent  98 
1001| Maven 150
1003| Peter 120

Bulk Columnar Insert

q)t:([] name:`Dent`Beeblebrox; iq:98 42)
q)`t insert ((`Prefect; 126); (`Marvin; 200))
'type
q)`t insert (`Prefect`Marvin; 126 200)
2 3

Insert into Empty Tables

q)t:([] name:(); iq:())
q)`t insert (`Dent;98.0)
,0

It is also possible to insert a list of conforming records (i.e., a table) into a table that does not exist. 

q)tnew
'tnew
q)`tnew insert enlist `c1`c2!(`a; 10)
q)tnew
c1 c2
-----
a  10

The records in the keyed table are stored in insert order rather than key order.

Upsert

Upsert Replacing Insert

q)t:([] name:`Dent`Beeblebrox`Prefect; iq:42 98 126)
q)`t upsert (`name`iq)!(`Slartibartfast; 134)
q)`t upsert (`Marvin; 150)
q)`t upsert ([] name:`Slartibartfast`Marvin; iq:134 200)
name           iq 
------------------
Dent           42 
Beeblebrox     98 
Prefect        126
Slartibartfast 134
Marvin         150
Slartibartfast 134
Marvin         200
q)t:3#t
name       iq 
--------------
Dent       42 
Beeblebrox 98 
Prefect    126

Upsert by Name

A limitation of insert is that it uses pass-by-name, so it can only operate against global tables. In contrast, upsert supports both pass-by-name and pass-by-value. Thus it can be used with anonymous or local tables.

q)([] c1:`a`b; c2:10 20) upsert (`c; 30)
c1 c2
-----
a  10
b  20
c  30
q)f:{t:([] c1:`a`b; c2:10 20); t upsert x}
q)f (`c; 30)
c1 c2
-----
a  10
b  20
c  30

Upsert on Keyed Tables

We have seen that insert has undesirable semantics on keyed tables – i.e., it rejects “duplicate” keys. What we really want is, well, upsert semantics.

Upsert on Persisted Tables

You can use upsert to append records to serialized and splayed tables. Simply pass the handle of the file or splayed directory as the name of the table. This is the final strike against insert, since it cannot do this.

q)`:/q4m/tser set ([] c1:`a`b; c2:1.1 2.2)
q)`:/q4m/tser upsert (`c; 3.3)
`:/q4m/tser
q)get `:/q4m/tser
c1 c2 
------
a  1.1
b  1.2
c  3.3

Next we splay a table and then append a row to it.

q)`:/q4m/tsplay/ set ([] c1:`sym?`a`b; c2:1.1 2.2)
`:/q4m/tsplay/
q)`:/q4m/tsplay upsert (`sym?`c; 3.3)
`:/q4m/tsplay
q)select from `:/q4m/tsplay
c1 c2 
------
a  1.1
b  1.2
c  3.3

Upserting to a splayed table does not read the persisted image into memory; rather, it appends to the ends of the column files. This allows incremental creation of large splayed (or partitioned) tables by upserting chunks that comfortably fit into memory.

The select Template

The result of select is always a table.

Syntax

select from texp

q)t:([] c1:`a`b`c; c2:10 20 30; c3:1.1 2.2 3.3)
q)select c1, res:2*c2 from t
c1 res
------
a  20
b  40
c  60



q)select c1, c1, 2*c2, c2+c3, string c3 from t
c1 c11 x  c2   c3
--------------------
a  a   20 11.1 "1.1"
b  b   40 22.2 "2.2"
c  c   60 33.3 "3.3"

The Virtual Column i

A virtual column i represents the offset of each record in the table – i.e., i is the row number.

q)select i, c1 from t
x c1
----
0 a
1 b
2 c

select distinct

q)select distinct from ([] c1:`a`b`a; c2:10 20 10)
c1 c2
-----
a  10
b  20

select[]

You can return the first or last n records in a select by using function parameter syntax after select. A positive integer parameter returns the first records, a negative parameter the last.

q)select [2] from t where c1 <> `a
c1 c2 c3 
---------
b  20 2.2
c  30 3.3

same as 

q)2#select from t where c1 <> `a
c1 c2 c3 
---------
b  20 2.2
c  30 3.3

The difference is that the # construct requires computing the entire result set and then keeping only the desired rows, whereas select[n] only extracts the desired number of rows. The latter will be faster and consume less memory for large tables.

This syntax is extended to select[n m] where n is the starting row number and m is the number of rows.

q)select[1 2] from s where city<>`athens

One final extension of the syntax specifies a sorting criterion inside the brackets. For ascending sort, place < before a column name and for descending sort use >.

q)select [>c3] from t where c2 >= 20
c1 c2 c3 
---------
c  30 3.3
b  20 2.2

 You can combine the two syntax extensions by separating them with a semicolon.

q)select [1; >c3] from t where c2 >= 20
c1 c2 c3 
---------
c  30 3.3

Select on Nested Columns

q)show tnest:([] c1:`a`b`c; c2:(10 20 30; enlist 40; 50 60))
c1 c2
-----------
a  10 20 30
b  ,40
c  50 60

q)select avg each c2 from tnest
c2
--
20
40
55

use each for process

q)update c3:(1.1 2.2 3.3; enlist 4.4; 5.5 6.6) from `tnext
`tnext
q)select wtavg: c2 wavg' c3 from tnext
wtavg   
--------
2.566667
4.4     
6.1 

Filtering with where

q)t:([] c1:`a`b`c; c2:10 20 30; c3:1.1 2.2 3.3)
q)t where t[`c2]>15select from t where c2>15
c1 c2 c3
---------
b  20 2.2
c  30 3.3

In fact, you can provide a boolean list to where.

q)select from t where 011b
c1 c2 c3
---------
b  20 2.2
c  30 3.3

The Virtual Column i in where

The virtual column i is useful for paginating a table. Use within, which returns a boolean indicating whether the left operand is in the closed interval specified in the right operand, to determine the bounds of the page.

q)tbig:100#t
q)select from tbig where i within 50 99
c1 c2 c3 
---------
c  30 3.3
a  10 1.1
b  20 2.2
c  30 3.3
a  10 1.1
b  20 2.2
c  30 3.3
..
q)s:50
q)e:99
q)count select from tbig where i within (s;e)
50

Multiple Where Subphrases

q)r1:select from t where c2>15,c3<3.0
q)r2:select from t where (c2>15)&c3<3.0
q)r1~r2
1b

However, since the Where subphrases are sequenced from left-to-right, their order affects the actual processing. 

q)t:([] c1:00:00:00.000+til 1000000;c2:1000000?`a`b;c3:1000000?100.)
q)\t select from t where c1 within 00:00:01.000 00:00:01.999, c2=`a
0
q)\t select from t where c2=`a, c1 within 00:00:01.000 00:00:01.999
15

Nested Columns in Where

q)t:([] f:1.1 2.2 3.3; s:("abc";enlist "d";"ef"))
q)select from t where s~"ef"
f s
---
q)select from t where s~\:"ef"
f s
--------
3.3 "ef"

The first query does not achieve the desired result because it asks if the entire column matches the specified string. The second query works because it tests the specified string for match against each string in the column.

fby in Where

In SQL you would use HAVING, but q is having none of that. Instead use fby in the Where phrase.

q)select from p where weight=(max;weight) fby city
p | name  color weight city  
--| -------------------------
p2| bolt  green 17     paris 
p3| screw blue  17     rome  
p6| cog   red   19     london

Now we include another Where phrase for the desired result.

q)select from p where weight=(max;weight) fby city,color=`blue
p | name  color weight city
--| -----------------------
p3| screw blue  17     rome

To group on multiple columns, encapsulate them in an anonymous table in the right operand of fby.

q)t:([]sym:`IBM`IBM`MSFT`IBM`MSFT;
    ex:`N`O`N`N`N;
    time:12:10:00 12:30:00 12:45:00 12:50:00 13:30:00;
    price:82.1 81.95 23.45 82.05 23.40)
q)select from t where price=(max;price) fby ([]sym;ex)
sym  ex time     price
----------------------
IBM  N  12:10:00 82.1 
IBM  O  12:30:00 81.95
MSFT N  12:45:00 23.45

Grouping and Aggregation

Aggregation without Grouping

q)\l sp.q
q)select total:sum qty, mean:avg qty from sp
total mean    
--------------
3100  258.3333

Grouping without Aggregation

The By phrase groups rows having common values in specified column(s), much like GROUP BY in SQL. 

q)t:([] c1:`a`b`a`b`c; c2:10 20 30 40 50)
q)t[`c2] group t[`c1]
a| 10 30
b| 20 40
c| ,50
q)select c2 by c1 from t
c1| c2
--| -----
a | 10 30
b | 20 40
c | ,50

Ungroup

q)ungroup select c2 by c1 from t
c1 c2
-----
a  10
a  30
b  20
b  40
c  50

Grouping with Aggregation

Normally you will group using by together with aggregation in the Select phrase. The effect is to aggregate along the groups, collapsing each group of rows into a single record. The result is a keyed table whose key columns are the grouped column(s).

q)select sum c2 by c1 from t
c1| c2
--| --
a | 40
b | 60
c | 50

To group on multiple columns, specify multiple by subphrases, which results in a compound key in the result.

q)t:([] desk:`a`b`a`b`a`b; acct:`1`2`3`4`1`4; pnl:1.1 -2.2 3.3 4.4 5.5 -.5)
q)select ct:count desk, sum pnl by desk,acct from t
desk acct| ct pnl 
---------| -------
a    1   | 2  6.6 
a    3   | 1  3.3 
b    2   | 1  -2.2
b    4   | 2  3.9

Following is a useful example that averages the observations of our time series in 100 millisecond buckets.

q)t:([] c1:00:00:00.000+til 1000000;c2:1000000?`a`b;c3:1000000?100.)
q)select avg c3 by 100 xbar c1, c2 from t
c1           c2| c3      
---------------| --------
00:00:00.000 a | 55.26494
00:00:00.000 b | 41.81758
00:00:00.100 a | 48.88826
00:00:00.100 b | 46.10946

The exec Template

select always returns a table, the result type of exec depends on the number of columns in its Select phrase. One column yields a list; more than one column yields a dictionary.

q)t:([] name:`a`b`c`d`e; state:`NY`FL`OH`NY`HI)
q)select name, distinct state from t
'length
q)exec name, distinct state from t
name | `a`b`c`d`e
state| `NY`FL`OH`HI

A common use of exec is when there is only one column in the aggregate phrase and no By phrase. The result is the computed column list devoid of other structure – i.e., not a dictionary or table. This is useful to extract a column dynamically. 

q)select name from t
name
----
a
b
c
d
e
q)exec name from t
`a`b`c`d`e

When using exec to extract a single column of a table, you can place constraints on other columns.

q)exec name from t where state in `NY`HI
`a`d`e

The update Template

Basic update

update from texp

If the left of a colon is a column that exists in the table, that column is updated with the result of the expression to the right of the colon. If the left of a colon is not a column in the table, a new column of that name with the result of the expression is added to the end of the column list. 

q)t:([] c1:`a`b`c; c2:10 20 30)
q)update c1:`x`y`z from t
c1 c2
-----
x  10
y  20
z  30
q)t
_
q)update c3:`x`y`z from t
c1 c2 c3
--------
a  10 x
b  20 y
c  30 z

q)(update c2:c2+100 from t)~update c2+100 from t
1b

q)t:([] c1:`a`b`c; c2:10 20 30)
q)update c3:c2+100 from t where c1<>`a
c1 c2 c3 
---------
a  10    
b  20 120
c  30 130


q)update c3:1b from t where c2>15
c1 c2 c3
--------
a  10 0
b  20 1
c  30 1

update-by

When the By phrase is present, the update operation is performed along groups. This is most useful with aggregate and uniform functions. For an aggregate function, the entire group gets the value of the aggregation on the group.

q)update avg weight by city from p
p | name  color weight city  
--| -------------------------
p1| nut   red   15     london
p2| bolt  green 14.5   paris 
p3| screw blue  17     rome  
p4| screw red   15     london
p5| cam   blue  14.5   paris 
p6| cog   red   15     london

A uniform function is applied along the group in place. This can be used to compute cumulative volume of orders, for example.

q)update cumqty:sums qty by s from sp
s p  qty cumqty
---------------
0 p1 300 300   
0 p2 200 500   
0 p3 400 900   
0 p4 200 1100  
3 p5 100 100   
0 p6 100 1200  
1 p1 300 300   
1 p2 400 700   
2 p2 200 200   
3 p2 200 300   
3 p4 300 600   
0 p5 400 1600

出处:code.kx.com/q4m3/9\_Que…

作者:Jeffry A. Borror