KDB文档 - Queries: q-sql (Part 2)

344 阅读9分钟

The delete Template

delete from texp

When the table is passed by value, the operation is on a copy. When the table is passed by name, the operation is in place. Deleting from a copy,

q)t:([] c1:`a`b`c; c2:10 20 30)
q)delete c1 from t
c2
--
10
20
30
q)delete from t where c2>15
c1 c2
-----
a  10
q)t
c1 c2
-----
a  10
b  20
c  30

To delete in place,

q)delete from `t where c2=30
q)t
c1 c2
-----
a  10
b  20
q)delete c2 from `t
q)t
c1
--
a 
b 

When you want to select all but a few columns, it is easier to delete the ones you don’t want than list all the ones you do.

q)t:([] c1:1 2; c2:`a`b; c3:1.1 2.2; c4:2015.01.01 2015.01.02)
q)(select c1, c2, c4 from t)~delete c3 from t
1b

Sorting

 In contrast to SQL, there is no equivalent to ORDER BY in the select template. Instead, built-in functions that sort tables are applied after select.

xasc

To work on a copy, pass by value.

q)t:([] c1:`a`b`c`a; c2:20 10 40 30)
q)`c2 xasc t
c1 c2
-----
b  10
a  20
a  30
c  40
q)`c1`c2 xasc t
c1 c2
-----
a  20
a  30
b  10
c  40

To sort in place, pass by name.

q)`c1`c2 xasc `t
q)t
c1 c2
-----
a  20
a  30
b  10
c  40

xdesc

q)t:([] c1:`a`b`c`a; c2:20 10 40 30)
q)`c1`c2 xdesc t
c1 c2
-----
c  40
b  10
a  30
a  20

Mixed Sort

q)`c1 xasc `c2 xdesc t

Joins

The essence of relational database design is normalizing data using relations and keys and then reassembling with joins. Normalization eliminates duplicate data, which takes up space and is hard to keep consistent. Joins restore the original flat rectangular form that makes data easy to work with (there’s a reason spreadsheets are so popular). Simple use cases include a master-detail relation or a lookup table.

Implicit Join

Given a primary key table m with key column(s) k and a table d with a foreign key linking to k, a left join can be expressed in various SQL notations. For example,

m LEFT JOIN d ON m.k = d.k

A SELECT statement for this join refers to columns in the join by using dot notation based on the constituent tables.

SELECT d.cold, m.colm FROM m LEFT JOIN d WHERE m.k = d.k

Implicit joins extend to the situation in which the targeted keyed table itself has a foreign key to another keyed table.

q)emaster:([eid:1001 1002 1003 1004 1005] currency:`gbp`eur`eur`gbp`eur)
q)update eid:`emaster$1001 1002 1005 1004 1003 from `s
q)select s.name, qty, s.eid.currency from sp

Ad hoc Left Join (lj)

To create an ad-hoc left outer join between tables that could have a foreign-key relationship, use the binary lj. When the foreign key exists, that linkage is used; otherwise, the linkage is constructed dynamically. The join is 2-3 times faster if the foreign key already exists.

q)t:([] k:1 2 3 4; c:10 20 30 40)
q)kt:([k:2 3 4 5]; v:200 300 400 500)
q)t lj kt
k c  v
--------
1 10
2 20 200
3 30 300
4 40 400

Observe that when the source table has a foreign key, an ad-hoc left join is equivalent to listing all columns from both tables in an implicit join.

q)kt:([k:1 2 3 4 5]; v1:10 20 30 40 50; v2:1.1 2.2 3.3 4.4 5.5)
q)tf:([] k:`kt$1 2 3 4; c:10 20 30 40)
q)(tf lj kt)~select k,c,k.v1,k.v2 from tf
1b

Column Lookup

You can perform a column lookup against a keyed table within a query without using a join.

q)t:([] k:1 2 3 4; c:10 20 30 40)
q)kt:([k1:2 3 4 5]; v:2.2 3.3 4.4 5.5)
q)select c, v:kt[([] k1:k); `v] from t
c  v
------
10
20 2.2
30 3.3
40 4.4

Especially for a single column, this is simpler (and more impressive to your colleagues) than the equivalent join.

q)select c,v from t lj `k xkey select k:k1,v from kt
c  v
------
10
20 2.2
30 3.3
40 4.4

Here is an example using compound keys and column renaming to match the lookup table.

q)t:([] f:`rocky`bullwinkle; l:`squirrel`moose; c:10 20)
q)kt:([fn:`rocky`bullwinkle`fearless; ln:`squirrel`moose`leader] v:1.1 2.2 3.3)
q)select c, v:kt[([] fn:f; ln:l); `v] from t
c  v
------
10 1.1
20 2.2

Ad Hoc Inner Join (ij)

Given a primary key table m with key column(s) k and a table d with a foreign key linking to k, an inner join can be expressed in various SQL notations,

m,d WHERE m.k = d.k 
m INNER JOIN d ON m.k = d.k

The binary inner-join operator ij performs an inner join between two tables that could have a foreign key relationship.

The following example shows that ij returns just those records with matching keys in both tables.

q)t:([] k:1 2 3 4; c:10 20 30 40)
q)kt:([k:2 3 4 5]; v:2.2 3.3 4.4 5.5)
q)t ij kt
k c  v
--------
2 20 2.2
3 30 3.3
4 40 4.4

Equijoin ej

The triadic equijoin operator ej corresponds to a SQL inner join between tables in the second and third parameters along specified column names in the first parameter. The right operand does not have to be a keyed table. Unlike ij, all matching records in the right table appear in the result. As with any join, upsert semantics holds on duplicate columns.

q)t1:([] k:1 2 3 4; c:10 20 30 40)
q)t2:([] k:2 2 3 4 5; c:200 222 300 400 500; v:2.2 22.22 3.3 4.4 5.5)
q)t1 ij `k xkey t2
k c   v
---------
2 200 2.2
3 300 3.3
4 400 4.4
q)ej[`k;t1;t2]
k c   v
-----------
2 200 2.2
2 222 22.22
3 300 3.3
4 400 4.4

Plus Join (pj)

Plus join pj is a left join that replaces upsert semantics for duplicate column names with addition. This is useful when you have two tables with identical schemas having all non-key columns numeric.

q)t:([] k:`a`b`c; a:100 200 300; b:10. 20. 30.; c:1 2 3)
q)kt:([k:`a`b] a:10 20; b:1.1 2.2)
q)t pj kt
k a   b    c
------------
a 110 11.1 1
b 220 22.2 2
c 300 30 3

Union Join

The equivalent of an ordinary SQL union on tables with matching schemas is simply ,. Indeed, it joins two lists of compatible records.

q)t1:([] c1:`a`b; c2:1 2)
q)t2:([] c1:`c`d; c2:3 4)
q)t1,t2
c1 c2
-----
a  1
b  2
c  3
d  4

Union join uj is more powerful in that it vertically combines any two tables, or keyed tables. 

q)t1:([] c1:`a`b`c; c2: 10 20 30)
q)t2:([] c1:`x`y; c3:8.8 9.9)
q)t1 uj t2
c1 c2 c3 
---------
a  10    
b  20    
c  30    
x     8.8
y     9.9

Continuing the previous example, you can use uj/ to combine a list of disparate tables.

q)t3:([] c1:`e`f`g; c2:50 60 70; c3:5.5 6.6 7.7)
q)(uj/) (t1; t2; t3)

As-of Joins

As-of joins are so-named because they most often join tables along time columns to obtain a value in one table that is current as of a time in another table. As-of joins are non-equijoins that match on less-than-or-equal. They are not restricted to time values. The fundamental as-of join is the triadic function aj. It joins tables along common columns using most recent values. The syntax of aj is,

aj[`c1...`cn;t1;t2]

where `c1...`cn is a simple list of symbol column names common to t1 and t2, which are the tables to be joined. There is no requirement for any of the join columns to be keys but the join will be faster on keys. The columns of both tables are brought into the result.

The semantics of aj are as follows. The match on all specified columns except the last is by equals. Assuming the records are sequenced by cn in both tables, for a given cn value in t1, the match picks the greatest cn in t2 less than or equal to the given value in t1. Specifically, if the cn columns are sequenced temporal values, for each cn value in t1 the match picks the t2 row whose cn value is in effect “as of” the time in t1.

The simpler function asof performs the same match as aj but with a table against a single record. The result picks out the remaining columns in the matched row of the table.

q)t:([] ti:10:01:01 10:01:03 10:01:04; sym:`msft`ibm`ge; qty:100 200 150; px:45 160 55)
q)t
ti       sym  qty px 
---------------------
10:01:01 msft 100 45 
10:01:03 ibm  200 160
10:01:04 ge   150 55 
q)t asof `sym`ti!(`ibm;10:01:03)
qty| 200
px | 160

Window Join

Window joins are a generalization of as-of joins and are specifically geared for analyzing the relationship between trades and quotes in finance. The idea is that you want to investigate the behavior of quotes in a neighborhood of each trade. For example, to determine how well a trade was executed, you need to examine the range of bid and ask prices that were prevalent around the trade time.

So that things fit on the page, we use a simple example involving trades for one ticker symbol `aapl and times that are in seconds. The general case is essentially the same. First we create the trades and quotes tables; note that they are required to be unkeyed.

q)show t:([]sym:3#`aapl;time:09:30:01 09:30:04 09:30:08;price:100 103 101)
sym  time     price
-------------------
aapl 09:30:01 100  
aapl 09:30:04 103  
aapl 09:30:08 101  
q)show q:([] sym:8#`aapl;
    time:09:30:01+(til 5),7 8 9;
    ask:101 103 103 104 104 103 102 100;
    bid:98 99 102 103 103 100 100 99)
sym  time     ask bid
---------------------
aapl 09:30:01 101 98 
aapl 09:30:02 103 99 
aapl 09:30:03 103 102
aapl 09:30:04 104 103
aapl 09:30:05 104 103
aapl 09:30:08 103 100
aapl 09:30:09 102 100
aapl 09:30:10 100 99

We construct fixed-width windows of 2 seconds before and one second after each trade time. (There is no requirement that the windows be of uniform width.)

q)w:-2 1+\:t `time
q)w
09:29:59 09:30:02 09:30:06
09:30:02 09:30:05 09:30:09

For readability we place the names of the sym and time columns in a variable.

q)c:`sym`time

The actual form for wj is:

wj[w;c;t;(q;(f0;c0);(f1;c1))]

Here w is a list of windows, c is the list of sym and time column names, t is the trades table. The final parameter is a list with the quotes table and a pair of lists, each with an aggregate function and a column name.

To see all the values in each window, pass the identity function :: in place of the aggregates. The result is similar to grouping without aggregate in a query and is helpful to see what is happening within each window.

q)wj[w;c;t;(q;(::;`ask);(::;`bid))]
sym  time     price ask             bid           
--------------------------------------------------
aapl 09:30:01 100   101 103         98 99         
aapl 09:30:04 103   103 103 104 104 99 102 103 103
aapl 09:30:08 101   104 103 102     103 100 100

Now we apply wj with actual aggregates for the maximum ask and minimum bid over each window. Here you see the results of the aggregate functions run within each window.

q)wj[w;c;t;(q;(max;`ask);(min;`bid))]
sym  time     price ask bid
---------------------------
aapl 09:30:01 100   103 98 
aapl 09:30:04 103   104 99 
aapl 09:30:08 101   104 100

Parameterized Queries

Any q function that operates on a table is effectively a stored procedure. Function parameters can be used to supply specific values for queries.

q)t:([] c1:`a`b`c; c2:10 20 30; c3:1.1 2.2 3.3)
q)select from t where c2>15
c1 c2 c3 
---------
b  20 2.2
c  30 3.3
q)proc:{[sc] select from t where c2>sc}
q)proc 15
c1 c2 c3 
---------
b  20 2.2
c  30 3.3
q)proc2:{[nms;sc] select from t where c1 in nms, c2>sc}
q)proc2[`a`c; 15]
c1 c2 c3 
---------
c  30 3.3

You can pass a table as a parameter to a template, either by value or by name.

q)t:([] c1:`a`b`c; c2:10 20 30; c3:1.1 2.2 3.3)
q)proc3:{[t;nms;delta] update c2+delta from t where c1 in nms}
q)proc3[t;`a`c;100]
c1 c2  c3 
----------
a  110 1.1
b  20  2.2
c  130 3.3
q)t
c1 c2 c3 
---------
a  10 1.1
b  20 2.2
c  30 3.3
q)proc3[`t;`a`c;100]
`t
q)t
c1 c2  c3 
----------
a  110 1.1
b  20  2.2
c  130 3.3

You can effectively parameterize column names in two ways, only one of which is good practice. First, you can mimic a common technique from SQL in which the query is created dynamically: build the query text in a string and then invoke the interpreter programmatically using value. This is comparatively slow. Worse, it exposes your application to injection attacks, since any q expression that appears inside the text will be executed.

The preferred method is to use the functional form for queries

Views

A SQL view is effectively a query expression whose result set can be used like a table. Views are useful for encapsulating data – e.g., hiding columns, or simplifying complex queries.

A q-sql view is a named table expression created as an alias with the double-colon operator ::. It is common to use the templates in views but this is not a limitation.

q)t:([] c1:`a`b`c; c2:10 20 30)
q)u:select from t where c2>15
q)v::select from t where c2>15
q)u
c1 c2
-----
b  20
c  30
q)v
c1 c2
-----
b  20
c  30
q)update c2:15 from `t where c1=`b
`t
q)u
c1 c2
c1 c2
-----
b  20
c  30
q)v
c1 c2
-----
c  30

Observe that when the underlying table changed, u did not change but the next reference to v does reflect the update.

To find the underlying query of a view, or any alias, apply the function view to the symbol alias name.

q)view `v
"select from t where c2>15"
q)a:42
q)b::a
q)view `b
_

To list all the views in a namespace, use the function views with the context name. For example, to list the views in the root,

q)views `.
`b`v

出处:code.kx.com/q4m3/9\_Que…

作者:Jeffry A. Borror