Overview
The first important difference is that a q table has ordered rows and columns. This is particularly useful when dealing with the situation where records arrive in time order. Appending them to a table ensures that they enter – and stay – in order. Subsequent select operations always retrieve the records in order without any need for sorting.
A second difference is that a q table is stored physically as a collection of column lists. This means that operations on column data are vector operations. Moreover, for simple column lists, atomic, aggregate and uniform functions applied to columns are especially simple and fast since they reduce to direct memory addressing.
A third difference is that q-sql provides upsert semantics. Recall that upsert semantics on a dictionary mean that when a key-value pair is applied with , and the key is present, the value is updated; otherwise the pair is inserted. In the context of tables and keyed tables, which are both dictionaries, this has far-reaching consequences for many common operations, including joins. Upsert semantics permeate q-sql.
Inserting Records
Append Using Assign
q)t:([] name:`symbol$(); iq:`int$())
q)t,:`name`iq!(`Beeblebrox; 42)
q)t,:`name`iq!(`Dent; 98.0)
'type
q)t,:(`Prefect; 126)
q)kt:([eid:`long$()] name:`symbol$(); iq:`long$())
q)kt,:(1002; `Dent; 98)
q)kt
eid | name id
----| -------
1002| Dent 98
Basic insert
q)`kt insert (1001; `Maven; 150)
q)`kt insert (1001; `Maven; 150)
q)kt
eid | name id
----| ---------
1002| Dent 98
1001| Maven 150
You can also use insert in prefix form, possibly with the table name projected.
q)insert[`kt;] (1003; `Peter; 120)
q)kt
eid | name id
----| ---------
1002| Dent 98
1001| Maven 150
1003| Peter 120
Bulk Columnar Insert
q)t:([] name:`Dent`Beeblebrox; iq:98 42)
q)`t insert ((`Prefect; 126); (`Marvin; 200))
'type
q)`t insert (`Prefect`Marvin; 126 200)
2 3
Insert into Empty Tables
q)t:([] name:(); iq:())
q)`t insert (`Dent;98.0)
,0
It is also possible to insert a list of conforming records (i.e., a table) into a table that does not exist.
q)tnew
'tnew
q)`tnew insert enlist `c1`c2!(`a; 10)
q)tnew
c1 c2
-----
a 10
The records in the keyed table are stored in insert order rather than key order.
Upsert
Upsert Replacing Insert
q)t:([] name:`Dent`Beeblebrox`Prefect; iq:42 98 126)
q)`t upsert (`name`iq)!(`Slartibartfast; 134)
q)`t upsert (`Marvin; 150)
q)`t upsert ([] name:`Slartibartfast`Marvin; iq:134 200)
name iq
------------------
Dent 42
Beeblebrox 98
Prefect 126
Slartibartfast 134
Marvin 150
Slartibartfast 134
Marvin 200
q)t:3#t
name iq
--------------
Dent 42
Beeblebrox 98
Prefect 126
Upsert by Name
A limitation of insert is that it uses pass-by-name, so it can only operate against global tables. In contrast, upsert supports both pass-by-name and pass-by-value. Thus it can be used with anonymous or local tables.
q)([] c1:`a`b; c2:10 20) upsert (`c; 30)
c1 c2
-----
a 10
b 20
c 30
q)f:{t:([] c1:`a`b; c2:10 20); t upsert x}
q)f (`c; 30)
c1 c2
-----
a 10
b 20
c 30
Upsert on Keyed Tables
We have seen that insert has undesirable semantics on keyed tables – i.e., it rejects “duplicate” keys. What we really want is, well, upsert semantics.
Upsert on Persisted Tables
You can use upsert to append records to serialized and splayed tables. Simply pass the handle of the file or splayed directory as the name of the table. This is the final strike against insert, since it cannot do this.
q)`:/q4m/tser set ([] c1:`a`b; c2:1.1 2.2)
q)`:/q4m/tser upsert (`c; 3.3)
`:/q4m/tser
q)get `:/q4m/tser
c1 c2
------
a 1.1
b 1.2
c 3.3
Next we splay a table and then append a row to it.
q)`:/q4m/tsplay/ set ([] c1:`sym?`a`b; c2:1.1 2.2)
`:/q4m/tsplay/
q)`:/q4m/tsplay upsert (`sym?`c; 3.3)
`:/q4m/tsplay
q)select from `:/q4m/tsplay
c1 c2
------
a 1.1
b 1.2
c 3.3
Upserting to a splayed table does not read the persisted image into memory; rather, it appends to the ends of the column files. This allows incremental creation of large splayed (or partitioned) tables by upserting chunks that comfortably fit into memory.
The select Template
The result of select is always a table.
Syntax
select from texp
q)t:([] c1:`a`b`c; c2:10 20 30; c3:1.1 2.2 3.3)
q)select c1, res:2*c2 from t
c1 res
------
a 20
b 40
c 60
q)select c1, c1, 2*c2, c2+c3, string c3 from t
c1 c11 x c2 c3
--------------------
a a 20 11.1 "1.1"
b b 40 22.2 "2.2"
c c 60 33.3 "3.3"
The Virtual Column i
A virtual column i represents the offset of each record in the table – i.e., i is the row number.
q)select i, c1 from t
x c1
----
0 a
1 b
2 c
select distinct
q)select distinct from ([] c1:`a`b`a; c2:10 20 10)
c1 c2
-----
a 10
b 20
select[]
You can return the first or last n records in a select by using function parameter syntax after select. A positive integer parameter returns the first records, a negative parameter the last.
q)select [2] from t where c1 <> `a
c1 c2 c3
---------
b 20 2.2
c 30 3.3
same as
q)2#select from t where c1 <> `a
c1 c2 c3
---------
b 20 2.2
c 30 3.3
The difference is that the # construct requires computing the entire result set and then keeping only the desired rows, whereas select[n] only extracts the desired number of rows. The latter will be faster and consume less memory for large tables.
This syntax is extended to select[n m] where n is the starting row number and m is the number of rows.
q)select[1 2] from s where city<>`athens
One final extension of the syntax specifies a sorting criterion inside the brackets. For ascending sort, place < before a column name and for descending sort use >.
q)select [>c3] from t where c2 >= 20
c1 c2 c3
---------
c 30 3.3
b 20 2.2
You can combine the two syntax extensions by separating them with a semicolon.
q)select [1; >c3] from t where c2 >= 20
c1 c2 c3
---------
c 30 3.3
Select on Nested Columns
q)show tnest:([] c1:`a`b`c; c2:(10 20 30; enlist 40; 50 60))
c1 c2
-----------
a 10 20 30
b ,40
c 50 60
q)select avg each c2 from tnest
c2
--
20
40
55
use each for process
q)update c3:(1.1 2.2 3.3; enlist 4.4; 5.5 6.6) from `tnext
`tnext
q)select wtavg: c2 wavg' c3 from tnext
wtavg
--------
2.566667
4.4
6.1
Filtering with where
q)t:([] c1:`a`b`c; c2:10 20 30; c3:1.1 2.2 3.3)
q)t where t[`c2]>15 或 select from t where c2>15
c1 c2 c3
---------
b 20 2.2
c 30 3.3
In fact, you can provide a boolean list to where.
q)select from t where 011b
c1 c2 c3
---------
b 20 2.2
c 30 3.3
The Virtual Column i in where
The virtual column i is useful for paginating a table. Use within, which returns a boolean indicating whether the left operand is in the closed interval specified in the right operand, to determine the bounds of the page.
q)tbig:100#t
q)select from tbig where i within 50 99
c1 c2 c3
---------
c 30 3.3
a 10 1.1
b 20 2.2
c 30 3.3
a 10 1.1
b 20 2.2
c 30 3.3
..
q)s:50
q)e:99
q)count select from tbig where i within (s;e)
50
Multiple Where Subphrases
q)r1:select from t where c2>15,c3<3.0
q)r2:select from t where (c2>15)&c3<3.0
q)r1~r2
1b
However, since the Where subphrases are sequenced from left-to-right, their order affects the actual processing.
q)t:([] c1:00:00:00.000+til 1000000;c2:1000000?`a`b;c3:1000000?100.)
q)\t select from t where c1 within 00:00:01.000 00:00:01.999, c2=`a
0
q)\t select from t where c2=`a, c1 within 00:00:01.000 00:00:01.999
15
Nested Columns in Where
q)t:([] f:1.1 2.2 3.3; s:("abc";enlist "d";"ef"))
q)select from t where s~"ef"
f s
---
q)select from t where s~\:"ef"
f s
--------
3.3 "ef"
The first query does not achieve the desired result because it asks if the entire column matches the specified string. The second query works because it tests the specified string for match against each string in the column.
fby in Where
In SQL you would use HAVING, but q is having none of that. Instead use fby in the Where phrase.
q)select from p where weight=(max;weight) fby city
p | name color weight city
--| -------------------------
p2| bolt green 17 paris
p3| screw blue 17 rome
p6| cog red 19 london
Now we include another Where phrase for the desired result.
q)select from p where weight=(max;weight) fby city,color=`blue
p | name color weight city
--| -----------------------
p3| screw blue 17 rome
To group on multiple columns, encapsulate them in an anonymous table in the right operand of fby.
q)t:([]sym:`IBM`IBM`MSFT`IBM`MSFT;
ex:`N`O`N`N`N;
time:12:10:00 12:30:00 12:45:00 12:50:00 13:30:00;
price:82.1 81.95 23.45 82.05 23.40)
q)select from t where price=(max;price) fby ([]sym;ex)
sym ex time price
----------------------
IBM N 12:10:00 82.1
IBM O 12:30:00 81.95
MSFT N 12:45:00 23.45
Grouping and Aggregation
Aggregation without Grouping
q)\l sp.q
q)select total:sum qty, mean:avg qty from sp
total mean
--------------
3100 258.3333
Grouping without Aggregation
The By phrase groups rows having common values in specified column(s), much like GROUP BY in SQL.
q)t:([] c1:`a`b`a`b`c; c2:10 20 30 40 50)
q)t[`c2] group t[`c1]
a| 10 30
b| 20 40
c| ,50
q)select c2 by c1 from t
c1| c2
--| -----
a | 10 30
b | 20 40
c | ,50
Ungroup
q)ungroup select c2 by c1 from t
c1 c2
-----
a 10
a 30
b 20
b 40
c 50
Grouping with Aggregation
Normally you will group using by together with aggregation in the Select phrase. The effect is to aggregate along the groups, collapsing each group of rows into a single record. The result is a keyed table whose key columns are the grouped column(s).
q)select sum c2 by c1 from t
c1| c2
--| --
a | 40
b | 60
c | 50
To group on multiple columns, specify multiple by subphrases, which results in a compound key in the result.
q)t:([] desk:`a`b`a`b`a`b; acct:`1`2`3`4`1`4; pnl:1.1 -2.2 3.3 4.4 5.5 -.5)
q)select ct:count desk, sum pnl by desk,acct from t
desk acct| ct pnl
---------| -------
a 1 | 2 6.6
a 3 | 1 3.3
b 2 | 1 -2.2
b 4 | 2 3.9
Following is a useful example that averages the observations of our time series in 100 millisecond buckets.
q)t:([] c1:00:00:00.000+til 1000000;c2:1000000?`a`b;c3:1000000?100.)
q)select avg c3 by 100 xbar c1, c2 from t
c1 c2| c3
---------------| --------
00:00:00.000 a | 55.26494
00:00:00.000 b | 41.81758
00:00:00.100 a | 48.88826
00:00:00.100 b | 46.10946
The exec Template
select always returns a table, the result type of exec depends on the number of columns in its Select phrase. One column yields a list; more than one column yields a dictionary.
q)t:([] name:`a`b`c`d`e; state:`NY`FL`OH`NY`HI)
q)select name, distinct state from t
'length
q)exec name, distinct state from t
name | `a`b`c`d`e
state| `NY`FL`OH`HI
A common use of exec is when there is only one column in the aggregate phrase and no By phrase. The result is the computed column list devoid of other structure – i.e., not a dictionary or table. This is useful to extract a column dynamically.
q)select name from t
name
----
a
b
c
d
e
q)exec name from t
`a`b`c`d`e
When using exec to extract a single column of a table, you can place constraints on other columns.
q)exec name from t where state in `NY`HI
`a`d`e
The update Template
Basic update
update from texp
If the left of a colon is a column that exists in the table, that column is updated with the result of the expression to the right of the colon. If the left of a colon is not a column in the table, a new column of that name with the result of the expression is added to the end of the column list.
q)t:([] c1:`a`b`c; c2:10 20 30)
q)update c1:`x`y`z from t
c1 c2
-----
x 10
y 20
z 30
q)t
_
q)update c3:`x`y`z from t
c1 c2 c3
--------
a 10 x
b 20 y
c 30 z
q)(update c2:c2+100 from t)~update c2+100 from t
1b
q)t:([] c1:`a`b`c; c2:10 20 30)
q)update c3:c2+100 from t where c1<>`a
c1 c2 c3
---------
a 10
b 20 120
c 30 130
q)update c3:1b from t where c2>15
c1 c2 c3
--------
a 10 0
b 20 1
c 30 1
update-by
When the By phrase is present, the update operation is performed along groups. This is most useful with aggregate and uniform functions. For an aggregate function, the entire group gets the value of the aggregation on the group.
q)update avg weight by city from p
p | name color weight city
--| -------------------------
p1| nut red 15 london
p2| bolt green 14.5 paris
p3| screw blue 17 rome
p4| screw red 15 london
p5| cam blue 14.5 paris
p6| cog red 15 london
A uniform function is applied along the group in place. This can be used to compute cumulative volume of orders, for example.
q)update cumqty:sums qty by s from sp
s p qty cumqty
---------------
0 p1 300 300
0 p2 200 500
0 p3 400 900
0 p4 200 1100
3 p5 100 100
0 p6 100 1200
1 p1 300 300
1 p2 400 700
2 p2 200 200
3 p2 200 300
3 p4 300 600
0 p5 400 1600
作者:Jeffry A. Borror