Preface
This is a talk prepared for my team at my company.
background
Traditionally, logs were generated as free-form text messages prepended with some basic metadata, such as the time of the event, its severity, and its source. The advantages:
- can be easily parsed line by line
- easily stored in text files for humans to read
On the other hand:
- rigid metadata
- hard to filter, search, or trace context
- heavy reliance on parsing, usually with human-unreadable regexes
So what do we need?
- structured: log in a structured format, e.g. JSON
- contextual: automatically log additional user-specified fields (context)
- leveled: e.g. INFO, ERROR, DEBUG, FATAL
- severity: e.g. V(1), V(2)
Advantages?
- free to customize metadata/specs
- easy to track, search, filter
- easier for developers
Any drawbacks?
- a trade-off between out-of-the-box behavior and context
- JSON-formatted logs are not that easy to read directly
related work
Lots of tools...
status quo
Migrating to structured logging is not easy, because:
- existing projects simply use klog for logging
- resources (time, developers, QA) are lacking
- logging is only an aid
- bugs that need painstaking tracing are rare
- there is a learning cost
proposal
- almost compatible with klog
- structured, contextual, leveled logging with severity
- better performance by wrapping zap
usage
import
import this package as `github.com/xial-thu/klog`
initialization
for example, in main.go:

```go
// Unchanged
klog.InitFlags(nil)
flag.StringVar(&kubeconfig, "kubeconfig", "", "path to Kubernetes config file")
flag.Parse()

// Append a new API
klog.Singleton()
```
If `Singleton()` is not called, the default global no-op logger is used, which means you will not see any real log output. Due to some gaps between klog and zap, parameters have to be converted, and the conversion must be done after `flag.Parse()`. `klog.Singleton()` initializes a unique global logger whose configuration differs from the default zap production configuration in two places:
- `TimeKey` is set to `"time"`
- `EncodeTime` is set to `ISO8601TimeEncoder`
flags
Not all flags defined in klog are supported; or rather, not all of them still make sense. Only `alsologtostderr` and `v` are currently supported.
- `v`: still supports the `klog.V(2).Info()` syntax, but the level of `klog.Info()` is INFO, while `klog.V(3).Info()` is DEBUG. If v is set to zero, zap DEBUG logs are dropped. Defaults to 0.
- `alsologtostderr`: defaults to true. If set to false, INFO and DEBUG logs are written only to stdout.
structured logging
There are 3 APIs:
- `With()`: parses each field and value from its input. `With(struct{ A string }{"hi"})` outputs `"A":"hi"`. If you care about the fields of your struct and want them extracted, use `With()`.
- `WithAll()`: sugar over `zap.Any()`. e.g. `WithAll(struct{ A string }{"hi"})` outputs `"":{"A":"hi"}`. If you want to record the name of your struct, use `WithAll()`.
- `WithFields()`: e.g. `WithFields("ID", 1, "name", "hi")`; just another sugar over `sugar.With()`.
Tips for `With()`:
- only a struct or a map is accepted
- if the arg is a struct, only exported fields (i.e. `FieldName`, not `fieldName`) are logged
- if the arg is a map, its keys must be strings
- if you have nested structs or maps, consider splitting them into several anonymous parts
- if you don't want some fields to be logged automatically, unexport them
Some examples of `With()`:

```go
// assumption: ID is a named integer type, added here so the example compiles
type ID int

type S struct {
	A int
	B string
}

type Q struct {
	D ID
}

s := S{
	A: 10,
	B: "abc",
}
q := Q{
	D: ID(1),
}
c := ""

// struct args
With(s).Info(c)    // "A":10,"B":"abc"
With(s, q).Info(c) // "A":10,"B":"abc","D":1

// anonymous struct
With(struct {
	A int
	B int
}{1, 2}).Info(c) // "A":1,"B":2

// map args
With(map[string]int{"A": 1}).Info(c) // "A":1
```
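The acceptance rules above can be sketched with the standard `reflect` package. This is a simplified illustration of the idea, not this package's actual implementation:

```go
package main

import (
	"fmt"
	"reflect"
)

// fields mimics the rules above: structs contribute only their
// exported fields; maps are accepted only when keyed by string.
func fields(arg any) map[string]any {
	out := map[string]any{}
	v := reflect.ValueOf(arg)
	switch v.Kind() {
	case reflect.Struct:
		t := v.Type()
		for i := 0; i < t.NumField(); i++ {
			if f := t.Field(i); f.IsExported() {
				out[f.Name] = v.Field(i).Interface()
			}
		}
	case reflect.Map:
		if v.Type().Key().Kind() == reflect.String {
			for _, k := range v.MapKeys() {
				out[k.String()] = v.MapIndex(k).Interface()
			}
		}
	}
	return out
}

func main() {
	fmt.Println(fields(struct {
		A int
		b string // unexported: skipped
	}{1, "hidden"})) // map[A:1]
	fmt.Println(fields(map[string]int{"A": 1})) // map[A:1]
}
```

Unexporting a field really is enough to hide it: `reflect` refuses to read unexported fields, so the sketch skips them before calling `Interface()`.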
performance of `With()`

case | ns/op | B/op | allocs/op
---|---|---|---
With every time | 3121 | 2884 | 20
WithField every time | 2239 | 1568 | 9
WithAll | 3215 | 2642 | 14
With once | 573 | 7 | 1
Because of reflection, `With()` is slow. This suggests building the logger once and reusing it, instead of parsing interfaces on every call:

```go
newLogger := klog.With(something)
newLogger.Info(something)
```
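The gap in the table can be reproduced in spirit with a stdlib-only micro-benchmark: reflecting over the context on every call versus extracting it once and reusing the result. The names `ctx` and `extract` are illustrative, not this package's API:

```go
package main

import (
	"fmt"
	"reflect"
	"testing"
)

type ctx struct {
	ID   int
	Name string
}

// extract mimics the reflect work a With()-style call does per log line.
func extract(v any) map[string]any {
	out := map[string]any{}
	rv := reflect.ValueOf(v)
	rt := rv.Type()
	for i := 0; i < rt.NumField(); i++ {
		out[rt.Field(i).Name] = rv.Field(i).Interface()
	}
	return out
}

func main() {
	c := ctx{ID: 1, Name: "hi"}

	// "With every time": reflect on each call
	every := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_ = extract(c)
		}
	})

	// "With once": reflect once, reuse the result
	cached := extract(c)
	once := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_ = cached["ID"]
		}
	})

	fmt.Println("every time:", every.NsPerOp(), "ns/op")
	fmt.Println("once:      ", once.NsPerOp(), "ns/op")
}
```

The reflection path typically costs several times more per call, which matches the shape of the table above even if the absolute numbers differ by machine.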
limitation
- the default field is empty
- structured logging still requires changes in business logic
- flag support needs more work
Best Practice
If you don't need additional metadata, just replace `k8s.io/klog` with `github.com/xial-thu/klog` and add one single API call: `klog.Singleton()`.
Why is it not fully compatible?
- flags are not consumed directly, so some conversion has to happen after `flag.Parse()`
- though the default logger could be set to a production-ready one, you may still want minor changes, such as the time format and output files, so an explicit init function is necessary
Add some flavor
Ideally, context covers the full lifecycle. Add the context at the framework level, not in business logic. For example, if you have

```go
func main() {
	handler := someFunc
	framework := NewFramework(handler)
	framework.Run()
}
```
add some flavor in `NewFramework()`; the goal is to pass the logger into the handler.

```go
type context struct{}

func NewFramework(handler someFunc) *framework {
	ctx := context{}
	f := &framework{logger: klog.With(ctx)}
	f.AddHandler(handler)
	return f
}
```
So here are some tips:
- carefully design your logging metadata (context)
- carefully design your software architecture and try to avoid reverse dependencies
Assume we have

```go
type request struct {
	httpContext          HTTPContext
	encodedRuntimeObject []byte
}
```
Obviously, we'd better track the runtime object by adding some fields to the logger. But here is the problem: at the framework level, we DO NOT know the object. So we pass the logger into the business logic and wait for the decoder to extract the runtime information. To make things worse, any logs printed inside the framework itself lose that runtime object forever.
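The hand-off described above can be sketched as follows; every name here (`logger`, `handle`, the fake decoding) is a hypothetical stand-in rather than this package's API:

```go
package main

import "fmt"

// logger is a toy stand-in for a contextual logger.
type logger struct{ fields map[string]string }

// with returns a copy of the logger enriched with one more field.
func (l logger) with(k, v string) logger {
	nf := make(map[string]string, len(l.fields)+1)
	for key, val := range l.fields {
		nf[key] = val
	}
	nf[k] = v
	return logger{fields: nf}
}

func (l logger) info(msg string) string {
	return fmt.Sprintf("%v %s", l.fields, msg)
}

type request struct {
	encodedRuntimeObject []byte
}

// handle lives in business logic: only here is the runtime object
// known, so only here can the logger gain the "object" field.
func handle(base logger, req request) string {
	name := string(req.encodedRuntimeObject) // stand-in for real decoding
	log := base.with("object", name)
	return log.info("handled")
}

func main() {
	// the framework builds the base logger and passes it down
	base := logger{fields: map[string]string{"component": "framework"}}
	fmt.Println(handle(base, request{encodedRuntimeObject: []byte("pod-a")}))
}
```

The framework's own log calls only ever see `component`; the `object` field exists solely in the enriched copy handed to business logic, which is exactly the loss described above.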
Some examples
kubectl-gpu
{"level":"info","time":"2020-04-20T10:19:51.169+0800","caller":"server/handler.go:140","msg":"receive delete event of weike/parrots-20200416--slurm-node-0"}
{"level":"warn","time":"2020-04-20T10:19:51.169+0800","caller":"server/handler.go:155","msg":"unrecommended FORCE-DELETE: resource has not been released before DELETE event"}
admission-webhook
{"level":"info","time":"2020-04-20T10:27:36.075+0800","caller":"validate/pod.go:48","msg":"failed validation: path /local-path-storage failed to pass any rules","Method":"POST","Host":"validate-webhook.phx-system.svc:443","Path":"/validate","Proto":"HTTP/2.0","UID":"0309715d-c910-4489-9b1c-03a7d8fbec0f","UserName":"system:serviceaccount:monitoring:local-path-provisioner","UserGroup":["system:serviceaccounts","system:serviceaccounts:monitoring","system:authenticated"],"Group":"","Version":"v1","Kind":"Pod","Resource":"pods","SubResource":"","Namespace":"monitoring","Operation":"CREATE","Name":"delete-pvc-7c17608d-d210-45fd-8d01-ca9b2945c5e1"}
How does it benefit us? Explore Kibana!
filtering, searching and debugging
- structured logging: admission requests and responses
- regex parsing: tracing info in the APIServer. You DO NOT want to read the regex: `/^(?[IWE]+)(?[^ ]* [^ ]*) (?[^ ]*) (?[^\] ]*)\] (?:(?[A-Z]{2,}))?(?:(? [/]{1}[/A-Za-z0-9.-]{2,}))?(?:(?[\?]{1}[^\: ]*))?(?:\: \((?[^ \)]*))?(?:\) (?[0-9]{1,}))?*(?.*)$/`
- common logs: all info is crammed into `f_message`, so filtering and searching fall back to Lucene grammar. Why bother?
full lifecycle tracking
use filters
Statistics & aggregation
- visualize
- dashboard
aggregation is the silver bullet.
> The aggregations in this family compute metrics based on values extracted in one way or another from the documents that are being aggregated. The values are typically extracted from the fields of the document (using the field data), but can also be generated using scripts.
Kibana supports these aggregations:
- extract from fields of the document: requires the field type to be numeric. Sadly, '1.23ms' is NOT a numeric field!
- using a script: do we really need it?
Proceed with caution:
> Please familiarize yourself with script fields and with scripts in aggregations before using scripted fields.
>
> Scripted fields can be used to display and aggregate calculated values. As such, they can be very slow, and if done incorrectly, can cause Kibana to be unusable. There's no safety net here. If you make a typo, unexpected exceptions will be thrown all over the place!
So why not just use metrics?