Keyword spotting datasets are hard to create because:
1. We need to consider who the end users will be (e.g., will they have accents?)
3. We need to consider the environment in which the application will run (e.g., how much background noise do we expect?)
4. We need to consider the requirements of the model (e.g., how high a false detection rate can we tolerate?)
4. We need to consider what distractors will be given to the model (e.g., who else might be speaking around the device?)
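
To make point 3 of the list concrete, here is a minimal sketch of how we might measure false detections on a labeled evaluation set. It assumes the model outputs one confidence score per audio clip and that a clip counts as a detection when its score reaches a threshold. The names `detection_rates`, `scores`, `is_keyword`, and `threshold` are hypothetical and chosen for illustration; they do not come from any particular library.

```python
from dataclasses import dataclass


@dataclass
class DetectionRates:
    false_accept_rate: float  # fraction of non-keyword clips that triggered a detection
    false_reject_rate: float  # fraction of keyword clips that were missed


def detection_rates(scores, is_keyword, threshold):
    """Compute false-accept and false-reject rates at a given score threshold.

    scores: per-clip confidence scores (assumed to lie in [0, 1])
    is_keyword: per-clip booleans, True if the clip contains the keyword
    threshold: a clip counts as a detection when score >= threshold
    """
    if len(scores) != len(is_keyword):
        raise ValueError("scores and is_keyword must have the same length")

    positives = sum(1 for k in is_keyword if k)
    negatives = len(is_keyword) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one keyword clip and one non-keyword clip")

    false_accepts = sum(
        1 for s, k in zip(scores, is_keyword) if not k and s >= threshold
    )
    false_rejects = sum(
        1 for s, k in zip(scores, is_keyword) if k and s < threshold
    )
    return DetectionRates(false_accepts / negatives, false_rejects / positives)


# Toy data for illustration only: three keyword clips, three distractor clips.
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
is_keyword = [True, True, True, False, False, False]
rates = detection_rates(scores, is_keyword, threshold=0.5)
print(rates)
```

Raising the threshold generally lowers the false accept rate and raises the false reject rate, so the evaluation set needs enough distractor and noisy clips to make both numbers meaningful for the intended users and environment.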