1. Creating Regular Expressions
Basic syntax
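// Append .r to a String to compile it into a scala.util.matching.Regex
val pattern1 = "\\d+".r
// Or call the Regex constructor explicitly
import scala.util.matching.Regex
val pattern2 = new Regex("\\d+")
// A triple-quoted string is raw, so the backslash needs no escaping
val pattern3 = """\d+""".r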
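A minimal sketch showing that the three forms above compile to the same pattern text. The object name `CreateDemo` and its field names are hypothetical; `Regex.regex` returns the source pattern string.

```scala
import scala.util.matching.Regex

object CreateDemo {
  val viaR: Regex   = "\\d+".r            // escaped backslash in a normal string
  val viaNew: Regex = new Regex("\\d+")   // explicit constructor
  val viaRaw: Regex = """\d+""".r         // triple quotes: no escaping needed

  // All three hold the same underlying pattern text: \d+
  def sources: List[String] = List(viaR, viaNew, viaRaw).map(_.regex)
}
```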
2. Common Patterns
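val digit = "\\d".r             // a single digit
val multiDigits = "\\d+".r      // one or more digits
val exactly3Digits = "\\d{3}".r // exactly three digits
val letter = "[a-zA-Z]".r       // a single ASCII letter
val word = "\\w+".r             // one or more word characters [a-zA-Z_0-9]
val whitespace = "\\s".r        // a single whitespace character
val dot = "\\.".r               // a literal dot
val backslash = "\\\\".r        // a literal backslash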
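Whether a pattern like `\d{3}` means "exactly three digits" depends on how it is applied. The sketch below, with hypothetical helper names, contrasts a whole-string match (via the underlying `java.util.regex.Matcher.matches`) with finding the pattern anywhere in the input.

```scala
import scala.util.matching.Regex

object PatternCheck {
  val exactly3Digits: Regex = "\\d{3}".r

  // True only if the WHOLE string matches the pattern
  def fullMatch(r: Regex, s: String): Boolean = r.pattern.matcher(s).matches()

  // True if the pattern occurs ANYWHERE in the string
  def containsMatch(r: Regex, s: String): Boolean = r.findFirstIn(s).isDefined
}
```

So "1234" contains three digits but is not itself exactly three digits.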
3. Common Methods
Finding matches
val text = "My phone number is 13800138000, my email is abc@example.com"
val phonePattern = "\\d{11}".r
val allPhones = phonePattern.findAllIn(text).toList     // List(13800138000)
val firstPhone = phonePattern.findFirstIn(text)         // Some(13800138000)
val startsWithNum = "\\d+".r.findPrefixOf("123abc")     // Some(123): the match must start at index 0
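When you also need the position of each match, `findAllMatchIn` yields `Match` objects instead of plain strings. A minimal sketch; `FindDemo` and `numbersWithOffsets` are hypothetical names.

```scala
import scala.util.matching.Regex

object FindDemo {
  val number: Regex = "\\d+".r

  // Returns each matched number together with its start offset in the input
  def numbersWithOffsets(s: String): List[(String, Int)] =
    number.findAllMatchIn(s).map(m => (m.matched, m.start)).toList
}
```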
Extracting capture groups
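val datePattern = """(\d{4})-(\d{2})-(\d{2})""".r
val date = "2024-01-15"
// In a pattern match the Regex must match the WHOLE string;
// each capture group is bound to one variable, in order
date match {
  case datePattern(year, month, day) =>
    println(s"Year: $year, month: $month, day: $day")
  case _ =>
    println("Format does not match")
}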
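Because the extractor above is anchored, a date embedded in a longer sentence falls through to `case _`. A sketch of the contrast using `unanchored`, which performs a "find anywhere" match instead; the object and method names are hypothetical.

```scala
object DateExtract {
  val datePattern = """(\d{4})-(\d{2})-(\d{2})""".r
  // unanchored must be stored in a val so it can serve as an extractor
  val dateAnywhere = datePattern.unanchored

  // Anchored: the whole string must be a date
  def parseExact(s: String): Option[(Int, Int, Int)] = s match {
    case datePattern(y, m, d) => Some((y.toInt, m.toInt, d.toInt))
    case _                    => None
  }

  // Unanchored: a date anywhere in the string is accepted
  def findDate(s: String): Option[(Int, Int, Int)] = s match {
    case dateAnywhere(y, m, d) => Some((y.toInt, m.toInt, d.toInt))
    case _                     => None
  }
}
```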
Replacing
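val text = "Price: $100, Discount: $20"
val result1 = "\\$\\d+".r.replaceAllIn(text, "***")   // "Price: ***, Discount: ***"
val result2 = "\\$\\d+".r.replaceFirstIn(text, "***") // "Price: ***, Discount: $20"
// The function receives each Match and returns its replacement text
val result3 = "\\d+".r.replaceAllIn(text, m => (m.matched.toInt * 2).toString) // "Price: $200, Discount: $40"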
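One pitfall: the replacement string (including one returned by a function) is scanned for `$n` group references, so a literal `$` or `\` must be escaped. `Regex.quoteReplacement` does that. A minimal sketch with hypothetical names, turning "USD 100" into "$100":

```scala
import scala.util.matching.Regex

object ReplaceDemo {
  val usd: Regex = """USD (\d+)""".r

  // Without quoteReplacement, "$100" would be read as a group reference
  def toDollarSign(s: String): String =
    usd.replaceAllIn(s, m => Regex.quoteReplacement("$" + m.group(1)))
}
```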
Splitting strings
val text = "apple,banana,orange,grape"
val items = ",".r.split(text) // Array(apple, banana, orange, grape)
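Splitting with a regex pays off when separators vary. A sketch, assuming any run of commas, semicolons or whitespace should count as one separator; `SplitDemo` and `splitItems` are hypothetical.

```scala
object SplitDemo {
  // One or more commas, semicolons or whitespace characters form a single separator
  val separators = "[,;\\s]+".r

  def splitItems(s: String): List[String] = separators.split(s).toList
}
```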
4. Complete Example
object RegexDemo {
  def main(args: Array[String]): Unit = {
    // 1. Validate email addresses (a simple heuristic, not a full RFC check)
    val emailPattern = """^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$""".r
    val emails = List(
      "user@example.com",
      "invalid-email",
      "test@gmail.com"
    )
    emails.foreach { email =>
      email match {
        // No capture groups, so the extractor takes no arguments
        case emailPattern() => println(s"$email ✓ valid")
        case _              => println(s"$email ✗ invalid")
      }
    }

    // 2. Extract URLs from text
    val text = "Visit https://www.example.com or http://test.org for more information"
    val urlPattern = """https?://[^\s]+""".r
    println("\nURLs found:")
    urlPattern.findAllIn(text).foreach(println)

    // 3. Reformat phone numbers as 3-4-4
    val phoneText = "My number is 13812345678, another is 13987654321"
    val phoneFormatPattern = "(\\d{3})(\\d{4})(\\d{4})".r
    val formatted = phoneFormatPattern.replaceAllIn(phoneText,
      m => s"${m.group(1)}-${m.group(2)}-${m.group(3)}")
    println(s"\nFormatted: $formatted")
  }
}
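The same email character classes can also extract addresses from free text once the `^` and `$` anchors are dropped, since `findAllIn` already searches anywhere. A sketch with hypothetical names; like the original, this is a rough heuristic rather than full address validation.

```scala
object EmailExtract {
  // Same classes as emailPattern above, but without ^ and $
  val emailInText = """[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}""".r

  def extractEmails(s: String): List[String] = emailInText.findAllIn(s).toList
}
```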
5. Practical Tips
Case-insensitive matching
// (?i) at the start makes the whole pattern case-insensitive
val caseInsensitive = "(?i)hello".r
val result = caseInsensitive.findFirstIn("Hello WORLD") // Some(Hello)
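The flag can also be scoped to part of a pattern with `(?i:...)`, a feature of the underlying Java regex engine. A minimal sketch; `CaseDemo` and `matchesScoped` are hypothetical.

```scala
object CaseDemo {
  // Case-insensitive only inside the group; " World" must match exactly
  val scoped = "(?i:hello) World".r

  def matchesScoped(s: String): Boolean = scoped.findFirstIn(s).isDefined
}
```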
Multiline mode
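// (?m) makes ^ and $ match at the start and end of every line
val multiLine = """(?m)^\d+\. .*$""".r
val text = """1. First line
  |2. Second line
  |3. Third line""".stripMargin
val lines = multiLine.findAllIn(text).toList // List(1. First line, 2. Second line, 3. Third line)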
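To see what `(?m)` changes, compare the same pattern with and without it. Without the flag, `^` matches only at the start of the input and `$` only at its end, and `.` does not cross newlines, so nothing matches. A sketch with hypothetical names:

```scala
import scala.util.matching.Regex

object MultiLineDemo {
  val text = "1. First line\n2. Second line\n3. Third line"

  val withoutM: Regex = """^\d+\. .*$""".r
  val withM: Regex    = """(?m)^\d+\. .*$""".r

  def count(r: Regex): Int = r.findAllIn(text).size
}
```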
Named capture groups
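val namedPattern = """(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})""".r
val date = "2024-01-15"
// The extractor binds groups by position; the names are not used here
date match {
  case namedPattern(year, month, day) =>
    println(s"Year $year, month $month, day $day")
  case _ =>
    println("Format does not match")
}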
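To read a group by its name, one version-independent route is the underlying `java.util.regex.Matcher`, whose `group(String)` accepts a group name. A sketch; `NamedGroupDemo` and `yearOf` are hypothetical.

```scala
object NamedGroupDemo {
  val namedPattern = """(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})""".r

  // Whole-string match via java.util.regex, then look up the "year" group by name
  def yearOf(s: String): Option[String] = {
    val m = namedPattern.pattern.matcher(s)
    if (m.matches()) Some(m.group("year")) else None
  }
}
```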
6. Performance Tips
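- Precompile regular expressions: if the same pattern is used many times, compile it once (for example, store it in a val) instead of calling .r on every use.
- Avoid overusing regex: if a simple string method such as contains or startsWith does the job, don't reach for a regular expression.
- Watch out for greedy matching: .* matches as many characters as possible; sometimes you need the non-greedy .*? instead.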
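A small sketch of the first and last tips together: both patterns are compiled once as object fields and reused, and the pair shows how greedy `.*` and non-greedy `.*?` differ on the same input. The names are hypothetical.

```scala
object GreedyDemo {
  // Compiled once, when the object is initialised, and reused on every call
  val greedy    = "<.*>".r
  val nonGreedy = "<.*?>".r

  def firstGreedy(s: String): Option[String]    = greedy.findFirstIn(s)
  def firstNonGreedy(s: String): Option[String] = nonGreedy.findFirstIn(s)
}
```

On "<b>bold</b>", the greedy pattern runs to the last `>` while the non-greedy one stops at the first.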