
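lodash's camel-case conversion function: camelCase

The camelCase function converts a string to camel case. This article covers the ASCII table, the Unicode code charts, matching ASCII and Unicode characters with regular expressions, type conversion, and a slice implementation that always returns dense arrays.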

ASCII

[Figure: ASCII table]

Unicode

Definitions

Unicode

Unicode is a table of the characters used to write text in all kinds of scripts. ASCII is contained in Unicode as well.

UTF-8

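UTF-8 is an encoding that turns a sequence of code points into bytes. Every Unicode code point can be encoded in UTF-8. ASCII is also an encoding, but it covers only 128 characters.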
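To make the code point versus byte distinction concrete, here is a small sketch using the standard TextEncoder API, which always encodes to UTF-8. The helper name utf8Hex is made up for this illustration.

```javascript
// TextEncoder is a standard global in browsers and Node.js; it always emits UTF-8.
const encoder = new TextEncoder()

// Hypothetical helper: returns the UTF-8 bytes of `text` as two-digit hex strings.
function utf8Hex(text) {
  return Array.from(encoder.encode(text), (byte) => byte.toString(16).padStart(2, '0'))
}

console.log(utf8Hex('A'))          // ['41']                    U+0041, one byte, same as ASCII
console.log(utf8Hex('\u00e9'))     // ['c3', 'a9']              U+00E9, two bytes
console.log(utf8Hex('\u4e2d'))     // ['e4', 'b8', 'ad']        U+4E2D, three bytes
console.log(utf8Hex('\u{1F600}'))  // ['f0', '9f', '98', '80']  U+1F600, four bytes
```

Characters in the ASCII range keep their single-byte value, which is why plain ASCII text is already valid UTF-8.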

character

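"Character" is a fairly fuzzy concept. Letters, digits, and punctuation marks are all characters. Roughly speaking, a character is an entry somewhere in the Unicode table.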

glyph

A glyph is the visual representation of some symbol, provided by a font. It may represent a single character, or several characters, or both.
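The gap between what is drawn on screen and what a JavaScript string stores can be seen with a few length checks. This is only an illustrative sketch; the variable names are invented here.

```javascript
// One visible emoji: THUMBS UP SIGN followed by a skin tone modifier.
const thumbs = '\u{1F44D}\u{1F3FD}'
console.log(thumbs.length)              // 4 (UTF-16 code units)
console.log(Array.from(thumbs).length)  // 2 (code points)

// One visible letter: 'e' followed by COMBINING ACUTE ACCENT.
const accented = 'e\u0301'
console.log(accented.length)                   // 2
console.log(accented.normalize('NFC').length)  // 1 (composed into U+00E9)
console.log(accented === '\u00e9')             // false, although both render as the same glyph
```

This mismatch is the reason lodash needs dedicated regular expressions before it can safely take "the first character" of a string.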

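For further reading, see Dark corners of Unicode. The author makes an interesting point there: javascript-has-no-string-type

Related sites

Unicode official site

Related online tools

Covers essentially every kind of Unicode character

Type conversion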

getTag

const toString = Object.prototype.toString
function getTag(value) {
  if (value == null) {
    // Handle null and undefined explicitly for compatibility with older JavaScript engines
    return value === undefined ? '[object Undefined]' : '[object Null]'
  }
  return toString.call(value)
}
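A quick way to see why getTag is more useful than typeof is to run both on a few values. The sketch below repeats the function so it runs on its own; the alias objectToString is a name chosen here to avoid clashing with the toString function discussed later.

```javascript
const objectToString = Object.prototype.toString

function getTag(value) {
  if (value == null) {
    return value === undefined ? '[object Undefined]' : '[object Null]'
  }
  return objectToString.call(value)
}

console.log(typeof null, getTag(null))    // 'object' '[object Null]'
console.log(typeof [], getTag([]))        // 'object' '[object Array]'
console.log(getTag(undefined))            // '[object Undefined]'
console.log(getTag(new Date()))           // '[object Date]'
console.log(getTag(Symbol('s')))          // '[object Symbol]'
```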

isSymbol

function isSymbol(value) {
  const type = typeof value
  return type == 'symbol' || (type === 'object' && value != null && getTag(value) == '[object Symbol]')
}
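The second half of the isSymbol condition exists for boxed symbols, whose typeof is 'object'. A minimal sketch, with the tag check inlined and the name isSymbolSketch made up for this example:

```javascript
// Same idea as isSymbol above, with the getTag call inlined so the snippet is self-contained.
function isSymbolSketch(value) {
  const type = typeof value
  return type === 'symbol' ||
    (type === 'object' && value !== null &&
      Object.prototype.toString.call(value) === '[object Symbol]')
}

const primitive = Symbol('id')
const boxed = Object(primitive) // a Symbol wrapper object

console.log(typeof primitive, isSymbolSketch(primitive)) // 'symbol' true
console.log(typeof boxed, isSymbolSketch(boxed))         // 'object' true
console.log(isSymbolSketch('Symbol(id)'))                // false
console.log(isSymbolSketch(null))                        // false
```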

toString

Handles -0 as a special case and converts arrays recursively (which can overflow the call stack for deeply nested arrays).

const INFINITY = 1 / 0
function toString(value) {
  if (value == null) {
    return ''
  }
  // Exit early for strings to avoid a performance hit in some environments.
  // Strings are returned as-is
  if (typeof value === 'string') {
    return value
  }
  // Arrays
  if (Array.isArray(value)) {
    // Recursively convert values (susceptible to call stack limits).
    // Items that are not null or undefined are converted by a recursive call
    return `${value.map((other) => other == null ? other : toString(other))}`
  }
  // Symbols are converted with their own toString method
  if (isSymbol(value)) {
    return value.toString()
  }
  // Handle -0, which plain string conversion would turn into '0'
  const result = `${value}`
  return (result == '0' && (1 / value) == -INFINITY) ? '-0' : result
}
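The edge cases are easier to see with concrete inputs. The following is a condensed sketch of the same logic, not the lodash source; the name toStringSketch is invented here, and the symbol check is reduced to a typeof test.

```javascript
function toStringSketch(value) {
  if (value == null) return ''
  if (typeof value === 'string') return value
  if (Array.isArray(value)) {
    // null and undefined items are left alone, so they become empty slots in the joined string
    return `${value.map((other) => (other == null ? other : toStringSketch(other)))}`
  }
  if (typeof value === 'symbol') return value.toString()
  const result = `${value}`
  // 1 / -0 is -Infinity, which is how -0 is told apart from 0
  return (result === '0' && 1 / value === -Infinity) ? '-0' : result
}

console.log(String(-0))                         // '0'  (the sign is lost)
console.log(toStringSketch(-0))                 // '-0'
console.log(toStringSketch([1, null, [2, 3]]))  // '1,,2,3'
console.log(toStringSketch(Symbol('a')))        // 'Symbol(a)'
console.log(toStringSketch(undefined))          // ''
```

Note that a template literal cannot be used for the symbol branch, because implicitly converting a symbol to a string throws a TypeError.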

Matching ASCII and Unicode characters

unicodeWords

Splits a string that contains Unicode characters into words

/** Used to compose unicode character classes. */
// Astral planes (matched through the surrogate code units)
const rsAstralRange = '\\ud800-\\udfff'
// https://www.unicode.org/charts/PDF/U0300.pdf
const rsComboMarksRange = '\\u0300-\\u036f'
// https://www.unicode.org/charts/PDF/UFE20.pdf
const reComboHalfMarksRange = '\\ufe20-\\ufe2f'
// https://www.unicode.org/charts/PDF/U20D0.pdf
const rsComboSymbolsRange = '\\u20d0-\\u20ff'
// https://www.unicode.org/charts/PDF/U1AB0.pdf
const rsComboMarksExtendedRange = '\\u1ab0-\\u1aff'
// https://www.unicode.org/charts/PDF/U1DC0.pdf
const rsComboMarksSupplementRange = '\\u1dc0-\\u1dff'
const rsComboRange = rsComboMarksRange + reComboHalfMarksRange + rsComboSymbolsRange + rsComboMarksExtendedRange + rsComboMarksSupplementRange

// https://www.unicode.org/charts/PDF/U2700.pdf
const rsDingbatRange = '\\u2700-\\u27bf'
// a-z, 223-246, 248-255
const rsLowerRange = 'a-z\\xdf-\\xf6\\xf8-\\xff'
// 172 (¬), 177 (±), 215 (×), 247 (÷)
const rsMathOpRange = '\\xac\\xb1\\xd7\\xf7'
// 0-47, 58-64, 91-96, 123-191
const rsNonCharRange = '\\x00-\\x2f\\x3a-\\x40\\x5b-\\x60\\x7b-\\xbf'
// https://www.unicode.org/charts/PDF/U2000.pdf
const rsPunctuationRange = '\\u2000-\\u206f'
// \t \n \r \f 11 160 
// feff https://www.unicode.org/charts/PDF/UFE70.pdf
// 1680 https://www.unicode.org/charts/PDF/U1680.pdf
// 180e https://www.unicode.org/charts/PDF/U1800.pdf
// 2000-200a 2028 2029 202f 205f https://www.unicode.org/charts/PDF/U2000.pdf
// 3000 https://www.unicode.org/charts/PDF/U3000.pdf
const rsSpaceRange = ' \\t\\x0b\\f\\xa0\\ufeff\\n\\r\\u2028\\u2029\\u1680\\u180e\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2007\\u2008\\u2009\\u200a\\u202f\\u205f\\u3000'
// A-Z 192-214 216-222
const rsUpperRange = 'A-Z\\xc0-\\xd6\\xd8-\\xde'
// fe0e fe0f https://www.unicode.org/charts/PDF/UFE00.pdf
const rsVarRange = '\\ufe0e\\ufe0f'
const rsBreakRange = rsMathOpRange + rsNonCharRange + rsPunctuationRange + rsSpaceRange

/** Used to compose unicode capture groups. */
// Matches an apostrophe: ' or ’ (U+2019) https://www.unicode.org/charts/PDF/U2000.pdf
const rsApos = "['\u2019]"
// Matches math operators, non-word characters, general punctuation, and whitespace
const rsBreak = `[${rsBreakRange}]`
// Combining marks (see the charts linked above)
const rsCombo = `[${rsComboRange}]`
// Digits
const rsDigit = '\\d'
// Dingbats (see the chart linked above)
const rsDingbat = `[${rsDingbatRange}]`
// Lowercase letters
const rsLower = `[${rsLowerRange}]`
// Matches any other character
const rsMisc = `[^${rsAstralRange}${rsBreakRange + rsDigit + rsDingbatRange + rsLowerRange + rsUpperRange}]`
// Fitzpatrick skin tone modifiers (U+1F3FB-U+1F3FF): the high surrogate \ud83c
// followed by a low surrogate in \udffb-\udfff
const rsFitz = '\\ud83c[\\udffb-\\udfff]'
// Modifiers
const rsModifier = `(?:${rsCombo}|${rsFitz})`
// Anything outside the surrogate range
const rsNonAstral = `[^${rsAstralRange}]`
// Regional indicator pairs (country flags)
const rsRegional = '(?:\\ud83c[\\udde6-\\uddff]){2}'
const rsSurrPair = '[\\ud800-\\udbff][\\udc00-\\udfff]'
// Uppercase letters
const rsUpper = `[${rsUpperRange}]`
// ZWJ https://www.unicode.org/charts/PDF/U2000.pdf   ZERO WIDTH JOINER
const rsZWJ = '\\u200d'

/** Used to compose unicode regexes. */
const rsMiscLower = `(?:${rsLower}|${rsMisc})`
const rsMiscUpper = `(?:${rsUpper}|${rsMisc})`
const rsOptContrLower = `(?:${rsApos}(?:d|ll|m|re|s|t|ve))?`
const rsOptContrUpper = `(?:${rsApos}(?:D|LL|M|RE|S|T|VE))?`
const reOptMod = `${rsModifier}?`
const rsOptVar = `[${rsVarRange}]?`
const rsOptJoin = `(?:${rsZWJ}(?:${[rsNonAstral, rsRegional, rsSurrPair].join('|')})${rsOptVar + reOptMod})*`
const rsOrdLower = '\\d*(?:1st|2nd|3rd|(?![123])\\dth)(?=\\b|[A-Z_])'
const rsOrdUpper = '\\d*(?:1ST|2ND|3RD|(?![123])\\dTH)(?=\\b|[a-z_])'
const rsSeq = rsOptVar + reOptMod + rsOptJoin
const rsEmoji = `(?:${[rsDingbat, rsRegional, rsSurrPair].join('|')})${rsSeq}`

const reUnicodeWords = RegExp([
  `${rsUpper}?${rsLower}+${rsOptContrLower}(?=${[rsBreak, rsUpper, '$'].join('|')})`,
  `${rsMiscUpper}+${rsOptContrUpper}(?=${[rsBreak, rsUpper + rsMiscLower, '$'].join('|')})`,
  `${rsUpper}?${rsMiscLower}+${rsOptContrLower}`,
  `${rsUpper}+${rsOptContrUpper}`,
  rsOrdUpper,
  rsOrdLower,
  `${rsDigit}+`,
  rsEmoji
].join('|'), 'g')

/**
 * Splits a Unicode `string` into an array of its words.
 *
 * @private
 * @param {string} The string to inspect.
 * @returns {Array} Returns the words of `string`.
 */
function unicodeWords(string) {
  return string.match(reUnicodeWords)
}
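The full expression is hard to read in one go, so here is a deliberately reduced stand-in that keeps only the core idea of the alternation: a capitalized or lowercase word, a run of capitals, or a run of digits. It is an ASCII-only simplification written for this article, not the lodash regex, and the names reSimpleWords and simpleWords are hypothetical.

```javascript
// Alternatives, tried in order:
//   [A-Z]?[a-z]+      an optional capital followed by lowercase letters ("foo", "Bar")
//   [A-Z]+(?![a-z])   a run of capitals that stops before a capitalized word ("XML" in "XMLHttp")
//   \d+               a run of digits
const reSimpleWords = /[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+/g

function simpleWords(string) {
  return string.match(reSimpleWords) || []
}

console.log(simpleWords('fooBarXMLHttp42')) // ['foo', 'Bar', 'XML', 'Http', '42']
console.log(simpleWords('--foo-bar--'))     // ['foo', 'bar']
console.log(simpleWords('__FOO_BAR__'))     // ['FOO', 'BAR']
```

The real reUnicodeWords follows the same shape but adds Latin-1 letters, contractions such as 's, ordinals such as 1st, and emoji sequences.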

The regular expression produced by reUnicodeWords is shown here: graphical view

hasUnicode

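Checks whether a string contains zero-width joiners, surrogate code units (astral code points), combining marks, or variation selectors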

/** Used to compose unicode character classes. */
// Astral planes (matched through the surrogate code units)
const rsAstralRange = '\\ud800-\\udfff'
// https://www.unicode.org/charts/PDF/U0300.pdf
const rsComboMarksRange = '\\u0300-\\u036f'
// https://www.unicode.org/charts/PDF/UFE20.pdf
const reComboHalfMarksRange = '\\ufe20-\\ufe2f'
// https://www.unicode.org/charts/PDF/U20D0.pdf
const rsComboSymbolsRange = '\\u20d0-\\u20ff'
// https://www.unicode.org/charts/PDF/U1AB0.pdf
const rsComboMarksExtendedRange = '\\u1ab0-\\u1aff'
// https://www.unicode.org/charts/PDF/U1DC0.pdf
const rsComboMarksSupplementRange = '\\u1dc0-\\u1dff'
const rsComboRange = rsComboMarksRange + reComboHalfMarksRange + rsComboSymbolsRange + rsComboMarksExtendedRange + rsComboMarksSupplementRange
// fe0e fe0f https://www.unicode.org/charts/PDF/UFE00.pdf
const rsVarRange = '\\ufe0e\\ufe0f'

/** Used to compose unicode capture groups. */
// ZWJ https://www.unicode.org/charts/PDF/U2000.pdf   ZERO WIDTH JOINER
const rsZWJ = '\\u200d'

/** Used to detect strings with [zero-width joiners or code points from the astral planes](http://eev.ee/blog/2015/09/12/dark-corners-of-unicode/). */
const reHasUnicode = RegExp(`[${rsZWJ + rsAstralRange + rsComboRange + rsVarRange}]`)

/**
 * Checks if `string` contains Unicode symbols.
 * Only ZWJ, surrogates, combining marks, and variation selectors count as symbols here.
 *
 * @private
 * @param {string} string The string to inspect.
 * @returns {boolean} Returns `true` if a symbol is found, else `false`.
 */
function hasUnicode(string) {
  return reHasUnicode.test(string)
}

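Despite its name, hasUnicode does not mean "contains non-ASCII text". The sketch below writes the same character class as a regex literal (assembled by hand from the ranges above, so treat it as an approximation of what the RegExp constructor builds) and probes it with a few strings.

```javascript
// ZWJ + surrogates + combining mark ranges + variation selectors, as one literal.
const reHasUnicodeSketch =
  /[\u200d\ud800-\udfff\u0300-\u036f\ufe20-\ufe2f\u20d0-\u20ff\u1ab0-\u1aff\u1dc0-\u1dff\ufe0e\ufe0f]/

console.log(reHasUnicodeSketch.test('hello'))         // false
console.log(reHasUnicodeSketch.test('\u00e9'))        // false: precomposed é is a single BMP code unit
console.log(reHasUnicodeSketch.test('\u4e2d\u6587'))  // false: CJK characters in the BMP
console.log(reHasUnicodeSketch.test('e\u0301'))       // true: contains a combining mark
console.log(reHasUnicodeSketch.test('\u{1F44D}'))     // true: stored as a surrogate pair
console.log(reHasUnicodeSketch.test('a\u200db'))      // true: contains a zero-width joiner
```

In other words, the check only flags strings for which splitting by UTF-16 code unit would cut a visible symbol in half.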

asciiWords

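Matches words within the ASCII range; compare the ranges against the ASCII table.

/** Used to match words composed of alphanumeric characters. */
// Matches words made of letters and digits, excluding punctuation and control characters
// Excluded: 0-47, 58-64, 91-96, 123-127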
const reAsciiWord = /[^\x00-\x2f\x3a-\x40\x5b-\x60\x7b-\x7f]+/g

function asciiWords(string) {
  return string.match(reAsciiWord)
}
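A short run shows what the ASCII matcher does and does not split. The snippet repeats the regex so it is self-contained; the sample strings are made up.

```javascript
const reAsciiWord = /[^\x00-\x2f\x3a-\x40\x5b-\x60\x7b-\x7f]+/g

function asciiWords(string) {
  return string.match(reAsciiWord)
}

console.log(asciiWords('foo-bar_baz 42')) // ['foo', 'bar', 'baz', '42']  ('_' is 0x5f, inside the excluded range 0x5b-0x60)
console.log(asciiWords('fooBar'))         // ['fooBar']  (no split on case changes)
console.log(asciiWords('---'))            // null  (match returns null when nothing matches)
```

Because this matcher never splits on a case change, words routes strings like 'fooBar' to unicodeWords instead.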

asciiToArray

Converts an ASCII string to an array of characters

/**
 * Converts an ASCII `string` to an array.
 *
 * @private
 * @param {string} string The string to convert.
 * @returns {Array} Returns the converted array.
 */
function asciiToArray(string) {
  // Split into individual UTF-16 code units
  return string.split('')
}

unicodeToArray

Converts a string that contains Unicode symbols to an array

/** Used to compose unicode character classes. */
// Astral planes (matched through the surrogate code units)
const rsAstralRange = '\\ud800-\\udfff'
// https://www.unicode.org/charts/PDF/U0300.pdf
const rsComboMarksRange = '\\u0300-\\u036f'
// https://www.unicode.org/charts/PDF/UFE20.pdf
const reComboHalfMarksRange = '\\ufe20-\\ufe2f'
// https://www.unicode.org/charts/PDF/U20D0.pdf
const rsComboSymbolsRange = '\\u20d0-\\u20ff'
// https://www.unicode.org/charts/PDF/U1AB0.pdf
const rsComboMarksExtendedRange = '\\u1ab0-\\u1aff'
// https://www.unicode.org/charts/PDF/U1DC0.pdf
const rsComboMarksSupplementRange = '\\u1dc0-\\u1dff'
const rsComboRange = rsComboMarksRange + reComboHalfMarksRange + rsComboSymbolsRange + rsComboMarksExtendedRange + rsComboMarksSupplementRange
const rsVarRange = '\\ufe0e\\ufe0f'

/** Used to compose unicode capture groups. */
const rsAstral = `[${rsAstralRange}]`
const rsCombo = `[${rsComboRange}]`
// Fitzpatrick skin tone modifiers (emoji)
const rsFitz = '\\ud83c[\\udffb-\\udfff]'
// Modifiers
const rsModifier = `(?:${rsCombo}|${rsFitz})`
// Anything outside the surrogate range
const rsNonAstral = `[^${rsAstralRange}]`
// Regional indicator pairs (country flags)
const rsRegional = '(?:\\ud83c[\\udde6-\\uddff]){2}'
// High Surrogate Area https://www.unicode.org/charts/PDF/UD800.pdf
// Low Surrogate Area https://www.unicode.org/charts/PDF/UDC00.pdf
const rsSurrPair = '[\\ud800-\\udbff][\\udc00-\\udfff]'
// ZWJ https://www.unicode.org/charts/PDF/U2000.pdf   ZERO WIDTH JOINER
const rsZWJ = '\\u200d'

/** Used to compose unicode regexes. */
// Build the matching regex
const reOptMod = `${rsModifier}?`
const rsOptVar = `[${rsVarRange}]?`
const rsOptJoin = `(?:${rsZWJ}(?:${[rsNonAstral, rsRegional, rsSurrPair].join('|')})${rsOptVar + reOptMod})*`
const rsSeq = rsOptVar + reOptMod + rsOptJoin
const rsNonAstralCombo = `${rsNonAstral}${rsCombo}?`
const rsSymbol = `(?:${[rsNonAstralCombo, rsCombo, rsRegional, rsSurrPair, rsAstral].join('|')})`

/** Used to match [string symbols](https://mathiasbynens.be/notes/javascript-unicode). */
const reUnicode = RegExp(`${rsFitz}(?=${rsFitz})|${rsSymbol + rsSeq}`, 'g')

/**
 * Converts a Unicode `string` to an array.
 *
 * @private
 * @param {string} string The string to convert.
 * @returns {Array} Returns the converted array.
 */
function unicodeToArray(string) {
  return string.match(reUnicode) || []
}
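To see why a separate Unicode path is needed at all, compare split('') with code point iteration. Array.from is used below only as a familiar stand-in; it is not what lodash does, and it handles fewer cases than reUnicode.

```javascript
const sample = '\u{1F44D}ok' // THUMBS UP SIGN followed by "ok"

// split('') works on UTF-16 code units and tears the surrogate pair apart.
console.log(sample.split('').length)  // 4
console.log(sample.split('')[0])      // a lone high surrogate, '\ud83d'

// Array.from iterates by code point, so the emoji stays in one piece.
console.log(Array.from(sample))       // ['👍', 'o', 'k']

// Code point iteration still separates a base letter from its combining mark.
console.log(Array.from('e\u0301').length) // 2
```

The reUnicode pattern goes one step further than code point iteration: its rsNonAstralCombo and rsSeq parts are written to keep combining marks, variation selectors, and ZWJ sequences attached to the symbol they modify.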

Utility functions

stringToArray

Converts a string to an array. Depends on the hasUnicode, asciiToArray, and unicodeToArray functions above

import asciiToArray from './asciiToArray.js'
import hasUnicode from './hasUnicode.js'
import unicodeToArray from './unicodeToArray.js'

/**
 * Converts `string` to an array.
 *
 * @private
 * @param {string} string The string to convert.
 * @returns {Array} Returns the converted array.
 */
function stringToArray(string) {
  return hasUnicode(string)
    ? unicodeToArray(string)
    : asciiToArray(string)
}

slice

A reimplementation of Array.prototype.slice that guarantees a dense array is returned.

/**
 * Creates a slice of `array` from `start` up to, but not including, `end`.
 *
 * **Note:** This method is used instead of
 * [`Array#slice`](https://mdn.io/Array/slice) to ensure dense arrays are
 * returned.
 *
 * @since 3.0.0
 * @category Array
 * @param {Array} array The array to slice.
 * @param {number} [start=0] The start position. A negative index will be treated as an offset from the end.
 * @param {number} [end=array.length] The end position. A negative index will be treated as an offset from the end.
 * @returns {Array} Returns the slice of `array`.
 * @example
 *
 * var array = [1, 2, 3, 4]
 *
 * _.slice(array, 2)
 * // => [3, 4]
 */
function slice(array, start, end) {
  let length = array == null ? 0 : array.length
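  // Return an empty array when the length is 0
  if (!length) {
    return []
  }
  // start defaults to 0 when it is null or undefined
  start = start == null ? 0 : start
  // end defaults to the array length when it is undefined
  end = end === undefined ? length : end

  // A negative start is an offset from the end, clamped to 0
  if (start < 0) {
    start = -start > length ? 0 : (length + start)
  }
  // end cannot exceed the array length
  end = end > length ? length : end
  // A negative end is an offset from the end
  if (end < 0) {
    end += length
  }
  // The length is 0 when start is past end; otherwise it is end - start,
  // converted to an unsigned 32-bit integer by the zero-fill right shift
  length = start > end ? 0 : ((end - start) >>> 0)
  // Convert start to an unsigned 32-bit integer as well
  start >>>= 0

  // Copy the elements into the result array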
  let index = -1
  const result = new Array(length)
  while (++index < length) {
    result[index] = array[index + start]
  }
  return result
}
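The difference between a dense and a sparse result only shows up when the input has holes. The sketch below uses a stripped-down copy loop (denseSlice is a name invented here, and it assumes 0 <= start <= end <= array.length) next to the native method.

```javascript
// Simplified copy loop; argument normalization from the full function is omitted.
function denseSlice(array, start = 0, end = array.length) {
  const length = (end - start) >>> 0
  const result = new Array(length)
  for (let index = 0; index < length; index++) {
    // Reading a hole yields undefined, and the assignment creates a real property.
    result[index] = array[index + start]
  }
  return result
}

const sparse = [1, , 3] // index 1 is a hole
const native = sparse.slice(0)
const dense = denseSlice(sparse)

console.log(1 in native)  // false: Array.prototype.slice preserves the hole
console.log(1 in dense)   // true: the slot exists and holds undefined
console.log(dense[1])     // undefined

// What the zero-fill right shift does to non-integers and negatives:
console.log(2.7 >>> 0)    // 2
console.log(-1 >>> 0)     // 4294967295
```

A dense result behaves predictably with forEach and map, which skip holes in sparse arrays.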

castSlice

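Depends on the slice function above. It first checks whether slicing is needed at all and returns the original array unchanged when it is not.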

import slice from '../slice.js'

/**
 * Casts `array` to a slice if it's needed.
 *
 * @private
 * @param {Array} array The array to inspect.
 * @param {number} start The start position.
 * @param {number} [end=array.length] The end position.
 * @returns {Array} Returns the cast slice.
 */
function castSlice(array, start, end) {
  const { length } = array
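  // end defaults to the array length when omitted
  end = end === undefined ? length : end
  // Return the array itself when the whole array is requested; otherwise slice it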
  return (!start && end >= length) ? array : slice(array, start, end)
}

createCaseFirst

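Builds a function for the given method name; in essence it calls that method on the first character of the input string. Depends on castSlice above (to take everything after the first character), hasUnicode (to detect whether the string contains Unicode symbols), and stringToArray (to split a string containing Unicode symbols into an array correctly).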

import castSlice from './castSlice.js'
import hasUnicode from './hasUnicode.js'
import stringToArray from './stringToArray.js'

/**
 * Creates a function like `lowerFirst`.
 * Builds a function that applies `methodName` to the first character.
 *
 * @private
 * @param {string} methodName The name of the `String` case method to use.
 * @returns {Function} Returns the new case function.
 */
function createCaseFirst(methodName) {
  return (string) => {
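    // Return an empty string for empty input
    if (!string) {
      return ''
    }

    // If the string contains Unicode symbols, split it into an array with stringToArray (which delegates to unicodeToArray)
    const strSymbols = hasUnicode(string)
      ? stringToArray(string)
      : undefined

    // Without Unicode symbols, take the first character of the string directly; otherwise take the first array element
    const chr = strSymbols
      ? strSymbols[0]
      : string[0]

    // The rest of the string. With Unicode symbols, join array items 1 through the end back into a string
    const trailing = strSymbols
      ? castSlice(strSymbols, 1).join('')
      : string.slice(1)

    // Call the String.prototype method named by methodName on the first character only,
    // then append the rest of the string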
    return chr[methodName]() + trailing
  }
}
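The key trick is the dynamic method lookup chr[methodName](). A compact sketch of the same factory follows; it swaps the hasUnicode/stringToArray/castSlice trio for Array.from, which is an assumption made to keep the example short, and the names ending in Sketch are invented here.

```javascript
function createCaseFirstSketch(methodName) {
  return (string) => {
    if (!string) {
      return ''
    }
    // Array.from splits by code point; lodash uses its own regex-based splitter instead.
    const symbols = Array.from(string)
    // Look the method up by name on the first symbol, call it, then re-attach the rest.
    return symbols[0][methodName]() + symbols.slice(1).join('')
  }
}

const upperFirstSketch = createCaseFirstSketch('toUpperCase')
const lowerFirstSketch = createCaseFirstSketch('toLowerCase')

console.log(upperFirstSketch('fred')) // 'Fred'
console.log(upperFirstSketch('FRED')) // 'FRED'
console.log(lowerFirstSketch('Fred')) // 'fred'
console.log(upperFirstSketch(''))     // ''
```

One factory therefore yields both upperFirst and lowerFirst, differing only in the method name passed in.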

upperFirst

Calls createCaseFirst with the method name toUpperCase, so String.prototype.toUpperCase is applied to the first character.

import createCaseFirst from './.internal/createCaseFirst.js'

/**
 * Converts the first character of `string` to upper case.
 * 
 * @since 4.0.0
 * @category String
 * @param {string} [string=''] The string to convert.
 * @returns {string} Returns the converted string.
 * @see camelCase, kebabCase, lowerCase, snakeCase, startCase, upperCase
 * @example
 *
 * upperFirst('fred')
 * // => 'Fred'
 *
 * upperFirst('FRED')
 * // => 'FRED'
 */
const upperFirst = createCaseFirst('toUpperCase')

hasUnicodeWord

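Checks whether a string needs the Unicode-aware word splitter: it looks for case changes, letter/digit boundaries, or any character other than an ASCII letter, digit, or space. The matching rule is: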

const hasUnicodeWord = RegExp.prototype.test.bind(
  /[a-z][A-Z]|[A-Z]{2}[a-z]|[0-9][a-zA-Z]|[a-zA-Z][0-9]|[^a-zA-Z0-9 ]/
)
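Running the test against a few inputs makes the five alternatives easier to follow. The snippet repeats the definition so it can run on its own; the sample strings are chosen for illustration.

```javascript
const hasUnicodeWord = RegExp.prototype.test.bind(
  /[a-z][A-Z]|[A-Z]{2}[a-z]|[0-9][a-zA-Z]|[a-zA-Z][0-9]|[^a-zA-Z0-9 ]/
)

console.log(hasUnicodeWord('foo bar')) // false: only letters and spaces
console.log(hasUnicodeWord('FOO BAR')) // false
console.log(hasUnicodeWord('fooBar'))  // true: [a-z][A-Z] matches "oB"
console.log(hasUnicodeWord('XMLHttp')) // true: [A-Z]{2}[a-z] matches "LHt"
console.log(hasUnicodeWord('foo2'))    // true: [a-zA-Z][0-9] matches "o2"
console.log(hasUnicodeWord('foo-bar')) // true: "-" is not a letter, digit, or space
```

Because the regex has no g flag, binding test is safe: there is no lastIndex state carried between calls.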


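words

Depends on asciiWords (returns an array of the matched words), unicodeWords (returns an array of the matched words), and hasUnicodeWord (matches aA, AAa, 0a, 0A, A0, a0, and any character that is not a letter, digit, or space)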

/**
 * Splits `string` into an array of its words.
 *
 * @since 3.0.0
 * @category String
 * @param {string} [string=''] The string to inspect.
 * @param {RegExp|string} [pattern] The pattern to match words.
 * @returns {Array} Returns the words of `string`.
 * @example
 *
 * words('fred, barney, & pebbles')
 * // => ['fred', 'barney', 'pebbles']
 *
 * words('fred, barney, & pebbles', /[^, ]+/g)
 * // => ['fred', 'barney', '&', 'pebbles']
 */
function words(string, pattern) {
  if (pattern === undefined) {
    const result = hasUnicodeWord(string) ? unicodeWords(string) : asciiWords(string)
    return result || []
  }
  return string.match(pattern) || []
}

camelCase

Depends on toString (converts the input value to a string), words (splits the string into an array of words), and upperFirst (uppercases the first letter of a word)

import upperFirst from './upperFirst.js'
import words from './words.js'
import toString from './toString.js'

/**
 * Converts `string` to [camel case](https://en.wikipedia.org/wiki/CamelCase).
 * 
 * @since 3.0.0
 * @category String
 * @param {string} [string=''] The string to convert.
 * @returns {string} Returns the camel cased string.
 * @see lowerCase, kebabCase, snakeCase, startCase, upperCase, upperFirst
 * @example
 *
 * camelCase('Foo Bar')
 * // => 'fooBar'
 *
 * camelCase('--foo-bar--')
 * // => 'fooBar'
 *
 * camelCase('__FOO_BAR__')
 * // => 'fooBar'
 */
const camelCase = (string) => (
  /**
   * 1. Convert the input to a string
   * 2. Remove apostrophes (' and ’) with replace
   * 3. Pass the result to words
   * 4. Process the array that words returns
   */
  words(toString(string).replace(/['\u2019]/g, '')).reduce((result, word, index) => {
    // Lowercase the current word
    word = word.toLowerCase()
    // For every word except the first, uppercase its first letter
    return result + (index ? upperFirst(word) : word)
  }, '')
)
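Putting the pieces together, the whole pipeline fits in a few lines once the word splitter is simplified. The version below is an ASCII-only approximation for illustration: reWordsSketch and camelCaseSketch are hypothetical names, String() stands in for lodash's toString, and the regex is far smaller than reUnicodeWords.

```javascript
// Simplified word matcher: capitalized/lowercase word, run of capitals, or run of digits.
const reWordsSketch = /[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+/g

function camelCaseSketch(input) {
  const found = String(input)
    .replace(/['\u2019]/g, '') // drop apostrophes so "don't" stays one word
    .match(reWordsSketch) || []
  return found.reduce((result, word, index) => {
    const lower = word.toLowerCase()
    // The first word stays lowercase; every later word gets a capital first letter.
    return result + (index ? lower[0].toUpperCase() + lower.slice(1) : lower)
  }, '')
}

console.log(camelCaseSketch('Foo Bar'))        // 'fooBar'
console.log(camelCaseSketch('--foo-bar--'))    // 'fooBar'
console.log(camelCaseSketch('__FOO_BAR__'))    // 'fooBar'
console.log(camelCaseSketch("don't stop"))     // 'dontStop'
console.log(camelCaseSketch('XMLHttpRequest')) // 'xmlHttpRequest'
```

The first three calls reproduce the examples from the lodash JSDoc above; inputs with accented letters or emoji need the full Unicode-aware functions described in this article.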