How to filter comments out of JavaScript with Python's re module


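1. Problem background

People want to use Python's re module to filter the comments out of JavaScript, especially single-line comments (those starting with "//"). After a long time trying without success, they ask for help.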

2. Solutions

Answer 1

One helpful answerer gave a detailed solution:

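  • Use re.compile() to build a regular expression object from one compound pattern whose alternatives match code, multi-line comments, and single-line comments.
  • Apply it to the JavaScript source with findall(), which returns a list of tuples, one tuple per match and one slot per capture group.
  • Pull the code, the multi-line comments, and the single-line comments out of that list and print each set separately.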
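The steps above collect the pieces of the source; they do not remove anything. A minimal sketch of actual filtering, using the same idea of matching string literals and comments in one alternation, might look like the following. The names `PATTERN` and `strip_comments` are made up for illustration and are not from the original answer. The sketch also assumes the input contains no regex literals with "//" inside them, which this pattern would misread as comments.

```python
import re

# Hypothetical pattern: string literals are captured (group 1) so they can be
# put back untouched; comments are matched outside any group and dropped.
PATTERN = re.compile(
    r"""
    ( "(?:\\.|[^"\\\n])*"      # double-quoted string literal
    | '(?:\\.|[^'\\\n])*'      # single-quoted string literal
    )
    | /\*.*?\*/                # multi-line comment (non-greedy, may span lines)
    | //[^\n]*                 # single-line comment up to the end of the line
    """,
    re.VERBOSE | re.DOTALL,
)


def strip_comments(source: str) -> str:
    """Return source with // and /* */ comments removed (illustrative helper)."""
    # When a string literal matched, group 1 is set and is returned unchanged;
    # when a comment matched, group 1 is None and the match becomes "".
    return PATTERN.sub(lambda m: m.group(1) or "", source)


if __name__ == "__main__":
    js = 'var url = "http://x.com/"; // note\nvar y = 1; /* gone */ var z = 1;\n'
    print(strip_comments(js))
```

Matching string literals first is what keeps the "//" inside "http://x.com/" from being treated as the start of a comment.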

Answer 2

Another answerer offered a simpler solution:

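  • Use re.compile() to build a regular expression object from the pattern ".*(//(.*))$".
  • Apply it to each line of JavaScript with match(). Because the leading ".*" is greedy, the captured comment starts at the last "//" on the line.
  • Read the single-line comment from the match groups and print it.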
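As a rough sketch of this per-line approach, assuming the pattern `.*(//(.*))$`, a small helper can return the comment text or None. The names `LINE_COMMENT` and `line_comment` are hypothetical and only serve the illustration.

```python
import re

# Greedy ".*" consumes the line, then backtracks to the last "//" on it.
LINE_COMMENT = re.compile(r'.*(//(.*))$')


def line_comment(line):
    """Return the text after the last '//' on the line, or None if there is none."""
    m = LINE_COMMENT.match(line)
    # match() returns None for lines without "//", so guard before using groups.
    return m.group(2) if m else None


if __name__ == "__main__":
    for text in ('var x = 2 // and this is a comment too',
                 'var x = 2',
                 'var url = "http://a.b/"'):
        print(repr(line_comment(text)))
```

The last call shows the limit of the simple pattern: it knows nothing about string literals, so the "//" inside a URL is reported as a comment when no real comment follows it.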

Code example

import re

# Regular expression from answer 1
reexpr = r"""
    (                           # Capture code
        "(?:\\.|[^"\\])*"       # String literal
        |
        '(?:\\.|[^'\\])*'       # String literal
        |
        (?:[^/\n"']|/[^/*\n"'])+ # Any code besides newlines or string literals
        |
        \n                      # Newline
    )|
    (/\*  (?:[^*]|\*[^/])*   \*/)        # Multi-line comment
    |
    (?://(.*)$)                 # Comment
    $"""
rx_full = re.compile(reexpr, re.VERBOSE + re.MULTILINE)

# Regular expression from answer 2 (own name, so it does not overwrite answer 1's)
rx_line = re.compile(r'.*(//(.*))$')

# Input JavaScript code
code = r"""// this is a comment
var x = 2 * 4 // and this is a comment too
var url = "http://www.google.com/" // and "this" too
url += 'but // this is not a comment' // however this one is
url += 'this "is not a comment' + " and ' neither is this " // only this

bar = 'http://no.comments.com/' // these // are // comments
bar = 'text // string ' no // more //\' // comments
bar = 'http://no.comments.com/'
bar = /var/ // comment

/* comment 1 */
bar = open() /* comment 2 */
bar = open() /* comment 2b */// another comment
bar = open( /* comment 3 */ file) // another comment 
"""

# Answer 1: each tuple is (code, multi-line comment, single-line comment text)
parts = rx_full.findall(code)
print('*' * 80, '\nCode:\n\n', '\n'.join([x[0] for x in parts if x[0].strip()]))
print('*' * 80, '\nMulti line comments:\n\n', '\n'.join([x[1] for x in parts if x[1].strip()]))
print('*' * 80, '\nOne line comments:\n\n', '\n'.join([x[2] for x in parts if x[2].strip()]))

# Answer 2: match line by line
lines = ["// this is a comment", 
    "var x = 2 // and this is a comment too",
    """var url = "http://www.google.com/" // and "this" too""",
    """url += 'but // this is not a comment' // however this one is""",
    """url += 'this "is not a comment' + " and ' neither is this " // only this""",]

for line in lines: 
    m = rx_line.match(line)
    if m:  # match() returns None for a line without "//"
        print(m.groups())

Output:

********************************************************************************
Code:

var x = 2
var url = "http://www.google.com/"
url += 'but '
url += 'this "is not a comment' + " and ' neither is this "
bar = 'http://no.comments.com/'
bar = 'text // string ' no  more '
bar = 'http://no.comments.com/'
bar = /var/
bar = open()
bar = open()
bar = open( file)
********************************************************************************
Multi line comments:

/* comment 1 */
/* comment 2 */
/* comment 2b */// another comment
/* comment 3 */
********************************************************************************
One line comments:

 this is a comment
 and this is a comment too
 and "this" too
 however this one is
 only this
 these // are // comments
// string ' no // more //\' // comments
 comment
 another comment
********************************************************************************
('// this is a comment', ' this is a comment')
('// and this is a comment too', ' and this is a comment too')
('// and "this" too', ' and "this" too')
('// however this one is', ' however this one is')
('// only this', ' only this')