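I have a database containing football team names, for example, in the first entry below, Marshall and Southern Methodist. To be matched against my database names are names that differ somewhat but are still recognizable (in the first entry, SMU and Marshall):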
[u'Houston', u'Alabama']
[u'Houst', u'Alab']
[u'Florida State', u'North Carolina State']
[u'NCSt', u'FlaSt']
[u'Penn State', u'Iowa']
[u'PnSt', u'Iowa']
[u'Oklahoma', u'Texas']
[u'Texas', u'Okla']
[u'Florida Atlantic', u'South Florida']
[u'SFla', u'FlAtl']
[u'Georgia', u'Tennessee']
[u'Geo', u'Tenn']
[u'San Jose State', u'Idaho']
[u'UI', u'SJSU']
[u'Washington State', u'Arizona State']
[u'ArzSt', u'WshSt']
[u'Fresno State', u'Nevada']
[u'Nevad', u'FrsSt']
[u'Oregon State', u'Arizona']
[u'ARIZ', u'OSU']
[u'Clemson', u'Virginia Tech']
[u'VTech', u'Clem']
[u'Chattanooga', u'Arkansas']
[u'UTC', u'AR']
[u'USC', u'Stanford']
[u'USC', u'Stanf']
[u'Baylor', u'Colorado']
[u'BU', u'CU']
[u'North Texas', u'Louisiana-Lafayette']
[u'NoTex', u'LaLaf']
[u'Tulane', u'Army']
[u'TLN', u'ARMY']
[u'Troy', u'Florida International']
[u'TROY', u'FIU']
[u'Louisiana-Monroe', u'Arkansas State']
[u'ASU', u'ULM']
[u'Texas Tech', u'Iowa State']
[u'TT', u'ISU']
[u'Akron', u'Western Michigan']
[u'AKRON', u'WMU']
[u'Liberty', u'Toledo']
[u'LIBERTY', u'TOLEDO']
[u'Virginia', u'Middle Tennessee']
[u'Virg', u'MTnSt']
[u'Oklahoma State', u'Texas A&M']
[u'TexAM', u'OKSt']
[u'Notre Dame', u'UCLA']
[u'NDame', u'UCLA']
[u'Rutgers', u'Cincinnati']
[u'Cincy', u'Rutgr']
[u'Ohio State', u'Purdue']
[u'Prdue', u'OhSt']
[u'LSU', u'Florida']
[u'Fla', u'LSU']
[u'Air Force', u'UNLV']
[u'AFA', u'UNLV']
[u'Nebraska', u'Missouri']
[u'Misso', u'Neb']
[u'New Mexico State', u'Boise State']
[u'NMxSt', u'BoiSt']
[u'Pittsburgh', u'Navy']
[u'Navy', u'Pitt']
[u'Wake Forest', u'Florida State']
[u'WFrst', u'FlaSt']
[u'San Jose State', u'Hawaii']
[u'Hawa', u'SJSt']
[u'UCF', u'South Florida']
[u'UCF', u'SFla']
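For each set of four names, I need to match each of my database names with the correct abbreviated name. I could do it with a lot of if statements, but that takes a lot of code and is not very elegant. Is there a better way to do the matching?
Solution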
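Because each entry has exactly two names on each side, there are only two possible pairings: same order or swapped. As a minimal sketch of that idea (not the full answer code), the snippet below scores both pairings with `difflib.SequenceMatcher` on lowercased strings and keeps the one with the higher total. The helper names `similarity` and `pair_names` are hypothetical, and the sketch assumes two names per side and applies no minimum-score threshold, so it always returns a pairing even when the abbreviations are too short to be told apart reliably (for example `BU` / `CU`).

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_names(names, abbrevs):
    """Pair two database names with two abbreviations.

    Hypothetical helper: compares the same-order pairing with the
    swapped pairing and returns whichever has the higher total score.
    """
    straight = similarity(names[0], abbrevs[0]) + similarity(names[1], abbrevs[1])
    crossed = similarity(names[0], abbrevs[1]) + similarity(names[1], abbrevs[0])
    if straight >= crossed:
        return [(names[0], abbrevs[0]), (names[1], abbrevs[1])]
    return [(names[0], abbrevs[1]), (names[1], abbrevs[0])]


print(pair_names([u'Oklahoma', u'Texas'], [u'Texas', u'Okla']))
# [('Oklahoma', 'Okla'), ('Texas', 'Texas')]
```

Comparing totals rather than individual scores means one strong match (such as `Texas` / `Texas`) can settle the pairing even when the other abbreviation is weak. Entries made of initials only would still need a lookup table or a manual check.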
from difflib import SequenceMatcher
li = [
([u'Houston', u'Alabama'],
[u'Houst', u'Alab']),
([u'Florida State', u'North Carolina State'],
[u'NCSt', u'FlaSt']),
([u'Penn State', u'Iowa'],
[u'PnSt', u'Iowa']),
([u'Oklahoma', u'Texas'],
[u'Texas', u'Okla']),
([u'Florida Atlantic', u'South Florida'],
[u'SFla', u'FlAtl']),
([u'Georgia', u'Tennessee'],
[u'Geo', u'Tenn']),
([u'San Jose State', u'Idaho'],
[u'UI', u'SJSU']),
([u'Washington State', u'Arizona State'],
[u'ArzSt', u'WshSt']),
([u'Fresno State', u'Nevada'],
[u'Nevad', u'FrsSt']),
([u'Oregon State', u'Arizona'],
[u'ARIZ', u'OSU']),
([u'Clemson', u'Virginia Tech'],
[u'VTech', u'Clem']),
([u'Chattanooga', u'Arkansas'],
[u'UTC', u'AR']),
([u'USC', u'Stanford'],
[u'USC', u'Stanf']),
([u'Baylor', u'Colorado'],
[u'BU', u'CU']),
([u'North Texas', u'Louisiana-Lafayette'],
[u'NoTex', u'LaLaf']),
([u'Tulane', u'Army'],
[u'TLN', u'ARMY']),
([u'Troy', u'Florida International'],
[u'TROY', u'FIU']),
([u'Louisiana-Monroe', u'Arkansas State'],
[u'ASU', u'ULM']),
([u'Texas Tech', u'Iowa State'],
[u'TT', u'ISU']),
([u'Akron', u'Western Michigan'],
[u'AKRON', u'WMU']),
([u'Liberty', u'Toledo'],
[u'LIBERTY', u'TOLEDO']),
([u'Virginia', u'Middle Tennessee'],
[u'Virg', u'MTnSt']),
([u'Oklahoma State', u'Texas A&M'],
[u'TexAM', u'OKSt']),
([u'Notre Dame', u'UCLA'],
[u'NDame', u'UCLA']),
([u'Rutgers', u'Cincinnati'],
[u'Cincy', u'Rutgr']),
([u'Ohio State', u'Purdue'],
[u'Prdue', u'OhSt']),
([u'LSU', u'Florida'],
[u'Fla', u'LSU']),
([u'Air Force', u'UNLV'],
[u'AFA', u'UNLV']),
([u'Nebraska', u'Missouri'],
[u'Misso', u'Neb']),
([u'New Mexico State', u'Boise State'],
[u'NMxSt', u'BoiSt']),
([u'Pittsburgh', u'Navy'],
[u'Navy', u'Pitt']),
([u'Wake Forest', u'Florida State'],
[u'WFrst', u'FlaSt']),
([u'San Jose State', u'Hawaii'],
[u'Hawa', u'SJSt']),
([u'UCF', u'South Florida'],
[u'UCF', u'SFla'])
]
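def comp(N, D, sq=SequenceMatcher(None)):
    # N holds the two database names, D the two abbreviated names.
    # A single SequenceMatcher instance is reused through the default argument.

    # Case-sensitive ratios: a, b for the same-order pairing, x, y for the swapped one.
    sq.set_seqs(N[0], D[0])
    a = sq.ratio()
    sq.set_seqs(N[1], D[1])
    b = sq.ratio()
    sq.set_seqs(N[0], D[1])
    x = sq.ratio()
    sq.set_seqs(N[1], D[0])
    y = sq.ratio()

    # The same four ratios on lowercased strings (handles e.g. u'TROY' vs u'Troy').
    sq.set_seqs(N[0].lower(), D[0].lower())
    al = sq.ratio()
    sq.set_seqs(N[1].lower(), D[1].lower())
    bl = sq.ratio()
    sq.set_seqs(N[0].lower(), D[1].lower())
    xl = sq.ratio()
    sq.set_seqs(N[1].lower(), D[0].lower())
    yl = sq.ratio()

    # Same-order test: both ratios above 0.5 and their sum above 1.4,
    # either case-sensitive or lowercased.
    if ((a > 0.5 and b > 0.5 and a + b > 1.4)
            or (al > 0.5 and bl > 0.5 and al + bl > 1.4)):
        return (N[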