A Little Progress Every Day: Berkeley CS61A, the Coolest Python Assignment I've Seen

Github

Phase 0: Utility Functions

Problem 0

Problem 0.1 Using list comprehensions

A list comprehension is an expression of the following form:

[<map expression> for <name> in <sequence expression> if <filter expression>]

>>> [x * x for x in range(10) if x % 2 == 0]
[0, 4, 16, 36, 64]
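To make the three parts of the syntax concrete, the comprehension above can be unrolled into an explicit loop. This is only a minimal sketch; the names `squares_of_evens` and `comprehension` are introduced here for illustration.

```python
# Loop equivalent of: [x * x for x in range(10) if x % 2 == 0]
squares_of_evens = []
for x in range(10):                      # <name> in <sequence expression>
    if x % 2 == 0:                       # <filter expression>
        squares_of_evens.append(x * x)   # <map expression>

comprehension = [x * x for x in range(10) if x % 2 == 0]
print(squares_of_evens == comprehension)  # True
```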

def map_and_filter(s, map_fn, filter_fn):
    """Returns a new list containing the results of calling map_fn on each
    element of sequence s for which filter_fn returns a true value.

    >>> square = lambda x: x * x
    >>> is_odd = lambda x: x % 2 == 1
    >>> map_and_filter([1, 2, 3, 4, 5], square, is_odd)
    [1, 9, 25]
    """
    # BEGIN Question 0
    return [map_fn(x) for x in s if filter_fn(x)]
    # END Question 0

Problem 0.2: Using min

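The min function takes a list and returns its smallest element. It also accepts an optional key function that customizes how the elements are compared. The key function must take a single argument; it is called on every element of the list, and the values it returns are what get compared.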

>>> min([-1, 0, 1]) # no key argument; smallest input
-1
>>> min([-1, 0, 1], key=lambda x: x*x) # input with the smallest square
0
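A few more cases may help show what the key argument changes. The sketch below uses made-up data; the variable names are illustrative only. Note the last case: when several elements share the smallest key, min returns the first one it encounters.

```python
words = ['pear', 'fig', 'apple', 'kiwi']

# Without a key, strings are compared alphabetically.
alphabetical_first = min(words)            # 'apple'

# With key=len, the shortest string wins.
shortest = min(words, key=len)             # 'fig'

# 2 and -2 have the same absolute value; the earlier element is returned.
first_of_ties = min([2, -2, 3], key=abs)   # 2
```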

def key_of_min_value(d):
    """Returns the key in a dict d that corresponds to the minimum value of d.

    >>> letters = {'a': 6, 'b': 5, 'c': 4, 'd': 5}
    >>> min(letters)
    'a'
    >>> key_of_min_value(letters)
    'c'
    """
    # BEGIN Question 0
    return min(d, key=lambda x: d[x])
    # END Question 0

Problem 1

def mean(s):
    """Returns the arithmetic mean of a sequence of numbers s.

    >>> mean([-1, 3])
    1.0
    >>> mean([0, -3, 2, -1])
    -0.5
    """
    # BEGIN Question 1
    assert len(s) > 0, 'empty list'
    return sum(s) / len(s)
    # END Question 1

Phase 1: Data Abstraction

Problem 2

• make_restaurant: returns a restaurant, which is made up of five fields:

• name (a string)

• location (a list containing latitude and longitude)

• categories (a list of strings)

• price (a number)

• reviews (a list of review data abstractions created by make_review)

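• restaurant_name: returns the restaurant's name

• restaurant_location: returns the restaurant's location

• restaurant_categories: returns the restaurant's categories

• restaurant_price: returns the restaurant's price

• restaurant_ratings: returns the list of ratings from the restaurant's reviews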

def make_restaurant(name, location, categories, price, reviews):
    """Return a restaurant data abstraction containing the name, location,
    categories, price, and reviews for that restaurant."""
    # BEGIN Question 2
    return [name, location, categories, price, reviews]
    # END Question 2

def restaurant_name(restaurant):
    """Return the name of the restaurant, which is a string."""
    # BEGIN Question 2
    return restaurant[0]
    # END Question 2

def restaurant_location(restaurant):
    """Return the location of the restaurant, which is a list containing
    latitude and longitude."""
    # BEGIN Question 2
    return restaurant[1]
    # END Question 2

def restaurant_categories(restaurant):
    """Return the categories of the restaurant, which is a list of strings."""
    # BEGIN Question 2
    return restaurant[2]
    # END Question 2

def restaurant_price(restaurant):
    """Return the price of the restaurant, which is a number."""
    # BEGIN Question 2
    return restaurant[3]
    # END Question 2

def restaurant_ratings(restaurant):
    """Return a list of ratings, which are numbers from 1 to 5, of the
    restaurant based on the reviews of the restaurant."""
    # BEGIN Question 2
    return [review_rating(r) for r in restaurant[4]]
    # END Question 2
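
To see the abstraction in use, here is a small self-contained sketch. The `make_review` and `review_rating` definitions are assumptions made for this example (the real ones live in abstractions.py and may use a different layout), and the sample restaurant is made up.

```python
# Hypothetical review abstraction; the real one in abstractions.py
# may store its fields differently.
def make_review(restaurant_name, rating):
    return [restaurant_name, rating]

def review_rating(review):
    return review[1]

# Constructor and two selectors, repeated so the sketch runs on its own.
def make_restaurant(name, location, categories, price, reviews):
    return [name, location, categories, price, reviews]

def restaurant_name(restaurant):
    return restaurant[0]

def restaurant_ratings(restaurant):
    return [review_rating(r) for r in restaurant[4]]

# Made-up sample data, for illustration only.
cafe = make_restaurant(
    'Sample Cafe', [37.87, -122.26], ['Sandwiches', 'Coffee'], 2,
    [make_review('Sample Cafe', 4), make_review('Sample Cafe', 5)])

# Callers go through the selectors and never index into the list directly,
# so the underlying representation could change without breaking this code.
print(restaurant_name(cafe))      # Sample Cafe
print(restaurant_ratings(cafe))   # [4, 5]
```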

python3 recommend.py -u one_cluster

Phase 2: Unsupervised Learning

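• Assign each sample point to one of k groups according to which centroid it is closest to
• Average the coordinates of the points in each of the k groups to obtain the new centroids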

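• location: a restaurant's coordinates, represented as a latitude/longitude pair: (latitude, longitude)
• centroid: the center of a cluster (the mean of the coordinates of all its points)
• restaurant: the restaurant data abstraction, defined in abstractions.py
• cluster: a list of restaurants grouped into the same cluster
• user: the user data abstraction, defined in abstractions.py
• review: the review data abstraction, defined in abstractions.py
• feature function: a single-argument function that takes a restaurant and returns a number, such as its mean rating or its price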

Problem 3

# The distance function
from math import sqrt

def distance(pos1, pos2):
    """Returns the Euclidean distance between pos1 and pos2, which are pairs.

    >>> distance([1, 2], [4, 6])
    5.0
    """
    return sqrt((pos1[0] - pos2[0]) ** 2 + (pos1[1] - pos2[1]) ** 2)

def find_closest(location, centroids):
    """Return the centroid in centroids that is closest to location.
    If multiple centroids are equally close, return the first one.

    >>> find_closest([3.0, 4.0], [[0.0, 0.0], [2.0, 3.0], [4.0, 3.0], [5.0, 5.0]])
    [2.0, 3.0]
    """
    # BEGIN Question 3
    return min(centroids, key=lambda x: distance(location, x))
    # END Question 3
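
The doctest above is in fact a tie: two centroids are equally close, and the earlier one wins because min returns the first minimal element. The sketch below repeats both functions so it runs on its own and prints the individual distances; the names `centroids`, `distances`, and `closest` are illustrative.

```python
from math import sqrt

def distance(pos1, pos2):
    return sqrt((pos1[0] - pos2[0]) ** 2 + (pos1[1] - pos2[1]) ** 2)

def find_closest(location, centroids):
    return min(centroids, key=lambda c: distance(location, c))

centroids = [[0.0, 0.0], [2.0, 3.0], [4.0, 3.0], [5.0, 5.0]]

# Distance from [3.0, 4.0] to each centroid, in order.
distances = [distance([3.0, 4.0], c) for c in centroids]
print(distances)

# [2.0, 3.0] and [4.0, 3.0] are both sqrt(2) away; the earlier one is returned.
closest = find_closest([3.0, 4.0], centroids)
print(closest)  # [2.0, 3.0]
```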

Problem 4

The code for the group_by_first function is as follows:

def group_by_first(pairs):
    """Return a list of pairs that relates each unique key in the [key, value]
    pairs to a list of all values that appear paired with that key.

    Arguments:
    pairs -- a sequence of pairs

    >>> example = [ [1, 2], [3, 2], [2, 4], [1, 3], [3, 1], [1, 2] ]
    >>> group_by_first(example)
    [[2, 3, 2], [2, 1], [4]]
    """
    keys = []
    for key, _ in pairs:
        if key not in keys:
            keys.append(key)
    return [[y for x, y in pairs if x == key] for key in keys]

def group_by_centroid(restaurants, centroids):
    """Return a list of clusters, where each cluster contains all restaurants
    nearest to a corresponding centroid in centroids. Each item in
    restaurants should appear once in the result, along with the other
    restaurants closest to the same centroid.
    """
    # BEGIN Question 4
    pairs = [[find_closest(restaurant_location(r), centroids), r] for r in restaurants]
    return group_by_first(pairs)
    # END Question 4
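
group_by_first keeps its keys in a list, which is why it works even though centroids are lists (lists are unhashable and cannot be dict keys). As a point of comparison, here is a sketch of a single-pass variant that uses a dict by converting list keys to tuples. The name `group_by_first_dict` is hypothetical and not part of the project.

```python
def group_by_first_dict(pairs):
    """Hypothetical single-pass variant of group_by_first.

    List keys are converted to tuples, because lists are unhashable
    and cannot be used as dict keys.
    """
    groups = {}
    for key, value in pairs:
        hashable_key = tuple(key) if isinstance(key, list) else key
        groups.setdefault(hashable_key, []).append(value)
    # dicts preserve insertion order (Python 3.7+), so groups come out in
    # order of each key's first appearance, matching group_by_first.
    return list(groups.values())

example = [[1, 2], [3, 2], [2, 4], [1, 3], [3, 1], [1, 2]]
print(group_by_first_dict(example))  # [[2, 3, 2], [2, 1], [4]]

# Keys that are lists, as produced for centroids in group_by_centroid.
by_centroid = group_by_first_dict(
    [[[0.0, 0.0], 'a'], [[1.0, 1.0], 'b'], [[0.0, 0.0], 'c']])
print(by_centroid)  # [['a', 'c'], ['b']]
```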

Problem 5

def find_centroid(cluster):
    """Return the centroid of the locations of the restaurants in cluster."""
    # BEGIN Question 5
    latitudes, longitudes = [], []
    for c in cluster:
        loc = restaurant_location(c)
        latitudes.append(loc[0])
        longitudes.append(loc[1])
    return [mean(latitudes), mean(longitudes)]
    # END Question 5

Problem 6

• Cluster the restaurants so that all restaurants in a cluster share the same closest centroid
• Update the centroids based on the resulting clusters

def k_means(restaurants, k, max_updates=100):
    """Use k-means to group restaurants by location into k clusters."""
    assert len(restaurants) >= k, 'Not enough restaurants to cluster'
    old_centroids, n = [], 0
    # Select initial centroids randomly by choosing k different restaurants
    centroids = [restaurant_location(r) for r in sample(restaurants, k)]

    while old_centroids != centroids and n < max_updates:
        old_centroids = centroids
        # BEGIN Question 6
        pairs = group_by_centroid(restaurants, centroids)
        centroids = [find_centroid(l) for l in pairs]
        # END Question 6
        n += 1
    return centroids
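
Because k_means starts from a random sample, its runs are hard to follow by hand. The sketch below is a simplified stand-in, not the project code: it works on bare [x, y] points instead of restaurants and takes the starting centroids as an argument so the result is reproducible. The names `k_means_points` and `squared_distance` are hypothetical.

```python
def mean(s):
    return sum(s) / len(s)

def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def k_means_points(points, centroids, max_updates=100):
    """Hypothetical k-means over bare [x, y] points with given start centroids."""
    old_centroids, n = [], 0
    while old_centroids != centroids and n < max_updates:
        old_centroids = centroids
        # Step 1: assign every point to its closest centroid.
        clusters = {}
        for p in points:
            closest = min(centroids, key=lambda c: squared_distance(p, c))
            clusters.setdefault(tuple(closest), []).append(p)
        # Step 2: replace each centroid with the mean of its cluster.
        centroids = [[mean([p[0] for p in cluster]),
                      mean([p[1] for p in cluster])]
                     for cluster in clusters.values()]
        n += 1
    return centroids

points = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
result = k_means_points(points, [[0.0, 0.0], [10.0, 10.0]])
print(result)  # [[0.0, 1.0], [10.0, 11.0]]
```

On this data the centroids stop changing after the first update, so the loop exits on its third check of the while condition.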

python3 recommend.py -k 2

Phase 3: Supervised Learning

Problem 7

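• $S_{xx} = \sum_i (x_i - \mathrm{mean}(x))^2$
• $S_{yy} = \sum_i (y_i - \mathrm{mean}(y))^2$
• $S_{xy} = \sum_i (x_i - \mathrm{mean}(x))(y_i - \mathrm{mean}(y))$

• $b = S_{xy} / S_{xx}$
• $a = \mathrm{mean}(y) - b \cdot \mathrm{mean}(x)$
• $R^2 = S_{xy}^2 / (S_{xx} S_{yy})$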
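
As a quick check of the formulas above, here they are applied to four made-up data points. The values of `xs` and `ys` are invented for illustration and do not come from the project's data.

```python
def mean(s):
    return sum(s) / len(s)

xs = [1, 2, 3, 4]   # made-up feature values
ys = [1, 3, 2, 4]   # made-up ratings

x_mean, y_mean = mean(xs), mean(ys)                              # 2.5 and 2.5
sxx = sum((x - x_mean) ** 2 for x in xs)                         # 5.0
syy = sum((y - y_mean) ** 2 for y in ys)                         # 5.0
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))   # 4.0

b = sxy / sxx                         # slope, about 0.8
a = y_mean - b * x_mean               # intercept, about 0.5
r_squared = sxy ** 2 / (sxx * syy)    # about 0.64

print(a, b, r_squared)
```

Note that the formulas divide by the sums of squares, so they are undefined when every x value (or every y value) is the same.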

def find_predictor(user, restaurants, feature_fn):
    """Return a rating predictor (a function from restaurants to ratings),
    for a user by performing least-squares linear regression using feature_fn
    on the items in restaurants. Also, return the R^2 value of this model.

    Arguments:
    user -- A user
    restaurants -- A sequence of restaurants
    feature_fn -- A function that takes a restaurant and returns a number
    """
    reviews_by_user = {review_restaurant_name(review): review_rating(review)
                       for review in user_reviews(user).values()}

    xs = [feature_fn(r) for r in restaurants]
    ys = [reviews_by_user[restaurant_name(r)] for r in restaurants]

    # BEGIN Question 7
    x_mean = mean(xs)
    sxx = sum([(x - x_mean)**2 for x in xs])

    y_mean = mean(ys)
    syy = sum([(y - y_mean)**2 for y in ys])
    sxy = sum([(x - x_mean)*(y - y_mean) for x, y in zip(xs, ys)])
    b = sxy / sxx
    a = y_mean - b * x_mean
    r_squared = sxy * sxy / (sxx * syy)
    # END Question 7

    def predictor(restaurant):
        return b * feature_fn(restaurant) + a

    return predictor, r_squared

Problem 8

def best_predictor(user, restaurants, feature_fns):
    """Find the feature within feature_fns that gives the highest R^2 value
    for predicting ratings by the user; return a predictor using that feature.

    Arguments:
    user -- A user
    restaurants -- A list of restaurants
    feature_fns -- A sequence of functions that each takes a restaurant
    """
    reviewed = user_reviewed_restaurants(user, restaurants)
    # BEGIN Question 8
    dt = {}
    for feature_fn in feature_fns:
        predictor, r_squared = find_predictor(user, reviewed, feature_fn)
        dt[predictor] = r_squared
    return max(dt, key=lambda x: dt[x])
    # END Question 8
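
The selection step is the same max-with-key pattern used for min in Phase 0. One alternative to the dict, sketched here with stand-in data, is to keep the (predictor, r_squared) pairs in a list and take the max by the second element. The lambdas and R^2 values below are made up and merely imitate what find_predictor returns.

```python
# Stand-ins for the (predictor, r_squared) pairs that find_predictor returns;
# both the functions and the R^2 values are made up for illustration.
candidates = [
    (lambda restaurant: 3.0, 0.10),
    (lambda restaurant: 4.0, 0.85),
    (lambda restaurant: 2.0, 0.40),
]

# Compare pairs by their R^2 value only, then keep the predictor.
best, best_r_squared = max(candidates, key=lambda pair: pair[1])

print(best_r_squared)   # 0.85
print(best(None))       # 4.0
```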

Problem 9

def rate_all(user, restaurants, feature_fns):
    """Return the predicted ratings of restaurants by user using the best
    predictor based on a function from feature_fns.

    Arguments:
    user -- A user
    restaurants -- A list of restaurants
    feature_fns -- A sequence of feature functions
    """
    predictor = best_predictor(user, ALL_RESTAURANTS, feature_fns)
    reviewed = user_reviewed_restaurants(user, restaurants)
    # BEGIN Question 9
    return {restaurant_name(r): user_rating(user, restaurant_name(r))
            if r in reviewed else predictor(r)
            for r in restaurants}
    # END Question 9
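
The return statement packs a conditional expression into a dict comprehension, which can be hard to parse at first glance. The sketch below shows the same shape on made-up data; `known_ratings` and `predict` are stand-ins for the user's real ratings and the fitted predictor.

```python
known_ratings = {'A': 5, 'C': 2}   # made-up ratings the user already gave
predict = lambda name: 3.5         # stand-in for a fitted predictor

names = ['A', 'B', 'C']
# Parsed as {key: (x if cond else y) for ...}: the conditional is the value.
ratings = {name: known_ratings[name] if name in known_ratings else predict(name)
           for name in names}
print(ratings)  # {'A': 5, 'B': 3.5, 'C': 2}
```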

python3 recommend.py -u likes_southside -k 5 -p

Problem 10

def search(query, restaurants):
    """Return each restaurant in restaurants that has query as a category.

    Arguments:
    query -- A string
    restaurants -- A sequence of restaurants
    """
    # BEGIN Question 10
    return [r for r in restaurants if query in restaurant_categories(r)]
    # END Question 10

python3 recommend.py -u likes_expensive -k 2 -p -q Sandwiches
