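[This article was first published on the author's personal WeChat official account: painless1207. Original work; likes and follows are appreciated.]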
requirements: pipreqs
Automatically generates a requirements file listing the packages that your code imports.
Basic usage:
pipreqs --pypi-server https://mirrors.cloud.tencent.com/pypi/simple \
--savepath requirements.txt \
[target path]
pipreqs - Generate pip requirements.txt file based on imports
Usage:
pipreqs [options] [<path>]
Arguments:
<path> The path to the directory containing the application
files for which a requirements file should be
generated (defaults to the current working
directory).
Options:
--use-local Use ONLY local package info instead of querying PyPI.
--pypi-server <url> Use custom PyPi server.
--proxy <url> Use Proxy, parameter will be passed to requests
library. You can also just set the environments
parameter in your terminal:
$ export HTTP_PROXY="http://10.10.1.10:3128"
$ export HTTPS_PROXY="https://10.10.1.10:1080"
--debug Print debug information.
--ignore <dirs>... Ignore extra directories, each separated by a comma.
--no-follow-links Do not follow symbolic links in the project
--encoding <charset> Use encoding parameter for file open
--savepath <file> Save the list of requirements in the given file
--print Output the list of requirements in the standard
output.
--force Overwrite existing requirements.txt
--diff <file> Compare modules in requirements.txt to project
imports.
--clean <file> Clean up requirements.txt by removing modules
that are not imported in project.
--no-pin Omit version of output packages.
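The core idea of deriving requirements from imports can be sketched with the standard library alone. The following is a minimal illustration, not pipreqs's actual implementation: the helper name top_level_imports is hypothetical, and it only collects top-level module names without mapping them to PyPI package names or versions.

```python
import ast


def top_level_imports(source: str) -> set:
    """Return the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b as c" contributes "a"
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports ("from . import x"); they point at local code
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


sample = """
import numpy as np
from sklearn.cluster import KMeans
from . import helpers
import os.path
"""
print(sorted(top_level_imports(sample)))
```

A real tool additionally has to drop standard-library modules (such as os) and translate import names into distribution names (for example sklearn into scikit-learn).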
Version management: pyenv
Installation
Install pyenv from Homebrew or your system's package manager.
Set up your shell:
bash:
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n eval "$(pyenv init -)"\nfi' >> ~/.bashrc
zsh:
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n eval "$(pyenv init -)"\nfi' >> ~/.zshrc
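Then restart your shell.
The third line of each block above is optional. If you leave it out, run eval "$(pyenv init -)" in the shell before using pyenv in order to activate it.
Usage
pyenv install --list # list the versions available for installation
pyenv install [version]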
Available commands (listed by running pyenv commands):
Dependency management and virtual environments: poetry
1. Installation
Introduction | Documentation | Poetry - Python dependency management and packaging made easy.
1.1 Linux:
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
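If the installation does not go smoothly, first download the file raw.githubusercontent.com/python-poet…, then download the release archive for your platform and version from github.com/python-poet…. get-poetry.py is the installer script described below; point it at the downloaded archive with the --file option.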
usage: get-poetry.py [-h] [-p] [--version VERSION] [-f] [--no-modify-path]
[-y] [--uninstall] [--file FILE]
Installs the latest (or given) version of poetry
optional arguments:
-h, --help show this help message and exit
-p, --preview install preview version
--version VERSION install named version
-f, --force install on top of existing version
--no-modify-path do not modify $PATH
-y, --yes accept all prompts
--uninstall uninstall poetry
--file FILE Install from a local file instead of fetching the latest
version of Poetry available online.
Updating poetry:
poetry self update [optional version]
Shell completion setup:
# Bash
poetry completions bash > /etc/bash_completion.d/poetry.bash-completion
# Bash (Homebrew)
poetry completions bash > $(brew --prefix)/etc/bash_completion.d/poetry.bash-completion
# Fish
poetry completions fish > ~/.config/fish/completions/poetry.fish
# Fish (Homebrew)
poetry completions fish > (brew --prefix)/share/fish/vendor_completions.d/poetry.fish
# Zsh
poetry completions zsh > ~/.zfunc/_poetry
echo fpath+=~/.zfunc >> ~/.zshrc
# Oh-My-Zsh
mkdir $ZSH/plugins/poetry
poetry completions zsh > $ZSH/plugins/poetry/_poetry
(then add poetry to the plugins list in ~/.zshrc)
# prezto
poetry completions zsh > ~/.zprezto/modules/completion/external/src/_poetry
1.2 Windows:
(Invoke-WebRequest -Uri https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py -UseBasicParsing).Content | python
2. Basic configuration and usage
Poetry version 1.0.10
USAGE
poetry [-h] [-q] [-v [<...>]] [-V] [--ansi] [--no-ansi] [-n] <command> [<arg1>] ... [<argN>]
ARGUMENTS
<command> The command to execute
<arg> The arguments of the command
GLOBAL OPTIONS
-h (--help) Display this help message
-q (--quiet) Do not output any message
-v (--verbose) Increase the verbosity of messages: "-v" for normal output, "-vv" for more verbose output and "-vvv" for debug
-V (--version) Display this application version
--ansi Force ANSI output
--no-ansi Disable ANSI output
-n (--no-interaction) Do not ask any interactive question
AVAILABLE COMMANDS
about Shows information about Poetry.
add Adds a new dependency to pyproject.toml.
build Builds a package, as a tarball and a wheel by default.
cache Interact with Poetry's cache
check Checks the validity of the pyproject.toml file.
config Manages configuration settings.
debug Debug various elements of Poetry.
env Interact with Poetry's project environments.
export Exports the lock file to alternative formats.
help Display the manual of a command
init Creates a basic pyproject.toml file in the current directory.
install Installs the project dependencies.
lock Locks the project dependencies.
new Creates a new Python project at <path>.
publish Publishes a package to a remote repository.
remove Removes a package from the project dependencies.
run Runs a command in the appropriate environment.
search Searches for packages on remote repositories.
self Interact with Poetry directly.
shell Spawns a shell within the virtual environment.
show Shows information about packages.
update Update the dependencies as according to the pyproject.toml file.
version Shows the version of the project or bumps it when a valid bump rule is provided.
2.1 Creating a project
poetry new project
This generates a basic directory layout (shown here for a project named poe-demo):
poe-demo
├── poe_demo
│   └── __init__.py
├── pyproject.toml
├── README.rst
└── tests
    ├── __init__.py
    └── test_poe_demo.py
2.2 Configuration
pyproject.toml is the main configuration file:
[tool.poetry]
name = "project"
version = "0.1.0"
description = ""
authors = ["'wangm23456' <'wangm23456@163.com'>"]
[tool.poetry.dependencies]
python = "^3.7"
[tool.poetry.dev-dependencies]
pytest = "^5.2"
[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
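Add the following lines to use a different package mirror:
[[tool.poetry.source]]
name = "tencent"
url = "https://mirrors.cloud.tencent.com/pypi/simple"
default = true # make this the default source
# secondary = true # alternatively, make this a secondary source (use instead of default)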
Alternatively, use the config command:
poetry config repositories.tencent https://mirrors.cloud.tencent.com/pypi/simple
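Configuring virtual environments:
poetry config --list shows the basic configuration.
Here, virtualenvs.create controls whether a virtual environment is created, virtualenvs.path is the directory where virtual environments are stored, and virtualenvs.in-project controls whether the virtual environment is created inside the project directory.
These settings can be changed with poetry config [key] [value].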
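2.3 Installing dependencies
Run poetry install to install dependencies. Before that, you can list the project's dependencies under tool.poetry.dependencies in pyproject.toml. Alternatively, use poetry add to install a dependency into the virtual environment and record it in pyproject.toml; if no virtual environment exists yet, poetry add creates one automatically. A dependency that is already listed in pyproject.toml cannot be added again with the add command.
After the first install, a poetry.lock file is generated. It records the full list of dependencies resolved from those declared in pyproject.toml, including their metadata and exact versions, and it can be committed to git.
By default, poetry tries to find the newest set of mutually compatible dependencies. Once poetry.lock exists, however, poetry add and poetry install take their versions from poetry.lock. As a result, the installed dependencies may not be the latest: some of them may have released newer versions since poetry.lock was last generated, but poetry does not automatically update the versions already recorded there. Instead, any newly added dependency must be compatible with the versions already locked.
To bring your dependencies up to date, run poetry update, which is equivalent to deleting poetry.lock and then running poetry install.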
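The difference between installing with a lock file and updating can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Poetry's resolver: versions are compared as tuples of integers, the only constraint handled is a caret constraint with a non-zero major version (so ^5.2 means >=5.2 and <6.0), and all function names and version numbers are hypothetical.

```python
def parse(version: str) -> tuple:
    """Turn "5.4.3" into (5, 4, 3) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies_caret(version: str, caret: str) -> bool:
    """Check a caret constraint such as "^5.2" (assumes a non-zero major version)."""
    lower = parse(caret.lstrip("^"))
    upper = (lower[0] + 1,)  # the next major version is excluded
    return lower <= parse(version) < upper


def choose(available, caret, locked=None):
    """Pick the locked version if it is still valid, otherwise the newest match."""
    if locked is not None and satisfies_caret(locked, caret):
        return locked  # behaviour when a lock file exists
    matching = [v for v in available if satisfies_caret(v, caret)]
    return max(matching, key=parse)  # behaviour of an update or a first install


available = ["5.2.0", "5.4.3", "6.0.1"]
print(choose(available, "^5.2", locked="5.2.0"))  # with a lock file
print(choose(available, "^5.2"))                  # after an update
```

With the lock in place the older pinned version is kept even though a newer compatible release exists; dropping the lock is what lets the newer one through.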
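Configuration management: hydra
A framework for elegantly configuring complex applications
hydra is a configuration management system developed by Facebook. Its goal is to let you manage configuration files in a modular, flexible way; it can also serve as a command-line interface.
Basics
Below is an example adapted from scikit-learn, a simple k-means benchmark: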
from time import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="config.yaml")
def main(cfg: DictConfig) -> None:
    np.random.seed(cfg.kmean_param.seed)
    X_digits, y_digits = load_digits(return_X_y=True)
    data = scale(X_digits)
    n_samples, n_features = data.shape
    n_digits = len(np.unique(y_digits))
    labels = y_digits

    def bench_k_means(estimator: KMeans, name: str, data: np.ndarray) -> None:
        t0 = time()
        estimator.fit(data)
        print('%-9s\t%.2fs\t%i\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f'
              % (name, (time() - t0), estimator.inertia_,
                 metrics.homogeneity_score(labels, estimator.labels_),
                 metrics.completeness_score(labels, estimator.labels_),
                 metrics.v_measure_score(labels, estimator.labels_),
                 metrics.adjusted_rand_score(labels, estimator.labels_),
                 metrics.adjusted_mutual_info_score(labels, estimator.labels_),
                 metrics.silhouette_score(data, estimator.labels_,
                                          metric='euclidean',
                                          sample_size=cfg.kmean_param.sample_size)))

    print(f"n_digits: {n_digits}, \t n_samples {n_samples}, \t n_features {n_features}")
    print(82 * '_')
    print('init\t\ttime\tinertia\thomo\tcompl\tv-meas\tARI\tAMI\tsilhouette')
    for k, v in cfg.kmean.items():
        bench_k_means(KMeans(init=v['init'], n_clusters=n_digits, n_init=v["n_init"]), name=v["name"], data=data)


if __name__ == "__main__":
    main()
The corresponding config.yaml is:
kmean:
  k++: {init: k-means++, name: k-means++, n_init: 10}
  random: {init: random, name: random, n_init: 10}
kmean_param:
  sample_size: 300
  seed: 42
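As shown above, sample_size and seed are stored under kmean_param in the configuration file, and the different initialization methods are stored under kmean. Running the script prints one benchmark row per initialization method.
hydra lets you override configuration parameters directly on the command line by passing key=value arguments, for example kmean_param.seed=1.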
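What such a dotted override does to the configuration can be emulated on a plain nested dictionary. The snippet below is a standard-library sketch only; it does not use hydra or OmegaConf, the function name apply_overrides is hypothetical, and value parsing is deliberately simplistic.

```python
import ast
import copy


def apply_overrides(config: dict, overrides: list) -> dict:
    """Return a copy of config with "a.b.c=value" style overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        try:
            # Interpret Python literals such as numbers; otherwise keep the string
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            value = raw_value
        node[leaf] = value
    return result


config = {"kmean_param": {"sample_size": 300, "seed": 42}}
print(apply_overrides(config, ["kmean_param.seed=1", "kmean_param.sample_size=500"]))
```

The original dictionary is left untouched, so the defaults from the file and the overridden run configuration can be compared side by side.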
Config groups
hydra lets you organize configuration files into groups and choose at run time which file to use.
Below is another example adapted from scikit-learn:
import time
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import kneighbors_graph
import hydra
from hydra import utils
from omegaconf import DictConfig
import logging
import os

log = logging.getLogger(__name__)

# Generate sample data
# Create a graph capturing local connectivity. Larger number of neighbors
# will give more homogeneous clusters to the cost of computation
# time. A very large number of neighbors gives more evenly distributed
# cluster sizes, but may not impose the local manifold structure of
# the data


@hydra.main(config_path="conf")
def main(cfg: DictConfig) -> None:
    np.random.seed(cfg.params.seed)
    t = 1.5 * np.pi * (1 + 3 * np.random.rand(1, cfg.params.n_samples))
    x = t * np.cos(t)
    y = t * np.sin(t)
    X = np.concatenate((x, y))
    X += .7 * np.random.randn(2, cfg.params.n_samples)
    X = X.T
    knn_graph = kneighbors_graph(X, cfg.params.n_neighbors, include_self=False)
    if cfg.params.connectivity:
        connectivity = knn_graph
    else:
        connectivity = None
    plt.figure(figsize=(10, 4))
    for index, linkage in enumerate(('average',
                                     'complete',
                                     'ward',
                                     'single')):
        plt.subplot(1, 4, index + 1)
        log.info(f"{index + 1}: {linkage}")
        model = AgglomerativeClustering(linkage=linkage,
                                        connectivity=connectivity,
                                        n_clusters=cfg.params.n_clusters)
        t0 = time.time()
        model.fit(X)
        elapsed_time = time.time() - t0
        plt.scatter(X[:, 0], X[:, 1], c=model.labels_,
                    cmap=plt.cm.nipy_spectral)
        plt.title('linkage=%s\n(time %.2fs)' % (linkage, elapsed_time),
                  fontdict=dict(verticalalignment='top'))
        plt.axis('equal')
        plt.axis('off')
    plt.subplots_adjust(bottom=0, top=.83, wspace=0,
                        left=0, right=1)
    plt.suptitle('n_cluster=%i, connectivity=%r' %
                 (cfg.params.n_clusters, cfg.params.connectivity), size=17)
    print(f"Current working directory : {os.getcwd()}")
    print(f"Original working directory : {utils.get_original_cwd()}")
    plt.savefig(os.path.join(utils.get_original_cwd(), "img", "demo2.png"))
    log.info(f"create png at {os.path.join(utils.get_original_cwd(), 'img', 'demo2.png')}")


if __name__ == "__main__":
    main()
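Grouping simply means arranging the configuration files in a directory structure like this:
conf/
└── params
    ├── 1.yaml
    └── 2.yaml
You only need to pass the configuration directory to the decorator of the main function and then select, on the command line, which option to use for each config group (for example params=1).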
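Default configuration
You may want to specify defaults so that you do not have to name them on the command line every time. To do so, add another configuration file under the conf directory (named config.yaml here) that lists the defaults:
defaults:
- params: 1
Then point the decorator at this configuration file:
@hydra.main(config_path="conf/config.yaml")
If a default configuration file does not belong to any group, you can list it directly:
defaults:
- default.yaml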
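Config composition
Composition is just as simple: it is the example above with a few more config groups. Note that when the loaded configurations overlap, hydra takes the value from the configuration loaded last. In particular, if the overlapping item is something mergeable such as a dictionary, hydra tries to merge the values into a single dictionary, and duplicate keys inside it are likewise overridden in load order.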
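The merge rule described here (later values win, nested dictionaries are merged key by key) can be sketched as a short recursive function. This is an illustration on plain dictionaries rather than hydra's own merge code; the function name merge and the sample values are hypothetical.

```python
def merge(base: dict, incoming: dict) -> dict:
    """Merge incoming into a copy of base; later values win, nested dicts are merged."""
    merged = dict(base)
    for key, value in incoming.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dictionaries: merge them key by key
            merged[key] = merge(merged[key], value)
        else:
            # Scalars (or mismatched types): the later configuration overrides
            merged[key] = value
    return merged


first = {"params": {"seed": 42, "n_clusters": 3}, "name": "a"}
second = {"params": {"n_clusters": 30, "n_neighbors": 10}, "name": "b"}
print(merge(first, second))
```

Here seed survives from the first configuration, n_clusters is overridden by the second, and n_neighbors is added.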
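multi-run
To run combinations of several configurations, pass -m and give each config group a comma-separated list of options, for example params=1,2.
This is useful for parameter exploration: treat each config group as one dimension and sweep over the resulting configuration space.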
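Treating each config group as one dimension amounts to taking a Cartesian product of the choices. The sketch below shows that expansion with itertools; it is not hydra's launcher, the function name expand_sweep is hypothetical, and linkage is a made-up second group added only to give the sweep two dimensions.

```python
from itertools import product


def expand_sweep(sweep: dict) -> list:
    """Expand {"group": [choices, ...]} into one dict per combination."""
    names = list(sweep)
    return [dict(zip(names, combo))
            for combo in product(*(sweep[name] for name in names))]


# "linkage" is a hypothetical second config group used for illustration
jobs = expand_sweep({"params": ["1", "2"], "linkage": ["ward", "single"]})
for job in jobs:
    print(job)
```

Two groups with two choices each yield four jobs, and every additional group multiplies the number of runs.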
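Working directory
Notice that the previous example prints two working directories. This is a default hydra behavior worth knowing: on every run, hydra creates an output directory named outputs/[date]/[time] (the current working directory in the output) and makes it the program's working directory. To recover the directory the program was launched from, call utils.get_original_cwd() from hydra's utils module.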
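The effect of this behavior can be reproduced without hydra to see why relative paths suddenly resolve elsewhere. The following is a rough emulation using only the standard library; the helper name run_in_output_dir is hypothetical, and it writes into a temporary directory rather than a real outputs folder.

```python
import os
import tempfile
from datetime import datetime


def run_in_output_dir(job, base_dir: str):
    """Run job() inside base_dir/<date>/<time>; return (original_cwd, run_dir, result)."""
    original_cwd = os.getcwd()
    now = datetime.now()
    run_dir = os.path.join(base_dir, now.strftime("%Y-%m-%d"), now.strftime("%H-%M-%S"))
    os.makedirs(run_dir, exist_ok=True)
    os.chdir(run_dir)
    try:
        result = job()
    finally:
        os.chdir(original_cwd)  # always restore the caller's directory
    return original_cwd, run_dir, result


with tempfile.TemporaryDirectory() as tmp:
    original, run_dir, seen = run_in_output_dir(os.getcwd, os.path.join(tmp, "outputs"))
    print("original working directory:", original)
    print("working directory inside the job:", seen)
```

Any file the job opens with a relative path lands in the per-run directory, which is why the example above builds the image path from the original working directory instead.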
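Logging
As in the example above, when logging is used together with hydra, log messages are written by default both to a .log file in the output directory and to the shell: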
[2020-08-17 19:55:39,292][__main__][INFO] - 1: average
[2020-08-17 19:55:39,801][__main__][INFO] - 2: complete
[2020-08-17 19:55:40,255][__main__][INFO] - 3: ward
[2020-08-17 19:55:40,785][__main__][INFO] - 4: single
[2020-08-17 19:55:41,590][__main__][INFO] - create png at /home/wangm/APP/demo-project/poe-demo/img/demo2.png
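You can also control the log level from the command line, for example with hydra.verbose=true, which sets all loggers to the DEBUG level.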
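Debugging features
- Pass -c job or --cfg job to print the configuration without running the program, similar to a --dry-run option. --cfg accepts the following values:
job: your configuration
hydra: hydra's configuration
all: both of the above
- As mentioned in the logging section above, hydra.verbose=hydra prints more information about hydra itself, including plugins, the search path, the config load history, and tracing.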