Practical tools to speed up your Python development


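[This article was first published on my WeChat official account, painless1207. Original writing takes effort, so likes and follows are appreciated.]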

requirements: pipreqs

Automatically generates a package list file (requirements.txt) from the imports in your code.

Basic usage:

pipreqs --pypi-server https://mirrors.cloud.tencent.com/pypi/simple \
        --savepath requirements.txt \
        [target path]

pipreqs - Generate pip requirements.txt file based on imports

Usage:
    pipreqs [options] [<path>]

Arguments:
    <path>                The path to the directory containing the application
                          files for which a requirements file should be
                          generated (defaults to the current working
                          directory).

Options:
    --use-local           Use ONLY local package info instead of querying PyPI.
    --pypi-server <url>   Use custom PyPi server.
    --proxy <url>         Use Proxy, parameter will be passed to requests
                          library. You can also just set the environments
                          parameter in your terminal:
                          $ export HTTP_PROXY="http://10.10.1.10:3128"
                          $ export HTTPS_PROXY="https://10.10.1.10:1080"
    --debug               Print debug information.
    --ignore <dirs>...    Ignore extra directories, each separated by a comma.
    --no-follow-links     Do not follow symbolic links in the project
    --encoding <charset>  Use encoding parameter for file open
    --savepath <file>     Save the list of requirements in the given file
    --print               Output the list of requirements in the standard
                          output.
    --force               Overwrite existing requirements.txt
    --diff <file>         Compare modules in requirements.txt to project
                          imports.
    --clean <file>        Clean up requirements.txt by removing modules
                          that are not imported in project.
    --no-pin              Omit version of output packages.
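
The idea of deriving requirements from imports can be sketched in a few lines. This is only an illustration built on the standard ast module, not pipreqs' actual implementation; the function name top_level_imports is made up for this example. A real tool also has to map import names to distribution names (sklearn is published as scikit-learn, for instance) and look up versions.

```python
import ast


def top_level_imports(source: str) -> set:
    """Return the top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c" contributes the top-level package "a"
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # "from a.b import c" contributes "a"; relative imports are skipped
            names.add(node.module.split(".")[0])
    return names


sample = "import numpy as np\nfrom sklearn.cluster import KMeans\nfrom . import local\n"
print(sorted(top_level_imports(sample)))  # ['numpy', 'sklearn']
```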

Version management: pyenv

Installation

Install pyenv from Homebrew or your system's package manager.

Shell setup:

bash:

echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n eval "$(pyenv init -)"\nfi' >> ~/.bashrc

zsh:

echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n eval "$(pyenv init -)"\nfi' >> ~/.zshrc

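Then restart your shell.

The third line in each of the two blocks above is optional. If you leave it out, you need to run eval "$(pyenv init -)" in the shell yourself before using pyenv.

Usage

pyenv install --list # list the versions available for installation
pyenv install [version]

For the full list of available commands, see the pyenv/pyenv repository.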

Dependency management and virtual environments: poetry

1. Installation

Introduction | Documentation | Poetry - Python dependency management and packaging made easy.

1.1 Linux:

curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python

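If the installation does not go smoothly, first download the file raw.githubusercontent.com/python-poet…, then download the release package for your platform and version from github.com/python-poet…. get-poetry.py is the installer script whose options are listed below; pass the downloaded package to it with the --file option.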

usage: get-poetry.py [-h] [-p] [--version VERSION] [-f] [--no-modify-path]
                     [-y] [--uninstall] [--file FILE]

Installs the latest (or given) version of poetry

optional arguments:
  -h, --help         show this help message and exit
  -p, --preview      install preview version
  --version VERSION  install named version
  -f, --force        install on top of existing version
  --no-modify-path   do not modify $PATH
  -y, --yes          accept all prompts
  --uninstall        uninstall poetry
  --file FILE        Install from a local file instead of fetching the latest
                     version of Poetry available online.

Updating poetry:

poetry self update [optional version]

Shell completion setup:

# Bash
poetry completions bash > /etc/bash_completion.d/poetry.bash-completion

# Bash (Homebrew)
poetry completions bash > $(brew --prefix)/etc/bash_completion.d/poetry.bash-completion

# Fish
poetry completions fish > ~/.config/fish/completions/poetry.fish

# Fish (Homebrew)
poetry completions fish > (brew --prefix)/share/fish/vendor_completions.d/poetry.fish

# Zsh
poetry completions zsh > ~/.zfunc/_poetry
echo fpath+=~/.zfunc >> ~/.zshrc

# Oh-My-Zsh
mkdir $ZSH/plugins/poetry
poetry completions zsh > $ZSH/plugins/poetry/_poetry
(then add poetry to your plugins list)

# prezto
poetry completions zsh > ~/.zprezto/modules/completion/external/src/_poetry

1.2 Windows:

(Invoke-WebRequest -Uri https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py -UseBasicParsing).Content | python

2. Basic configuration and usage

Poetry version 1.0.10

USAGE
  poetry [-h] [-q] [-v [<...>]] [-V] [--ansi] [--no-ansi] [-n] <command> [<arg1>] ... [<argN>]

ARGUMENTS
  <command>              The command to execute
  <arg>                  The arguments of the command

GLOBAL OPTIONS
  -h (--help)            Display this help message
  -q (--quiet)           Do not output any message
  -v (--verbose)         Increase the verbosity of messages: "-v" for normal output, "-vv" for more verbose output and "-vvv" for debug
  -V (--version)         Display this application version
  --ansi                 Force ANSI output
  --no-ansi              Disable ANSI output
  -n (--no-interaction)  Do not ask any interactive question

AVAILABLE COMMANDS
  about                  Shows information about Poetry.
  add                    Adds a new dependency to pyproject.toml.
  build                  Builds a package, as a tarball and a wheel by default.
  cache                  Interact with Poetry's cache
  check                  Checks the validity of the pyproject.toml file.
  config                 Manages configuration settings.
  debug                  Debug various elements of Poetry.
  env                    Interact with Poetry's project environments.
  export                 Exports the lock file to alternative formats.
  help                   Display the manual of a command
  init                   Creates a basic pyproject.toml file in the current directory.
  install                Installs the project dependencies.
  lock                   Locks the project dependencies.
  new                    Creates a new Python project at <path>.
  publish                Publishes a package to a remote repository.
  remove                 Removes a package from the project dependencies.
  run                    Runs a command in the appropriate environment.
  search                 Searches for packages on remote repositories.
  self                   Interact with Poetry directly.
  shell                  Spawns a shell within the virtual environment.
  show                   Shows information about packages.
  update                 Update the dependencies as according to the pyproject.toml file.
  version                Shows the version of the project or bumps it when a valid bump rule is provided.

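2.1 Creating a project

poetry new project

This generates a basic directory layout (the tree below comes from a project named poe-demo):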

poe-demo
├── poe_demo
│   └── __init__.py
├── pyproject.toml
├── README.rst
└── tests
    ├── __init__.py
    └── test_poe_demo.py

2.2 Configuration

pyproject.toml is the main configuration file.

[tool.poetry]
name = "project"
version = "0.1.0"
description = ""
authors = ["'wangm23456' <'wangm23456@163.com'>"]

[tool.poetry.dependencies]
python = "^3.7"

[tool.poetry.dev-dependencies]
pytest = "^5.2"

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
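
The caret constraints above (^3.7, ^5.2) allow updates that do not change the left-most non-zero version component. The sketch below shows how such a constraint expands into a range. It is an illustration only, not Poetry's own parser: the helper name caret_range is made up, and it assumes at most three purely numeric components.

```python
def caret_range(constraint: str):
    """Translate a caret constraint such as "^3.7" into (lower, upper) bounds."""
    parts = [int(p) for p in constraint.lstrip("^").split(".")]
    lower = parts + [0] * (3 - len(parts))
    # Bump the left-most non-zero component; if every written component is
    # zero, bump the last one that was written.
    index = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = lower[:index] + [lower[index] + 1] + [0] * (2 - index)
    return ".".join(map(str, lower)), ".".join(map(str, upper))


for spec in ("^3.7", "^5.2", "^0.2.3"):
    low, high = caret_range(spec)
    print(f"{spec}: >={low}, <{high}")
```

With the values from the file above, python = "^3.7" accepts any 3.x release from 3.7 on, and pytest = "^5.2" accepts 5.2 up to, but not including, 6.0.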

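Add the following lines to use another package index mirror:

[[tool.poetry.source]]
name = "tencent"
url = "https://mirrors.cloud.tencent.com/pypi/simple"
default = true # use this as the default source
secondary = true # alternatively, use this as a secondary source (normally only one of the two flags is set)

Or use the config command: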

poetry config repositories.tencent https://mirrors.cloud.tencent.com/pypi/simple

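Virtual environment settings:

poetry config --list shows the current configuration.

Among the settings, virtualenvs.create controls whether a virtual environment is created, virtualenvs.path is the directory where virtual environments are stored, and virtualenvs.in-project controls whether the virtual environment is created inside the project directory.

These settings can be changed with poetry config [key] [value].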
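
To check which environment a script actually runs in, for example when launched through poetry run, a small standard-library check is enough. This is a sketch; the function name in_virtualenv is made up, and very old virtualenv releases that predate sys.base_prefix are not covered.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # Inside a venv/virtualenv, sys.prefix points at the environment while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
```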

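2.3 Installing dependencies

Install dependencies with poetry install. Before that, you can list the project's dependencies under tool.poetry.dependencies in pyproject.toml, or use poetry add to install a dependency into the virtual environment and write it to pyproject.toml. If no virtual environment exists yet, poetry add creates one automatically. If a dependency is already listed in pyproject.toml, it cannot be added again with the add command.

After the first install, a poetry.lock file is generated. It records the full dependency list resolved from pyproject.toml, including basic package information and the exact versions selected, and it can be committed to git.

By default, poetry tries to find the newest set of mutually compatible dependencies. When poetry.lock exists, poetry add and poetry install use it to choose versions. As a result, the installed dependencies may not be the latest: some of them may have released newer versions since poetry.lock was generated, but poetry does not automatically update versions already recorded there. Instead, any newly added dependency has to be compatible with the locked versions.

To bring your dependencies up to date, run poetry update, which is equivalent to deleting poetry.lock and then running poetry install.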
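
The difference between installing against a lock file and updating can be pictured with a toy model. The sketch below is not Poetry's resolver: the helper names and the release list are made up, versions are plain dotted integers, and only a single package is considered.

```python
def parse(version: str):
    """Turn "5.3.1" into (5, 3, 1) so versions compare numerically."""
    return tuple(int(p) for p in version.split("."))


def pick_version(available, lower, upper, locked=None):
    """Pick a version in [lower, upper); prefer the locked one if it is valid."""
    candidates = [v for v in available if parse(lower) <= parse(v) < parse(upper)]
    if locked in candidates:
        return locked  # install with a lock file: stay on the recorded version
    # No usable lock entry (or an update was requested): take the newest match.
    # max() raises ValueError if nothing satisfies the constraint.
    return max(candidates, key=parse)


available = ["5.2.0", "5.3.1", "5.4.3", "6.0.0"]  # made-up release list
print(pick_version(available, "5.2.0", "6.0.0", locked="5.3.1"))  # 5.3.1
print(pick_version(available, "5.2.0", "6.0.0"))                  # 5.4.3
```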

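Configuration management: hydra

A framework for elegantly configuring complex applications

hydra is a configuration management system developed by Facebook. Its goal is to let you manage configuration files in a modular, flexible way; hydra can also serve as a command-line interface.

Basics

Below is a simple k-means example adapted from scikit-learn: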

from time import time

import numpy as np
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.preprocessing import scale

import hydra
from omegaconf import DictConfig


@hydra.main(config_path="config.yaml")
def main(cfg: DictConfig) -> None:
    # Seed NumPy from the config so that runs are reproducible
    np.random.seed(cfg.kmean_param.seed)

    # Load the digits dataset and standardize the features
    X_digits, y_digits = load_digits(return_X_y=True)
    data = scale(X_digits)
    n_samples, n_features = data.shape
    n_digits = len(np.unique(y_digits))
    labels = y_digits

    def bench_k_means(estimator: KMeans, name: str, data: np.ndarray) -> None:
        # Fit the estimator, then print the timing and clustering metrics
        t0 = time()
        estimator.fit(data)
        print('%-9s\t%.2fs\t%i\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f'
              % (name, (time() - t0), estimator.inertia_,
                 metrics.homogeneity_score(labels, estimator.labels_),
                 metrics.completeness_score(labels, estimator.labels_),
                 metrics.v_measure_score(labels, estimator.labels_),
                 metrics.adjusted_rand_score(labels, estimator.labels_),
                 metrics.adjusted_mutual_info_score(labels, estimator.labels_),
                 metrics.silhouette_score(data, estimator.labels_,
                                          metric='euclidean',
                                          sample_size=cfg.kmean_param.sample_size)))

    print(f"n_digits: {n_digits}, \t n_samples {n_samples}, \t n_features {n_features}")
    print(82 * '_')
    print('init\t\ttime\tinertia\thomo\tcompl\tv-meas\tARI\tAMI\tsilhouette')

    # Benchmark one KMeans run per entry in the "kmean" section of the config
    for k, v in cfg.kmean.items():
        bench_k_means(KMeans(init=v['init'], n_clusters=n_digits, n_init=v["n_init"]),
                      name=v["name"], data=data)


if __name__ == "__main__":
    main()

The corresponding config.yaml looks like this:

kmean:
    k++: {init: k-means++, name: k-means++, n_init: 10}
    random: {init: random, name: random, n_init: 10}
kmean_param:
    sample_size: 300
    seed: 42

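As shown above, sample_size and seed go under kmean_param in the configuration file, and the different initialization methods go under kmean.

hydra also lets you change configuration parameters directly on the command line.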
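
On the command line, an override takes the form key=value with a dotted key, for example kmean_param.seed=0 appended to the script invocation. The sketch below mimics how such an override lands in a nested configuration. It uses plain dictionaries rather than hydra or OmegaConf, the helper name apply_overrides is made up, and only integer and string values are handled.

```python
import copy


def apply_overrides(config: dict, overrides: list) -> dict:
    """Apply "a.b.c=value" style overrides to a nested dict, returning a copy."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the hierarchy, creating missing levels on the way
            node = node.setdefault(name, {})
        # Keep integers as integers; everything else stays a string
        node[leaf] = int(raw) if raw.lstrip("-").isdigit() else raw
    return result


base = {"kmean_param": {"sample_size": 300, "seed": 42}}
print(apply_overrides(base, ["kmean_param.seed=0"]))
# {'kmean_param': {'sample_size': 300, 'seed': 0}}
```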

Config groups

hydra lets you organize configuration files into groups and choose at run time which file to use.

The following example is adapted from scikit-learn:

import time
import matplotlib.pyplot as plt
import numpy as np

from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import kneighbors_graph

import hydra
from hydra import utils
from omegaconf import DictConfig
import logging
import os

log = logging.getLogger(__name__)

# Generate sample data
# Create a graph capturing local connectivity. Larger number of neighbors
# will give more homogeneous clusters to the cost of computation
# time. A very large number of neighbors gives more evenly distributed
# cluster sizes, but may not impose the local manifold structure of
# the data

@hydra.main(config_path="conf")
def main(cfg: DictConfig) -> None:
    np.random.seed(cfg.params.seed)
    t = 1.5 * np.pi * (1 + 3 * np.random.rand(1, cfg.params.n_samples))
    x = t * np.cos(t)
    y = t * np.sin(t)
    X = np.concatenate((x, y))
    X += .7 * np.random.randn(2, cfg.params.n_samples)
    X = X.T
    knn_graph = kneighbors_graph(X, cfg.params.n_neighbors, include_self=False)

    if cfg.params.connectivity:
        connectivity = knn_graph
    else:
        connectivity = None
    plt.figure(figsize=(10, 4))

    for index, linkage in enumerate(('average',
                                    'complete',
                                    'ward',
                                    'single')):
        plt.subplot(1, 4, index + 1)
        log.info(f"{index + 1}: {linkage}")
        model = AgglomerativeClustering(linkage=linkage,
                                        connectivity=connectivity,
                                        n_clusters=cfg.params.n_clusters)
        t0 = time.time()
        model.fit(X)
        elapsed_time = time.time() - t0
        plt.scatter(X[:, 0], X[:, 1], c=model.labels_,
                    cmap=plt.cm.nipy_spectral)
        plt.title('linkage=%s\n(time %.2fs)' % (linkage, elapsed_time),
                fontdict=dict(verticalalignment='top'))
        plt.axis('equal')
        plt.axis('off')

        plt.subplots_adjust(bottom=0, top=.83, wspace=0,
                            left=0, right=1)
        plt.suptitle('n_cluster=%i, connectivity=%r' %
                    (cfg.params.n_clusters, cfg.params.connectivity), size=17)
    print(f"Current working directory  : {os.getcwd()}")
    print(f"Original working directory : {utils.get_original_cwd()}")
    plt.savefig(os.path.join(utils.get_original_cwd(),"img","demo2.png"))
    log.info(f"create png at {os.path.join(utils.get_original_cwd(),'img','demo2.png')}")

if __name__ == "__main__":
    main()

Grouping simply means organizing the configuration files in a directory structure like this:

conf/
└── params
    ├── 1.yaml
    └── 2.yaml

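You only need to point the decorator on main at the directory that contains the configuration files, and then choose on the command line which file each config group uses (for example, params=2).

Default configuration

You may want to specify defaults so that you don't have to name them on the command line every time. To do so, add another configuration file under the conf directory (called config.yaml here) that lists the defaults: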

defaults:
    - params: 1

Then point the decorator at this configuration file:

@hydra.main(config_path="conf/config.yaml")

In addition, if the default configuration file does not belong to any group, you can list it directly:

defaults:
    - default.yaml

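Composing configurations

Composition is just as simple: it is the example above with a few more config groups. Note that when the loaded configurations overlap, hydra takes the value from the configuration loaded last. In particular, when the overlapping items are mergeable, such as dictionaries, hydra tries to merge them into one dictionary, and duplicate keys inside it are likewise overridden in load order.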
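
The merge rule described here (later values win, dictionaries are merged key by key) can be sketched with plain dictionaries. This is an illustration of the rule, not hydra's or OmegaConf's implementation; the function name merge and the sample values are made up.

```python
def merge(base: dict, later: dict) -> dict:
    """Merge two configs; values from `later` win, nested dicts merge recursively."""
    merged = dict(base)
    for key, value in later.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


first = {"params": {"seed": 1, "n_clusters": 3}, "name": "a"}
second = {"params": {"seed": 2}, "name": "b"}
print(merge(first, second))
# {'params': {'seed': 2, 'n_clusters': 3}, 'name': 'b'}
```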

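multi-run

Use -m to run multiple combinations of configurations.

This is handy for parameter exploration: treat each config group as a dimension and sweep over the resulting configuration space.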
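
Treating each config group as a dimension means a multi-run sweep visits the Cartesian product of the chosen options (on the command line, the options for a group are given as a comma-separated list such as params=1,2). A minimal sketch of that enumeration follows; the function name sweep and the second group, linkage, are hypothetical.

```python
from itertools import product


def sweep(groups: dict):
    """Yield one {group: option} dict per combination of group options."""
    names = list(groups)
    for combo in product(*(groups[name] for name in names)):
        yield dict(zip(names, combo))


jobs = list(sweep({"params": ["1", "2"], "linkage": ["ward", "single"]}))
for number, job in enumerate(jobs):
    print(number, job)
```

Two groups with two options each give four jobs; adding a third group with three options would give twelve.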

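Working directory

Note that the previous example prints two working directories. This is a default hydra behavior worth knowing: on every run, hydra automatically creates an output directory named 'outputs/[date]/[time]' (the "Current working directory" above) and makes it the program's working directory. To recover the directory the program was launched from, call utils.get_original_cwd() from hydra's utils module.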
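
The effect of this behavior can be reproduced with the standard library alone, which may help when reasoning about relative paths. The sketch below is not how hydra implements it; the context manager name run_in_output_dir and the exact date and time formats are assumptions made for the illustration.

```python
import os
import tempfile
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def run_in_output_dir(root: str):
    """Switch into root/<date>/<time> for the duration of a run."""
    original_cwd = os.getcwd()
    now = datetime.now()
    run_dir = os.path.join(root, now.strftime("%Y-%m-%d"), now.strftime("%H-%M-%S"))
    os.makedirs(run_dir, exist_ok=True)
    os.chdir(run_dir)
    try:
        # Hand the launch directory to the caller, like get_original_cwd()
        yield original_cwd
    finally:
        os.chdir(original_cwd)


with tempfile.TemporaryDirectory() as root:
    with run_in_output_dir(os.path.join(root, "outputs")) as original:
        print("Current working directory  :", os.getcwd())
        print("Original working directory :", original)
```

Relative paths inside the with block resolve against the run directory, which is why the example above builds its image path from the original directory instead.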

Logging

As in the example above, when logging is used together with hydra, log messages go by default both to a .log file in the output directory and to the shell:

[2020-08-17 19:55:39,292][__main__][INFO] - 1: average
[2020-08-17 19:55:39,801][__main__][INFO] - 2: complete
[2020-08-17 19:55:40,255][__main__][INFO] - 3: ward
[2020-08-17 19:55:40,785][__main__][INFO] - 4: single
[2020-08-17 19:55:41,590][__main__][INFO] - create png at /home/wangm/APP/demo-project/poe-demo/img/demo2.png
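
A comparable setup, with one logger writing the same line format to both the console and a file, can be built with the standard logging module. This is a sketch of the idea rather than hydra's own logging configuration; the helper name build_logger and the file name demo.log are made up.

```python
import logging
import os
import sys
import tempfile


def build_logger(name: str, log_path: str) -> logging.Logger:
    """Create a logger that writes the same format to stdout and to a file."""
    formatter = logging.Formatter("[%(asctime)s][%(name)s][%(levelname)s] - %(message)s")
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    for handler in (logging.StreamHandler(sys.stdout), logging.FileHandler(log_path)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


log_file = os.path.join(tempfile.mkdtemp(), "demo.log")
log = build_logger("demo", log_file)
log.info("1: average")
```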

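The log level can also be controlled from the command line through the hydra.verbose override.

Debugging

  1. Use -c job or --cfg job to print the configuration without running the program, similar to --dry-run. --cfg accepts the following values:
    • job: your configuration
    • hydra: Hydra's configuration
    • all: both of the above combined
  2. As noted in the log level remark above, hydra.verbose=hydra prints more information about hydra itself, including plugins, search path, load history, and tracing.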