Python Basics


Overview: installing pip

Mac: run `sudo easy_install pip` in a terminal, enter your Mac password, and wait a moment.

Command line: sudo easy_install pip
Password:
Searching for pip
Reading https://pypi.python.org/simple/pip/
...

## 1. Web scraping with requests and BeautifulSoup

```python
# -*- coding: UTF-8 -*-
import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.jianshu.com/")  # fetch the target page
soup = BeautifulSoup(html.content, 'html.parser')  # parse the returned HTML, naming the parser explicitly
for item in soup.select(".content"):  # every tag with class="content"; select() returns a list of matches
    # print(item.select(".avatar")[0])
    name = item.select(".blue-link")[0].text
    title = item.select(".title")[0].text
    content = item.select(".abstract")[0].text
    time = item.select(".time")[0]["data-shared-at"]
    # %-formatting keeps the output identical on Python 2 and Python 3
    print("Name: %s" % name)
    print("Title: %s" % title)
    print("Content: %s" % content)
    print("Time: %s" % time)
```
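Indexing `select(...)[0]` raises `IndexError` whenever an entry lacks one of the expected elements. Below is a minimal sketch of a more defensive variant, assuming the same class names as above. The inline HTML string and the `first_text` and `parse_items` helpers are illustrative and do not come from the original page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page, so the sketch runs offline.
SAMPLE_HTML = """
<div class="content">
  <a class="blue-link">alice</a>
  <a class="title">First post</a>
  <p class="abstract">Hello world</p>
  <span class="time" data-shared-at="2017-01-01T10:00:00+08:00"></span>
</div>
<div class="content">
  <a class="title">No author here</a>
</div>
"""


def first_text(tag, selector, default=""):
    """Return the stripped text of the first match, or default if none."""
    match = tag.select_one(selector)
    return match.get_text(strip=True) if match is not None else default


def parse_items(html):
    """Collect one dict per .content block, tolerating missing fields."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for item in soup.select(".content"):
        time_tag = item.select_one(".time")
        items.append({
            "name": first_text(item, ".blue-link"),
            "title": first_text(item, ".title"),
            "content": first_text(item, ".abstract"),
            # .get() returns None instead of raising when the attribute is absent
            "time": time_tag.get("data-shared-at") if time_tag is not None else None,
        })
    return items


items = parse_items(SAMPLE_HTML)
```

`select_one` returns `None` when nothing matches, so a missing field becomes a default value instead of stopping the whole loop.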

## 2. Database operations
Database driver: https://dev.mysql.com/downloads/connector/python/

```python
# -*- coding: UTF-8 -*-
import mysql.connector

config = {
    'user': 'root',
    'password': 'root',
    'host': '127.0.0.1',
    'database': 'test',
}

con = mysql.connector.connect(**config)
cursor = con.cursor()

# Insert
cursor.execute("insert into User values(null,%s,%s)", ['haha', '123'])
row = cursor.rowcount  # number of affected rows
print(row)  # 1
con.commit()  # persist the insert; autocommit is off by default

# Query
cursor = con.cursor()
cursor.execute("select * from User")
fetchall = cursor.fetchall()
print(fetchall)  # [(1, u'junwen', u'123'), (2, u'junwen', u'123'), (3, u'junwen', u'123')]

cursor.close()
con.close()
```
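The insert, commit, and query cycle can also be tried without a MySQL server. The sketch below swaps in Python's built-in `sqlite3` module in place of `mysql.connector`. The table layout (`id`, `name`, `password`) is an assumption inferred from the sample rows above. Note that SQLite uses `?` placeholders where MySQL Connector uses `%s`.

```python
import sqlite3

# In-memory database: nothing is written to disk.
con = sqlite3.connect(":memory:")
cursor = con.cursor()

# Assumed schema, guessed from the (id, name, password) rows shown above.
cursor.execute(
    "CREATE TABLE User (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, password TEXT)"
)

# Insert: values are passed separately so the driver handles quoting.
cursor.execute("INSERT INTO User VALUES (NULL, ?, ?)", ("haha", "123"))
inserted = cursor.rowcount  # number of rows affected by the insert
con.commit()                # make the insert permanent

# Query
cursor.execute("SELECT * FROM User")
rows = cursor.fetchall()

cursor.close()
con.close()
```

Passing values as a separate argument, rather than formatting them into the SQL string, is what protects both drivers against SQL injection.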


## 3. Splinter: a testing tool that automates web pages

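```
pip install splinter
pip install selenium
```

**Download chromedriver.exe and geckodriver.exe and add each one to the PATH environment variable. The PATH entry should be the folder that contains the file, without the .exe file name itself.**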

chromedriver : http://download.csdn.net/download/qianaier/7966945
                          http://download.csdn.net/download/anan_ss/9723479

geckodriver: https://github.com/mozilla/geckodriver/releases/

After setting the environment variable, also copy chromedriver.exe into the directory of the .py script you are going to run.
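Whether the drivers are actually reachable can be checked from Python before launching a browser. This is a minimal sketch using the standard library's `shutil.which` (Python 3.3+). The helper name `find_driver` is hypothetical.

```python
import shutil


def find_driver(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


for driver in ("chromedriver", "geckodriver"):
    path = find_driver(driver)
    if path is None:
        print("%s not found on PATH" % driver)
    else:
        print("%s found at %s" % (driver, path))
```

On Windows, `shutil.which` also tries the extensions listed in `PATHEXT`, so searching for `chromedriver` matches `chromedriver.exe`.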

```python
# coding=utf-8
from splinter.browser import Browser

xx = Browser(driver_name="chrome")
xx.visit("http://item.jd.com/2707976.html")
print(xx.title)  # page title: 京东...
print(xx.driver_name)  # browser name: chrome
print(xx.url)  # URL of the current page
xx.click_link_by_text("你好,请登录")  # click the link whose text matches
xx.click_link_by_text("账户登录")
xx.fill("loginname", "your_username")  # fill a form field, located by its name attribute
xx.fill("nloginpwd", "your_password")
xx.click_link_by_id("loginsubmit")
```



![](http://upload-images.jianshu.io/upload_images/2650372-c9ea3d4ed5533da7.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)



## 4. Notes

**1. Encoding: if the source file contains Chinese characters, add `# -*- coding: UTF-8 -*-` as the first line of the code**
![](http://upload-images.jianshu.io/upload_images/2650372-af37fb6e6b5c712d.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)


**2. Cause of `'module' object is not callable`**

![](http://upload-images.jianshu.io/upload_images/2650372-008115d1065c3ddc.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
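Cause and fix: [Python](http://lib.csdn.net/base/python) has two ways to import a module, `import module` and `from module import ...`. With the first form, everything imported must be qualified with the module name when it is used; with the second form it does not. The error appears when the module itself is called as if it were the class it contains, for example `Person(...)` after a plain `import Person`.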

Correct code:

```python
import Person
person = Person.Person('dnawo', 'man')
print(person.Name)
```

or

```python
from Person import *
person = Person('dnawo', 'man')
print(person.Name)
```
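The failure can be reproduced with any standard-library module whose main class shares its name. Here is a small sketch that uses `datetime` in place of the `Person` module, which is not available here. The alias `DateTime` is only for illustration.

```python
import datetime

# Wrong: `datetime` here is the module, so calling it raises TypeError.
try:
    datetime(2017, 1, 1)
except TypeError as exc:
    message = str(exc)

# Right, option 1: qualify the class with the module name.
d1 = datetime.datetime(2017, 1, 1)

# Right, option 2: import the class itself, then call it directly.
from datetime import datetime as DateTime

d2 = DateTime(2017, 1, 1)
```

Both corrected forms build the same object; they differ only in which name is bound in the current namespace.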


**3. `WindowsError: [Error 183]`: this happens because a folder with the same name already exists**

![](http://upload-images.jianshu.io/upload_images/2650372-1ee1d3bb30a77ff7.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
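Error 183 is the Windows "already exists" code. The sketch below assumes the error came from creating a folder whose name was already taken, and shows two ways to avoid it. It runs in a temporary directory, and the folder name `output` is illustrative. On Python 3 the exception is `FileExistsError` on every platform.

```python
import os
import tempfile

base = tempfile.mkdtemp()
target = os.path.join(base, "output")

os.mkdir(target)  # first call succeeds

# A second os.mkdir on the same path fails because the folder exists.
try:
    os.mkdir(target)
    collided = False
except FileExistsError:
    collided = True

# Option 1: check before creating.
if not os.path.isdir(target):
    os.mkdir(target)

# Option 2: let makedirs ignore an existing folder (Python 3.2+).
os.makedirs(target, exist_ok=True)
```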

**4. `UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)`**

![](http://upload-images.jianshu.io/upload_images/2650372-a74e8a532df0bac8.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
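Fix: add the following code. It applies to Python 2 only; Python 3 has no built-in `reload` and no `sys.setdefaultencoding`.

```python
import sys
reload(sys)
sys.setdefaultencoding('utf8')
```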
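An alternative that leaves the interpreter-wide default untouched is to decode bytes explicitly with the right codec. The sketch below is written for Python 3 and shows both the failure and the explicit fix. The sample bytes are an assumption chosen for illustration: they are the UTF-8 encoding of 名, whose first byte is 0xe5.

```python
raw = b"\xe5\x90\x8d"  # UTF-8 bytes for the character 名

# Decoding as ASCII reproduces the error from the heading.
try:
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

# Decoding with the codec the bytes were written in works.
text = raw.decode("utf-8")

# Encoding back gives the original bytes.
round_trip = text.encode("utf-8")
```

Decoding at the point where bytes enter the program keeps the rest of the code working on text only.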



## 5. Learning resources
http://cuiqingcai.com/1319.html

http://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000/001391435131816c6a377e100ec4d43b3fc9145f3bb8056000

http://www.runoob.com/python/python-object.html