Goal: download the Hong Kong reality-based 3D models
Downloading via web URL
www.pland.gov.hk/pland_sc/in…
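This is the portal for Hong Kong's reality-based 3D data. Clicking grid cells on the map downloads the corresponding models in OSGB, 3D Tiles, or OBJ format, with a maximum of 6 per download.
There are two approaches: one is to look for the download source in the page's JS code; the other is to download the data by URL from a prepared Excel file.
- First attempt: JS code. I worked out how the page handles downloads, but did not find the download links.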
<select id='cboformat' onchange='change_format();' title='Select File Format / 选择档案格式'>
<option value='OSGB'>OSGB</option><option value='OBJ'>OBJ</option><option value='CESIUM'>Cesium 3D Tiles</option>
</select>
<br/><br/>
<span id='spnSelect' style="font-weight: bold;color:white;font-size:16px">Select your area on map and click [ Download ]</span>
<br/>
<button type='button' style='font-weight:bold;font-size:18px' id='button_Download' onclick='click_Download();' disabled >
Download </button>
<button type='button' style='font-weight:bold;font-size:18px' id='button_Clear' onclick='click_Clear();' autofocus>
Clear </button>
Handler that runs when the Download button is clicked:
function click_Download()
{
    var count1=0;
    gDownloadGrid = [];
    for (i=0;i<gGrids.length;i++)
    {
        // The grid is selected (fill opacity > 0.5) and is not already in the download color
        if (gGrids[i].options.fillOpacity > 0.5 && gGrids[i].options.fillColor != DOWNLOAD_COLOR )
        {
            gDownloadGrid.push(gFiles[i]);
            // Stagger the downloads; delay: (DOWNLOAD_STOP /2) + (DOWNLOAD_STOP * count1)
            setTimeout(downloadfile , (DOWNLOAD_STOP /2) + (DOWNLOAD_STOP * count1), gFiles[i], gGrids[i] );
            count1++;
        }
    }
    if (count1==0)
    {
        if (gLang ==0)
        {
            alert ('Please click on the Grid to select area to Download 3D Data.');
        }
        else if (gLang ==1)
        {
            alert ('請單擊選擇網格區域以下載三維數據.');
        }
        else
        {
            alert ('请单击选择网格区域以下载三维数据.');
        }
    }
}
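To make the timing in click_Download() easier to see, here is a small Python sketch of the delay schedule it passes to setTimeout. The function name download_delays and the value 2000 ms for DOWNLOAD_STOP are my own assumptions for illustration; the page's real DOWNLOAD_STOP value is not shown in the snippet above.

```python
def download_delays(selected_count, download_stop_ms):
    """Delay (in ms) before each selected grid's download is started.

    Mirrors the expression (DOWNLOAD_STOP / 2) + (DOWNLOAD_STOP * count1)
    from click_Download(); download_stop_ms is a hypothetical value.
    """
    return [
        download_stop_ms / 2 + download_stop_ms * k
        for k in range(selected_count)
    ]


# Three selected grids with an assumed DOWNLOAD_STOP of 2000 ms
print(download_delays(3, 2000))  # [1000.0, 3000.0, 5000.0]
```

So the page does not request everything at once: each selected grid is queued one DOWNLOAD_STOP after the previous one.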
- Second attempt: URLs, with the code written on top of Python's requests library.
The code is as follows:
# coding=UTF-8
import csv
import os
import time

import requests

# Open the CSV file (raw string, so the backslashes are not read as escapes)
with open(r'E:\hongkong_data\OSGBtest.csv', 'r', encoding='utf-8') as file:
    reader = csv.reader(file)
    next(reader)  # skip the header row
    # List that stores the URLs
    file_urls = []
    # Read every row
    for row in reader:
        # FILE_URL column: index 0 in this test CSV
        # (the original comment gave index 12, counting from 0)
        file_url = row[0]
        file_urls.append(file_url)

# Local folder path
local_folder = r'D:\OSGB'
total_files = len(file_urls)
print(f"Total files to download: {total_files}")
downloaded_files = 0
start_time = time.time()

# Iterate over the URL list and download each file
for url in file_urls:
    try:
        filename = url.split('/')[-1]  # extract the file name
        # Join folder and file name (the original f-string had no path separator)
        file_path = os.path.join(local_folder, filename)
        # Send the GET request and save the file
        response = requests.get(url)
        response.raise_for_status()  # check that the request succeeded
        print(f"Response status code: {response.status_code}")  # debug output
        with open(file_path, 'wb') as file:
            file.write(response.content)
        downloaded_files += 1
        print(f"Downloaded: {file_path}")
        if downloaded_files % 10 == 0:
            print(f"Downloaded ({downloaded_files}/{total_files}): {file_path}")
        time.sleep(1)  # short pause between requests
    except Exception as e:
        print(f"Error occurred while downloading {url}: {str(e)}")

print("Download completed!")
end_time = time.time()
total_time = end_time - start_time
print(f"total time taken:{total_time} seconds")
# Multithreaded version: download in batches of 5 threads
import csv
import threading
import time

import requests

downloaded_files = 0  # shared counter, protected by counter_lock
counter_lock = threading.Lock()


# Download function
def download_file(url, local_folder, total_files):
    global downloaded_files
    try:
        filename = url.split('/')[-1]  # extract the file name
        file_path = f'{local_folder}/{filename}'  # build the local file path
        # Send the GET request and save the file
        response = requests.get(url)
        response.raise_for_status()  # check that the request succeeded
        with open(file_path, 'wb') as file:
            file.write(response.content)
        # An int passed as an argument is only a local copy, so the
        # shared counter is updated here under a lock instead
        with counter_lock:
            downloaded_files += 1
            count = downloaded_files
        print(f"Downloaded ({count}/{total_files}): {file_path}")
        # Every five files, report downloaded / total
        if count % 5 == 0:
            print(f"Downloaded {count} files out of {total_files}.")
    except Exception as e:
        print(f"Error occurred while downloading {url}: {str(e)}")


# Open the CSV file (raw string, so the backslashes are not read as escapes)
with open(r'E:\hongkong_data\OSGBtest.csv', 'r', encoding='utf-8') as file:
    reader = csv.reader(file)
    next(reader)  # skip the header row
    # List that stores the URLs
    file_urls = []
    # Read every row
    for row in reader:
        # FILE_URL column: index 0 in this test CSV
        # (the original comment gave index 12, counting from 0)
        file_url = row[0]
        file_urls.append(file_url)

# Local folder path
local_folder = r'D:\OSGB'
total_files = len(file_urls)
print(f"Total files to download: {total_files}")
threads = []  # threads in the current batch
start_time = time.time()

# Iterate over the URL list and download each file
for url in file_urls:
    # Create and start a thread
    t = threading.Thread(target=download_file, args=(url, local_folder, total_files))
    t.start()
    threads.append(t)
    # Limit the number of concurrently running threads to 5
    if len(threads) >= 5:
        # Wait for the current batch to finish
        for thread in threads:
            thread.join()
        threads = []

# Wait for the remaining threads
for thread in threads:
    thread.join()

print("Download completed!")
end_time = time.time()
total_time = end_time - start_time
print(f"Total time taken: {total_time} seconds")
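The batch loop above starts five threads and then waits for all five before starting the next five, so one slow file holds up the whole batch. A possible alternative is the standard library's ThreadPoolExecutor, which keeps up to five downloads in flight at all times. This is only a sketch: download_all and its fetch parameter are names I made up, and fetch stands for whatever function downloads a single URL.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, fetch, max_workers=5):
    """Call fetch(url) for every URL with at most max_workers running at once.

    Returns (succeeded, failed): a list of URLs that finished without
    raising, and a dict mapping each failed URL to its error message.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the URL it is downloading
        futures = {pool.submit(fetch, url): url for url in urls}
        for done, future in enumerate(as_completed(futures), start=1):
            url = futures[future]
            try:
                future.result()  # re-raises any exception from fetch
                succeeded.append(url)
            except Exception as exc:
                failed[url] = str(exc)
            print(f"Finished {done}/{len(futures)}: {url}")
    return succeeded, failed
```

Progress is counted in the main thread as futures complete, so no lock or shared counter is needed, and the failed dict gives a list of URLs to retry afterwards.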
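Problems:
1 Network access refused: turn off the proxy.
2 "gbk" / "utf-8" codec can't decode xxxxxx: this happens when saving the downloaded resource. The file should be opened with 'wb' so that the zip archive is written as binary.
3 HTTPSConnectionPool: this is probably the "Max retries exceeded" error. There may be two causes: either requests time out because too many HTTP connections are held open (keep-alive), or SSL verification fails (the ChatGPT plugin for QGIS also has an SSL verification problem, which may be related). Reference blog: 【Bug】python requests发起请求,报“Max retries exceeded with url”_机器不学习我学习的博客-CSDN博客
4 Downloading files one at a time is too slow; consider trying multithreading to improve the download rate.
Summary:
1 Overall logic of this code: extract the URL column from the CSV file and store it in a list. With the csv library, next skips the first (header) row and csv.reader reads the file row by row, so the URL column of each row is appended to the list. Each download is wrapped in try/except so that a failure prints an error message instead of stopping the run. requests.get fetches the URL, raise_for_status checks the HTTP status of the response, and the content is saved to the target path.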
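For problem 3, one mitigation I would try is a shared requests.Session with automatic retries and explicit timeouts, combined with streaming the response to disk in 'wb' mode (which also covers problem 2). This is a sketch and not a confirmed fix for this server: make_session, download, the retry count, the backoff, and the timeout values are all assumptions.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries=3, backoff=1.0):
    """Build a Session that retries transient failures with backoff."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,  # wait longer after each failed attempt
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def download(session, url, file_path, timeout=(10, 120)):
    """Stream one file to disk in binary mode.

    timeout is (connect seconds, read seconds); both values are guesses.
    """
    with session.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        with open(file_path, "wb") as out:
            for chunk in response.iter_content(chunk_size=1 << 20):
                out.write(chunk)
```

Reusing one session lets connections be pooled instead of opened anew for every file, and streaming avoids holding a whole zip archive in memory.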
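The CSV-reading step in the summary can also be written with csv.DictReader, which picks the column by its header name instead of a numeric index. This is a sketch that assumes the header is literally FILE_URL (the name used in the code comment); read_urls is a helper name I introduced.

```python
import csv


def read_urls(csv_path, column="FILE_URL"):
    """Return the non-empty values of one named column from a CSV file."""
    # utf-8-sig also strips the BOM that Excel adds when exporting CSV
    with open(csv_path, "r", encoding="utf-8-sig", newline="") as f:
        reader = csv.DictReader(f)  # the first row is used as the header
        return [row[column].strip() for row in reader if row.get(column)]
```

Calling read_urls(r'E:\hongkong_data\OSGBtest.csv') would then replace the next(reader) and row[0] logic, and it keeps working if the column order in the file changes.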