Downloading Hong Kong Reality-Based 3D Data

Goal: download the Hong Kong reality-based 3D models.

Downloading via URL

www.pland.gov.hk/pland_sc/in…
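This is the portal for Hong Kong's reality-based 3D data. Clicking grid cells on the map downloads the 3D models in OSGB, 3D Tiles, or OBJ format, with at most 6 tiles per download.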

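There are two approaches: one is to look for the download source in the page's JS code; the other is to download the data by URL from a prepared Excel file.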

  1. First attempt: the JS code. Reading it explained how the page downloads data, but I did not find the download links.
  
        <select id='cboformat' onchange='change_format();' title='Select File Format / 选择档案格式'>
                <option value='OSGB'>OSGB</option><option value='OBJ'>OBJ</option><option value='CESIUM'>Cesium 3D Tiles</option>
        </select>
        <br/><br/>
        
        <span id='spnSelect' style="font-weight: bold;color:white;font-size:16px">Select your area on map and click [ Download ]</span>
        <br/>
        <button type='button' style='font-weight:bold;font-size:18px' id='button_Download' onclick='click_Download();' disabled > 
        Download </button>&nbsp;&nbsp;&nbsp;&nbsp;
        <button type='button' style='font-weight:bold;font-size:18px' id='button_Clear'  onclick='click_Clear();' autofocus> 
        Clear </button>
        
        // Called when the Download button is clicked
        function click_Download()
        {


                var count1=0;
                gDownloadGrid  = [];
          
         
                for (i=0;i<gGrids.length;i++)
                {        // selected (fillOpacity > 0.5) and not already shown in the download color
                        if (gGrids[i].options.fillOpacity > 0.5 && gGrids[i].options.fillColor != DOWNLOAD_COLOR )
                        {

                                gDownloadGrid.push(gFiles[i]);
                                // download delay: (DOWNLOAD_STOP / 2) + (DOWNLOAD_STOP * count1)
                                setTimeout(downloadfile , (DOWNLOAD_STOP /2) + (DOWNLOAD_STOP * count1), gFiles[i], gGrids[i] );


                                count1++;
                        }
                }
          

          
          if (count1==0)
          {
                if (gLang ==0)
                {
                        alert ('Please click on the Grid to select area to Download 3D Data.');
                }
                else if (gLang ==1)
                {
                        alert ('請單擊選擇網格區域以下載三維數據.');
                }
                else
                {
                        alert ('请单击选择网格区域以下载三维数据.');
                }
          }
                  
        }
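
The `setTimeout` call in `click_Download` staggers the downloads instead of starting them all at once: the n-th selected tile (counting from 0) starts after `DOWNLOAD_STOP / 2 + DOWNLOAD_STOP * n` milliseconds. The Python sketch below only reproduces that schedule. The value of `DOWNLOAD_STOP` does not appear in the excerpt, so the 2000 ms used here is an assumption for illustration, and `download_delays` is a hypothetical helper name.

```python
# Sketch of the delay schedule used by click_Download().
# ASSUMPTION: 2000 ms is a made-up value; the real DOWNLOAD_STOP constant
# is defined elsewhere in the page's JavaScript.
DOWNLOAD_STOP_MS = 2000


def download_delays(selected_count, download_stop_ms=DOWNLOAD_STOP_MS):
    """Return the delay in ms before each selected tile starts downloading."""
    return [download_stop_ms / 2 + download_stop_ms * n
            for n in range(selected_count)]


if __name__ == "__main__":
    # Six tiles is the per-download maximum mentioned above.
    for n, delay in enumerate(download_delays(6)):
        print(f"tile {n}: starts after {delay:.0f} ms")
```

With these assumed numbers, consecutive tiles start one `DOWNLOAD_STOP` apart, which is consistent with the page limiting each request to a handful of tiles.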
  2. Second attempt: URLs, with the download script written in Python using the requests library.

The code is as follows:

# coding=UTF-8
import csv
import os
import time

import requests

# Open the CSV file (raw string so the backslashes are not treated as escapes)
with open(r'E:\hongkong_data\OSGBtest.csv', 'r', encoding='utf-8') as file:
    reader = csv.reader(file)

    # Skip the header row
    next(reader)

    # List that stores the URLs
    file_urls = []

    # Read each row
    for row in reader:
        file_url = row[0]  # FILE_URL column; index 0 in this test CSV (adjust to your file's layout)
        file_urls.append(file_url)

# Local destination folder
local_folder = r'D:\OSGB'
os.makedirs(local_folder, exist_ok=True)

total_files = len(file_urls)
print(f"Total files to download: {total_files}")

downloaded_files = 0
start_time = time.time()

# Loop over the URL list and download each file
for url in file_urls:
    try:
        filename = url.split('/')[-1]  # extract the file name
        file_path = os.path.join(local_folder, filename)  # build the local file path

        # Send a GET request and save the file
        response = requests.get(url, timeout=60)
        response.raise_for_status()  # raise an error if the request failed
        print(f"Response status code: {response.status_code}")  # debug output
        with open(file_path, 'wb') as file:  # binary mode: the payload is a zip archive
            file.write(response.content)

        downloaded_files += 1

        print(f"Downloaded: {file_path}")
        if downloaded_files % 10 == 0:
            print(f"Downloaded ({downloaded_files}/{total_files}): {file_path}")
        time.sleep(1)
    except Exception as e:
        print(f"Error occurred while downloading {url}: {str(e)}")

print("Download completed!")

end_time = time.time()
total_time = end_time - start_time
print(f"Total time taken: {total_time} seconds")
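
# Multithreaded version of the download script
import csv
import os
import threading
import time

import requests

# Shared progress counter; the lock keeps updates from different threads consistent
downloaded_files = 0
counter_lock = threading.Lock()


# Download function run by each thread
def download_file(url, local_folder, total_files):
    global downloaded_files
    try:
        filename = url.split('/')[-1]  # extract the file name
        file_path = os.path.join(local_folder, filename)  # build the local file path

        # Send a GET request and save the file
        response = requests.get(url, timeout=60)
        response.raise_for_status()  # raise an error if the request failed

        with open(file_path, 'wb') as file:
            file.write(response.content)

        # An int passed as an argument is only changed locally,
        # so the count is kept in a module-level variable instead
        with counter_lock:
            downloaded_files += 1
            done = downloaded_files

        print(f"Downloaded ({done}/{total_files}): {file_path}")

        # Every five files, report downloaded count / total
        if done % 5 == 0:
            print(f"Downloaded {done} files out of {total_files}.")

    except Exception as e:
        print(f"Error occurred while downloading {url}: {str(e)}")


# Open the CSV file (raw string so the backslashes are not treated as escapes)
with open(r'E:\hongkong_data\OSGBtest.csv', 'r', encoding='utf-8') as file:
    reader = csv.reader(file)
    next(reader)  # skip the header row

    # List that stores the URLs
    file_urls = []

    # Read each row
    for row in reader:
        file_url = row[0]  # FILE_URL column; index 0 in this test CSV (adjust to your file's layout)
        file_urls.append(file_url)

# Local destination folder
local_folder = r'D:\OSGB'
os.makedirs(local_folder, exist_ok=True)

total_files = len(file_urls)
print(f"Total files to download: {total_files}")

threads = []  # threads in the current batch

start_time = time.time()

# Loop over the URL list and download each file
for url in file_urls:
    # Create and start a thread
    t = threading.Thread(target=download_file, args=(url, local_folder, total_files))
    t.start()
    threads.append(t)

    # Limit the number of concurrently running threads to 5
    if len(threads) >= 5:
        # Wait for every thread in the batch to finish
        for thread in threads:
            thread.join()

        threads = []

# Wait for the remaining threads to finish
for thread in threads:
    thread.join()

print("Download completed!")

end_time = time.time()
total_time = end_time - start_time
print(f"Total time taken: {total_time} seconds")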
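
The batching loop above waits for a whole group of five threads to finish before starting the next group, so one slow file leaves the other four slots idle. One alternative is a thread pool that keeps up to five workers busy continuously. The sketch below uses `concurrent.futures.ThreadPoolExecutor` from the standard library. `download_all` and the `fetch` callable are hypothetical names introduced here, with `fetch` standing in for whatever function downloads and saves one URL.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, fetch, max_workers=5):
    """Run fetch(url) for every URL on a pool of at most max_workers threads.

    fetch is a hypothetical callable supplied by the caller, e.g. a function
    that downloads one URL and saves it to disk.
    Returns (succeeded, failed): two lists of URLs.
    """
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its URL so failures can be reported.
        futures = {pool.submit(fetch, url): url for url in urls}
        for done, future in enumerate(as_completed(futures), start=1):
            url = futures[future]
            try:
                future.result()  # re-raises any exception raised inside fetch
                succeeded.append(url)
            except Exception as exc:
                failed.append(url)
                print(f"Error occurred while downloading {url}: {exc}")
            print(f"Finished {done}/{len(futures)}")
    return succeeded, failed
```

Because the progress count is updated only in the main thread as futures complete, no lock is needed in this variant. Whether it is actually faster than the batched version depends on the server and the network, which I have not measured.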

Problems:

1 Network access refused: turn off the proxy.

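2 "gbk" / "utf-8" codec can't decode xxxxxx: this appears when saving the downloaded resource. The file should be opened in wb mode so that the zip archive is written as binary.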

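3 HTTPSConnectionPool error: this is probably "Max retries exceeded", meaning the request was retried too many times. There are two likely causes. One is timeouts caused by too many open HTTP connections (keep-alive persistent connections). The other is a problem with SSL verification (the ChatGPT plugin for QGIS also has SSL verification problems, which may be related). Reference blog: 【Bug】python requests发起请求,报“Max retries exceeded with url”_机器不学习我学习的博客-CSDN博客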

4 Downloading files one by one is too slow; consider multithreading to improve the download rate.
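
Problems 2 and 3 can be handled in one small helper: write the payload in `wb` mode, and retry a failed request a few times with an increasing pause. The sketch below uses only the standard library. `save_with_retries` and its `fetch` parameter are hypothetical names; `fetch` is assumed to return the response body as bytes, for example `lambda u: requests.get(u, timeout=30).content`. Whether retrying actually resolves the "Max retries exceeded" error depends on its cause, which the notes above leave open.

```python
import time
from pathlib import Path


def save_with_retries(fetch, url, dest, attempts=3, base_delay=1.0):
    """Fetch url as bytes and write it to dest, retrying on failure.

    fetch is a hypothetical callable that returns the response body as bytes.
    Returns the destination as a Path; re-raises the last error if every
    attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            data = fetch(url)
            # 'wb' writes the bytes unchanged; a text-mode file expects str.
            with open(dest, "wb") as out:
                out.write(data)
            return Path(dest)
        except Exception as exc:
            if attempt == attempts:
                raise  # give up after the last attempt
            delay = base_delay * 2 ** (attempt - 1)  # doubles on each retry
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay} s")
            time.sleep(delay)
```

The pause doubles after each failure so that a struggling server is not hit again immediately; the default of three attempts and a one-second base delay are arbitrary starting values.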

Summary:

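1 Overall logic of this code: extract the URL column from the CSV file and store it in a list. With the csv library, next skips the first (header) row and csv.reader reads the file row by row, taking the URL column of each row and appending it to the list. Each download is wrapped in try/except, so a failure prints an error message instead of stopping the script. requests.get fetches the URL, raise_for_status checks the HTTP response status, and the content is saved to the target path.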
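
The scripts' comments mention a `FILE_URL` column while the code reads `row[0]`. If the CSV has a header row, reading by column name avoids depending on the column position. A minimal sketch, assuming the header is literally named `FILE_URL` (taken from the code comment, not checked against the actual file); `read_urls` is a hypothetical helper name and the URLs in the demo are placeholders.

```python
import csv
import io


def read_urls(csv_file, column="FILE_URL"):
    """Return the non-empty values of one named column from an open CSV file."""
    reader = csv.DictReader(csv_file)  # the first row is used as the header
    return [row[column].strip() for row in reader if row.get(column)]


if __name__ == "__main__":
    # In-memory stand-in for the real CSV file.
    sample = io.StringIO(
        "TILE,FILE_URL\n"
        "A1,https://example.com/a1.zip\n"
        "A2,https://example.com/a2.zip\n"
    )
    print(read_urls(sample))
```

For a file on disk, the same function can be called as `read_urls(open(path, newline='', encoding='utf-8'))`, ideally inside a `with` block.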