TQC Web Data Extraction and Analysis

Last modified 2026/6/5 by CML


https://www.codejudger.com/
@gcloud.csu.edu.tw
19911223

python -m pip install pyyaml requests beautifulsoup4 lxml pandas numpy matplotlib

Python 102, 104, 202, 204, 302, 304, 402, 404

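Python 102 New Taipei City Public Bicycle Real-Time Information

1. Problem description:
Open PYD01.py and complete it according to the requirements below so that the output matches the specification. When you are done, save the file as PYA01.py before submitting it for grading.

Files produced by the program must be written to the same folder as the program.

2. Design description:
Write a program that reads read.xml, which holds real-time information on New Taipei City public bicycles, and saves three fields, sno (station ID), sna (station name in Chinese), and tot (total parking spaces at the station), to write.csv (UTF-8 encoded), with the field values separated by a single half-width comma.

Hint: output only the data, not the field names.

3. Input and output:
Input description
Read read.xml

Output description
Write the values of the three fields sno, sna, and tot to write.csv, with the field values separated by a single half-width comma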

# Import the xml.etree.ElementTree module under the alias ET
import ___ as ___
# Import the csv module
import ___

# Read the XML file
tree = ___.___("___")
root = tree.getroot()

# Open the CSV file for writing, with the encoding set to utf8
ubikefile = ___("___", "___", encoding='___')
csvwriter = csv.writer(ubikefile)

# Write out the three fields sno (station ID), sna (station name in Chinese), and tot (total parking spaces)
for row in root:
    ubike = []
    sno = row.find('___').text
    ubike.append(___)
    sna = row.find('___').text
    ubike.append(___)
    tot = row.find('___').text
    ubike.append(___)
    csvwriter.writerow(ubike)
ubikefile.close()

# Import the xml.etree.ElementTree module under the alias ET
import xml.etree.ElementTree as ET
# Import the csv module
import csv

# Read the XML file
tree = ET.parse("read.xml")
root = tree.getroot()

# Open the CSV file for writing, with the encoding set to utf8
ubikefile = open("write.csv", "w", encoding='utf8')
csvwriter = csv.writer(ubikefile)

# Write out the three fields sno (station ID), sna (station name in Chinese), and tot (total parking spaces)
for row in root:
    ubike = []
    sno = row.find('sno').text
    ubike.append(sno)
    sna = row.find('sna').text
    ubike.append(sna)
    tot = row.find('tot').text
    ubike.append(tot)
    csvwriter.writerow(ubike)
ubikefile.close()
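
As a side note, the same extraction can be sketched with a with-block and newline="", which the csv module documentation recommends so that no blank lines appear between rows on Windows. The sample XML layout below (a root whose children each hold sno, sna, and tot) is an assumption inferred from the loop above, and extract_rows and write_rows are hypothetical helper names, not part of the exam template.

```python
import csv
import xml.etree.ElementTree as ET


def extract_rows(root, fields=("sno", "sna", "tot")):
    """Return one list per child of root, holding the text of each requested field."""
    return [[row.findtext(name, default="") for name in fields] for row in root]


def write_rows(rows, path):
    """Write the rows to a UTF-8 CSV file; newline="" leaves line endings to csv."""
    with open(path, "w", encoding="utf8", newline="") as f:
        csv.writer(f).writerows(rows)


# Hypothetical two-station sample standing in for read.xml.
sample = """<data>
  <row><sno>1001</sno><sna>Station A</sna><tot>30</tot></row>
  <row><sno>1002</sno><sna>Station B</sna><tot>42</tot></row>
</data>"""

rows = extract_rows(ET.fromstring(sample))
print(rows)
```

For the real file, `write_rows(extract_rows(ET.parse("read.xml").getroot()), "write.csv")` would be the equivalent call, assuming read.xml follows the same layout.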

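Python 104 JSON File Output

1. Problem description:
Open PYD01.py and complete it according to the requirements below so that the output matches the specification. When you are done, save the file as PYA01.py before submitting it for grading.

2. Design description:
Write a program that builds the following data and writes it to write.json: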

{
'people' :
[{  
    'id': '1',
    'name': 'Peter',
    'country': 'Taiwan'
},
{  
    'id': '2',
    'name': 'Jack',
    'country': 'USA'
},
{  
    'id': '3',
    'name': 'Cindy',
    'country': 'Japan'
}]
}

3. Input and output:
Input description
None

Output description
Write the data to write.json

# Import the json module
import ___


# Build the data
# 'id': '1'
# 'name': 'Peter'
# 'country': 'Taiwan'
#
# 'id': '2'
# 'name': 'Jack'
# 'country': 'USA'
#
# 'id': '3'
# 'name': 'Cindy'
# 'country': 'Japan'

# Write the data to the JSON file
with ___('___', '___') as outfile:
    json.dump(___, ___)

# Import the json module
import json

# Build the data
data = {
    'people': [
        {
            'id': '1',
            'name': 'Peter',
            'country': 'Taiwan'
        },
        {
            'id': '2',
            'name': 'Jack',
            'country': 'USA'
        },
        {
            'id': '3',
            'name': 'Cindy',
            'country': 'Japan'
        }
    ]
}

# Write the data to the JSON file
with open('write.json', 'w') as outfile:
    json.dump(data, outfile)
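
The problem statement shows the data with Python-style single quotes, while the file written by json.dump uses double quotes, as JSON requires. The sketch below illustrates this with json.dumps, which returns the same text as a string, and checks that the text round-trips back to the original dictionary. The indent=2 argument is only for readability here and is not something the exam asks for.

```python
import json

data = {
    "people": [
        {"id": "1", "name": "Peter", "country": "Taiwan"},
        {"id": "2", "name": "Jack", "country": "USA"},
        {"id": "3", "name": "Cindy", "country": "Japan"},
    ]
}

# json.dumps builds the JSON text in memory; json.dump writes that text to a file object.
text = json.dumps(data, indent=2)
print(text)

# Parsing the text again yields an equal dictionary.
restored = json.loads(text)
print(restored == data)
```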

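Python 202 USD Closing Exchange Rate

1. Problem description:
Open PYD02.py and complete it according to the requirements below so that the output matches the specification. When you are done, save the file as PYA02.py before submitting it for grading.

Files produced by the program must be written to the same folder as the program.

2. Design description:
Write a program that scrapes read.html to obtain the "interbank closing exchange rate of the New Taiwan dollar against the US dollar" data, and saves the names and values of the two columns 日期 (date) and NTD/USD to write.csv (UTF-8 encoded).

3. Input and output:
Input description
Scrape the web page

Output description
Write the names and values of the two columns 日期 (date) and NTD/USD to write.csv

Module installation: python -m pip install beautifulsoup4 lxml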

# Import the csv module
import csv
# Import the urlopen function from the urllib.request module
from ___ import ___
# Import the BeautifulSoup class from the bs4 module
from ___ import ___


# Write the data to a CSV file, encoded as utf8
file_name = "___"
f = open(file_name, "w", encoding='___')
# Create the writer with the csv module's writer function
w = ___.___(f)

# Target web page to scrape
htmlname = "___"
# Read the HTML file with urlopen
html = urlopen(___)
# Use lxml as the BeautifulSoup parser
bsObj = BeautifulSoup(html, "___")

count = 0
# Save the names and values of the 日期 (date) and NTD/USD columns to the CSV file
# Location of the data
for single_tr in bsObj.find("___", {"class": "___"}).findAll("___"):
    if count == 0:
        # Cells to extract
        cell = single_tr.findAll("___")
    else:
        # Cells to extract
        cell = single_tr.findAll("___")
    F0 = cell[0].text
    F1 = cell[1].text
    data = [[F0, F1]]
    w.writerows(data)
    count = count + 1
f.close()

# Import the csv module
import csv
# Import the urlopen function from the urllib.request module
from urllib.request import urlopen
# Import the BeautifulSoup class from the bs4 module
from bs4 import BeautifulSoup


# Write the data to a CSV file, encoded as utf8
file_name = "write.csv"
f = open(file_name, "w", encoding='utf8')
# Create the writer with the csv module's writer function
w = csv.writer(f)

# Target web page to scrape
htmlname = "file:./read.html"
# Read the HTML file with urlopen
html = urlopen(htmlname)
# Use lxml as the BeautifulSoup parser
bsObj = BeautifulSoup(html, "lxml")

count = 0
# Save the names and values of the 日期 (date) and NTD/USD columns to the CSV file
# Location of the data
for single_tr in bsObj.find("table", {"class": "DataTable2"}).findAll("tr"):
    if count == 0:
        # The first row is the header row, whose cells are <th>
        cell = single_tr.findAll("th")
    else:
        # Every other row is a data row, whose cells are <td>
        cell = single_tr.findAll("td")
    F0 = cell[0].text
    F1 = cell[1].text
    data = [[F0, F1]]
    w.writerows(data)
    count = count + 1
f.close()
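
To see the table walk in isolation, here is a minimal sketch that parses an inline HTML string instead of read.html. The table contents are made up for illustration; only the class name DataTable2 comes from the solution above. It uses find_all, the current name for findAll, and asks for th and td cells together, so the header row needs no special case.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for read.html; the dates and rates are invented.
html = """
<table class="DataTable2">
  <tr><th>Date</th><th>NTD/USD</th><th>Volume</th></tr>
  <tr><td>2024/01/02</td><td>30.850</td><td>1.2</td></tr>
  <tr><td>2024/01/03</td><td>31.020</td><td>0.9</td></tr>
</table>
"""

# The built-in parser is used here so that this sketch does not depend on lxml.
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "DataTable2"})

rows = []
for tr in table.find_all("tr"):
    # A list of tag names matches either kind of cell.
    cells = tr.find_all(["th", "td"])
    rows.append([cells[0].get_text(strip=True), cells[1].get_text(strip=True)])

print(rows)
```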

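Python 204 New Taipei City College and University List

1. Problem description:
Open PYD02.py and complete it according to the requirements below so that the output matches the specification. When you are done, save the file as PYA02.py before submitting it for grading.

2. Design description:
(1) Write a program that fetches the list of New Taipei City colleges and universities from the following API link: https://www.codejudger.com/target/5204.json
(2) The program must print the following information for every college and university in New Taipei City: name, address, phone number, website, and data update time.

3. Input and output:
Input description
Fetch the API data

Output description
The information for every college and university in New Taipei City: name, address, phone number, website, and data update time

Module installation: python -m pip install requests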

# Import the requests module
import ___
# Import the json module
import ___

# Open data link
url = '____'
# Send an HTTP GET request with the requests module
res = ___.___(url)

# Parse the JSON response text into Python objects
data = json.loads(res.text)

# Print the list of New Taipei City colleges and universities
print('新北市大專院校名單：\n')
for record in data:
    if record['type'] == '大專院校':
        print('名稱：%s' % record['___'])
        print('地址：%s' % record['___'])
        print('聯絡電話：%s' % record['___'])
        print('網站：%s' % record['___'])
        print('資料更新時間：%s' % record['___'])
        print()

# Import the requests module
import requests
# Import the json module
import json

# Open data link
url = 'https://www.codejudger.com/target/5204.json'
# Send an HTTP GET request with the requests module
res = requests.get(url)

# Parse the JSON response text into Python objects
data = json.loads(res.text)

# Print the list of New Taipei City colleges and universities
print('新北市大專院校名單：\n')
for record in data:
    if record['type'] == '大專院校':
        print('名稱：%s' % record['name'])
        print('地址：%s' % record['address'])
        print('聯絡電話：%s' % record['tel'])
        print('網站：%s' % record['website'])
        print('資料更新時間：%s' % record['update_date'])
        print()
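
The filtering step can be tried without network access. The sketch below parses a made-up response body with json.loads, the same call the solution applies to res.text, and keeps only the records whose type matches. The sample records and the helper name select_by_type are hypothetical; only the field names are taken from the solution above.

```python
import json

# Hypothetical response body; the real API may return different records.
sample_text = json.dumps([
    {"type": "大專院校", "name": "Example University", "address": "1 Example Rd.",
     "tel": "02-0000-0000", "website": "https://example.edu", "update_date": "2024-01-01"},
    {"type": "高中", "name": "Example High School", "address": "2 Example Rd.",
     "tel": "02-1111-1111", "website": "https://example.org", "update_date": "2024-01-01"},
])


def select_by_type(records, wanted="大專院校"):
    """Keep only the records whose 'type' field equals wanted."""
    return [r for r in records if r.get("type") == wanted]


records = json.loads(sample_text)
colleges = select_by_type(records)
for r in colleges:
    print("%s | %s | %s" % (r["name"], r["address"], r["tel"]))
```

With requests, `res.json()` would return the same parsed structure as `json.loads(res.text)`.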

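Python 302 Matrix

1. Problem description:
Open PYD03.py and complete it according to the requirements below so that the output matches the specification. When you are done, save the file as PYA03.py before submitting it for grading.

2. Design description:
(1) Use numpy to randomly generate 15 positive integers between 5 and 15 and print them
(2) Reshape the result of (1) into a 3×5 matrix X and print it
(3) Print the maximum value of matrix X
(4) Print the minimum value of matrix X
(5) Print the sum of matrix X
(6) Print the elements at the four corners of matrix X

3. Input and output:
Input description
None

Output description
(1) The 15 random positive integers between 5 and 15 generated with numpy
(2) The 3×5 matrix X obtained by reshaping (1)
(3) The maximum value of matrix X
(4) The minimum value of matrix X
(5) The sum of matrix X
(6) The elements at the four corners of matrix X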

# --BEGIN-- Used for grading; do not modify
set_seed = 123
# --END-- Used for grading; do not modify

import numpy as np

x = np.random.RandomState(set_seed).randint(low=5, high=16, size=15)
print('隨機正整數：', ___)

x = x.reshape(___, ___)
print('X矩陣內容：')
print(___)
print('最大：', ___)
print('最小：', ___)
print('總和：', ___)
print('四個角落元素：')
print(x[np.ix_([___, ___], [___, ___])])

# --BEGIN-- Used for grading; do not modify
set_seed = 123
# --END-- Used for grading; do not modify

import numpy as np

x = np.random.RandomState(set_seed).randint(low=5, high=16, size=15)
print('隨機正整數：', x)

x = x.reshape(3, 5)
print('X矩陣內容：')
print(x)
print('最大：', np.max(x))
print('最小：', np.min(x))
print('總和：', np.sum(x))
print('四個角落元素：')
print(x[np.ix_([0, -1], [0, -1])])
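
Two details of the solution are easy to miss: randint excludes its high bound, which is why high=16 is needed to allow the value 15, and np.ix_ builds an index that selects every combination of the listed rows and columns. The sketch below checks both on a deterministic matrix. Using np.arange in place of the random draw is an assumption made purely so the corner values are predictable.

```python
import numpy as np

# Deterministic stand-in for the random matrix: values 0..14 laid out as 3 rows x 5 columns.
x = np.arange(15).reshape(3, 5)

# Rows {first, last} crossed with columns {first, last} gives the four corners.
corners = x[np.ix_([0, -1], [0, -1])]
manual = np.array([[x[0, 0], x[0, -1]],
                   [x[-1, 0], x[-1, -1]]])
print(corners)  # [[ 0  4]
                #  [10 14]]

# randint draws from low (inclusive) up to high (exclusive).
draws = np.random.RandomState(123).randint(low=5, high=16, size=1000)
print(draws.min() >= 5, draws.max() <= 15)
```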

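Python 304 Data Processing and Analysis

1. Problem description:
Open PYD03.py and complete it according to the requirements below so that the output matches the specification. When you are done, save the file as PYA03.py before submitting it for grading.

2. Design description:
Read the data in read.csv, convert it to a numpy array, and print the following information:

Dataset type
Mean
Median
Standard deviation
Variance
Range

Note: values must be rounded to two decimal places.

3. Input and output:
Input description
Read the contents of read.csv

Output description
Dataset type
Mean
Median
Standard deviation
Variance
Range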

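# Import the numpy module

# Import the pandas module under an alias

# Read the read.csv file

# Determine the data type
print('資料型態：%s' % ___(__))
# Compute the mean
print('平均值：%.2f' % __.___(__))
# Compute the median
print('中位數：%.2f' % __.___(__))
# Compute the standard deviation
print('標準差：%.2f' % __.___(__))
# Compute the variance
print('變異數：%.2f' % __.___(__))
# Compute the range
print('極差值：%.2f' % __.___(__))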

# Import the numpy module
import numpy as np
# Import the pandas module under the alias pd
import pandas as pd

# Read the read.csv file
n = np.array(pd.read_csv('read.csv'))

# Determine the data type
print('資料型態：%s' % type(n))
# Compute the mean
print('平均值：%.2f' % np.mean(n))
# Compute the median
print('中位數：%.2f' % np.median(n))
# Compute the standard deviation
print('標準差：%.2f' % np.std(n))
# Compute the variance
print('變異數：%.2f' % np.var(n))
# Compute the range
print('極差值：%.2f' % (np.max(n) - np.min(n)))
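
A small worked example may help when checking these statistics by hand. By default np.std and np.var divide by the number of values (the population form); passing ddof=1 gives the sample form instead. np.ptp returns the same max-minus-min range computed above. The eight values below are an arbitrary illustration, not taken from read.csv.

```python
import numpy as np

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

print("%.2f" % np.mean(values))         # 5.00
print("%.2f" % np.median(values))       # 4.50
print("%.2f" % np.std(values))          # 2.00  (population form, ddof=0)
print("%.2f" % np.var(values))          # 4.00
print("%.2f" % np.std(values, ddof=1))  # 2.14  (sample form)
print("%.2f" % np.ptp(values))          # 7.00  (same as max - min)
```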

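Python 402 Market Trading Prices: Line Chart

1. Problem description:
Open PYD04.py and, following the requirements below, combine and rewrite its contents so that the output matches the specification. When you are done, save the file as PYA04.py before submitting it for grading.

2. Design description:
Read read.csv, the banana trading prices at the fruit and vegetable market, which has two main columns: trading date and average trading price. Then use matplotlib to produce the line chart chart.png with the following settings:

Legend label: banana
Chart title: Market Average Price
Trading date on the X axis, X-axis label: date
Average trading price on the Y axis, Y-axis label: NT$
Y-axis lower limit 15, upper limit 25

3. Input and output:
Input description
Read the contents of read.csv

Output description
Write the image file chart.png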

# --BEGIN-- Used for grading; do not modify
import matplotlib as mpl
mpl.use('Agg')
# --END-- Used for grading; do not modify

# Import matplotlib.pyplot under the alias plt
import ___ as ___
# Import the csv module
import ___

x = []
y = []

# Read read.csv
with open('___', 'r', encoding='utf8') as csvfile:
    plots = csv.reader(csvfile, delimiter=',')
    for row in plots:
        x.append(row[0])
        y.append(float(row[1]))

x_ticks = range(1, len(x) + 1)
plt.___(x_ticks, y, label=___)
plt.xticks(x_ticks, x)
plt.xlabel(___)
plt.ylabel(___)
plt.ylim(___)
# Add the chart title with title()
plt.___('Market Average Price')
plt.legend()
# Save the chart with savefig()
plt.___('chart.png')
plt.close()

# --BEGIN-- Used for grading; do not modify
import matplotlib as mpl
mpl.use('Agg')
# --END-- Used for grading; do not modify

# Import matplotlib.pyplot under the alias plt
import matplotlib.pyplot as plt
# Import the csv module
import csv

x = []
y = []

# Read read.csv
with open('read.csv', 'r', encoding='utf8') as csvfile:
    plots = csv.reader(csvfile, delimiter=',')
    for row in plots:
        x.append(row[0])
        y.append(float(row[1]))

x_ticks = range(1, len(x) + 1)
plt.plot(x_ticks, y, label='banana')
plt.xticks(x_ticks, x)
plt.xlabel('date')
plt.ylabel('NT$')
plt.ylim(15, 25)

# Add the chart title with title()
plt.title('Market Average Price')
plt.legend()

# Save the chart with savefig()
plt.savefig('chart.png')
plt.close()
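
One way to confirm that the chart settings took effect, without opening the image, is to read them back from the axes. The sketch below builds the same chart through matplotlib's object-oriented interface and queries the limits, title, and legend. The three dates and prices are invented stand-ins for read.csv, and no file is written.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, as in the grading header above
import matplotlib.pyplot as plt

# Hypothetical sample data standing in for the rows of read.csv.
dates = ["01/01", "01/02", "01/03"]
prices = [18.2, 20.5, 19.1]

fig, ax = plt.subplots()
positions = list(range(1, len(dates) + 1))
ax.plot(positions, prices, label="banana")
ax.set_xticks(positions)
ax.set_xticklabels(dates)
ax.set_xlabel("date")
ax.set_ylabel("NT$")
ax.set_ylim(15, 25)
ax.set_title("Market Average Price")
ax.legend()

# Read the settings back from the axes object.
ylim = ax.get_ylim()
title = ax.get_title()
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(ylim, title, legend_labels)

plt.close(fig)
```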

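Python 404 Score Statistics: Bar Chart

1. Problem description:
Open PYD04.py and complete it according to the requirements below so that the output matches the specification. When you are done, save the file as PYA04.py before submitting it for grading.

2. Design description:
Read the data in read.csv, then use matplotlib to produce the bar chart chart.png with the following settings:

Chart title: Score ranges count
X-axis label: Range
Y-axis label: Quantity
Title font size: 20
X-axis and Y-axis font size: 14
Bar width: 2
X-axis ticks: 0~19, 20~39, 40~59, 60~79, 80~100
Y-axis ticks: 0 to 25, in steps of 5

3. Input and output:
Input description
Read the contents of read.csv

Output description
Write the image file chart.png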

# --BEGIN-- Used for grading; do not modify
import matplotlib as mpl
mpl.use('Agg')
# --END-- Used for grading; do not modify

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

# Read the student score data
# Read read.csv
df = ___(___)
scores = df["___"].values

# range_count[0]: range0~19
# range_count[1]: range20~39
# range_count[2]: range40~59
# range_count[3]: range60~79
# range_count[4]: range80~100
# Initialize the count list with zeros
range_count = [0] * 5

# Counting
for score in scores:
    if score < 20:
        range_count[0] += 1
    elif score < 40:
        range_count[1] += 1
    elif score < 60:
        range_count[2] += 1
    elif score < 80:
        range_count[3] += 1
    else:
        range_count[4] += 1

# Y-axis ticks
index = np.arange(___, ___, ___)
# X-axis tick labels
labels = [___, ___, '40~59', ___, '80~100']
# Draw the bar chart
plt.bar(___, range_count, ___)
# Set the X-axis label
plt.xlabel('___', fontsize=___)
# Set the Y-axis label
plt.ylabel('___', fontsize=___)
# Set the x-axis ticks
plt.xticks(index, labels)
# Set the y-axis ticks
plt.yticks(index)
# Set the chart title
plt.title('___', fontsize=___)
# Write the image file
plt.___('___')
plt.close()

# --BEGIN-- Used for grading; do not modify
import matplotlib as mpl
mpl.use('Agg')
# --END-- Used for grading; do not modify

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

# Read the student score data
# Read read.csv
df = pd.read_csv('read.csv')
scores = df["scores"].values

# range_count[0]: range0~19
# range_count[1]: range20~39
# range_count[2]: range40~59
# range_count[3]: range60~79
# range_count[4]: range80~100
# Initialize the count list with zeros
range_count = [0] * 5  # result: [0, 0, 0, 0, 0]

# Counting
for score in scores:
    if score < 20:
        range_count[0] += 1
    elif score < 40:
        range_count[1] += 1
    elif score < 60:
        range_count[2] += 1
    elif score < 80:
        range_count[3] += 1
    else:
        range_count[4] += 1

# Y-axis tick positions: 0, 5, 10, 15, 20
index = np.arange(0, 25, 5)
# X-axis tick labels
labels = ['0~19', '20~39', '40~59', '60~79', '80~100']
# Draw the bar chart
plt.bar(labels, range_count, width=0.4)
# Set the X-axis label
plt.xlabel('Range', fontsize=14)
# Set the Y-axis label
plt.ylabel('Quantity', fontsize=14)
# Set the x-axis ticks
# plt.xticks(index, labels)  # redundant, because plt.bar(labels, ...) already sets them
# Set the y-axis ticks
plt.yticks(index)
# Set the chart title
plt.title('Score ranges count', fontsize=20)
# Write the image file
plt.savefig('chart.png')
plt.close()
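
The counting loop can be cross-checked with np.histogram, as sketched below on a handful of made-up scores. The bin edges are an assumption chosen to mirror the if/elif chain for integer scores from 0 to 100; the last edge is 101 so that a score of 100 falls in the final bin. On the bar width: the task asks for width 2, which fits bars placed at numeric positions 0, 5, 10, 15, 20 (a guess at what the template's index is for), and width=0.4 on the categorical axis used above is the same 40% of the bar spacing.

```python
import numpy as np

# Hypothetical scores covering the edges of each range.
scores = np.array([5, 19, 20, 59, 60, 80, 100])

# Bins: [0, 20), [20, 40), [40, 60), [60, 80), [80, 101]
edges = [0, 20, 40, 60, 80, 101]
counts, _ = np.histogram(scores, bins=edges)

# The same tally written as a loop; min(..., 4) puts a score of 100 into the last range.
loop_counts = [0] * 5
for s in scores:
    loop_counts[min(int(s) // 20, 4)] += 1

print(counts.tolist())  # [2, 1, 1, 1, 2]
print(loop_counts)      # [2, 1, 1, 1, 2]
```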
