Scraping Genshin Impact wiki data from Miyoushe (米游社)

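I recently got hooked on a Serenitea Pot (尘歌壶) furnishing set, but tallying and gathering the required materials up front is painful: one miscalculation means going back and redoing things, which wastes a lot of time. So I wanted a table covering the materials for every furnishing, and after looking around, scraping the data from the official Miyoushe site turned out to be the simplest route. The rough idea: after logging in, find the request URL for the data in the browser's developer console, fetch it with Python, parse the HTML with BeautifulSoup, parse the JSON with the json module, and use pandas to organize and export the result. Below is some of the scraping code: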

Imports

python

import pandas as pd
import requests
import json
import re

Requesting the data

python

url = 'https://api-static.mihoyo.com/common/blackboard/ys_obc/v1/home/content/list?app_sn=ys_obc&channel_id=189'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}
response = requests.get(url, headers=headers)
# Decode the JSON body into a Python dict.
rep = json.loads(response.text)
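The request above can be wrapped in a small helper that fails loudly on HTTP errors instead of handing a broken body to the JSON parser. This is only a sketch: `fetch_json`, its `timeout` default, and the optional `session` parameter are names made up for illustration and are not part of the original script.

```python
def fetch_json(url, headers=None, timeout=10, session=None):
    """Fetch `url` and return the decoded JSON body.

    `session` is any object with a requests-style get() method; when it is
    omitted, the requests module itself is used.
    """
    if session is None:
        import requests  # third-party package, same one the script uses
        session = requests
    response = session.get(url, headers=headers, timeout=timeout)
    # Raise an exception on 4xx/5xx responses rather than parsing an error page.
    response.raise_for_status()
    return response.json()
```

With this in place, the last two lines of the request step could become `rep = fetch_json(url, headers=headers)`.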

Organizing the data

The data we need can be parsed out of this JSON:

python

datadb = rep['data']['list'][0]['children']
# Index 13 is the furnishing category used in the rest of this post.
fur = datadb[13]
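The hard-coded index 13 will silently point at a different category if the site reorders its list. One possible alternative is to look the category up by name. The sketch below assumes each entry in `children` carries a `name` field; that field, the `find_channel` helper, and the sample data are assumptions for illustration, so check the real response in the browser console before relying on them.

```python
def find_channel(children, name):
    """Return the first entry whose 'name' equals `name`, or None."""
    for child in children:
        if child.get('name') == name:
            return child
    return None


# Hypothetical miniature of rep['data']['list'][0]['children'].
sample_children = [
    {'name': '角色', 'list': []},
    {'name': '摆设', 'list': [{'title': 'example'}]},
]

channel = find_channel(sample_children, '摆设')
print(channel['list'])
```

Returning `None` for a missing name keeps the failure visible at the call site instead of raising deep inside the loop.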

Furnishing entries are not uniform: some fields have no data, so each lookup is wrapped in try/except, and any failure is reported.

python

res = []
# Filter keys, in the order of the output columns.
keys = ['区域', '类型', '品质', '图纸获取方式', '摆设获取方式']
for n, i in enumerate(fur['list']):
    title = i['title']
    summary = i['summary']
    ext = json.loads(i['ext'])
    row = [title, summary]
    for key in keys:
        try:
            # Filter entries look like "<key>/<value>"; capture the value.
            value = re.findall(r'"' + key + r'/(\S+?)"', ext['c_130']['filter']['text'])[0]
        except (KeyError, IndexError, TypeError):
            # Missing data: mark the field and report the entry and key.
            value = 'Error'
            print(n, title, summary, key)
        row.append(value)
    res.append(row)
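The per-field extraction can be tried on its own, without hitting the network. The sketch below pulls one value out of a filter string with the same `"<key>/<value>"` pattern the loop relies on. `extract_field` is a hypothetical helper, and the sample string is hand-written, not real API output.

```python
import re


def extract_field(filter_text, key, default='Error'):
    """Return the value of the `"<key>/<value>"` entry, or `default` if absent."""
    match = re.search(r'"' + re.escape(key) + r'/(\S+?)"', filter_text)
    return match.group(1) if match else default


# Hand-written sample in the assumed shape of ext['c_130']['filter']['text'].
sample_text = '["区域/室内","类型/大型家具","品质/四星"]'

print(extract_field(sample_text, '区域'))
print(extract_field(sample_text, '图纸获取方式'))
```

Using a capture group avoids the hand-counted slice offsets (`[4:-1]`, `[8:-1]`), which have to change whenever a key has a different length.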

Saving the data

python

# Column names follow the order of the fields collected in res.
result = pd.DataFrame(res, columns=['名称','注释','区域','分类','品质','图纸获取方式','摆设获取方式'])
result.to_csv('家具表.csv')