Python crawler: fetching Douban's top-250 fantasy film data

2022-02-03 00:55:02, CSDN Q&A

# Error: ValueError: Length mismatch: Expected axis has 0 elements, new values have 12 elements

# Code:

# Get the Douban film ranking: fantasy films

import requests
import pandas as pd
import time

# Scrape from the web page

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
}
quest_data = pd.DataFrame()
n = 0
index = []
N = 20  # Number of movies per request
for interval_id in range(10, 0, -1):  # interval_id bands 100:90, 90:80, ..., 10:0
    for start in range(9999):  # Page counter; the request offset is start * N
        url = 'https://movie.douban.com/j/chart/top_list?type=23&interval_id={}%3A{}&action=&start={}&limit={}'.format(interval_id * 10, (interval_id - 1) * 10, start * N, N)
        html1 = requests.post(url, headers=headers)
        try:
            # Turn JSON true/false into Python True/False, then eval() the response text
            film_data = eval(html1.text.replace('true', 'True').replace('false', 'False'))
        except:
            continue  # Unparseable response: skip this page silently (the sleep below is skipped too)
        for film in film_data:
            # One single-column frame per film, transposed into a row and appended
            df = pd.DataFrame(
                data=[film['rank'], film['title'], film['cover_url'], film['actors'], film['is_playable'],
                      film['id'], film['types'],
                      film['regions'], film['release_date'], film['score'], film['vote_count'], film['url']])
            quest_data = quest_data.append(df.T)
            n += 1
            index.append(n)

        time.sleep(6)
        if len(film_data) < N:  # A short page means this band is exhausted
            break

quest_data.columns = ['Ranking', 'Film name', 'Poster link', 'Actors', 'Playable', 'Douban ID', 'Film type', 'Country of production', 'Release date', 'Score', 'Number of ratings', 'Link']
quest_data.index = index

# Save the data

quest_data.to_excel(r'E:\Douban film category ranking - Fantasy Film.xls')  # e.g. r'D:\Douban film category ranking - xx film.xls'
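The message means that `quest_data` still had no columns when the 12 names were assigned on the `quest_data.columns = [...]` line, so not a single film was ever appended. With this loop that happens when every response fails to parse (the bare `except: continue` discards the exception, so nothing is reported) or when the first page of every band comes back empty. A minimal sketch that reproduces the same error with pandas alone, no network needed:

```python
import pandas as pd

# Twelve column labels, one per field the scraper extracts from each film
columns = ['Ranking', 'Film name', 'Poster link', 'Actors', 'Playable', 'Douban ID',
           'Film type', 'Country of production', 'Release date', 'Score',
           'Number of ratings', 'Link']

# Stand-in for quest_data when no page was ever parsed: no rows, no columns
quest_data = pd.DataFrame()

message = None
try:
    quest_data.columns = columns  # 12 labels for a column axis of length 0
except ValueError as exc:
    message = str(exc)

print(message)
```

Printing `html1.status_code` and the start of `html1.text` inside the `except` branch, instead of continuing silently, shows which of the two cases applies.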

Reference answer 1:

Please fix the code formatting, and pay attention to the indentation.
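With the indentation restored, the loop also becomes easier to debug if a few other things change: fetch each page with `requests.get` and decode it with `response.json()` (which handles JSON `true`, `false` and `null` without string replacement and `eval`), let HTTP errors surface instead of hiding them in a bare `except`, and collect rows in a list that becomes a DataFrame once, since `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0. The sketch below assumes the endpoint still answers a GET with the same query parameters and returns a JSON list of film objects with the field names the question uses; `fetch_page` and `scrape` are hypothetical helpers, and `scrape` takes the fetch function as an argument so the paging logic can be exercised offline.

```python
import time

import pandas as pd

URL = 'https://movie.douban.com/j/chart/top_list'
# Assumption: a browser-like User-Agent, as in the question, is enough for the request
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
# The twelve fields the question reads from each film object
FIELDS = ['rank', 'title', 'cover_url', 'actors', 'is_playable', 'id', 'types',
          'regions', 'release_date', 'score', 'vote_count', 'url']


def fetch_page(interval_id, start, limit):
    """Hypothetical helper: download one page of the chart as a list of dicts."""
    import requests  # imported here so scrape() can be tried without requests installed

    params = {
        'type': 23,
        'interval_id': f'{interval_id * 10}:{(interval_id - 1) * 10}',  # e.g. '100:90'
        'action': '',
        'start': start,
        'limit': limit,
    }
    resp = requests.get(URL, params=params, headers=HEADERS, timeout=10)
    resp.raise_for_status()  # report 403 and similar instead of skipping silently
    return resp.json()       # JSON true/false/null become True/False/None


def scrape(fetch, limit=20, delay=6.0):
    """Hypothetical helper: walk every interval band page by page, return one DataFrame."""
    rows = []
    for interval_id in range(10, 0, -1):  # bands 100:90, 90:80, ..., 10:0
        start = 0
        while True:
            films = fetch(interval_id, start, limit)
            time.sleep(delay)  # pause after every request to stay polite
            rows.extend([film.get(key) for key in FIELDS] for film in films)
            if len(films) < limit:  # a short page is the last one in this band
                break
            start += limit
    if not rows:
        # Fail with a clear message instead of pandas' "Length mismatch" later on
        raise RuntimeError('No films were downloaded; inspect the raw responses.')
    # Number the rows from 1, like the n counter in the question
    return pd.DataFrame(rows, columns=FIELDS, index=range(1, len(rows) + 1))
```

Against the live site this would be `df = scrape(fetch_page)` followed by, for example, `df.to_excel('fantasy.xlsx')`; `.xlsx` output needs openpyxl or XlsxWriter, whereas the xlwt engine used for `.xls` was removed in pandas 2.0.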

Copyright notice
Author: CSDN Q&A. Please include the original link when reprinting. Thank you.
https://en.primo.wiki/2022/02/202202030055005284.html
