In an earlier video tutorial, we covered how to scrape Instagram image and video posts. The JSON that Instagram returns for album (carousel) posts is structured a little differently. Let’s take a look at an Instagram album post such as https://www.instagram.com/p/CBIvNwGB2IX/. By appending ?__a=1 to the URL (so it becomes https://www.instagram.com/p/CBIvNwGB2IX/?__a=1), we can get the JSON. Within the JSON, under graphql and then shortcode_media, there is a property called edge_sidecar_to_children, which holds a list called edges. Each entry in edges has a node describing one image or video, so by looping over the list we can get the link to each item in the post. Check out the video tutorial and full source code below:
import os
import urllib.request

import requests

# send a browser-like User-Agent with the request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36"
}
r = requests.get("https://www.instagram.com/p/CBIvNwGB2IX/?__a=1", headers=headers)
print(r.status_code)
if r.status_code == 200:
    # the post's metadata lives under graphql -> shortcode_media
    data = r.json()['graphql']['shortcode_media']
    print(data)
else:
    # stop here; the rest of the script needs the JSON
    raise SystemExit("Error downloading JSON")
shortcode = data['shortcode']
print(shortcode)

# create the download directory (named after the shortcode) and switch into it
directory = shortcode
os.makedirs(directory, exist_ok=True)
os.chdir(directory)

# loop and download images/videos in the album
posts = data['edge_sidecar_to_children']['edges']
for count, post in enumerate(posts):
    node = post['node']
    if node['__typename'] == 'GraphImage':
        urllib.request.urlretrieve(node['display_url'], f'{count}.jpg')
    else:
        # for a video, display_url is only the thumbnail; the clip itself is in video_url
        urllib.request.urlretrieve(node['video_url'], f'{count}.mp4')
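
To see how the edge_sidecar_to_children structure is walked without hitting Instagram at all, here is a minimal offline sketch. The helper name album_media_urls and the sample dictionary are made up for illustration, the URLs are placeholders, and the sketch assumes that video nodes carry their clip under a video_url key while display_url holds the still image.

```python
def album_media_urls(shortcode_media):
    """Return (url, extension) pairs for every item in an album post."""
    # a post without a sidecar simply yields an empty list here
    edges = shortcode_media.get('edge_sidecar_to_children', {}).get('edges', [])
    media = []
    for edge in edges:
        node = edge['node']
        if node['__typename'] == 'GraphVideo':
            # assumption: video nodes expose the clip under 'video_url'
            media.append((node['video_url'], 'mp4'))
        else:
            media.append((node['display_url'], 'jpg'))
    return media


# made-up sample mirroring the structure described above (placeholder URLs)
sample = {
    'shortcode': 'EXAMPLE',
    'edge_sidecar_to_children': {
        'edges': [
            {'node': {'__typename': 'GraphImage',
                      'display_url': 'https://example.com/0.jpg'}},
            {'node': {'__typename': 'GraphVideo',
                      'display_url': 'https://example.com/1_thumb.jpg',
                      'video_url': 'https://example.com/1.mp4'}},
        ]
    },
}

for index, (url, ext) in enumerate(album_media_urls(sample)):
    print(f'{index}.{ext} <- {url}')
```

Keeping the URL extraction in a function that takes a plain dictionary means it can be checked against saved JSON before any download is attempted.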