Antons blogg om elektronik och Linux

3 augusti, 2010

4chan download script for Linux

Filed under: Uncategorized, Terminal — Anton @ 14:10

I usually write in Swedish, but since this post is aimed at an international audience I will take the opportunity to write it in English.

The 4chan download script

This is an update of the 4chan download script for Linux written by Daniel Triendl.

The modified script downloads every image file in a 4chan thread, preserving the original file names (rather than the incrementing numbers that 4chan assigns). Perfect for downloading entire sets of pictures or other original content. Tested on a few different boards, but it should theoretically work on all of them.
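By default the download directory is named after the thread number, which the script extracts from the trailing digits of the thread URL. A sketch of that extraction, using a made-up example URL in the post-redesign format:

```shell
# Derive the download directory from a thread URL, as the script does.
# The URL below is a fabricated example, not a real thread.
url='http://boards.4chan.org/wg/res/123456'
loc=$(echo "$url" | grep -E -o '([0-9]*)$' | sed 's/\.html//g')
echo "$loc" # prints 123456
```

The `sed` step strips a trailing `.html`, which old-style thread URLs carried.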

Last update: August 2012 (after 4chan’s HTML5 redesign and switch to HTTPS by default). Known bugs and limitations:

  • If several files in the thread share the same original filename, only the first one is downloaded.
  • If an image file from another thread is linked to in a post, it is downloaded as well, and the link-filename relationship gets mixed up.
  • Network errors are treated like 404 errors.
  • Threads that have a slash (/) in the subject break the link-filename relationship, because the subject is treated as part of a filename. No known workarounds at this time.
#!/bin/bash
# A bash script for downloading all images in a 4chan thread to their original
# filenames. Updates every 60 seconds until canceled or the thread disappears.
# Copyright 2008, 2010, 2012 Daniel Triendl, Anton Eliasson
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

if [ "$1" = "" ]; then # no arguments
	echo "Usage: `basename $0` <4chan thread url> [optional: download directory]"
	exit 1
fi

if [ "$2" = "" ]; then # only one argument
	LOC=$(echo "$1" | grep -E -o '([0-9]*)$' | sed 's/\.html//g') # find out the thread number
else
	LOC=$2 # use download dir specified by user
fi
echo "4chan downloader"
echo "Downloading to \"$LOC\" until canceled or 404'd"

if [ ! -d "$LOC" ]; then
	mkdir -- "$LOC"
fi

cd -- "$LOC" # new directory named after the thread number

while [ "1" = "1" ]; do
	thread=`mktemp` # thread is the html thread
	links=`mktemp` # links will be a list of all image addresses
	names=`mktemp` # names will be a list of all original file names

	# get thread
	echo "Updating..."
	wget -q -k -O "$thread" "$1"
	if [ "$?" != "0" ]; then
		echo "Update failed, exiting"
		rm "$thread" "$links" "$names"
		exit 1
	fi

	# get file list, space separated (note: hostnames contain dots, so the
	# character class must include '.')
	grep -E -o 'http[s]?://[a-z0-9.]+/src/([0-9]*)\.(jpg|png|gif)' "$thread" | uniq | tr "\n" " " > "$links"

	# get original file name list, space separated (spaces in filenames changed to underlines)
	sed 's/ /_/g' "$thread" | grep -E -o '<span_title="[^"]+' | awk -F \" '{print $2}' | tr "\n" " " > "$names"

	COUNT=`cat "$names" | wc -w` # total number of files/names
	for ((i=1; i<=COUNT; i++)); do
		# now download all files, one by one
		name=$(cut -d ' ' -f $i "$names")
		link=$(cut -d ' ' -f $i "$links")
		wget -nv -nc -O "$name" "$link"
	done

	rm "$thread" "$links" "$names"

	echo "Waiting 60 seconds before next run"
	sleep 60
done
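The original filenames are recovered from the `title` attribute that 4chan’s post-redesign markup puts on the file-info span. A sketch of that extraction pipeline, run on a single fabricated line of thread HTML (the real markup may differ in detail); spaces are first mapped to underscores so filenames survive the later space-separated lists:

```shell
# Fabricated sample of the markup the script parses, not real thread HTML.
sample='<span title="My Holiday Photo.jpg">My_Holida(...).jpg</span>'
echo "$sample" | sed 's/ /_/g' | grep -E -o '<span_title="[^"]+' | awk -F \" '{print $2}'
# prints My_Holiday_Photo.jpg
```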

This should run on any Linux-based OS using the bash shell. Feel free to contact me if you find any bugs and/or improve the script.
