Anton's blog about electronics and Linux

August 3, 2010

4chan download script for Linux

Filed under: Uncategorized, Terminal — Anton @ 14:10

I usually write in Swedish, but since this post is aimed at an international audience, I will take the opportunity to write it in English.

The 4chan download script

This is an update of the 4chan download script for Linux written by Daniel Triendl.

The modified script downloads every image file in a 4chan thread, preserving the original file names (not the incrementing numbers given by 4chan). Perfect for downloading entire sets of pictures or other original content. Tested on a few different boards but should theoretically work on all.
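For example (the script file name and thread URL below are made up), a typical invocation looks like this:

    $ ./4chan_dl.sh http://boards.4chan.org/wg/res/1234567
    4chan downloader
    Downloading to "1234567" until canceled or 404'd

Passing a download directory as a second argument stores the images there instead of in a directory named after the thread number.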

Last update: August 2012 (after 4chan’s HTML5 redesign and switch to HTTPS by default). Known bugs and limitations:

  • If there are several files in the thread with the same original filename, only the first will be downloaded.
  • If an image file from another thread is linked to in a post, it will also be downloaded and the link-filename relationship will be messed up.
  • Network errors are treated like 404 errors (a possible workaround is sketched after the script).
  • Threads that have a slash (/) in the subject break the link-filename relationship, because the subject is treated as a filename. No known workarounds at this time.
# A bash script for downloading all images in a 4chan thread to their original
# filenames. Updates every 60 seconds until canceled or the thread disappears.
# Copyright 2008, 2010, 2012 Daniel Triendl, Anton Eliasson
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

if [ "$1" = "" ]; then # no arguments
	echo "Usage: `basename $0` <4chan thread url> [optional: download directory]"
	exit 1

if [ "$2" = "" ]; then # only one argument
	LOC=$(echo "$1" | egrep -o '([0-9]*)$' | sed 's/\.html//g' ) # find out the thread number
	LOC=$2 # use download dir specified by user
echo "4chan downloader"
echo "Downloading to \"$LOC\" until canceled or 404'd"

if [ ! -d "$LOC" ]; then
	mkdir -- "$LOC"
fi

cd -- "$LOC" # new directory named after the thread number

while [ "1" = "1" ]; do
	thread=`mktemp` # thread is the html thread
	links=`mktemp` # links will be a list of all image addresses
	names=`mktemp` # names will be a list of all original file names

	# get thread
	echo "Updating..."
	wget -q -k -O "$thread" "$1"
	if [ "$?" != "0" ]; then
		echo "Update failed, exiting"
		rm $thread $links $names
		exit 1

    # get file list, space separated
	grep -E -o 'http[s]?://[a-z0-9./]+/src/([0-9]*)\.(jpg|png|gif)' "$thread" | uniq | tr "\n" " " > "$links"

	# get original file name list, space separated (spaces in filenames changed to underlines)
	sed 's/ /_/g' "$thread" | grep -E -o '<span_title="[^"]+' | awk -F \" '{print $2}' | tr "\n" " " > "$names"
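	# (example: '<span title="my original file.jpg">' in the thread HTML becomes
	# '<span_title="my_original_file.jpg">' after sed; grep then captures everything
	# up to the closing quote and awk keeps field 2, i.e. my_original_file.jpg)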

	COUNT=`cat $names | wc -w` # total number of files/names
	for ((i=1; i<=$COUNT; i++)); do
		wget -nv -nc -O "`cut -d ' ' -f $i $names`" "`cut -d ' ' -f $i $links`" # now download all files, one by one
	done

	rm $thread $links $names

	echo "Waiting 60 seconds before next run"
	sleep 60
done

This should run on any Linux-based OS using the bash shell. Feel free to contact me if you find any bugs and/or improve the script.
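Regarding the network error limitation mentioned above: GNU wget (version 1.12 and later) reports what went wrong through its exit status, returning 4 on a network failure and 8 when the server issued an error response such as 404. A rough, untested sketch of how the update step inside the loop could retry on network trouble instead of exiting:

	# get thread
	echo "Updating..."
	wget -q -k -O "$thread" "$1"
	status=$?
	if [ "$status" = "8" ]; then # server error response, e.g. 404: the thread is gone
		echo "Thread gone, exiting"
		rm $thread $links $names
		exit 1
	elif [ "$status" != "0" ]; then # network failure or similar: try again later
		echo "Update failed (wget exit status $status), retrying in 60 seconds"
		rm $thread $links $names
		sleep 60
		continue
	fi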


5 comments

  1. This doesn't work on OS X, so just do this:

    TMP=`mktemp /tmp/$RANDOM` # TMP is the html thread
    TMP2=`mktemp /tmp/$RANDOM` # TMP2 will be a list of all image addresses
    TMP3=`mktemp /tmp/$RANDOM` # TMP3 will be a list of all original file names

    Comment by Anonymous — July 14, 2011 @ 01:08

  2. That fix probably still applies, but the variables have been renamed to thread, links and names respectively.

    Comment by Anton — June 1, 2012 @ 10:25

  3. I’m having some trouble getting it to work. It looks like maybe part of the grep command that makes the list of names is missing?

    Comment by zorblek — June 4, 2012 @ 03:08

  4. Yes, I accidentally posted the code without converting the < and > to &lt; and &gt;, so parts of the code were interpreted as HTML. Try it again now.

    Comment by Anton — June 8, 2012 @ 10:52

  5. Update for the original file name extraction:

    # get original file name list, space separated (spaces in filenames changed to underlines)
    sed 's/ /_/g' "$thread" | \
    grep -E -o '[^<]*\)' | \
    cut -f 2 -d '>' | \
    cut -f 1 -d ' ' | tr "\n" " " > "$names"

    Probably also want to avoid filename collisions (untested code):

    # now download all files, one by one
    for ((i=1; i<=$COUNT; i++)); do
    # Get the source link and destination name.
    THISNAME="$(cut -d ' ' -f $i $names)"
    THISLINK="$(cut -d ' ' -f $i $links)"

    # If the target file already exists
    # (two different posters submit different images with the same filename)
    # then rename any duplicate names.
    if [ -e "${THISNAME}" ]; then
    NAMECTR=1
    NEWNAME="$(echo "${THISNAME}" | \
    sed "s/....$/.${NAMECTR}&/")"

    while [ -e "${NEWNAME}" ] ; do
    NAMECTR=$((NAMECTR + 1))
    NEWNAME="$(echo "${THISNAME}" | \
    sed "s/....$/.${NAMECTR}&/")"
    done

    THISNAME="${NEWNAME}"
    fi

    wget -nv -nc -O "${THISNAME}" "${THISLINK}"
    done
    Comment by Anonymous — January 6, 2014 @ 00:28
