Archive for April, 2012

Simplified Python Multithreading

Apr.24, 2012

I’ve been bashing my head against this topic for months now on some level. I’ve looked around the web at people’s horrible explanations of how to use the threading module in Python, and now that I’ve figured it out for myself, let me unleash my horrible explanation on the world. 🙂

In order to properly use threading, first off, you need a task that can be passed off to a thread and done in the background in parallel with other tasks. Crawling a website is a good example: a crawler could grab a page, parse all of the links on that page and then pass those links off to worker threads to speed up the process.

My problem before was that I was thinking about threading all wrong. I was thinking that you launch the threads from the task you are trying to accomplish, in other words something like:

def thread(url):
    content = crawlPage(url)
    return content

^^^ BAD!!! That doesn’t work.
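
One concrete reason the snippet above can't do what I wanted: whatever a thread's target function returns is thrown away, and join() only waits for the thread to finish. Here is a minimal sketch of that, written for Python 3. The names crawl_page, thread_body and thread_body_storing are made up for illustration, and crawl_page is a stand-in that never touches the network.

```python
import threading

def crawl_page(url):
    # Hypothetical stand-in for a real page fetch; no network access.
    return "<html>content of %s</html>" % url

def thread_body(url):
    # Same shape as the snippet above: the value is returned to nobody.
    return crawl_page(url)

t = threading.Thread(target=thread_body, args=("http://example.com/",))
t.start()
returned = t.join()  # join() only waits for the thread; it returns None

# A simple workaround: the thread stores its result somewhere shared.
results = []

def thread_body_storing(url):
    # Only one thread writes to the list in this sketch.
    results.append(crawl_page(url))

t2 = threading.Thread(target=thread_body_storing, args=("http://example.com/",))
t2.start()
t2.join()

print(returned)
print(results)
```

So the result has to be handed back through something both sides can see, which is where a queue starts to look attractive.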

The proper way to think about threads is that you are creating empty background while loops. Those loops keep running and pick things out of your queue to process. If there is nothing in the queue, they’ll just wait (q.get() blocks) until something shows up. The code I’m going to show you below has been distilled down to what *I* think you need for a good thread implementation. It has several great features such as monitoring, throttling (upwards only) and a queue.

There is some fluff in this code for the purpose of demonstration that could be pulled out when it comes time for implementation, but I would suggest you download/retype this code, run it and tweak it a bit to see what happens. I pretty much promise you that you’ll get a better feel for how threads work after messing with this code for a bit.

#!/usr/bin/python
# Python 2 code: in Python 3 the Queue module is named queue
# and print is a function.

import time
import Queue
import threading
import random

# create the queue that will feed the background threads
q = Queue.Queue() 

# this starts an individual thread that continuously
# pulls jobs out of the queue
def backgroundThread():
    # this while loop keeps the thread running continuously
    while True: 
        # get the job out of the queue
        job = q.get() 
        # do the job
        print "Job output: %s" % job
        # simulate a slow job; q.get() above blocks while the queue
        # is empty, so the loop never spins or spikes your cpu load
        time.sleep(5)
        # report back to the queue that the task is finished
        q.task_done()

# this launches the necessary number of background threads
def launchThreads(num_threads):
    for i in range(num_threads): 
        # create the thread
        t = threading.Thread(target=backgroundThread) 
        # setting daemon mode makes threads cease when
        # main thread is terminated
        t.daemon = True
        # start the thread
        t.start() 
        print "Now launching %s" % t.name
        
# launch 1 background thread to start with
launchThreads(1)

# Main loop
while True:
    # this spins up another worker if the queue gets too big
    # (workers are only ever added, never removed)
    if q.qsize() > 10:
        launchThreads(1)

    # in real use the job would be the result of another process
    # or come from iterating over a list/file
    job = random.uniform(1,300)

    # push the job into the queue.
    q.put(job)

    # this slows down the queue filling up for demonstration
    time.sleep(random.uniform(0,1))

    # shows the size of the queue
    print "%s jobs in queue" % q.qsize()

    # this prints a list of the threads currently running
    for thread in threading.enumerate():
        print thread

If downloading the code works better for you, here’s a link to it: threading-demo.py
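
The demo runs until you kill it, so the q.task_done() call never gets to pay off. If you have a finite batch of jobs, a variant along these lines pairs task_done() with q.join() so the main thread knows when every job has been processed. This is only a sketch written for Python 3 (the module is named queue there). The names worker, launch_threads, results and results_lock, the fake jobs and the "multiply by 2" work are my own placeholders, not part of the demo.

```python
import queue
import threading
import time

q = queue.Queue()
results = []                     # processed jobs, kept for inspection
results_lock = threading.Lock()  # guards appends from several workers

def worker():
    while True:
        job = q.get()            # blocks while the queue is empty
        try:
            time.sleep(0.01)     # stand-in for real work
            with results_lock:
                results.append(job * 2)
        finally:
            q.task_done()        # lets q.join() account for this job

def launch_threads(num_threads):
    for _ in range(num_threads):
        t = threading.Thread(target=worker)
        t.daemon = True          # don't keep the program alive on exit
        t.start()

launch_threads(2)

for job in range(20):
    q.put(job)
    # upward-only throttle, as in the demo: add a worker when the
    # backlog grows past 10 jobs
    if q.qsize() > 10:
        launch_threads(1)

q.join()                         # returns once every job is marked done
print(sorted(results))
```

Because each result is appended before task_done() is called, everything is in results by the time q.join() returns. How many extra workers get launched depends on timing, so don't expect the same thread count on every run.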

Tags: python

notANON
