22 Feb

How to get favicon.ico files from Alexa Top 1000 sites in 2 minutes with Python

Make folders:

mkdir -p favicons/icons ; cd favicons

Get a list of Alexa Top 1000 sites:

curl -s -O http://s3.amazonaws.com/alexa-static/top-1m.csv.zip ; unzip -q -o top-1m.csv.zip top-1m.csv ; head -1000 top-1m.csv | cut -d, -f2 | cut -d/ -f1 > topsites.txt

This time-saving one-liner was found here.

Install gevent:

yum install python-gevent

Gevent is a high-performance network framework for Python built on top of libevent and greenlets.

A few modifications to an example shipped with gevent:

#!/usr/bin/python
# Copyright (c) 2009 Denis Bilenko. See LICENSE for details.
"""Spawn multiple workers and wait for them to complete"""

urls = ['http://www.' + line.strip() for line in open('topsites.txt')]

import gevent
from gevent import monkey

# patches stdlib (including socket and ssl modules) to cooperate with other greenlets
monkey.patch_all()

import urllib2
from socket import setdefaulttimeout
setdefaulttimeout(30)

def print_head(url):
    print 'Starting %s' % url
    url = url + '/favicon.ico'
    try:
        data = urllib2.urlopen(url).read()
    except Exception, e:
        print 'error', url, e
        return

    # strip the 'http://www.' prefix (11 chars) and flatten the rest
    # of the URL into a safe file name
    fn = 'icons/' + url[11:].replace("/", "-")
    myFile = file(fn, 'wb')
    myFile.write(data)
    myFile.close()

jobs = [gevent.spawn(print_head, url) for url in urls]

gevent.joinall(jobs)
[dande@host favicons]$ time python ./get.py
...

real 0m50.644s
user 0m1.914s
sys 0m0.888s
[dande@host favicons]$
[dande@host favicons]$ ls icons/ | wc -l
889
[dande@host favicons]$

Well, there’s not much sense in this except fooling around with Python.
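Incidentally, on Python 3 the same trick needs nothing beyond the standard library. Here is a rough sketch along the same lines (not the original script): `ThreadPoolExecutor` plays the role of the gevent spawn loop, and the file-name mangling mirrors the `url[11:].replace('/', '-')` trick above:

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def icon_filename(url):
    # 'http://www.example.com' -> 'icons/example.com-favicon.ico',
    # the same flattening as in the gevent script above
    return 'icons/' + url[len('http://www.'):].replace('/', '-') + '-favicon.ico'

def fetch(url):
    try:
        data = urlopen(url + '/favicon.ico', timeout=30).read()
    except OSError as e:
        print('error', url, e)
        return
    with open(icon_filename(url), 'wb') as f:
        f.write(data)

# to run it over the whole list:
# urls = ['http://www.' + line.strip() for line in open('topsites.txt')]
# with ThreadPoolExecutor(max_workers=50) as pool:
#     pool.map(fetch, urls)
```

Threads are heavier than greenlets, but for a thousand mostly-idle sockets the difference hardly matters.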

21 Feb

How to download Coursera materials with use of Python

Install coursera-dl by Dirk Gorissen:

python-pip install coursera-dl

Make a folder to store files:

mkdir -p ./courses/comnetworks-2012-001

Run:

coursera-dl -u [email protected] -p your_password ./courses/comnetworks-2012-001 comnetworks-2012-001

Enjoy.

If you want to check whether there are new materials, just run the same command again. coursera-dl is smart enough to skip files you already have:

- Downloading resources for 2-6 Link Layer Overview (0414)
- "2-readings.pdf" already exists, skipping
- "2-6-link-overview-ink.pdf" already exists, skipping
- "2 - 6 - 2-6 Link Layer Overview (0414).txt" already exists, skipping
- "2 - 6 - 2-6 Link Layer Overview (0414).srt" already exists, skipping
- "2 - 6 - 2-6 Link Layer Overview (0414).mp4" already exists, skipping
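The skip logic presumably boils down to an existence check before each download. A minimal sketch of the idea (`fetch` is a hypothetical callable returning the file’s bytes, not part of coursera-dl):

```python
import os

def maybe_download(fn, fetch):
    # mirror the behaviour above: never re-download an existing file
    if os.path.exists(fn):
        print('- "%s" already exists, skipping' % fn)
        return False
    with open(fn, 'wb') as f:
        f.write(fetch())
    return True
```

This also means a half-finished download from a crashed run will be skipped, so delete suspicious files before re-running.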

20 Feb

Watching specified files/folders for changes in Python

For specific purposes there could be a need to monitor file and folder changes on a Linux box. To achieve this you can go with incrond. There is also a Pythonic way: several Python wrappers around the inotify feature are available. Here we’ll cover the simple Python daemon Watcher (github repo). First of all, we need to install the python-inotify package:

yum install python-inotify.noarch

python-inotify uses a Linux kernel feature called inotify (available since version 2.6.13). It allows user-space applications to receive notifications about file system events.
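The standard library doesn’t expose inotify, but the kind of events Watcher reacts to can be approximated with plain polling. A stdlib-only sketch of the create/delete detection idea — not how inotify actually works:

```python
import os
import time

def watch(path, interval=1.0, cycles=3):
    """Poll `path` and collect ('create'/'delete', name) events, one pass per cycle."""
    seen = set(os.listdir(path))
    events = []
    for _ in range(cycles):
        time.sleep(interval)
        now = set(os.listdir(path))
        events += [('create', name) for name in sorted(now - seen)]
        events += [('delete', name) for name in sorted(seen - now)]
        seen = now
    return events
```

Unlike inotify, polling wakes up whether or not anything changed and can miss short-lived files — exactly why the kernel-side feature exists.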

Now you can download the latest version of the config and the daemon:

mkdir watcher
cd watcher
wget https://raw.github.com/splitbrain/Watcher/master/watcher.ini
wget https://raw.github.com/splitbrain/Watcher/master/watcher.py

Modify your watcher.ini to meet your requirements:

[DEFAULT]
logfile=/tmp/watcher.log
pidfile=/tmp/watcher.pid
[job1]
watch=/tmp
events=create,delete
recursive=false
autoadd=true
command=ls -l $filename

Now you are ready to start the Watcher daemon:

chmod u+x watcher.py
./watcher.py -c watcher.ini debug

19 Feb

10 Minutes Celery Introduction

Celery is an asynchronous task queue/job queue based on distributed message passing. This post is not a detailed introduction but rather a short how-to on getting started with Celery.

Using Celery involves several components:

  • a broker. Think of it as a transport; you can choose among RabbitMQ, Redis or SQL servers;
  • a worker application which executes tasks;
  • a client application which adds tasks to the queue.
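To make the broker/worker/client split concrete before installing anything, here is a toy in-process model using only the standard library; a `queue.Queue` stands in for the broker and a dict for the result backend — nothing here is Celery’s actual machinery:

```python
import queue
import threading

broker = queue.Queue()   # stands in for Redis/RabbitMQ
results = {}             # stands in for the result backend

def worker():
    # the worker application: pull tasks off the broker and execute them
    while True:
        task_id, func, args = broker.get()
        results[task_id] = func(*args)
        broker.task_done()

threading.Thread(target=worker, daemon=True).start()

# the client application: enqueue a task, then wait for the result
broker.put(('job-1', lambda x, y: x + y, (4, 4)))
broker.join()
print(results['job-1'])  # prints 8
```

The real thing differs in the important ways — the broker survives restarts and the worker runs on another machine — but the data flow is the same.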

Let’s get started. First of all we need to install Celery. I run a Fedora server; if you use Debian, use apt-get.

yum install python-celery.noarch

For the sake of simplicity we’ll use Redis as a broker. It’s fast, simple to set up and doesn’t consume a lot of resources.

yum install redis

Now we can tune some options. Here’s redis.conf example:

daemonize no
pidfile /var/run/redis/redis.pid
port 6379
bind 127.0.0.1
timeout 0
loglevel notice
logfile /var/log/redis/redis.log
databases 16
save 900 1
save 300 10
save 60 10000
rdbcompression yes
dbfilename dump.rdb
dir /var/lib/redis/
slave-serve-stale-data yes
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
slowlog-log-slower-than 10000
slowlog-max-len 128
vm-enabled no
vm-swap-file /tmp/redis.swap
vm-max-memory 0
vm-page-size 32
vm-pages 134217728
vm-max-threads 4
hash-max-zipmap-entries 512
hash-max-zipmap-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes

We will also need the celery-with-redis package, which Celery requires to work with Redis:

python-pip install -U celery-with-redis

Keep in mind that this command will also update your current Celery installation along with its dependencies. It’s not a big deal, but you should be aware of it.

Now let’s create our worker application called tasks.py:

from celery import Celery
celery = Celery('tasks', broker='redis://localhost:6379/0', backend='redis://localhost:6379/1')

@celery.task
def add(x, y):
    return x + y

Now we can launch it:

celery -A tasks worker --loglevel=info

You should get output similar to this:

 
-------------- celery@turtle v3.0.15 (Chiastic Slide)
---- **** -----
--- * *** * -- [Configuration]
-- * - **** --- . broker: redis://localhost:6379/0
- ** ---------- . app: tasks:0x1d1b690
- ** ---------- . concurrency: 1 (processes)
- ** ---------- . events: OFF (enable -E to monitor this worker)
- ** ----------
- *** --- * --- [Queues]
-- ******* ---- . celery: exchange:celery(direct) binding:celery
--- ***** -----

[Tasks]
. tasks.add

[2013-02-19 23:52:42,339: WARNING/MainProcess] celery@turtle ready.
[2013-02-19 23:52:42,361: INFO/MainProcess] consumer: Connected to redis://localhost:6379/0.

Here’s our client application:

from tasks import add
result = add.delay(4, 4)
print result.get(timeout=1)

Note that here we use Celery in synchronous mode: we wait until the result is ready. I believe in most cases one would use Celery in asynchronous mode; here we do it synchronously just to get a result and make sure everything works.

Output:

[dande@turtle ~]# python client.py
8
[dande@turtle ~]#

Now that everything is ready, we can start thinking about what to do with the described solution.

By the way, if you are interested in how Celery uses Redis run:

redis-cli monitor

07 Feb

Tag Clouds in Python

One of the most beautiful things about Python is its abundance of third-party libraries (which the Lua world unfortunately lacks). Creating a tag cloud in Python is quite easy. First, we need to install the required packages. I run Fedora 18.

yum install python-pip.noarch
yum install pygame
yum install simplejson

Now you can install pytagcloud. (NB: do not use easy_install, pip is the right way to go).

python-pip install -U pytagcloud

Now we are ready to create our first tag cloud image:

from pytagcloud import create_tag_image, make_tags
from pytagcloud.lang.counter import get_tag_counts

TEXT = '''
You know the day destroys the night
Night divides the day
Tried to run
Tried to hide
Break on through to the other side
Break on through to the other side
Break on through to the other side, yeah

We chased our pleasures here
Dug our treasures there
But can you still recall
The time we cried
Break on through to the other side
Break on through to the other side
Yeah!
C'mon, yeah
Everybody loves my baby
Everybody loves my baby
She get
She get
She get
She get high
I found an island in your arms
Country in your eyes
Arms that chain us
Eyes that lie
Break on through to the other side
Break on through to the other side
Break on through, oww!
Oh, yeah!
Made the scene
Week to week
Day to day
Hour to hour
The gate is straight
Deep and wide
Break on through to the other side
Break on through to the other side
Break on through
Break on through
Break on through
Break on through
Yeah, yeah, yeah, yeah
Yeah, yeah, yeah, yeah, yeah'''

tags = make_tags(get_tag_counts(TEXT), maxsize=150)

create_tag_image(tags, 'cloud_large.png', size=(900, 600))

After you run this script you should have the file ‘cloud_large.png’ in your current directory. Here is mine:

[tag cloud image: cloud_large.png]
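Under the hood, `get_tag_counts` is essentially a word-frequency count. `collections.Counter` does the same job; the tokenization below (lowercase word runs) is a guess, not pytagcloud’s exact rules:

```python
import re
from collections import Counter

def tag_counts(text, top=5):
    # split into lowercase word tokens and count occurrences
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top)

print(tag_counts("Break on through to the other side, break on through"))
```

`make_tags` then just maps those counts onto font sizes between some minimum and `maxsize`.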

Python is great.

07 Feb

trac deployment with gunicorn and systemd in Fedora 17

gunicorn installation:

yum install python-gunicorn.noarch

Put systemd unit file to /lib/systemd/system/gunicorn-trac.service:

[Unit]
Description=gunicorn-trac

[Service]
ExecStart=/usr/bin/gunicorn -D -n gunicorn-trac -w5 tracwsgi:application -b 127.0.0.1:8000 --access-logfile /home/trac/log/access.log --error-logfile /home/trac/log/error.log
Type=forking
User=trac
Group=trac
Restart=always
StandardOutput=syslog
StandardError=syslog
WorkingDirectory = /home/trac/

[Install]
WantedBy=multi-user.target

Enabling, starting:

systemctl enable gunicorn-trac
systemctl start gunicorn-trac

Checking:

[root@moonstation ~]# netstat -lpn | grep gun
tcp 0 0 127.0.0.1:8000 0.0.0.0:* LISTEN 1034/gunicorn: maste
[root@moonstation ~]#

Everything seems fine. Now we can proceed with the nginx setup as a frontend to trac.

01 Feb

trac deployment under nginx on Centos 6

Here’s an example of how to deploy trac under nginx. I assume that you already have trac installed.

server {
    listen 192.168.1.1:80;
    server_name trac.example.com www.trac.example.com default;

    location /chrome/common/ {
         alias /usr/lib/python2.6/site-packages/trac/htdocs/;
         expires 1M;
         add_header Cache-Control private;
         gzip_static on;
         gzip_disable "Firefox/([0-2]\.|3\.0)";
         gzip_disable "Chrome/2";
         gzip_disable "Safari";
    }
    location / {
        auth_basic            "Authorized area";
        auth_basic_user_file  /home/trac/.passwords;

        proxy_pass  http://127.0.0.1:8000;
        proxy_set_header REMOTE_USER $remote_user;
    }
}

tracd is launched this way:

/usr/bin/python /usr/sbin/tracd --daemonize --pidfile=/tmp/tracd.pid --port=8000 --protocol=http --single-env /home/trac -b 127.0.0.1 --basic-auth=/home/trac,/home/trac/.passwords,example.com

To get the authorization working you should also have this parameter in your trac.ini file:

obey_remote_user_header = true

01 Feb

How to modify a variable inside of a function

Code example

class z():
    def __init__(self):
        self.z = ['foo']
        print 'before', self.z

    def zoo(self, doo):
        doo[0] = 'ya'

class b():
    def __init__(self):
        self.z = 'foo'
        print 'before', self.z

    def zoo(self, doo):
        doo = 'ya'

A = z()
A.zoo(A.z)
print 'after', A.z

print

B = b()
B.zoo(B.z)
print 'after', B.z

Output

[dandelion@bart ~]$ python z.py
before ['foo']
after ['ya']

before foo
after foo
[dandelion@bart ~]$

Explanation

It’s simple. In Python a ‘string’ is an immutable object, while a [list] is a mutable one. The list is passed by reference and mutated in place; the string parameter is merely rebound inside the function, which the caller never sees.
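So if you need to “modify” an immutable argument, the usual idiom is to return the new object and rebind the name at the call site — a small sketch of the idea:

```python
def shout(s):
    s = s + '!'   # rebinds the local name; the caller's string is untouched
    return s      # hand the new object back instead

word = 'foo'
shout(word)
print(word)       # still 'foo'
word = shout(word)
print(word)       # now 'foo!'
```

The same applies to tuples, numbers and frozensets: you never mutate them, you replace them.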