Monday, October 20, 2014

Published my first presentation on SpeakerDeck - using Python

By Vasudev Ram

SpeakerDeck is an online presentation service roughly like SlideShare. It seems to have been created by GitHub Inc.

I just published my first presentation on SpeakerDeck. It is a quickstart tutorial for the vi editor. Note: vi, not vim. I had written the tutorial some years ago, when vim was not so widely used, and vi was the most common text editor on Unix systems.

About the tutorial:

I first wrote this vi quickstart tutorial for some friends at a company where I worked. They were Windows and network system administrators without prior Unix experience, and had been tasked with managing some Unix servers that the company had bought for client work. Since I had a Unix background, they asked me to create a quick tutorial on vi for them, which I did.

Later on, after learning the basics of vi from it, and spending some days using vi to edit Unix configuration files, write small shell scripts, etc., they told me that they had found the tutorial useful in getting up to speed on vi quickly.

So, some time later, I thought of publishing it, and sent an article proposal to Linux For You magazine (an Indian print magazine about Linux and open source software). The proposal was accepted and the article was published.

About generating the tutorial as PDF and uploading it to SpeakerDeck:

The original vi quickstart tutorial was in text format. Last year I wrote XMLtoPDFBook (as an application of xtopdf, my Python toolkit for PDF creation), which allows the user to create simple PDF e-books from XML files. So I converted the vi tutorial to XML format (*) and used it to test XMLtoPDFBook. I therefore had the tutorial available in PDF format.

(*) All you have to do for that - i.e. to convert a text file to the XML format supported by XMLtoPDFBook - is to insert each chapter's text as a <chapter> element in the XML file. Then give the XML file as the input to XMLtoPDFBook, and you're done.
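As a rough sketch of the conversion described in the note above, the snippet below wraps each chapter's text in a <chapter> element using Python's standard xml.etree.ElementTree module. The root element name "book" and the function name chapters_to_xml are assumptions made for illustration (only the <chapter> element is mentioned here), and the code is written for Python 3, so check the XMLtoPDFBook documentation for the exact format it expects.

```python
import xml.etree.ElementTree as ET


def chapters_to_xml(chapter_texts, root_tag="book"):
    """Return an XML string with one <chapter> element per chapter text.

    NOTE: root_tag="book" is an assumption; the post only describes
    the <chapter> element.
    """
    root = ET.Element(root_tag)
    for text in chapter_texts:
        chapter = ET.SubElement(root, "chapter")
        # ElementTree escapes &, < and > when the tree is serialized.
        chapter.text = text
    return ET.tostring(root, encoding="unicode")
```

The returned string could then be written to a file and given to XMLtoPDFBook as its input.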

SpeakerDeck requires that presentations be uploaded in PDF format. It then converts them to slides. So I thought it would be a good test of SpeakerDeck and/or xtopdf, to upload this PDF generated by xtopdf to SpeakerDeck, and see how the result turned out. I did that today. Then I viewed the resulting SpeakerDeck presentation. It was good to see that the conversion turned out well, AFAICT. All pages seem to have got converted correctly into slides.

The presentation can be viewed here:

A vi quickstart tutorial

If you prefer plain text to presentations, you can read the vi quickstart tutorial here.

- Vasudev Ram - Dancing Bison Enterprises

Click here to signup for email notifications about new products and services from Vasudev Ram.

Contact Page

Wednesday, October 15, 2014

Let's do some magic with Python

By Vasudev Ram

python-magic is a Python wrapper for the libmagic C library which allows you to detect the type of a file by reading and deciphering the initial part of its contents, and/or by using the magic number database for file types. The Unix command called file uses libmagic internally. When you give the command:
$ file *
at a Unix command prompt, it gives you output showing its guess (using libmagic) as to the type of each file in the current directory (because the * is a wildcard that matches all the filenames in the current directory).

For example, if there are 10 files in the directory, it may detect and tell you that the 1st file is a text file, the 2nd is the source code of a C program, the 3rd is the object (compiled) code of that C program, the 4th is a PDF file, the 5th is an HTML file, the 6th is a Linux executable (which may be the end result of linking the object code mentioned earlier with some standard libraries), and so on.

Here is a simple example showing the use of the python-magic library:
>>> import magic
>>> magic.from_file("testdata/test.pdf")
'PDF document, version 1.2'
>>> magic.from_buffer(open("testdata/test.pdf", "rb").read(1024))
'PDF document, version 1.2'
>>> magic.from_file("testdata/test.pdf", mime=True)
'application/pdf'

Here is an example program that reads the list of files in the current directory, and for each file, prints the filename, the file type and the file's MIME type.
(I used the term MIME type loosely; it should really be called Internet media type.)

import os
from magic import from_file

def do_magic(filename):
    # Ask libmagic for the human-readable description and the MIME type.
    file_type = from_file(filename)
    mime_type = from_file(filename, mime=True)
    print("{}: {} | {}".format(filename, file_type, mime_type))

# Print a header line, then one line per entry in the current directory.
print("filename: file_type | mime_type")
for filename in os.listdir('.'):
    do_magic(filename)

Example program output:
filename: file_type | mime_type
awk: directory | inode/directory
awk.tar: POSIX tar archive (GNU) | application/x-tar
echoer: ASCII text | text/plain
[...]: ASCII text | text/plain
[...]: PDF document, version 1.3 | application/pdf
prog1.c: ASCII C program text | text/x-c
prog1.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped | application/x-object
prog2.c: ASCII C program text | text/x-c
prog2.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), stripped | application/x-object
reportlab-1.21.1: directory | inode/directory
selpg: directory | inode/directory
test1.tar.gz: gzip compressed data, was "test1.tar", from Unix, last modified: Mon Oct 13 19:50:01 2014 | application/x-gzip
[...]: Python script, ASCII text executable | text/x-python
[...]: Python script, ASCII text executable | text/x-python
text_file.txt: ASCII text | text/plain
tpm.out: ASCII text | text/plain
tpm2.out: empty | inode/x-empty
xtopdf: directory | inode/directory

So the python-magic library can be useful, since it allows us to detect the type of a file (correctly most of the time) from within our Python code, and then do something meaningful with that information.

For example, a program that reads all the files under a directory tree, can be made to do the right kind of processing with each type of file, based on the file type it detects using python-magic.
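Here is a minimal sketch of that idea, written for Python 3. It walks a directory tree with os.walk and dispatches each file to a handler chosen by its detected MIME type. The names process_tree, detect_with_magic and the handlers dictionary are hypothetical, made up for this illustration; the only python-magic call used is from_file(path, mime=True), shown earlier in this post. The detector is passed in as a parameter so that the walking logic can be tried out even without python-magic installed.

```python
import os


def detect_with_magic(path):
    """Return the MIME type of path using python-magic (must be installed)."""
    import magic  # imported lazily so the rest of the sketch runs without it
    return magic.from_file(path, mime=True)


def process_tree(root, handlers, detect=detect_with_magic):
    """Walk root and call handlers[mime_type](path) for each file found.

    handlers maps a MIME type string to a one-argument function.
    Files whose detected type has no handler are returned as a list
    of (path, mime_type) tuples, so the caller can decide what to do.
    """
    unhandled = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mime_type = detect(path)
            handler = handlers.get(mime_type)
            if handler is None:
                unhandled.append((path, mime_type))
            else:
                handler(path)
    return unhandled
```

A call might look like process_tree('.', {'text/plain': index_text, 'application/pdf': extract_pdf_text}), where index_text and extract_pdf_text stand for whatever per-type processing your program needs.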


- Vasudev Ram - Dancing Bison Enterprises


Monday, October 13, 2014

Hacker News thread on PDF reporting tools

By Vasudev Ram

I saw this thread about PDF reporting tools on Hacker News (HN) today:

Ask HN: What do you use for PDF reports these days?

It was interesting to see that multiple HN users commented saying that they use ReportLab for PDF report creation in Python and like it a lot. I also commented, mentioning my xtopdf PDF generation library, which is also written in Python. xtopdf builds on top of ReportLab and provides a subset of ReportLab's functionality, with a somewhat easier interface / API for that subset.

PrinceXML (*), Jasper (Java), JagPDF (C++, Python, Java, C), Flying Saucer (Java), PDFBox (Java), prawn (Ruby), wkhtmltopdf, FPDF/TCPDF (PHP) were some of the other interesting PDF creation tools or libraries mentioned. I have come across many of these tools in my explorations of the PDF creation field (which has been going on for some years, as it is a personal interest of mine, and I've also done some consulting projects that involved PDF generation and PDF text extraction), but still came across some tools new to me, in the HN thread.

(*) A possibly somewhat less-known fact is that Håkon Wium Lie, one of the board members of YesLogic, the company behind PrinceXML, is also the original proposer of CSS and the CTO of Opera Software (yes, the company behind the Opera browser).

Wikipedia page about PDF - the Portable Document Format.

PDF became an ISO standard, ISO 32000-1, in 2008.

- Vasudev Ram - Dancing Bison Enterprises


Thursday, October 9, 2014

The Linux Foundation's new Linux Certification program

By Vasudev Ram

Saw this recently via the newsletter I get from The Linux Foundation:

The Linux Foundation is introducing a new Linux certification program. It will be available anywhere, online.

Jim Zemlin, the executive director of the Linux Foundation, has details about it in this blog post:

Linux Growth Demands Bigger Talent Pool

There are two certifications:

Linux Foundation Certified System Administrator (LFCS)

Linux Foundation Certified Engineer (LFCE)

These Linux certifications are likely to be a good value addition to anyone seeking to start or grow a career involving Linux, since they are from the official foundation that is behind Linux - the Linux Foundation, which does a lot of work related to sponsoring Linux development (*), conducting conferences like LinuxCon, etc.

In fact, the Linux Foundation sponsors the work of Linus Torvalds, the creator of Linux - Linus is a Linux Foundation Fellow. See this page about the Linux Fellow Program - Linus's name is at the top of the list of Linux Fellows.

On a related note, if you are into Linux and would like to learn how to write Linux command-line utilities in C, check out this blog post by me on the topic of Developing a Linux command-line utility in C, an article I wrote for IBM developerWorks a while ago. It got many views and a 4-star rating, and some people have told me they used the article (which is a tutorial) as a guide to developing command-line utilities on Linux for production use.

- Vasudev Ram - Python and Linux training and consulting - Dancing Bison Enterprises


Wednesday, October 8, 2014

New: The Hacker News API (with Python support)

By Vasudev Ram

Hacker News (HN) has introduced an API for its site - the Hacker News API.

Hacker News is a tech and other news site popular with developers, entrepreneurs and others, which was set up by Paul Graham, founder of Y Combinator, a startup incubator / accelerator.

HN thread about the Hacker News API.

The Hacker News API on GitHub.

The API has been built in partnership with Firebase, a startup that is a graduate of the Y Combinator incubator / accelerator.

There is a Firebase REST API available to access the Hacker News data.

They mention two Python wrappers for the Firebase REST API:

python-firebase, Python interface to the Firebase REST API, by Özgür Vatansever

and another one, also called

python-firebase, Python wrapper for the Firebase API by Michael Huynh.
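Since it is a plain REST API that returns JSON, it can also be called without either wrapper. Below is a minimal sketch, written for Python 3, using only the standard library. The base URL and the topstories and item paths are the ones given in the Hacker News API documentation on GitHub at the time of writing, so check there in case they have changed. The function names are hypothetical, chosen for this illustration, and the opener parameter exists only so that the code can be exercised without network access.

```python
import json
from urllib.request import urlopen

# Base URL as given in the Hacker News API documentation (v0).
BASE_URL = "https://hacker-news.firebaseio.com/v0"


def top_stories_url():
    """URL of the JSON list of current top story ids."""
    return BASE_URL + "/topstories.json"


def item_url(item_id):
    """URL of the JSON object for a single item (story, comment, etc.)."""
    return "{}/item/{}.json".format(BASE_URL, item_id)


def fetch_json(url, opener=urlopen):
    """Open url with opener and decode the response body as JSON."""
    response = opener(url)
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()


def top_story_titles(count=5, opener=urlopen):
    """Return the titles of the first count top stories."""
    ids = fetch_json(top_stories_url(), opener)[:count]
    titles = []
    for item_id in ids:
        # An item can come back as JSON null, so fall back to an empty dict.
        item = fetch_json(item_url(item_id), opener) or {}
        titles.append(item.get("title"))
    return titles
```

Calling top_story_titles(3) makes one request for the list of ids and one request per story, so keep count small while experimenting.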

The news about the Hacker News API was posted just today.

I'll experiment over some days with the API and then may write another post about using it from Python.

- Vasudev Ram - Python training and consulting - Dancing Bison Enterprises
