22 October 2013

Avoiding license poisoning & my new stupid simple tool

Large Python and Django projects usually come with many dependencies. Requirements files grow constantly and uncontrollably during development. Some people choose not to pip-freeze the whole virtualenv state and stick to specifying only the immediate dependencies.
Most of the time that's not a problem, especially in a F/OSS project. Commercial software, on the other hand...

The issue

If you need to sell your product in a closed-source fashion or with any sort of restrictions, the problem of licenses appears. At times you just cannot ship a piece of code under a license such as GPL, AGPL, LGPL or some custom one. Getting lawyers involved in the development process is time-consuming at best.

This had been troubling me for some time. I found myself constantly checking the dependency list for new packages, then going to PyPI to look up the license information. Some packages don't even specify it -- you're lucky if you find some clue in the source itself. On top of that, a new dependency can sneak up on you as a requirement of a completely innocent package you have been using for a long time. It gets installed, but it's not in your stale requirements.txt file.

My humble contribution

Pushed by the great atmosphere at PyCon PL, I finally decided to write a minimalistic utility script to help me look for poisonous licenses in my dependency list. It actually took me only about 10 minutes once I found the great pkgtools library and glued it together with pip. I thought this tool could be useful for other people, so I uploaded it to PyPI and even decided to give a lightning talk at the conference. To my surprise, the feedback from the attendees was pretty positive.

Should you want to try it yourself use this line to install it with pip:
pip install license-info
and run just by typing:
inside a virtual Python environment.
Still, this tool comes with a warning. For now I consider it just a proof of concept, so don't rely on it too much. It needs a lot of improvement, and I have many ideas for future development.

How it works

The tool uses the License field from the package info submitted to the Python Package Index. Currently it makes a new RPC call to PyPI for each execution -- with no caching at all. The packages to check are found using the great pip utility. So, all in all, license-info checks every single package installed in your environment. As I said before, stupid simple.
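To give a rough idea of the "list everything installed and read its License field" step, here is a minimal sketch. Note that it is not how license-info itself does it: instead of asking PyPI over RPC, it reads the metadata that is already installed locally, using the standard library's importlib.metadata (Python 3.8+). The function name installed_licenses and the "UNKNOWN" placeholder are my own assumptions for illustration.

```python
from importlib import metadata


def installed_licenses():
    """Return sorted (name, version, license) tuples for installed packages.

    Assumption: the License field is read from local package metadata,
    not fetched from PyPI as the tool described above does.
    """
    rows = set()
    for dist in metadata.distributions():
        try:
            meta = dist.metadata
            name = meta.get("Name")
            version = dist.version
        except Exception:
            continue  # skip distributions with unreadable metadata
        if not name:
            continue
        # The License field may be missing, empty, or hold a whole license text;
        # keep only its first non-empty line.
        raw = (meta.get("License") or "").strip()
        license_name = raw.splitlines()[0].strip() if raw else "UNKNOWN"
        rows.add((name, version, license_name))
    return sorted(rows, key=lambda row: row[0].lower())


if __name__ == "__main__":
    for name, version, license_name in installed_licenses():
        print(name, version, license_name)
```

Run inside a virtualenv, this lists only what is installed there, including dependencies that never made it into requirements.txt.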
The output is similar to, and compatible with, the pip freeze format. Each line has an additional column (treated as a comment) with the package's license name. Additionally, licenses are matched against a whitelist containing some example non-poisonous licenses. Of course, this list is not comprehensive. All matched license strings are coloured green and all unmatched ones red, so you can take a quick look to see if you have a case of license poisoning (come to think of it, that's probably not a good colour scheme for colour-blind people -- another thing I'd need to take into account).
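The output format can be sketched in a few lines. This is only an illustration under my own assumptions: the ALLOWED set is a made-up example whitelist (not the list shipped with the tool), matching is a plain case-insensitive comparison, and the names is_allowed and format_line are hypothetical.

```python
# ANSI escape codes for terminal colours.
GREEN, RED, RESET = "\033[32m", "\033[31m", "\033[0m"

# Hypothetical example whitelist; not the tool's actual list.
ALLOWED = {"mit", "mit license", "bsd", "bsd license", "apache 2.0", "psf", "isc"}


def is_allowed(license_name, allowed=ALLOWED):
    """Case-insensitive exact match of a license string against the whitelist."""
    return license_name.strip().lower() in allowed


def format_line(name, version, license_name, colour=True):
    """Build a pip-freeze-compatible line with the license as a trailing comment."""
    label = license_name
    if colour:
        tint = GREEN if is_allowed(license_name) else RED
        label = f"{tint}{license_name}{RESET}"
    return f"{name}=={version}  # {label}"


if __name__ == "__main__":
    print(format_line("requests", "2.0.0", "Apache 2.0"))
    print(format_line("somepkg", "1.0", "GPLv3"))
```

With colour=False the line stays plain text, and pip should treat everything after the # as a comment, so the list can still be fed back to pip install -r.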
