Solving a permission problem when setting up Hadoop to support multiple users

And now for something completely different…

If you are setting up a Hadoop cluster for multiple users and run into the following error:

...
AccessControlException: Permission denied: user=John, access=EXECUTE
...

while trying to run MapReduce jobs as users that aren’t the owner of the Hadoop files and folders in the system, then try the following:

hadoop fs –chmod -R 1777 /tmp

which fixed the issue for me.

And yes, the actual staging folder for each user is nested way, way deep inside the tmp folder on the HDFS, but for some reason it isn’t enough to change the permission of the staging folder, you need to change the permission of the tmp folder and all of the files and folders inside it as well.

Socket programming in Common Lisp: how to use usocket’s with-* macros

So I have tried my hands at socket programming in Common Lisp lately, using the (widely used) usocket library.

After getting past the initial difficulties of learning how to use the library’s API (thanks to various tutorials & guides, especially this one by Sid Heroor), I’ve noticed that many people recommend using the with-* macros (specifically, with-connected-socket, with-client-socket, with-server-socket, and with-socket-listener) provided by usocket for the sake of convenience (these macros will ensure that errors are properly handled and all sockets are properly closed after we are done with them) and clarity.

Well, the only problem is that I can’t seem to find any guide or tutorial on how to use them. After searching in vain for a couple of hours, I decided to just bite the bullet and dive into usocket’s source code to see how these macros work, and, to my surprise and delight, found that these macros are very easy to understand.

Basically, all the other with-* macros are built on top of the with-connected-socket macro. The with-connected-socket macro is responsible for automatically binding a socket to a variable, passing the body to the with-mapped-conditions helper macro which handles any errors or exceptions, and ensuring that the socket is destroyed on exit. The with-client-socket macro provides a nice and convenient interface over the socket-connect function for a client to connect to a server. It not only automatically binds a socket to a variable, but also creates a socket stream for the client as well. The with-socket-listener macro provides a convenient interface over the socket-listen function, and returns the resulting socket to the with-server-socket macro. The with-server-socket macro, in turn, passes the socket that was created from the call to socket-listen to the with-connected-socket macro.

We can rewrite the example by Mr. Sid using these with-* macros as follows:

The echo server:

The echo client:

Note how these macros combined can significantly reduce boilerplate code in both the server and client, as well as making the program safer and easier to understand. This is a great example of the power of Common Lisp’s macro. In other languages, for example Python, it would require a new version of the language to introduce the equivalents of the with-* macros seen here, while in Lisp one can introduce a new control structure/expression as a part of a library! I also love how the definitions of the with-* macros themselves serve as a quasi-style guide on how to safely use the underlining functions of usocket.

Again, Common Lisp (as well as other Lisp dialects) continues to impress me, and I look forward to learning more about the language.

More Lisp in the news

So not too long ago John Carmack tweeted that he “dumped the C++ server I wrote for a new one in Racket”.

This created some nice discussions on Hacker News, along with a link to a nice document about Racket macros.

However, that seems to have sparked off some interest in Racket, as few threads on the language start to pop up on the site ever since (for example, here, here, and here). One of these contains a link to a blog post by Clojure’s Leiningen author describing his experience with Racket.

If this leads to a renewed interest in Racket, Common Lisp, and Lisp in general, then this is a very good thing. These languages are so powerful and so underrate, it’s about time they get the recognition they deserve.

Unscientific and unreliable toy benchmark game: Common Lisp, Racket, Python, C, and Fibonacci numbers

Update 2015/12/21: update the Common Lisp and C implementations and revise the article according to the new results.

Warning: take the results in this blog post with a grain of salt.

So I have been playing with Common Lisp and type declarations for a while now. With Steel Bank Common Lisp (SBCL), declaring types also result in the compiler automatically checking the declared types for any errors or inconsistencies.

This is obviously good for safety purposes. But in addition to that, SBCL also uses the type information to optimize the code and improve performance. In order to understand how type declarations help the compiler optimizes the code during compilations, we need to do a benchmark. And what better benchmark to use than everyone’s favorite toy benchmark: computing Fibonacci numbers.

I will be using the naive recursive algorithm, the one with the runtime complexity of O(2^n). Here is the algorithm implemented in Python:

I ran the above script using the time utility to measure how long it takes to compute the 40th Fibonacci number (n = 40), and this is the result with Python version 2.7.9 on my Thinkpad X230 running Debian Sid:

$ time python2.7 fib.py

102334155

real    0m32.401s
user    0m32.392s
sys     0m0.020s

Surprisingly enough, Python version 3.4.2 performed even worse than Python 2.7.9:

$ time python3 fib.py

102334155

real    0m37.492s
user    0m37.432s
sys     0m0.032s

Next is the Common Lisp version:

Using SBCL to compile the script and run the resultant binary gave us the following result:

$ time ./fib1

102334155

real    0m4.081s
user    0m4.080s
sys     0m0.000s

So Common Lisp takes about 4 seconds to compute the number, and is already 8 to 9 times faster than the Python versions. But can we do better?

Let’s go all out and declare the type of all variables (well, there is only one variable to declare anyway), as well as using all the optimization options:

Note how I specified the type of the argument, n, as a fixnum. This is because a fixnum is the most efficient type of integers in Common Lisp (more information about fixnum and its characteristics can be found here, here, and here).

Compiled and ran the above code gave us the following:

$ time ./fib2

102334155

real    0m1.183s
user    0m1.176s
sys     0m0.004s

So this version runs almost 3 seconds faster than the untyped, unoptimized Common Lisp version, and 26 to 30 times faster than both Python versions.

That’s impressive and all, but how does Common Lisp fare against its cousin, Racket?

Here is the same algorithm implemented in Racket:

Running the Racket script gave us the following numbers:

$ time racket fib1.rkt

102334155

real    0m2.055s
user    0m2.032s
sys     0m0.024s

While the Racket version is about one second slower than the properly typed Common Lisp version, its performance is still very impressive, being 2 seconds faster than the equivalent untyped Common Lisp version. This is most likely due to its sophisticated JIT compiler.

And last but not least is the C version (our baseline):

Compiled and ran the program, again using the time utility to measure how long it takes to compute the result, gave us the following:


$ time ./fib
102334155

real    0m0.882s
user    0m0.880s
sys     0m0.000s

So the C version takes about a little less than a second to find the value.

In conclusion, our little “benchmark” game did show us that declaring types does make Common Lisp code perform significantly faster than not doing so. Indeed, with a proper type declaration (which is crucial, since declaring a wrong or inappropriate type can hinder the performance), SBCL can create a very, very fast binary code, ranks second to only the C implementation in terms of the raw execution speed (and leaves other dynamic languages behind in the dust).

Again, please take the results in this blog post with a grain of salt. I’m still new to both Common Lisp and Racket, and still have a lot to learn.

Resources on Common Lisp type declarations

In my last post on Common Lisp, I mentioned how I learned that Common Lisp has a gradual/optional typing. Indeed, that was one of the many reasons that piqued me interested in the language.

However, it turns out that finding documentations or tutorials that teach how to use Common Lisp type declarations is harder than expected.

So the following is a hodgepodge collection of links related to type and type declarations in Common Lisp that I have found to be useful.

First is a paper (PDF) by Didier Verna. The paper describes how to use static typing to make Lisp as fast as C, and contains great examples on how to use type declarations to speed up Lisp code, as well as other optimization techniques, such as profiling, using the correct data structures, etc.

Next is a 3 parts blog posts by Patrick Stein. These articles chronicle how Patrick learned how to use compilers and type declarations to speed up matrix multiplications in Common Lisp.

And last but not least is this little Q&A thread on StackOverflow, which contains concise examples of how to declare as well as check the types.

And of course, the ever-useful Common Lisp HyperSpec contains plenty of references information on type specifiers, a type declaration, and a declare expression.

C’s Designated Initializers

While working with FUSE C API, I came across the follow snippets of code which caught me eyes:

static struct fuse_operations xmp_oper = {
    .getattr  = xmp_getattr,
    .access   = xmp_access,
    .readlink = xmp_readlink,
    ...
    .fsync    = xmp_fsync,
}

Those style of struct initialization are called designated Initializers, and apparently they were introduced in ISO C99 (it is also existed as a GNU extension to C90 as well).

A thread on Hacker News has an interesting discussion on the topic (such as the fact that it is extensively used in Linux device driver code, e.g., [1]), as well as many informative links [2, 3] and even a suggestion on a book that describes new features that were introduced in C99.

My initial experiences with Common Lisp

So I have been getting my hands dirty with FUSE and it’s C API lately. Since I have only been using Python for almost all of my major projects, the experience has given me a newfound appreciation of a type system and how much it can help programmers understand large, complex code bases and APIs, especially ones that have little to no documentations.

But I still love all the benefits that dynamic languages bring, primarily the ease and speed of which they enable one to get a program up and running, so I spent my evening searching for a dynamic language with a type system. That’s when I accidentally ran into this little gem of a comment on Hacker News, which mentioned that Common Lisp also has a type system. A gradual/optional typing, to be precise, among other intriguing features.

So I have decided to take the plunge and learn Common Lisp. Specifically, from the Practical Common Lisp book, which many people recommend as one of the best introduction text on the language.

I have gone over the book once, and, well, I’ve to say that they are correct, as the book really is great. It has a very clear and concise prose and does a great job of explaining even the more complex and esoteric parts of the language (this is the first book that made me grok generic functions and the power of Lisp Macros).

All in all, the book is great, and the experience with Common Lisp has been a great one. The language really is as powerful as many Lispers said. Hell, how many language let you change the grammar of the language itself! (Using the ridiculously power Common Lisp Macro, of course.) I have heard that Lisp has no syntax, but to actually understand what that means really blows my mind.

Also, the Common Lisp type system exceed my expectation. Not only does it provide type checking, it can also act as a way to optimize your code (à la Cython) if you use the compiler that supports such feature (namely SBCL). And not to mention the other features of the language. The CLOS, the Meta Object Protocol (MOP), etc. Heck, it seems that the only reasons Common Lisp isn’t more popular are because of its unique syntax, the lack of libraries, especially compare to newer languages like Python or Ruby, and the lack of de facto standard compiler (Common Lisp has many different compilers, each with its own unique set of features that may or may not be compatible with one another).

Anyhow, I look forward to use Common Lisp in my future projects and getting to know the language ins-and-outs better.