Wednesday, June 24, 2009

A possible future for web-based communication

My recent post on the Kings of Code side event got too long, so I extracted the following into its own blog post. It is a collection of thoughts on a possible presentation topic.

People like to communicate with each other. Centuries ago we wrote letters and sent them with the merchants, hoping they would arrive. The telegraph was a revolution: we could send a message faster, cheaper and with more certainty of delivery. The telephone was even more revolutionary: direct communication over a long distance.

In the age of the internet, change happens even faster. E-mail has been around for "ages", and various forms of chat services have come and gone.

Broadcasting has gone through a similar change. It started with spoken announcements; proclamations from the king. At some point there were pamphlets and posters. Books can be seen as a way to broadcast a message. Newspapers are a periodic form of broadcast. Radio enabled long-distance audio broadcasting without the cost of creating a physical carrier for the message. Television added moving images to radio.

Now we have the internet, where we go through similar stages.

Static web 1.0 websites are pamphlets re-invented: old concepts in an electronic shape. Ebooks are electronic books. Newspapers try to put their content on their own websites, updating them daily to bring the news to the masses. E-mail newsletters just scream "newspaper" to me. Radio can be found as streaming audio.

As we have become more familiar with the internet, and with increased access through broadband, cable and fiber, we have started to innovate with the new medium. RSS changed the direction of broadcast from push to pull. YouTube may have started as a way to share existing videos, but it has since grown into a place where anyone can make themselves heard. That is many-to-many communication.

The internet made interactivity a lot easier than in the off-line world. Web forums and Usenet allow groups of users to interact with each other through written messages. Blogs allow everyone to have their own newspaper column. In the newspaper that is the internet, there is no single opinions page: every writer gets their own column, and they all respond and refer to each other's writing.

Twitter is the latest thing. It is a hybrid between instant messaging, e-mail and RSS feeds. People say it does not scale, yet the Twitter engineers keep making it better and more and more people are able to use it. Is there a limit to how far it can scale? Is its centralized server model going to become an important limitation on both scale and freedom later on? When six billion people use one and the same service, relying on it for an important part of their daily communication, how much can you trust one company to take care of it?

Would it not be better to turn it into a distributed service? Distribution has successfully scaled e-mail, Jabber/XMPP, the telephone network and the internet itself. E-mail, as a world-wide service, has never gone down, even if individual servers go down from time to time.

Traditional media has its problems. Paper flyers, mail-delivered advertising and commercials on the radio and television are a few examples. The relatively high price of print media or traditional broadcasting limits the amount of these forms of advertising. This makes it somewhat bearable. In contrast, spam via e-mail, blog comments, web forum posts and instant messages have no such limitations. It's virtually free to broadcast your message to a million people, so it happens a lot and people really don't like it.

E-mail spam happens because there is near-zero cost and risk for the sender. The same goes for other forms of on-line communication. It can be interesting to consider the friend-of-a-friend model, as seen on LinkedIn and other social networks.

In such a model, to send a message to someone, the whole connection chain between sender and receiver must be known, no matter how long it is. If someone spams, it means there is a chain of real people connecting them to you, so the sender is traceable instead of anonymous. If you flag a message as spam, the whole chain is notified that they were part of a spam chain. People can then choose to ban the spammer from using them as a connection, and you can identify people who act as a gateway for spammers to reach others.
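
To make the idea concrete, here is a minimal Ruby sketch of such a web-of-trust. The Network class, its method names and the flagging rule are all hypothetical; this is just one way it could work:

# Hypothetical sketch of friend-of-a-friend message delivery.
# A message travels along a known chain of people; flagging it as
# spam notifies everyone on the chain, who may then drop the sender.
class Network
  def initialize
    @connections = Hash.new { |h, k| h[k] = [] } # person => friends
    @banned      = Hash.new { |h, k| h[k] = [] } # person => banned senders
  end

  def connect(a, b)
    @connections[a] << b
    @connections[b] << a
  end

  # A message is deliverable only when every hop in the chain is a real
  # connection and nobody on the chain has banned the original sender.
  def deliverable?(chain)
    sender = chain.first
    chain.each_cons(2).all? do |from, to|
      @connections[from].include?(to) && !@banned[to].include?(sender)
    end
  end

  # Flagging notifies the whole chain; here everyone simply bans the sender.
  def flag_spam(chain)
    sender = chain.first
    chain[1..-1].each { |person| @banned[person] << sender }
  end
end

net = Network.new
net.connect('alice', 'bob')
net.connect('bob', 'carol')
net.deliverable?(%w[alice bob carol]) # => true
net.flag_spam(%w[alice bob carol])
net.deliverable?(%w[alice bob carol]) # => false, bob and carol banned alice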

The nice thing about treating communication as a social network activity is that it makes people more aware that they are dealing with people. By taking anonymity out of the equation, it lowers the tendency of people to act like a Total Fuckwad when they think nobody is watching them.

Are there other ways to look at communication, to turn it upside-down and re-investigate how it works? What is going to be the next Twitter?

Monday, June 22, 2009

A possible future for package management

My recent post on the Kings of Code side event got too long, so I extracted the following into its own blog post. It is a collection of thoughts on a possible presentation topic.

RubyGems has been around for ages and has made it relatively easy to distribute Ruby code. Not everyone uses it, though. Some prefer to use the Debian package manager, or whatever their OS provides, instead. This is very useful if a gem has external dependencies, but it is not as portable as RubyGems.

Rip was recently released (well, it is only version 0.0.1, but still) as something to use alongside RubyGems. It does not allow relative version requirements (<, <=, >=, >) for dependencies, only exact version requirements. It borrows the concept of virtual environments from the Python world. A different approach to package management out in the wild means people will gain new insights. What can we learn here? Where lies the right balance between rigid, version-specific dependencies and open-ended ones?
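
To make the contrast concrete: RubyGems supports both styles in a gemspec, and rip's model amounts to only ever using the exact form. A sketch, with made-up gem names:

require 'rubygems'

Gem::Specification.new do |s|
  s.name    = 'mylib' # hypothetical gem
  s.version = '1.0.0'

  # Rigid, rip-style: exactly this version and nothing else.
  s.add_dependency 'money', '= 1.7.1'

  # Open-ended, RubyGems-style: any version from 1.7 onwards.
  s.add_dependency 'textutils', '>= 1.7'
end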

Thinking along the dependency management line: why do we require exact versions, or put an upper limit on accepted versions? The only reason I can think of is incompatibilities introduced in later versions, but is it right at all to introduce backward incompatibilities in your API? Can't we learn something from functional programming here?

In FP, pure functions don't have side effects. One of the implications is that the data they receive does not get altered. You don't add a new item to an existing array; you return a new array with the new item appended to the existing array. Because of this, there is no problem when you have a multi-threaded program: there is no risk that two threads will try to modify a shared resource at the same time.

This means you don't need mutexes to lock an object to one thread while it manipulates the object. No mutexes means no deadlocks or other headaches associated with threading.
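
In Ruby terms, the difference looks like this (a toy example):

list = [1, 2, 3].freeze

# Mutation is a side effect; on a frozen array this would raise an error:
# list << 4

# The functional style returns a new array and leaves the original alone,
# so nobody else holding a reference to list is ever surprised.
bigger = list + [4]

bigger # => [1, 2, 3, 4]
list   # => [1, 2, 3]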

What was I talking about? Ah yes, dependencies and how they relate to functional programming. Explicit version dependencies can be seen as mutexes: only one version is allowed to be used at once. Two versions of a library cannot be loaded at the same time. This is good if the two versions are incompatible. It is bad if the newer version only adds new functionality to the library.

What if you built your library in a way that resembles the pure functions of functional programming? No side effects in this case means there are no nasty surprises when upgrading. If your program works with version 1 of the library, it will work without changes with version 1000. Existing functionality is immutable.
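
As a toy sketch of what such an append-only library could look like (the module and method names are made up), version 2 reopens the module and only adds a method:

# Version 1 of a hypothetical library: one public method.
module TextLib
  def self.shout(s)
    s.upcase
  end
end

# Version 2 may only add behaviour; shout stays exactly as it was.
module TextLib
  def self.whisper(s)
    s.downcase
  end
end

TextLib.shout('hi')   # => "HI" -- works under v1 and v2 alike
TextLib.whisper('hi') # => "hi" -- only exists under v2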

To make this work, new versions may only introduce new behaviour; they cannot change old behaviour. I think bugfixes would be OK, but performance enhancements are not, as they might introduce negative side effects in some edge cases, thereby breaking someone's app. Then again, fixing a bug will also break someone's app if they depended on the buggy behaviour. Hmmm....

This makes dependency management rather easy. You set a minimum version requirement for the libraries you use and you can just upgrade the libraries to newer versions as they become available. New applications that use new features can co-exist with old applications that use old features from the same library.

Under this model, if you plan to radically re-architect a project, you could fork it and release it under a new name: Rails v1, Rails2 v1, Rails3 v1. A downside is that forks may carry a large shared codebase, but there will no longer be conflicts between versions of one project.

Has anyone ever explored the possibilities of library development along these lines? Did it work or were there problems that I have overlooked? What good features of the 'current' systems would you lose?

Sunday, June 21, 2009

Kings of Code side-event: Amsterdam.rb unconference

The Kings of Code (KoC) conference is going to be held on 30 June 2009. The day before, Monday 29 June, is side-event day. Sander van der Vliet, the KoC organizer, posted a message to the Amsterdam.rb mailing list last Wednesday to ask if we were interested in organizing a side-event on the 29th.

After only a couple of us replied to Sander's e-mail, we knew none of the usual people would step forward to organize this. On Friday, Julio Javier Cichelli (@monsieur_rock) sent me a direct message on Twitter asking for my thoughts on how the unconference should be organized. From there, we discussed how to get it organized, how to get speakers and what we were going to present. In short, we stepped forward to organize the side-event.

Next I tweeted to ask for presenters. The message got re-tweeted a number of times and almost within minutes there was feedback from multiple people willing to do a presentation. By the end of the day we had six people willing to speak. Using Twitter to organize something like this is remarkably quick and powerful.

As more and more people indicate they are willing to speak (thank you all!), the focus moves from finding speakers to handling the details of making it all work. At what time do we start? How many hours do we have? Is there wifi? Is there a beamer? The further you go, the more things you discover that you still need to find out or arrange.

Next week we need to find a sponsor for the venue and we need to start thinking about the things that need to be done on the event day itself. We also need to confirm time and location with all presenters and announce the side-event.

Unconference

The side-event is an unconference. I have never been to one, so I can only go by what is on the internet. A characteristic of unconferences is that there is no fixed agenda. There are no time slots. It is not about one person being an expert and bestowing wisdom upon the attendees, but about the attendees sharing wisdom with each other. I like that.

Everybody knows something other people can benefit from, so the more opportunities there are for everyone to contribute, the more everyone will learn. Any one of the attendees can decide on the spot they want to talk about something, show code or sing a song. I hope people will do this.

The Devnology meetings have impressed upon me the importance of interactivity at a gathering of people, so I hope we can give the unconference an interactive twist.

After a presenter is done speaking, we'll try to get a group discussion started on the topic. Once the discussion starts to fade, or starts to run in circles, we can ask the next speaker to get on stage and introduce the next topic.

After the last speaker, we can try to spark group discussions by encouraging people to approach the speakers and ask them questions. This, in turn, can create a number of smaller discussions, with the speakers being the center of interactivity. It's a great way to get to know new people.

Between 6 more or less confirmed speakers, group discussions, short breaks and (I hope) spontaneous speakers, it looks like we will actually fill up the 5 hours we have available to us.

Finding a theme

Unconferences tend to have a theme, so people can prepare themselves and so there is some coherence between the talks. This is trickier, as I did not really think about it until now.

Here is a list of topics that people have expressed they want to talk about:
  • CouchDB (or an introduction to Erlang)
  • Communicative Programming with Ruby
  • Code reviews
  • Using Rails for Location-based search
  • Short and Sweet II
  • MacRuby, RESTful web services and other cool things

If I do a bit of creative extrapolating, one topic that can be extracted from this is "The Future of Web Development (using Ruby)". Let me explain by briefly looking at each topic:
  • CouchDB is a possible future of databases. It's not relational, so it has different scaling needs compared to 'traditional' relational databases.
  • RESTful services are the next big thing. Within the Ruby/Rails world, REST is becoming a de facto standard for how to design a web service. The rest of the webdev world seems to be following along here.
  • If you look at the last decade or two and how the dominant languages have changed, it becomes apparent that code is getting much more readable. People use more expressive languages that do more with less code, and less code is simply easier to read. Code has become more communicative (at least in Ruby) thanks to the focus on good conventions like intention-revealing naming. DSLs are another good example of readability: if a non-programmer can read your code, you know it is readable (see the sketch after this list).
  • Alternative Ruby implementations are a way into the future for the language. Diversity allows different ideas to be explored at the same time. The same goes for alternative web frameworks. They are a breeding ground for innovation, which is what you need to get a future that is different from the present.
  • Code reviews are a way to ensure that code written in the past is actually good enough to be kept around in the future.
  • Location-based search has a futuristic sound to it, so it fits the theme.
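
As a tiny illustration of that readability point, compare a bare computation with an intention-revealing version; both examples are made up:

# Hard to read: the intent is buried in the arithmetic.
def calc(p, q)
  p * q * 1.19
end

# Intention-revealing names make the same logic read almost like prose.
VAT_RATE = 0.19

def price_including_vat(unit_price, quantity)
  subtotal = unit_price * quantity
  subtotal + subtotal * VAT_RATE
end

price_including_vat(10.0, 3) # => 35.7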


Thoughts on The Future of Web Development (using Ruby)

The Future is an interesting topic that can be applied in a lot of ways. Here are a number of ideas for presentations:

The future of server administration

With VPSes and cloud computing available everywhere, is there still a need to own your server hardware? With services like Heroku, GitHub Pages and Disqus, do you even still need to know how to install Ruby or how to configure Apache?

Even if you don't use these services, tools like Capistrano, Ubuntu Machine, Deprec or Rudy can still simplify deployment and server management. Does simplifying these things create new opportunities? What does a sysadmin do with the time these tools and services save?
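
As an illustration of how little code this takes nowadays, here is a minimal Capistrano 2 recipe; the application name, repository and server names are made up:

# config/deploy.rb -- a minimal Capistrano recipe for a hypothetical app.
set :application, "myapp"
set :repository,  "git://example.com/myapp.git"
set :scm,         :git
set :deploy_to,   "/var/www/myapp"

role :app, "app1.example.com"
role :web, "app1.example.com"
role :db,  "db1.example.com", :primary => true

# Restart Passenger by touching tmp/restart.txt after each deploy.
namespace :deploy do
  task :restart, :roles => :app do
    run "touch #{current_path}/tmp/restart.txt"
  end
end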

The future of package management

I have addressed this in a separate blog post.

The future of web-based communication

I have addressed this in a separate blog post.

My personal experience so far

Between asking Sander for info (via Twitter, of course), discussing things with Julio and discussing details with speakers, there is quite a lot of communication going on. It's exciting and a little scary at the same time.

I'm a natural introvert, so I tend to avoid communication when I can get away with it. It's not often that I approach other people first. Taking an active role in helping to organize an unconference like this is therefore quite a bit outside of my comfort zone.

So why the heck am I doing this? One reason is that I want to see it happen: if nobody does it for you, do it yourself. The other reason is that I want to expand my comfort zone. Someone wise once wrote: "If something does not scare you at least a little bit, it is not worth doing."
It might sound a bit extreme, but the core idea is valuable nonetheless: a very good way to learn is to do the things that scare you.

Helping to organize a side-event for KoC will most definitely be a great learning experience.

Thursday, June 11, 2009

Upgraded to Ruby 1.9.1

The short story

Today I upgraded my MacBook to Ruby 1.9.1 (patchlevel 129) as the main version of Ruby I use. It was not really intentional, but now that I have it, I'm kind of sticking with it. That's the short story. There's also a long story that involves a server, lots of logs and me not paying attention.

The long story

Earlier today I wanted to analyze 2.5GB of Rails log files. Because it is not such a good idea to do that on a live production server, I decided to use the one server that never really does anything: the backup server. It's hidden all the way in the back of our server network, far away from the bustle of our webservers, so it is the perfect place to do some heavy number crunching. After sending the log files over with scp -C (-C stands for compress), I tried to install the request-log-analyzer gem, but there was no gem command.

A quick ruby -v resulted in bash telling me there is no ruby. My confused reaction went along the lines of: "No Ruby? What? We have a server without Ruby? How can this be?" I checked the server's sources dir and there was actually a dusty tarball for Ruby 1.8.6 sitting undisturbed. I immediately jumped on it, unpacked it and ran ./configure.

While waiting for this to finish, I thought about Ruby 1.9.1 and its promise of speed, and about the huge stack of logs I was planning to start working on. I never made it to the make && sudo make install part for Ruby 1.8.6.

After downloading the latest Ruby 1.9.1 tarball to my desktop and sending it through a chain of servers to the poor ruby-less backup server, a ./configure && make && sudo make install made it all happy again. It actually purrs if you listen close enough to your SSH session.

In the meantime I figured I'd upgrade my local Ruby 1.9.1p0 install to the latest patchlevel, so I performed the ./configure && make && sudo make install ritual on my own machine as well. Out of habit I always run a '-v' check to see if the version actually got installed, but I accidentally typed ruby -v instead of ruby1.9 -v and to my surprise it said:
ruby 1.9.1p129 (2009-05-12 revision 23412) [i386-darwin9.7.0]
Oh, oh. That was not supposed to happen. That should have been Ruby 1.8.6!
A check for ruby1.9 showed it was still the old Ruby 1.9.1:
ruby 1.9.1p0 (2009-01-20 revision 21700) [i386-darwin9]
Since the server was done installing as well, I jumped back over there. Unfortunately, request-log-analyzer did not like Ruby 1.9:

$ request-log-analyzer log/production.log
Request-log-analyzer, by Willem van Bergen and Bart ten Brinke - version 1.1
Website: http://github.com/wvanbergen/request-log-analyzer

/usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/lib/request_log_analyzer.rb:27:in `require': /usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/lib/request_log_analyzer/output/fixed_width.rb:48: invalid multibyte char (US-ASCII) (SyntaxError)
/usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/lib/request_log_analyzer/output/fixed_width.rb:48: invalid multibyte char (US-ASCII)
/usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/lib/request_log_analyzer/output/fixed_width.rb:48: syntax error, unexpected $end, expecting '}'
... => { :horizontal_line => '━', :vertical_line => '┃', ...
... ^
from /usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/lib/request_log_analyzer.rb:27:in `load_default_class_file'
from /usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/lib/request_log_analyzer/output.rb:4:in `const_missing'
from /usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/lib/request_log_analyzer/controller.rb:38:in `const_get'
from /usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/lib/request_log_analyzer/controller.rb:38:in `build'
from /usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/bin/request-log-analyzer:88:in `<top (required)>'
from /usr/local/bin/request-log-analyzer:19:in `load'
from /usr/local/bin/request-log-analyzer:19:in `<main>'
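
Ruby 1.9 treats source files as US-ASCII unless told otherwise, so any file containing literal UTF-8 characters (like the ━ and ┃ above) needs a magic encoding comment on its first line. The usual fix is one line at the top of a file like fixed_width.rb (the constant name below is invented for the example):

# encoding: utf-8
# With the magic comment above, Ruby 1.9 happily parses UTF-8 string literals.
LINES = { :horizontal_line => '━', :vertical_line => '┃' }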

So now we know Ruby 1.9 is strict about character encoding and does not like this particular version of the gem. My natural reaction was to switch to my local machine, check out the source from GitHub and build a new gem:

$ gh clone wvanbergen/request-log-analyzer
/usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/github/extensions.rb:11: warning: undefining `object_id' may cause serious problem
/usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/github.rb:149:in `module_eval': /usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/commands/commands.rb:40: syntax error, unexpected ')' (SyntaxError)
helper.tracking.sort { |(a,),(b,)| a == helper.origin ? -...
^
/usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/commands/commands.rb:40: syntax error, unexpected '|', expecting '='
...per.tracking.sort { |(a,),(b,)| a == helper.origin ? -1 : b ...
... ^
/usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/commands/commands.rb:40: syntax error, unexpected '}', expecting keyword_end
...rigin ? 1 : a.to_s <=> b.to_s }.each do |(name,user_or_url)|
... ^
from /usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/github.rb:149:in `load'
from /usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/github.rb:66:in `block in activate'
from /usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/github.rb:65:in `each'
from /usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/github.rb:65:in `activate'
from /usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/bin/gh:8:in `<top (required)>'
from /usr/local/bin/gh:19:in `load'
from /usr/local/bin/gh:19:in `<main>'
Aargh! Another gem that does not play well with Ruby 1.9.

At this point I was kind of fed up with gems not working with Ruby 1.9, so I decided to use a script I wrote ages ago to do simple log crunching. It did not look as pretty as request-log-analyzer, but since it was my script, I felt it would be easiest to fix if it was wrong.

The script did need a little tweaking to work on Ruby 1.9, but that went pretty OK. The script started crunching, and crunching, and crunching, and grew to about 800MB (not bad for holding about a gazillion URLs and their call times, standard deviations and other relevant numbers). It was mostly done, generating a new report highlighting different stats every minute or so. And then the Ruby process died. There was something about character encoding and UTF-8:

analyze_log.rb:232:in `split': invalid byte sequence in UTF-8 (ArgumentError)
from analyze_log.rb:232:in `block in <main>'
from analyze_log.rb:231:in `each'
from analyze_log.rb:231:in `<main>'
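
One way to guard against this in Ruby 1.9, assuming the log is read line by line, is to skip lines whose bytes are not valid UTF-8 before splitting them; String#valid_encoding? does the check:

# Skip log lines that are not valid UTF-8 instead of letting
# String#split raise an ArgumentError halfway through the crunch.
File.open('production.log', 'r') do |file|
  file.each_line do |line|
    next unless line.valid_encoding?
    fields = line.split(' ')
    # ... crunch the fields ...
  end
end
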
Luckily, enough reports had been generated that I could look at the numbers I wanted to see.

Conclusion

This experience has taught me two things:
  • Being an early adopter is rarely a smooth experience. You're playing with new features before other people do, but the flip side is that you will run into problems and that you will have to fix them yourself.
  • I really dislike character encodings. Tell me again, why aren't we just using plain old ASCII? *grumbles*
Next up is checking and fixing my own gems to make sure they at least work with Ruby 1.9.1p129. A better world starts with you doing your best to improve it.