My recent post on the Kings of code side event got too long, so I extracted the following into its own blog post. It is a collection of thoughts on a possible presentation topic.
People like to communicate with each other. Centuries ago we wrote letters and sent them with the merchants, hoping they would arive. The telegraph was a revolution: we could send a message faster, cheaper and with more certainty of delivery. The telephone was even more revolutionary: direct communication over a long distance.
In the age of the internet, change happens even faster. E-mail has been around for "ages", just like various forms of chat services have come and gone.
Broadcasting has gone through a similar change. It started with spoken announcements; proclamations from the king. At some point there were pamphlets and posters. Books can be seen as a way to broadcast a message. Newspapers are a periodic form of broadcast. Radio enabled long-distance audio broadcasting without the cost of creating a physical carrier for the message. Television added moving images to radio.
Now we have the internet, where we go through similar stages.
Static, web 1.0, websites are pamplets re-invented. Old concepts in an electronic shape. Ebooks are electronic books. Newspapers try to put their content on their own websites, updating them daily to bring their news to the masses. E-mail newsletters just scream "newspaper" to me. Radio can be found as streaming audio.
As we have become more familiar with the internet and with an increased access through broadband, cable and fiber, we have started to innovate with the new medium. RSS changed the direction of broadcast from push to pull. youTube may have started as a way to share existing videos, but it has since grown to a place where anyone can make themselves heard. That is many to many communication.
The internet made interactivity a lot easier than in the off-line world. Web forums and UseNet allow groups of users to interact with each other through written messages. Blogs allow everyone to have their own newspaper column. Instead of an opinions page in the newspaper that is the internet, every writer gets their own column and they all respond and refer to each other's writing.
Twitter is the latest thing. It is a hybrid between instant messaging, e-mail and RSS feeds. People say it does not scale, yet the Twitter engineers keep making it better and more and more people are able to use it. Is there a limit to which it can scale? Is its centralized server model not going to be an important limitation on both scale and freedom later on? When there are six billion people using one and the same service, relying on it for an important part of their daily communications, how much can you trust on one company to take care of it?
Would it not be better to turn it into a distributed service? It has sucessfully scaled e-mail, Jabber/XMPP, the telephone network and internet itself. E-mail, as a world-wide service, has never gone down, even if individual servers might go down from time to time.
Traditional media has its problems. Paper flyers, mail-delivered advertising and commercials on the radio and television are a few examples. The relatively high price of print media or traditional broadcasting limits the amount of these forms of advertising. This makes it somewhat bearable. In contrast, spam via e-mail, blog comments, web forum posts and instant messages have no such limitations. It's virtually free to broadcast your message to a million people, so it happens a lot and people really don't like it.
E-mail spam can happen because there is a near-zero cost or risk for the sender. The same goes for other on-line communication. It can be interesting to consider the friend-of-a-friend model, such as is seen on Linked In and other social networks?
To send a message to someone, the whole connection chain between sender and receiver must be known, no matter how long it is. If someone spams, it means there is a chain of real people connecting them to you. This means the sender is traceable instead of anonymous. If you flag a message as spam, the whole chain is notified that they were part of a spam chain. This means people can choose to ban the spammer from using them as a connection in sending a message. It also means you can identify people who act as a gateway for spammers to send messages to other people.
The nice thing about treating communication as a social network activity is that it makes people more aware that they are dealing with people. By taking anonymity out of the equation, it lowers the tendency of people to act like a Total Fuckwad when they think nobody is watching them.
Are there other ways to look at communication, to turn it upside-down and re-investigate how it works? What is going to be the next Twitter?
Wednesday, June 24, 2009
Monday, June 22, 2009
A possible future for package management
My recent post on the Kings of code side event got too long, so I extracted the following into its own blog post. It is a collection of thoughts on a possible presentation topic.
RubyGems has been around for ages and has made it relatively easy to distribute Ruby code. Not everyone uses it, though. Some prefer to use the Debian package manager, or whatever their OS provides, instead. This is very useful if a gem has external dependencies, but it is not as portable as RubyGems.
RIP was recently released (well it is only version 0.0.1, but still) as something to use complementary to RubyGems. It does not allow relative version requirements (<, <=, >=, >) for dependencies, only exact version requirements. It borrows the concept of virtual environments from the Python world. A different approach to package management out in the wild means people will gain new insights. What can we learn here? Where lies the right balance between having rigid, version specific, dependencies and open-ended dependencies?
Thinking along the dependency management line, why do we require exact versions or do we put an upper limit on accepted versions? The only reason I can think of is incompatibilities introduced in later versions, but is it right at all to introduce backward incompatibilities in your API? Can't we learn something from functional programming here?
In FP, pure functions don't have side effects. One of the implications is that the data they receive does not get altered. You don't add a new item to an existing array; you return a new array with the new item appended to the existing array. Because of this, there is no problem when you have a multi-threaded program: there is no risk that two threads will try to modify a shared resource at the same time.
This means you don't need mutexes to lock an object to one thread while it manipulates the object. No mutexes means no deadlocks or other headaches associated with threading.
What was I talking about? Ah yes, dependencies and how they relate to functional programming. Explicit version dependencies can be seen as mutexes: only one version is allowed to be used at once. Two versions of a library can not be loaded at the same time. This is good if the two versions are incompatible. It is bad if the newer version only adds new functionality to the library.
What if you would build your library in a way that resembles the pure functions of functional programming? No side effects in this case means there are no nasty surprises when upgrading. If your program works with version 1 of the library, it will work without changes with version 1000. Existing functionality is immutable.
To make this work, new versions should only introduce new behaviour, they can not change old behaviour. I think making bugfixes would be ok, but performance enhancements are not, as it might be that you introduce negative side effects in some edge cases, thereby breaking someone's app. Maybe fixing bugs will actually break someone's app if they depended on buggy behaviour. Hmmm....
This makes dependency management rather easy. You set a minimum version requirement for the libraries you use and you can just upgrade the libraries to newer versions as they become available. New applications that use new features can co-exist with old applications that use old features from the same library.
Under this model, if you plan to radically re-architect a project, you could fork it and release it under a new name. Rails v1., Rails2 v1, Rails3 v1. A downside is that forks can have a large shared codebase, but there will no longer be conflicts between versions of one project.
Has anyone ever explored the possibilities of library development along these lines? Did it work or were there problems that I have overlooked? What good features of the 'current' systems would you lose?
RubyGems has been around for ages and has made it relatively easy to distribute Ruby code. Not everyone uses it, though. Some prefer to use the Debian package manager, or whatever their OS provides, instead. This is very useful if a gem has external dependencies, but it is not as portable as RubyGems.
RIP was recently released (well it is only version 0.0.1, but still) as something to use complementary to RubyGems. It does not allow relative version requirements (<, <=, >=, >) for dependencies, only exact version requirements. It borrows the concept of virtual environments from the Python world. A different approach to package management out in the wild means people will gain new insights. What can we learn here? Where lies the right balance between having rigid, version specific, dependencies and open-ended dependencies?
Thinking along the dependency management line, why do we require exact versions or do we put an upper limit on accepted versions? The only reason I can think of is incompatibilities introduced in later versions, but is it right at all to introduce backward incompatibilities in your API? Can't we learn something from functional programming here?
In FP, pure functions don't have side effects. One of the implications is that the data they receive does not get altered. You don't add a new item to an existing array; you return a new array with the new item appended to the existing array. Because of this, there is no problem when you have a multi-threaded program: there is no risk that two threads will try to modify a shared resource at the same time.
This means you don't need mutexes to lock an object to one thread while it manipulates the object. No mutexes means no deadlocks or other headaches associated with threading.
What was I talking about? Ah yes, dependencies and how they relate to functional programming. Explicit version dependencies can be seen as mutexes: only one version is allowed to be used at once. Two versions of a library can not be loaded at the same time. This is good if the two versions are incompatible. It is bad if the newer version only adds new functionality to the library.
What if you would build your library in a way that resembles the pure functions of functional programming? No side effects in this case means there are no nasty surprises when upgrading. If your program works with version 1 of the library, it will work without changes with version 1000. Existing functionality is immutable.
To make this work, new versions should only introduce new behaviour, they can not change old behaviour. I think making bugfixes would be ok, but performance enhancements are not, as it might be that you introduce negative side effects in some edge cases, thereby breaking someone's app. Maybe fixing bugs will actually break someone's app if they depended on buggy behaviour. Hmmm....
This makes dependency management rather easy. You set a minimum version requirement for the libraries you use and you can just upgrade the libraries to newer versions as they become available. New applications that use new features can co-exist with old applications that use old features from the same library.
Under this model, if you plan to radically re-architect a project, you could fork it and release it under a new name. Rails v1., Rails2 v1, Rails3 v1. A downside is that forks can have a large shared codebase, but there will no longer be conflicts between versions of one project.
Has anyone ever explored the possibilities of library development along these lines? Did it work or were there problems that I have overlooked? What good features of the 'current' systems would you lose?
Labels:
dependencies,
package management,
ruby,
versions
Sunday, June 21, 2009
Kings of Code side-event: Amsterdam.rb unconference
The Kings of Code (KoC) conference is going to be held on 30 June 2009. The day before, monday 29 June, is side-event day. Sander van der Vliet, the KoC organizer, posted a message to the Amsterdam.rb mailing list last wednesday to ask if we were interested in organizing a side-event on the 29th.
After only a couple of us replied to Sander's e-mail, we knew none of theheroes usual people would step forward to organize this. On friday, Julio Javier Cichelli (@monsieur_rock) sent me a direct message on Twitter about my thoughts on how the unconference should be organized. From there, we discussed how to get it organized, how to get speakers and what we were going to present. In short, we stepped forward to organize the side-event.
Next I tweeted to ask for presenters. The message got re-tweeted a number of times and almost within minutes there is feedback from multiple people willing to do a presentation. At the end of the day we have 6 people willing to speak. Using Twitter to organize something is a really quick and powerful way to do it.
As more and more people indicate they are willing to speak (thank you all!) the focus moves from finding speakers to handling the details of making it all work. At what time do we start? How many hours do we have? Is there wifi? Is there a beamer? The further you go, the more you discover there are things you should find out or arrange.
Next week we need to find a sponsor for the venue and we need to start thinking about the things that need to be done on the event day itself. We also need to confirm time and location with all presenters and announce the side-event.
The side-event is an unconference. I have never been to one, so I can only go by what is on the internet. A characteristic of unconferences is that there is no fixed agenda. There are no time slots. It is not about one person being an expert and bestowing wisdom upon the attendees, but about the attendees sharing wisdom with each other. I like that.
Everybody knows something other people can benefit from, so the more opportunities there are for everyone to contribute, the more everyone will learn. Any one of the attendees can decide on the spot they want to talk about something, show code or sing a song. I hope people will do this.
The Devnology meetings have impressed upon me the importance of interactivity at a gathering of people, so I hope we can give the unconference an interactive twist.
After a presenter is done speaking, we'll try to get a group discussion started on the topic. Once the discussion starts to fade, or starts to run in circles, we can ask for the next speaker to get on stage and introduce a next topic.
After the last speaker, we can try to spark group discussions by encouraging people to approach the speakers and ask them questions. This, in turn, can create a number of smaller discussions, with the speakers being the center of interactivity. It's a great way to get to know new people.
Between 6 more or less confirmed speakers, group discussions, short breaks and (I hope) spontaneous speakers, it looks like we will actually fill up the 5 hours we have available to us.
Unconferences tend to have a theme, this is so people can prepare themselves and to have some form of coherence between talks. This is trickier, as I did not really think about this until now.
Here is a list of topics that people have expressed they want to talk about:
If I do a bit of creative extrapolating, one topic that can be extracted from this is "The Future of Web Development (using Ruby)". Let me explain by briefly looking at each topic:
The Future is an interesting topic that can be applied in a lot of ways. Here's a number of ideas for presentations:
With VPSes and cloud computing becoming available everywhere, is there still a need to own your server hardware? With services like Heroku, Github webpages and Disqus, do you still need to even know how to install Ruby or how to configure Apache?
Even if you don't use these services, by using tools like Capistrano, Ubuntu Machine, Deprec or Rudy, you can still simplify deployment and server management. Does simplifying these things bring new opportunities? What does a sysadmin do with the time saved by these tools and services? Are there new possibilities opened by freeing up sysadmin time? What are they?
I have addressed this in a separate blog post.
I have addressed this in a separate blog post.
Between asking Sander for info (via Twitter, of course), discussing things with Julio and discussing details with speakers, there is quite some communication going on. It's exciting and a little scary at the same time.
I'm a natural introvert, so I tend to avoid communication when I can get away with it. It's not often that I approach other people first. Taking an active role in helping to organize an unconference like this is therefore quite a bit outside of my comfort zone.
So why the heck am I doing this? One reason is because I want to see it happen. If nobody does it for you, do it yourself. The other reason is is that I want to expand my comfort zone. Someone wise once wrote: "If something does not scare you at least a little bit, it is not worth doing."
It might sound be a bit extreme, but the core idea is valuable nonetheless: a very good way to learn things is to do the things that scare you.
Helping to organize a side-event for KoC will most definitely be a great learning experience.
After only a couple of us replied to Sander's e-mail, we knew none of the
Next I tweeted to ask for presenters. The message got re-tweeted a number of times and almost within minutes there is feedback from multiple people willing to do a presentation. At the end of the day we have 6 people willing to speak. Using Twitter to organize something is a really quick and powerful way to do it.
As more and more people indicate they are willing to speak (thank you all!) the focus moves from finding speakers to handling the details of making it all work. At what time do we start? How many hours do we have? Is there wifi? Is there a beamer? The further you go, the more you discover there are things you should find out or arrange.
Next week we need to find a sponsor for the venue and we need to start thinking about the things that need to be done on the event day itself. We also need to confirm time and location with all presenters and announce the side-event.
Unconference
The side-event is an unconference. I have never been to one, so I can only go by what is on the internet. A characteristic of unconferences is that there is no fixed agenda. There are no time slots. It is not about one person being an expert and bestowing wisdom upon the attendees, but about the attendees sharing wisdom with each other. I like that.
Everybody knows something other people can benefit from, so the more opportunities there are for everyone to contribute, the more everyone will learn. Any one of the attendees can decide on the spot they want to talk about something, show code or sing a song. I hope people will do this.
The Devnology meetings have impressed upon me the importance of interactivity at a gathering of people, so I hope we can give the unconference an interactive twist.
After a presenter is done speaking, we'll try to get a group discussion started on the topic. Once the discussion starts to fade, or starts to run in circles, we can ask for the next speaker to get on stage and introduce a next topic.
After the last speaker, we can try to spark group discussions by encouraging people to approach the speakers and ask them questions. This, in turn, can create a number of smaller discussions, with the speakers being the center of interactivity. It's a great way to get to know new people.
Between 6 more or less confirmed speakers, group discussions, short breaks and (I hope) spontaneous speakers, it looks like we will actually fill up the 5 hours we have available to us.
Finding a theme
Unconferences tend to have a theme, this is so people can prepare themselves and to have some form of coherence between talks. This is trickier, as I did not really think about this until now.
Here is a list of topics that people have expressed they want to talk about:
- CouchDB (or an introduction to Erlang)
- Communicative Programming with Ruby
- Code reviews
- Using Rails for Location based search
- Short and Sweet II
- MacRuby, RESTful web services and other cool things
If I do a bit of creative extrapolating, one topic that can be extracted from this is "The Future of Web Development (using Ruby)". Let me explain by briefly looking at each topic:
- CouchDB is a possible future of databases. It's not relational, so it has different scaling needs compared to 'traditional' relational databases.
- RESTful services are the next big thing. Within the Ruby/Rails world it's becoming a de-facto standard on how to design a web service. The rest of the webdev world seems to be following along here.
- If you look at the last decade or two and how the dominant languages have changed, it becomes apparent that code is getting way more readable. Shorter, leaner code is more readable because there is just less code. People use more expressive languages that can do more with less code. Code has become more communicative (at least in Ruby) because of the focus on good conventions like intention revealing naming. DSLs are another good example of readability. If a non-programmer can read your code, you know it is readable.
- Alternative Ruby implementations are a way into the future for the language. Diversity allows different ideas to be explored at the same time. The same goes for alternative web frameworks. They are a breeding ground for innovation, which is what you need to get a future that is different from the present.
- Code reviews are a way to ensure that code written in the past is actually good enough to be kept around in the future.
- Location-based search has a futuristic sound to it, so it fits the theme.
Thoughts on The Future of Web Development (using Ruby)
The Future is an interesting topic that can be applied in a lot of ways. Here's a number of ideas for presentations:
The future of server administration
With VPSes and cloud computing becoming available everywhere, is there still a need to own your server hardware? With services like Heroku, Github webpages and Disqus, do you still need to even know how to install Ruby or how to configure Apache?
Even if you don't use these services, by using tools like Capistrano, Ubuntu Machine, Deprec or Rudy, you can still simplify deployment and server management. Does simplifying these things bring new opportunities? What does a sysadmin do with the time saved by these tools and services? Are there new possibilities opened by freeing up sysadmin time? What are they?
The future of package management
I have addressed this in a separate blog post.
The future of web-based communication
I have addressed this in a separate blog post.
My personal experience so far
Between asking Sander for info (via Twitter, of course), discussing things with Julio and discussing details with speakers, there is quite some communication going on. It's exciting and a little scary at the same time.
I'm a natural introvert, so I tend to avoid communication when I can get away with it. It's not often that I approach other people first. Taking an active role in helping to organize an unconference like this is therefore quite a bit outside of my comfort zone.
So why the heck am I doing this? One reason is because I want to see it happen. If nobody does it for you, do it yourself. The other reason is is that I want to expand my comfort zone. Someone wise once wrote: "If something does not scare you at least a little bit, it is not worth doing."
It might sound be a bit extreme, but the core idea is valuable nonetheless: a very good way to learn things is to do the things that scare you.
Helping to organize a side-event for KoC will most definitely be a great learning experience.
Labels:
amsterdam.rb,
conference,
kings of code,
ruby
Thursday, June 11, 2009
Upgraded to Ruby 1.9.1
The short story
Today I upgraded my Macbook to ruby 1.9.1 (patchlevel 129) as the main version of Ruby I use. It was not really intentional, but now that I have it, I'm kind of sticking with it. That's the short story. There's also a long story that involves a server, lots of logs and me not paying attention.
The long story
Earlier today I wanted to analyze 2.5GB of Rails log files. Because it is not such a good idea to do that on a live production server, I decided to use the one server that never really does anything: the backup server. It's hidden all the way in the back of our server network, far away from the business of our webservers, so it is the perfect place to do some heavy number crunching. After sending the 2.5GB of Rails log files over with scp -C (-C stands for compress) I tried to install the request-log-analyzer gem, but there was no gem command.
A quick ruby -v resulted in bash telling me there is no ruby. My confused reaction went along the lines of: "No Ruby? What? We have a server without Ruby? How can this be?" I checked the server's sources dir and there was actually a dusty tarball for ruby 1.8.6 sitting undisturbed. I immediately jumped on it and started to unpack it and ran ./configure.
While waiting for this to end, I thought about Ruby 1.9.1 and it's promise of speed and the huge stack of logs I was planning to start working on. I never made it to the make && sudo make install part for Ruby 1.8.6.
After downloading the latest Ruby 1.9.1 tarball to my desktop and sending it through a chain of servers to the poor ruby-less backup server, a ./configure && make && sudo make install made it all happy again. It actually purrs if you listen close enough to your SSH session.
In the meanwhile I figured I'd upgrade my local Ruby 1.9.0p0 install to the latest patchlevel, so I also perfomed the ./configure && make && sudo make install ritual on my own machine as well. As a habit I always run a '-v' check to see if the version did get installed, but I accidentally typed ruby -v instead of ruby1.9 -v and to my surprise it said:
ruby 1.9.1p129 (2009-05-12 revision 23412) [i386-darwin9.7.0]Oh, oh. That was not supposed to happen. That should have been Ruby 1.8.6!
A check for ruby1.9 showed it was still the old Ruby 1.9.1:
ruby 1.9.1p0 (2009-01-20 revision 21700) [i386-darwin9]Since the server was done installing as well, I jumped over to there. Unfortunately, request-log-analyzer did not like ruby 1.9:
$ request-log-analyzer log/production.log
Request-log-analyzer, by Willem van Bergen and Bart ten Brinke - version 1.1
Website: http://github.com/wvanbergen/request-log-analyzer
/usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/lib/request_log_analyzer.rb:27:in `require': /usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/lib/request_log_analyzer/output/fixed_width.rb:48: invalid multibyte char (US-ASCII) (SyntaxError)
/usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/lib/request_log_analyzer/output/fixed_width.rb:48: invalid multibyte char (US-ASCII)
/usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/lib/request_log_analyzer/output/fixed_width.rb:48: syntax error, unexpected $end, expecting '}'
... => { :horizontal_line => '━', :vertical_line => '┃', ...
... ^
from /usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/lib/request_log_analyzer.rb:27:in `load_default_class_file'
from /usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/lib/request_log_analyzer/output.rb:4:in `const_missing'
from /usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/lib/request_log_analyzer/controller.rb:38:in `const_get'
from /usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/lib/request_log_analyzer/controller.rb:38:in `build'
from /usr/local/lib/ruby/gems/1.9.1/gems/request-log-analyzer-1.1.6/bin/request-log-analyzer:88:in `'
from /usr/local/bin/request-log-analyzer:19:in `load'
from /usr/local/bin/request-log-analyzer:19:in `'
So now we know Ruby 1.9 is strict on character encoding and does not like this particular version of the gem. My natural reaction is to switch to my local machine and do a github checkout of the source and build a new gem:
Aargh! Another gem that does not play well with Ruby 1.9.
$ gh clone wvanbergen/request-log-analyzer
/usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/github/extensions.rb:11: warning: undefining `object_id' may cause serious problem
/usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/github.rb:149:in `module_eval': /usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/commands/commands.rb:40: syntax error, unexpected ')' (SyntaxError)
helper.tracking.sort { |(a,),(b,)| a == helper.origin ? -...
^
/usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/commands/commands.rb:40: syntax error, unexpected '|', expecting '='
...per.tracking.sort { |(a,),(b,)| a == helper.origin ? -1 : b ...
... ^
/usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/commands/commands.rb:40: syntax error, unexpected '}', expecting keyword_end
...rigin ? 1 : a.to_s <=> b.to_s }.each do |(name,user_or_url)|
... ^
from /usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/github.rb:149:in `load'
from /usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/github.rb:66:in `block in activate'
from /usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/github.rb:65:in `each'
from /usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/lib/github.rb:65:in `activate'
from /usr/local/lib/ruby/gems/1.9.1/gems/github-0.3.4/bin/gh:8:in `'
from /usr/local/bin/gh:19:in `load'
from /usr/local/bin/gh:19:in `'
At this point I was kind of fed up with gems not working with Ruby 1.9, so I decided to use a script I wrote ages ago to do simple log crunching. It did not look as pretty as request-log-analyzer, but since it was my script, I felt it would be easiest to fix if it was wrong.
The script did needed a little tweaking to work on Ruby 1.9, but that went pretty ok. The script started crunching, and crunching, and crunching and grew to about 800M (not bad for holding about a gazillion URLs and their call times, standard deviations and more relevant numbers). It was mostly done, generating a new report to highlight different stats every minute or so. And then the Ruby process died. There was something about character encoding and UTF-8:
Luckily, there were enough reports done so that I could look at the numbers I wanted to see.
analyze_log.rb:232:in `split': invalid byte sequence in UTF-8 (ArgumentError)
from analyze_log.rb:232:in `block in'
from analyze_log.rb:231:in `each'
from analyze_log.rb:231:in `'
Conclusion
This experience has taught me two things:
- Being an early adopter is rarely a smooth experience. You're playing with new features before other people do, but the flip side is that you will run into problems and that you will have to fix them yourself.
- I really dislike character encodings. Tell me again, why aren't we just using plain old ASCII? *grumbles*
Labels:
log analysis,
rails-log-analyzer,
ruby,
ruby 1.9
Tuesday, May 12, 2009
Devnology, a bridge between developer communities
What is Devnology?
Devnology is a foundation that organizes meetings for software developers. Their goal is to bridge the gap between the many different communities that exist for the many programming languages and platforms that a software developer can choose to use. The background of the founders is mainly in the Windows world.
In order to learn from others, Devnology was started as a way to help developers share and learn tips and tricks, insights, tools, news and insights. In this respect, it is similar to how the Ruby User Group meetings work as a way to learn from each other. The difference is that the Ruby meetings are more heavily focused on socializing, while Devnology meetings are balanced more towards learning.
Ruby meetings tend to be fun beer nights: we gather in the pub, drink beer (or ice tea) and talk all night long about topics that can include Ruby, tools, new companies or projects, freelancing, pet projects and much, much more. Sometimes someone gets a laptop on the table and shows what they are working on, usually in the form of a live demo or by showing code. The key difference between Ruby meetings and Devnology meetings is the organized format of Devnology meetings. Devnology meetings have a limit on attendance, a pre-arranged location that is either rented or sponsored and they have a speaker or theme to provide enough fuel to keep the conversations going for the whole evening.
Building bridges
Getting to know new people is a chance to learn about new ways of thinking. The more different people you meet, the more you can learn. Within a single community, there tends to be an overlap in certain ways of thinking. As an example, let's contrast Rubyists and Windows developers.
Rubyists, in my experience, tend to be young and eager to try out new things. There are quite a few people that learn on the job and don't have a formal education. A lot of them work at smaller companies or as freelancer and they are pretty passionate about what they do. Due to Rails, a lot of work is in web development. Open Source is very important and everyone is more or less expected to have an account on Github to share their code. A lot of people have a Mac or at least run Linux on their laptop. I don't know anyone who is serious about Windows as a developer platform for Ruby. There are some tools for Windows, but support is poor compared to Linux and Mac. Rubyists tend to be intimate with the command line and they tend to know at least a little bit of how to operate a Linux server. From front-end HTML/CSS design, through Rails code to a MySQL database design, from automated unit testing, integration testing and performance testing, a Ruby developer tends to know at least a bit about everything. Chances are, they will fulfill multiple or all of these roles on their projects.
In contrast, the handful of Windows developers I met at the last Devnology meeting tend to be older than Rubyists and appear a bit more formal. There are quite a bit of consultants working for larger companies. I actually heard people describe themselves as Software Architect and talk about corporate ladders as the most normal thing in the world. For a Rubyist to say that would be strange.
For me, these two groups are kind of opposites. Due to this, they would normally not interact much. This causes knowledge to be discovered independently and to be spread in their own communities. What is new in one group could be discovered a year later by the other group. If the two groups interact and share knowledge, the knowledge sharing might happen earlier. This is where Devnology can add a lot of value.
The git story
A good example of knowledge that is not universal is the Git distributed source code management system. For Rubyists, it is the SCM to use. Subversion is so 2007. In the Windows world, Subversion is still the way to go and Git is largely unknown. Git was developed by Linus Torvalds, of Linux kernel fame, to replace Bitkeeper after Bitkeeper stopped being free. Git has great support on Linux and Mac, but Windows support took a while to get going.
Being distributed, Git does not depend on a central repository to store all code. When you have a checkout of the code, you have the full repository. This includes all history and branches, tags, etc. Git supports a centralized workflow, where everyone pushes their changes to a single server and pulls their updates from there. Through git-svn, it is possible to even use git to interact with Subversion repositories.
The advantages of Git becomes more interesting in the Github model: everyone on the team has a public repository on Github and a private repository on their laptop. You work locally and then push your changes to your public repository. Then you pull changes from other people's public repositories, work offline for a bit and push all changes back to your public repository. Other people can then pull in your changes and so on. This is almost an evolutionary approach to coding, where the best patches get pulled in by a lot of people and remain with the project. For closed source projects, you can use Github's private repositories, which you can share with people of your choice. In the Ruby world, Github is becoming a social network for developers.
My personal "I am so glad I use Git instead of Subversion" moment came when the old server we use to keep a number of private repositories on had a HDD failure. We just put in a new HDD, uploaded my working copies of all repositories and got back to work. During the week we had no central server, we committed code to local branches and ran a built-in git server to share new patches over the network. With Subversion we would have needed to do extra work to regularly backup the central repository and we could not have created new patches during the week the server was down.
Last week's Devnology meeting
Last week wednesday I attended Devnology's second meeting and had a great evening. There were about 18 people, mostly .NET and Java developers, but there were also two Smalltalkers, a Pythonist and me as a Rubyist. Though a lot of references and examples used the Windows platform and it's tools and languages, the discussion went about fundamentals that apply to all platforms, languages and communities. The meeting type was Round Table discussion and it was divided into two parts, each with their own topics.
First part
We gathered a number of topics and then voted on them. The two topics we would be discussing were: Generalist vs Specialist and Learning on the job vs Computer Science.
Generalist vs Specialist was interesting because it immediately became clear that everyone had a different opinion on what the terms meant. Is it specific to a language, a business domain, a platform, a role in the team or something else entirely? What is the scope in which you define these terms? It is interesting to question this. I always thought of myself as a Generalist, because I can do almost anything required in our company: be the sysadmin, be the software architect, design the database, write back-end code, write front-end code, test the system, plan the project and lead the team. On the other hand I am a Specialist, because all my knowledge is focused within the Ruby and Rails environment. I would be lost on Windows with a .NET project to develop a GUI application.
After a bit, the discussion flowed over to Learning on the Job vs Computer Science. Arguments that were put forth for CS are that it gives you a broad knowledge of different ways to solve a problem. Learning on the Job saves you four years and immediately starts to teach you what you need to know. You might not know the theoretical background about why things work the way the do, but you will be able to apply it.
The discussion also highlighted interesting perspectives of people who initially started to work and then later got their CS degree. Going this route gives you a lot more practical context to put the theory in. This is the opposite of the CS-first approach, where you first learn a lot of 'useless' theory that only later on becomes relevant when it gets a context in your job. A lot of theory might never get a proper context, I can imagine.
I initially started on a CS-ish route by studying Artificial Intelligence, but I did not find it challenging enough. For this reason I quit and found myself a job, where I did find a challenge and learned a lot of things. From time to time I do find myself hungry for the knowledge I could have gained at a CS course. The problem is that the school system is, in my opinion, very fake with grades as goals instead of knowledge as supreme goal. It's been four years since I left university and I only have vague memories of most of the things I learned, while I was always among the best scoring students. On the other hand, I also tend to forget how to use certain software libraries I knew intimately half a year ago but never used since then. I think that the details of what you know will fade, but the general concepts you learn will probably expand your way of thinking and stay.
Second part
The question of this part was "What is/are your favorite...?" A couple of suggested things to list were people's top-3 books, blogs, podcasts, tools.
My choice was the Pragmatic Programmers. They started as simple software developers turned authors, but they went on to build a publishing house for software books. Whenever I want to learn something new, I always check in there is a PragProg book or screencast available. They introduced the concept of beta books, which are books that have a beta version published as PDF while the author is still writing the book. This is great, as the author gets a lot of feedback to make the printed book better. If the book has source code that contains a bug, you can just click the 'Report Errata' button at the bottom of the PDF and submit a bug report and possible fix for the code on that page. Getting your hands on an early version of a book also means that you can read it way before the paper version even ships.
A second choice is Peepcode, which sells professional screencasts on a wide range of topics. Recently they also started publishing smaller eBooks as PDF. A lot of early work is focused on Ruby, but more recent work covers a wider range of topics. Non-ruby topics include Git, Emacs, Clojure, Objective-C, Productivity, Javascript and more.
Conclusion
Devnology is a great initiative that I intend to support by means of attending meetings and generating publicity in the Ruby communities I am part of. Sharing knowledge between previously unconnected communities is a good thing and I hope it will be a huge success.
Monday, January 26, 2009
SimpleGate
At yoMedia we frequently have to perform simple tasks on servers that are only reachable by hopping through a number of other servers. To deploy Rails, Merb or Ruby projects, we use Capistrano and it works great. It is easy to configure and you can set it to use any number of gateway servers if that is needed to reach a server.
When deploying ruby gems or executing arbitrary commands, Capistrano does not really work for me. Don't get me wrong, the upload:deploy task is great if you want to send an updated project file to your deployed project. Sending a gem or configuration file to an arbitrary directory on the server is not as easy. It is hard to break out of its project-sized box.
Our strategy so far has been to try and minimize our interaction with servers that were more than one hop away from our laptops. When we did have to restart a daemon or look at log files, we would have to do the SSH hop, hop, hop ritual and do our thing. Then exit, exit, exit until we're back on our laptop's bash shell. Having RSA key logins to a number of servers saves the trouble of having to enter passwords, but the manual SSH-hopping does get tedious after a while.
Being the kind of programmer who rather scripts the tedious things away rather than perfecting his typing speed to speed up repetitive tasks, I figured it must be possible to automate these things. Knowing I'm not the only one with this problem, Google led me to a solution. Net-ssh-gateway (NSG). Thank you, Jamis Buck!
NSG makes it possible to establish an SSH connection through a gateway server to the next gateway server behind it using port forwarding. Repeat this multiple times until you've connected to the final server. The code for this looks like this:
Since this is rather bulky and non-DRY code, let's condense it into something that involves less repeating of code:
Capistrano uses NSG internally for its gateway connection code, but it seems kinda tightly coupled with Capistrano internals. Also, when I looked at it for the first time, I did not really get how it worked or what it did. To gain a better understanding, I decided to extract the relevant code into a script and rework it to be stand-alone.
The great thing about trying to do something with code you don't really understand is that you will understand it once you have dissected it far enough. After that, you can work with it and adapt it to suit your needs.
This resulted in an early version of SimpleGate, my attempt at creating a wrapper library around NSG to make gateway chaining easier. The first version was simply the relevant Capistrano method reworked to work in isolation. The next version improved on the gateway chaining by making at as easy as calling SimpleGate.new.through_to(%w[foo bar baz]) to connect through foo and bar to baz.
SimpleGate also has a ServerDefinition to wrap a simple YAML configuration file that stores the actual server connection information. This is useful for cronjob-activated scripts and other non-interactive code when you have a password authentication server in the chain of gateways. For command line tools using SimpleGate, it saves typing.
Capistrano has a good support for SSH, its configuration files and the various authentication schemes. Passwords and RSA keys are both not a problem. SimpleGate currently only supports passwords, as that is what NSG supports out of the box. RSA key logins are a todo item. For configuration, it does the simplest thing that can possibly work: just store the connection info as a plain YAML file in ~/.servers.yml.
After two minor version bumps, I had something that was good enough to build a script that connected through multiple gateway servers to my target and request its uptime. The next step was to execute arbitrary commands, which was a small modification.
Here I discovered another hard to reproduce feature that Capistrano executes in a really nice way: sudo passwords. For some reason I still have to discover, SimpleGate does not let me enter a password when asked for it on the remote server: it just fails the password check and quits. I guess that is another todo item.
After discovering this I wanted to check up on another server that was hidden behind a number of hops and started to change the hard-coded gateway connection sequence in my test script to connect to the new server.
Woah! Wait! Full stop.
Hard-coded connection info is not good. The server name should be a command line option and the connection sequence should be figured out by the script, not by me. Since I was not interested in manually working out all possible connection sequences, I figured it was time to add a simple path-finding Router class to the project...
After a quick read of relevant sections in Bratko, to refresh my knowledge of the topic, I decided to model the search space as a directed non-cyclical graph and search through it using a simple depth-first recursive search algorithm. Support for cyclical graphs will be another todo.
For every node in the network, all its possible connections are described in a YAML file, that is just a Hash of Arrays with server name strings. A special 'local' node represents my laptop or any arbitrary internet-connected system. The search algorithm comes down to:
A version bump later I remembered by original goal: upload a gem, install it and restart the daemon associated with it. Restarting is all done in userland, so that is not a problem. Installation requires sudo, which is still a todo item. Uploading was still open.
NSG can open a normal net/ssh session. net/sftp can use this session to do file transfers. A quick copy-paste-adjust later I had a new executable for copying a single file to a single server, through an arbitrary number of hops.
Right now SimpleGate is at version 0.5.0 and it has the three items noted above as left to do. Then it has its core functionality and should be polished up for its 1.0 version.
The command line tools should get parameter support and a --help interface. Then the config files should both be documented and command-line editable. Once those things are in, multi-server support might be useful. In a Capistrano-like fashion connect to multiple servers (sequentially or in parallel) and execute commands on all of them or upload file(s) to them. The file uploader can get a better interface instead of mimicking the code.
When deploying ruby gems or executing arbitrary commands, Capistrano does not really work for me. Don't get me wrong, the upload:deploy task is great if you want to send an updated project file to your deployed project. Sending a gem or configuration file to an arbitrary directory on the server is not as easy. It is hard to break out of its project-sized box.
Our strategy so far has been to try and minimize our interaction with servers that were more than one hop away from our laptops. When we did have to restart a daemon or look at log files, we would have to do the SSH hop, hop, hop ritual and do our thing. Then exit, exit, exit until we're back on our laptop's bash shell. Having RSA key logins to a number of servers saves the trouble of having to enter passwords, but the manual SSH-hopping does get tedious after a while.
Being the kind of programmer who rather scripts the tedious things away rather than perfecting his typing speed to speed up repetitive tasks, I figured it must be possible to automate these things. Knowing I'm not the only one with this problem, Google led me to a solution. Net-ssh-gateway (NSG). Thank you, Jamis Buck!
NSG makes it possible to establish an SSH connection through a gateway server to the next gateway server behind it using port forwarding. Repeat this multiple times until you've connected to the final server. The code for this looks like this:
Since this is rather bulky and non-DRY code, let's condense it into something that involves less repeating of code:
Capistrano uses NSG internally for its gateway connection code, but it seems kinda tightly coupled with Capistrano internals. Also, when I looked at it for the first time, I did not really get how it worked or what it did. To gain a better understanding, I decided to extract the relevant code into a script and rework it to be stand-alone.
The great thing about trying to do something with code you don't really understand is that you will understand it once you have dissected it far enough. After that, you can work with it and adapt it to suit your needs.
This resulted in an early version of SimpleGate, my attempt at creating a wrapper library around NSG to make gateway chaining easier. The first version was simply the relevant Capistrano method reworked to work in isolation. The next version improved on the gateway chaining by making at as easy as calling SimpleGate.new.through_to(%w[foo bar baz]) to connect through foo and bar to baz.
SimpleGate also has a ServerDefinition to wrap a simple YAML configuration file that stores the actual server connection information. This is useful for cronjob-activated scripts and other non-interactive code when you have a password authentication server in the chain of gateways. For command line tools using SimpleGate, it saves typing.
Capistrano has a good support for SSH, its configuration files and the various authentication schemes. Passwords and RSA keys are both not a problem. SimpleGate currently only supports passwords, as that is what NSG supports out of the box. RSA key logins are a todo item. For configuration, it does the simplest thing that can possibly work: just store the connection info as a plain YAML file in ~/.servers.yml.
After two minor version bumps, I had something that was good enough to build a script that connected through multiple gateway servers to my target and request its uptime. The next step was to execute arbitrary commands, which was a small modification.
Here I discovered another hard to reproduce feature that Capistrano executes in a really nice way: sudo passwords. For some reason I still have to discover, SimpleGate does not let me enter a password when asked for it on the remote server: it just fails the password check and quits. I guess that is another todo item.
After discovering this I wanted to check up on another server that was hidden behind a number of hops and started to change the hard-coded gateway connection sequence in my test script to connect to the new server.
Woah! Wait! Full stop.
Hard-coded connection info is not good. The server name should be a command line option and the connection sequence should be figured out by the script, not by me. Since I was not interested in manually working out all possible connection sequences, I figured it was time to add a simple path-finding Router class to the project...
After a quick read of relevant sections in Bratko, to refresh my knowledge of the topic, I decided to model the search space as a directed non-cyclical graph and search through it using a simple depth-first recursive search algorithm. Support for cyclical graphs will be another todo.
For every node in the network, all its possible connections are described in a YAML file, that is just a Hash of Arrays with server name strings. A special 'local' node represents my laptop or any arbitrary internet-connected system. The search algorithm comes down to:
- If we search from target to target, return that the route is target.
- If the current node has outgoing connection possibilities, try them all, keep the shortest and return it with the current node prepended to the returned list of nodes.
- If there are no outgoing nodes, return nil.
A version bump later I remembered by original goal: upload a gem, install it and restart the daemon associated with it. Restarting is all done in userland, so that is not a problem. Installation requires sudo, which is still a todo item. Uploading was still open.
NSG can open a normal net/ssh session. net/sftp can use this session to do file transfers. A quick copy-paste-adjust later I had a new executable for copying a single file to a single server, through an arbitrary number of hops.
Right now SimpleGate is at version 0.5.0 and it has the three items noted above as left to do. Then it has its core functionality and should be polished up for its 1.0 version.
The command line tools should get parameter support and a --help interface. Then the config files should both be documented and command-line editable. Once those things are in, multi-server support might be useful. In a Capistrano-like fashion connect to multiple servers (sequentially or in parallel) and execute commands on all of them or upload file(s) to them. The file uploader can get a better interface instead of mimicking the code.
Subscribe to:
Posts (Atom)