Karma-Based Mailing Lists (or: how to automate a meritocracy)

tagged | Posted by Davey Shafik
Jun 28 2009

Defining the Problem

The problem with mailing lists is that they are a free-for-all: it doesn’t matter who posts, everybody at every level gets to see the post.

In the real world, communications pass through a hierarchy of people, escalating as necessary, passing from person to person up the chain.

This means that, given enough time, any mailing list starts to have a large noise:signal ratio, at least for any given person’s take on the list;
they want to read what they want to read, and don’t need to be distracted ignoring the stuff they don’t want to read.

Solving the Problem

There is an unspoken (and somewhat haphazard) hierarchy in the community which, with some thought, I believe could be defined, refined
and utilized to our advantage. As an example:

  - Active internals contributors with access to internals CVS (contributions of code and useful discussions)
  \
   - Active internals contributors without access to internals CVS (patch submitters, useful discussions)
   - Inactive internals contributors with access to internals CVS (previous contributions of code and useful discussions)
     \
      - Active non-internals, PHP contributions (docs, phpweb, PEAR)
      - Active community leaders
      - Active project leaders
      - Active linux distro maintainers
       \
        - General Users with a high understanding
          \
           - General users with little understanding (newbies)

If we take each of these, and assign them a number:

L1  - Active internals contributors with access to internals CVS (contributions of code and useful discussions)
    \
L2   - Active internals contributors without access to internals CVS (patch submitters, useful discussions)
L2   - Inactive internals contributors with access to internals CVS (previous contributions of code and useful discussions)
       \
L3      - Active non-internals, PHP contributions (docs, phpweb, PEAR)
L3      - Active community leaders
L3      - Active project leaders
L3      - Active linux distro maintainers
         \
L4        - General Users with a high understanding
            \
L5           - General users with little understanding (newbies)

Now, what if, at any level, you could only see (by default) 1 level below you (and all levels above you)? For example: L1 can see L2, L2 can see L3, etc.

This immediately means that you only see stuff that might be relevant to you. However, as a community, we then lose the ability for newcomers to contribute good ideas, because they would start out with zero karma. To help solve this issue, we adjust karma based on responses:

Scenario

  • An L3 user posts something of interest
  • An L2 user sees the post and replies; the reply is L2
    • This bumps the original post up to L2 as well
  • An L1 user sees the post now, and can then participate in the discussion if they choose

In this case, only the thread in question is bumped up; however, given enough (weighted) direct responses from L2/L1 users over different threads, an L3 user can gain karma and eventually become an L2 user (and obviously this applies to anyone moving up the chain).

In this way, threads (and by this mechanism, users also) can organically make their way up the tree as they gain traction, are discussed at each level and moved up.
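
The scenario can be sketched in a few lines of PHP. This is only an illustration under assumptions of mine: levels are plain integers (1 for L1 through 5 for L5), and the function names can_see() and bump_thread() are made up for the example rather than taken from any existing list software.

```php
<?php
// Hypothetical helper: levels are integers, 1 = highest (L1), 5 = lowest (L5).
// $depth is how many levels below their own a reader sees (1 by default).
function can_see(int $readerLevel, int $threadLevel, int $depth = 1): bool
{
    return $threadLevel <= $readerLevel + $depth;
}

// Hypothetical helper: a reply from a higher-ranked user lifts the thread
// to that user's level; replies from lower-ranked users change nothing.
function bump_thread(int $threadLevel, int $replierLevel): int
{
    return min($threadLevel, $replierLevel);
}

$thread = 3;                        // an L3 user posts
var_dump(can_see(1, $thread));      // bool(false): L1 cannot see it yet
var_dump(can_see(2, $thread));      // bool(true):  L2 can
$thread = bump_thread($thread, 2);  // an L2 user replies
var_dump(can_see(1, $thread));      // bool(true):  now L1 sees it too
```

The $depth argument is where a per-user setting would plug in: a reader who wants to look further down the tree simply passes a larger number.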

I believe it would be possible to have a single mailing list that could span everything from internals right down to php-general, but this is probably not desired! It would, however, allow the users of any list, regardless of their level in the tree, to contribute without weighing down the list.

Features

  • Karma tree seeded by current “social” climate
    • based on CVS access level, activity in a sliding timescale, community contributions etc
    • Some manual work will be needed on this
  • Weighted responses: an L2 responding to an L3 thread will move it to L2, but three L2s, or one L2 and one L1, responding would move it to L1, for example.
  • Adjustable threshold: perhaps some magnanimous internals contributor likes to help out newbies; they can choose to see the whole tree, or perhaps just 3 levels down
  • Championing: it should be possible for a user to champion someone to quickly move them up the ladder. For example, an L2 can bring an L5 up to L3 so that their peers can see their posts; this brings the user up, not the thread
  • Continued tweaking of karma based on CVS access and contributions
  • Personal filters, you can add users (of any level) from whom you would like to see threads, helping the movement of stuff up the tree, skipping levels
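
One possible reading of the weighted-responses feature, as a rough sketch. The function name thread_level(), the threshold of three peer replies, and the "promote by one extra level" rule are all assumptions chosen to match the example numbers above; the real weights would need tuning.

```php
<?php
// Hypothetical sketch: compute the level a thread ends up at.
// Levels are integers, 1 = highest (L1). $responderLevels lists the level
// of each user who replied directly to the thread.
function thread_level(int $postLevel, array $responderLevels, int $peerThreshold = 3): int
{
    $level = $postLevel;

    // Direct bump: a reply from a higher-ranked user lifts the thread
    // to that user's level.
    foreach ($responderLevels as $responder) {
        $level = min($level, $responder);
    }

    // Weighted bump: enough replies at the thread's current level
    // lift it one level further (assumed threshold: three replies).
    $peers = array_filter($responderLevels, function ($responder) use ($level) {
        return $responder === $level;
    });
    if ($level > 1 && count($peers) >= $peerThreshold) {
        $level--;
    }

    return $level;
}

echo thread_level(3, array(2)), PHP_EOL;        // 2: one L2 reply
echo thread_level(3, array(2, 2, 2)), PHP_EOL;  // 1: three L2 replies
echo thread_level(3, array(2, 1)), PHP_EOL;     // 1: one L2 and one L1 reply
```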

In this way we can partly automate the karma and partly advance it through our own choices, creating a hierarchy based on merit and trust.

Final Thoughts

  • Some smart filtering, so that “You’re an idiot” responses don’t elevate a thread would be good
    • That, or just handle it biologically — Don’t respond to people in negative ways
  • Implementing this as a ML (in terms of interaction) is likely the only way to get some of the higher internals folks using it, a web interface [for the messages] just won’t fly
  • Personal filters would be handled at master.php.net or maybe a new web interface for this
  • Now, obviously, this is a huge undertaking; certainly not one any single person could complete on their own… but it’s food for thought.

    - Davey

    Mini-Review: Linkinus 2.0

    tagged | Posted by Davey Shafik
    Jun 10 2009

    In July of last year I finally found a decent IRC client for OS X: Linkinus. Now, I know some of you will cry foul at that statement; after all, there is Colloquy and the venerable X-Chat Aqua.

    I’m sorry, but they both sucked. Colloquy is quite stagnant, and has had the same long-standing display bug (which is solved with /reload styles) forever. X-Chat Aqua, on the other hand, is great, except it’s obviously a ported application. Sorry, I’m a snob.

    So, here we are, almost a year later, and Linkinus 2.0 has just been released.

    I’m going to start out with the one bad point: I knew 2.0 was coming, and I knew it was a paid upgrade. So when I got the update notice, I clicked upgrade and it installed. Suddenly I was on a 15 day trial (and I only knew that because I went to the Registration dialog). It wasn’t a big deal for me, but not everybody will know.

    First Impressions

    Wow! The new left-side bar is even more iTunes/Finder/Apple Mail like, the chat style is very clean and clear and frankly, this is the best looking IRC application on the market. I can’t believe I ever put up with mIRC…

    The side-bar

    One of the most maddening bugs introduced in Linkinus 1.3 was its inability to remember, after a crash, which channels you had closed. This meant I had an ever-increasing number of channels and queries building up. Thankfully, one of the new features in the sidebar is the ability to select multiple items and close them.

    Multiple Select and Remove

    In addition, the indicators in the sidebar are nicer and more informative.

    When you select multiple items, as in previous versions, it will split the chat between all the selected channels (and queries); however in 2.0 the display of these makes the current one much more obvious. In addition, switching between them (Using Cmd+Shift+Up/Down) uses a tasteful fade animation.

    Multiple Channels

    The Chat

    Let’s face it, IRC is all about the chat. In Linkinus 2.0, the chat takes center stage. With the new visual style, it’s simple, understated and highly legible. One of the neatest features is the ability to hover over anyone’s nick to highlight all other on-screen lines by that user.

    Highlight A User’s Lines (in this case: Zack)

    In addition to this great feature, Linkinus will embed media linked to, right in the chat. For example, a link to a picture will show up like this:

    Embedded Media

    When clicked on, it will do a quicklook style zoom:

    Embedded Media Zoom

    If you want the media removed from screen, simply click the X and it will be turned back to the original URL:

    Closed Media File

    Another neat feature is tiny url expansion:
    Tiny URL

    turns into:
    Tiny URL Expanded

    The expansion is fast (though it obviously depends on the service’s response time); unfortunately, so is the mouseout. When the mouse is no longer hovering over the tiny URL, it contracts again. This wouldn’t be a big deal, except that as soon as someone says a line the chat scrolls, your mouse is no longer over the URL, and you lose it. In a busy channel this makes it a pain. A small delay would be welcome.

    The Cool

    The only other feature I wanted to highlight in this review is the “Stars” feature. Essentially, this is bookmarks for IRC, allowing you to save favorite quotes for posterity.

    To do this, simply hover over the line (just like you do to highlight a user’s other lines) and click the Star:

    A Starred Line

    Then, any time you bring up the Highlights and Stars window (Cmd+1 or Window > Highlights and Stars) you will be able to review them:

    Stars Review Window

    The Final Word

    I said that Linkinus 1.0 was worth paying for; I will say it again and more for 2.0. I’ve barely scratched the surface here, but Linkinus 2.0 is by far the best IRC client on any platform, for any price.

    Grab it now at the Conceited Software website for the paltry sum of €19.99.

    - Davey
    Note: For Linkinus 1.0 users, you can get an upgrade at half price.

    Explaining the Cloud

    tagged | Posted by Davey Shafik
    Jun 07 2009

    Some weeks back, somebody asked me “What’s the big deal with the cloud? I don’t even understand WHAT it is!”. This is a common problem, and one I’m going to try and clear up here and now.

    Why is it so hard to define?

    The cloud is so hard to define because it is made up of several different ideas and technologies. As I see it, the cloud comprises the following things:

    • File Storage
    • Remote Computing Power
    • Clustered Web Hosting
    • Data Storage
    • Web Applications
    • Data Exchange

    I will attempt to define all of these, and end with a real world scenario (though fake) of how several of these can be brought together.

    There are, in my opinion, two large players in this market: Amazon and, through its Mosso brand, Rackspace. In addition, Google plays a large part.

    File Storage

    Disks are cheap, we know that. You can buy 1TB for US$75; that’s peanuts! The problem is high availability and data throughput. This is where “old skool” CDNs typically played a role. However, with the introduction of Amazon’s Simple Storage Service (S3), things changed. While there is little difference between the two in terms of the reasons you would use a CDN, what S3 brought to the table was a unique pricing plan (no huge setup fees, just pay pennies for what you use), making it available to every company at every level. More importantly, they also introduced an API.

    Through the API, those looking for a standard CDN-type service, can upload their resources transparently as an integral part of their process. In addition many services capitalize on this API to provide non-CDN services, such as data backup.

    Since the introduction of S3, Rackspace has also entered the space with its Cloud Files service.

    Remote Computing Power

    Another facet of The Cloud is remote computing power, which originally took the form of Amazon’s Elastic Compute Cloud (EC2). The idea behind this service is the ability to configure what I can best describe as virtual machines to perform specific tasks (e.g. crunch data). Then, using the API, you can “spin up” multiple virtual appliances from the disk image as you need them.

    This means you have the resources of a giant enterprise company at your disposal, on an as-needed basis, and again, one of the breakthroughs is Amazon’s pricing: pay for what you use.

    In 2008 SliceHost, a small company loved by geeks around the world, entered this space. Known (at least, by a savvy few) for their excellent VPS services, they introduced an API that put them in direct competition with Amazon’s EC2. In October 2008, Rackspace purchased SliceHost, and while SliceHost is still a separate company, the technology now powers Rackspace’s Cloud Servers offering.

    Clustered Web Hosting

    Clustered web hosting is nothing new. Companies have been creating clusters of servers for eons, for many tasks, ranging from number crunching, to data analysis, through to web servers and database servers. Where this space enters into The Cloud is through a service like Rackspace’s Mosso/Cloud Sites service. Like a traditional cluster, they provide high availability, lots of power and reliability. (Note: I use Mosso for this blog and a number of other sites)

    However, where this becomes blended with the cloud, versus traditional clusters, is that Mosso operates one giant cluster, with huge numbers of websites using the same cluster, with the infrastructure in place to allow those sites to grow as large as they wish to autonomously and transparently as needs require.

    Another (perhaps the first, but I’m not that familiar with them) player in this space, is MediaTemple’s Grid-Service.

    Data Storage

    You might ask yourself, what is the difference between File Storage and Data Storage? The answer is the same as what is the difference between the file system and a database.

    This area is the newest addition to the cloud, and one I think most people saw as necessary to really replace the old-style non-cloud systems. The biggest player in this market is Amazon’s SimpleDB (beta); Google’s BigTable service is only available through their Python-based App Engine.

    Web Applications

    Arguably the meat of Web 2.0, web applications allow people to create and work in the cloud without any knowledge of the technology. To them, data held by web applications is in the same place as their webmail. API access to integrate these applications into other services is part of how they are used within the cloud. The obvious player in this area is Google, with its Gmail, Google Calendar, and other Google apps such as Google Docs.

    Data Exchange

    Data Exchange using web services is the heart of Web 2.0: the mashup. Data exchange is not strictly part of the cloud, but web services are. Almost all of the cloud is interacted with using web services. In addition, thanks to the ideas of single sign on using OpenID etc, we are starting to see different facets of our data migrating across the websites we use to make it more useful and accessible — this is part of the cloud.

    Scenario

    For this scenario, I’m going to make up some fictional scenario involving Twitter. I have absolutely no idea what they have technically going on, and have no idea if this is how they might handle the scenario; it’s just a well known scenario that could be solved using the cloud.

    The scenario is this: Oprah joins your service, and suddenly you have an influx of new users. In addition, Ashton Kutcher and CNN are duking it out to reach 1 million followers.

    You have 2 weeks to prepare. You could call your Dell representative and order 50 new servers, clone disks, and put them into your cluster… but what if it’s not enough? How do you spend that much money when the hype might only last 2 weeks? A month? The simple answer is, you don’t. Instead you configure a couple of EC2 or Cloud Servers instances, and as your load starts to ramp up, you simply initiate more and more appliances on-the-fly using their respective APIs.

    Knowing that Oprah’s show is going to air at a specific time, you might fire up several instances to get the ball rolling an hour beforehand.

    You have one appliance which will function as web servers for twitter.com, one for handling API requests, perhaps even split out registration to its own appliance, and then of course clustered copies of your traditional RDBMS (i.e. you’re not typically using Amazon SimpleDB for your regular storage, as its functionality just isn’t up to par).

    You already have S3 in place for user avatars, but instead of calculating the filename hash on every request, or retrieving it from your local database, you push that into Amazon SimpleDB.

    And that’s it. As the load starts to drop off, you shut down EC2 instances, knowing if you get a sudden influx, you can always spin them back up.
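
    The ramp-up and ramp-down decision can be reduced to a small calculation. The sketch below is purely illustrative: instances_needed(), the 500 requests-per-second capacity figure and the load samples are all invented for the example, and actually starting or stopping instances through a provider’s API is deliberately left out.

```php
<?php
// Hypothetical sizing helper: how many identical instances are needed to
// serve a given load, never dropping below a minimum for redundancy.
function instances_needed(float $requestsPerSecond, float $capacityPerInstance, int $minimum = 2): int
{
    if ($capacityPerInstance <= 0) {
        throw new InvalidArgumentException('Capacity per instance must be positive');
    }
    return max($minimum, (int) ceil($requestsPerSecond / $capacityPerInstance));
}

// Made-up load samples: quiet, ramping up, peak, then dropping off again.
foreach (array(300, 4200, 9000, 1500) as $load) {
    echo $load . ' req/s -> ' . instances_needed($load, 500) . ' instances' . PHP_EOL;
}
// 300 req/s -> 2 instances
// 4200 req/s -> 9 instances
// 9000 req/s -> 18 instances
// 1500 req/s -> 3 instances
```

    Comparing the result with the number of instances currently running tells you whether to spin more up or shut some down.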

    Eventually, you get a handle on what your new average load will be (presumably, only some small portion of the initial influx of “zomg Oprah says this is awesome so it must be” people will stay) and then you can actually purchase the right amount of actual hardware to add to your own systems.

    Or not. Keep it in the cloud. That’s a decision you can now make at your leisure, instead of scrambling to make your best guess in that two week period before things go nuts.

    Conclusion

    The reason the cloud is so hard to define is that it’s no single thing. It is, like its namesake, nebulous. It is simply there, and will look like what you make it.

    - Davey

    P.S.
    Please read Rob’s reply below, he is an employee of Rackspace, and usually (always?) the guy behind @mosso.

    Making the case for PHP

    tagged | Posted by Davey Shafik
    Jun 05 2009

    One of the biggest decisions you can make for any project is the environment in which the project will be written.

    Most developers mistake the word environment for the word “technology” or “software”. For example, you develop in a “JSP environment”, or a “LAMP environment”. This is a crucial mistake that is made time and time again, and unfortunately, it hurts companies because the decision makers either make the same mistake, or they listen to those making the mistake.

    I’ve said numerous times that you can use any language to do anything. Yes, there are practical limits: using C to write a dynamic website isn’t a great idea, nor is using PHP for password cracking. Each language has its own strengths and weaknesses; good developers, however, know what these are, work with the strengths and work around the weaknesses. This post isn’t going to focus much on languages; I figure everybody reading has already chosen PHP and knows why.

    What I will say is this: Ruby, Python, PHP, Perl, Java and .NET all bring the same capabilities to the table (some things are easier in some, and some more difficult in others). You can create any solution you want in each of these languages, in an efficient, well thought out, well developed way. Yahoo! could be written in Python. How do we know this? Because Google uses it. Microsoft uses .NET for its web presence, and while you might not like to use it, it still stands up to more stress than most of the websites on the internet.

    With this in mind, I then would say that the language capabilities themselves, are the least important factor in choosing your environment.

    This then brings me neatly to what else that environment encompasses. These, to me, fall into three categories: people, knowledge and penetration.

    Access to People

    To put it bluntly, if you can’t find the people to write your code, you’re screwed. While you may know what you’re doing, and you have enough people now: your team will need to grow. If you can’t find people around you to hire, then what?

    There are estimates that for every 100 PHP developers, there are 42 Perl developers, there are 12 Python developers and 4 Ruby developers. (see: here)

    Some will say that this is because there are a lot of bad PHP developers. I will agree to a point, but that point is that there are just so damn many that there are still more great developers to pick from than with other languages.

    Access to Knowledge

    While this one is more subjective, I believe that the sheer number of PHP developers generates far more useful knowledge from which to learn, cherry-pick ideas and utilize them. Add to that the extensive number of books and our excellent php|architect magazine, as well as the training and teaching provided by MTA, Zend and ibuildings, and we have more going for us than almost every other language, with, I think, the exception of Java in terms of professionally backed learning.

    Market Penetration

    Simply put, the availability of PHP as a platform is there from the cheapest virtualhost, to the most expensive dedicated systems. It has gained wide acceptance from smaller companies, because it is cheap and reliable, and from enterprise companies such as IBM, Oracle and even Microsoft because they see that the ability is there for PHP to operate in that space and a huge number of developers willing to make that happen.

    Conclusion

    No other language can claim this trifecta. Sure, there are a lot of .NET and Java developers, but a lot of what goes on happens behind closed doors in big enterprises, and the knowledge is not shared. And while this isn’t true of Python or Ruby, they lack in numbers and knowledge comparatively. This is why I choose PHP.

    - Davey

    Debugging PDO Prepared Statements

    tagged | Posted by Davey Shafik
    May 16 2009

    Something that has always bugged me about using prepared statements, is that you can really only get the query sent to the database by catching it in the logs.

    Today, a friend asking me if it was possible to get a prepared statement back from PDO with the value placeholders replaced finally caught me in a moment where I could do something about it.

    I wrote a thin PDO wrapper class that will [imperfectly, I'm sure] return the completed query.

    It supports bound parameters, values and the array key->value methods of passing in values to prepared queries. You can see the code and examples below:

    <?php
    class PDOTester extends PDO {
    	public function __construct($dsn, $username = null, $password = null, $driver_options = array())
    	{
    		parent::__construct($dsn, $username, $password, $driver_options);
    		$this->setAttribute(PDO::ATTR_STATEMENT_CLASS, array('PDOStatementTester', array($this)));
    	}
    }
    
    class PDOStatementTester extends PDOStatement {
    	const NO_MAX_LENGTH = -1;
    
    	protected $connection;
    	protected $bound_params = array();
    
    	protected function __construct(PDO $connection)
    	{
    		$this->connection = $connection;
    	}
    
    	public function bindParam($paramno, &$param, $type = PDO::PARAM_STR, $maxlen = null, $driverdata = null)
    	{
    		$this->bound_params[$paramno] = array(
    			'value' => &$param,
    			'type' => $type,
    			'maxlen' => (is_null($maxlen)) ? self::NO_MAX_LENGTH : $maxlen,
    			// ignore driver data
    		);
    
    		// Pass the parent's success/failure result back to the caller
    		return parent::bindParam($paramno, $param, $type, $maxlen, $driverdata);
    	}
    
    	public function bindValue($parameter, $value, $data_type = PDO::PARAM_STR)
    	{
    		$this->bound_params[$parameter] = array(
    			'value' => $value,
    			'type' => $data_type,
    			'maxlen' => self::NO_MAX_LENGTH
    		);
    		// Pass the parent's success/failure result back to the caller
    		return parent::bindValue($parameter, $value, $data_type);
    	}
    
    	public function getSQL($values = array())
    	{
    		$sql = $this->queryString;
    
    		if (sizeof($values) > 0) {
    			foreach ($values as $key => $value) {
    				$sql = str_replace($key, $this->connection->quote($value), $sql);
    			}
    		}
    
    		if (sizeof($this->bound_params)) {
    			foreach ($this->bound_params as $key => $param) {
    				$value = $param['value'];
    				if (!is_null($param['type'])) {
    					$value = self::cast($value, $param['type']);
    				}
    				if ($param['maxlen'] && $param['maxlen'] != self::NO_MAX_LENGTH) {
    					$value = self::truncate($value, $param['maxlen']);
    				}
    				if (!is_null($value)) {
    					$sql = str_replace($key, $this->connection->quote($value), $sql);
    				} else {
    					$sql = str_replace($key, 'NULL', $sql);
    				}
    			}
    		}
    		return $sql;
    	}
    
    	static protected function cast($value, $type)
    	{
    		switch ($type) {
    			case PDO::PARAM_BOOL:
    				return (bool) $value;
    			case PDO::PARAM_NULL:
    				return null;
    			case PDO::PARAM_INT:
    				return (int) $value;
    			case PDO::PARAM_STR:
    			default:
    				return $value;
    		}
    	}
    
    	static protected function truncate($value, $length)
    	{
    		return substr($value, 0, $length);
    	}
    }
    
    $pdo = new PDOTester('sqlite::memory:');
    $pdo->query('CREATE TABLE foo (bar TEXT, baz TEXT, num NUMERIC, empty TEXT)');
    $query = $pdo->prepare('SELECT * FROM foo WHERE bar = :bar AND baz = :baz');
    
    // Test with passed in array
    echo $query->getSQL(array(':bar' => 'foo', ':baz' => 'bat')) . PHP_EOL;
    
    $query = $pdo->prepare('SELECT * FROM foo WHERE bar = :bar AND baz = :baz AND num = :num AND empty=:empty');
    
    // Test with bound params and values
    $bar = 'bar';
    $baz = 'baz';
    $num = '0.1';
    $empty = 'empty!!';
    
    // Bind Param
    $query->bindParam(':bar', $bar);
    
    // Bind Value
    $query->bindValue(':baz', $baz);
    
    // Bind With types
    $query->bindParam(':num', $num, PDO::PARAM_INT);
    $query->bindParam(':empty', $empty, PDO::PARAM_NULL);
    
    echo $query->getSQL() . PHP_EOL;
    
    // Change the vars
    $bar = 'foo';
    $baz = 'bat';
    $num = '2.6';
    $empty = 'blah!';
    
    echo $query->getSQL() . PHP_EOL;
    
    // Bind with length
    $query->bindParam(':bar', $bar, PDO::PARAM_STR, 2);
    
    echo $query->getSQL() . PHP_EOL;
    ?>
    

    This results in the following output:

    SELECT * FROM foo WHERE bar = 'foo' AND baz = 'bat'
    SELECT * FROM foo WHERE bar = 'bar' AND baz = 'baz' AND num = '0' AND empty=NULL
    SELECT * FROM foo WHERE bar = 'foo' AND baz = 'baz' AND num = '2' AND empty=NULL
    SELECT * FROM foo WHERE bar = 'fo' AND baz = 'baz' AND num = '2' AND empty=NULL
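
    One of the imperfections worth knowing about: because the substitution uses str_replace(), a placeholder such as :bar is also replaced inside a longer name such as :barn. A possible way around that is to match whole placeholder names with a regular expression. The sketch below is a standalone illustration, not part of the class above; interpolate_named() and the stand-in $quote closure are hypothetical names, and with a real connection you would pass array($pdo, 'quote') instead.

```php
<?php
// Hypothetical helper: substitute named placeholders by matching whole
// names, so ':bar' is never replaced inside ':barn'.
function interpolate_named(string $sql, array $params, callable $quote): string
{
    return preg_replace_callback(
        '/:[A-Za-z_][A-Za-z0-9_]*/',
        function (array $match) use ($params, $quote) {
            $name = $match[0];
            if (!array_key_exists($name, $params)) {
                return $name; // unknown placeholder: leave it untouched
            }
            $value = $params[$name];
            return is_null($value) ? 'NULL' : $quote($value);
        },
        $sql
    );
}

// Stand-in quoter for this demo only (wraps in single quotes and doubles
// embedded single quotes); it is not a substitute for PDO::quote().
$quote = function ($value) {
    return "'" . str_replace("'", "''", (string) $value) . "'";
};

echo interpolate_named(
    'SELECT * FROM foo WHERE bar = :bar AND barn = :barn',
    array(':bar' => 'x', ':barn' => 'y'),
    $quote
) . PHP_EOL;
// SELECT * FROM foo WHERE bar = 'x' AND barn = 'y'
```

    Like the wrapper itself, this is a debugging aid only: the pattern will also match anything that looks like a placeholder inside a quoted string literal.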
    

    Hopefully, this will help you get a somewhat better idea of what’s going on :)

    - Davey