Interview with Mitch Pirtle on Large Scale Joomla Websites
Tell us a little bit about yourself
I’m a longtime supporter of the Joomla project, having begun my role as contributor back in the Mambo days in early 2003. I’m pretty sure I’m the only person that managed to bulldoze their way onto the core team 🙂 I was deeply involved in the efforts to rebrand as Open Source Matters and then Joomla; I also have the unique distinction of being the “John Hancock” of the project, as I was the only one physically present for the initial signing of incorporation documents for OSM. I’m not officially part of the project at present, but of course still an active promoter and supporter of the project. I’m also a contributor to MongoDB and getting involved with the Lithium PHP framework, and am a huge fan of emerging technologies. I am an unapologetic, unabashed FOSS homer.
Work-wise, I’m CTO at Totsy, a private sale website specializing in cool and unique products for moms and their young children.
I’m a family man, so outside Joomla (and technology/career in general) I prefer to spend my time with my wife and kids. I also coach soccer (go Brooklyn Patriots!) and Gaelic football (go Brooklyn Shamrocks!), and am super excited to be riding my skateboard again. I’m writing a couple of books, which is startlingly difficult and time consuming.
What are some large-scale Joomla sites that you have built and deployed? Can you give us some traffic stats on these sites?
Viacom: I was the lead architect and developer for Quizilla.com, a Viacom property that peaks at around 58 million page views in a week. This site was done with the framework, and not the CMS stack. Efficiency gains were insane, as we didn’t need to worry about languages, timezones or dynamic navigation for this site. The overhead of a full CMS stack would have been a real bear to overcome, and I was extremely pleased with the performance of the Joomla framework. Our codebase ended up in the Viacom platform standards group, where I hope others can make use of it. I’d have to guess that the highest monthly traffic for Quizilla is near the quarter-billion mark for page views. I was initially hired as a consultant to validate their belief that a complete reimplementation of the existing platform was warranted, and then led the assessment and proposal for the replacement technologies.
Gilt Groupe: I was also the system architect and founding CTO for Jetsetter, a Gilt Groupe property, which was a really innovative implementation that also rivals Quizilla’s traffic. Using the full CMS stack, we also leveraged the MongoDB document database for all non-core data as well as the Zeus traffic manager and CDNetworks for the CDN.
Food Network: I was the lead on the Food2 project while I was VP at KickApps, a hosted whitelabel social media company. KickApps was hired to produce the Food2.com website, which was essentially a mashup of social media technologies and a rich media experience. The base platform was Joomla with KickApps providing some way wicked rich client media players, integration with Facebook and MySpace, and integration with KickApps services via REST APIs. What really made this such an awesome achievement is that it was a very aggressive implementation of features in a brutally short development cycle. OK, there was no dev cycle, it was something between a sprint and death march at the same time, for about 2 months. I’m extremely proud of this site and the team that pulled it off – everyone involved is a total stud to me. I consider this a “large-scale” site in that there is a ton of systems integration and services combined with a pretty big burst of traffic when new episodes are uploaded and contests are run. Maybe not in the same ballpark as Quizilla and Jetsetter, but in my eyes a significant achievement nonetheless.
What other platforms, if any, did you consider when planning these sites?
I briefly considered Drupal for the Quizilla project, as well as many PHP frameworks. Drupal was getting phased out at the time as the current version back then was too chatty and tended to crush the database with additional traffic.
For Jetsetter and Food2, the rich administrative interface provided by Joomla and full features by the stock core were the biggest timesavers; for Quizilla the ability to customize at the framework level provided flexibility for solving extreme scale and performance challenges.
What criteria did you consider when choosing a platform/CMS for these sites?
- API richness
- MVC
- Object methodologies
- Modular, flexible framework
- Cache, session and database classes
- Ability for multiple connections to different database types
- Scaffolding for administrative UI
- Size of community, install base
- Performance of the platform
Why did you choose Joomla?
See response to the question above. 🙂
Can you describe the process you go through when planning and building a large-scale website?
- “Large-scale” can mean many things – sometimes that is a massive amount of content and data, other times it is a massive amount of traffic. Scale and performance are not always the same, so that is the primary consideration.
- Also the question of licensing comes up – if you’re wanting to build your own platform and own it (and therefore control licensing for distribution in the case that you sell your platform) then the license of the underlying technologies will be a major factor – but hasn’t been an issue for me to date.
- How rich does the administrative interface need to be? Is this a site that runs itself, or will there be many people required to manage a lot of things?
- What level of customization is required? Sometimes a full CMS stack will get you to the finish line fastest, other times you might be better served by using a framework and building only what you need.
- What is the development lifecycle? Most developers only worry about their tools while development is under way; what happens after the site launches? The operational headaches of a live site can be a heavy burden, and that overhead is always a factor in long-term planning.
- What operational resources are available, and how complex is the infrastructure from a management perspective?
- What are the requirements for integration with external systems, and are any of these forcing a specific toolset to achieve?
How/where are these sites hosted? Can you describe the server and network configuration required to handle large-scale sites?
Some of the high-traffic sites manage on three webservers and one database server, others require racks and racks of gear as they are doing a lot more heavy computational work. Still others fire up hundreds of nodes at Amazon’s EC2 cloud services every day for a couple hours, as they have extreme bursts of traffic usage due to the nature of the usage of their sites.
Did you have to make any core modifications to Joomla to accommodate the kind of traffic you anticipated for these sites?
Absolutely. Here’s where I have to take the gloves off. JSession is really bad. Please say that out loud, and repeat three times. Every day. JDatabase is a little better but not much. Both JSession and JDatabase needed tweaking to meet performance requirements, but JSession needed all kinds of work, which also required hacking at many other subsystems and modules, even JApplication! JUser is similarly inflexible, in that I once had a remote authentication source where user data was returned from a REST call – and had to store unwanted copies of that data in the jos_users table to keep Joomla quiet. ACL in the 1.5 series is a mess; I always end up developing my own ACL for each project and tailor my implementation to those specific needs.
Lastly, and this is a challenge for all major platforms out there so this is not a dig at Joomla in particular, is the issue of PHP5 versus PHP4 support. I’m coding mostly with the Lithium framework nowadays, which demands PHP 5.3 at the bare minimum, and having namespaces and closures (just to skim the surface) is a huge convenience for me. Joomla cannot ever be that bleeding edge in their implementation, as they have to also consider what all the major hosting providers out there are supporting. And you know what? Their support is terrible. Hey major ISPs out there, welcome to the new millennium, for crying out loud. This is the curse of the one-size-fits-all conundrum that all major CMS teams have to answer to.
Is Joomla’s built-in caching system sufficient?
To me Joomla’s caching is fine, there are just too many parts (and third party extensions) that don’t take advantage of JCache. Now don’t get me wrong, there are plenty of things that could be improved with JCache, but all in all it is enough to get the job done. I would love to see write-through and cache invalidation on a more automated scale, as well as cleaner access to cache objects directly. Having a cache intelligent enough to refresh cache objects on its own would be a boon, too.
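As a rough illustration of the write-through and self-refreshing behavior wished for above, here is a minimal sketch in Python. It is not Joomla's JCache API; the class name `WriteThroughCache`, its methods, and the dictionary standing in for the database are all hypothetical, chosen only to show the idea.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class WriteThroughCache:
    """Hypothetical cache wrapper (an assumption, not part of Joomla)."""

    def __init__(self, store: Dict[str, Any], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._store = store      # stands in for the real database
        self._ttl = ttl_seconds  # how long a cached entry stays fresh
        self._clock = clock      # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit
        # Miss or expired entry: refresh from the backing store.
        value = self._store.get(key)
        if value is not None:
            self._entries[key] = (now + self._ttl, value)
        else:
            self._entries.pop(key, None)
        return value

    def set(self, key: str, value: Any) -> None:
        # Write-through: update the store first, then the cache,
        # so readers never see a cached value the store lacks.
        self._store[key] = value
        self._entries[key] = (self._clock() + self._ttl, value)

    def delete(self, key: str) -> None:
        # Invalidation: remove from both the store and the cache.
        self._store.pop(key, None)
        self._entries.pop(key, None)
```

In this sketch, writes made through the cache are visible immediately, while changes made directly to the store only show up once the entry's time-to-live has passed, which is the trade-off a more automated invalidation scheme would aim to remove.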
What role does MongoDB play in the sites that you develop, especially for large-scale sites?
In a nutshell I cannot imagine taking on a high-traffic, heavy duty web project without something like MongoDB under the hood. I don’t develop with relational databases anymore unless I have relational data needs – and frankly that just doesn’t happen with web applications for the most part. Web developers made do with the best tools we had at the time (relational databases) but now it is time to move on, there are better tools for the job.
MongoDB was the difference-maker for Jetsetter. I’d basically gone non-relational by that point, and only used enough SQL to make Joomla happy. All Jetsetter-specific extensions were coded with a MongoDB helper library for Joomla. Sharding, replica sets, and extremely robust concurrency are just a few of the significant advantages MongoDB has over relational systems. Code is much simpler, reducing bugs and further providing performance gains.
Joomla 1.6 goes forward with a proper query class – with a new, unique name 🙂 – so we can start to pry MySQL out from under the hood, which will be a massive performance and scale improvement. The trick after that is finding ways to take advantage of MongoDB-specific features, without becoming too tied to that database in particular; being permanently tied to MySQL is IMHO the most significant weakness of Joomla, and we need to take Joomla to a place where data is just data, and allow the drivers to do the work – no more SQL in the code, please! What would really rock is a base set of data API calls, like getters and setters, with each driver also exposing the unique abilities of its platform for developers who want to take advantage of those strengths. Folks into CouchDB or Cassandra would probably feel the same way as I do, and each would likely have different features of their chosen database that they would like to leverage.
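The "base set of data API calls" idea can be sketched roughly as follows, in Python rather than PHP. Everything here is an assumption made for illustration: `DataDriver`, `InMemoryDocumentDriver`, `find`, and `rename_user` are hypothetical names, not part of Joomla or of any MongoDB library, and the in-memory driver only stands in for a real database.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


class DataDriver(ABC):
    """Hypothetical base API: plain getters and setters, no SQL."""

    @abstractmethod
    def get(self, collection: str, key: str) -> Optional[Record]:
        ...

    @abstractmethod
    def set(self, collection: str, key: str, record: Record) -> None:
        ...


class InMemoryDocumentDriver(DataDriver):
    """Stand-in for a document-store driver (an assumption)."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Record]] = {}

    def get(self, collection: str, key: str) -> Optional[Record]:
        record = self._data.get(collection, {}).get(key)
        return dict(record) if record is not None else None

    def set(self, collection: str, key: str, record: Record) -> None:
        self._data.setdefault(collection, {})[key] = dict(record)

    def find(self, collection: str, **criteria: Any) -> List[Record]:
        # Driver-specific extra: query by field values, offered only
        # by drivers whose backing store supports it.
        return [dict(r) for r in self._data.get(collection, {}).values()
                if all(r.get(f) == v for f, v in criteria.items())]


def rename_user(driver: DataDriver, user_id: str, new_name: str) -> bool:
    # Application code relies only on the base get/set calls,
    # so it works with any driver.
    user = driver.get("users", user_id)
    if user is None:
        return False
    user["name"] = new_name
    driver.set("users", user_id, user)
    return True
```

Code written against `DataDriver` stays portable, while code that wants a store-specific strength, such as the `find` call here, opts in to that driver explicitly.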
Is there anything else you would like to share about building and deploying large-scale sites with Joomla?
In the end it is all about balance, and knowing if you’re getting more benefits than roadblocks from using a full CMS stack. Does the Joomla feature set get you more than 50% of the way to the finish line? Or will there be more customization than actual implementation of Joomla in the first place? Every job has a right tool, and they are never consistently the same in requirements and needs.
I do believe Joomla having a framework under the hood makes it infinitely more modular and flexible than the other monolithic CMS platforms out there, and I stubbornly cling to the opinion that Joomla is really the best FOSS PHP-based CMS out there that can claim to be truly enterprise grade.