MyISAM

Posted by Joel Thu, 31 Aug 2006 00:10:00 GMT

So I'm switching some of our production tables over to MyISAM. This is a pretty radical shift for me. I used to be a big PostgreSQL advocate, but now that I'm working on such a big site, it's all about speed. And MySQL is just much faster.

And why MyISAM over InnoDB? Because I don't need foreign key constraints. And I don't need transactions. All that stuff looks good on paper, but in reality, it comes with a huge performance hit. And for what? It helps you detect bugs. In short, using an ACID-compliant database is basically like running your production application in debug mode.

Any sort of validation, ON DELETE CASCADE, anything, can be done in the application layer if you really need it. But when's the last time you really had data integrity issues that didn't become obvious quickly? Ultimately we're talking about application errors being caught at the database level, and that's a convenience that just isn't worth the cost.

The main reason for the switch is that I want to use the fulltext indexing in MySQL, and it is only supported by MyISAM.


WWDC

Posted by Joel Tue, 08 Aug 2006 04:09:00 GMT

Well, that was kind of lame. I never liked the G5 case. The handles need rubber on them, at least; doesn't anyone notice these things? I don't really care about the Mac Pro though, I'm a laptop guy.

And, yeah. Leopard. I'm down with doing the time warp again; but do we really need virtual desktops? Isn't that a problem solved more elegantly by Exposé? I dunno. Options, I guess; I know some weird people who'd installed software to provide that functionality. I was hoping for more. And what's with the release date?

Then again, I can cut them some slack because they had to spend a lot of time on that whole Intel thing this year. And really, they've done a good job. I'm still clutching feverishly to my PowerBook G4 hoping it doesn't die, because there are two major reasons I don't want a MacBook right now: heat and DarwinPorts.

I guess, looking back, it's not that bad. The only really good feature of Tiger was Spotlight, and Time Machine could compare in usefulness to that. I could still happily forget Dashboard ever happened, since it doesn't have much usefulness past eye candy to impress PC users.


Distributed Workers

Posted by Joel Fri, 04 Aug 2006 03:15:00 GMT

Yesterday was stressful at work because of stability problems. I ended up getting rid of lighttpd and Pen and going back to Pound balancing four Mongrel processes. Furthermore, I had to disable session management in Pound and increase the timeout (Pen worked just as well really, but Pound has virtual hosting, making lighttpd largely unnecessary).

The problem is that our site has to resize images frequently, and when it does, one processor gets eaten up. It employs a simple caching system: when people click on a size to download, it checks whether that size already exists and, if not, resizes the photo on the spot. This means the customer has to wait anywhere between 3 seconds and a minute, depending on the size requested, the size of the source image, and the traffic on the server. The 15 second default timeout on Pound resulted in 500 errors when a request took longer than 15 seconds (either because it was crunching an image, or was waiting in line behind a request that was). So I increased that.
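Here's roughly what that check-then-resize caching looks like, sketched in Python rather than the Ruby the site actually runs. The function names, the file naming scheme, and the fake resize step are all made up for illustration; a real version would call an image library where resize_image is.

```python
import os
import tempfile


def resize_image(source_path, size):
    """Hypothetical stand-in for the slow resize step.

    It only tags the original bytes with the target size so the sketch
    runs without an image library.
    """
    width, height = size
    with open(source_path, "rb") as f:
        data = f.read()
    return ("%dx%d:" % (width, height)).encode("ascii") + data


def cached_resize(source_path, size, cache_dir):
    """Return (path, was_cached) for the requested size.

    The expensive resize only happens the first time a size is requested;
    later requests just find the file on disk.
    """
    width, height = size
    name = "%s_%dx%d" % (os.path.basename(source_path), width, height)
    cached_path = os.path.join(cache_dir, name)
    if os.path.exists(cached_path):
        return cached_path, True
    with open(cached_path, "wb") as f:
        f.write(resize_image(source_path, size))
    return cached_path, False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "photo.jpg")
        with open(source, "wb") as f:
            f.write(b"not really a jpeg")
        print(cached_resize(source, (800, 600), tmp))  # first request resizes
        print(cached_resize(source, (800, 600), tmp))  # second request hits the cache
```

The first customer to ask for a size pays for the resize inside their request, which is exactly where the long waits come from.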

We have two processors. Pen and Pound (with a Session set) will try to send you back to the same server you used last time. This means that if your server is presently spending 30 seconds crunching an image, you wait 30 seconds. In Pen you can solve this by putting it on a more traditional round robin load balancing, so it will always use the available processor, and in Pound you can remove the Session attribute for the same round robin functionality.
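To make the difference concrete, here is a toy comparison of the two strategies in Python. This is not how Pen or Pound are implemented; the class names and backend labels are invented, and neither balancer here looks at whether a backend is actually busy.

```python
import itertools


class StickyBalancer:
    """Pins each client to the first backend it was given."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)
        self._pinned = {}

    def pick(self, client):
        if client not in self._pinned:
            self._pinned[client] = next(self._cycle)
        return self._pinned[client]


class RoundRobinBalancer:
    """Hands out backends in rotation, ignoring who is asking."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self, client):
        return next(self._cycle)


sticky = StickyBalancer(["mongrel-1", "mongrel-2"])
rotating = RoundRobinBalancer(["mongrel-1", "mongrel-2"])

# The same client makes three requests in a row under each strategy.
sticky_picks = [sticky.pick("alice") for _ in range(3)]
rotating_picks = [rotating.pick("alice") for _ in range(3)]
print(sticky_picks)
print(rotating_picks)
```

With sticky sessions, a client whose backend is crunching an image keeps landing behind it; with plain rotation, the next request at least goes somewhere else.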

However, if two people crunch images simultaneously, everyone gets to wait until one of them finishes. It's not ideal. And while one solution is to scale out and get fifty processors, another solution is to just approach resizing images more intelligently.

And that's how I found BackgrounDRb. It's a pretty cool system that shifts work off to a separate server process, which you can then run at a low priority. I can just queue up photos to resize in there and have it resize them all immediately after they are accepted into the system, not when a customer requests it. Meanwhile the web servers can easily ask this server whether or not it's done working on stuff. I think this is going to become a huge part of scaling sites. Why get new hardware when you can just increase the efficiency of your software? More info on the basic concept at Mongrel's site.
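The general shape of the idea, as a rough Python sketch. To be clear, this is not BackgrounDRb's API: it uses an in-process thread instead of a separate server, and ResizeWorker, enqueue, and status are names I made up for the example.

```python
import queue
import threading


class ResizeWorker:
    """Runs resize jobs on a background thread and tracks their status."""

    def __init__(self, resize_func):
        self._jobs = queue.Queue()
        self._status = {}
        self._lock = threading.Lock()
        self._resize = resize_func
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def enqueue(self, photo_id, size):
        """Queue a job at upload time and return a key for polling."""
        key = (photo_id, size)
        with self._lock:
            self._status[key] = "queued"
        self._jobs.put(key)
        return key

    def status(self, key):
        """What a web request would call to ask whether the job is done."""
        with self._lock:
            return self._status.get(key, "unknown")

    def wait_until_idle(self):
        """Block until every queued job has been processed."""
        self._jobs.join()

    def _run(self):
        while True:
            key = self._jobs.get()
            try:
                self._resize(*key)
                result = "done"
            except Exception:
                result = "failed"
            with self._lock:
                self._status[key] = result
            self._jobs.task_done()


if __name__ == "__main__":
    worker = ResizeWorker(lambda photo_id, size: None)  # placeholder resize
    job = worker.enqueue(1, (800, 600))
    worker.wait_until_idle()
    print(worker.status(job))
```

The point is that the web request only ever enqueues or polls, so it returns quickly no matter how long the resize takes.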

I've been messing with it so far with pretty good results. Responsive mailing list community.

I was confused by this because I thought requests would just share processors and multitask happily, but it seems like they monopolize them. I tried running good ol' lighttpd and FastCGI and hit the same limit of two maxed-out requests. And my processors have hyperthreading, so I thought it would be four. I wonder if there's something I'm missing.

This seems like a basic CS issue. Is there a point to having more than two Mongrel web servers running if I only have two processors available? Why doesn't hyperthreading grant me four processors? Apache always spawned lots of processes, FastCGI always does too; so why would it do that if the simultaneous requests were limited by the processors? I need to do more research.


The Dead Texan

Posted by Joel Fri, 21 Jul 2006 04:44:00 GMT

Thanks to my iPod I've been listening to lots of the music I have that I've never really heard. I've also still listened to NPR a bit. It's pretty easy to switch over to the radio at 9 am to catch the Writer's Almanac, and if the Fresh Air interview is no good, go back to the iPod.

I am pretty happy with my productivity at work and I feel like I am really turning this site around. I feel like it's going to be successful and I'm going to keep trying hard. We are still fixing it up and adding some more crucial basic features before we start marketing to the customers and generating revenue. However, we have just been immensely popular amongst photographers, with loads and loads of photos coming in at a very fast pace.

Today Eric and I went to House of Nanking which was fun. He invited me to a party in Soma but I'm pretty tired. I still haven't started working out (although I've signed up and bought some clothes) and I want to get some sleep so I can actually start tomorrow morning. We'll see if that happens.

Yesterday I worked for about 11 hours and then came home and worked for another two hours. Today I worked a more normal schedule (maybe 8.5 hours). I keep learning so much more about scaling sites, and I've now learned how dramatically speed can improve when you index the right table columns (and how much it can slow things down if you index the wrong ones!)
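A quick way to see what an index changes is to ask the database for its query plan before and after. This sketch uses SQLite through Python because it needs no server, not the MySQL setup we run at work, and the photos table and its columns are invented for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO photos (user_id, title) VALUES (?, ?)",
    [(i % 100, "photo %d" % i) for i in range(1000)],
)

sql = "SELECT * FROM photos WHERE user_id = ?"

# Without an index the only option is to walk the whole table.
plan_before = query_plan(conn, sql, (42,))

# Index the column the WHERE clause filters on, then ask again.
conn.execute("CREATE INDEX idx_photos_user_id ON photos (user_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The flip side is that every index has to be updated on each insert and update, which is how indexing the wrong columns ends up costing you.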

Also, I'm starting to realize my current/stable branching setup is just insufficient for the real world, for a couple of reasons. First, while I like decentralized version control, my coworkers don't use it. Second, sometimes you want to commit things in a way that everyone can use them, but they are longer-term projects, or need to wait, so they can't go live with the next release. In short, I'm going to set up some branches based on certain projects, and might return to using the "trunk" terminology, something like trunk/branches/live. Still no release numbers, though ;)


Simplified version control for the web

Posted by Joel Tue, 11 Jul 2006 06:31:00 GMT

This is a bit of a deviation from my normal blog practice, which is about my boring life, so bear with me. I'm going to talk about some of the technology I deal with in my trade.

If you are familiar with version control but don't know a whole lot about branching or merging (like many developers) you might find this useful. It is commonly accepted to set up your version control repository with the following structure:

trunk
branches
tags

These names are rather poor, but you should think of them like this:

trunk: merged in accepted stuff for the next planned release
branches: developer and version branches
branches/john: john's branch
branches/bob: bob's branch
branches/1.0: 1.0 branch
branches/1.1: 1.1 branch
branches/2.0: 2.0 branch
tags: previous releases; should not be altered, ever
tags/1.0.0: release 1.0.0
tags/1.0.1: release 1.0.1
tags/1.1.0: release 1.1.0

and so on.

The typical process of development is this:

A simple plan is made for new features and bug fixes to put in the next release. These features are then delegated amongst the developers. Each developer goes in his branch and works on it, committing away freely, and when his code is reasonably well self-tested he merges it into the trunk. Whenever new stuff is merged into the trunk that he wants, he can just merge it into his branch.

When trunk has been well tested it is copied as a version branch (e.g., branches/2.0) and as a release tag (e.g., tags/2.0.0). Any emergency or trivial fixes to the release need to be done in the version branch (2.0); the version branch must then be immediately merged into trunk and all developer branches (john and bob), and finally must be copied over as a new release tag (e.g., tags/2.0.1).

This is branching in a nutshell. Some alternatives exist. Instead of naming developer branches after coders they could be named after particular features being worked on (although this requires excessive branch copying or renaming in my opinion). And an alternative to using developer branches in the central repository is using decentralized version control.

In decentralized version control, everyone has a copy of the entire repository on their machine. They occasionally sync their copy with the central one and can pull changes from it and push changes to it whenever they desire. This effectively gives each person their own personal branch even though they all get to work directly off the trunk. It has the additional benefit of letting users commit while they do not have access to the internet, which happens more often than one might think. Lastly, it removes one more layer of tedious merging.

Now, there are two kinds of web developing. There is web developing such that you release your code to people in versions, such as phpBB or Wordpress; and then there is web developing where you have a single site completely under your control, such as Google, Digg, etc. The latter type has a nice benefit, and that is that we don't have to ever make formal releases, nor do we have to support old releases in any way.

Personally, I find the above scenario unnecessarily complicated for the latter type of web developing. What I've found is that we only really need two branches if we use decentralized version control and have a singleton product: one branch for adding changes to your next planned internal release (which can be so small as to include just one new feature), and one which represents the currently live version, where emergency and trivial changes can be made and then merged into the former. I personally name these branches "current" and "stable" due to what I see as precedent in FreeBSD, but "development" and "production" would do just as well, or simply "dev" and "live" if you're not into needlessly multi-syllabic words.

current------------------------------------------------------->
        |   /|\   /|\   /|\                            |   /|\
        |    |     |     |                             |    |
       \|/   |     |     |                            \|/   |
stable ------------------------------------------------------->

This delightful ASCII-art graph shows the order in which merges are typically done. A new set of features is merged from current into stable, some trivial or emergency fixes are made in stable and merged immediately into current, and then eventually a new set of features and larger fixes is merged into stable.

Well then, that's done. I hope someone out there found this useful.


Ok

Posted by Joel Fri, 05 May 2006 17:23:00 GMT

Yesterday wasn't that exciting. Cereal, sandwich. Water, coffee, diet coke. Laundry. Sitting at computer. Writing down thoughts about my project. Wondering why I haven't heard back from that interview yet. Thinking about looking around for more contracts.

I did a little bit by way of programming. Inspired by that AnySite thing, I created a Ruby script that scrapes a feedless page and creates a feed for it, specifically for Perry Bible Fellowship. I used the htmltools and feedtools libraries this time. Was neat to learn them. Here's the code: pbf.rb
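For the curious, the basic trick looks something like this. It's a sketch in Python using only the standard library, not the contents of pbf.rb and not the htmltools or feedtools APIs; the sample HTML, the URL, and the "every link is a feed item" rule are assumptions made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def build_feed(title, site_url, html):
    """Turn the links found in `html` into a minimal RSS 2.0 document."""
    parser = LinkCollector()
    parser.feed(html)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Generated feed for " + title
    for href, text in parser.links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text
        ET.SubElement(item, "link").text = urljoin(site_url, href)
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    sample = '<ul><li><a href="strip1.html">Strip One</a></li></ul>'
    print(build_feed("Example Comic", "http://example.com/comics/", sample))
```

A real scraper would fetch the page over HTTP and be pickier about which links count as entries, but the parse-then-emit structure stays the same.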

I'm the same weight as yesterday. I have no choice but to start getting more aggressive and that means regular exercise and less cheating. The sooner I've gotten fit the sooner I can get back to eating like a regular person without guilt. I've had success with just dieting in the past but it doesn't seem to be enough this time.

I'm not going to be some chubby fat guy the rest of my life. I'm not going to do this whole thing where I feel embarrassed all the time, feel guilty over enjoying food, and am deathly afraid of taking off my shirt at the beach or pool. This ends now.


Toooosday

Posted by Joel Wed, 26 Apr 2006 16:41:00 GMT

I went to repark my car but got there too late :( I decided to remake my resume, and am now fairly satisfied with it. I applied for big jobs like Google and Apple again out of pure optimism, and then for a couple smaller ones, but we'll see if I take any, as I'm mainly looking into continuing the consulting lifestyle for now.

I spent a few hours going over design things in chats and text files for my project, but then Carl came over to help move some of Katie's stuff into the apartment. Then we all went to Magnolia and had a great time. At Magnolia I got a call from a client and ended up putting in some late night work once I got home.

Also, screw splitting up posts by category. RSSPECT is really cool. I didn't like the main premise, but the AnySite thing is really damned cool. RSSPECT is made doubly cool by the fact that it is made by the same man who makes those delightful Dinosaur Comics, one of the few comics I actually read daily.

Look! An RSS feed for Perry Bible Fellowship! And Bob the Angry Flower!


Junk Food and Progress

Posted by Joel Sun, 23 Apr 2006 18:29:00 GMT

Yesterday I didn't go out at all really besides to get some Eddie's. Eventually in the afternoon I did my laundry. David ordered some pizza and we had that for dinner. We were just relaxing, but then Katie got upset at us for watching the Godfather during the spousal abuse scene, which just made everyone kind of awkwardly retreat into their own thing.

So I worked on my project and I did a lot. I worked out a lot of the details of the design that had been bugging me, made some crucial decisions about what to keep and what to ditch, and got some real implementation done.

My project uses lots of bi-directional self-referential multiple associations. What that means is that it has normal trees, but they allow for multiple parents as well. Due to the complexity of some, I have some implemented as a habtm (has_and_belongs_to_many), and others implemented as a has_many :through. I was surprised to find this morning, after having implemented a has_many :through solution which worked great, an article about it in one of the Rails blogs I read. We had implemented it exactly the same way, too, even with the "edge_as_association" terminology. Truly a case of simultaneous discovery. And here I thought I was so smart. Oh well, at least I got to add a useful comment.
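Stripped of Rails, the structure is a set of nodes joined by explicit edge records, so any node can have several parents as well as several children. Here is a plain Python sketch of that shape; it is not my project's code or the ActiveRecord version, and the class and attribute names are invented for illustration.

```python
class Node:
    """A node that can have many parents and many children."""

    def __init__(self, name):
        self.name = name
        self.edges_as_parent = []  # edges where this node is the parent
        self.edges_as_child = []   # edges where this node is the child

    @property
    def children(self):
        return [edge.child for edge in self.edges_as_parent]

    @property
    def parents(self):
        return [edge.parent for edge in self.edges_as_child]


class Edge:
    """The join record. Because it is its own object, it can carry extra
    data about the relationship, which a bare join table cannot."""

    def __init__(self, parent, child, label=None):
        self.parent = parent
        self.child = child
        self.label = label
        parent.edges_as_parent.append(self)
        child.edges_as_child.append(self)


# A small graph in which "photo" has two parents.
landscapes = Node("landscapes")
favorites = Node("favorites")
photo = Node("photo")
Edge(landscapes, photo, label="category")
Edge(favorites, photo, label="bookmark")

print([parent.name for parent in photo.parents])
print([child.name for child in landscapes.children])
```

The edge carrying its own data is the reason to reach for the join-model approach over a plain join table in the more complex cases.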


Unproductive

Posted by Joel Sat, 22 Apr 2006 19:16:00 GMT

Yesterday was marked by a staggering lack of productivity. The only thing which could even slightly be considered productive is finally switching to Adium over iChat. Once you install your message style of choice it becomes quite pleasant to use. The Growl support is great, and the minimalist contact list options are nice, although I tend to hide the contact list when not in use.

Anyway... I got up late, for some reason, slept through my alarm. Didn't do much of anything. Watched the new South Park, which was kind of gross.

David told me that my living situation was going to get shaken up again. Katie is moving in since she can't find a place yet, which I don't really have a problem with. But he said that come June or later he might be planning on moving out with Katie, and he was giving me ample notice to find another roommate or other living arrangements. I can't say this really screws things up for me though as I'm so stress-free and unattached to this place at the moment.

A while later we went to Alameda to meet with some friends. Another Italian restaurant (I just had steak and a beer this time), and we went to Carl's house. He was intent on bringing us on a dive bar experience, so we ended up all driving out and going to some place called the Pop Inn.

It was definitely a miserable little bar, with a bunch of old guys and waitresses sitting on their stools laughing with one another or being sullen by themselves. Only Bud was on tap. After some initial discomfort, I began enjoying the adventure out of our demographic.

The girls complained enough so we left and went to a more appropriate bar for our age group and had an ok but unmemorable time there.


Frustration and svk

Posted by Joel Fri, 21 Apr 2006 19:29:00 GMT

Yesterday I got some contract work done, had Eddie's, tried to work on my project for a bit, and went out to dinner at an Italian place in Lower Hayes. Altogether the day was good, but I felt sort of awkward and boring. I have been worrying about my appearance again, which means I need to really do something about it before I start obsessing over it. Hopefully I will find the courage to venture out and get a haircut and some new clothes one of these days.

I was a bit frustrated by the difficulty of designing my project and didn't make a whole lot of progress besides more rambling in text files, which has been my main way of brainstorming. I wish I could reveal more about my project here, but secrecy is important in any business, I've decided. Odds are that if you're reading this blog, I've shown it to you anyway.

I did however switch to svk, a decentralized version of Subversion. It was really easy to switch because you only really have to change your client setup. But already I can tell it is much better and recommend it. For multi-person groups especially it would make things easier, as everyone is always in their own local branch. Which also means you can work offline, which is comfy. It doesn't even require ".svn" directories. And yes, there is an svk bundle for TextMate.
