Keeping Flexible on Database Design

One thing that writing for Google App Engine is teaching me is that you have to keep flexible on your database design.

Google App Engine doesn’t like joins, so you have to avoid them. However, there isn’t just one technique you’ll use to avoid joins in every situation. Which technique you use depends on how you’re using the tables.

For example, in one situation I talked about already, denormalizing by duplicating a field of data between tables was the right way to avoid a join just to get the name of a contest. In another situation, using the list:reference field type allowed me to avoid a join.

Now, I’m looking to get a list of all the people involved in a contest (the list should include their member id and their name). This list will include administrators for the contest, contestants, and judges. Since this list is retrieved from the database every time someone views the contest page, I want it to be quick, so I’m giving some thought as to the best way to optimize it.

For the list of contestants, I think denormalizing makes sense. I can have a model that maps between contest and member, and duplicate the member name in that model. Then it’s a simple select from that model using the contest id to get all the contestants. The list of judges, I think, should work the same way.

The list of admins, though, I’m confused about.

If I’d had a separate contest admin model, I’d start with their user id and look up all the rows from contest admin that match. That would have given me contest ids, and to display the contest details on their page, I’d either have to do separate lookups (effectively a join), or duplicate the contest info in the contest admin table.

So I used the list:reference field type to avoid denormalizing and duplicating all that data. This works well because it’s efficient in Google App Engine.

But now I’ve got a page that’s coming from the other direction…the main page of the contest should list the names of the administrators. So I’m starting with the contest id, and have to get to the member names of the administrators. That means looking up the contest info to get the list of admins, and then doing a separate select for each to get their names.

Since I’m needing to go in both directions (from admin id to contest info, and from contest id to admin info), denormalizing to avoid multiple selects would require me to duplicate almost all the contest info in the contest admin model, and to duplicate the admin name in the contest admin model.

I’m not crazy about that level of data duplication. But avoiding the duplication means I need to do multiple selects. The computer scientist in me says to do multiple selects on the page that is accessed less often, and duplicate data to avoid multiple selects on the page that is accessed more often.

The contest page is accessed more often, so I’d need to add in a contest admin model and duplicate the admin’s name in it. The page for an admin to view their contests is accessed less often, so I’d simply do the multiple selects there to get the contest information. If that ends up making things too slow, then I’d have to duplicate more data to avoid multiple selects on the admin pages, too.

This is all a bit of a learning curve for me, figuring out how to best work with Google App Engine. The point of this post is that if you’re in a similar situation and developing your first Google App Engine application, expect to have to change your database design around quite a bit during development.

This entry was posted in Web2py. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *