The Importance of Refactoring Your Database Schema

Programming an application is a pretty daunting task.  There are similarities between all applications but significant differences that require a programmers to work together.  The more I work on “rescue” style projects or help out in the open source world, the more I think that programmers need to sit down and think through the database design.

When it comes to designing a good schema you need to know most of the requirements.  I say most because trying to know all will lead you astray, this is software development after all.  For my Trail Status project I kept the schema small, but when the project starts to grow it can easily adapt into a more well rounded schema design.  Here are the project requirements for data to be stored:

  • Trail state (open/closed)
  • Coordinates to trail
  • Name
  • Translation (if translated from phone call)
  • URL slug

If the table grew to more trails without adapting or refactoring the schema you would be setting yourself up for failure.  What happens when there are more states like permanently closed, open soon, or under maintenance? If I let the data just grow without refactoring the schema I would be adding unnecessary code to try and manage my mistake.

Here is your challenge…

The next time you work on anything: pick a table and try to find a way that the schema design can be refactored.  Kudos if your schema is up to snuff! I know most out there need a good cleansing.

  • Aaron Saray

    I would challenge this a bit and say it really has nothing to do with ‘requirements’ and more with relationships (and yes this is even for non-relational databases). So, what are relationships? Let’s dig in deeper.

    First: What business object relates to another. But then again, we can make inferences from not even knowing the requirements. If I even remotely know what a model is, I can make some guesses on the relationships. So, a trail – well damn straight it relates to a runner. I don’t know if runners own them, if they run them, or whatever – but they’re made for runners. And chances are there are one to many relationships to individual runners. So, while that doesn’t necessarily allow me to make my final database, it helps me plan – which I think is the point you’re trying to make, yes?

    Second: What relationship does this thing have to existing. You mentioned open, closed, etc… I’d extend that even further – alive, dead. Available or not available. Priority or not priority? In the end, the requirement might be to only make things a certain priority when a certain workflow happens – but – we don’t need to know that to start… just that there is a priority most likely.

    Good points sir. Keep it up.

    • Thanks for the comments! The relationships between objects is always interesting to think about and I agree that it does have more to do with relationships. With the CMS at LPi it is becoming more and more clear.

      It’s interesting though because when I wrote this article I knew very little of Schema.org. Now I try to reference it as much as possible due to the focus on linked data.

      Thanks again for the comments!