So, I am developing a couple of sites at the moment, and they both are presenting me with a similar problem, from the database side of things. Now, I am no DBA and don't ever profess to be, however, I always set up and design my own DBs. Both sites have existing information that is in a database, so I am looking to integrate it for some functionality moving forward.
Heres the thing:
I have a table filled with pesticide information (you know the things you can spray on crops to control weeds/insect/mold) and each pesticide has a column for whether the product is applicable to a certain crop (so, pesticide X is OK on corn, but bad for green beans) however the architect of the database set up a column for each crop. This table is HUGE must be about 50 columns of crops, and most are empty....
So, in my thought process, I planned on doing a little normalization and streamlining of the data. Take the pesticides and place them in a table separate from what they can be applied on, and then in the OK to apply table, I would just use the ID and then a list to show which crops were OK to apply on.
Now, from my perspective this seems like a good solution, however I think that there is one small holdup. (I am not too concerned about it cause I don't think it will happen, and if it does, its' not my problem, basically, but I am trying to be a good guy). ColdFusion has a ton of ways to access, and manipulate lists, so I can easily check if a crop is in a list for a product etc (although I think that this methodology might also get me in trouble, as I identify what is applicable to a given crop) however, not everything has these features.
From an SQL standpoint, what is the issue (if any) of storing information in lists? or am I better off to set up a table that places the crops in their own cell for each product?
Currently the table would be something like this:
Product | Crops |
123 | corn,green beans,asparagus |
Is this better?
Product | Crop |
123 | corn |
123 | green beans |
I am curious as to what people think is the best way to handle this data... what is the more efficient methodology?
If you are going to do data normalization then why not go further and set up a crop table with and ID and a pesticide table with an id and then link them in a crop_pesticide (many to many) table to work with?
ReplyDeleteAfter all what happens when your crop gets a corn hybrid that needs less pesticides than normal corn? Will you add it manually to each of the lists or simply create the item in the crop table and then link it to the many to many table?
@Sandra,
ReplyDeleteThat will make it more efficient as crops are added to the system. I am also trying to plan out a nice UI for the users as well with check boxes etc for what pesticide can be applied to which crop.
Thanks!
Rob
Storing lists will cause lots of issues to code around later. Sandra's suggestion on normalization will take a bit more work for you up front but will save lots of work and frustration in other areas.
ReplyDeleteYou do NOT want to create a list of data as in your first example. Each crop should have its own entry. Treating a database table as a tokenized string eliminates most (all?) of the benefits that can be provided by the database engine's indexing system. If you have just pesticide id and crop and index both, then you can do very quick lookups on crop type. If you have to do a select * from crops and split all the text values, you'll be incurring huge data transit times/volumes and processing overhead. Database engines are optimized for the kind of querying you most likely want to do, let them do what they do best.
ReplyDeleteMany-to-Many is the way to go... Don't wanna code up the data layer? use CF9 + Hibernate! :)
ReplyDelete@Henry,
ReplyDeleteUnfortunately this project will not see CF9, at least not right away.
@All,
Thanks, I was leaning that way anyways, but thought I would 'poll the crowd'
Cheers!
I certainly agree with the comments about the trouble with lists.
ReplyDeleteWith apologies for the plug, my (free) DataMgr component set should make this easy without the drawbacks.
http://www.bryantwebconsulting.com/docs/datamgr/save-many-to-many-relationships.cfm