
Dynamic MySQL tables, awful idea?

  • 13-07-2017 9:00pm
    #1
    Registered Users Posts: 1,206 ✭✭✭zig


    I am building a data tracking application. This application will be sold to a wide variety of clients, all with their own needs and demands.

    The idea of the application is that the business can have an account, create a project, create their own forms with their own fields, and create many rules around these fields. Within each form they may begin to track 1000s of items per day. They may have 20 or 30 forms within a project. They will be running various types of reports with this data too.

    The way I have built it for now (as I am still teasing out the best ideas) is that when the user has finished building the form on the front end, it adds this data to a field table in the back end that contains a row for each of the different fields and their information, like name and field type (e.g. text, textarea, etc.).

    But it also generates a table exclusive to the form, with its own custom columns. The purpose of this newly created table is to hold the actual data the form will track.
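
    Roughly, it ends up looking like this (just a sketch; all table and column names here are invented for the example):

    -- One shared metadata table describing every form's fields:
    CREATE TABLE form_fields (
        id         INT AUTO_INCREMENT PRIMARY KEY,
        form_id    INT NOT NULL,
        name       VARCHAR(100) NOT NULL,
        field_type VARCHAR(20) NOT NULL   -- e.g. 'text', 'textarea'
    );

    -- Plus one generated table per form, holding that form's data:
    CREATE TABLE form_123_data (
        id            INT AUTO_INCREMENT PRIMARY KEY,
        serial_number VARCHAR(50),
        part_number   VARCHAR(50),
        created_at    DATETIME
    );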

    Everywhere I have read says this is an awful, awful idea except in very rare cases. I wonder is this one of those rare cases?

    But to be honest, I quite like the solution: it means data can be easily tracked in a single table specific to the customer's form, with its own unique columns and its own set of rules.

    But more importantly, if I was to build one giant table, or some set of many-to-many related tables, to collect the data from every single client with all their own rules (using, let's say, meta keys, ids, etc.), I can't even imagine how large and complicated that table would get after a few months, when one client alone is adding 1000s of entries a day across 20 forms. Start adding more clients and it gets worse.

    The reports would be a nightmare too.

    My other options are:

    Non-relational database: I am really not keen on this because of the learning curve for me. I have used them and found them a little frustrating at times.

    Rebuilding an instance of the app for every client: Really trying to avoid this for maintenance/control reasons.

    Each client gets their own database: Well, I can't see how this solution is much better than the idea I am proposing in terms of good practice. Things would still need to be done dynamically.

    Any advice would be great.


Comments

  • Registered Users Posts: 403 ✭✭counterpointaud


    If it's a relational database, it will probably be split across multiple tables anyway, no? MySQL can handle millions of rows, no problem. I am not a db expert but what you are suggesting does not sound like a good idea to me.

    Also unsure how the MySQL engine would handle having 1000s of tables (assuming you will have 1000s of user accounts).


  • Registered Users Posts: 1,206 ✭✭✭zig


    Hmmm, good point.

    Maybe I'm being too fearful of the whole millions-of-rows issue.

    There won't be 1000s of tables. There would maybe be 25 tables/forms per client, but it will be a long time (if ever) before there are 40 clients. If there are more, that will be a good problem to have, I guess.

    That said, even then you're probably right. It seems like a nice short-term solution that could get very messy very quickly.

    That said, I do feel the same about having a table of keys (that would represent the fields) with a single value for each entry.

    Just to put it in context....

    Let's say someone creates a form with 10 fields and there are 1000 entries into that form in a day. Using my idea of creating a table for that form, that is 1000 rows that day, all with data easily associated with its columns.

    Using the other method it will be 10 rows for every form submission. So in a day that will be 10,000. Multiply that by 20 (let's say they have 20 forms) and that's 200,000 rows into a table in a day.
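
    To make the comparison concrete, the shared table would be something like this (a sketch, names invented):

    -- Single shared key/value table (illustrative):
    CREATE TABLE form_entries (
        id          INT AUTO_INCREMENT PRIMARY KEY,
        form_id     INT NOT NULL,
        entry_id    INT NOT NULL,   -- groups the rows of one submission
        field_name  VARCHAR(100) NOT NULL,
        field_value VARCHAR(255)
    );
    -- One submission of a 10-field form = 10 rows here, so
    -- 1,000 submissions/day x 20 forms = 200,000 rows/day.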


  • Registered Users Posts: 403 ✭✭counterpointaud


    zig wrote: »
    Using the other method it will be 10 rows for every form submission. So in a day that will be 10,000. Multiply that by 20 (let's say they have 20 forms) and that's 200,000 rows into a table in a day.

    If you really don't know ahead of time what the schema (i.e. form) will look like (because the customer is creating it on the fly), it sounds like a NoSQL approach might be better for you.


  • Registered Users Posts: 2,415 ✭✭✭Singer


    This could be implemented using MySQL's document store/JSON support, which would suit processing unstructured data, but I wouldn't particularly recommend it over the likes of MongoDB, which is far more mature in this area.

    However, I don't think it's too difficult to devise a schema in MySQL that would allow this to work with a small number of tables shared by all forms/clients. You could have a table per column type, whose values are associated with a row, which is associated with a form, which is associated with a project, which is associated with a client. With good indexes this should work well until you're dealing with huge amounts of data.
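
    Very roughly, something like this (only a sketch, names invented):

    CREATE TABLE clients   (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100));
    CREATE TABLE projects  (id INT AUTO_INCREMENT PRIMARY KEY, client_id INT NOT NULL);
    CREATE TABLE forms     (id INT AUTO_INCREMENT PRIMARY KEY, project_id INT NOT NULL);
    CREATE TABLE form_rows (id INT AUTO_INCREMENT PRIMARY KEY, form_id INT NOT NULL);

    -- One value table per column type, e.g. text values:
    CREATE TABLE text_values (
        id       INT AUTO_INCREMENT PRIMARY KEY,
        row_id   INT NOT NULL,   -- references form_rows.id
        field_id INT NOT NULL,   -- references the field definition
        value    VARCHAR(255),
        INDEX idx_row_field (row_id, field_id)
    );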


  • Registered Users Posts: 23,212 ✭✭✭✭Tom Dunne


    The key issues are testing, performance and support.

    When (not if) something goes wrong, how are you going to offer support? Bug checking/testing: how robust can your testing be if you cannot possibly test every permutation? Performance: how can a database whose structure is continually changing be tuned? This has implications for scale: it might work, with reasonable performance, at low volume, but what happens when you want to scale it up?


  • Registered Users Posts: 1,206 ✭✭✭zig


    Thanks all,

    One of the reasons I'm avoiding MongoDB is that the tables within a single project are related, and those relations will be determined by the rules set up. There are other reasons too. I have used it before just for the sake of using it and found my lack of experience with it could be an issue in the future.

    Just to note, there will be a set schema outside of the data the forms receive. So there are almost two parts to this: 1) the functionality of the application itself (e.g. managing accounts, projects, fields, forms, rules and other bits) and 2) the data each form will record. I am not ruling out a second database for that data.

    Regarding fine-tuning the tables in the future: the main schema would be fine, but yes, the dynamic tables could possibly be an issue alright.

    90% of the time they will be VARCHAR text fields recording the likes of part numbers and serial numbers, so having a table for each column type might be pointless. MySQL's document store/JSON support might be a good approach alright. It's a sort of compromise between NoSQL and MySQL.


  • Registered Users Posts: 1,206 ✭✭✭zig


    Singer wrote: »
    However, I don't think it's too difficult to devise a schema in MySQL that would allow this to work with a small number of tables shared by all forms/clients. You could have a table per column type, whose values are associated with a row, which is associated with a form, which is associated with a project, which is associated with a client. With good indexes this should work well until you're dealing with huge amounts of data.

    Just curious as to what you would consider a huge amount of data?

    I think this is where I'm falling down, really. I may be underestimating the capabilities of MySQL with millions of records in a single table.


  • Registered Users Posts: 2,415 ✭✭✭Singer


    zig wrote: »
    Just curious as to what you would consider a huge amount of data?

    I think this is where I'm falling down, really. I may be underestimating the capabilities of MySQL with millions of records in a single table.

    Where I work we have very busy tables with billions of rows. It works fine, with some well-tuned indexes and relatively large hardware. MySQL is good at this stuff; I wouldn't choose complexity (large numbers of tables, or new-fangled document storage systems) until you've built something simple and tried to make it scale.
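
    By "well tuned" I mostly mean indexes that match the query patterns, e.g. against the kind of shared key/value table sketched earlier (illustrative names):

    -- A composite index covering the common lookup, so the query
    -- below is answered from the index even on a huge table:
    ALTER TABLE form_entries
        ADD INDEX idx_form_field_value (form_id, field_name, field_value);

    SELECT entry_id
    FROM form_entries
    WHERE form_id = 7
      AND field_name = 'serial_number'
      AND field_value = 'SN-12345';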


  • Registered Users Posts: 2,031 ✭✭✭colm_c


    You would be better off splitting concerns: use an RDBMS for the relational data (accounts, clients etc.) and a NoSQL/JSON store for the arbitrary form data.

    This will give you better options.

    Personally I would probably use Postgres, which has both relational and JSON storage.
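
    For example (a sketch only, names invented): the fixed parts stay relational and the arbitrary form data goes in an indexed JSONB column:

    CREATE TABLE entries (
        id         SERIAL PRIMARY KEY,
        form_id    INT NOT NULL,
        entry_data JSONB NOT NULL
    );

    -- A GIN index makes containment queries on the JSON fast:
    CREATE INDEX idx_entries_data ON entries USING GIN (entry_data);

    SELECT * FROM entries
    WHERE form_id = 7
      AND entry_data @> '{"serial_number": "SN-12345"}';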


  • Closed Accounts Posts: 22,651 ✭✭✭✭beauf


    zig wrote: »
    ...
    Everywhere I have read says this is an awful, awful idea except in very rare cases. I wonder is this one of those rare cases?...

    I worked on one project that did this with MS SQL. A good many years ago, 15 years or so, so perhaps it would be different now with regard to performance.

    The DB performance was really bad.
    DB development took a lot longer.
    The DB wasn't immediately readable for anyone.
    Application development took a lot longer.
    Troubleshooting and support took longer.

    It would be beyond my skill-set to know what it would be useful for.
    Rapid prototyping of very different databases, maybe?

    The intent on the project I was on was to reduce the number of changes the DBA had to make to the DB. But really it wasn't taking them that much time anyway. Everyone other than the DBA hated it. It only came about because the DBAs weren't under the main development team's control, so they ran riot. I think in the end they left and a regular database replaced it. Can't remember. I remember huge arguments about the performance.


  • Closed Accounts Posts: 22,651 ✭✭✭✭beauf


    I like the idea of NoSQL/Hybrids. But I'm a db noob.


  • Registered Users Posts: 1,206 ✭✭✭zig


    Thanks for all your help! I've taken on board what you are saying and have decided to stick to a set schema.

    I am going to have a table with id, form id, project id, field name, field value etc
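
    Roughly this shape (a sketch; exact columns still to be decided):

    CREATE TABLE field_values (
        id          INT AUTO_INCREMENT PRIMARY KEY,
        project_id  INT NOT NULL,
        form_id     INT NOT NULL,
        entry_id    INT NOT NULL,   -- groups the fields of one submission
        field_name  VARCHAR(100) NOT NULL,
        field_value VARCHAR(255),
        created_at  DATETIME,
        INDEX idx_form_field (form_id, field_name, field_value)
    );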

    I was considering the idea of a new table for every client, because a client isn't something that will just appear on the fly, so I wouldn't necessarily call that dynamic. The product will be specifically sold rather than just offered to the general public.

    One thing I didn't point out, though, which I should have really: every time a field is filled there is a query to the database, and I would need instant responses. Do you think that, even with 100s of 1000s of rows, or even millions, that would be an issue (assuming it's well indexed, on a fast server, etc.)?


  • Registered Users Posts: 1,206 ✭✭✭zig


    OSI wrote: »
    What do you mean by "a query is made to the database"? It really depends on what the application is doing; if there are a lot of writes going on, it's very possible that your query will be blocked by table locks until the other queries are completed.

    The application basically tracks parts coming through a manufacturing process. Each form might be a different station in the process.

    The most important thing is that all the forms within a company are dependent on each other. The company can add many different rules based on the kind of business they have (our focus, however, is only on a particular industry where we have a lot of connections/experience).

    So a typical query to the database when a field is scanned might be something like:

    "Does the serial number contain XXX characters?" or "Does the serial number have value XXX in field XXX at Station XXX?"

    Basically a rules engine will be built, and each field may or may not have rules to check before it allows data to be entered and carried on to the next field. This is why the speed of the queries is absolutely essential; a person can't be entering data and waiting 3 or 4 seconds to see if it fits the criteria or not.
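
    For example, the second rule above might come down to a query like this against the field_values table I sketched (illustrative names again):

    -- "Does serial number SN-12345 have value 'PASS' in field
    -- 'qa_result' at station (form) 4?"
    SELECT 1
    FROM field_values sn
    JOIN field_values qa
      ON qa.form_id = sn.form_id
     AND qa.entry_id = sn.entry_id
    WHERE sn.form_id = 4
      AND sn.field_name = 'serial_number'
      AND sn.field_value = 'SN-12345'
      AND qa.field_name = 'qa_result'
      AND qa.field_value = 'PASS'
    LIMIT 1;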

    I assumed targeting a specific table (station) would allow this in an efficient way. But having a table per client should be enough based on what you guys are saying.

    Do you think this affects my question?


  • Registered Users Posts: 2,031 ✭✭✭colm_c


    That's quite a different proposition.

    I would think about using an existing rules framework / db structure.

    http://www.databaseanswers.org/data_models/rules_engines/index.htm


  • Registered Users Posts: 1,206 ✭✭✭zig


    Well, to be honest, the rules themselves are a separate issue/conversation. That said, cheers for the link, it's very useful. The app will still have to query the data entered by the operators, so how to store that data so it's easy and fast to query is my main concern for now.


  • Closed Accounts Posts: 22,651 ✭✭✭✭beauf


    You'd have to consider the security of your database also and protect against SQL injection and such.
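
    At the SQL level that means bound parameters rather than building query strings by hand; a sketch (reusing the illustrative field_values table from earlier):

    -- User input goes in as a bound value, never spliced into the SQL:
    PREPARE check_serial FROM
        'SELECT entry_id FROM field_values
         WHERE field_name = ''serial_number'' AND field_value = ?';
    SET @sn = 'SN-12345';
    EXECUTE check_serial USING @sn;
    DEALLOCATE PREPARE check_serial;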

    You might consider doing a short course if this is all new to you.


  • Registered Users Posts: 1,206 ✭✭✭zig


    Thanks again for the help. Having done a lot of digging over the past week or two and tried out different designs, I settled on JSON storage inside the MySQL database.

    I am using Laravel and it has lovely ways of querying/updating this data within a cell. It is seamless, and each key may as well be its own column. This way I keep the flexibility of form building without generating dynamic tables, and I get to keep the relational database.

    I had tried the EAV model (like WordPress uses) but something was screaming at me about how complex it was going to get given the amount of data going in all day. Not only would I be adding a ridiculous number of rows for a single entry, but the queries were going to get increasingly complex, not to mention updating any potential keys. I figured it would be just as bad as dynamic tables in terms of maintenance and scalability.


  • Registered Users Posts: 2,790 ✭✭✭John_Mc


    zig wrote: »
    Thanks again for the help. Having done a lot of digging over the past week or two and tried out different designs, I settled on JSON storage inside the MySQL database.

    I am using Laravel and it has lovely ways of querying/updating this data within a cell. It is seamless, and each key may as well be its own column. This way I keep the flexibility of form building without generating dynamic tables, and I get to keep the relational database.

    I had tried the EAV model (like WordPress uses) but something was screaming at me about how complex it was going to get given the amount of data going in all day. Not only would I be adding a ridiculous number of rows for a single entry, but the queries were going to get increasingly complex, not to mention updating any potential keys. I figured it would be just as bad as dynamic tables in terms of maintenance and scalability.

    Querying against JSON in a cell will surely be very slow though? It is with XML in MS SQL Server anyway.

    If you're using JSON then it makes sense to use a NoSQL solution to persist it, combined with a relational DB.


  • Registered Users Posts: 1,206 ✭✭✭zig


    John_Mc wrote: »
    Querying against JSON in a cell will surely be very slow though? It is with XML in MS SQL Server anyway.

    If you're using JSON then it makes sense to use a NoSQL solution to persist it, combined with a relational DB.

    Isn't that kind of what I'm doing, though?

    https://www.mysql.com/news-and-events/web-seminars/combining-the-power-of-sql-and-nosql-databases-in-mysql/

    Or are you talking about separate databases, e.g. using MongoDB alongside MySQL? I might look into it, but I would be worried about the complexity, because it's not just the JSON data I am querying; it could be something else in that table too, with both combined, e.g.

    Here is some code where entity_id is its own column, but account_number is a key within the entry_data column:

    $data = DB::table('process_station_data')
              ->where('entity_id', '=', $entityID)
              ->where('entry_data->account_number', 'test')
              ->get();
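
    And on the speed concern: if a JSON key ends up being queried constantly, MySQL 5.7 lets you expose it as an indexed generated column, e.g. (a sketch; this DDL is my own naming, not the real table definition):

    CREATE TABLE process_station_data (
        id             INT AUTO_INCREMENT PRIMARY KEY,
        entity_id      INT NOT NULL,
        entry_data     JSON NOT NULL,
        account_number VARCHAR(50)
            AS (entry_data->>'$.account_number') STORED,
        INDEX idx_entity_account (entity_id, account_number)
    );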
    


  • Registered Users Posts: 2,790 ✭✭✭John_Mc


    zig wrote: »
    Isn't that kind of what I'm doing, though?

    https://www.mysql.com/news-and-events/web-seminars/combining-the-power-of-sql-and-nosql-databases-in-mysql/

    Or are you talking about separate databases, e.g. using MongoDB alongside MySQL? I might look into it, but I would be worried about the complexity, because it's not just the JSON data I am querying; it could be something else in that table too, with both combined, e.g.

    Here is some code where entity_id is its own column, but account_number is a key within the entry_data column:

    $data = DB::table('process_station_data')
              ->where('entity_id', '=', $entityID)
              ->where('entry_data->account_number', 'test')
              ->get();
    

    My bad, didn't know MySQL supported that out of the box. That's cool! Looks like you're sorted so

