An article about company per DB vs in table

seanh · August 26, 2022, 10:55am

An article about company per DB vs in table I found interesting on stackoverflow

Multi-tenant solutions range from one database per tenant (shared nothing) to one row per tenant (shared everything).

"Shared nothing", “separate database”, or one database per tenant

Most expensive per client. (Large numbers of clients imply large numbers of servers.)
Highest degree of data isolation.
Disaster recovery for a single tenant is simple and straightforward.
Maintenance is theoretically harder, because changes need to be carried out in every database. But your dbms might easily support running stored procedures in each database. (SQL Server has an undocumented system stored procedure, sp_msforeachdb, for example. You can probably write your own.)
“Shared nothing” is the most easily customizable, too, but that also raises more maintenance issues. For example, each tenant might have a different pattern of usage, which suggests each tenant might need some indexes that other tenants wouldn’t. That’s trivial to do with “shared nothing”; impossible with “shared everything” (below).
Lowest number of rows per table. Querying speed is near optimal.

"Shared everything", or “shared schema”, or "one database per planet"

Least expensive per tenant.
Lowest degree of data isolation. Every table has a column that identifies which tenant a row belongs to. Since tenant rows are mixed in every table, it’s relatively simple to accidentally expose other tenant’s data.
Disaster recovery for a single tenant is relatively complicated; you have to restore individual rows in many tables. On the other hand, a single-tenant disaster is relatively unusual. Most disasters will probably affect all tenants.
Structural maintenance is simpler, given that all tenants share the tables. It increases the communication load, though, because you have to communicate and coordinate each change with every tenant. It’s not easily customizable.
Highest number of rows per table. Quick querying is harder, but it depends on how many tenants and how many rows. You could easily tip over into VLDB territory.

Between “shared nothing” and “shared everything” is “shared schema”.

"Shared schema"

Tenants share a database, but each tenant has it’s own named schema. Cost falls between “shared nothing” and “shared everything”; big systems typically need fewer servers than “shared nothing”, more servers than “shared everything”.
Much better isolation than “shared everything”. Not quite as much isolation as “shared nothing”. (You can GRANT and REVOKE permissions on schemas.)
Disaster recovery for a single tenant require restoring one of many schemas. This is either relatively easy or fairly hard, depending on your dbms.
Maintenance is easier than “shared nothing”; not as easy as “shared everything”. It’s relatively simple to write a stored procedure that will execute in each schema in a database. It’s easier to share common tables among tenants than with “shared nothing”.
Usually more active tenants per server than “shared nothing”, which means they share (degrade) more resources. But not as bad as “shared everything”.

Microsoft has a good article on multi-tenant architecture with more details. (The link is to just one page of a multi-page document. Microsoft has since removed that page; the link is now to a copy in archive.org.)

FedericoNavarro · August 26, 2022, 2:41pm

This issue is becoming more and more current facilitated by the trend towards the cloud.

The scalability of the shared database makes it very beneficial for those who want to provide Software as a Service. Beyond the typical scenarios where it is the most natural design, as in the case of banks and public service companies, in general for those with many clients, it can also be a strategic design for those who start with few clients but aim to grow.

But as mentioned in Sean’s featured article: "Since tenant rows are mixed in every table, it’s relatively simple to accidentally expose other tenant’s data."

Each database query must contain expressions to filter only the rows belonging to the current company, added to the filters specific to the particular query.

Therefore, a great deal of care is required in programming every Browse/Process/Report, every hand-coded access to tables, and every custom query in PROP:SQL. And in a datacentric language like Clarion this is completely comprehensive. Each developer on the team must be fully aware of the scope of the risks this represents: potential leaks of sensitive data, potential damage in bulk updates, etc.

This necessary care also extends to 3rd party products and internal Clarion components.

It was recently posted in the comp.lang.clarion group (Topic: Prevent filling listbox when error “Filter and Range Limits ignored”) on 2022-04-29 by Koen Tjoa, where basically adding a long filter (more than 5000 characters in total) on a browse caused the entire filter to be ignored, including the one that identified the company’s rows, and its consequence: exposure of information from others.

The cause of the problem and possible workarounds were identified, and PTSS were reported: PTSS 43305 Prevent filling list box when error “Filter and Range Limits ignored” instead of ignoring filters, and 43307 ViewManager.ApplyFilter crops large filters and may undisclose information. Last updated 2022-05-04 with “Forwarded to Development”.

I shared other suggestions in that thread:

"…
I think that further more than solving this specific problem, you may need a way to “ensure” that “every” TSQL query sent to the DB has the company filter attached.

Whereas it theroretically possible with an SQL CallBack to interpose to them, parsing and modifying them on the right way would be such a complex task, and error prone with unpredictable combinations.

I think the way do it is to delegate all that on the database engine.
One obvious way would be using diferent schemas or databases, but as you already have a working design, you can research on Row Level Security:

…"

and

"…
Anyway it could worth taking a look on the link passed about Row Level Security that could be the ultimate solution. This varies with the software architecture of your app, a web app or desktop accessed by Remote Desktop Protocol using an unprivegied user on an up to date server would be different than having a server accessed through a VPN from customers controlled machines running your app locally. For example on this last case, a filter done “client side” would mean all data travelling to the customer machine.
"

JonW · August 30, 2022, 11:22am

I think if I was doing a multi-tenant thing where the tables were shared, I would set it up something like this…

The backend has the tables, like say TRANSACTION, with the company column. You set up a schema for each company which has views on the base tables, with the same name.
create view transaction as
select *
from main.transaction
where company = ‘MyCompany’;

Clarion logs onto the company schema and can see only the company “tables”, so cannot ever see the data from any other company. That’s assuming there’s a large number of tables to provide the comany level solution. For something like bank accounts, I’d probably do fine-grained access control (the row-level security thing).

Bruce · August 31, 2022, 6:26am

I’m a “one database per tenant” guy myself. Makes everything easier, and more secure.
Since cost is a function of servers, not databases, I’m not sure I buy the cost issue either.
And if cost is an overriding concern, use a free database engine like Postgres.

I would disagree with this though - I think most data problems are single-tenant, not multi-tenant.
On the other hand the potential to make a multi-tenant disaster goes up a lot when you have a multi-tenant system.

But ultimately pick which ever you like. I’m a simple guy, so I prefer simple…

anon23294430 · August 31, 2022, 10:28am

I agree with Bruce because you have redundancy when a communication link with the central database or other databases go down and the synchronisation process fails.

This has been the weakness of mainframes which served entire countries, the communication link going down. And its also the design of the internet, local independence when the internet goes down.

Now admittedly there is better redundancy with comms links today, think mobile phone data communication and cabled communication (Ethernet, ISDN, ADSL, SDSL, Fibre), so then it just becomes a case of what redundancy is the best.

When I read the post mortem analysis of Hurricane Katrina, the mobile phone companies did not all have backup diesel generators next to the masts, which meant unlike here in the UK, the mobile phone companies didnt go around to the diesel generators and ensure they were filled up with diesel and tested before the storm, and even when they did, they didnt go around and fill them all up with diesel before the storm. Ergo the US mobile phone masts failed very quickly once the Hurricane hit land.

When looking at the use of radio frequency, in particular the 5G network in the US and in Europe and how it affects airports, its clear Europe has thought more on the allocation of radio frequency spectrum because the US was still sorting out how 5G is supposed to work, over Xmas just gone!

So I’d always go onsite db with the ability to synchronise with other databases at other sites, bearing in mind the redundancy that is needed when problems occur. It makes it more complicated but thats what we get paid for imo.

I used to do the contingency planning for Tesco (UK’s largest supermarket chain) in the 90’s and run through different scenarios, like no food imports into the country, a mainframe going down in a distribution depot, national/regional strikes, natural disasters, etc etc. As a planning exercise its quite interesting because in the case of no food imports, the UK only has 7 days of food in the supply chain before all hell breaks loose!!! Now for personal planning a chest freezer is best because in the event of powercut, the top opening lid keeps the cold in, these more popular door opening fridge freezers let the cold out ergo food spoils more quickly in the event of a power cut. The fact these are more populat tell you that people dont plan ahead! And the Military has first claim to all food before the public if they need to be deployed, I saw this with orders coming in during the Balkans conflict when the military would put orders in to the distribution depots and they had to be picked first and whatever was left, was distributed according to an algorithm to the stores. This is one of the reasons why stores could be low on something when you go shopping and there is no apparent supply chain issue.

You would be surprised at how little redundancy there is, even at the national level, you cant rely on Govt and even the military mess up, I hear what goes on USAF/RAF bases here and they are fire fighting just as much as your stock market listed companies and your little independent businesses!