How does MSSQL organize Record Data on disk in the database?

In the Introducing Clarion FileManager topic I posed the below question. I’ll split this out into a separate new topic.

The simplest is SET( File ) which processes in file record order, so nothing meaningful. I think of this as the most efficient way to read all the records, but I have no knowledge that IS the most efficient . You would have to understand how the specific DB stores records. I would guess most SQL store records by Primary Key so SET( File:PrimaryKey ) may be most efficient. Would be interesting if anyone knows?

I would say my guess was ok, that in most cases the Primary Key is a “Clustered Index” so the Record Data is physically stored on disk in the Leaves of the PK Index.

It is possible for a file to use a HEAP to store the Record Data and then the PK has a Pointer into the Heap of Record Data. All other (non-clustered ) Index have Pointer instead of the Primary Key fields that would be needed to read the PK.

Curious what Trace would show for a SET(File)?

MS SQL stores records by CLUSTERED index, which may or may not be the primary key. Tables that don’t have a clustered index are stored as “heaps”.


So ELI5 what is a clustered index and heaps?

This is all MSSQL info. I do not know how other DBMS work.
A clustered index is the physical order that SQL keeps the rows in a table. Whatever columns are in the clustered index are used to put the rows into a series of 8KB pages where the rows are sorted by the clustered index. If a row is inserted into the middle of the existing clustered index, then rows are shuffled so the new row is in the proper physical order. This is not like a non-clustered index. The columns in a non-clustered index are stored in their own pages in order of the index, with a reference back to the page in the clustered index where the whole record resided. Think of the clustered index as being the table itself.
It is very common that the primary key and the clustered key are the same thing, but it doesn’t have to be.

A heap is a table without a clustered index. The rows are stored in no particular order.
Here is a free class/video on the concepts. If you are working with MSSQL, it is well worth your time, IMO.

Probably not possible to exactly ELI5. :slight_smile:


If the primary key is the clustered key, then this would be fastest. However, without using BUFFER so you reduce the number of calls to the server to return the result set, it will still be slow with a large number of rows. Using BUFFER with the file or a VIEW will greatly reduce the time required by returning a larger number of rows at a time from the server.


I totally agree. I’ve taken all of his fundamental and master classes.

1 Like

For which his Black Friday sale is extended through the end of December:

Understanding Pages and Extents | Microsoft Learn>

In SQL Server, the page size is 8 KB. This means SQL Server databases have 128 pages per megabyte. Each page begins with a 96-byte header that is used to store system information about the page. This information includes the page number, page type, the amount of free space on the page, and the allocation unit ID of the object that owns the page.

The following table shows the page types used in the data files of a SQL Server database.

Data rows are put on the page serially, starting immediately after the header. A row offset table starts at the end of the page, and each row offset table contains one entry for each row on the page. Each entry records how far the first byte of the row is from the start of the page. The entries in the row offset table are in reverse sequence from the sequence of the rows on the page.

Clustered and Nonclustered Indexes Described - SQL Server 2014 | Microsoft Learn

An index is an on-disk structure associated with a table or view that speeds retrieval of rows from the table or view. An index contains keys built from one or more columns in the table or view. These keys are stored in a structure (B-tree) that enables SQL Server to find the row or rows associated with the key values quickly and efficiently.

A table or view can contain the following types of indexes:

  • Clustered
    • Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order.
    • The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.
  • Nonclustered
    • Nonclustered indexes have a structure separate from the data rows. A nonclustered index contains the nonclustered index key values and each key value entry has a pointer to the data row that contains the key value.
    • The pointer from an index row in a nonclustered index to a data row is called a row locator. The structure of the row locator depends on whether the data pages are stored in a heap or a clustered table. For a heap, a row locator is a pointer to the row. For a clustered table, the row locator is the clustered index key.
    • You can add nonkey columns to the leaf level of the nonclustered index to by-pass existing index key limits, 900 bytes and 16 key columns, and execute fully covered, indexed, queries. For more information, see Create Indexes with Included Columns.

So would this be accurate?
1 Data gets stored serially in 8KB pages, serially being in Entry Order.
2 A Clustered Index duplicates & stores the data from step 1 in sorted 8KB pages, sorted by the columns specified in the Index.
3. Identify the Index and read the sorted data (off disk or SQL memory cache) to get the SQL speed advantage, ISAM db’s don’t offer.

The clustered index IS the table/data. The rows are stored in clustered index order in the 8KB pages.

No. Data is not added in Entry Order. Data is added to the correct page based on the clustered index value. If the page is full then a new page is created, and typically both pages are then half full.

1 Like

So is it sorting the pages based on the primary index rather than entry order?

And then does it sort the pages into other index orders and keep those pages stored on disk or just in memory cache?

I’m trying to figure out where SQL server gets its speed from over ISAM, because it appears the SQL pages has a bit more code and thus work to do over ISAM in order to provide its robustness.

I found this answer as to why its 8KB
sql server - Why only 8kb per page? - Database Administrators Stack Exchange

At the SQL Server conference I attended in Sydney in 1999, Richard Waymire said that Microsoft’s intention was to allow a variable page size in SQL Server 2000 but that it turned out to be really difficult, so they put it in the too-hard-basket and went with 8k as a compromise.

I’m guessing Richard Waymire is this chap.
Richard Waymire | AWS Database Blog (
Richard Waymire - Argyle, Texas, United States | Professional Profile | LinkedIn

Terminology matters. A table with a CLUSTERED (not “primary”) index has its data stored IN THE INDEX and IN THE ORDER OF THE INDEX, as Rick said. That’s why one table can only have one clustered index.

A primary key may or may not be the clustered index.

Fastest loading of large amounts of data is often a table with no indexes. The data is stored in a “heap” wherever SQL’s mood leads it. A heap may have nonclustered indexes, but the more indexes a table has the slower the data inserts and updates. But the faster the searching. No free lunch.

Indexes may be stored in different filegroups in disk files on different drives or LUNs. As can some individual columns (I have separate filegroups for BLOBs in some databases.)

A thorough understanding of how the SQL engine works takes more than a couple of wikipedia articles.

Answers to this are in the Brent Ozar blog article Rick posted which is free. It is a nice read, the 16 parts go by fast.

Part 6 talks about how “other index orders” (non-clustered index) work:

Back in the 90’s one of the credit reference agency’s Experien or Equifax here in the UK was using the Clarion dat file to service the country, so Mon-Sat only lookups, and all credit file updates were performed on a Sunday, but like you say, more indexes = faster lookups but slower inserts & updates.

Of course, thats why I tend to post the manufacturer links where possible, like the MS links above, but their documentation isnt always the best, I’ve been known to post links on here to MS Win32 api’s with mistakes.

But I’m also looking at this from a hardware perspective and there is the user base to take into account as well, like number of users, central DB with local and remote users, distributed replicated db’s on multiple sites.

Even the raid configuration can cause differences of opinions.
SSD RAID Load Testing Results from a Dell PowerEdge R720 - Brent Ozar Unlimited®

Some experts claim Raid10 is the best way to go but Brent didnt see it in the link above, Raid 5 was faster for reading.

There are lots of variables to factor in which can alter the SQL performance, and back when it was spin disks, even the placement of partitions like physically on the outside of the disk rather than the inside of the spin disk saw greater performance, which was what MS noticed and implemented back in the XP days, so prefetch data was stored in the outer parts of the spin disk, because they were a lot lot slower back then, and SSD’s et al now change all of that.
What is app launch prefetching and how does it help? - TechNet Articles - United States (English) - TechNet Wiki (

Edit. I will add, in the early 00’s giving an SQL server more RAM outperformed what the type of index was in my experience, but that was SQL 2000 and SQL 2005 so things might have changed a bit since then. :grinning:

Edit2 Also in the early 00’s I was using these quad port nics Intel PRO/1000 MT Server Adapter, pre rack server days with decent managed switches as this also improved performance in busy environments because networks are massive bottlenecks. I went with the Intel quad ports because I’d read they worked direct with the Intel cpu’s, unlike some other nic’s. Obviously rack server’s tend to have quad port nics as standard now a days, but if someone is still using standalone box server’s, getting a quad port nic seriously improves performance with some decent switches and decent cat 6 shielded (FTP) cables. And these are visible performance increases imo, compared to default sql server installations, but again that depends on the type of work the SQL server is doing.

I’ll go through this, I’ve had a look at some of his posts but as I said to Jane, sometimes the hardware can affect performance as well, so getting to understand what is really going on under the hood along with my knowledge of the hardware helps to give an overall understanding of whats going on to get performance up.