Re: B-trees

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Sun, 3 Aug 2008 01:08:47 -0700 (PDT)

Message-ID:

<345093ab-bd86-40a4-bdb6-d7ef9f4e992a@t54g2000hsg.googlegroups.com>

On Aug 3, 2:43 am, Erik Wikstr=F6m <Erik-wikst...@telia.com> wrote:

On 2008-08-03 00:02, rsprawls wrote:

I found a disk for a b-tree algorithm that I purchased back
in 93 or so. I'd hoped to find this, but now I'd like to
know how worthwhile are b-trees in today's advancements?
This is old C code using pascal and cdecl declarations, but
it would be a minor project to bring up to date in C, but I
was thinking it might be a good project to convert it over
to C++, especially since I don't know C++ very well and
would like a good sorting algorithm for medium sized
datafiles.

Are b-trees still a valued data structure today? It's been
so long since I set foot in programming, I'm itching to
start again, but I'm wanting a starting point.

Not really topical for this group, you might want to ask in a
general programming group like comp.programming.

AFAIK B-Trees or one of its derivatives are used in most
modern filesystems, since the time it takes to read in a node
from the disk is so high you want to have as few reads as
possible.

I'm not sure what you're considering a B-Tree here. I don't
think that the OS does anything special with directories,
however: the general philosophy is that if your directory
structure is well organized, no individual directory will
contain enough entries to make anything other than linear search
worth the effort. Also, the main use of B-Trees is to keep
sorted data; the OS's I know don't maintain the directory
entries sorted, so a B-Tree wouldn't improve access in any way.

I would also suspect that they are commonly used in databases
and other places which deals with on-disk storage.

In a relational database, it probably depends on the types of
indexes, and the use which is made of them. I think modern
databases dynamically tune the organization according to the
actual accesses they see, which means that if sorted access
isn't useful, they won't use a B-Tree. (Sorted access is useful
not only when you request sorted output, but also when you have
index criteria like "where x > a and x < b". On the other hand,
a hash table is generally better for "where x = a", and off
hand, I can't think of any "optimizing" structure for things
like "where x like '...%'"---I've also noticed that when I use
such requests, my access times go up significantly:-).)

If the data structure is in RAM other structures are generally
better.

That used to be true, but I'm not sure now, with paging, etc.
If you need random access to sorted data, using a B-Tree with
each node filling an entire page (and page aligned) might be
worth it (but you'd need one awfully big table to see the
difference).

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orient=E9e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S=E9mard, 78210 St.-Cyr-l'=C9cole, France, +33 (0)1 30 23 00 34