Relational databases are the long-term memory of an organization;
when they are fragmented or hard to access,
the organization can appear schizophrenic to outsiders.
Experience has made clear that the databases
which provide the long-term memory of an organization
are themselves very long-lived.
It is appropriate, therefore, in considering their design,
construction, enhancement, and administration,
to consider long-term as well as short-term requirements.
It is like driving a car: most of our attention should be focused on the
road just ahead,
but we want to keep in mind the road to come as well.
The great problem in trying to design with an eye
on long-term requirements
is that we frequently can't be sure what those
requirements will be.
We often have some reasonable ideas
as to what those requirements are likely to be,
but in practice competition, unexpected successes
and failures, changes of strategic direction,
mergers and divestitures,
new laws, and the like
will invalidate many of those ideas.
How are we to get a design which
satisfies today's requirements
while still being able to adapt to
tomorrow's surprises?
The first rule is to make the design,
as nearly as possible,
reflect the nouns and verbs used
in the industry.
These generally do not change much with time.
For instance,
if the client talks about "patients" and "visits",
create tables called "patients" and "visits".
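A first-cut schema in this spirit might look like the sketch below.
(SQLite via Python is used here purely for illustration, and the column names are hypothetical;
the table names, though, come straight from the client's own vocabulary.)

    import sqlite3

    # Table names mirror the client's nouns: "patients" and "visits".
    conn = sqlite3.connect("clinic.db")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS patients (
            patient_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            birth_date TEXT
        );
        CREATE TABLE IF NOT EXISTS visits (
            visit_id   INTEGER PRIMARY KEY,
            patient_id INTEGER NOT NULL REFERENCES patients(patient_id),
            visit_date TEXT NOT NULL,
            reason     TEXT
        );
    """)
    conn.commit()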
Another way to cope with continual change is
to deliberately underbuild:
build just enough structure to accommodate the immediate
requirements,
but don't build what isn't yet needed.
Given the inevitable twists in the road,
many of the structures proposed to meet anticipated needs
will turn out not to be needed in practice.
And even when they are,
experience will often have changed our views on how best to implement them.
Even basically sound initial designs tend to be weakened over time.
Every active database is subject to continuous tuning to
- fix bugs
- add features
- improve performance
- upgrade to more recent versions of the application, database, operating system, or hardware
There is a natural tendency to implement
the necessary adjustments
by adding new tables, new code, or the like.
The problem is that after a few generations of this,
the database and its applications start to become
unmanageably complex.
Multiple copies of what is essentially the same code
come into existence,
and gradually even small changes start to have unexpected
side-effects.
The great rule here is to "think globally, act locally".
That is,
to consider the system as a whole before making any significant
change,
but then to make the smallest necessary change that still gets the job done.
For instance,
the natural tendency when creating a new feature similar to an existing one
is to clone the existing code, then "hack it over" to provide the new feature.
This is quick and
relatively safe in the short term,
but tends to multiply the amount of code unmanageably over time.
It is often better to refactor and generalize the existing code to
handle both functions.
We have worked out a number of practical techniques
for doing this
while still keeping the risk to the existing system small.
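As a schematic illustration (the report functions and columns below are invented,
and deliberately simplified), the clone-and-hack pattern and its generalized
replacement might look like this:

    # Clone-and-hack: two near-identical copies of the same report query.
    def monthly_visit_report(conn, month):          # month as 'YYYY-MM'
        return conn.execute(
            "SELECT patient_id, COUNT(*) FROM visits "
            "WHERE strftime('%Y-%m', visit_date) = ? GROUP BY patient_id",
            (month,)).fetchall()

    def yearly_visit_report(conn, year):            # year as 'YYYY'
        return conn.execute(
            "SELECT patient_id, COUNT(*) FROM visits "
            "WHERE strftime('%Y', visit_date) = ? GROUP BY patient_id",
            (year,)).fetchall()

    # Refactored: one generalized function handles both periods,
    # so the next fix or feature lands in exactly one place.
    def visit_report(conn, date_format, period):
        # date_format is '%Y-%m' for monthly or '%Y' for yearly totals.
        return conn.execute(
            "SELECT patient_id, COUNT(*) FROM visits "
            "WHERE strftime(?, visit_date) = ? GROUP BY patient_id",
            (date_format, period)).fetchall()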
Performance tuning in particular tends to put stress on a design.
A number of the tricks normally used to improve performance
-- e.g. pre-calculation, post-calculation, ancillary tables --
tend to make the overall system significantly more complicated.
Even when the tricks work,
they can create "code fear",
a not unreasonable fear that the slightest change
may cause the entire house of cards to disassemble itself.
Ironically -- and especially with modern hardware --
much performance trickery turns out to be unnecessary.
With the improvements in hardware speeds
and optimizer technology,
it often happens that a well-designed, well-maintained
system will run at least acceptably well over most or all of its
operating range.
A clean design should not be complicated
by additions meant to enhance performance
-- until an actual need is demonstrated.
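To make the point concrete, here is a small sketch (continuing the hypothetical
visits table) of the kind of complication worth deferring: the plain, indexed
query is often fast enough, and the pre-calculated summary table only earns
its keep once measurement says otherwise.

    # The straightforward form: one indexed aggregate query, no extra machinery.
    conn.execute("CREATE INDEX IF NOT EXISTS visits_patient_idx "
                 "ON visits(patient_id)")

    def visits_per_patient(conn):
        return conn.execute(
            "SELECT patient_id, COUNT(*) FROM visits GROUP BY patient_id"
        ).fetchall()

    # The deferred "trick": a pre-calculated ancillary table that must now be
    # kept in step with every insert, update, and delete against visits.
    #   CREATE TABLE visit_counts (
    #       patient_id INTEGER PRIMARY KEY,
    #       n_visits   INTEGER NOT NULL
    #   );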
If performance problems do appear,
then the first rule of performance tuning is to
monitor the system to locate the problems,
before
attempting to fix them:
programmers are notoriously poor at locating performance
problems by intuition.
Once the actual problem has been located,
care should be taken to avoid fixing the problem
in ways that create bugs and other problems elsewhere.
Again, "think globally; act locally".
That a performance problem reveals itself in one area
does not automatically imply that that
is the best spot to make a fix.
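A minimal measuring habit, again using SQLite for illustration, might be nothing
more than timing the suspect query and asking the optimizer what it plans to do,
before any code is touched:

    import time

    def profile_query(conn, sql, params=()):
        # Ask the optimizer for its plan -- e.g. whether an index is used.
        for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params):
            print(row)
        start = time.perf_counter()
        rows = conn.execute(sql, params).fetchall()
        print(f"{len(rows)} rows in {time.perf_counter() - start:.3f} seconds")
        return rows

    # Example:
    # profile_query(conn,
    #     "SELECT COUNT(*) FROM visits WHERE patient_id = ?", (42,))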
It is a mournful truth of our industry
that half of all bug fixes either fail to fix the original problem,
introduce new problems, or both.
The stereotypical method of debugging -- trace the code's execution
till the bug is found, then swat it -- is not a particularly efficient way to
either find or fix bugs.
For one thing,
clean code should be readable enough that most bugs are obvious "by inspection".
If they aren't,
then their obscurity itself is a bug.
And what we refer to as "scientific debugging" can usually find bugs faster
than tracing can.
For another,
every bug gives one a chance to ask:
"how did this bug come into existence?"
Do we need to:
- drain the swamp: clean up the whole module, sub-system, or whatnot
- add more guard code: the earlier a problem is seen, the easier it is to debug (see the sketch after this list)
- rethink the design: poorly understood and overly complex design
is at the root of many bugs
- add more automatic testing: make the machine do the dirty work of debugging
And so on.
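By way of example, guard code and an automated test for the hypothetical visits
table might look like the sketch below; the point is that the failure is reported
at the boundary where the bad data arrives, and the machine re-checks the
behaviour from then on.

    def record_visit(conn, patient_id, visit_date, reason=None):
        # Guard code: fail loudly here, where the bad data arrives, rather
        # than deep inside a report that runs weeks later.
        ok = conn.execute("SELECT 1 FROM patients WHERE patient_id = ?",
                          (patient_id,)).fetchone()
        if not ok:
            raise ValueError(f"unknown patient_id {patient_id}")
        conn.execute(
            "INSERT INTO visits (patient_id, visit_date, reason) "
            "VALUES (?, ?, ?)",
            (patient_id, visit_date, reason))

    # Automated test: the machine does the dirty work of re-checking.
    def test_record_visit_rejects_unknown_patient():
        import sqlite3, pytest
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY)")
        conn.execute("CREATE TABLE visits (patient_id INTEGER, "
                     "visit_date TEXT, reason TEXT)")
        with pytest.raises(ValueError):
            record_visit(conn, patient_id=99, visit_date="2024-01-01")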
The subject is large enough
-- and important enough, given that 60% or more of all programming
time is spent debugging --
that we are putting together a book,
Zen & the Art of Debugging,
addressing this one issue.
Database administration is now getting
more attention than it once did.
Issues such as
- disaster recovery preparation (with fire drills)
- backup and restore
- index tuning
- load balancing
- query performance analysis
- lock management
- defragmentation
do now seem to get routine attention.
There is still sometimes a reactive character to this
work.
Given that there are now many more GUI performance and system
monitoring tools available,
we recommend taking a few minutes a day
to simply watch the system go through its paces
and thereby get a sense of what "normal" means.
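Those few minutes of watching need not involve anything elaborate.
The sketch below assumes a PostgreSQL back end reached through psycopg2
(adjust the query for your vendor's catalog views; the connection string
is hypothetical):

    import time
    import psycopg2  # assumes a PostgreSQL back end

    def watch(dsn, interval=5, rounds=12):
        # Print a one-line snapshot of session states every few seconds,
        # simply to build a feel for what "normal" looks like here.
        conn = psycopg2.connect(dsn)
        conn.autocommit = True
        with conn.cursor() as cur:
            for _ in range(rounds):
                cur.execute("SELECT state, count(*) FROM pg_stat_activity "
                            "WHERE pid <> pg_backend_pid() GROUP BY state")
                print(time.strftime("%H:%M:%S"), dict(cur.fetchall()))
                time.sleep(interval)

    # watch("dbname=clinic user=dba")   # hypothetical connection string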
After unscheduled disaster recovery exercises,
conversions from one database vendor to another
are perhaps the scariest part of database administration.
A large body of expertise will have been built up on system A;
unless the local developers are already familiar with system B,
a comparable investment of time may be needed just to get back
to current levels of proficiency.
With that said, application and/or performance requirements,
changes in corporate policy, mergers, and the like may require a conversion.
The great rule in this case is to break the process out into substeps.
If database enhancements are required,
it is generally better to do them before or after the conversion.
Ditto hardware upgrades.
The conversion itself should be automated to the extent practicable.
The first effort will fail.
If the effort is being driven by scripts,
then the scripts can be fixed and rerun,
hopefully to fail at a new and higher level.
But if increasingly fatigued humans are trying to run
the procedures off scribbled notes,
then there is no guarantee that old errors will not be repeated
in new runs.
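In practice the automation often amounts to nothing fancier than a driver
script with explicit, ordered sub-steps, each of which can be fixed and rerun
from the point of failure. The step names below are hypothetical:

    import logging, subprocess, sys

    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

    STEPS = [                         # each entry is its own small script
        "export_schema_from_a.sh",
        "translate_schema_for_b.py",
        "bulk_copy_data.sh",
        "rebuild_indexes.py",
        "verify_row_counts.py",
    ]

    def run(start_at=0):
        for i, step in enumerate(STEPS[start_at:], start=start_at):
            logging.info("step %d: %s", i, step)
            if subprocess.run(["./" + step]).returncode != 0:
                logging.error("step %d failed; fix and rerun from step %d", i, i)
                sys.exit(1)

    if __name__ == "__main__":
        run(int(sys.argv[1]) if len(sys.argv) > 1 else 0)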
Once the conversion is done,
it will turn out that years of adapting the system
to suit the sometimes highly specific requirements of system A
have simultaneously "un-adapted" it
for the equally specific requirements of system B.
Performance enhancement tricks that worked brilliantly
for system A will turn out to reduce performance with B
to a nearly hopeless crawl.
Post-conversion,
a fair amount of tweaking will usually be required,
and should be budgeted for.
We also provide other relational database services
in support of client requirements
for business intelligence,
operational reliability,
data warehousing,
and the like.