Ashmead Software & Consulting, Inc.: System Administration

The secret of system administration is listening.

General comments

Depending on how they are handled, the underlying hardware and operating system can either impose significant penalties or offer significant gains in reliability and performance.

However, we should not be enticed by technology into ignoring the human factor, which in fact will normally have a more decisive effect on performance.

Communication between DBA and SA

In some large shops, the system administration (SA) and database administration (DBA) roles are kept separate. But tighter integration of these two roles can offer significant benefits in terms of reliability and performance.

For instance, the restore options provided with many relational database products force a restore to cover the entire instance or nothing at all. Adding backup procedures at the operating system level can offer significant improvements in flexibility, and also provides insurance against bugs and other failures in the database vendor's own products.
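As a rough illustration of what an operating-system-level backup can add, the sketch below archives a directory of files into a timestamped tarball and later pulls back a single file without touching anything else. It is only a sketch under stated assumptions: the function names, the directory layout, and the idea that the database has already written per-table export files into that directory are all hypothetical, and copying the live data files of a running database this way would generally not give a consistent backup.

```python
import tarfile
import time
from pathlib import Path


def backup_directory(source_dir: str, backup_root: str) -> Path:
    """Archive source_dir into a timestamped .tar.gz under backup_root."""
    source = Path(source_dir)
    if not source.is_dir():
        raise ValueError(f"not a directory: {source}")
    root = Path(backup_root)
    root.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = root / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the directory name, not the absolute path.
        tar.add(source, arcname=source.name)
    return archive


def restore_one_file(archive: str, member: str, target_dir: str) -> Path:
    """Copy a single regular file out of the archive into target_dir."""
    with tarfile.open(archive, "r:gz") as tar:
        src = tar.extractfile(member)  # raises KeyError if member is absent
        if src is None:
            raise ValueError(f"not a regular file: {member}")
        data = src.read()
    out = Path(target_dir) / Path(member).name
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_bytes(data)
    return out
```

A call such as `restore_one_file(archive, "exports/orders.dump", "/tmp/restore")` (hypothetical paths) would recover one export file, which is the kind of selective restore that an all-or-nothing instance restore does not allow.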

Communication with operators

The rule that communication is key applies between administrators and operators as well. At one point John Ashmead was in charge of about twenty-five VAXes at a remote site. As was the custom at the company, he attempted to manage all administration electronically and by phone, only going physically over to the site when there was a serious problem.

As an experiment, he tried swinging by the site on the way into work in the morning and chatting with the operators, about problems on the machines of course but also about nothing in particular.

Over time, he noticed that he was spending significantly less time troubleshooting at the site. In the morning discussions, hiccups and trends "too small" to merit a formal report would come up. Often these led to pre-emptive strikes against problems. And the continual interaction ensured that the operators and he were "on the same page." They knew exactly what was required under various circumstances and would get started even while the report was being called in. Uptimes went up; troubleshooting times went down.

Communication with clients

This is obviously the most significant channel of all. Several examples:

Three scheduled downtimes are less disruptive than one unscheduled. If they know it is coming, the users will find other ways to spend their time. But if they don't, they will lose not only the time associated with the interruption, but also the time spent refocusing once the machine is back up, and the time spent wondering about what they might have forgotten in the shock of the crash. One implication is that problematic hardware should be scheduled for fixes at the first convenient opportunity rather than waiting to be sure it really is bad.
Technical trickery is not a substitute for talking with people. For instance, at one point Ashmead found that one of his VAXes was running out of terminal lines. The traditional recourse was to time out those who had not touched their line in X minutes. As an experiment, he tried talking with the people using the machine and explaining the problem. It turned out that management itself was the problem: the managers for the workgroup had not realized that the lines were a precious resource and that they were keeping their own people from getting in. Once this was clear, they stayed off voluntarily: no more need to drop lines via software.
At any client site it will normally happen that one or two people, even though they are not members of IT, will be particularly techno-savvy. Spending extra time with these people, and training them, will head off many problems.
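The idle-line timeout mentioned in the second example can be sketched in a few lines. In keeping with the point of that story, this version only reports who has been idle longer than the limit, so that the administrator knows whom to talk to, rather than dropping anyone. The `Session` record and its field names are hypothetical; a real system would fill them in from whatever session accounting the operating system provides.

```python
from dataclasses import dataclass


@dataclass
class Session:
    user: str
    line: str             # terminal line identifier (hypothetical naming)
    last_activity: float  # time of last input, in seconds since the epoch


def idle_sessions(sessions, now, limit_minutes):
    """Return sessions idle for more than limit_minutes, longest idle first."""
    cutoff = now - limit_minutes * 60
    idle = [s for s in sessions if s.last_activity < cutoff]
    return sorted(idle, key=lambda s: s.last_activity)


if __name__ == "__main__":
    now = 10_000.0  # fixed clock value so the example is repeatable
    sessions = [
        Session("manager_a", "tty01", now - 3600),
        Session("analyst_b", "tty02", now - 120),
        Session("manager_c", "tty03", now - 7200),
    ]
    for s in idle_sessions(sessions, now, limit_minutes=30):
        idle_min = (now - s.last_activity) / 60
        print(f"{s.user} on {s.line}: idle {idle_min:.0f} min")
```

A report like this makes a pattern such as "the same few accounts hold lines all day" visible, which is a better starting point for a conversation than a silent disconnect.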

Ashmead Software & Consulting, Inc. specializes in the design, enhancement, and administration of relational databases with particular emphasis on reliability, performance, and ease-of-maintenance.