Supporting a 24 by 7 operation drives
home the importance of pre-engineering reliability into your systems.
We have provided a wide range of project and
maintenance services to Philadelphia-area cable TV
advertising and marketing firms.
Some highlights:
The existing marketing channel was too "one-size-fits-all" for the
late 20th century.
We split the single feed into eight,
allowing the client to tailor its marketing program
to each of its own clients.
To do this, we
made significant adjustments to the existing
database,
cut 40,000 lines of legacy DOS C code down to 12,000,
and reburned some proprietary firmware.
Since the original development environment had gone walkabout,
reburning the firmware required disassembling and reverse engineering
the existing firmware, then
using hex tools to make the necessary modifications.
The field units playing the commercials
had to be polled as to what video they had played,
so that the various advertisers could be billed appropriately.
The field units were a bit underpowered for the task,
and the modem pool was perhaps not as reliable as it might have been.
The pre-existing polling solution was fragile: if one remote unit
(of the 400-odd) failed, the entire population might have to be repolled
and reloaded.
The solution turned out to be a complete rewrite to make the polling self-healing.
A set of independent processes on the main server was wired
to look for field units that had not yet "talked to mamma."
These processes were set to randomly try field unit and modem combinations
until every unit had either given up its information
or failed "too many" times to do so -- at which point hardware support was dispatched.
What was striking was that a randomized algorithm was
considerably more robust than the previous deterministic one:
the randomized processes just kept trying.
In fact, one obvious-in-retrospect problem was that the system was so good at bypassing
bad modems that the whole modem bank could be near failure before anyone
noticed.
Additional reports had to be worked up to look specifically for modem problems.
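The self-healing scheme described above can be sketched as follows. This is a minimal Python sketch, not the original C implementation; the `poll()` callback, the failure threshold, and the unit/modem names are hypothetical stand-ins:

```python
import random

MAX_FAILURES = 5  # hypothetical "too many" threshold

def run_polling(units, modems, poll):
    """Keep randomly pairing unpolled units with modems until every
    unit has either reported in or exceeded the failure threshold.
    Returns the list of units handed off to hardware support."""
    pending = set(units)
    failures = {u: 0 for u in units}
    dispatch = []
    while pending:
        unit = random.choice(sorted(pending))
        modem = random.choice(modems)
        if poll(unit, modem):          # unit "talked to mamma"
            pending.discard(unit)
        else:
            failures[unit] += 1
            if failures[unit] >= MAX_FAILURES:
                pending.discard(unit)  # give up; dispatch a human
                dispatch.append(unit)
    return dispatch
```

Because each retry draws a fresh unit/modem pairing, a single bad modem cannot pin a unit the way a fixed deterministic assignment could: the process just keeps trying until it finds a path that works.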
In general, a local cable company is entitled to two minutes out of each hour
on a typical cable channel.
The channel sends down "cue tones" (same thing a touch-tone phone uses) buried in the
broadcast signal to tell the local cable headend, "this one is yours".
Sub-nominal success rates suggested
that either the cue tones
were not always being sent when they should be,
or the field equipment was not detecting and handling them correctly.
The response was a web-based data analysis tool
(written in HTML, Perl, and Java)
to determine whether the cue tones had been sent,
whether the field equipment had detected them,
and whether, if it had detected them, it had responded
quickly and correctly.
A simple color coding scheme made it easy to scan thousands
of cases to determine if a specific problem
originated with the network,
with the detection hardware,
or with the playback component.
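The case-by-case classification behind that color coding might look something like this. A Python sketch only (the original tool was Perl/Java); the field names and the 500 ms latency limit are assumptions, not the tool's actual rules:

```python
def classify(case):
    """Pin one cue-tone case on the network, the detection hardware,
    or the playback component, and assign the scanning color.
    `case` is a dict of (hypothetical) per-case fields."""
    if not case["tone_sent"]:
        return ("network", "red")        # tone never went out
    if not case["tone_detected"]:
        return ("detection", "red")      # tone sent but missed
    if not case["played_correctly"]:
        return ("playback", "red")       # detected, played wrong
    if case["response_ms"] > 500:        # hypothetical latency limit
        return ("playback", "yellow")    # correct but slow
    return ("ok", "green")
```

Rendering the first element as a column and the second as a cell color is what makes thousands of cases scannable at a glance.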
Using Scala presentation software (originally developed for the Amiga,
but running for us on Windows),
we built a complete advertising cable channel from the
ground up: a rotation of audio, video, JPEGs, and text, with parts of the
screen set aside for weather, time, stock market quotes, news, and the like.
The channel was used to sell real estate, cars, and so on. All commercials
all the time. But apparently people would watch this.
Maximizing revenue streams from advertising and marketing spots
requires a careful balancing of the value of the spot
against the size and character of the audience
(it is difficult to make much money selling Rogaine on Nickelodeon,
at least during the day).
We put together a combination package: a schedule generator, which would
give a reasonable first cut at a schedule,
and a schedule reporter, which would assess a given schedule for quality
and fairness (there was some sense that each advertiser should get
a balanced mix of "good" and "bad" times).
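One way to score the fairness side is to compare each advertiser's share of "good" slots. The representation below (advertiser/slot pairs plus a set of "good" slot hours) is a hypothetical simplification of whatever the reporter actually consumed:

```python
def fairness_report(schedule, good_slots):
    """For each advertiser, the fraction of its spots landing in
    'good' time slots; a tight spread across advertisers suggests
    a fair schedule. `schedule` is a list of (advertiser, slot) pairs."""
    totals, good = {}, {}
    for advertiser, slot in schedule:
        totals[advertiser] = totals.get(advertiser, 0) + 1
        if slot in good_slots:
            good[advertiser] = good.get(advertiser, 0) + 1
    shares = {a: good.get(a, 0) / n for a, n in totals.items()}
    spread = max(shares.values()) - min(shares.values())
    return shares, spread
```

A spread near zero means every advertiser is getting roughly the same mix of prime and off-peak placements.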
With hundreds of field units and hundreds of thousands of commercials
playing, it was sometimes a bit difficult to see where
sub-nominal results were coming from.
To deal with this, we
built several web-based reports using a sophisticated color scheme
(green = "good", red = "goodness challenged", yellow = "somewhere in between")
to show where the problems were.
One of our favorites was the "those who are about to die" report,
which showed field equipment on the edge of a total nervous breakdown.
With this report field support was usually able to spare out defective
equipment before the failure resulted in significant revenue loss.
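The triage logic behind such a report can be approximated as a threshold on recent error counts. The window and thresholds here are invented for illustration; the real tuning was, as noted below, constantly tweaked at field support's request:

```python
def about_to_die(error_counts, window=7, red_at=5, yellow_at=2):
    """Color a unit by its recent error trend (thresholds hypothetical).
    `error_counts` is the unit's daily error count, most recent last."""
    recent = sum(error_counts[-window:])
    if recent >= red_at:
        return "red"      # spare it out before it fails outright
    if recent >= yellow_at:
        return "yellow"   # worth watching
    return "green"
```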
(Interestingly enough, field support really enjoyed working with this report --
that knife-edge of death aspect to it perhaps -- and was always requesting
tweaks to the timing rules and color schemes to make it more effective.
A very pleasant experience, all in all.)
Another favorite was a "pipeline" report, which showed at each stage of the
process -- scheduling, encoding, transmission, playback, and verification --
for every 100 spots coming in, what percentage had successfully left that stage
and moved on to the next.
This was a big help in moving from reactive to proactive mode,
and helped drive performance from a frankly discouraging 60% up to about 90%.
(For technical reasons 90% was about the theoretical limit for the
given architecture.)
Once it was clear where the losses were coming
from, the relevant departments were always strongly motivated to see
that their part of the pipe was functioning at 100%. And when all sections
of the pipe are nominal, the pipe as a whole tends to be nominal as well.
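The arithmetic here is simply that stage pass rates multiply. With five stages, even a respectable-sounding ~90% per stage compounds to roughly 59% overall, in the neighborhood of the discouraging starting figure, while reaching ~90% end to end requires each stage to run at about 98%. A quick illustration (the per-stage rates are made up):

```python
def pipeline_yield(stage_rates):
    """Overall fraction of spots surviving every stage: the product
    of the per-stage pass rates."""
    total = 1.0
    for rate in stage_rates:
        total *= rate
    return total

# Five stages (scheduling, encoding, transmission, playback,
# verification), each passing ~90%, compound to roughly 59% overall:
print(round(pipeline_yield([0.9] * 5), 2))   # 0.59
```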