Success does not consist in never making mistakes but in never making the same one a second time. - George Bernard Shaw

I've seen this pattern more than once, on two different teams where I work: during a network upgrade and on our BI Tools Team. Capacity is identified as the cause of incidents, so a change is submitted to add horsepower. Come Monday morning, new incidents arise. To the clients, the systems still aren't working; to IS, the problem has simply shifted.
When you add capacity to a system, you're fixing a bottleneck, but are all of your downstream systems equipped to handle the deluge of new data coming their way? You could be breaching the South Fork Dam above Johnstown. When you open the floodgates, are your systems ready?
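To make the point concrete, here is a minimal Python sketch that assumes a simple serial pipeline in which each stage passes on at most its own capacity. The stage names and numbers are hypothetical, not figures from any real incident. Tripling the front-end capacity doesn't remove the overload; it moves it to the database.

```python
# Hypothetical, simplified model: each stage forwards at most its capacity
# (requests per minute). Real systems also queue, retry, and time out.
def offered_load(demand, stages):
    """Return the load arriving at each stage, in pipeline order."""
    arriving = {}
    flow = demand
    for name, capacity in stages:
        arriving[name] = flow
        # Downstream never sees more than this stage can pass on.
        flow = min(flow, capacity)
    return arriving


def overloaded(demand, stages):
    """Names of stages receiving more load than they can handle."""
    arriving = offered_load(demand, stages)
    return [name for name, capacity in stages if arriving[name] > capacity]


# Illustrative numbers only, not measurements.
before = [("report_server", 400), ("report_db", 600), ("disk", 900)]
after = [("report_server", 1200), ("report_db", 600), ("disk", 900)]

print(overloaded(1000, before))  # ['report_server']
print(overloaded(1000, after))   # ['report_db']
```

Before the upgrade, the report server throttles traffic and the database sees only 400 requests a minute. After it, the database receives the full 1,000 and becomes the new point of failure.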
In my case, an upgrade to SSRS server capacity threw more load at the report source databases than they could handle. After a day of intense investigation, we realized what needed to be done: adjust queue depths and indexes, and upgrade the disk frame technology and the database servers' memory and CPU. The application support team also took the most important, and most often overlooked, step: tuning the report queries. Soon everything was humming like a well-oiled machine.
The lesson learned was that bottlenecks are systemic: they don't go away; they just move. Before adding capacity, if you're lucky enough to be able to replicate Production in your Test environment, test the upgrade thoroughly. If upgrading Test to match Prod is too expensive, or wouldn't yield accurate results because of differences in load, then open the upgrade spigot slowly. The temptation to do everything at once can be hard to resist, but changes are best made one at a time so their effects can be observed. If multiple changes must go in together, use administrative tools to restrict their effects, so you can turn the knob slowly and keep an eye on things.
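One way to turn the knob slowly is to raise a throttle, such as a cap on concurrent report executions, in steps and check health after each one. The Python sketch below assumes you have some way to apply the limit and some health check (for example, database latency watched over a soak period). `apply_limit` and `is_healthy` are hypothetical callbacks standing in for whatever administrative tool and monitoring you actually use; they are not SSRS or SQL Server settings.

```python
from typing import Callable, List


def ramp_up(steps: List[int],
            apply_limit: Callable[[int], None],
            is_healthy: Callable[[int], bool]) -> int:
    """Raise a capacity limit one step at a time.

    Assumes steps[0] is the current, known-good setting. Stops at the first
    step that fails the (hypothetical) health check and rolls back to the
    last good limit. Returns the limit left in place.
    """
    last_good = steps[0]
    apply_limit(last_good)
    for limit in steps[1:]:
        apply_limit(limit)
        # In practice is_healthy would observe the system for a while
        # before answering, not return instantly.
        if not is_healthy(limit):
            apply_limit(last_good)  # roll back to the last known-good setting
            break
        last_good = limit
    return last_good


# Simulated run: pretend the downstream database copes up to 40 concurrent reports.
applied = []
final = ramp_up([10, 20, 40, 80, 160], applied.append, lambda limit: limit <= 40)
print(final)    # 40
print(applied)  # [10, 20, 40, 80, 40]
```

Doubling at each step keeps the number of changes small while still letting you see which step tipped the downstream systems over.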