Saturday 26 November 2011

SSD a step towards instant computing

Ever since I first started working on optimizing server performance I have felt that the ultimate goal is instant computing, where I define instant as no delay perceptible to the user from request to result. Unfortunately few suppliers have set such, for the outside observer, quite natural goals. They are usually just happy with a bit faster than last year or a bit faster than the competitor. So you will run into a load of configuration limits for system parameters that haven’t kept up with the explosion in hw possibilities combined with the falling price of performance.

As soon as you overcome one bottleneck it’s on to the next one. Part of this quest has been to get as much of the data into memory as possible to overcome the slowness of traditional spinning plate disks. With the arrival of ssd’s I thought we could be close to this goal. And for, say, email searches in Outlook it’s close. If you have a few thousand emails, searches take an age because they run against the cache your Windows pc stores locally. If you use an ssd in your laptop/pc it’s down from minutes, and sometimes hours, to seconds. The greatest leap ever, but so little appreciated that even Dell stopped (for a while) offering ssd’s as an option even on their high end pc’s.

The greatest gain for servers is obviously where there is a high frequency of ever changing data, like database logs. Unfortunately that is also the one area where the recommendation is not to use them, due to the ssd’s limit on total rewrites. There is work going on to automatically exclude areas that near this limit, though not fast enough for some who reached it, with total failure of whole disk shelves as a result. This write limit should also be a thought for san manufacturers that automate which type of disk the different types of data are stored on depending on their frequency of access. Maybe one should just take the penalty and routinely change out the disks about every 18 months. An easy task with proper raiding. And if you went with the cheaper server type or medium sized storage ssd’s instead of the super san = super expensive ones, still a cost effective way.
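To put a rough number on the write limit, here is a minimal sketch of the arithmetic. It assumes the limit can be expressed as a rated number of full rewrites of the disk, and that you know roughly how much is written per day. The function names, the write amplification factor and the example figures are all made up for illustration, not taken from any vendor datasheet.

```python
def years_until_worn(capacity_gb, rated_cycles, daily_writes_gb,
                     write_amplification=1.0):
    """Rough estimate of years until the rated rewrite limit is reached.

    Assumption: total writable volume = capacity * rated full rewrites,
    and the disk internally writes `write_amplification` times what the
    host asks for.
    """
    total_writable_gb = capacity_gb * rated_cycles
    effective_daily_gb = daily_writes_gb * write_amplification
    return total_writable_gb / effective_daily_gb / 365.0


def survives_until_swap(capacity_gb, rated_cycles, daily_writes_gb,
                        write_amplification=1.0, swap_interval_years=1.5):
    """True if the estimate outlasts a routine swap (18 months by default)."""
    years = years_until_worn(capacity_gb, rated_cycles, daily_writes_gb,
                             write_amplification)
    return years >= swap_interval_years


# Hypothetical log disk: 200 GB, 3000 rated rewrites, 500 GB written per day.
print(years_until_worn(200, 3000, 500, write_amplification=2.0))
print(survives_until_swap(200, 3000, 500, write_amplification=2.0))
```

If the estimate comes out shorter than your swap interval, that disk is a candidate for either a shorter interval or a different home for the logs.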

Aside from that log versus max total writes anomaly, databases have much to gain from ssd’s. Especially if they are so large that they can’t all be sucked into ram, or where there is a high frequency of updates and where one for security reasons prefers synchronous writes over asynchronous.

Server internal ssd’s are actually an alternative for servers that previously were optimised by utilizing the caching ram of an external storage unit. This can save you considerably on your next system hw upgrade.

Friday 25 November 2011

Backups, art that needs reinvention

Some of the articles you see about data lost in the cloud are beyond belief. There is no excuse for losing data that was stored more than 24 hours before the problem happened. Most storage users will have a few snapshots and a dr tested way of restoring them. The problem comes when you go beyond the snapshot that is still on disk. Backup of snapshots to other media is still in its infancy. The most prominent of backup solutions just don’t have it in them. And I have seen virtual server systems presented as complete solutions without a thought for how to get the data back if the thing burned down or, currently more likely, was drowned in a flood. There is a job here for a specialist in deduping, with the added flavour of a couple of extra copies.

There is a tendency to not treat virtual servers as real servers. Of course you can restore all the physical servers. But what about the virtual ones? With dormant or little used virtual servers a lot of them can fit on a few physical hosts. But the total data can still be the same as if each server was a separate physical one. If you haven’t backed it all up, you need to at a minimum have a definite restorable master and a record of all the steps taken to create each one.
Nor should we forget the data people carry around with them. Laptops get ever more capable, most are now more powerful than servers were 4 years ago, and developers like to have it all at hand. A very important part of that time critical project might have its only copy on a thing thrown hither and thither every morning and evening. This is greatly encouraged by the cheap developer tools licensing we see emerging as a teaser to get more people onboard. And developers never were the first to think about what happens when things go wrong, or whether that online storage deal included a quantifiable and guaranteed backup/restore.

Often the issue is that it takes a long time for a user to discover that their data is no longer there. Today even the smallest of users can have thousands of files. And since nobody learns about file systems and folders any longer, they never see the files except when they need them. It can take months or years if they are only used at the annual budget time or multi yearly planning stage. For that amount of data/iterations it is/was often uneconomical to store it all on disks. Besides, your auditor probably still loves the tape.

We also have the fast pace of the technology. A much used refresh cycle is 3 to 4 years due to the rapid rise in hardware support costs after the initial contracted support period. But the requirement is that financial data is to be stored for 7 years. Ask your IT department if they can restore a 7 year old backup for you. Even if they have the tapes, do they have the drives to restore them with or the system to restore them on to? It is not such a large problem if the software system is still in use and the data is stored in a database. Those are easy to migrate with the hardware refresh as long as you haven’t segregated out too much of the old to fit the new. Still you can always add some more modern storage to get that data back in, if you planned for that eventuality in the first place.
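The mismatch between refresh cycle and retention period is simple arithmetic, sketched below. It assumes whole years and a backup written right after a hardware refresh, which is the kindest case. The function name is mine, purely for illustration.

```python
def refreshes_during_retention(retention_years, refresh_cycle_years):
    """Backup ages (in years) at which a hardware refresh happens while the
    backup must still be restorable.

    Assumption: the backup is written right after a refresh, so the next
    refreshes fall at multiples of the cycle length.
    """
    return list(range(refresh_cycle_years, retention_years, refresh_cycle_years))


# 7 year retention against the 3 and 4 year refresh cycles mentioned above.
for cycle in (3, 4):
    print(cycle, refreshes_during_retention(7, cycle))
```

Each age in the list is a point where somebody has to either migrate the old backups or keep a drive and a system around that can still read them.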

Relational databases – a quick look at flavours strengths and weaknesses

Let’s start with the master of them all, Oracle. It’s the db with all the tools and tweaks, and it scales well. And if the price was right this is the one most would or should pick, however it seldom is. Oracle never followed the development in processors, where each core gets weaker but you get a lot more of them. Hence their penchant for charging per core and their customers’ liking for the HP Itanium processor.
Oracle is so advanced that it’s more like an operating system in itself and you need to take your patching seriously. Also be into your file system details. Play with the config files, there is a lot to be gained. It’s a pity the 3 defaults of small, medium and large are not more up to modern standards. Proper backups are essential.
Oracle does not like loss of any of its data. And since a lot of performance can be gained from running it raw, a simple file system backup won’t do the job. You need to learn about dumps like dd, and have them done in the correct order. Exports are also very important. In addition to being a secondary way of doing backups, they can also give you a lot of hints on fragmentation and proper sizing. Don’t forget either to have multiple control files in separate locations.
It’s the one db where you really can’t live without a support contract from the mothership. And if you have a set of the printed manuals, they will be from a previous version but they are worth their weight in gold. And 95% of them are still applicable. Read all about its system tables. There is a lot to be gained here. For standardisation and easy admin to admin transfer have a look at the old OFA manual.

Its nearest competitor as a multi os db is Sybase, now owned by SAP. A brilliantly designed but more simplistic model. However you’ll have problems getting more than 1 installation (version) onto a single server. Instead it uses what they call user databases. That requires strict discipline as an admin so you know which one you are in. But organizing the file storage and backups is a lot simpler.
Its penchant for “go” is not as good as Oracle’s execute command, and its method of dumping output to file is archaic. Like Oracle it’s very sensitive to playing with the kernel settings on unix/linux. Most of its performance is to be gained here, in addition to, like most relationals, a good scheduled reindexing. A good set of Sybase’s own manuals will go a long way for your support needs.

Mssql could have knocked out the other db’s if it hadn’t had such a scaling problem. It depends on a single server, and Windows on top of that, and can only scale upwards at the speed of the hardware development. Windows Datacenter is an option but due to its obscurity and odd Microsoft rules on deployment, Windows Enterprise is really your option. And then we are back to this processor thing again. It is a database that most admins can manage though, even without the scarce manuals. They might not utilize all its potential but any Windows admin can make it run. Just give them a few hints or a small course on simple housekeeping like dumps and scheduled reindexing/reorg.
Mssql’s testing/analyzing tool is very good but it’s not as handy as Oracle’s command line “desc” for analyzing single sql queries. However it does give you a nice way of presenting your findings.

Adabas, owned by the German Software AG, is a story of what could have been. Popular among some German companies/developers, it never reached the popularity of Sybase. If you have seen it, it’s probably because you had a system from a German company that was based on it. Very simple to manage, you don’t even need a manual to start, stop and back up this one. Low cost and flexible. It’s ripe for a large multinational to take it over. Somebody with the long reach, belief and financial muscle to push it into the limelight.

Mysql, the developers’ favourite due to their perception that it’s “free”. Now owned by Oracle. There is no such thing as a “free lunch” however. What you don’t pay for the software itself you will, due to its popularity among specialists, pay for in admins. Recommend testing your restores frequently, especially when it comes to getting back the last data entered. Ripe for an organized set of admin tools. Oracle has a long way to go, and lots of opportunity for add on profit making.

A problem with all relationals is that they are good for adding and picking/filtering small amounts of data and creating automation for repeated actions, but when you reach a certain level of reads needed it’s better to forget about the indexing. When that happens the old ways are better. The db’s security against data loss also makes them vulnerable to slowdown by locks and errors from deadlocks. This is why, if you have to read all the inputs/data, it’s faster to use the file system directly for your (interim) storage, without the overhead. Many large players do.

Prioritising when everybody is screaming

We have all been there, just one of those days when things go wrong. And Sod’s law says they all come together. Now you need to prioritise. If you have done your preparations this should be easy in your head. You have your list of systems in prioritised order as a result of their importance to the company and the immediacy of the effect of downtime, adjusted for top management priority and current interest.
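A prioritised list like that can be as simple as a sorted table. The sketch below is one way to do it, assuming each system gets an importance rank, a number of hours of downtime it can tolerate, and an optional bump for current management interest. The field names, the scoring and the example systems are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    importance: int          # 1 = most important to the company
    tolerable_hours: float   # downtime before the effect is really felt
    management_bump: int = 0 # hypothetical adjustment for current interest


def restore_order(systems):
    """Most important first, then the least tolerant of downtime."""
    return sorted(
        systems,
        key=lambda s: (s.importance - s.management_bump,
                       s.tolerable_hours,
                       s.name),
    )


systems = [
    System("intranet", importance=3, tolerable_hours=72),
    System("email", importance=2, tolerable_hours=4),
    System("accounting", importance=2, tolerable_hours=48, management_bump=1),
    System("order entry", importance=1, tolerable_hours=0.5),
]

for s in restore_order(systems):
    print(s.name)
```

The point is not the scoring itself but that the order is decided and written down before the day everybody is screaming.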

Let the admins get on with the job, see if they need additional outside help, and keep the top brass away. Most fixes are based on the rolling half hour: it will take half an hour to know if this will work before we discover the next issue, or try the next possibility. Be ready to run parallel avenues or make sharp decisions. If your systems are properly admin’d / backed up there is usually a fast way or a slow way to restore. The problem is, when do you abandon the fast way and go with the more secure but slow?

Avoid falling into the trap of helping the biggest nuisance user first, or the director’s darling. The dangerous complainer is the one that says nothing to your face but complains without you knowing and without you having a chance to respond. And their argument will stand if their system should have had priority. Many systems are important but they can take a certain amount of downtime. How many accountants do you see outside hours or at weekends outside of the budget and reporting cycles? Use your urgency/dependency listing from the DR plan to help you. Think of when they normally schedule upgrades. You will also thank yourself for not storing all the eggs in one basket or all the data on the same san.

Recruiting, hints for managers out hiring

First find out what you are looking for. If you are hiring a manager, are you sure you want the same as the last one? He/she might have worked out well but their field speciality is probably well covered for now. Maybe the next one should be slightly different.

Time from announcement/advertisement to first interview should be as short as possible, especially if you recruit for tech jobs where there are many companies looking for the same candidates. How many times has your company lost out on a candidate that is no longer available? How many have turned down an offer of interview? If the number is high you need to look at your process again.

If your plan is for more than 2 rounds of interviews, your recruiting is not as efficient as it could be and you are likely to lose out on the best candidates. Did you weed out enough of the chaff by reading the cv’s thoroughly, or are you wasting everyone’s time by just skimming them for the first time at the interview? Large multinationals are big sinners in having many rounds of interviews. Is that a sign of too many corporate layers = bureaucrazy? Are there too many people involved in your decision process? However, if you the manager aren’t technical, bring an expert from your team. It gives them the chance to meet their potential future colleague.

When doing the interview, do you do the “take me through your cv” thing? That means you have to fit it to the job. An alternative approach could be “take me through samples of your experience for each of the requirements in our job description”. This will give the candidate the opportunity to bring in more relevant stuff, and will let you see if they can translate their earlier experience to their new tasks.

Do you use a technical test already at the first interview? It will help you confirm your initial opinion, and you can have more experts evaluating it, covering a larger technical field. There is nothing wrong with programming on paper. Many universities still use it for their exams. Personally I don’t see any problem with manuals, aids or mobile phones either. If they can get assistance at the test, they can get it in their work, and what you really want is somebody that can complete the job.
There is nothing wrong with testing for all the nice to haves also. Remember one thing: if the candidate knew the answer to all the test questions, the test wasn’t hard enough.

When you have done a few interviews you know the do’s and don’ts. Bring personnel in on the second round, reducing the times you have to wait for their availability. And they don’t really need to meet anyone that isn’t to be hired. One of the biggest delays can be organizing a time that suits everybody. Therefore bring as few interviewers as possible, but always minimum 1 other for legal reasons. And it gives you thinking time. If you are not the final decision maker, think about whether you yourself need to see the candidate more than 1 time, and leave the final interview to the decision maker and personnel. I would suggest just 2 candidates for second rounds, just so there is a choice. And you don’t have to send forward anyone you can’t live with yourself.

If you are unsure of a candidate, or rather you think there could be a better candidate out there whose cv you haven’t seen yet, and the candidate in front of you is free on the market at the moment, take a chance. That’s what probation is for. It takes less ruthlessness to trial somebody currently out of work than somebody that has to quit their current job. And your deal with the recruitment agency should always include a step down ladder in fee if a candidate is later found not suitable.

Lastly, give a thought to all that were unsuccessful. It won’t cost you much to tell them, but it will mean a lot to them to know.