Some Standards to Follow & Baseline

Copied from good scripting guys experience journal … “AS IT IS”

Server Setups / Installations

We need to standardize the setup of all the systems we own. By doing this in a consistent manner, not only does it save us work down the road, but it also makes it easier to do our current work. Let’s take an easy example: the name of the Active Directory account which runs the SQL Server service. If we always name this in a consistent way, place them in the same OU in Active Directory, and put them in the same security groups, we will always know where to find them and can leverage technologies such as delegation (i.e. the domain admins can let us do things ourselves) or group policy (which I’ll argue later is a form of centralization). It’s also something that we (hopefully) don’t have to do that often, so it’s probably not a good idea to trust our memory.

Here’s some specific examples of things we might standardize about our servers and installations:

  • Naming Standards
    • Account names – Make them something meaningful as to what they are for.
      • SQL Server service account
      • SQL Agent service account
      • Application service accounts that connect to the database (may be outside our control, but worth trying)
    • Server and Instance names – In these cases, we should include things like environment identifiers (i.e. Dev, Test, QA, Prod) and versions (i.e. 2K5, 2K8, 2K12).
  • Layout, size, and labeling of disk drives:
    • Do we split up the volumes for the system databases from TEMPDB and the user databases?
    • Do we have dedicated drives for backups?
    • Do we use mount points or drive letters (or some combination)?
  • OS versions / patch levels:
    • What OS (i.e. Windows 2003, 2008, 2012, etc) do we install and is it related to what version of SQL Server we install (i.e. do we always match SQL 2012 with Windows Server 2012)?
    • Is there a standard service pack or set of patches we always install?
  • SQL Server versions / patch levels:
    • What versions of SQL do we allow / support?
    • What patch levels should they be at (i.e. “SQL 2008 is always patched to SP3 CU2”)?
  • Post / Pre Installation Steps – are there actions that we take on all servers either before or after the installation, such as:
    • Restricting accounts from logging in locally or interactively.
    • Further reducing the security of standard roles like PUBLIC on SQL (I know this is sometimes frowned upon, but some like to do it), a.k.a “hardening”.
    • Adding additional TEMPDB data files based on some formula.
    • Setup of standard agent jobs for maintenance or other purposes.
    • Adding the newly created server to repositories or third party tools such as SQL Monitor.
    • Installation of standard third party tools or software.

Procedures (a.k.a. SOP – Standard Operating Procedures)

We must carry out our daily work in a consistent manner. By doing so, it not only makes our jobs easier (do you really want to try and remember how to restart the TPS report at 3 in the morning after a night at the bar?), but also ensures good service to our customers.

Here are some areas to think about when we write our SOP:

  • What are some common questions we get and what are their answers?
  • What are the steps to do common tasks, such as:
    • Granting access to systems for users.
      • Do we use domain security groups? If so, which ones for what roles?
      • Are SQL accounts allowed, or must all access be via Windows credentials?
      • Are there any restrictions on what levels of privilege we give without some kind of senior approval?
    • Moving backups from production to test systems.
      • Is there any kind of approval required, and if so, by whom?
      • Must any sensitive data be scrubbed or otherwise removed?
      • Are there any scripts or other actions that must be done after the database is restored, either to any or specific application’s databases?
        • Do we need to grant developers access?
        • Do we need to re-link orphaned SQL accounts?
        • Do we need to change any static configurations to point to development systems (don’t miss this one!)?
    • Releasing new code into the production systems.
      • Is someone’s approval or some kind of change management process required?
      • Who handles the release activities?
    • Responding to incidents (system down, slow, degraded)
      • Are there standard metrics to capture or baselines to compare against?
      • Are there any common problems for which known solutions are documented?
      • Is there an incident management process that needs to be triggered if certain things happen (i.e. we need to notify management if the system goes down between 8AM and 5PM EST)?
  • Are there any SLAs (Service Level Agreements) in place for things like:
    • Incident response times (i.e. “all Severity 1 incidents must be resolved within four hours” – and make sure you define what “Severity 1” means).
    • Off hours support or response times (and what defines “normal hours”).
    • Completion of standard activities (i.e. when a request is received to provide a backup for testing, it must be done by the end of the next business day).
    • Standard down times / maintenance windows.
    • Backup and recovery, such as Recovery Time Objectives or levels of acceptable amounts of data loss.
  • Is there contact information for teams which we commonly interface with, such as infrastructure or system administrators, domain admins, or SAN admins? Do we need to follow any specific procedures when asking them to do something for us?

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s