Archive for January, 2008

Finding Used Databases

It can be difficult to determine which databases you are administering are actually being used. Below is a query that determines the last time an index was used since the last time SQL Server started to help narrow down these databases:

 
SELECT db.name, i.name, S.*, i.*
 FROM sys.dm_db_Index_Usage_Stats S
      INNER JOIN sys.DATABASES db ON db.database_id = s.database_id
      INNER JOIN sys.Indexes I
              ON S.Index_Id = I.Index_Id
              AND S.Object_Id = I.Object_Id
              AND S.DataBase_Id = DB_ID()
WHERE I.Object_Id > 1000
ORDER BY db.name

SQL Server RAM Usage

SQL Server has difficultly managing memory for systems with a large amount of RAM (say 8+ GB), typically found in 64-bit installations. You might see log messages such as:

A significant part of sql server process memory has been paged out.

Or worse, SSAS may fail to process a cube with the only message that it was cancelled. To see if you have this problem, open Performance Monitor and view "Private Bytes" under the Process category. The value should be less than the amount of physical RAM.

I found this following article useful in dealing with this issue: http://blogs.technet.com/askperf/archive/2007/05/18/sql-and-the-working-set.aspx

You want to configure SQL Server and SSAS to use a specific amount of RAM, but be sure to leave plenty of RAM for OS and other processes, such as SSIS packages. Monitor Private Bytes while you run a typical workload to see how much RAM you can allocate to SQL Server and SSAS. It won't be surprising if you allocate 2/3 of the RAM to SQL Server and SSAS, and leave 1/3 for the OS and other processes.

Defragging Indexes in SQL Server 2005

There are two ways to optimize your indexes in SQL Server 2005:

  • ALTER INDEX ... REORGANIZE
  • ALTER INDEX ... REBUILD

Which one should you choose? According to BOL, it depends on how fragmented they are. The avg_fragmentation_in_percent column from the sys.dm_db_index_physical_stats() system function lets you know how fragmented your indexes are. If the value is below 5.0, then there is no need to defragment the index. If the value is between 5.0 and 30.0, then "ALTER INDEX ... REORGANIZE", is the best option. If the fragmentation is greater than 30.0, then "ALTER INDEX ... REBUILD" is the best option.

REORGANIZE has a couple of benefits. REORGANIZE will log only the pages that changed, while REBUILD will log the whole index. REORGANIZE can be done online, while REBUILD can't (unless you add the ONLINE=ON option for Enterprise Edition installations).

However REBUILD will update the statistics of the index which lead to better execution plans, while REORGANIZE does not.

Listing checked-out files in Team System (TFS)

You can get a complete list of checked out files by using the tf.exe command line tool found in: C:\Program Files\Microsoft Visual Studio 8\Common7\IDE

All checked out files in $/MyProject:

tf.exe status $/MyProject /user:* /s:http://MyTfsServer:8080 /recursive

All files checked out to Mark:

tf.exe status /user:mark /s:http://MyTfsServer:8080

You can write the output to a file:

tf.exe status /user:* /s:http://MyTfsServer:8080 > c:\checkedout.txt

Backing up to a NTFS compressed folder

It seems like a good idea to backup SQL Server 2005 databases to a compressed folder, however if your database is larger than roughly 30 GB, you will get the following error:

BackupMedium::ReportIoError: WRITE failure ON backup
device 'G:\Backups\200501220402.BAK'. Operating system
error 33(The process cannot access the file because
another process has locked a portion of the file.).

One way to get around this is to stripe the backup to multiple files, for instance:

BACKUP DATABASE AdventureWorks TO
DISK='G:\Backups\200501220402.BAK1',
DISK='G:\Backups\200501220402.BAK2',
DISK='G:\Backups\200501220402.BAK3'
WITH FORMAT, CHECKSUM;

Not ideal, but there are two options:

  1. Buy LiteSpeed Backup which includes compressed backups
  2. Wait for SQL Server 2008, which includes compressed backups

The sp_getapplock secret

sp_getapplock is not very well advertised in SQL Server 2005, however it is a good way to synchronize code in a stored procedure.

Before finding out about sp_getapplock, I would SELECT from a table with an exclusive lock, like so:

BEGIN TRAN;
SELECT TOP 1 * FROM dbo.ATable WITH (tablockx, holdlock);
-- Do something while the lock is held, for instance:
UPDATE dbo.ATable SET FieldA = FieldA + 1 WHERE FieldB = 'something';
COMMIT TRAN;

This blocks other users from entering the section of code until the lock is released (when the transaction is committed). Normally you couldn't do FieldA = FieldA + 1, because other users might be updating the table, however with the lock on the table you could.

This approach has some downsides

  1. Other users can't SELECT from the table.
  2. If you rebuild indexes on the table with the "online=on" option, it will want to put a schema lock on the table, to prevent other schema changes. The exclusive lock prevents the rebuild from starting.

sp_getapplock is the built-in way to allow only one user in a section of code at a time, for example:

BEGIN TRANSACTION
DECLARE @res INT
EXEC @res = sp_getapplock @Resource = 'Lock ID', @LockMode = 'Exclusive';
IF @res >= 0
BEGIN
PRINT 'lock is held.';
END
COMMIT TRAN;

@Resource can be used for different locks that don't interfere with each other.

You have to be careful which database you are "use"ing. SQL Server assumes that different databases have nothing to do with each other and won't block each other.

SQL Server Security with EXECUTE AS OWNER

EXECUTE AS OWNER is a great way to limit the permissions of a SQL Server Login. The general idea is to create your stored procedure with the EXECUTE AS OWNER modifier. Any user who has the permissions to execute the stored procedure, runs the stored procedure under the Database's dbo user (which means it can do anything in the database, but nothing at the server-level nor on other databases). If you only allow your Logins to execute stored procedures (and not touch the tables directly), then you've effectively limited the Logins to code you've written. If you don't write any DELETE statements, then Logins can't delete anything.

This is better than Roles, because Roles are very coarse in comparison. With Roles, you may have to give a User INSERT permissions on table. Instead with EXECUTE AS OWNER you can write a stored procedure that checks the data exactly the way you want in the body of the stored procedure. This is much more fine grained way of handling permissions.

From beginning to end, this is what you do:

Create a Login:

CREATE LOGIN [MyLogin] WITH PASSWORD=N'Password',
DEFAULT_DATABASE=[master], CHECK_EXPIRATION=OFF, CHECK_POLICY=ON;

Create its User in the database:

CREATE USER [MyUser] FOR LOGIN [MyLogin];

I prefer to use schemas to identify "public" stored procedures. So create a schema:

CREATE SCHEMA [public] AUTHORIZATION [dbo];

Give your new user EXECUTE permissions on anything in the public schema: (We will put the new stored procedure in this schema):

GRANT EXECUTE ON SCHEMA::[public] TO [MyUser];

Create your stored procedure:

CREATE PROCEDURE [public].[MyStoredProc]
(
@Param1 int
)
WITH EXECUTE AS OWNER   -- This "EXECUTE AS" modifier on the stored procedure is key!
AS
BEGIN
SET NOCOUNT ON;
 
-- do something
 
END

When your stored procedure runs, it can do anything in the database, including calling other stored procedures. It is an easy way to segregate public stored procedures from private ones. This gives you encapsulation, which is a good thing (see section 5.3 in Code Complete about the benefits of encapsulation).

The only permissions outside users need is EXECUTE permission on the public schema, so it is easy to add new stored procedures by creating them in the public schema.

Instead of Roles, you can have schemas. Let's say you would have 3 roles in the database: admin, anon, and general. The admin role is for Logins that perform administrative activity on a website. The anon role is for people who view your site anonymously, and the general role is for stored procedures that are for both. You can instead, with EXECUTE AS OWNER, create three schemas for your stored procedures: admin, anon, and general. If you want the stored procedure to have admin only Logins to use it, create the stored procedure in the admin schema. The same goes for the other schemas.

Version Control - Part 4: Distributed Version Control

There are many reasons to create branches in the Update/Commit model and Distributed Version Control really excels in an environment with many branches. Take the examples in the previous post:

In Distributed Version Control, each of these would be “repository/working directories” (repository for short). I put these terms together, because in distributed version control, your repository contains both the complete history and metadata and the working directory in which to edit the files. What this means is that you can branch the code without anyone else knowing. It also means, you *have* to branch in order to get a working directory to edit files. This encourages developers to segregate their tasks into branches and then merge each feature back to a central repository when the work is done.
On the server you might have the following repositories:

  • Trunk
  • Released (which contains v1.0, v1.1, and v2.0 labels)

On your local machine you might have the following repositories based on the Trunk repository on the server:

  • Foo Feature
  • Bar Feature

Once a feature is complete, you can push the changes to the Trunk repository and then delete your local feature’s repository.

This link has a more detailed explanation on how distributed source control works.

Version Control - Part 3: Branching/Merging

Branch and Merge

Many people who implement Update/Commit typically organize the server into the Trunk and Branches folders:

This allows you work on long term, scheduled features in Trunk, while making unexpected bug fixes in the latest branch: v2.0. The changes in the v2.0 branch can easily be merged back into Trunk using the version control software.

After you release the code and before working on the next feature, you branch your code to the Branches\v2.0 directory. When a bug needs to be fixed in the released version, you

  • Fix the bug in the Branches\v2.0 directory (The blue line in the image above)
  • Commit the fix to the Branches\v2.0 directory
  • Merge (using your version control software) the change from Branches\v2.0 to the Trunk directory, which happens to have a change (green line).

Release Flexibility Problems

Now you have been using update/commit version control and you have released your software and it is being used by end users. The to-do list is growing and some features can be done quickly and some take longer. You can easily find yourself in a situation where some of the long-term features have been committed to the Trunk; however the business has requested a quick feature that they need ASAP. For instance, a new client has changed the priority of a quick feature.

Another problem is that developers commit changes to the code unaware of the current release cycle. You can find that a committed change during testing can delay the release when the committed change should have been held back for a later release. The fundamental problem is that developers “decide” what is included in the release, when this decision belongs to other people, such as a release manager, project manager or the test manager.

The Hack in the Update / Commit Model

There are several ways to address the problems above using the Update/Commit model, but the hack below will lead you naturally to distributed version control.

If you have become comfortable with branching and merging between Trunk and Branches\v2.0, then there is an easy next step to address this problem. Branching and merging each feature:

When a developer starts to work on a new feature, they branch from Trunk to Features\Bar Feature and start working in the Bar Feature folder. When it is decided that the Bar feature is finished and ready for release, they merge the code back to Trunk using the version control software. This allows a manager to decide to release the Bar feature even though the Foo feature is not ready.

Conclusion

Working on features in a branch allows people to decide late in the release cycle what will be included in the release. It also allows flexibility when priorities change in the middle of a release cycle.

The downside is that there will be many Feature folders that are no longer needed because they have been merged back, or have been orphaned. Linus calls these “expensive” branches, since they are intended to be either temporary or private. Distributed version control addresses this…

Version Control - Part 2: Update/Commit

The problem with Checkin/Checkout

Generally speaking, Checkin/Checkout interrupts developers. You want to finish your task, but you can't continue on a portion of it, which then interrupts your thought process.

The more developers you have with the Checkin/Checkout model, the more interruptions each developer has and then development as a whole slows down.

The Answer: Update/Commit model

The Update/Commit model addresses this by allowing developers to edit the same files at the same time. No one has to wait. No one gets their thought process interrupted.

The two operations you have available are Update (download new code to your machine) and Commit (Upload your changes to the server). SVN's docs have a detailed explanation here.

The Update/Commit work cycle is:

  • Update (download source code from the server)
  • Edit
  • Update (download and resolve conflicts)
  • Commit (upload source code to the server)

But, are you ready for two developers editing the same file at the same time? Most developers are wary about making this leap. After awhile you find that it isn't an issue. Why isn't it an issue? Surely you'll have the same number of problems as with Checkin/Checkout?

Resolving Conflicts

Actually you have fewer problems with Update/Commit as compared with Checkin/Checkout. Every time two users want to edit the same file under Checkin/Checkout, it interrupts one developer which is a problem. However when two users edit the same file and commit the change to the server, there are two advantages:

  • Resolving the conflict happens when committing the work, not when editing, which does not interrupt developers' thought processes.
  • Most conflicts can be resolved automatically by the version control software. I find that 90% of conflicts can be handled automatically.

If you find that you have to manually resolve the conflict in a file, you typically have 3 versions of the file:

  • Your changes to the file
  • Someone’s changes to the file
  • The final, resolved file. You edit this file with the goal of including your changes and other people’s changes.

Conclusion

Update/Commit is great. You and everyone else can work without getting into each other’s way and it seems like all the issues are solved. Why would you want distributed version control?

Next Page »