Version Control - Part 3: Branching/Merging

Branch and Merge

Many people who implement Update/Commit typically organize the server into the Trunk and Branches folders:

This allows you to work on long-term, scheduled features in Trunk while making unexpected bug fixes in the latest branch, v2.0. The changes in the v2.0 branch can easily be merged back into Trunk using the version control software.

After you release the code and before working on the next feature, you branch your code to the Branches\v2.0 directory. When a bug needs to be fixed in the released version, you

  • Fix the bug in the Branches\v2.0 directory (The blue line in the image above)
  • Commit the fix to the Branches\v2.0 directory
  • Merge (using your version control software) the change from Branches\v2.0 to the Trunk directory, which happens to have a change (green line).

Release Flexibility Problems

Suppose you have been using Update/Commit version control, you have released your software, and end users are using it. The to-do list is growing; some features can be done quickly and some take longer. You can easily find yourself in a situation where some of the long-term features have already been committed to Trunk, but the business has requested a quick feature that it needs ASAP. For instance, a new client has changed the priority of a quick feature.

Another problem is that developers commit changes to the code unaware of the current release cycle. A change committed during testing can delay the release when it should have been held back for a later release. The fundamental problem is that developers “decide” what is included in the release, when this decision belongs to other people, such as a release manager, project manager or test manager.

The Hack in the Update / Commit Model

There are several ways to address the problems above using the Update/Commit model, but the hack below will lead you naturally to distributed version control.

If you have become comfortable with branching and merging between Trunk and Branches\v2.0, then there is an easy next step to address this problem: branching and merging each feature.

When a developer starts to work on a new feature, they branch from Trunk to Features\Bar Feature and start working in the Bar Feature folder. When it is decided that the Bar feature is finished and ready for release, they merge the code back to Trunk using the version control software. This allows a manager to decide to release the Bar feature even though the Foo feature is not ready.

Conclusion

Working on features in a branch allows people to decide late in the release cycle what will be included in the release. It also allows flexibility when priorities change in the middle of a release cycle.

The downside is that there will be many Feature folders that are no longer needed because they have been merged back, or have been orphaned. Linus calls these “expensive” branches, since they are intended to be either temporary or private. Distributed version control addresses this…

Version Control - Part 2: Update/Commit

The problem with Checkin/Checkout

Generally speaking, Checkin/Checkout interrupts developers. You want to finish your task, but you can't continue on a portion of it because someone else has the file checked out, which interrupts your thought process.

The more developers you have on the Checkin/Checkout model, the more interruptions each developer has, and development as a whole slows down.

The Answer: Update/Commit model

The Update/Commit model addresses this by allowing developers to edit the same files at the same time. No one has to wait. No one gets their thought process interrupted.

The two operations you have available are Update (download new code to your machine) and Commit (Upload your changes to the server). SVN's docs have a detailed explanation here.

The Update/Commit work cycle is:

  • Update (download source code from the server)
  • Edit
  • Update (download and resolve conflicts)
  • Commit (upload source code to the server)
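The second Update in the cycle is what keeps a commit from silently overwriting someone else's work. Here is a minimal sketch of that rule in Python. The `ToyServer` class is hypothetical and tracks a single file in memory; it is not how SVN or CVS is implemented, only an illustration of why you update before you commit.

```python
class ToyServer:
    """A toy, in-memory stand-in for a version control server (hypothetical)."""

    def __init__(self, content):
        self.revisions = [content]  # revision 0 is the initial content

    def update(self):
        """Return (revision number, content) of the newest revision."""
        return len(self.revisions) - 1, self.revisions[-1]

    def commit(self, base_revision, content):
        """Accept the commit only if it was built on the newest revision."""
        head = len(self.revisions) - 1
        if base_revision != head:
            raise RuntimeError("out of date: update before committing")
        self.revisions.append(content)
        return head + 1


server = ToyServer("line 1\n")

# Two developers update at the same time and edit independently.
alice_rev, alice_copy = server.update()
bob_rev, bob_copy = server.update()

# Alice commits first, so the server moves on to revision 1.
server.commit(alice_rev, alice_copy + "alice's line\n")

# Bob's commit is refused because his copy is based on revision 0.
try:
    server.commit(bob_rev, bob_copy + "bob's line\n")
    bob_was_rejected = False
except RuntimeError:
    bob_was_rejected = True

# Bob updates (picking up Alice's change), reapplies his edit, and commits.
bob_rev, latest = server.update()
final_rev = server.commit(bob_rev, latest + "bob's line\n")
```

Nobody was blocked while editing; the only extra step was Bob's update just before his commit.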

But are you ready for two developers editing the same file at the same time? Most developers are wary about making this leap. After a while you find that it isn't an issue. Why isn't it an issue? Surely you'll have the same number of problems as with Checkin/Checkout?

Resolving Conflicts

Actually, you have fewer problems with Update/Commit than with Checkin/Checkout. Every time two users want to edit the same file under Checkin/Checkout, one developer is interrupted, which is a problem. However, when two users edit the same file under Update/Commit and commit their changes to the server, there are two advantages:

  • Resolving the conflict happens when committing the work, not when editing, which does not interrupt developers' thought processes.
  • Most conflicts can be resolved automatically by the version control software. I find that 90% of conflicts can be handled automatically.

If you find that you have to manually resolve the conflict in a file, you typically have 3 versions of the file:

  • Your changes to the file
  • Someone else’s changes to the file
  • The final, resolved file. You edit this file with the goal of including your changes and other people’s changes.
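To see why most conflicts can be handled automatically, here is a rough sketch of a three-way merge in Python. It assumes a fourth input the list above does not mention: the common ancestor ("base") that both developers started from. The function name `merge_lines` is made up for this example, and the sketch only handles edits that change lines in place (no inserted or deleted lines), so it is far simpler than what real version control software does.

```python
def merge_lines(base, mine, theirs):
    """Merge three equal-length lists of lines.

    Returns (merged, conflicts), where conflicts holds the indexes of
    lines that both sides changed in different ways.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")

    merged, conflicts = [], []
    for index, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:
            merged.append(m)        # both sides agree (or neither changed it)
        elif m == b:
            merged.append(t)        # only the other developer changed it
        elif t == b:
            merged.append(m)        # only I changed it
        else:
            conflicts.append(index)  # both changed it: a person must decide
            merged.append(m)         # keep my version as a placeholder
    return merged, conflicts


base = ["int a = 1;", "int b = 2;", "int c = 3;"]
mine = ["int a = 10;", "int b = 2;", "int c = 3;"]
theirs = ["int a = 1;", "int b = 2;", "int c = 30;"]

# Different lines were edited, so the merge needs no human input.
auto_merged, auto_conflicts = merge_lines(base, mine, theirs)

# Both developers edited the first line differently: a real conflict.
theirs_clash = ["int a = 99;", "int b = 2;", "int c = 3;"]
clash_merged, clash_conflicts = merge_lines(base, mine, theirs_clash)
```

Only the second case would require you to edit the resolved file by hand.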

Conclusion

Update/Commit is great. You and everyone else can work without getting in each other’s way, and it seems like all the issues are solved. Why would you want distributed version control?

Version Control - Part 1: Checkin/Checkout

Linus Torvalds is big on distributed version control (http://lwn.net/Articles/246381/), and I'm starting to see the light. Distributed version control is something any organization should seriously consider, not just open source projects. This is the start of a series of blog entries that shows my progression of version control preferences and the advantages of each:

  • Checkin/Checkout (Source Safe)
  • Update/Commit (CVS/SVN)
  • Branching/Merging
  • Distributed version control (git and others)

Checkin/checkout
Checkin/checkout solves the main issue: Multiple developers working on the same body of source code. Developers check out a file, and no one else can edit it. Once the developer is done, they check it in and someone else can edit it. Easy. Simple. Everyone understands how it works.

A lot of Microsoft shops historically preferred this model for many reasons, one being that Source Safe works this way. Source Safe can handle shared checkouts (which are very similar to the Update/Commit model); however, anyone who has used it will tell you that shared checkouts in Source Safe will go wrong eventually. It is unfortunate that shared checkouts have scared many people away from the Update/Commit model, because the Update/Commit model has advantages over the Checkin/Checkout model when done right, for instance in Team System.

Moving SSAS databases

Here is an article on moving SSAS databases. The only thing to add is to be sure to set the file permissions on the folder.

http://www.ssas-info.com/ssas_faq/ssas_management_faq/q_how_do_i_move_analysis_services_database_to_a_different_folder.html

Natural Keys vs Surrogate Keys

This blog entry has a good description of the pros (and some cons) of surrogate keys:

http://rapidapplicationdevelopment.blogspot.com/2007/08/in-case-youre-new-to-series-ive.html

SSRS filter values missing from a SSAS datasource

I recently ran into a problem where the list of values in a filter was missing for a dimension. It was due to having a fact relationship from the dimension to a very small fact table. In fact, the table was empty. I resolved it by changing the relationship to a regular relationship. Sure, it will be less efficient; however, the fact table is expected to be very small.

SAN RAID Performance

It is difficult to find information on how different RAID levels perform. I did some performance testing with 14 SCSI disks in a SAN, and came up with the following conclusions (heavy dose of salt needed):

  1. More disks == faster array, but doubling the number of disks does not double the performance. I found that to double the performance of RAID 5 w/ 3 disks, the array needed 10 disks.
  2. Sequential reading/writing is faster than random reading/writing.
  3. When reading, the RAID level does not matter much.
  4. When writing, the RAID level is very important. RAID 1 (or RAID 10 depending on the number of disks) is the fastest by far, then RAID 5 and then RAID 6.
  5. Server SCSI is faster than a SAN.
  6. A SAN is faster than my laptop.

SQL Server notes:

  1. OLTP databases generally have random reading and writing.
  2. OLAP databases generally have sequential reading and writing.
  3. When committing a transaction, the log must write to disk, but the MDF does not need to write to disk. So the write speed of the database is dependent on the write speed of the LDF file. This is why it is common to have the MDF on RAID 5 and the LDF on RAID 1 (or RAID 10).
  4. SQL Server recommends RAID 5 for read-only databases. I assume it is because you'll get the most disk space compared to other RAID levels.
  5. Since sequential reading and writing is faster than random reading and writing, and since you need a lot of disks to double the performance of an array, it can be faster to isolate arrays for certain tasks, rather than combining all the disks into a large fast array. (I hope you took that heavy dose of salt)

Graphs:

(I don't show MB/s above 150 MB/s because the 2Gb/s cable was saturated at 160 MB/s.)

  • Read MB/s
  • Write MB/s (the RAID level matters when writing)
  • Read IO/s (the RAID level doesn't matter when reading)
  • Write IO/s (the RAID level matters when writing)

SQL Server 2008 Nov CTP Released

The November CTP for SQL Server 2008 has been released:

http://www.microsoft.com/downloads/details.aspx?FamilyId=3BF4C5CA-B905-4EBC-8901-1D4C1D1DA884&displaylang=en

List statistics and date updated

The following query will list all the statistics in the current database, and the date they were last updated.

 
SELECT object_name(object_id) AS tablename,
       name AS index_name,
       STATS_DATE(object_id, stats_id) AS statistics_update_date
FROM sys.stats
-- crude filter that skips the low-numbered system objects
WHERE object_id > 1000
ORDER BY object_name(object_id), STATS_DATE(object_id, stats_id);

Sorting uniqueidentifiers in SQL Server 2005

I had an issue recently where I needed to sort on a uniqueidentifier column and read the data in .NET. I found that .NET sorts Guids differently than SQL Server.

You can see for yourself. :) Run the following code.

 
 
DECLARE @t TABLE (
   g uniqueidentifier
); 
 
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000000000001' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000000000010' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000000000100' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000000001000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000000010000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000000100000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000001000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000010000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000100000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-001000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-010000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-100000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0001-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0010-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0100-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-1000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0001-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0010-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0100-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-1000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0001-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0010-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0100-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-1000-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000001-0000-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000010-0000-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000100-0000-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00001000-0000-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00010000-0000-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00100000-0000-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '01000000-0000-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '10000000-0000-0000-0000-000000000000' ); 
 
SELECT * FROM @t ORDER BY g ;

It returns the data in the following bizarre order. Keep in mind the first row is the "smallest" number.

g
01000000-0000-0000-0000-000000000000
10000000-0000-0000-0000-000000000000
00010000-0000-0000-0000-000000000000
00100000-0000-0000-0000-000000000000
00000100-0000-0000-0000-000000000000
00001000-0000-0000-0000-000000000000
00000001-0000-0000-0000-000000000000
00000010-0000-0000-0000-000000000000
00000000-0100-0000-0000-000000000000
00000000-1000-0000-0000-000000000000
00000000-0001-0000-0000-000000000000
00000000-0010-0000-0000-000000000000
00000000-0000-0100-0000-000000000000
00000000-0000-1000-0000-000000000000
00000000-0000-0001-0000-000000000000
00000000-0000-0010-0000-000000000000
00000000-0000-0000-0001-000000000000
00000000-0000-0000-0010-000000000000
00000000-0000-0000-0100-000000000000
00000000-0000-0000-1000-000000000000
00000000-0000-0000-0000-000000000001
00000000-0000-0000-0000-000000000010
00000000-0000-0000-0000-000000000100
00000000-0000-0000-0000-000000001000
00000000-0000-0000-0000-000000010000
00000000-0000-0000-0000-000000100000
00000000-0000-0000-0000-000001000000
00000000-0000-0000-0000-000010000000
00000000-0000-0000-0000-000100000000
00000000-0000-0000-0000-001000000000
00000000-0000-0000-0000-010000000000
00000000-0000-0000-0000-100000000000

In the end, I decided to SELECT two bigint columns that indicate how SQL Server is sorting the data. This is CPU intensive, so it isn't ideal; however, it shows SQL Server's strange sorting behaviour of the uniqueidentifier column.

 
CREATE FUNCTION dbo.GuidHigh
(
	@g uniqueidentifier
)
RETURNS bigint
AS
BEGIN
 
	DECLARE @s varchar(40);
	SET @s = @g;
	-- @s is in the format 3B3A8D04-5D0C-4E0C-AC69-EFC14EE7D849
 
	SET @s = REPLACE(@s, '-', '');
	-- @s is in the format 3B3A8D045D0C4E0CAC69EFC14EE7D849
 
	DECLARE @highA varchar(40);
	DECLARE @highB varchar(40);
 
	SET @highA = SUBSTRING(@s, 21, 12);
	SET @highB = SUBSTRING(@s, 17, 4);
 
	DECLARE @high varchar(40);
	SET @high = @highA + @highB;
 
	DECLARE @MinBigInt numeric(21,0);
	SET @MinBigInt = 9223372036854775808;
 
	RETURN CAST(dbo.[HexStrToNumeric](@high) - @MinBigInt AS bigint);
 
END
GO
 
CREATE FUNCTION dbo.[GuidLow]
(
	@g uniqueidentifier
)
RETURNS bigint
AS
BEGIN
 
	DECLARE @s varchar(40);
	SET @s = @g;
	-- @s is in the format 3B3A8D04-5D0C-4E0C-AC69-EFC14EE7D849
 
	SET @s = REPLACE(@s, '-', '');
	-- @s is in the format 3B3A8D045D0C4E0CAC69EFC14EE7D849
 
	DECLARE @lowA varchar(40);
	DECLARE @lowB varchar(40);
	DECLARE @lowC varchar(40);
	DECLARE @lowD varchar(40);
	DECLARE @lowE varchar(40);
	DECLARE @lowF varchar(40);
	DECLARE @lowG varchar(40);
	DECLARE @lowH varchar(40);
 
	SET @lowA = SUBSTRING(@s, 15, 2);
	SET @lowB = SUBSTRING(@s, 13, 2);
	SET @lowC = SUBSTRING(@s, 11, 2);
	SET @lowD = SUBSTRING(@s, 9, 2);
	SET @lowE = SUBSTRING(@s, 7, 2);
	SET @lowF = SUBSTRING(@s, 5, 2);
	SET @lowG = SUBSTRING(@s, 3, 2);
	SET @lowH = SUBSTRING(@s, 1, 2);
 
	DECLARE @low varchar(40);
	SET @low = @lowA + @lowB + @lowC + @lowD + @lowE + @lowF + @lowG + @lowH;
 
	DECLARE @MinBigInt numeric(21,0);
	SET @MinBigInt = 9223372036854775808;
 
	RETURN CAST(dbo.[HexStrToNumeric](@low) - @MinBigInt AS bigint);
 
END
GO
 
-- Helper used by GuidHigh and GuidLow; create it before those two functions.
-- Do not include "0x" in the parameter, just a string like "8E75EF35FF75A977"
 
CREATE FUNCTION dbo.[HexStrToNumeric](@hexstr varchar(16))
RETURNS numeric(21, 0) -- enough for 2^64
AS
BEGIN
    DECLARE @hex char(2), @i int, @count int, @result numeric(21, 0), @power numeric(21, 0);
    SET @result = 0;
    SET @count = LEN(@hexstr);
    SET @i = 1;
    SET @power = 1;
    -- Raise @power to 16^@count
    WHILE (@i <= @count)
    BEGIN
        SET @power = @power * 16;
        SET @i = @i + 1;
    END;

    SET @i = 1;
    -- Walk the digits left to right, most significant digit first
    WHILE (@i <= @count)
    BEGIN
        SET @power = @power / 16;
        SET @hex = SUBSTRING(@hexstr, @i, 1);
        SET @result = @result + @power *
                CASE WHEN @hex LIKE '[0-9]'
                    THEN CAST(@hex AS int)
                    ELSE CAST(ASCII(UPPER(@hex))-55 AS int)
                END;
        SET @i = @i + 1;
    END;
    RETURN @result;
END
GO
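If you would rather reproduce the ordering on the client instead of computing it in SQL Server, the same byte shuffle can be sketched in a few lines. The example below is in Python rather than .NET, and the function names (`sql_server_sort_key`, `guid_high`, `guid_low`) are my own; the byte order is inferred only from the query output and the two T-SQL functions above, so treat it as an illustration, not a documented specification.

```python
import uuid


def sql_server_sort_key(g: uuid.UUID) -> bytes:
    """Rearrange the 16 bytes so that a plain byte comparison matches
    the ORDER BY result shown above (most significant bytes first)."""
    b = g.bytes  # bytes in the same order as the displayed hex digits
    # Last 6 bytes, then the 2 bytes before them, then the first 8 reversed.
    return b[10:16] + b[8:10] + b[7::-1]


def guid_high(g: uuid.UUID) -> int:
    """Counterpart of dbo.GuidHigh: top 64 bits of the key as a signed value."""
    return int.from_bytes(sql_server_sort_key(g)[:8], "big") - 2**63


def guid_low(g: uuid.UUID) -> int:
    """Counterpart of dbo.GuidLow: bottom 64 bits of the key as a signed value."""
    return int.from_bytes(sql_server_sort_key(g)[8:], "big") - 2**63


def one_digit_guids():
    """Build the 32 test GUIDs used above: a single 1 in each hex position."""
    guids = []
    for position in range(32):
        digits = ["0"] * 32
        digits[position] = "1"
        guids.append(uuid.UUID("".join(digits)))
    return guids


guids = one_digit_guids()
by_key = sorted(guids, key=sql_server_sort_key)
by_high_low = sorted(guids, key=lambda g: (guid_high(g), guid_low(g)))

for g in by_key[:4]:
    print(g)
```

Sorting by the 16-byte key and sorting by the (high, low) pair give the same order, which is why two bigint columns are enough to carry the ordering out of the database.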
