Version Control – Part 4: Distributed Version Control

There are many reasons to create branches in the Update/Commit model, and Distributed Version Control really excels in an environment with many branches. Take the examples from the previous post:

In Distributed Version Control, each of these would be a “repository/working directory” (repository for short). I combine these terms because, in distributed version control, your repository contains both the complete history and metadata, and the working directory in which you edit the files. This means you can branch the code without anyone else knowing. It also means you *have* to branch in order to get a working directory in which to edit files. This encourages developers to segregate their tasks into branches and then merge each feature back to a central repository when the work is done.
On the server you might have the following repositories:

  • Trunk
  • Released (which contains v1.0, v1.1, and v2.0 labels)

On your local machine you might have the following repositories based on the Trunk repository on the server:

  • Foo Feature
  • Bar Feature

Once a feature is complete, you can push the changes to the Trunk repository and then delete your local feature’s repository.
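The clone/push/delete cycle above can be sketched with git. This is only a sketch: the repository names are hypothetical stand-ins, and instead of a real server everything runs locally in a temp directory, with a bare repository playing the part of the server-side Trunk.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com \
       GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

# A bare repository stands in for the Trunk repository on the server.
git init -q --bare server-trunk.git

# Cloning gives you a "repository/working directory": the full history plus
# a working directory, so you can commit locally without anyone knowing.
git clone -q server-trunk.git bar-feature
cd bar-feature
git commit -q --allow-empty -m "Implement the Bar feature"  # stand-in for real edits

# Once the feature is complete, push the changes to Trunk...
git push -q origin HEAD

# ...and delete the local feature repository.
cd .. && rm -rf bar-feature
git --git-dir=server-trunk.git log --all --oneline
```

The local feature repository is disposable precisely because the complete history it produced now lives in the Trunk repository.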

This link has a more detailed explanation of how distributed source control works.

Posted in Source Control | 3 Comments

Version Control – Part 3: Branching/Merging

Branch and Merge

Many people who implement Update/Commit typically organize the server into the Trunk and Branches folders:

This allows you to work on long-term, scheduled features in Trunk, while making unexpected bug fixes in the latest branch: v2.0. The changes in the v2.0 branch can easily be merged back into Trunk using the version control software.

After you release the code and before working on the next feature, you branch your code to the Branches\v2.0 directory. When a bug needs to be fixed in the released version, you

  • Fix the bug in the Branches\v2.0 directory (The blue line in the image above)
  • Commit the fix to the Branches\v2.0 directory
  • Merge (using your version control software) the change from Branches\v2.0 into the Trunk directory, which has changes of its own in the meantime (green line).
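The post is tool-agnostic, but the three steps map directly onto branches in any merge-capable tool. Here is a minimal sketch with git, using branches to stand in for the Trunk and Branches\v2.0 folders (branch names and file contents are illustrative):

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com \
       GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
git init -q repo && cd repo

echo "v2.0 code" > app.txt
git add app.txt && git commit -qm "Release v2.0"    # trunk at release time
git branch v2.0                                     # the Branches\v2.0 snapshot

echo "long-term feature" >> app.txt                 # green line: trunk moves on
git commit -qam "Long-term feature work on trunk"

git checkout -q v2.0                                # blue line: fix on the branch
echo "the fix" > fix.txt
git add fix.txt && git commit -qm "Fix bug in released v2.0"

git checkout -q -                                   # back to trunk
git merge -q --no-edit v2.0 -m "Merge v2.0 fix into trunk"
git log --oneline
```

After the merge, trunk contains both the in-progress feature work and the released-version bug fix, which is exactly the state the diagram describes.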

Release Flexibility Problems

Now suppose you have been using Update/Commit version control, you have released your software, and it is being used by end users. The to-do list is growing; some features can be done quickly and some take longer. You can easily find yourself in a situation where some long-term features have already been committed to the Trunk, yet the business has requested a quick feature that it needs ASAP. For instance, a new client has raised the priority of a quick feature.

Another problem is that developers commit changes unaware of the current release cycle. A change committed during testing can delay the release when it should have been held back for a later release. The fundamental problem is that developers “decide” what is included in the release, when this decision belongs to other people, such as a release manager, project manager, or test manager.

The Hack in the Update / Commit Model

There are several ways to address the problems above using the Update/Commit model, but the hack below will lead you naturally to distributed version control.

If you have become comfortable with branching and merging between Trunk and Branches\v2.0, then there is an easy next step to address this problem. Branching and merging each feature:

When a developer starts to work on a new feature, they branch from Trunk to Features\Bar Feature and start working in the Bar Feature folder. When it is decided that the Bar feature is finished and ready for release, they merge the code back to Trunk using the version control software. This allows a manager to decide to release the Bar feature even though the Foo feature is not ready.
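Sketched with git, using branches in place of the Features\ folders (the names are hypothetical), a manager can ship the Bar feature while Foo stays unmerged:

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com \
       GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
git init -q repo && cd repo
git commit -q --allow-empty -m "Trunk baseline"

git checkout -qb foo-feature                 # Features\Foo Feature (not ready)
echo "half done" > foo.txt
git add foo.txt && git commit -qm "Foo feature, in progress"

git checkout -q -                            # back to trunk
git checkout -qb bar-feature                 # Features\Bar Feature (finished)
echo "done" > bar.txt
git add bar.txt && git commit -qm "Bar feature, complete"

git checkout -q -                            # back to trunk
git merge -q --no-ff bar-feature -m "Release the Bar feature"
ls                                           # bar.txt is on trunk; foo.txt is not
```

The unfinished Foo branch still exists and loses nothing; it simply is not part of what gets released.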

Conclusion

Working on features in a branch allows people to decide late in the release cycle what will be included in the release. It also allows flexibility when priorities change in the middle of a release cycle.

The downside is that there will be many Feature folders that are no longer needed because they have been merged back, or have been orphaned. Linus calls these “expensive” branches, since they are intended to be either temporary or private. Distributed version control addresses this…

Posted in Source Control | 4 Comments

Version Control – Part 2: Update/Commit

The problem with Checkin/Checkout

Generally speaking, Checkin/Checkout interrupts developers. You want to finish your task, but you can’t continue on a portion of it because someone else has a file checked out, which interrupts your thought process.

The more developers you have under the Checkin/Checkout model, the more often each developer is interrupted, and development as a whole slows down.

The Answer: Update/Commit model

The Update/Commit model addresses this by allowing developers to edit the same files at the same time. No one has to wait. No one gets their thought process interrupted.

The two operations you have available are Update (download new code to your machine) and Commit (Upload your changes to the server). SVN’s docs have a detailed explanation here.

The Update/Commit work cycle is:

  • Update (download source code from the server)
  • Edit
  • Update (download and resolve conflicts)
  • Commit (upload source code to the server)
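The cycle above can be demonstrated end to end. The actual SVN commands are `svn update` and `svn commit`; since this example has to run somewhere self-contained, it emulates the central server and two developers with git (`pull` playing the role of Update, `push` the role of Commit), which is an approximation of the model rather than the real thing:

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com \
       GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

git init -q --bare central.git                       # stands in for the server
git clone -q central.git seed
(cd seed && git commit -q --allow-empty -m "baseline" && git push -q origin HEAD)

git clone -q central.git alice                       # update: download the code
git clone -q central.git bob

(cd alice && echo "alice was here" > alice.txt && git add alice.txt &&
 git commit -qm "Alice's change" && git push -q origin HEAD)

cd bob
echo "bob was here" > bob.txt                        # edit
git add bob.txt && git commit -qm "Bob's change"
git pull -q --no-rebase --no-edit                    # update: download and resolve
git push -q origin HEAD                              # commit: upload to the server
ls
```

Note that Bob never waited for Alice: each developer edits freely, and the second Update step reconciles the two lines of work before the Commit.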

But are you ready for two developers editing the same file at the same time? Most developers are wary about making this leap. After a while you find that it isn’t an issue. Why not? Surely you’ll have the same number of problems as with Checkin/Checkout?

Resolving Conflicts

Actually, you have fewer problems with Update/Commit than with Checkin/Checkout. Under Checkin/Checkout, every time two users want to edit the same file, one developer is interrupted, which is a problem. When two users edit the same file under Update/Commit and commit their changes to the server, there are two advantages:

  • Resolving the conflict happens when committing the work, not when editing, which does not interrupt developers’ thought processes.
  • Most conflicts can be resolved automatically by the version control software. I find that 90% of conflicts can be handled automatically.

If you do have to manually resolve a conflict in a file, you typically have three versions of the file:

  • Your changes to the file
  • Someone’s changes to the file
  • The final, resolved file. You edit this file with the goal of including your changes and other people’s changes.
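Those three versions are the inputs to a classic three-way merge, and when the two sets of changes touch different lines, the tool resolves them without any hand-editing; that is why most conflicts are handled automatically. A small sketch using `git merge-file` (file names and contents are made up):

```shell
set -e
cd "$(mktemp -d)"
printf 'alpha\nbeta\ngamma\n' > base.txt     # the file both developers started from
printf 'ALPHA\nbeta\ngamma\n' > mine.txt     # your change: line 1
printf 'alpha\nbeta\nGAMMA\n' > theirs.txt   # someone else's change: line 3

# Three-way merge: rewrites mine.txt to include both sets of changes.
# Because the edits touch different lines, no manual resolution is needed.
git merge-file mine.txt base.txt theirs.txt
cat mine.txt
```

Only when both developers change the *same* lines does the tool insert conflict markers and hand the file back for the manual three-version resolution described above.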

Conclusion

Update/Commit is great. You and everyone else can work without getting into each other’s way and it seems like all the issues are solved. Why would you want distributed version control?

Posted in Source Control | 15 Comments

Version Control – Part 1: Checkin/Checkout

Linus Torvalds is big on distributed version control (http://lwn.net/Articles/246381/), and I’m starting to see the light. Distributed version control is something any organization should seriously consider, not just open source projects. This is the start of a series of blog entries showing the progression of my version control preferences and the advantages of each model.

  • Checkin/Checkout (Source Safe)
  • Update/Commit (CVS/SVN)
  • Branching/Merging
  • Distributed version control (git and others)

Checkin/checkout
Checkin/checkout solves the main issue: Multiple developers working on the same body of source code. Developers check out a file, and no one else can edit it. Once the developer is done, they check it in and someone else can edit it. Easy. Simple. Everyone understands how it works.

A lot of Microsoft shops historically preferred this model for many reasons, but one is that Source Safe works this way. Source Safe can handle shared checkouts (which is very similar to the Update/Commit model), but anyone who has used it will tell you that shared checkouts in Source Safe will go wrong eventually. It is unfortunate that shared checkouts have scared many people away from the Update/Commit model, because the Update/Commit model has advantages over the Checkin/Checkout model when done right, for instance in Team System.

Posted in Source Control | 17 Comments

Moving SSAS databases

Here is an article on moving SSAS databases. The only thing to add is to be sure to set the file permissions on the folder.

http://www.ssas-info.com/ssas_faq/ssas_management_faq/q_how_do_i_move_analysis_services_database_to_a_different_folder.html

Posted in SSAS | 6 Comments

Natural Keys vs Surrogate Keys

This blog entry has a good description of the pros (and some cons) of surrogate keys:

http://rapidapplicationdevelopment.blogspot.com/2007/08/in-case-youre-new-to-series-ive.html

Posted in SQL Server Development | 9 Comments

SSRS filter values missing from a SSAS datasource

I recently ran into a problem where the list of values in a filter was missing for a dimension. It was due to a fact relationship from the dimension to a very small fact table; in fact, the table was empty. I resolved it by changing the relationship to a regular relationship. Sure, it will be less efficient, but the fact table is expected to stay very small.

Posted in SSAS, SSRS | 5 Comments

SAN RAID Performance

It is difficult to find information on how different RAID levels perform. I did some performance testing with 14 SCSI disks in a SAN, and came up with the following conclusions (heavy dose of salt needed):

  1. More disks == faster array, but doubling the number of disks does not double the performance. I found that to double the performance of RAID 5 w/ 3 disks, the array needed 10 disks.
  2. Sequential reading/writing is faster than random reading/writing.
  3. When reading, the RAID level does not matter much.
  4. When writing, the RAID level is very important. RAID 1 (or RAID 10 depending on the number of disks) is the fastest by far, then RAID 5 and then RAID 6.
  5. Server SCSI is faster than a SAN.
  6. A SAN is faster than my laptop.

SQL Server notes:

  1. OLTP databases generally have random reading and writing.
  2. OLAP databases generally have sequential reading and writing.
  3. When committing a transaction, the log must be written to disk, but the MDF does not need to be written to disk. So the write speed of the database is dependent on the write speed of the LDF file. This is why it is common to have the MDF on RAID 5 and the LDF on RAID 1 (or RAID 10).
  4. SQL Server recommends RAID 5 for read-only databases. I assume it is because you’ll get the most disk space compared to other RAID levels.
  5. Since sequential reading and writing is faster than random reading and writing, and since you need a lot of disks to double the performance of an array, it can be faster to isolate arrays for certain tasks, rather than combining all the disks into a large fast array. (I hope you took that heavy dose of salt)

Graphs:

The four charts (Read MB/s, Write MB/s, Read IO/s, and Write IO/s per array; not reproduced here) illustrate the conclusions above: when reading, the RAID level doesn’t matter much, but when writing it matters a great deal. (I don’t show MB/s above 150 MB/s because the 2Gb/s cable was saturated at 160 MB/s.)

Posted in Hardware | 22 Comments

SQL Server 2008 Nov CTP Released

The November CTP for SQL Server 2008 has been released:

http://www.microsoft.com/downloads/details.aspx?FamilyId=3BF4C5CA-B905-4EBC-8901-1D4C1D1DA884&displaylang=en

Posted in SQL Server Administration | 9 Comments

List statistics and date updated

The following query lists all the statistics in the current database and the date they were last updated.

SELECT object_name(object_id) AS tablename,
       name AS stats_name,
       STATS_DATE(object_id, stats_id) AS statistics_update_date
FROM sys.stats
WHERE object_id > 1000
ORDER BY object_name(object_id), STATS_DATE(object_id, stats_id);

Posted in SQL Server Administration | 6 Comments