Archive for the ‘.NET’ Category

The Future of Caching in .NET is “Velocity”…


Caching has always been a way to achieve better performance by bringing data closer to where it is consumed, thus avoiding what can be a bottleneck at the original data source (usually the database).  Due to the architecture of the Internet itself, caching often takes place outside of the Enterprise: in a user's browser, on proxy servers, or on Content Distribution Networks (CDNs).  This type of caching is great because results are served up without even entering the infrastructure of the enterprise that hosts the application.

Once a request makes it into the infrastructure of the Enterprise, it is up to us as Enterprise developers and architects to handle these requests and the associated data efficiently, in ways that yield good performance and scale without overloading the resources of the Enterprise.  To make this possible, there are various caching techniques that can be employed.

We are starting to see a trend toward applications that are more data and state driven, especially as we begin the journey into cloud-based computing.  At this year's PDC, Microsoft showed many new and enhanced technologies that are increasingly data and state driven, including Oslo, Azure, Workflow Foundation, "Dublin", etc.  To enable the massive scale that these technologies promise, caching will become extremely important and will probably be thought of as a new tier in the application architectures of the future.  Caching will be crucial to achieving the scale and performance that users will demand.

Common Caching Scenarios

When thinking about the data in the Enterprise we begin to uncover some very different scenarios that could benefit from caching.  We also quickly realize that caching is not a "one size fits all" proposition, because a solution that makes sense in one scenario may not make sense in another.  To get a better understanding of this, let's talk more about the three basic scenarios that caching tends to fall into:

Reference Oriented Data

Reference data is typically written infrequently but read frequently.  This type of data is an ideal candidate for caching.  Most of the time, when we think about caching data, this is the type of data that first comes to mind.  Examples include a product catalog, a schedule of flights, etc.  Getting this type of data closer to where it is consumed can have huge performance benefits.  It also avoids overloading database resources with queries that generate similar results.

Activity Oriented Data

Activity data is written and read by only one user as a result of a specific activity, and it is no longer needed once the activity is completed.  While this type of data is not what we typically think of as a good candidate for caching, it can yield scalability benefits if we find an effective caching strategy.  An example of this type of data would be a shopping cart.  An appropriate, distributed caching strategy allows better overall scale for an application because requests can be served easily by load-balanced servers that do not require sticky sessions.  To handle this, the caching strategy must be able to manage many of these exclusive collections of data in a distributed way.

Resource Oriented Data

The trickiest of all is what's known as resource data. Resource data is typically read and written very frequently by many users simultaneously.  Because of its volatility, this data is not often thought of as a candidate for caching, but it can yield big benefits in both performance and scale if an efficient strategy can be found.  Examples include the inventory for an online bookstore, the seats on a flight, the bids in an online auction, etc.  In all of these examples, although the data is very volatile, in a high-throughput situation it would be very slow if every request resulted in a database access.  The challenge in caching this type of data is finding a strategy that can be distributed (in order to scale properly) while providing the concurrency and replication needed to keep the underlying data consistent across machines.

Current Caching Technologies in .NET

There are existing .NET technologies that can be used to provide caching today.  Some of these are tied to the web tier (e.g. ASP.NET Cache, ASP.NET Session and ASP.NET Application Cache) while some are more generic in their usage (e.g. Enterprise Library's Caching Application Block).  These are all great caching technologies for smaller applications, but they have limitations that prevent them from being used for large, Internet-scale applications.

When it comes to larger, Internet-scale caching technologies, there are some third-party products in the space that do an excellent job, one of these being NCache by Alachisoft.  Microsoft is now also jumping into this space with a new caching technology codenamed "Velocity."  Velocity can be configured to handle all of the caching scenarios described above in a performant and highly scalable way.

What is “Velocity?”

"Velocity" is Microsoft's new distributed caching technology.  Although it is not scheduled to be released until mid-2009, it already has many impressive caching features, with lots of very useful and ambitious features slated for future releases.  In Microsoft's own words from the "Velocity" project website, they define "Velocity" in the following way:

Microsoft project code named “Velocity” provides a highly scalable in-memory application cache for all kinds of data. By using cache, application performance can improve significantly by avoiding unnecessary calls to the data source. Distributed cache enables your application to match increasing demand with increasing throughput using a cache cluster that automatically manages the complexities of load balancing. When you use “Velocity,” you can retrieve data by using keys or other identifiers, named “tags.” “Velocity” supports optimistic and pessimistic concurrency models, high availability, and a variety of cache configurations. “Velocity” includes an ASP.NET session provider object that enables you to store ASP.NET session objects in the distributed cache without having to write to databases, which increases the performance and scalability of ASP.NET applications.

In a nutshell, Velocity allows a cache to be distributed across servers, which has huge scalability benefits.  It also allows for pessimistic and optimistic concurrency options.  This, along with other features, is what makes "Velocity" a great choice for the caching needs of large-scale applications.

Features in Velocity

There are many features, both existing and planned, that will make Velocity a compelling caching technology that far exceeds the existing caching options offered by Microsoft.

Current Features:

Simple Caching Access Patterns: Get/Add/Put/Remove

It is very easy, using Velocity's API, to perform the standard cache access operations: adding new items to the cache, getting items from the cache, putting updates to cached items, and removing items from the cache.
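As a rough sketch of what these operations look like (building on the CTP2 API shown later in this post; the Add and Remove overloads, their exact semantics, and the simplified CD constructor are my assumptions and may differ in the actual bits):

Cache music = new CacheFactory().GetCache("music");

music.Add("B000002UB3", new CD("Abbey Road"));   // assumed to fail if the key already exists
CD cd = (CD)music.Get("B000002UB3");             // fetch by key
music.Put("B000002UB3", cd);                     // add or replace the item
music.Remove("B000002UB3");                      // evict the item from the cache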

Tag Searching

When saving a cache item to a specific cache "region" (regions are discussed later in this post), you are able to add one or more string tags to that entry.  It is then possible to query the cache for entries that contain one or more of those tags.
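Hypothetically, tag usage would look something like the sketch below; the tag overload of Put and the GetByTag method are my assumptions, so check the CTP documentation for the actual signatures:

music.CreateRegion("Beatles");
music.Put("Beatles", "B000002UB3", new CD("Abbey Road"), new[] { "rock", "1969" }); // hypothetical tag overload
var rockCds = music.GetByTag("Beatles", "rock"); // hypothetical tag query method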

Distributed Across Machines

A cache can be configured to exist across machines.  The configuration options for this are quite extensive.  By allowing cache items to be distributed across machines, the cache can scale out as new cache servers are added.  The other existing caching technologies offered by Microsoft only allow for scaling up.  Scaling up becomes very expensive as caching needs increase and often hits a limit on the amount of cache a single machine can hold.  With "Velocity" you can scale out a cache across hundreds of machines if needed, effectively fusing the memory of these machines together to form one giant cache, all with low-cost commodity hardware.

High Availability

Velocity can be configured to transparently store multiple copies of each cache item when it is stored in the cache.  This provides high availability by helping to guarantee that a given cache item will still be accessible even if a caching server fails.  Of course, the more backup copies of an item you configure, the greater the guarantee that your data will survive a failure.  Velocity is also smart enough to make sure that each backup copy of a cache item lives on a separate cache server.

Concurrency

Velocity supports both optimistic and pessimistic concurrency models when updating an item in the cache.  With pessimistic concurrency, you request a lock when retrieving an item from the cache that you are going to update, and you are required to unlock the object after it is updated.  In the meantime, no one else can obtain a lock for that item until it is unlocked or the lock expires.  With optimistic concurrency, no lock is needed; instead, version information is passed along when an item is updated.  During the update, Velocity checks whether the version being updated on the caching server is the same version that was retrieved, and an error is returned to the caller if the versions do not match.  For performance reasons it is always better to use optimistic concurrency if the situation can tolerate it.
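In code, the two models look roughly like this; the GetAndLock/PutAndUnlock pair and the version-tracking overloads below reflect my understanding of the CTP2 API and may not match the final signatures:

// Pessimistic: lock the item, update it, then unlock it as part of the Put.
LockHandle handle;
CD lockedCd = (CD)music.GetAndLock("B000002UB3", TimeSpan.FromSeconds(30), out handle);
music.PutAndUnlock("B000002UB3", lockedCd, handle);

// Optimistic: no lock; the Put fails if another caller updated the item first.
CacheItemVersion version;
CD cd = (CD)music.Get("B000002UB3", out version);
music.Put("B000002UB3", cd, version);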

Management and Monitoring

Velocity is managed using PowerShell, with over 130 functions that can be performed through it.  For example, you can create caches, set configuration info, start a cache server, stop a cache server, etc.

ASP.NET Integration

Velocity comes with a session state provider that can be "plugged into" ASP.NET so that session state is stored inside Velocity instead of by the standard ASP.NET session provider.  Using the Velocity provider is completely transparent; the session object is used just as it is with the standard provider.  This automatically scales session state across servers in a way that does not require sticky routers.
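Enabling the provider should just be a web.config change along these lines; the provider type name and attributes below are my assumptions from the CTP materials, so treat them as illustrative only:

<sessionState mode="Custom" customProvider="Velocity">
  <providers>
    <!-- the type name here is an assumption; check the CTP docs for the exact provider type -->
    <add name="Velocity"
         type="Microsoft.Data.Caching.DataCacheSessionStoreProvider"
         cacheName="session" />
  </providers>
</sessionState>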

Local Cache Option

Velocity can be configured such that when an item is retrieved from the distributed cache, it is also cached locally on the server where it was retrieved.  This makes it faster to retrieve the same object if it is asked for again.  The real savings here are in network latency and the time it would take to deserialize the cache item: items stored inside the Velocity cache are always kept serialized in memory, but items stored in the local cache (when configured) are stored natively as objects.  So, if the memory space can be afforded, this can be quite a performance boost.

Can add Caching Servers at Runtime

Several times during the sessions on Velocity at the PDC, the presenter added or removed a caching server at runtime. When this was done, the cluster of Velocity caching servers reacted immediately and started to redistribute cache items across the cluster.  This intelligence was very impressive and is the same machinery that reacts if a caching server has a hardware failure, again redistributing cached items across the cluster.  With this ability it is possible to dynamically add more caching power at runtime without losing any cached data to a system restart.

Future Features

Although no indication was given as to when the additional features described below will make it into the Velocity framework, it was very encouraging to hear about the many features in the queue for a future release.  Here are some of the ones mentioned at the PDC:

Security

In future releases of Velocity, security will be a greater consideration.  Currently, information stored inside the Velocity cache is not secured in any way: having access to the cache means having access to anything stored in it.  Future versions of Velocity will offer security options that allow you to secure items in the cache using several different techniques. The planned security options are:

  • Token-based Security – when storing items in the cache you will be able to specify a security token along with the item.  That security token will need to be presented in order to retrieve the item from the cache.
  • App ID-based Security – this option will allow you to register a domain account with a named cache, so that only specific users can access a given cache. Note: "named caches" are discussed later in this post.
  • Transport-Level Security – this option will allow the standard transport-level security offered by the various WCF bindings.

Cache Event Notifications

In the future, when anything happens that affects the cache, a notification will be sent across the cache cluster and to any other subscribed listeners. These events, when implemented, will allow a view into all of the actions taken on the cache, including notifications sent when items are added, updated and removed from the cache.

Write Behind

In many scenarios, we want the cache itself to front the data access for our system, which can provide very high performance and throughput.  To make this as efficient as possible, it would be great to be able to write to the cache and have the cache write the data to its backend data store.  This is what is referred to as "write behind".  The actual write to the backend happens asynchronously, so the caller does not have to wait for it and only has to wait for the item to be written to cache memory.  This is possible and safe because of the high-availability features offered in Velocity: because the cached data can be backed up inside the cache, there is little risk that a machine failure will prevent the data from being written to the backend data store.
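Since the write-behind API itself wasn't shown, here is only a conceptual sketch of the pattern; none of this is Velocity code, and the queue stands in for what the cache cluster would do internally:

// Conceptual illustration of write-behind, not the Velocity API.
static readonly Queue<KeyValuePair<string, object>> pendingWrites =
    new Queue<KeyValuePair<string, object>>();

static void PutWithWriteBehind(Cache cache, string key, object value)
{
    cache.Put(key, value);  // the caller only waits for the in-memory write
    lock (pendingWrites)
        pendingWrites.Enqueue(new KeyValuePair<string, object>(key, value));
    // a background thread drains pendingWrites to the database asynchronously
}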

Read Through

A future release of Velocity will also provide a "read through" feature. Read-through allows the cache to fetch an item from the backend if it doesn't currently hold the item.  This is both a convenience and a performance enhancement, because multiple calls are not needed to retrieve an item that is not already in the cache.  This again would allow the cache itself to be the data access tier, with the cache handling the communication with the backend data source.
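Conceptually, read-through moves the miss-handling logic that callers must write today (sketched below) inside the cache itself; LoadCdFromDatabase is a hypothetical backend loader:

// What callers write today; read-through would do this inside the cache.
static CD GetCd(Cache cache, string key)
{
    CD cd = (CD)cache.Get(key);
    if (cd == null)
    {
        cd = LoadCdFromDatabase(key); // hypothetical backend loader
        cache.Put(key, cd);
    }
    return cd;
}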

Bulk Access

Future versions of Velocity will provide access methods that allow bulk operations to be performed on the cache.  This again can enhance performance in some scenarios, simply by replacing chatty calls to the cache with larger, chunkier calls.

LINQ support

Being able to query the cache using LINQ will open up many very interesting scenarios.  Having LINQ support would truly transform caching into its own robust tier in the overall architecture of the Enterprise.

HPC Integration

Other future features discussed with regard to Velocity involve High Performance Computing (HPC).  Up to this point, caching has been all about placing data as close as possible to where it is consumed. When we begin to think about HPC scenarios, we are actually viewing the problem from the opposite point of view; that is, we are trying to put the processing as close to the data as possible.  In the PDC sessions, they mentioned that the Velocity team is interested in exploring ways to move calculations and processing close to the items inside the cache.

Cloud

With the announcement of Windows Azure and other cloud-based initiatives, all major technologies at Microsoft are trying to figure out how they fit into this new paradigm of computing.  It is not clear exactly how Velocity will take part in the cloud, but I would imagine that at some point Velocity caching will be available to applications hosted in the Windows Azure cloud.

Velocity’s Architecture

Physical Model


Velocity is architected to run on one or more machines, each running a cache host service.  Although you can run more than one cache host service on a single machine, it is generally not advised, as you do not get the full protection of Velocity's high-availability feature when a machine failure occurs.  Multiple cache host services are meant to run together as part of a cache cluster, and this cache cluster is treated as one large caching service.  The services that are part of the cluster interact together and are configured as a unit to offer all of the redundancy and scale that Velocity provides.  When the cluster is started, all of the cache hosts that make up the cluster read their configuration information either from SQL Server or from a common XML configuration file stored on a network-accessible file share.  A few of the cache hosts are assigned the additional responsibility of being "lead hosts".  These special hosts track the availability of the other cache hosts and also perform the necessary load-balancing tasks for the cache as a whole.

Logical Model


While the cache clusters and associated cache hosts make up the physical view of the Velocity cache, "named caches" make up the logical view.  Velocity can support multiple named caches, which act as separate, isolated caches within a Velocity cache cluster.  Each named cache can span all of the machines in the cluster, storing its items across the various machines, and can be configured to store items redundantly to achieve high availability, giving the named cache a tolerance for machine failure (since any item in the cache can live in multiple places on different machines).

Inside a given named cache, Velocity offers another, optional level at which to cache objects, called "regions."  There can be multiple regions in any given named cache.  When saving cache items into a region, it is possible to add "tags" to these items so that they can be searched and retrieved by more than a simple cache key (which is how caches normally work).  The tradeoff in using regions is that all of the items in a given region are stored on the same cache host.  This is done because the cache items need to be indexed in order to provide the searching functionality that tags provide.  Even though all of the items in a region are stored on the same host, they can still be backed up onto other hosts to provide the high availability that Velocity offers.  So, while regions support great tag-based searching functionality, they do not provide the same distributed scalability enjoyed by cache items that do not use regions.

Programming Model

So what does the code look like when getting and saving objects in the Velocity cache?

To access a Velocity cache, you must first create or obtain a cache factory.  From the cache factory you can get an instance of the named cache you wish to use (in the example, we are getting the "music" named cache).

CacheFactory factory = new CacheFactory();
Cache music = factory.GetCache("music");

Next, I will put an item into the cache.  In the example below, I am caching the "Abbey Road" CD using its ASIN as the cache key.

music.Put("B000002UB3", new CD("Abbey Road", .,.));

To retrieve an item from the cache, simply call the cache’s “Get” method passing in the key to the cache item you wish to retrieve.

CD cd = (CD)music.Get("B000002UB3");

To create a region, simply call the “CreateRegion” method of the cache passing in the desired name for the region you wish to create.  In the example below, I create a “Beatles” region:

music.CreateRegion("Beatles");

Below, I show an example where two items are being put into the same region. When using regions, you must always specify the region along with the key and object you wish to cache.

music.Put("Beatles", "B000002UAU", new CD( “Sgt. Pepper’s…”,.));
music.Put("Beatles", "B000002UB6", new CD( “Let It Be”,.));

Lastly, below, I show how to retrieve a cache item from a region.  Notice that the region name must be specified along with the key.

CD cd = (CD)music.Get("Beatles", "B000002UAU");

How is High Availability Achieved?


As previously mentioned, Velocity has a feature that helps provide high availability for the items in the cache. To gain this high availability, caches can be configured to store multiple copies of an object when it is put into the cache.  When doing this, Velocity ensures that a given item is stored on multiple cache hosts (which is why it is advisable to run only one cache host per physical server).

In order to achieve high availability without adversely impacting performance, when putting an item into the cache, Velocity writes the cache item to its primary location and to only one secondary location before returning to the caller.  Then, after returning to the caller, Velocity asynchronously writes the cache item to other backup locations, up to the number of backups configured for that cache.  This ensures that the object being cached exists in at least two places (the minimum requirement for high availability) but doesn't hold up the caller while all of the configured backup requirements are fulfilled.

Since regions are required to live entirely inside one cache host, to achieve high availability Velocity backs up the entire region onto another host.

Release Schedule

Just before PDC08, the Velocity team released CTP2 (Community Technology Preview 2).  During the PDC they stated the release schedule for Velocity as follows:

CTP3 will be released during the MIX09 conference (scheduled for mid-March 2009).

The RTM release is scheduled for mid-2009.

Other Resources

If you would like to know more about Velocity, I would suggest you view the following presentations given at the PDC08:

Project “Velocity”: A First Look – presenter: Murali Krishnaprasad

WMV-HQ | WMV | MP4 | PPTX

Project “Velocity”: Under the Hood – presenter: Anil Nori

WMV-HQ | WMV | MP4 | PPTX

Also, there was a very informative interview on Scott Hanselman’s podcast “Hanselminutes”:  Distributed Caching with Microsoft’s Velocity

I would also recommend the Velocity article on MSDN: “Microsoft Project Code Named Velocity CTP2” as well as the Velocity Team Blog on MSDN.


F# – Functional Programming for the Masses….


Functional programming has been around for quite some time.  Many functional languages predate the languages we use today; in fact, Lisp was created in the late 1950s.

Functional programming is based on lambda calculus, a branch of mathematics that describes functions and their evaluation (see Wikipedia).  Pure functional languages are built from mathematical functions and do not allow any mutable state or side effects.  This means that no state can be modified and no I/O can be performed. Because of this, until recently functional programming languages and concepts were primarily the domain of mathematicians and scientists doing theoretical work.

So, why the new interest in functional programming? 

Some of this is due to functional programming concepts creeping into mainstream languages such as C#.  With .NET 3.5 came LINQ (Language Integrated Query), which is firmly rooted in the functional programming paradigm.  LINQ essentially builds an expression tree that describes a query operation; this operation is then evaluated "lazily" at the moment it is needed.  This idea of lazy evaluation is also a core concept in functional programming languages.
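A quick C# illustration of this deferred execution (the predicate does not run when the query is built, only when results are pulled):

using System;
using System.Linq;

class LazyDemo
{
    static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5, 6 };

        // Building the query does no work; it only describes the operation.
        var evens = numbers.Where(n =>
        {
            Console.WriteLine("testing " + n);
            return n % 2 == 0;
        });

        Console.WriteLine("query built, nothing tested yet");

        // Only now, as results are pulled, does the predicate actually run.
        foreach (int n in evens)
            Console.WriteLine("got " + n);
    }
}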

So why is functional programming important? 

While Moore's law (the doubling of the number of transistors that can fit on an IC every two years) continues to hold, we are not seeing the same trend in clock speeds, which have traditionally given us the performance boosts we have seen over the years.  With the advent of multi-core processors, the future shows us that to increase performance, instead of clock speeds increasing (which faces physical limitations, including the speed of light), we will see exponential growth in the number of processor cores.  Currently, two and four cores are popular, but soon we will see 8, 16, 32 and more cores.

To take advantage of all of these cores, multithreaded programming will become more and more important as a way to exploit the new multi-core processors.  We all know that multithreaded programming is very difficult at best.  With mutable state and multithreaded programming, the possibility of encountering race conditions, deadlocks and corrupted data goes up dramatically with the number of cores, creating a much greater possibility of producing "Heisenbugs" (bugs that are very difficult to debug because the behavior changes under the act of debugging).

Because functional programming does not use mutable state, it is very well suited to easily and safely handling computationally expensive processing across many, many cores in parallel.
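As a C# sketch of why this matters (using PLINQ's AsParallel from the Parallel Extensions CTP available at the time of writing, which later shipped in .NET 4): because the function below touches no shared state, spreading the work across cores is a one-call change with no locks:

using System;
using System.Linq;

class PureParallel
{
    // A pure function: same input always yields the same output, no shared state.
    static double ExpensiveCalc(int n)
    {
        double acc = 0;
        for (int i = 1; i <= 100000; i++)
            acc += Math.Sin(n * i);
        return acc;
    }

    static void Main()
    {
        var inputs = Enumerable.Range(0, 1000);

        double sequential = inputs.Select(ExpensiveCalc).Sum();
        double parallel = inputs.AsParallel().Select(ExpensiveCalc).Sum(); // safe: no mutable state

        Console.WriteLine("{0} {1}", sequential, parallel);
    }
}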

Enter, Microsoft F#

Microsoft has recently introduced a new, mostly functional programming language built on top of the .NET CLR.  This new language, called F#, came out of the Microsoft Research group and is in the process of being productized so that it can be included in a future release of Visual Studio.

F# at the PDC 2008

At PDC08, Luca Bolognese, the product manager for F#, gave an excellent presentation on F#.  In his talk he created a program in F# that performed various financial calculations on stock prices downloaded from the Internet.  What was really interesting was how he went about writing the program.  Typically, when writing programs in imperative languages like C#, we tend to take a top-down approach.  Sure, we break our programs into methods, but we think more or less in a procedural fashion.  With functional languages, as was the case in this presentation, the solution is often composed from the inside out, much like you would compose a complex SQL statement or an XSL transform: you often start in the middle with a basic function, run it to check the results, and then gradually filter and modify the results to build out the desired solution.  Because of this, when you look at the final result it can look complex, and you might scratch your head when you come back to it at a later date.

This inside-out composition technique is exactly how Luca built his solution, slowly getting to the final desired result.  The amazing thing was how little code it took to reach his end result compared to what would have been necessary in C#.  Much of the "noise" was removed by the F# solution, and in the end it really described the problem at hand.  What really helped this approach is that F# uses type inference, so you don't need to declare many of your types; this made the composition process much easier.  This is not to imply that F# is not statically typed; it is NOT a dynamic language.  In fact, functions are types in F#.

Another technique he showed was currying, together with F#'s pipeline operator, which pipes the result of one function into the input of another.  This made the code very readable and gave a clear understanding of what was being accomplished.

After creating the program, Luca showed how simple it was to parallelize his solution to run across multiple processors.  He needed to make only slight modifications to the code, and he didn't need to add any locks, monitors, wait handles or any of the other constructs we normally associate with multithreaded programming.  In fact, his solution would safely run against as many processors as could be contained on a chip.  Amazing…doing this in an imperative language like C# would scare the most seasoned developer.

Since F# is built upon the CLR, it is a functional language that can interact with both the rich class library of the .NET Framework and assemblies written in other .NET languages.  This opens up lots of interesting scenarios and gives us the possibility of writing computationally expensive algorithms in F# with a safe story for parallelization.

Here are links to Luca Bolognese's presentation on F# at the PDC 2008.  I highly recommend it.  Luca's presentation style is as informative as it is entertaining.  The bulk of his presentation was done as a demo, which should please the developer in all of us.

An Introduction to Microsoft F# presenter: Luca Bolognese

WMV-HQ  |  WMV  |  MP4  |  PPTX

 

Other resources for F#

www.fshare.net

http://research.microsoft.com/fsharp/fsharp.aspx

http://www.ffconsultancy.com/dotnet/fsharp/index.html

PDC08, from my perspective…


Welcome to the PDC08

The PDC was a great conference showing much of Microsoft’s future vision for their products and platforms.

I would say that this conference could be broken up into several major areas of interest:

  • Oslo
  • Azure Cloud Operating System
  • Live Services
  • User Interface – Silverlight
  • Languages – C#, Dynamic Languages, F#
  • Windows 7

Overall, I was very impressed with Microsoft's ability as a company to coordinate the efforts of many diverse groups and technologies throughout the company. It seems that many of the efforts and initiatives that Microsoft has undertaken are starting to come together into a common, cohesive vision. That said, it was also evident to me that they still have a bit of work to do to bring all of this new technology together for prime-time usage. Although the PDC is an event that usually occurs only every few years, I did hear that Microsoft has already announced a PDC09. This makes me believe that the timing of this PDC may have been a little aggressive. For many of the Azure and Live Services sessions, the presenter was in constant IM contact with the Microsoft datacenter, which tells me that there were lots of stability concerns regarding the products that rely heavily on their cloud services.

My personal interests drove me to many of the sessions on their cloud-based services and Oslo. It was pretty well known that Microsoft would be announcing a cloud-based platform and the Live Mesh client-side platform, but many of the details were clouded in secrecy before the PDC. Oslo was also talked about a little before the PDC and was shown in more detail at the conference.

Over the last several years Microsoft has been building datacenters at a record pace. During the first keynote address, Ray Ozzie talked about how Microsoft realized, in building up its own web-based Internet properties, that these same activities were being undertaken by large and small companies around the world. While many large companies can afford to build large datacenters that provide reliability, redundancy, fault tolerance, etc., many not-so-big companies struggle with this overwhelming task. Even very large companies have trouble scaling out their services to handle geo-location and fault tolerance. It was becoming clear to Microsoft that cloud-based services to complement on-premises software were needed. In his keynote, Ozzie recognized similar efforts by both Google and Amazon in this area.

With Windows Azure, Microsoft differentiates itself from the cloud-based services offered by Google and Amazon by offering a service that is:

  • Elastic – computing power can be dynamically sized and scaled at runtime, so that you can handle peak loads without paying for more than you need.
  • Geo-located – a cloud-based deployment can be spread across the globe.
  • Fully fault tolerant – all data and software in the cloud is stored in several places and always spread between servers at different locations.
  • Connectable – the infrastructure can easily be connected to on-premises software through an Internet service bus.
  • And the list goes on…

With Windows Azure will come many interesting deployment tools that make deploying and managing applications in the cloud easy.  Microsoft has created many new and interesting technologies to build what they call the "Fabric".  This "Fabric" is the abstraction that sits on top of the actual servers running inside their many datacenters; you can think of it as an abstraction a few levels higher than virtual machines.

From all of the various sessions on Azure and cloud-based services, it was very clear that the new world of cloud-based software is going to require us to think differently about how we architect, design and implement our software so that it is better suited to a cloud-based paradigm, which, in my opinion, is where things will be headed over the next ten years.  Many of these practices involve the proper decoupling of software, as well as other practices that are simply part of good software design.

As part of this new cloud-based initiative, it is clear that WCF and WF (Windows Workflow) are going to be two fundamental enabling technologies. Up to this point, we haven't seen much usage of Windows Workflow, but it is clear that it will be a large part of many cloud-based applications.  As part of this, Microsoft announced a new server product called "Dublin", an application server built around WCF services that front Windows Workflows.  This server product includes many advances in Windows Workflow that make it an ideal choice for hosting workflows in the cloud.

There were also talks given on the new data technologies that extend SQL Server into the cloud, dubbed "SQL Services".  One of these is SQL Server Data Services.  This service provides a new way to retrieve data over the Internet using REST-based protocols.  These REST protocols provide ways to query and update data using standard HTTP verbs such as GET, POST, PUT and DELETE, and they are built on the idea that specific data resources are uniquely addressable using URIs.  REST-based protocols are built to scale like the Internet itself.  It was clear that, in addition to SOAP-based protocols, Microsoft is heavily investing in REST-based protocols as well.
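The basic interaction model looks something like the C# sketch below; the URI here is made up for illustration (SSDS's actual addressing scheme and payload formats aren't shown):

using System;
using System.IO;
using System.Net;

class RestSketch
{
    static void Main()
    {
        // Each data resource gets its own unique URI (hypothetical address).
        var request = (HttpWebRequest)WebRequest.Create(
            "https://example.cloudservice.net/catalog/products/42");
        request.Method = "GET"; // POST to create, PUT to update, DELETE to remove

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}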

I also attended a session on storing scalable data in the cloud. From what I could gather, one will need to rethink how data is organized in order to take full advantage of the scalability that the cloud offers.  What was described was very close to what Amazon does with its data storage services.  Basically, there are three storage items in the cloud-based storage services: blobs, tables and queues.  From what I could understand, the table storage is pretty basic.  Much of the storage seemed to center around entities, which play nicely into Microsoft's Entity Framework (just released with .NET 3.5 Service Pack 1).  They alluded to providing more relational storage in the cloud in the future, but none of the current cloud storage options have any relational capabilities.  I would imagine that much of this is because they want to provide a massively scalable and reliable data platform, and the relational aspects would make this task much more difficult (which appears to be the same route that Amazon has taken).  I am not sure where relational cloud storage would fit in, but I would think that they would need to have a story here.

In addition to recognizing how difficult it is for companies to write highly scalable software and the need for cloud-based computing, Microsoft has put considerable investment into "Oslo", a software modeling technology.  In recent years, software has become more and more complex, and it is recognized that there are many aspects of software development that need to be coordinated. These aspects include business analysis, software architecture, software design and implementation, software deployment, etc.

Over the past 20 years, there have been many attempts to model the software development process.  Many of these attempts, like Rational Rose, have had very limited success.  With "Oslo", Microsoft has taken software modeling to the extreme; this appears to be one of Microsoft's most ambitious projects in recent years.  In "Oslo", all of the modeling is stored in a database repository.  On top of this repository there is a highly customizable user interface that is used to explore it.  The user interface is very interesting and provides very in-depth views into a given software application and how it connects to other applications and services.

In addition to providing a graphical view of software, "Oslo" also provides a new modeling language called "M" that can be used to model software.  "M" also makes it easy to create Domain Specific Languages (DSLs), which in turn make it easy to create applications.

As I mentioned, all of the modeling is stored inside a SQL Server database repository.  Runtimes that are meant to drive WCF, WF, etc. are then driven from the data stored in the Oslo repository.

While I was very impressed by what I saw of "Oslo", it appears to be a long way off.  Done right, this could really change the way we write and think about software, although it will be many years, in my opinion, before "Oslo" will be able to make that kind of impact.

Well…that's my long, drawn-out overview of the PDC.  I will add more posts that target the specific sessions I attended, along with links to the online videos of those sessions.