Diving deeper into Windows Azure table storage
Originally published on blog.einbu.no, September 14, 2009.
In my previous post about Windows Azure Table Storage, I relied on the StorageClient project in the Azure SDK samples. This feels a bit strange, and raises the question: am I expected to include references to sample projects and use Microsoft.Samples.whatever namespaces in my future projects?
This raises a couple of questions about license, copyright, support and more. Instead of digging into those questions, I came up with some alternate questions:
- What does this sample project give us?
- How does it work?
- Can we do these things ourselves?
A lot of the searching was done in the sample code, since most of the other articles about accessing Windows Azure Table Storage depend on the same sample files. I was disappointed to see that even the Windows Azure SDK help file shows some partial code calling into the sample project. Little help there...
So, looking at that previous post again: all my entity classes inherit from TableStorageEntity, and I use TableStorageDataServiceContext as a base class for accessing entities in Azure Table Storage.
Why did I inherit TableStorageEntity?
My entities need not inherit TableStorageEntity at all. That class contains very little code. All that's needed for an entity class is to create a few properties and then decorate the class with the DataServiceKey attribute. Here's a minimal implementation:
    using System;
    using System.Data.Services.Common;

    [DataServiceKey("PartitionKey", "RowKey")]
    public class MyEntity
    {
        public string PartitionKey { get; set; }
        public string RowKey { get; set; }
        public DateTime Timestamp { get; set; }
        //...and our own properties
    }
The names and types of these properties are mandatory:
- PartitionKey (string) allows Windows Azure Table Storage to spread our data between different storage nodes. All entities with the same PartitionKey reside on the same storage node. Splitting our data across multiple partitions allows for massive scalability if needed.
- RowKey (string) must be unique within a partition. The PartitionKey and RowKey together make up the unique key for the entity.
- Timestamp (DateTime) stores the last modified time for the entity and can be used for optimistic concurrency. The value is set automatically by the Azure Table Storage.
In addition to this, the TableStorageEntity class also contains some convenience constructors taking parameters for PartitionKey and RowKey (which aren't really needed). That class also overrides the Equals and GetHashCode methods taking the PartitionKey and RowKey into account. This is not a requirement to make everything work.
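If you do want the conveniences described above without inheriting from the sample class, they are easy to add by hand. The following is only a sketch of what such an entity could look like, not the sample project's actual code: the two-argument constructor and the hash combination are my own choices, and the parameterless constructor is kept on the assumption that the data services client needs one to materialize entities.

```csharp
using System;
using System.Data.Services.Common;

[DataServiceKey("PartitionKey", "RowKey")]
public class MyEntity
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public DateTime Timestamp { get; set; }

    // Kept so the entity can still be created without arguments.
    public MyEntity() { }

    // Hypothetical convenience constructor taking both key parts.
    public MyEntity(string partitionKey, string rowKey)
    {
        PartitionKey = partitionKey;
        RowKey = rowKey;
    }

    // Two entities are treated as equal when both key parts match.
    public override bool Equals(object obj)
    {
        var other = obj as MyEntity;
        return other != null
            && string.Equals(PartitionKey, other.PartitionKey, StringComparison.Ordinal)
            && string.Equals(RowKey, other.RowKey, StringComparison.Ordinal);
    }

    // Combines the hashes of the two key parts; null keys hash to 0.
    public override int GetHashCode()
    {
        int partitionHash = PartitionKey == null ? 0 : PartitionKey.GetHashCode();
        int rowHash = RowKey == null ? 0 : RowKey.GetHashCode();
        return unchecked((partitionHash * 397) ^ rowHash);
    }
}
```

With this in place, two instances carrying the same PartitionKey and RowKey compare as equal, which is handy when entities end up in a HashSet or as dictionary keys.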
...and what about TableStorageDataServiceContext?
Well, it turns out I didn't actually need that class either.
TableStorageDataServiceContext inherits DataServiceContext.
Previously I had used it for three things:
Infer the schema to Windows Azure Table Storage using the TableStorage.CreateTablesFromModel method.
The CreateTablesFromModel method determines which table names need to be registered. It does so by reflecting over the model. It looks for properties of type IQueryable<T> where T is an entity. An entity is in this case defined as a class having at least one key, and a key is defined as a property either named by the DataServiceKey attribute or with a name ending in "ID". (Devtablegen.exe is stricter than this, since it accepts only the DataServiceKey attribute with the specific key names PartitionKey and RowKey.)
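To make that scan concrete, here is a rough sketch of the same idea using plain reflection. It is my own simplification of the rules described above, not the sample project's code. TableNameScanner, ExampleEntity and ExampleModel are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Services.Common;
using System.Linq;
using System.Reflection;

// Hypothetical example model, only here to exercise the scan below.
[DataServiceKey("PartitionKey", "RowKey")]
public class ExampleEntity
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public DateTime Timestamp { get; set; }
}

public class ExampleModel
{
    public IQueryable<ExampleEntity> MyEntities { get { return null; } }
    public IQueryable<string> NotATable { get { return null; } } // string has no key
    public string Unrelated { get; set; }
}

public static class TableNameScanner
{
    // Returns the names of all public IQueryable<T> properties whose T looks like an entity.
    public static List<string> FindTableNames(Type modelType)
    {
        var names = new List<string>();
        foreach (PropertyInfo property in modelType.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            Type type = property.PropertyType;
            if (!type.IsGenericType || type.GetGenericTypeDefinition() != typeof(IQueryable<>))
            {
                continue;
            }
            if (IsEntity(type.GetGenericArguments()[0]))
            {
                names.Add(property.Name);
            }
        }
        return names;
    }

    // Simplified entity test: a DataServiceKey attribute on the class,
    // or at least one property whose name ends in "ID".
    private static bool IsEntity(Type candidate)
    {
        if (candidate.GetCustomAttributes(typeof(DataServiceKeyAttribute), true).Length > 0)
        {
            return true;
        }
        return candidate.GetProperties().Any(p => p.Name.EndsWith("ID", StringComparison.Ordinal));
    }
}
```

Calling TableNameScanner.FindTableNames(typeof(ExampleModel)) picks up only the MyEntities property, since the other two properties fail either the IQueryable<T> check or the entity check.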
Azure Table Storage has no notion of table schema, so we need only register the table name. Once registered, we can throw any kind of entities in there.
So if I can bypass the CreateTablesFromModel method, and instead register the table names manually, I should be good to go!
So how can we register a table name in Azure Table Storage? It's all about putting together the right WebRequest. The easiest way to do that is to use the DataServiceContext from System.Data.Services (Astoria). If I can create a DataServiceContext object for my Azure Table Storage, it's as easy as calling the AddObject method, passing a table:
    // Needs: using System; using System.Data.Services.Client;
    DataServiceContext ctx = new DataServiceContext(new Uri("http://myaccount.table.core.windows.net/"));
    ctx.SendingRequest += ctx_SendingRequest;
    ctx.AddObject("Tables", new MyTable() { TableName = "MyEntities" });
    ctx.SaveChanges();
The ctx_SendingRequest event handler takes care of authentication. Authentication is an important part of the TableStorageDataServiceContext class, but we can implement it ourselves. This is an implementation of the SharedKeyLite authentication scheme taken from my previous article: Authenticating with Azure Table Storage.
    using System;
    using System.Data.Services.Client;
    using System.Globalization;
    using System.Security.Cryptography;
    using System.Text;

    static void ctx_SendingRequest(object sender, SendingRequestEventArgs e)
    {
        var account = "myaccount"; //replace with your account
        var sharedKey = Convert.FromBase64String("Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=="); //replace with your key

        e.Request.ContentLength = 0;
        e.Request.Headers.Add("x-ms-date", DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture));

        // Only the path is signed, so strip any query string
        var resource = e.Request.RequestUri.PathAndQuery;
        if (resource.Contains("?"))
        {
            resource = resource.Substring(0, resource.IndexOf("?"));
        }

        // SharedKeyLite for tables signs the date and the canonicalized resource
        string stringToSign = string.Format("{0}\n/{1}{2}",
            e.Request.Headers["x-ms-date"],
            account,
            resource);

        HMACSHA256 hasher = new HMACSHA256(sharedKey);
        string signedSignature = Convert.ToBase64String(hasher.ComputeHash(Encoding.UTF8.GetBytes(stringToSign)));
        string authorizationHeader = string.Format("SharedKeyLite {0}:{1}", account, signedSignature);
        e.Request.Headers.Add("Authorization", authorizationHeader);
    }
And the table class looks like this:
    [DataServiceKey("TableName")]
    internal class MyTable
    {
        public string TableName { get; set; }
    }
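The signing step in the event handler above can also be pulled out into a small function of its own, which makes it easier to reuse and to test without sending a request. This is just one way to slice it: SharedKeyLiteSigner and CreateAuthorizationHeader are names I made up for this sketch, and it carries the same simplifications as the handler (the caller passes the path with any query string already removed).

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class SharedKeyLiteSigner
{
    // Builds the Authorization header value for a Table Storage request.
    // date is the exact value sent in the x-ms-date header;
    // path is the request path without a query string, e.g. "/MyEntities".
    public static string CreateAuthorizationHeader(string account, byte[] sharedKey, string date, string path)
    {
        // String to sign: the date, a newline, then "/" + account + path.
        string stringToSign = string.Format(CultureInfo.InvariantCulture,
            "{0}\n/{1}{2}", date, account, path);

        using (var hasher = new HMACSHA256(sharedKey))
        {
            byte[] hash = hasher.ComputeHash(Encoding.UTF8.GetBytes(stringToSign));
            return string.Format(CultureInfo.InvariantCulture,
                "SharedKeyLite {0}:{1}", account, Convert.ToBase64String(hash));
        }
    }
}
```

Inside the event handler, the last few lines would then shrink to a single call that passes in the account name, the decoded key, the x-ms-date header value and the stripped resource path.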
Infer the schema to the Development Storage with the devtablegen.exe utility.
When executing devtablegen.exe /? on the command line, the description says:
Creates tables for use with the development storage table service from a list of managed assemblies. The devtablegen tool scans the given set of assemblies for properties of type IQueryable<C> on classes derived from DataServiceContext. If the class C (or a base class of C) has the DataServiceKey("PartitionKey", "RowKey") attribute, then a table named after the name of the property is created with a schema corresponding to the declaration of class C.
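See? There's nothing there indicating the need for TableStorageDataServiceContext. Here's what I need:
    using System;
    using System.Data.Services.Client;
    using System.Linq;

    public class MyServiceContext : DataServiceContext
    {
        // DataServiceContext needs the service root, so pass it through
        public MyServiceContext(Uri serviceRoot) : base(serviceRoot) { }

        public IQueryable<MyEntity> MyTableName { get; set; }
    }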
Where MyTableName names the table to be created, and MyEntity defines the table's schema.
(Or we could just create the database and its tables directly inside SQL Server, and then point development storage to that database. If we don't use devtablegen.exe, we don't need this class at all.)
Access the table storage to view or manipulate the data in the table store.
I thought this one was going to be a bit tricky, but I see that I have already done the grunt work. If I reuse the ctx_SendingRequest method from above and create a DataServiceContext again, I can use that DataServiceContext to query Azure Table Storage like this:
    DataServiceContext ctx = new DataServiceContext(new Uri("http://myaccount.table.core.windows.net/"));
    ctx.SendingRequest += ctx_SendingRequest;
    var query = ctx.CreateQuery<MyEntity>("MyEntities");
Or even use LINQ to filter the data before it is sent over the wire:
    foreach (var item in ctx.CreateQuery<MyEntity>("MyEntities").Take(3))
    {
        Console.WriteLine(item);
    }
Alternatively, you can use the RESTful, XML-based requests and responses that System.Data.Services (Astoria) gives you, as this is what Windows Azure Table Storage uses, but you'll still need to handle authenticating your requests as above. (My Authenticating with Azure Table Storage article mentioned earlier demonstrates how!)
So what now?
In this article, I've shown that you can use Azure Table Storage without the SDK or the StorageClient sample project. All the classes referenced by code on this page are part of the .NET Framework already installed on your computer. So go ahead and access Azure Table Storage...
Also, I've kept only the minimum things needed to make it work. I've probably left out some try/catches and other checks for invalid values coming in and out of everything. I know I also skipped a lot of code to canonicalize the signed message.
The StorageClient sample project in the Windows Azure SDK handles the canonicalization parts and has some checks. Would it then be better to use the StorageClient sample project? Is it more robust? With this being a sample project, Microsoft won't back up those expectations: "... we [Microsoft] do not take responsibility that the implementation of the library is the best for perf/robustness." So expect little or no support when using this sample.
This raises a fourth question:
Why isn't this functionality part of the SDK?
My guess is that the Azure dev team is still working on what the API will look like. Personally, I think pulling out canonicalization and authentication as separate classes would be a good idea. This would give us several choices of how to work with not only Table Storage, but also the blob and queue storages, which use the same authentication schemes and might have similar canonicalization needs. Even when using the REST API, those libraries would be appreciated.
My guess is that the StorageClient sample will evolve some more, and eventually the most important parts of it will make their way into the SDK.
Microsoft is, however, committed to the REST API. Maybe that's our safest bet at the moment. Remember, Windows Azure is still in beta. (At the time of writing I'm using the July CTP of the Windows Azure SDK.)