More on parsing text files with LINQ

In a previous article, I described how to use LINQ when parsing a text file.

Following that train of thought further, I found a more elegant way of splitting the lines from the file into columns. Creating extension methods on top of IEnumerable<string> seems like a good idea! They could be used like this for a comma-separated file:

from columns in reader.AsEnumerable().AsDelimited(delimiter)
select ...

Or like this for a fixed position format:

from columns in reader.AsEnumerable().AsFixed(width, width, width...)
select ...

The extension methods are added to the IEnumerable<string> interface instead of directly to the TextReader class. That way they work with any sequence of lines, not only lines read from a TextReader. Here’s a sample implementation of them:

using System.Collections.Generic;

public static class FileParsingExtensions
{
	// Splits each line into columns on any of the given separator characters.
	public static IEnumerable<string[]> AsDelimited(this IEnumerable<string> strings, params char[] separators)
	{
		foreach (var line in strings)
		{
			yield return line.Split(separators);
		}
	}

	// Splits each line into fixed-width columns.
	public static IEnumerable<string[]> AsFixed(this IEnumerable<string> strings, params int[] widths)
	{
		foreach (var s in strings)
		{
			yield return s.AsFixed(widths);
		}
	}

	// Cuts one line into consecutive columns of the given widths.
	// Substring throws if the line is shorter than the sum of the widths.
	public static string[] AsFixed(this string s, params int[] widths)
	{
		var columns = new string[widths.Length];
		int startPos = 0;
		for (int i = 0; i < widths.Length; i++)
		{
			columns[i] = s.Substring(startPos, widths[i]);
			startPos += widths[i];
		}

		return columns;
	}
}
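
Because the methods hang off IEnumerable<string>, the source of the lines does not have to be a file. The sketch below runs the same query shape over an in-memory array and over a StringReader. The AsEnumerable() method on TextReader comes from the previous article and is not shown here, so the version below is only a hypothetical stand-in for it; ReaderSketchExtensions and InMemoryDemo are names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ReaderSketchExtensions
{
	// Hypothetical stand-in for the AsEnumerable() helper from the previous article:
	// lazily yields one line at a time until the reader is exhausted.
	public static IEnumerable<string> AsEnumerable(this TextReader reader)
	{
		string line;
		while ((line = reader.ReadLine()) != null)
		{
			yield return line;
		}
	}

	// Same shape as AsDelimited in the article, repeated so the sketch compiles on its own.
	public static IEnumerable<string[]> AsDelimited(this IEnumerable<string> strings, params char[] separators)
	{
		foreach (var line in strings)
		{
			yield return line.Split(separators);
		}
	}
}

public static class InMemoryDemo
{
	public static List<string> Descriptions()
	{
		// Any IEnumerable<string> works as a source: here an array, no file involved.
		var lines = new[] { "1,top,10,10,10,100", "2,left,10,100,100,100" };
		var fromArray = from columns in lines.AsDelimited(',')
			select columns[1];

		// The same query shape over a TextReader, via the stand-in AsEnumerable().
		using (var reader = new StringReader("3,bottom,100,100,10,100"))
		{
			var fromReader = from columns in reader.AsEnumerable().AsDelimited(',')
				select columns[1];

			// ToList() forces evaluation while the reader is still open.
			return fromArray.Concat(fromReader).ToList();
		}
	}
}
```

Note that both queries are lazy, so the results have to be materialized before the reader is disposed.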

So with this delimited test data:

id,description,x1,y1,x2,y2
1,top,10,10,10,100
2,left,10,100,100,100
3,bottom,100,100,10,100
4,right,10,100,10,10

we can use the AsDelimited extension method, and parse that file like this:

using (var reader = new StreamReader("testdata-delimiters.txt"))
{
	var query = from columns in reader.AsEnumerable().AsDelimited(',')
		select new
		{
			SegmentId = columns[0],
			Description = columns[1],
			x1 = columns[2],
			y1 = columns[3],
			x2 = columns[4],
			y2 = columns[5],
		};

	// Skip(1) drops the header row.
	foreach (var lineSegment in query.Skip(1))
	{
		Console.WriteLine(lineSegment);
	}
}
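
The query above keeps every column as a string. If numeric values are wanted, one possible approach is to convert them in the projection. The following is a minimal sketch, not part of the original code: LineSegment, TypedProjectionSketch and ParseSegments are hypothetical names, and the sketch assumes every data row has six columns with valid integers in all but the second. The header is skipped before the projection so that int.Parse never sees it.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical target type for the typed projection.
public sealed class LineSegment
{
	public int SegmentId;
	public string Description;
	public int X1, Y1, X2, Y2;
}

public static class TypedProjectionSketch
{
	// Same shape as AsDelimited in the article, repeated so the sketch compiles on its own.
	public static IEnumerable<string[]> AsDelimited(this IEnumerable<string> strings, params char[] separators)
	{
		foreach (var line in strings)
		{
			yield return line.Split(separators);
		}
	}

	public static List<LineSegment> ParseSegments(IEnumerable<string> lines)
	{
		// Skip the header row first, then convert the numeric columns.
		var query = from columns in lines.AsDelimited(',').Skip(1)
			select new LineSegment
			{
				SegmentId = int.Parse(columns[0], CultureInfo.InvariantCulture),
				Description = columns[1],
				X1 = int.Parse(columns[2], CultureInfo.InvariantCulture),
				Y1 = int.Parse(columns[3], CultureInfo.InvariantCulture),
				X2 = int.Parse(columns[4], CultureInfo.InvariantCulture),
				Y2 = int.Parse(columns[5], CultureInfo.InvariantCulture),
			};

		return query.ToList();
	}
}
```

A malformed row would make int.Parse throw; int.TryParse could be used instead if bad rows should be tolerated.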

The same data in a fixed position format:

1         top     10        10        10        100
2         left    10        100       100       100
3         bottom  100       100       10        100
4         right   10        100       10        10

Here we can use the AsFixed extension method passing in the column widths. The file is parsed with this code:

using (var reader = new StreamReader("testdata-fixed.txt"))
{
	var query = from columns in reader.AsEnumerable().AsFixed(10, 8, 10, 10, 10, 10)
		select new
		{
			SegmentId = columns[0],
			Description = columns[1],
			x1 = columns[2],
			y1 = columns[3],
			x2 = columns[4],
			y2 = columns[5],
		};

	foreach (var lineSegment in query)
	{
		Console.WriteLine(lineSegment);
	}
}
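
Two things are worth keeping in mind with fixed-width data: the columns come back with their padding still attached, and a line whose last column has lost its trailing spaces is shorter than the sum of the widths, which makes Substring throw. The sketch below shows one way to handle both, assuming trailing padding may be missing. AsTrimmedFixed and FixedWidthSketch are hypothetical names, not part of the code above.

```csharp
using System;
using System.Linq;

public static class FixedWidthSketch
{
	// A plain Substring-based splitter in the spirit of the article's AsFixed,
	// repeated so the sketch compiles on its own.
	public static string[] AsFixed(this string s, params int[] widths)
	{
		var columns = new string[widths.Length];
		int startPos = 0;
		for (int i = 0; i < widths.Length; i++)
		{
			columns[i] = s.Substring(startPos, widths[i]);
			startPos += widths[i];
		}

		return columns;
	}

	// Hypothetical helper: tolerates short lines and strips the column padding.
	public static string[] AsTrimmedFixed(this string s, params int[] widths)
	{
		// Pad the line to the full record width so Substring never reads past the end.
		var padded = s.PadRight(widths.Sum());

		// Split as usual, then trim the padding from each column.
		return padded.AsFixed(widths).Select(column => column.Trim()).ToArray();
	}
}
```

For example, "1         top     10".AsTrimmedFixed(10, 8, 10) yields the three values "1", "top" and "10", even though the last column is only two characters long.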
