More parsing text files with LINQ
Originally published on blog.einbu.no, April 2, 2009.
In a previous article, I described how to use LINQ when parsing a text file.
Following that train of thought further, I found a more elegant way of splitting the lines from the file into columns. By creating extension methods on top of IEnumerable<string>, I can write the query like this:
from columns in reader.AsEnumerable().AsDelimited(delimiter) select ...
Or like this for a fixed position format:
from columns in reader.AsEnumerable().AsFixed(width, width, width...) select ...
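Both queries start from reader.AsEnumerable(), the TextReader extension method from the previous article, whose code isn't repeated in this post. A minimal sketch, assuming it does nothing more than yield the reader's lines one at a time (the TextReaderExtensions class name is hypothetical):

```csharp
using System.Collections.Generic;
using System.IO;

public static class TextReaderExtensions
{
    // Hypothetical reconstruction of the previous article's helper:
    // lazily yields one string per line until ReadLine() returns null.
    public static IEnumerable<string> AsEnumerable(this TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}
```

Because the lines are produced lazily with yield return, nothing is read until the query is enumerated, which is why the foreach loops in the examples stay inside the using block that owns the reader.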
The extension methods extend the IEnumerable<string> interface rather than the TextReader class, so they work on any sequence of lines, not only on lines read from a file. Here's a sample implementation:
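    using System;
    using System.Collections.Generic;

    public static class FileParsingExtensions
    {
        // Splits every line into columns at any of the given separator characters.
        public static IEnumerable<string[]> AsDelimited(this IEnumerable<string> strings, params char[] separators)
        {
            foreach (var line in strings)
            {
                yield return line.Split(separators);
            }
        }

        // Splits every line into columns of the given fixed widths.
        public static IEnumerable<string[]> AsFixed(this IEnumerable<string> strings, params int[] widths)
        {
            foreach (var s in strings)
            {
                yield return s.AsFixed(widths);
            }
        }

        // Cuts a single line into columns of the given widths. A line that ends early
        // (for example, with its trailing padding trimmed) gets a shorter or empty
        // column instead of an ArgumentOutOfRangeException from Substring.
        public static string[] AsFixed(this string s, params int[] widths)
        {
            var columns = new string[widths.Length];
            int startPos = 0;
            for (int i = 0; i < widths.Length; i++)
            {
                int start = Math.Min(startPos, s.Length);
                int length = Math.Min(widths[i], s.Length - start);
                columns[i] = s.Substring(start, length);
                startPos += widths[i];
            }
            return columns;
        }
    }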
So with this delimited test data:
    id,description,x1,y1,x2,y2
    1,top,10,10,10,100
    2,left,10,100,100,100
    3,bottom,100,100,10,100
    4,right,10,100,10,10
we can use the AsDelimited extension method to parse that file like this:
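    using System;
    using System.IO;
    using System.Linq;

    // reader.AsEnumerable() is the TextReader extension method from the previous article.
    using (var reader = new StreamReader("testdata-delimiters.txt"))
    {
        var query = from columns in reader.AsEnumerable().AsDelimited(',')
                    select new
                    {
                        SegmentId = columns[0],
                        Description = columns[1],
                        x1 = columns[2],
                        y1 = columns[3],
                        x2 = columns[4],
                        y2 = columns[5],
                    };

        // Skip(1) skips the header row (id,description,...).
        foreach (var lineSegment in query.Skip(1))
        {
            Console.WriteLine(lineSegment);
        }
    }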
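The anonymous type in this query keeps every column as a string. When the coordinates are needed as numbers, the conversion can happen in the select clause. The sketch below assumes every data row is well-formed (int.Parse throws a FormatException otherwise). The TypedSegmentSketch and ParseSegments names are hypothetical, an in-memory sequence of lines stands in for the file, and line.Split(',') stands in for AsDelimited(',') so the block compiles on its own:

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class TypedSegmentSketch
{
    // Parses an integer column independently of the current culture.
    private static int ToInt(string s) => int.Parse(s, CultureInfo.InvariantCulture);

    // Skips the header row, splits each line on commas, and converts the coordinates to int.
    public static List<(int Id, string Description, int X1, int Y1, int X2, int Y2)> ParseSegments(
        IEnumerable<string> lines)
    {
        var query = from line in lines.Skip(1)
                    let columns = line.Split(',')
                    select (Id: ToInt(columns[0]),
                            Description: columns[1],
                            X1: ToInt(columns[2]),
                            Y1: ToInt(columns[3]),
                            X2: ToInt(columns[4]),
                            Y2: ToInt(columns[5]));
        return query.ToList();
    }
}
```

A named tuple is used here because, unlike an anonymous type, it can appear in a method signature; a small class would work just as well.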
The same data in a fixed position format:
    1         top       10        10        10        100
    2         left      10        100       100       100
    3         bottom    100       100       10        100
    4         right     10        100       10        10
Here we use the AsFixed extension method, passing in the column widths. The file is parsed with this code:
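    using System;
    using System.IO;
    using System.Linq;

    using (var reader = new StreamReader("testdata-fixed.txt"))
    {
        // Six columns, each 10 characters wide; the values keep their padding spaces.
        var query = from columns in reader.AsEnumerable().AsFixed(10, 10, 10, 10, 10, 10)
                    select new
                    {
                        SegmentId = columns[0],
                        Description = columns[1],
                        x1 = columns[2],
                        y1 = columns[3],
                        x2 = columns[4],
                        y2 = columns[5],
                    };

        // This file has no header row, so nothing is skipped.
        foreach (var lineSegment in query)
        {
            Console.WriteLine(lineSegment);
        }
    }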
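In the fixed position format, every column cut out by AsFixed still carries its padding, so a SegmentId of 1 comes out as "1" followed by nine spaces. One possible variant trims each column as it is cut; it is sketched here as a hypothetical AsFixedTrimmed helper (not part of the original code) that also tolerates lines whose last column has lost its trailing spaces:

```csharp
using System;

public static class FixedWidthSketch
{
    // Hypothetical variant of AsFixed: cuts the line at the given widths,
    // tolerates lines that end early, and trims the padding from each column.
    public static string[] AsFixedTrimmed(this string line, params int[] widths)
    {
        var columns = new string[widths.Length];
        int startPos = 0;
        for (int i = 0; i < widths.Length; i++)
        {
            int start = Math.Min(startPos, line.Length);
            int length = Math.Min(widths[i], line.Length - start);
            columns[i] = line.Substring(start, length).Trim();
            startPos += widths[i];
        }
        return columns;
    }
}
```

Trimming inside the splitter keeps the query's select clause unchanged; the trade-off is that values whose leading or trailing spaces are significant can no longer be read exactly.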