More parsing textfiles with LINQ
Originally published on blog.einbu.no April 2, 2009
In a previous article, I described how to use LINQ when parsing a text file.
Following that train of thought, I found a more elegant way of splitting the lines from the file into columns. By creating extension methods on top of IEnumerable<string>, I can write a query like this for a delimited format:
from columns in reader.AsEnumerable().AsDelimited(delimiter) select ...
Or like this for a fixed position format:
from columns in reader.AsEnumerable().AsFixed(width, width, width...) select ...
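Both queries start from reader.AsEnumerable(), the TextReader extension from the previous article, which is not reproduced here. As a minimal sketch, assuming it simply yields the reader's lines one at a time (the class names TextReaderExtensions and AsEnumerableDemo are made up for this illustration), it could look like this:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TextReaderExtensions
{
    // Assumed shape of the helper from the previous article:
    // lazily yields one line at a time until the reader is exhausted.
    public static IEnumerable<string> AsEnumerable(this TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

public static class AsEnumerableDemo
{
    public static void Main()
    {
        // StringReader stands in for a StreamReader over a real file.
        using (var reader = new StringReader("first\nsecond\nthird"))
        {
            foreach (var line in reader.AsEnumerable())
            {
                Console.WriteLine(line);
            }
        }
    }
}
```

Because the method is an iterator, nothing is read until the sequence is enumerated, so the reader has to stay open while the query runs.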
The extension methods are defined on the IEnumerable<string> interface rather than directly on the TextReader class. That makes them usable in more scenarios, since any sequence of strings can be the source, not just the lines of a reader. Here's a sample implementation of them:
public static class FileParsingExtensions
{
    // Splits each line into columns on any of the given separator characters.
    public static IEnumerable<string[]> AsDelimited(this IEnumerable<string> strings, params char[] separators)
    {
        foreach (var line in strings)
        {
            yield return line.Split(separators);
        }
    }

    // Splits each line into columns of the given fixed widths.
    public static IEnumerable<string[]> AsFixed(this IEnumerable<string> strings, params int[] widths)
    {
        foreach (var s in strings)
        {
            yield return s.AsFixed(widths);
        }
    }

    // Cuts a single line into consecutive columns. The line must be at least
    // as long as the sum of the widths, otherwise Substring throws.
    public static string[] AsFixed(this string s, params int[] widths)
    {
        var columns = new string[widths.Length];
        int startPos = 0;
        for (int i = 0; i < widths.Length; i++)
        {
            columns[i] = s.Substring(startPos, widths[i]);
            startPos += widths[i];
        }
        return columns;
    }
}
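To illustrate the point about IEnumerable<string> being the extension target, here is a small sketch that runs the same kind of query over an in-memory array, with no TextReader involved. The AsDelimited method is repeated so the sample compiles on its own; the class names DelimitedSketch and InMemoryDemo and the semicolon-separated sample data are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DelimitedSketch
{
    // Same shape as AsDelimited above, repeated so the sample is self-contained.
    public static IEnumerable<string[]> AsDelimited(this IEnumerable<string> strings, params char[] separators)
    {
        foreach (var line in strings)
        {
            yield return line.Split(separators);
        }
    }
}

public static class InMemoryDemo
{
    public static void Main()
    {
        // Any IEnumerable<string> works as a source, here a plain array.
        var lines = new[] { "1;top", "2;left", "3;bottom" };

        var descriptions = from columns in lines.AsDelimited(';')
                           where columns[0] != "2"
                           select columns[1];

        foreach (var description in descriptions)
        {
            Console.WriteLine(description);
        }
    }
}
```

The same call should work on a List<string>, on the result of another LINQ query, or on anything else that produces strings.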
So with this delimited test data:
id,description,x1,y1,x2,y2
1,top,10,10,10,100
2,left,10,100,100,100
3,bottom,100,100,10,100
4,right,10,100,10,10
we can use the AsDelimited extension method, and parse that file like this:
using (var reader = new StreamReader("testdata-delimiters.txt"))
{
    var query = from columns in reader.AsEnumerable().AsDelimited(',')
                select new
                {
                    SegmentId = columns[0],
                    Description = columns[1],
                    x1 = columns[2],
                    y1 = columns[3],
                    x2 = columns[4],
                    y2 = columns[5],
                };

    // Skip(1) drops the header row.
    foreach (var lineSegment in query.Skip(1))
    {
        Console.WriteLine(lineSegment);
    }
}
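The query above keeps every column as a string. If numeric coordinates are wanted, the projection can convert them. The sketch below is one way to do that, under a few assumptions: ToInt is a hypothetical helper that is not part of the original code, the lines are held in an array rather than read from the file, and line.Split(',') is used inline in place of AsDelimited to keep the sample short. Note that the header is skipped before the projection here, since skipping afterwards could still evaluate the projection for the header row and fail on "id".

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class TypedProjectionDemo
{
    // Hypothetical helper: culture-independent integer parsing.
    public static int ToInt(string value)
    {
        return int.Parse(value, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        // Lines as they would come out of the delimited file, header included.
        var lines = new[]
        {
            "id,description,x1,y1,x2,y2",
            "1,top,10,10,10,100",
            "2,left,10,100,100,100",
        };

        // Skip the header first, so ToInt never sees "id" or "x1".
        var query = from columns in lines.Skip(1).Select(line => line.Split(','))
                    select new
                    {
                        SegmentId = ToInt(columns[0]),
                        Description = columns[1],
                        X1 = ToInt(columns[2]),
                        Y1 = ToInt(columns[3]),
                        X2 = ToInt(columns[4]),
                        Y2 = ToInt(columns[5]),
                    };

        foreach (var lineSegment in query)
        {
            Console.WriteLine(lineSegment);
        }
    }
}
```

int.Parse throws a FormatException on malformed input; int.TryParse would be the alternative if bad rows should be skipped instead.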
The same data in a fixed position format:
1         top       10        10        10        100       
2         left      10        100       100       100       
3         bottom    100       100       10        100       
4         right     10        100       10        10        
Here we can use the AsFixed extension method passing in the column widths. The file is parsed with this code:
using (var reader = new StreamReader("testdata-fixed.txt"))
{
    var query = from columns in reader.AsEnumerable().AsFixed(10, 10, 10, 10, 10, 10)
                select new
                {
                    SegmentId = columns[0],
                    Description = columns[1],
                    x1 = columns[2],
                    y1 = columns[3],
                    x2 = columns[4],
                    y2 = columns[5],
                };

    foreach (var lineSegment in query)
    {
        Console.WriteLine(lineSegment);
    }
}
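Two things are worth keeping in mind with the fixed position format: each column value comes back with its padding still attached, and Substring throws if a line is shorter than the sum of the widths (for instance when trailing spaces have been stripped by an editor). The following is a sketch of a more forgiving variant. The name AsFixedTrimmed is hypothetical, and trimming plus clamping is just one possible policy, not something the original code does.

```csharp
using System;

public static class FixedWidthSketch
{
    // Hypothetical variant of AsFixed: trims padding and tolerates short lines.
    public static string[] AsFixedTrimmed(this string s, params int[] widths)
    {
        var columns = new string[widths.Length];
        int startPos = 0;
        for (int i = 0; i < widths.Length; i++)
        {
            // Clamp to the line length so a short line yields shorter or
            // empty columns instead of an ArgumentOutOfRangeException.
            int start = Math.Min(startPos, s.Length);
            int length = Math.Min(widths[i], s.Length - start);
            columns[i] = s.Substring(start, length).Trim();
            startPos += widths[i];
        }
        return columns;
    }
}

public static class FixedWidthDemo
{
    public static void Main()
    {
        // The last column has lost its trailing padding, so the line is
        // only 53 characters long instead of 60.
        string line = "1".PadRight(10) + "top".PadRight(10) + "10".PadRight(10)
                    + "10".PadRight(10) + "10".PadRight(10) + "100";

        var columns = line.AsFixedTrimmed(10, 10, 10, 10, 10, 10);
        Console.WriteLine(string.Join("|", columns)); // prints 1|top|10|10|10|100
    }
}
```

Whether silently accepting short lines is desirable depends on the data; for strict formats, letting the original AsFixed throw may be the better signal.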