Arjan's blog

More parsing textfiles with LINQ

Originally published on blog.einbu.no April 2, 2009

In a previous article, I described how to use LINQ when parsing a text file.

Following that train of thought further, I found a more elegant way of splitting the lines from the file into columns. Creating extension methods on top of IEnumerable<string> seems like a good idea! Something that could be used like this for a comma-separated file:

from columns in reader.AsEnumerable().AsDelimited(delimiter)
select ...

Or like this for a fixed position format:

from columns in reader.AsEnumerable().AsFixed(width, width, width...)
select ...

The extension methods are added to the IEnumerable<string> interface instead of directly to the TextReader class. This makes them usable with any sequence of lines, not only lines read from a TextReader. Here's a sample implementation:

using System.Collections.Generic;

public static class FileParsingExtentions
{
    // Splits each line into columns at any of the given separator characters.
    public static IEnumerable<string[]> AsDelimited(this IEnumerable<string> strings, params char[] separators)
    {
        foreach (var line in strings)
        {
            yield return line.Split(separators);
        }
    }

    // Splits each line into columns of the given fixed widths.
    public static IEnumerable<string[]> AsFixed(this IEnumerable<string> strings, params int[] widths)
    {
        foreach (var s in strings)
        {
            yield return s.AsFixed(widths);
        }
    }

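    // Splits a single line into columns of the given fixed widths.
    public static string[] AsFixed(this string s, params int[] widths)
    {
        var columns = new string[widths.Length];
        int startPos = 0;
        for (int i = 0; i < widths.Length; i++)
        {
            // Clamp to the end of the line so that a short line (for example one
            // whose last column has no trailing padding) yields a shorter or empty
            // column instead of throwing ArgumentOutOfRangeException.
            int start = System.Math.Min(startPos, s.Length);
            int length = System.Math.Min(widths[i], s.Length - start);
            columns[i] = s.Substring(start, length);
            startPos += widths[i];
        }

        return columns;
    }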
}
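
The snippets in this post call reader.AsEnumerable(), which is not defined here; presumably it comes from the previous article. As a minimal sketch, assuming it does nothing more than yield the reader's lines one at a time, it could look like this (the class name TextReaderExtensions is made up for illustration):

```csharp
using System.Collections.Generic;
using System.IO;

public static class TextReaderExtensions
{
    // Assumed helper, not taken from this post: lazily yields each line
    // until ReadLine signals the end of the input by returning null.
    public static IEnumerable<string> AsEnumerable(this TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}
```

Because the lines are yielded lazily, the query has to be enumerated while the reader is still open, which is why the foreach loops in the examples sit inside the using block.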

So with this delimited test data:

id,description,x1,y1,x2,y2
1,top,10,10,10,100
2,left,10,100,100,100
3,bottom,100,100,10,100
4,right,10,100,10,10

we can use the AsDelimited extension method and parse the file like this (Skip(1) drops the header row):

using (var reader = new StreamReader("testdata-delimiters.txt"))
{
    var query = from columns in reader.AsEnumerable().AsDelimited(',')
                select new
                {
                    SegmentId = columns[0],
                    Description = columns[1],
                    x1 = columns[2],
                    y1 = columns[3],
                    x2 = columns[4],
                    y2 = columns[5],
                };

    foreach (var lineSegment in query.Skip(1))
    {
        Console.WriteLine(lineSegment);
    }
}
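
Since the methods extend IEnumerable<string>, the lines do not have to come from a reader at all. The following is only a sketch of that idea: InMemoryExample and Describe are hypothetical names, and AsDelimited is repeated so the block compiles on its own.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DelimitedSketch
{
    // Same shape as the AsDelimited method above, repeated for self-containment.
    public static IEnumerable<string[]> AsDelimited(this IEnumerable<string> strings, params char[] separators)
    {
        foreach (var line in strings)
        {
            yield return line.Split(separators);
        }
    }
}

public static class InMemoryExample
{
    // Hypothetical helper: accepts any sequence of lines (an array, a list,
    // the result of another query) and returns "id: description" strings.
    public static List<string> Describe(IEnumerable<string> lines)
    {
        var query = from columns in lines.AsDelimited(',').Skip(1) // skip the header row
                    select columns[0] + ": " + columns[1];
        return query.ToList();
    }
}
```

Calling InMemoryExample.Describe(new[] { "id,description", "1,top" }) gives a list holding the single string "1: top". On .NET 4 and later, the result of File.ReadLines could be passed in the same way.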

The same data in a fixed position format:

1         top     10        10        10        100
2         left    10        100       100       100
3         bottom  100       100       10        100
4         right   10        100       10        10

Here we can use the AsFixed extension method, passing in the width of each column. The file is parsed with this code:

using (var reader = new StreamReader("testdata-fixed.txt"))
{
    // The widths follow the sample data above, where the description column is 8 characters wide.
    var query = from columns in reader.AsEnumerable().AsFixed(10, 8, 10, 10, 10, 10)
                select new
                {
                    SegmentId = columns[0],
                    Description = columns[1],
                    x1 = columns[2],
                    y1 = columns[3],
                    x2 = columns[4],
                    y2 = columns[5],
                };

    foreach (var lineSegment in query)
    {
        Console.WriteLine(lineSegment);
    }
}
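
Fixed-width columns come back with their padding intact, and every value is still a string. If typed values are wanted, one option is a small conversion helper. This is only a sketch, and ColumnParsingSketch and ToInt are hypothetical names, not part of the code above:

```csharp
using System.Globalization;

public static class ColumnParsingSketch
{
    // Hypothetical helper: removes surrounding padding from a column and
    // parses it as an integer using the invariant culture.
    public static int ToInt(this string column)
    {
        return int.Parse(column.Trim(), CultureInfo.InvariantCulture);
    }
}
```

In the query this would be used as x1 = columns[2].ToInt(), with Description = columns[1].Trim() for the text column.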