This is a follow-up to this post. (As I'm reviewing this post, I ran across this article. Interesting stuff).
One of my projects requires that several files are imported into SQL Server. The files are all fixed-width. Most of the files are under a megabyte, but several are between 10 and 30 MB, three of the files are between 100 and 130 MB and one of the files weighs in at a hefty 950 MB.
The small files are easy. I wrote a C# app that uses regex's to split the data. I then create the SQL to do the inserts and add the sql to a stringbuilder object. Once all the small files have been processed, I write the contents of the stringbuilder to a file and then exec osql. It's not the optimal solution, more of a hackish proof-of-concept than anything, but it works.
The big files are what I'm working on now. I figured I'd tackle the biggest first.
My first pass was to create a DTS package to suck the data in. This was a tedious task because I had to delimit all the fields so it would know where to parse the data. Once that was done, I had to make a couple of minor changes to my table, but overall, it worked well, although on my test hardware, it's taking about 45 minutes.
I'm currently in the process of writing a BCP format file to see if I get any gains. Writing a format file sucks. It's more tedious than the DTS package. :-) Anyway...as I'm working on this stuff, I can't help but think there's gotta be an easier way to deal with these situations. I have some ideas floating around...ideas that are probably worthy of their own posts.