I’ve always been a big proponent of the philosophy - use the best tools that money can buy. But sometimes those tools can be very hard to run in a serverless cloud environment or the licensing terms can be so restrictive that it’s hard to know which hoops to jump through to make everything legit.
One of the most common tasks I do in designing data-driven, cloud systems is importing data; especially from Microsoft Excel format. I had recently been using a paid tool because of the many different options it provided and the ease in quickly getting up to speed. However, recently, I had the need to find an Excel library that could easily be installed from Nuget and run in Azure without having access to install an .msi on the server or configure the server in anyway for licensing.
I landed on NPOI. It’s a port of the Java POI project (http://poi.apache.org) to the .net platform and it works really well. Like most open source projects though, it can be hard to track down the exact documentation or examples you need that are relevant to the code you need to write.
The task at hand for me was that I needed to import and transform several different Excel formats into an existing SQL user table. The formats were not guaranteed to always be the same and I needed a way to quickly add new formats without writing a bunch of new code each time.
Thanks to NPOI and a super handy nuget package called Npoi.Mapper from Donny Tian, I was able to complete this in about 75 lines of code.
And the Excel file I am working with looks like this:
Most of the magic happens by taking advantage of Npoi.Mapper. My MurphyExcelFormat class has some special attributes. You can see that I can map a property on my .net class to a column in the Excel file by either giving the name or the index of the column. Since the Username column in the Excel file actually has kind of a weird header “Without’@’”, I chose to map to that column by index instead of name. This is not the only thing that Npoi.Mapper does but it’s easily the most beneficial!
To use the mapper, you can see that all I have to do is (on line 27) create a new Mapper and tell it to use my custom MurphyExcelFormat class. That’s it! From that point on I can reference each row as a native .net class in my code. Very cool!
The other important thing to note, that I don’t see pointed out in NPOI examples floating around on the web, is that I use the WorkbookFactory static class to get an instance of IWorkbook. This is important because I might not always know if I’ll be working with .xls or .xlsx files. But if I use the factory method, it will take care of it for me so that I don’t have to know ahead of time.
Now that everything is working smoothly with this particular Excel file from this particular customer, all I need to do now is create my own class factory that maps different Excel layouts to some yet-to-be-created base class that I can work with in my code. That class will probably end up looking a lot like my MurphyExcelFormat class but I think from here, you can get the picture pretty quickly and how powerful this NPOI + Mapper combination is!