Category Archives: POI Framework

Apache POI – Formatting of ‘General’ data extracted from Excel

I just fixed an issue that has to do with pulling data out of excel using the Apache POI framework into Java and having the data’s format change along the way. It is not a particularly difficult issue, but when researching it I found that there were a couple of other posts on it that didn’t really solve it, or solved it in a round-about way. So my solution is here:

The issue I ran into was that there were cells in Excel that were meant to be strings but had the default Excel cell format of “General.” If a normal string was in one of the cells in this column, then it would be correctly read into the Java code as a string. However, if a value was like ‘123123123,’ it would be read in as an HSSFCell.CELL_TYPE_NUMERIC and would convert to a string in scientific notation. These needed to be read in exactly as there were seen in the spreadsheet.

Other solutions I saw included additional reformatting once the data was inside Java, but that seemed too dirty to me. I would have to make assumptions on what the user put into the spreadsheet. There should be a way to pull the data from the Excel file and receive a string that looks exactly like what is shown in Excel.

A quick solution to this would be to change the formatting of the cell in Excel to ‘Text’ so that the fields were always designated correctly. But, in my case, I cannot always control what is in the Excel file. A user might change the formatting to whatever they want, but my system expects a String. So this was not a solution.

What I can control is the java code, of course. So after some digging I found two keys that would help me solve this issue.  If I am trying to convert a HSSFCell.CELL_TYPE_NUMERIC value to a string, I need to use these two things:

  1. I can get the format of the cell that is being used by Excel with cell.getCellStyle( ).getDataFormat( ). This returns a short.  The Excel format for “General” is 0 and the format for Scientific Notation is 11 (I found this through debugging, but a table exists at least here in the ‘Value’ column).  So I know if the user really wanted scientific notation, or if the value was just formatted like that when passed to me.
  2. Once I know the user did not mean for the value to be in scientific notation, I can check the string for the existence of an “E,” and if it exists I will pull the string from Excel using Excel’s formatting instead of my own.  I do this with value = new HSSFDataFormatter().formatCellValue( cell ).  This converts the cell to a string using the Excel cell format pattern instead of trying to reformat the numeric value in the HSSFCell in Java.
I could simply use this formatCellValue all of the time instead of even checking I suppose, but i’m doing some other custom formatting as well so this is treated as a special case.
So this is the solution I ended up using.  The key is to check the format that is set up in Excel, and then use it directly if it produces a cleaner result in the Java code.  Hope that helps someone.

I’m not maintaining this blog here so commenting is closed… Please see this same post on my new site to ask questions: http://mpickell.com/blog/2011/06/03/apache-poi-formatting-of-data-extracted-from-excel/

Advertisements