You are all probably quite familiar with Microsoft’s Office software and have been using it for years. As many do, you likely get frustrated with some of the formatting assumptions office products make with documents and data sets. Today I am going to walk through my method for dealing with CSV files and Excel’s formatting blunders.
When dealing with large data sets in a cross platform office environment, it can be quite frustrating because Excel files are not compatible with the majority of database software. This is where CSV or Comma Separated Value file types help bridge the gap. A CSV is quite literally numbers and text stored in plain text and if a cell has a comma contained within it the CSV wraps that value in double quotes, like this: Option 1,”10,258″,Red.
For this post I created a short employee list with a few columns of data:
Using Excels built in cell formatting tool, I have instructed Excel to display the data in Column C “Department ID” as a 4 digit number, keeping the preceding zeros. This prevents Excels default action of dropping these zeros.
When dealing with large data sets for various organizations, it is surprising how many times cells begin with zeros. From large corporations department ID’s to SKU numbers for eCommerce or internal job codes.
Whatever it may be, Excel can handle the display if you tell it about this custom format.So, that all seems pretty straightforward, right? Right.
We created this data set in Excel and want to import this information to a 3rd party database application, like a time clock. This application doesn’t read Excel files, because they include XML language. We won’t dive into what XML is here, just know it’s how Excel stores column formatting, colors, font types, and other visual related items to a spreadsheet. Enter CSV, a common data exchange format that has been around since computers used punch cards. It has been and still is used due to its human and machine readability. So you export your fancy excel table to a CSV and import it into the third party software and away you go! Easy as pie.
But what’s this?
Saving as a CSV in Excel throws up a scary message about features of your workbook and you start to second guess if this CSV format is right for you.
Alas! Don’t fear, this is just Excel’s way of telling you that saving your dataset as a CSV will stripe any column formatting, font styles, colors, etc. and this is what we want! We want just the data, nothing else. We only want the data because the majority of databases have custom user interfaces which display the data already formatted for the end user.
The biggest issue I see is when editing a CSV.
When possible, always, always go back and edit the original Excel file, then export as CSV and import into your third party database.
Of course this isn’t possible in every situation.
Lets say you lost your original Excel file, gnomes ran off with it, or maybe your computer crashed. Another possibility is you received a CSV export from a third party database and you need to format and update it for importation into another program. Either way, this is where the real fun starts.
You will quickly see when opening your CSV in Excel, it makes some assumptions about numbers and dates, which are usually wrong.
In this example, Excel has taken the liberty of dropping all those pesky leading zeros.
Now you have two options,
1. Save your data as an Excel file and start manually formatting all the columns again.
- This option is okay if you have a small data set, without hundreds of columns and thousands of rows.
2. Use the Text Import Wizard, sorry no pointy hat or potions with this wizard.
- This option is really for optimizing large data sets.
The Text Import Wizard
Rename file from .csv to .txt
Using open command in Excel, Show “All Files”, open .txt file
Set Import Options for Delimited data with header rows (if your data has headers)
Select Comma as Delimiter
Format Column Data – This is the most important step, this tells Excel to read the column as plain text and not to convert it to a number. Converting it to a number with strip the leading zeros.
Click Finish and Excel will import the txt files data as a spreadsheet, of course missing all the XML styling elements but all the data you need is now there and correctly formatted.
You will notice the little cell error arrow, this is Excel looking at the data and alerting you that it see’s a number which you have told to read as plain text. Inherently Excel looks for numbers because it needs numbers for formulas, graphs, and arithmetic.
As long as you don’t let Excel convert any columns marked as plain text back to numbers, you won’t lose the plain text digits you worked so hard to format correctly.
Although, when adding rows to this column, you will need to format the row with your custom format option (0000) as we did above.
Now that you have imported data from a CSV file and saved your formatting, you can save as an Excel file, make your edits, then save as a CSV for transferring to your database software.