Thursday, October 4, 2018

Modest Request for Help

I have been battling my fading brain to write a program to convert a CSV file into something Microsoft Access can import.  With enough time, I can do it myself.  But I have final edits due for a Southern Illinois University Law Journal article in the next couple of days, and another to write for a symposium in January.  I also need to start preparing first semester American History for spring.

The CSV has a number of columns with a header row.  The first column are MM/DD/YYYY dates.  In some cases YYYY only or MM/YYYY.  These need to turn into three columns, month, date, year.  Where missing a month or date, substitute a zero. The second column is a year number only and can be discarded.  A later column is: "if firearm, type."  This field may contain one or more comma separated types: shotgun, musket, pistol, rifle.  These two columns need to become "firearm (unidentified)" "shotgun" (which includes musket), "rifle" and "pistol".  Everything else remains the same.  All other fields are either text or numbers (usually indicating that a particular class of weapon was used in this mass murder).

If you can help, I think you can see why this data is so important.

4 comments:

Tim Henderson said...

I can help. Do you just need the file converted, or do you need the program (or script) itself so you can use it on other data sets? I can do either. My email is thenders1966@gmail.com. (I've helped out before on a couple of your other projects.)

Paul Sand said...

Clayton, would a Perl solution work for you? (I think you've mentioned having Linux around.)

If so, could you clarify what you mean by "these two columns"? The preceding sentences only refer to a single column.

If you would prefer to take this off-blog, please feel free to e-mail: sand.paul@gmail.com

Rusty Miller said...

Clayton,

You might want to give DataWrangler a try. It is a tool developed at Stanford University, and is intended to help with data cleaning and transformation. The developers have moved on to a newer, commercial version, but their free tool may be of use to you.

http://vis.stanford.edu/wrangler/

Cheers,
Rusty Miller

coderpunk said...

Is the output format for Access also CSV? If so this shouldn't be too hard to do.

Can you post examples of the input and of some of your desired output (preferably an example of each of the possible input variations).

This should be fairly straightforward to write in Python, which has a csv module for importing and exporting.

You say one of the fields is a list of comma separated types. How are these expressed in the original csv, are they quoted so that they end up in one csv 'field'?