Jeff Keen

Dev Stuff

07 Aug 2019

How to Correct 32,000 Incorrect CSV Files in Fewer Than 32,000 Steps

Or how impossible problems can become possible when given no other choice.

CSV feels like the simplest of file formats. It might seem there’s not much to know once you’ve mentally expanded the acronym: Comma Separated Values. But tell me this: if it’s so simple, why are there so many CSV parsing libraries, alternative CSV parsing libraries, and CSV parsing libraries that claim to be better or smarter, plus a mountain of mangled CSVs in existence?

CSV isn’t so much a file format as it is a loose set of guidelines for converting tabular data into text. The closest thing to a spec for it is this, which deals with vital and often overlooked questions such as:

“What happens if a value has a comma in it?” - oh, you quote it
“What happens if a value has a quote in it?” - oh, you put another quote before it
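Ruby’s standard csv library follows both of those rules, which makes for a quick sanity check. This is only a sketch, and the sample row is made up for illustration.

```ruby
require "csv"

# A made-up row: one value contains a comma, another contains quotes.
row = ["Keen, Jeff", 'said "hi"', "plain"]

# Values with commas or quotes get wrapped in quotes,
# and embedded quotes are doubled.
line = CSV.generate_line(row)
puts line
# "Keen, Jeff","said ""hi""",plain

# Parsing the generated line gives back the original values.
round_trip = CSV.parse_line(line)
puts round_trip == row
# true
```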

One question the spec definitely does not cover is the one I needed answered: “What do you do with 32,000 files that claim to be valid CSVs, when an unknown number of their 750,000-some lines have extra unquoted commas hidden in the values, basically making the data untrustworthy?” This is not a simple problem, but it is an interesting one.
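To get a feel for why that’s hard, here is a minimal sketch of just the detection half: flag any line with more commas than the header. The helper name and the sample rows are made up for illustration, and this is not how the Comma Splice gem works. It assumes the header line is clean and that no values are quoted, and it can only say a line is suspect, not which comma is the extra one.

```ruby
# Hypothetical helper: returns the 1-based numbers of lines that
# contain more commas than the header line does.
def suspicious_lines(csv_text)
  lines = csv_text.lines.map(&:chomp)
  expected = lines.first.count(",")

  lines.each.with_index(1)
       .select { |line, _number| line.count(",") > expected }
       .map { |_line, number| number }
end

# Made-up sample: the last row has an unquoted comma inside the artist name.
sample = <<~DATA
  timestamp,artist,title
  01-27-2019 12:34,Lyres,Don't Give It Up Now
  01-27-2019 12:36,Earth, Wind & Fire,September
DATA

p suspicious_lines(sample)
# [3]
```

Finding the bad lines is the easy part. Deciding whether that row means “Earth, Wind & Fire” or a title that starts with “Wind & Fire” is where it gets interesting.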


Ruby Gem active

Comma Splice 07 Aug 2019

Dev Stuff

01 Jun 2019

FCC Ruby Gem

I just updated a gem I wrote in 2011 (which the FCC actually starred and forked, lol) to use their new API, which apparently knows about caching now. It doesn’t provide all the same data as the old one did, which is kinda weird. No “signal strength”? Why? So the gem can still query the old, horrifically slow and crusty API if you want it to.

station = FCC::Station.find(:fm, "KOOP")

if station.exists? && station.licensed?
  # Basic attributes, available quickly because the FCC actually caches these in a CDN:
  station.id #=> 65320
  station.status #=> LICENSED
  station.rf_channel #=> 219
  station.license_expiration_date #=> "08/01/2021"
  station.facility_type #=> ED
  station.frequency #=> 91.7 
  station.contact #=> <struct FCC::Station::Contact>
  station.owner #=> <struct FCC::Station::Contact>
  station.community #=> <struct FCC::Station::Community city="HORNSBY", state="TX">

  # Extended attributes, takes several seconds to load initially because the FCC is running this endpoint on a 1960s era mainframe operated by trained hamsters. 
  station.station_class #=> A
  station.signal_strength #=> 3.0 kW
  station.antenna_type #=> ND
  station.effective_radiated_power #=> 3.0 kW
  station.haat_horizontal #=> 26.0
  station.haat_vertical #=> 26.0
  station.latitude #=> "30.266861111111112"
  station.longitude #=> "-97.67444444444445"
  station.file_number #=> BLED-19950103KA
end

Ruby Gem active

FCC 01 Jun 2019

Dev Stuff

09 Apr 2015

Popularity Gem

I’ve been working on features for this website lately, one of which has been figuring out silly social sharing links. I’ve had a website long enough to remember the days of building a comment system in Perl, having a hit counter, and that being good enough.

Now you need to have all sorts of crap on a page in order to make sharing as easy as possible. One option is to piece together all the different networks yourself, but that requires giving up some control of your aesthetic, which I am not a fan of.

I hereby agree to have my site polluted with social sharing buttons in exchange for increased chances of going viral

Or you use a plugin like AddThis, which makes all that easier, maintains some control of your aesthetic, and also gives the people on your site a comical number of options for sharing.


Ruby Gem stale

Popularity Gem 09 Apr 2015

Dev Stuff

30 Dec 2010

Tracking Number Gem

I had this idea for (yet another) package tracking web service, and in the process of making it I got really into tracking numbers.

So I wrote this gem, which makes it possible to detect and identify tracking numbers, and to tell whether they’re even valid.

  t = TrackingNumber.new("MYSTERY_TRACKING_NUMBER")
  t.valid? #=> false
  t.carrier #=> :unknown

  t = TrackingNumber.new("1Z879E930346834440")
  t.valid? #=> true
  t.carrier #=> :ups

It can also take a block of text and find all the valid tracking numbers within it.

  text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit,
  sed do eiusmod tempor incididunt ut labore et dolore
  magna aliqua. Ut enim ad minim veniam, 1Z879E930346834440
  nostrud exercitation ullamco laboris nisi ut aliquip ex
  ea commodo consequat. Duis aute 9611020987654312345672 dolor
  in reprehenderit in voluptate velit esse cillum dolore eu
  fugiat nulla pariatur. Excepteur sint occaecat cupidatat
  non proident, sunt in culpa qui officia deserunt mollit
  anim id est laborum."

  TrackingNumber.search(text)

  #=> [TrackingNumber, TrackingNumber]

There’s a lot more information baked into a tracking number than you’d think.
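One example is the check digit. The sketch below follows the commonly described scheme for UPS “1Z” numbers: letters map to digits, every second value is doubled, and the last digit has to bring the total up to a multiple of 10. The helper name is hypothetical and this is not necessarily how the gem does it internally.

```ruby
# Hypothetical helper, not part of the gem's API.
# Checks the last digit of a UPS "1Z" number against the 15 characters before it.
def ups_check_digit_valid?(number)
  return false unless number.match?(/\A1Z[0-9A-Z]{15}\d\z/)

  *body, check = number[2..].chars

  total = body.each_with_index.sum do |char, index|
    # Digits count as themselves; letters map to digits (A=2, B=3, ... H=9, I=0).
    value = char.match?(/\d/) ? char.to_i : (char.ord - 3) % 10
    # Every second character (odd 0-based index) is doubled.
    index.odd? ? value * 2 : value
  end

  (10 - total % 10) % 10 == check.to_i
end

puts ups_check_digit_valid?("1Z879E930346834440")
# true
puts ups_check_digit_valid?("1Z879E930346834441")
# false
```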

Ruby Gem active

Tracking Number 30 Dec 2010
