Search This Blog

Friday, June 29, 2012

Broken by design -> AT&T wireless billing export -- not-csv



Note: I invite your comments!  If you were able to get this to work for you, please leave me a note.  Likewise, if you were NOT able to make it fly, leave a comment with as much detail as possible, and subscribe, I'll try to address all problem reports in a timely fashion. If you concur with my opinion regarding this abominable data format, by all means, leave a comment, if you would be so kind.

TIA



Updated: 2013-02-04

What is "not-csv" format?


The AT&T Wireless website offers a feature that allows users to export their billing/usage details to either Excel or "comma separated variable" (CSV) format.  Sadly, what they call CSV is practically the antithesis of CSV, it is thoroughly useless for importing into a database, ergo the term was coined: "not-csv".  It is an abomination, a travesty, and those responsible for it should be horribly and deeply ashamed.

Update: the stupid just gets stupider!

Since publishing this script, not-csv has evolved again -- as I knew it would.
  • In between the column names header line and the data lines for the text/data sections, they added [space][tab][new line][space][space][newline] (0x20 0x09 0x0A 0x20 0x20 0x0A).
  • They also added a trailing space on the text/data column names line itself -- completely necessary, I'm sure!
Those two gems broke my script, it would only import the first phone line's voice call data, then it bailed.  So once again, superflous padding was added, toward what end I can't even guess.

I also noticed the following -- this may not be brand new, I'm uncertain.  It didn't impact my script, but it's worth mention:
  • The text/data sections now have two sub-sections (but without a whole new column names header) one for picture/video messages and one for data.
  • Sadly the "item" field reverts to "1" at the start of each sub-section (which potentially violates my table's primary key of [item]+[date]+[phone line]).
  • But even if it didn't, in keeping with their legacy of shit, the KB values are formatted to separate thousands with a comma -- yeah!  A field in a CSV row that contains embedded commas but is not enclosed in quotes.  Nice work, guys, absolutely brilliant!
So to restate: my script does NOT support importing the second and third sub-sections of the text/data section, which contain records of picture/video messages and data usage, respectively.   


Can "not-csv" be converted to something useful?

I have written a script (in javascript) that will import the AT&T "not-csv" data format into SQL Server, convert it to actually usable CSV, or both.  That script is available here, see the link below.

Download the Script


Why import wireless billing details into a database?

If you aren't familiar with database technology and how to use it, the answer to that question is well beyond the scope of a blog post.  If you are, your wireless phone billing details contain a wealth of information about your life, work and family. 

Personally I have used this data to optimize my wireless account settings, identify usage trends, recover contact phone numbers after losing a phone, bill clients for phone consultation/support time, and keep loose tabs on entries into and exits from my daughter's circle of friends while she was in high school. 

Without importing into a database there's no way to analyze more than one billing period at a time, and if you have multiple lines on a family plan it further limits you to analyzing only one line at a time (without significantly editing the exported data as provided .)  Once it's in a database table, your analysis possibilities are limited only by your skill and imagination, you can go macro to micro to everything in between.


Downloading Billing Details:

At time of this writing the AT&T wireless billing details export page can be reached at this link (you must already be logged in for it to work.)  From this page, select the desired billing period, select the "CSV" option button and click the download button.
Note that this link has changed several times over nearly as many years; without a doubt it will change again, at which point the link will be broken.  (The same can be said of my script as well, I have had to refactor it three times since 2005.)


Installing and using the script:

Installation for import to SQL Server: 
  1. Download http://www.thinkset.com/ImportATTBilling.zip and extract the files to a directory.
  2. In SSMS create or choose a database, select it as the default/current database, and execute DDL.SQL on SQL Server.
  3. Decide whether you wish to use trusted security or SQL auth, examine the files trusted_wrapper.cmd and sql_auth_wrapper.cmd, and edit one to suit your environment. (Edit the connection string parameter to reflect your server and database.)
  4. Rename the .cmd file you edited in step 3 to ImportATT.CMD. (This step is optional, mostly for purposes of usage instructions.)
Installation for conversion to usable CSV files:  
  1. Download http://www.thinkset.com/ImportATTBilling.zip and extract the files to a directory.
  2. One option is to simply execute the script directly from a command prompt, passing it the /file: and /outputdir: arguments
  3. For example, assuming you saved your not-csv download to a file named 12345678.csv in the same directory as the ImportATTBilling.js script, and you wanted the output files to be created in the same directory as well, the command would be:  ImportATTBilling.js /file:12345678.csv /outputdir:.
  4. Another option is to decide which directory you want the output to be written, and then edit the tofile_wrapper.cmd file to pass the /outputdir: argument.
  5. Rename tofile_wrapper.cmd to ImportATT.CMD. (This step is optional, mostly for purposes of usage instructions.)

Usage: 
  1. Logon to the AT&T web site and navigate to the billing details download page.
  2. Select the desired month, click the CSV option button, then click the Download button.
  3. For ease, save the download file to the same directory as the script, using a descriptive file name (e.g., 2012May.csv.)
  4. Execute ImportATT.CMD from a command prompt, passing it the name of the downloaded file as the first argument, enclosed in quotes if it contains embedded spaces (may be relative or fully qualified.)
Tip:
  • If you need to import or convert more than one month's worth of downloaded not-csv, open a command window and execute the following, in the directory that contains all the not-csv files: copy *.csv att.csv (Assumes that all .csv files in that directory were downloaded from AT&T and that none of them are named att.csv.)  Then you only need to run the script once on att.csv, as that command combines all files that match the pattern *.csv into one file. 
 



Script's supported arguments:       


     /file:[input file name] (required)
     /connection:[database connection string] (optional)
     /outputdir:[output file directory] (optional)

     /nobanner (optional, omits banner output)
     /? (optional, outputs a summary of accepted arguments)

     /f:, /c:, /o: and /n are synonyms for
     /file:, /connection:, /outputdir: and /nobanner, respectively.

[input file name]: The full or relative name of a file downloaded from the AT&T wireless site. This is the only required argument.

[database connection string]: (It's complicated, Google it.) This argument is optional, if it is omitted, no database output is generated.

[output file directory]: The directory in which you want converted output files to be written. (It must already exist.) Output files are named based on the [input file name], by prefixing with "voicedetails_" and "datadetails_". If files with those names exist, they will be overwritten without warning. This argument is optional, if it is omitted, no file output will be generated.

If neither /connection: nor /outputdir: are passed, output is written to console. If invalid values for either are passed, the script will exit without generating any output.


Schema Note: you may notice two pairs of fields in [ATTTextMsgImport] with very similar names:
  • [Msg/KB/Min] and [Msg/KB]
  • [In/ Out] and [In/Out]
These field names were changed by AT&T. If you need to, you can derive a single column to combine a pair,  e.g., COALESCE([In/ Out], [In/Out]) As CInOut

Disclaimer:
The Programmer asserts that this code has been tested on modern hardware, and is free of malintended constructs, however, you, the end user, assume all responsibility for verifying that it's safe to use in your environment.

If you lack the technical ability to do so, DO NOT RUN THIS CODE!  You alone are responsible for any damages and/or loss that may be incurred through use or misuse of this code.

Under absolutely no circumstances will the Programmer be held liable for damages of any sort, whether consequential, incidental, transcendental -- NONE, AS IN NOT ANY, WITHOUT REGARD TO NATURE OR CAUSE.

Your use of this code constitutes implicit acceptance of the terms above.

1 comment: