Tuesday, July 31, 2012

Using CSV (and Text) Files with D3

This tutorial is in response to a request by someone who wanted to know how to use data from CSV (Comma Separated Values) files with D3. Interestingly enough, you will also see how the data can be stored into and read from a simple text file (instead) along the way.

The CSV file format is commonly used in spreadsheets, and it's pretty well known to programmers as one of the easiest formats to read and write data with. Why? Take a look at what a CSV file's contents could be:
2345, 345, sdfgsd, 3453,
wgdg, 34, fsg, dgd, 34534.634,

sfsfggsdfgsdfgsdfh, 4, "hello",
-2
If you look past the stupidity of the bogus "data" I used in that example, you'll see that it's fairly simple! Every value is separated by a comma - hence, Comma Separated Values (CSV).

Before I show you how to use CSV, you should know that I made an "Introduction to D3" tutorial earlier. I highly recommend that you read it first.

So... I'll start off with an example! First, take a look at the CSV file I made in Microsoft Excel (you should be able to view it in any spreadsheet program or just plain old Notepad):

http://thecodingwebsite.com/tutorials/d3/csv/data.csv

Here are its contents:
56
78
103
74
2
45
6
18
(there's a blank line here at the end)
You're probably wondering: is this even the CSV file format? The answer is yes. The "official CSV file format guidelines that are supreme and must never be challenged or broken or your life will be in danger" (aka RFC4180) state that:

Each record is located on a separate line, delimited by a line break (CRLF).

So, basically: the last piece of data in each row is not followed by a comma - the line break alone signals that a new record begins (P.S. RFC 4180 specifies two characters for every line break, "carriage return" and "line feed", although plenty of files you'll meet in practice use a line feed alone, for those of you who are real programmers and need to know this :).

Lastly, yes, for this example each value must be on its own row, not in its own column. That's just how it works here: every row becomes one piece of data for the graph. At first, I got confused because I ignored this fact. I tried making my CSV file's contents be "56, 78, 103, ...", and obviously it didn't work. The parser saw a single row that happened to contain a bunch of comma-separated numbers, so there was only one piece of data to draw.
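
To see why the one-line version fails, it can help to look at what a row-oriented parser produces. The sketch below is a deliberately simplified stand-in: "parseSimpleRows" is a made-up helper, not D3's implementation, and it assumes there are no quoted fields, so line breaks separate rows and commas separate fields within a row.

```javascript
// Simplified row parser for illustration only (hypothetical helper, not D3's code).
// Assumes no quoted fields: line breaks separate rows, commas separate fields.
function parseSimpleRows(text) {
  return text
    .split(/\r\n|\r|\n/)                                   // accept CRLF, CR, or LF
    .filter(function (line) { return line.length > 0; })  // skip blank lines
    .map(function (line) { return line.split(","); });    // one array per row
}

// One value per line: three rows, one field each.
var perLine = parseSimpleRows("56\r\n78\r\n103\r\n");
console.log(perLine.length);   // 3

// Everything on one line: a single row holding several fields.
var oneLine = parseSimpleRows("56, 78, 103");
console.log(oneLine.length);   // 1
```

Since the graph draws one bar per row, the one-line file gives it a single piece of data to work with instead of one per number.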

Okay, so now take a look at my example:

http://thecodingwebsite.com/tutorials/d3/csv/d3csv.html

which looks like this:

[Image: screenshot of the resulting bar graph]

My, that looks familiar! Yeah, that was the idea, idiot.

Now that we're past my insults, let's look at the code... Note: if you read my previous tutorial like you were supposed to, you should be able to skim past this:

<html>

<head>

<script type="text/javascript" src="d3.v2.min.js"></script>

<script type="text/javascript">

 window.onload = function()
 {
  d3.text("data.csv", function(unparsedData)
  {
   var data = d3.csv.parseRows(unparsedData);
   
   //Create the SVG graph.
   var svg = d3.select("body").append("svg").attr("width", "100%").attr("height", "100%");
   
   
   
   //Add data to the graph and call enter.
   var dataEnter = svg.selectAll("rect").data(data).enter();
   
   
   
   //The height of the graph (without text).
   var graphHeight = 450;
   
   //The width of each bar.
   var barWidth = 80;
   
   //The distance between each bar.
   var barSeparation = 10;
   
   //The maximum value of the data.
   var maxData = 105;
   
   
   
   //The actual horizontal distance from drawing one bar rectangle to drawing the next.
   var horizontalBarDistance = barWidth + barSeparation;
   
   
   //The horizontal and vertical offsets of the text that displays each bar's value.
   var textXOffset = horizontalBarDistance / 2 - 12;
   var textYOffset = 20;
   
   
   //The value to multiply each bar's value by to get its height.
   var barHeightMultiplier = graphHeight / maxData;
   
   //The actual Y position of every piece of text.
   var textYPosition = graphHeight + textYOffset;
   
   
   
   //Draw the bars.
   dataEnter.append("rect").attr("x", function(d, i)
   {
    return i * horizontalBarDistance;
   }).attr("y", function(d)
   {
    return graphHeight - d * barHeightMultiplier;
   }).attr("width", function(d)
   {
    return barWidth;
   }).attr("height", function(d)
   {
    return d * barHeightMultiplier;
   });
   
   
   
   //Draw the text.
   dataEnter.append("text").text(function(d)
   {
    return d;
   }).attr("x", function(d, i)
   {
    return i * horizontalBarDistance + textXOffset;
   }).attr("y", textYPosition);
  });
 }

</script>

</head>

<body>

</body>

</html>

I obviously didn't change much. Here's what I did change:

window.onload = function()
 {
  d3.text("data.csv", function(unparsedData)
  {
   var data = d3.csv.parseRows(unparsedData);
   
   //Create the SVG graph.
   var svg = d3.select("body").append("svg").attr("width", "100%").attr("height", "100%");
   
   
   
   //Add data to the graph and call enter.
   var dataEnter = svg.selectAll("rect").data(data).enter();

and, of course, the closing bracket and parenthesis to finish surrounding the rest of the code:

  });

So that was pretty simple - I hardly changed a thing! All I did was wrap everything in some brackets, add 2 lines of code, and change 1 line of code.

The first thing I do is call the "text" function. This gets the contents of the "data.csv" file, and then inside the callback function the file's contents are available as the "unparsedData" variable.

Then, I call the "csv.parseRows" function, passing in the "unparsedData" read from "data.csv", and I store the resulting array into the new "data" variable.

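This "data" variable should now hold the eight values from the file. Strictly speaking, "csv.parseRows" returns one array per row with the fields left as strings, so the value is [["56"], ["78"], ["103"], ["74"], ["2"], ["45"], ["6"], ["18"]]; the graph still works because JavaScript converts each one-element row to a number when it does the math. If the file couldn't be loaded, then something went wrong, and (as far as I can tell, in the D3 v2 used here) "unparsedData" comes back as "null". If you want to be the safe, awesome coder that you're supposed to be, you'll run a conditional statement on "unparsedData" before parsing it, like so:

if (unparsedData == null)
{
    alert("THERE WAS AN ERROR! :S I could send you to a pretty error page or something, but I'm not that nice. :)");
}
else
{
    //Parse the data and place everything else in here for when it does work properly.
}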
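
One detail worth checking for yourself: as far as I know, "csv.parseRows" returns one array per row and leaves every field as a string, so the chart relies on JavaScript quietly converting those values during arithmetic. A minimal sketch of making that conversion explicit follows. The helper name "firstColumnAsNumbers" is made up for this example, and the "parsedRows" array is a stand-in for what parsing the single-column "data.csv" is assumed to give back.

```javascript
// Hypothetical helper: take parsed rows (arrays of strings) and return the
// first field of each row as a number.
function firstColumnAsNumbers(rows) {
  return rows.map(function (row) { return Number(row[0]); });
}

// Stand-in for the assumed result of parsing the single-column data.csv.
var parsedRows = [["56"], ["78"], ["103"], ["74"], ["2"], ["45"], ["6"], ["18"]];

var numbers = firstColumnAsNumbers(parsedRows);
console.log(numbers);  // the eight values as real numbers: 56, 78, 103, 74, 2, 45, 6, 18
```

In the example page this would be called as firstColumnAsNumbers(d3.csv.parseRows(unparsedData)) before the data is handed to the graph.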

A good little function you can use to test that this data is being loaded in and parsed properly is "alert", which shows a "message box"/"alert box"/"popup mini window that shows a little text/data". You can use it like this:

alert(unparsedData);

which should give you the contents of the text file, OR:

alert(data);

which should give you the array we expected.



Finally, I replaced my old inline array of [1, 2, 3] with just "data", the new variable I created from the parsed "data.csv" file... and... that's it! That's all you need to do to use CSV files with D3.
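
One thing to watch once the values come from a file: the listing hard-codes "maxData" as 105, which only suits this particular data. Here is a sketch of deriving the scale from the data instead, written in plain JavaScript so it doesn't depend on any particular D3 version. "barLayout" is a hypothetical helper whose parameters mirror the variables in the listing, and it assumes a non-empty array of positive numbers.

```javascript
// Hypothetical helper: compute x, y, width and height for each bar, scaling
// so that the largest value fills the full graph height.
// Assumes `values` is a non-empty array of positive numbers.
function barLayout(values, graphHeight, barWidth, barSeparation) {
  var maxData = Math.max.apply(null, values);  // largest value in the data
  var step = barWidth + barSeparation;         // distance from one bar to the next
  return values.map(function (d, i) {
    var h = d / maxData * graphHeight;         // bar height in pixels
    return { x: i * step, y: graphHeight - h, width: barWidth, height: h };
  });
}

var bars = barLayout([56, 78, 103, 74, 2, 45, 6, 18], 450, 80, 10);
console.log(bars[2]);  // the tallest bar: x is 180, y is 0, height is 450
```

Each object could then feed the "x", "y", "width" and "height" attributes that the listing sets on every "rect".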



One final note I will make here: your data doesn't even have to be in a CSV file, obviously! If your data is just going to be a series of numbers separated by new line characters, you might as well just make it a text (e.g. "data.txt") file if that's easier for you! (That's what I would do. ;)
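
If you go the plain text route, you don't strictly need a CSV parser at all. A rough sketch, assuming the file holds one number per line and possibly a blank line at the end ("parseNumberLines" is a made-up helper name):

```javascript
// Hypothetical helper: turn newline-separated text into an array of numbers.
function parseNumberLines(text) {
  return text
    .split(/\r\n|\r|\n/)                                   // accept CRLF, CR, or LF
    .map(function (line) { return line.trim(); })          // drop stray spaces
    .filter(function (line) { return line !== ""; })      // drop blank lines
    .map(function (line) { return Number(line); });       // strings to numbers
}

var values = parseNumberLines("56\n78\n103\n74\n2\n45\n6\n18\n");
console.log(values.length);  // 8
```

The result could be passed to the graph in place of the parsed CSV rows.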

I hope this tutorial has helped you with reading data from a CSV or text file to use with D3.

36 comments:

  1. "Each record is located on a separate line, delimited by a line break (CRLF)."

    Meaning, if my data is like this:

    2.3 30.4 20.1
    2.4 312.2 44.2
    3.3 23.2 434.2

    etc.

    Then that means the data would not be placed in the data array as:
    [2.3, 30.4, 20.1, 2.4, 312.2, 44.2, 3.3, 23.2, 434.2] ?

    It would probably give me an error because a line break is not separating the data elements? I'm sure there has to be a way around it, would it be that I would have to use jquery to sort the data and then input it in an array?

    Sorry for the long response. Btw the tutorial was very clear. Once I tackle that little issue, I'll be sure to use it. :)

    Replies
    1. Okay, I deleted all of the comments from our previous discussion here, because I made a tutorial that explains everything:

      http://thecodingtutorials.blogspot.com/2012/08/using-multi-column-data-with-d3-part-1.html

      Enjoy!

      - Andrew

  2. Hi, I'm getting this error: "XMLHttpRequest cannot load file://xx/xx/xx/data.csv. Cross origin requests are only supported for HTTP." Any idea how to solve this without a web server?

    Thanks.

    Replies
    1. For one: you're not supposed to be able to. It's one of those browser security issues/limitations.

      I knew I saw something about this earlier - read the top of this page:

      https://github.com/mbostock/d3/wiki/Requests

      Personally, I would go for the XMLHTTPRequest, although my recommendation comes only from familiarity with these terms and from reading the top of that page.

      - Andrew

    2. You can make a local server using Python. It's all explained here:

      http://chimera.labs.oreilly.com/books/1230000000345/ch04.html

      Mat

  3. Hey, and thanks for the guide!

    I've tried to import a CSV file similar to yours, but I keep getting the following error, and I'm not able to access any of the data:

    "Resource interpreted as Image but transferred with MIME type text/html: "data:text/html; charset=iso-8859-1;base64,PCFET0NUWVBFIEhUTUwgUFVCTElDICItLy9JRVRGLy9EVEQgSFRNTCAyLjAvL0VOIj4KPGh0bWw+PGhlYWQ+Cjx0aXRsZT40MDQgTm90IEZvdW5kPC90aXRsZT4KPC9oZWFkPjxib2R5Pgo8aDE+Tm90IEZvdW5kPC9oMT4KPHA+VGhlIHJlcXVlc3RlZCBVUkwgL2Zhdmljb24uaWNvIHdhcyBub3QgZm91bmQgb24gdGhpcyBzZXJ2ZXIuPC9wPgo8aHI+CjxhZGRyZXNzPkFwYWNoZS8yLjIuMjIgKFVuaXgpIERBVi8yIFNlcnZlciBhdCAxOTIuMTY4LjAuMTE3IFBvcnQgODA8L2FkZHJlc3M+CjwvYm9keT48L2h0bWw+Cg=="."

    Any idea?

    Replies
    1. Thanks!

      All I can really suggest is to try looking your problem up on Google. Somebody else is bound to have had the same problem. If you find out what it is I might be able to help you more.

      - Andrew

  4. Hi - your website is a great resource and has helped me get my head around various d3 and javascript concepts.

    I have looked through your examples using CSV and text files as well as the examples of charts on the d3 website. Your examples showcase multi-column CSV files (with header row) and also CSV with one column of data and no header row.

    I have some sensors that provide data to some software that writes out comma separated value text files without a header row. Each row has a date/time stamp and then eleven comma separated values. If I add a header row to the text file I can plot charts with d3 with very little tweaking of the example code. However, I want to just plot the data without having to edit the text files.

    Is this possible using variations of your examples and d3?

    Replies
    1. I believe that's what the "d3.csv" function does, right?

      https://github.com/mbostock/d3/wiki/CSV

      (As opposed to "d3.csv.parse".)

      - Andrew

    2. I've scoured the net for an answer but I don't think one exists. New to js and d3, so please bear with me. My csv file DOES NOT HAVE HEADERS and it has multiple values on each row. So, as you've suggested, I'm calling the d3.csv.parseRows function inside the d3.text function. I then assign column names to the header row. All is fine until I get to the point where I insert td elements with the actual data into the page. The code produces td elements that have column attributes of "ch1" and "ch2", yet the value attribute of the td element returns undefined. When I replace row[column] with row[0] or row[1], the value attribute of the td tag returns a value, so I know the data is there. How do you simply iterate through each data value in the data array and place it into a cell if you don't have headers in your csv file?

      function tab_data(data,columns)
      {
      var table = d3.select("#tabulatedData").append("table").attr("border","1"),
      thead = table.append("thead"),
      tbody = table.append("tbody");

      // append the header row
      thead.append("tr")
      .selectAll("th")
      .data(columns)
      .enter()
      .append("th")
      .text(function(column) { return column; });

      // create a row for each object in the data
      rows = tbody.selectAll("tr")
      .data(data)
      .enter()
      .append("tr");

      // Create a cell in each row for each column
      var cells = rows.selectAll("td")
      .data(function(row) {
      return columns.map(function(column) {

      return {
      column: column,
      value: row[column]
      };

      });
      })
      .enter()
      .append("td")
      .html(function(d) {

      return d.value;

      });
      }

      window.onload = function()
      {
      d3.text("data.csv", function(unparsedData)
      {
      data = d3.csv.parseRows(unparsedData);

      tab_data(data,["ch1","ch2"]);
      });
      }

    3. So in other words
      value: row[column] returns a value of undefined
      value: row[0] returns the first value in the row
      value: row[1] returns the second value in the row

      I don't understand why row[column] doesn't return the value in that row with column id of column.
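
      A likely explanation, offered as a sketch rather than a confirmed diagnosis: "parseRows" hands back each row as a plain array, so fields are found by position (row[0], row[1]) and row["ch1"] is undefined because an array has no property with that name. One way around it is to convert each row into an object keyed by the column names first. "rowsToObjects" is a hypothetical helper, and the sample values are invented.

```javascript
// Hypothetical helper: convert rows (arrays of fields) into objects keyed by
// the given column names, matching fields to names by position.
function rowsToObjects(rows, columns) {
  return rows.map(function (row) {
    var record = {};
    columns.forEach(function (column, i) {
      record[column] = row[i];
    });
    return record;
  });
}

// Invented sample rows standing in for the parsed, headerless file.
var sampleRows = [["1.5", "2.5"], ["3", "4"]];
var records = rowsToObjects(sampleRows, ["ch1", "ch2"]);
console.log(records[0].ch1);  // "1.5"
console.log(records[1].ch2);  // "4"
```

      With that in place, calling tab_data(rowsToObjects(data, ["ch1","ch2"]), ["ch1","ch2"]) should let row[column] find a value. Keeping the arrays and looking fields up by their index in the columns list would be another option.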

  5. Hey,

    I have looked at quite a few of your tutorials so far, and they are great help. Thanks for taking the time to do them.

    I am a novice when it comes to visualisation tools and scripting. So my question is: I currently have a .csv file containing multiple types of data, for example an Id, latitude, longitude, time/date.

    By the time I finish my project I want to have various graphs/types of graphs showing these different data types. How can I go about visualising exactly what I want to for a specific graph, say Id and time/date, on a graph while ignoring everything else in the .csv? It may seem like a silly question but I am a novice :P.

    And do you know of any sources, or places where I can read up on code, to make my graphs look appealing? I have to present my findings and plain black graphs probably won't look the greatest.

    Cheers

    Replies
    1. Thanks!

      http://thecodingtutorials.blogspot.com/2012/08/using-multi-column-data-with-d3-part-1.html

      Google. Common sense. If you're unable to generate ideas in your head, look at other graphs on the web that you like and try to replicate them.

      - Andrew

  6. Hi Andrew,

    Great tutorials, they're really helping me get a grasp of d3!

    Quick question -- I've copied your code and downloaded your data.csv file and placed it in the same folder as my html file. However, the page appears unable to load the csv file. If I place an alert(unparsedData); directly after calling the csv file with d3.text("data.csv", function(unparsedData), I get a javascript "null" alert message. Any idea what could be going wrong? I'm assuming that I should have the data.csv file in the same folder as my html file. I've also tried calling the csv file from your url, but that doesn't work either. Any thoughts? Thanks!

    Replies
    1. Thanks!

      If you've copied my code and it's not working, then I see 3 potential problems:

      1. The browser you're using sucks.
      2. Something else on the page is causing my code not to work.
      3. You didn't copy it properly (wrong place, missing character(s), or etc.).

      First of all, try downloading everything and using it entirely separately from any code you have. Make sure that MY code works in YOUR browser WITHOUT your code.

      If that works, go back to using your code again and check your browser's error list (you might have to clear the list, refresh the page, and THEN check the error list to get an accurate reading). Some errors will cause the code to immediately stop running when they are reached.

      After all that, if you're still having problems that you can't fix then tell me the details.

      - Andrew

  7. I successfully used your code to do the graph. THANK YOU. Now if the base file changes, how would you detect that change and update the graph accordingly?

    Replies
    1. You're welcome.

      You would need to use Ajax. What you could then do is have it load in the file every e.g. 30 seconds and check for differences or for an updated time/date or such. I can make a tutorial showing how to use Ajax if you're willing to compensate me for my time (http://thecodingtutorials.blogspot.com/p/aboutcontact.html).
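
      For what it's worth, the "check for differences" part can be sketched without any extra library. The helper below is hypothetical ("makeChangeDetector" is a made-up name) and simply remembers the last text it was given. The timer at the bottom is left as a comment because it needs a browser and the page's own redraw code, and a real page might also have to work around browser caching.

```javascript
// Hypothetical helper: returns a function that reports whether the text it is
// given differs from the text it saw on the previous call.
function makeChangeDetector() {
  var lastText = null;
  return function (text) {
    var changed = text !== lastText;
    lastText = text;
    return changed;
  };
}

var fileChanged = makeChangeDetector();
console.log(fileChanged("56\n78\n"));  // true  (first load)
console.log(fileChanged("56\n78\n"));  // false (unchanged)
console.log(fileChanged("56\n79\n"));  // true  (contents differ)

// In the page, this might be driven by a timer (untested sketch):
// setInterval(function () {
//   d3.text("data.csv", function (text) {
//     if (text != null && fileChanged(text)) { /* redraw the graph here */ }
//   });
// }, 30000);
```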

      - Andrew

  8. this is really awesome... exactly what i needed.. thanks a lot.. :)

  9. Hi Andrew. Thank you so much for your great tutorials. They've helped me get very, very close to achieving my first aim as an exercise: producing a Multiple Correspondence Analysis with D3. http://victoralexandre.fr/d3_acm.html . However, I have an issue with the loading of the last 2 sets of data, which won't appear when the corresponding button is clicked. I have managed to make it work as a local file on my computer (though it is not this exact same version), but when I put it on my FTP, it stopped working...
    Do you have any idea of what could come in the way ?

    Thanks again so much.
    Cheers.
    Victor

    Replies
    1. Just a quick addition to that, I've tried some other ways, and this is closer to my problem: http://www.victoralexandre.fr/d3_acm_2.html . Indeed, the "variables" and the "variables supplémentaires" do not manage to find their place in the graph, as their cx, cy, x and y are NaN.

    2. Thanks!

      I recommend going through your code step by step and making sure it does what you expect at each step. Pretend you're the computer. Use the alert function to notify you of how the data is loaded in, calculated, stored, etc.

      - Andrew

    3. Alright, but that's what I was doing for quite a while... :)
      So it seems that my problem was related to the fact that I was calling these variables without giving them any class. But even now that I am selecting only the right class for each set of data, the x, y, cx and cy are still NaN. Was there something else that I forgot ?

    4. Let's take a look at this, for instance:

      .attr("cx", function(d){return x(d.Dim1);})

      Could you instead make it:

      .attr("cx", function(d){
      alert(d.Dim1);
      return x(d.Dim1);
      })

      .attr("cx", function(d){
      alert(x);
      return x(d.Dim1);
      })

      .attr("cx", function(d){
      alert(x(d.Dim1));
      return x(d.Dim1);
      })

      and verify that each of these is the expected value?

      - Andrew

    5. Hi Andrew.
      Thanks for your help. After some tests, it seems that the only way to make this work was to change the d3.csv into a d3.tsv, and changing the file's extension... So... It's not so much clearer for me why, but at least, it worked. Thanks anyway.

  10. Hi Andrew.
    It's me again !
    I have an issue with working with data from 2 separate tsv on the same plot. I need to link the points that share the same time value together with a line. I have been searching for quite a while, but I am not sure what would be the best solution (the link on my name shows you the page with working code).

    Replies
    1. If (point1 == point2)
      {
      line.point1 = point1;
      line.point2 = point2;
      }

      ?

      - Andrew

    2. Thanks Andrew. Always very quick answers! I should have been more precise in my question. As a non-developer I sometimes have issues figuring out the right way to begin solving a problem. Here, for example, I have to create a line between two coordinates that come from separate tsv files. But I cannot create a simple d3.line because I have to declare two different datasets in the "data" method. Is there a workaround for this approach or am I taking it the wrong way?

    3. What you're wanting to do is referred to as a data append/merge/combine operation.

      These 3 seem to have something of the like figured out:

      http://stackoverflow.com/questions/10593245/merging-data-in-d3-js
      http://stackoverflow.com/questions/19320835/d3-js-how-to-combine-2-datasets-in-order-to-create-a-map-and-show-values-on-mou
      http://stackoverflow.com/questions/17817849/d3-js-how-to-join-data-from-more-sources


      - Andrew

    4. FAN TAS TIC. Thanks Andrew. I spent hours on S.O. without finding the right threads. I lack some logic but I also lack the right English/American vocabulary to be as efficient as I would like to be.
      Thanks so much for taking the time to understand my problem. :)
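
      For anyone landing here with the same question, the merge step discussed in this thread can be sketched in plain JavaScript. Everything in the example is an assumption made for illustration: the helper name "pairByKey", the "time", "x" and "y" field names, and the sample records are not taken from the page in question.

```javascript
// Hypothetical helper: pair up records from two arrays that share the same
// value for `key` (for example, the same time stamp).
function pairByKey(a, b, key) {
  var index = {};
  b.forEach(function (record) { index[record[key]] = record; });
  return a
    .filter(function (record) {
      return Object.prototype.hasOwnProperty.call(index, record[key]);
    })
    .map(function (record) {
      return { from: record, to: index[record[key]] };
    });
}

// Invented sample data standing in for the two parsed files.
var first = [{ time: "t1", x: 1, y: 2 }, { time: "t2", x: 3, y: 4 }];
var second = [{ time: "t2", x: 5, y: 6 }, { time: "t3", x: 7, y: 8 }];

var segments = pairByKey(first, second, "time");
console.log(segments.length);      // 1 (only "t2" appears in both)
console.log(segments[0].to.x);     // 5
```

      Each pair is then a single piece of data, so it could be bound to one SVG "line" element whose x1/y1 come from "from" and whose x2/y2 come from "to".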

  11. If I need to extract values from a .txt file in order to obtain a graph in D3, should I just change the .csv extension in the program or is there anything else that I need to do?
    I've been kind of stuck at this for a while and any kind of help on it would be appreciated

    Replies
    1. What matters here and in almost all other file I/O scenarios is not the file extension but the file formatting. A CSV (Comma Separated Values) file is distinct from regular text files in that its data is explicitly separated by commas (and potentially also separated by new lines as well). For instance, you can tell my sentences apart because they are in a "period separated values" format.

      - Andrew

    2. Appreciate the response Andrew!
      First off, I'd like to thank you for this D3 tutorial. Especially for a beginner like me who has absolutely no prior knowledge about D3 and similar stuff.

      I've been importing data from a .csv file itself like you have mentioned in the tutorial and it is in the exact format as yours, i.e numbers in different lines and it also has the same file name "data.csv", but different values in it. If I try running the html file on Chrome/Edge it just opens a blank window. Whereas if I run your code with pre-defined values, I am getting the exact same graph as you are.

      Any idea where I could be screwing up? As far as I know there is no other change that I have made in the code at all except that I have altered the numbers in the csv file.
      Open to any kind of suggestion!

      Regards

      -Rahul

    3. Thanks!

      Download my code (File -> Save) and literally try your CSV file using my code. If that works, interpolate between my code and yours until something breaks. Also, check your browser's error console for potential errors that may be crashing the script.

      - Andrew

  12. see https://github.com/d3/d3-dsv#csvParse
    csvParseRows = csv.parseRows;

    FYI to update code
