William Blum's website

Plotting thesis word count

I have been playing around with PowerShell lately. Here is the little exercise that I set myself: measure and plot the evolution of the number of words in a TeX document.

The script takes a list of folders as parameters. For every TeX file found it calls texcount.pl to counts the number of words in the document. The total number of words is then saved to a CSV file and two graphs are plotted: one using GNU plot and the other one using the Google Visualization API.

I did not have this script at the time of my studies, but since I have been using Subversion to backup and manage revisions of my writings throughout my studies I was able to retrieve past revisions of my work and generate a few word count measurements. The graph below, generated using the PowerShell script, shows the evolution of word count in my D.Phil thesis.

The graph reveals few facts to the reader. For instance that I was not very productive during the first year... But it also shows that I started writing up early on (thanks to my supervisor's advice). Some doctoral students prefer to postpone writing up till the last few months. I am glad I did not adopt this strategy; it made the whole writing process very smooth and eliminate a lot of stress towards the end.

The graph also shows two particularly productive periods corresponding to two important milestones in the D.Phil process at Oxford: the transfer and the confirmation. The thesis reached a plateau on October 5th 2008, the date when I submitted the thesis to the examiners. There were subsequent minor modifications requested by the examiners after the viva, as well as some cosmetic changes for the camera-ready version sent to the Bodleian.

Here is the script in case you are interested: twc.ps1. I suggest you to use it to monitor your progress on your writings by setting it as a daily job under the Windows Task Scheduler.

The usage syntax is

./twc.ps1 [-tag name] [-date date] [-output outputdir] inputdirs

For example the command

./twc.ps1 -tag first_version -date '16 Nov 2010' -output c:\report c:\thesis

creates a datapoint named 'first_version' at the specified date whose value is the total number of words in TeX documents under directory c:\thesis; the CSV file and graphs are stored under c:\report.

For more examples just run the command without argument.