Visualization of burst analysis on publications

Burst analysis identifies sudden increases, or “bursts”, in how frequently certain terms or concepts are used over time: it shows when a term became unusually active, for how long, and when it faded away.
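To make the idea concrete, here is a minimal sketch of a much simplified burst detector. It is an illustration only, not the algorithm used for the analysis in this post: it flags the years in which a term's share of titles is well above that term's overall share. The function name `find_bursts`, the `ratio` and `min_count` parameters, and the toy records are all assumptions made up for this example.

```python
from collections import Counter, defaultdict


def find_bursts(records, ratio=2.0, min_count=2):
    """Return {term: [(start_year, end_year), ...]} for a simple burst heuristic.

    records is an iterable of (year, [terms]) pairs, one pair per title.
    A year counts as "hot" for a term when the term appears in at least
    min_count titles that year and its share of that year's titles is at
    least ratio times its share over the whole period.
    """
    docs_per_year = Counter()
    term_year = defaultdict(Counter)
    for year, terms in records:
        docs_per_year[year] += 1
        for term in set(terms):  # count each term once per title
            term_year[term][year] += 1

    total_docs = sum(docs_per_year.values())
    years = sorted(docs_per_year)
    bursts = {}
    for term, per_year in term_year.items():
        baseline = sum(per_year.values()) / total_docs
        spans, start, prev = [], None, None
        for year in years:
            count = per_year[year]
            rate = count / docs_per_year[year]
            hot = count >= min_count and rate >= ratio * baseline
            if hot and start is None:
                start = year  # a burst begins
            elif not hot and start is not None:
                spans.append((start, prev))  # the burst ended last year
                start = None
            prev = year
        if start is not None:
            spans.append((start, prev))  # burst still active at the end
        if spans:
            bursts[term] = spans
    return bursts


# Toy records, invented for illustration (not taken from the MEDLINE data).
records = [
    (2000, ["pleura", "tumor"]), (2000, ["pleura"]), (2000, ["tumor"]),
    (2001, ["pleura"]), (2001, ["pleura", "tumor"]), (2001, ["pleura"]),
    (2002, ["tumor"]), (2002, ["asbestos"]), (2002, ["tumor"]),
    (2003, ["asbestos", "tumor"]), (2003, ["asbestos"]), (2003, ["asbestos"]),
]
bursts = find_bursts(records, ratio=1.5)
print(bursts)
```

In the toy data, "tumor" appears steadily throughout and is never flagged, while "pleura" and "asbestos" are concentrated in a few years and are reported as bursts. A production tool would typically use a more robust model than this fixed-ratio threshold.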

This post presents a burst analysis of publications on mesothelioma from the 1930s to 2007. The data were taken from the MEDLINE dataset at http://sdb.cns.iu.edu. The dataset is a .csv file in which each row describes one publication, including details such as the article title, year of publication, author name and publication mode.

The analysis detects the “bursty” terms used in the titles of papers on mesothelioma. The contents of the article_title field were normalized: the text was lowercased, tokenized and stemmed, and stop words were removed. Burst analysis was then carried out with respect to the year of publication, which indicates when the events or topics were in use.
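The normalization step can be sketched roughly as follows. This is only an approximation of what a real text-processing tool does: the stop-word list is a small hand-picked set, and `crude_stem` is a deliberately naive suffix stripper standing in for a proper stemmer such as Porter's. The function names and the example title are hypothetical.

```python
import re

# Small hand-picked stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

# Suffixes removed by the naive stemmer, checked in this order.
SUFFIXES = ("ing", "es", "ed", "s")


def crude_stem(word):
    """Strip one common suffix, keeping at least three leading letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize_title(title):
    """Lowercase, tokenize, drop stop words, then stem each remaining token."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return [crude_stem(token) for token in tokens if token not in STOP_WORDS]


# Hypothetical title, used only to show the shape of the output.
example = normalize_title("Malignant Tumors of the Pleura")
print(example)
```

Each title becomes a short list of normalized terms, which can then be paired with the year of publication and passed to the burst detection step.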

The results of the burst analysis were visualized as a temporal bar graph, shown below:

[Figure: temporal bar graph of the burst analysis results]

We see that the term “pleura” was the most frequently used term in the context of mesothelioma during the period 1930 to 1966. Several other terms were actively used in various time frames before fading away.
