home | bio | blog | group | cv

Plot 2D NMR Spectra in the Browser with nmrglue and HoloViews¶

Extracting NMR data from their native format into a more common data type like a numpy array opens up a whole world of possibilities for analyzing and visualizing NMR data. This post will explore a few examples of using nmrglue to parse the two most common biomolecular NMR formats (Bruker and UCSF format) into an array that we can then plot with HoloViews, a library that provies a (mostly) unified API for calling matplotlib and Bokeh.

Plotting with HoloViews allows us to generate publication-quality spectra that are superior to those generated natively by TopSpin (Bruker's NMR software) or POKY (the actively maintained and developed version of SPARKY, the most common software for interacting with biomolecular NMR data). I provided examples of plots of the same spectrum generated with TopSpin, SPARKY, and matplotlib back when I originally started using matplotlib to generate contour plots for NMR data:

Here's a slice of the same crosspeak exported from TopSpin as a PDF (1,340 paths), SPARKY as a ps file (45 paths), and contoured+rendered as an svg from processed data with @matplotlib (12 paths, 1/contour). The effect of breaking the contours into multiple paths is obvious. pic.twitter.com/InDz9DaDub

— Ray Berkeley (@ray_berkeley) October 24, 2019

It's worth noting that recent versions of POKY do provide access to matplotlib contour plots (also via nmrglue), which is a great addition. HoloViews also lets us easily switch between Bokeh and matplotlib, so we only need to configure one visualization to get an interactive Bokeh contour plot and a nice SVG file rendered by matplotlib.

Table of Contents

  • Introduction
    • Required Libraries
    • Example Data
  • Parsing Bruker Data with nmrglue
  • Parsing UCSF Data with nmrglue
  • Plotting Spectra in the Browser
  • Plotting Spectra for Publication

Required Libraries¶

The following libraries enable this workflow:

  • HoloViews for rendering the contours as an interactive plot. HoloViews offers an expressive syntax and some additional tools that make plotting with Bokeh, Plotly, or matplotlib more intuitive. Plotly should also work on its own for contour plotting if you prefer to stick with that library on the BMRB.
  • matplotlib for generating contour coordinates.
  • nmrglue provides utilites for parsing processed NMR data into a 2D array that we can use to generate contours. Modules are provided for a whole range of NMR data formats, although some of the less common formats are less fully-featured. I've only tested the Bruker and Sparky IO modules, which are the most common formats. Parsing data only scratches the surface of what nmrglue is capable of–it offers functionality to address every part of the NMR data processing pipline.
  • NumPy for general data manipulation and organization.
In [1]:
import holoviews as hv
from holoviews import opts
hv.extension('bokeh', logo = False)
import matplotlib.pyplot as plt
import nmrglue as ng
import numpy as np

Example Data¶

The approach outlined here should work with any Bruker or SPARKY dataset. If you don't have any NMR data on hand, you can dowload the data that I'm using as an example here. It's a CP-DARR experiment run on a sample of HSPB1.

Parsing Bruker Data with nmrglue¶

Bruker organizes experiments into a hierarchy with a top-level directory containing information about the experiment itself and a subdirectory called pdata containing all of the processed data. The pdata directory can contain multiple processed versions of the same experiment. Each experiment and processed dataset is given a number by convention, but the processed data directory is always named pdata by TopSpin.

In this example, we'll parse the processed data in ./example_data/Bruker/pdata/1/ and prepare it for plotting. To do this, we'll use nmrglue's Bruker IO module. The documentation for this module can be found here. We need three pieces of data for our plot–a 2D array representing the data itself, a unit conversion dictionary which will map chemical shifts (in ppm) to our array, and a list of contour levels for drawing contours.

First, we'll get the processed 2D data. The data is stored in a file called 2rr ("2 real real"–the real components of each dimension in the data), but we only need to pass the directory to nmrglue.

In [2]:
# Declare the locations of the top-level experiment directory and the processed data 
# directory
bruker_dir = './example_data/Bruker/'
bruker_pdir = './example_data/Bruker/pdata/1/'

# Read experiment information and ignore time domain data
bruker_dic, _ = ng.fileio.bruker.read(bruker_dir)

# Read processing information and processed data
bruker_pdic, bruker_data = ng.fileio.bruker.read_pdata(bruker_pdir, scale_data = True)

# The data that we're interested in ends up in a 2D numpy array with the same shape as
# defined in the experiment.
bruker_data.shape
Out[2]:
(1024, 4096)

Next, we'll get the unit conversion dictionary using nmrglue's guess_udic method. This builds a unit conversion dictionary from experimental metadata that will let us map each point in the data array to some value in ppm, the standard axis for interpereting NMR data.

In [3]:
# Let nmrglue calculate the unit conversion dictionary
bruker_udic = ng.fileio.bruker.guess_udic(bruker_dic, bruker_data, strip_fake = False)

bruker_udic
Out[3]:
{'ndim': 2,
 0: {'sw': 40000.00000000006,
  'complex': True,
  'obs': 188.348853700883,
  'car': 21658.199116984633,
  'size': 1024,
  'label': '13C',
  'encoding': 'states-tppi',
  'time': True,
  'freq': False},
 1: {'sw': 56818.184,
  'complex': True,
  'obs': 188.445377905836,
  'car': 21671.194163985776,
  'size': 4096,
  'label': '13C',
  'encoding': 'direct',
  'time': True,
  'freq': False}}

Specifically, this would require the sweep width sw, number of points in each dimension size, the carrier frequency car, and the base frequency of the magnet obs. For the direct dimension, we can see that these values are reported as 56818.184 (Hz), 4096 (points), 188.348853700883 (MHz), and 21671.194163985776 (Hz, offset from the base frequency). We can do a quick back-of-the envelope division of car/obs to find the experimental carrier frequency of ~115 ppm.

The last thing that we'll need for plotting are the contour levels. Bruker saves contour levels in a standalone plain text file in the processing directory called clevels. We could use any contour levels that we want for plotting, but in practice I find that I usually mock up the contour levels in TopSpin, save them, and then parse the clevels file. We can parse the file like so:

In [4]:
# Get the path for the clevels file
bruker_cfile = f"{bruker_pdir}/clevels"

bruker_clevels = []

# Parse the file by pattern matching and brute force
with open(bruker_cfile, 'rt') as f:
    for line in f:
        if line[0].isdigit() or line[0] == '-':
            for entry in line.split():
                if float(entry) > 0:
                    try:
                        bruker_clevels.append(float(entry))
                    except ValueError:
                        pass 
                else:
                    continue

This leaves us will all of the data necessary to generate and plot contours. Our real data is in an array called bruker_data, our unit conversion information are in our unit conversion dictionary bruker_udic, and our contours are bruker_clevels.

Parsing UCSF Data with nmrglue¶

Unlike Bruker's approach, all of the necessary information for plotting an NMR spectrum (except for the contour levels) can be parsed a .ucsf file. Let's pull in the same data, this time in UCSF format. The methods used for parsing data in UCSF format are generally in the `sparky` module.

In [5]:
sparky_file = './example_data/UCSF/CP-DARR_example.ucsf'

# The syntax here is different, but this line reads the experiment information and the processed data
sparky_dic, sparky_data = ng.fileio.sparky.read(sparky_file)[0], ng.fileio.sparky.read(sparky_file)[1]

sparky_data.shape
Out[5]:
(1024, 4096)

The unit conversion dictionary can be generated with the guess_udic method.

In [6]:
sparky_udic = ng.fileio.sparky.guess_udic(sparky_dic, sparky_data)

For this dataset, let's just specify the contour levels by hand. These values can be copied from the SPARKY or POKY GUI. The user-defined contours are probably in the SPARKY session files, but I haven't bothered to dig them out in this case.

In [7]:
# Set contour values
contour_min = 2.5e+006         # contour level start value
contour_num = 14                # number of contour levels
contour_factor = 1.40           # scaling factor between contour levels

# Calculate contour levels
sparky_clevels = contour_min * contour_factor ** np.arange(contour_num)
sparky_clevels = np.where(sparky_clevels==0, contour_min, sparky_clevels)

This yields the same three pieces of data necessary for plotting–sparky_data, sparky_udic, and sparky_clevels.

Plotting Spectra in the Browser¶

NMR data are generally too large to plot directly in an interactive plot in the browser. For example, the dataset that we're working with has just under 4.2M (4096 x 1024) points–enough to slow your browser to a crawl. To get around this, we can pop the hood on matplotlib's contour plot functionality to get contour coordinates directly. We can then pass these coordinates to HoloViews' Path plot to effectively create a contour plot from a list of paths, circumventing the need to plot all of the points in the NMR spectrum directly. Note that if you only want to get your hands on an SVG file for publication, you can probably skip this step and generate the contour plot with HoloViews/matplotlib since it is only the interactive plots that are challenging for the browser to render.

For our plot axes, we'll need to pass scaled axes (in ppm) to matplotlib using two numpy meshgrids (one for each dimension) that that have the same shape as the data. These meshgrids can be generated using 1D arrays calculated with nmrglue's uc_from_udic method. Each of these arrays contains a value mapping each point on each axis of our data to a value in ppm. If necessary, spectral referencing can be achieved for Bruker data by referring to the processed data dictionary object pdic that we generated when parsing the processed data. I haven't had any issues with referencing for SPARKY data, but since each of the `uc_from_udic` arrays is just a list of ppm values, any referencing could be applied by adjusting the values in the array.

Let's build the plot axes first. From now on I'll just use the Bruker data from above.

In [8]:
# This will contain a list of arrays representing units for each axis in ppm
ppm_vectors = []

# This loop cycles through each dimension in the unit conversion dictionary
for dim in np.arange(0, bruker_udic["ndim"]):
    
    # nmrglue appears to stumble with referencing for data produced using newer versions of TopSpin
    # These two lines set the referencing manually by referring to the processed data dictionary
    bruker_udic[dim]["obs"] = bruker_pdic['procs']['SF']
    bruker_udic[dim]["car"] = (bruker_dic['acqus']['SFO1'] - bruker_udic[dim]["obs"]) * 1e6

    # Once the referencing is set, we can use nmrglue's built in methods to set ppm vectors for each
    # dimension (ie vectors mapping each index in our data to some value in ppm)
    uc = ng.fileiobase.uc_from_udic(bruker_udic, dim)
    ppmsc = uc.ppm_scale()
    ppm_vectors.append(ppmsc)
    
# Build the meshgrids. 
xgrid, ygrid = np.meshgrid(ppm_vectors[1], ppm_vectors[0])

With the axis meshgrids in hand, we can generate and extract contour coordinates. Contours are generated using matplotlib's contour method. Documentation for this method and the QuadContourSet object that it returns can be found here. The allsegs attribute of QuadContourSet is a list of $x$, $y$ coordinates for each contour that we'll extract for plotting with HoloViews/Bokeh.

In [9]:
# Use matplotlib to calculate contours
bruker_contours = plt.contour(xgrid, ygrid, bruker_data, levels = bruker_clevels)
plt.close()

type(bruker_contours)
Out[9]:
matplotlib.contour.QuadContourSet

We can now parse the QuadContourSet object into a data structure that can be read by HoloViews for plotting.

In [10]:
bruker_contour_list = []
level = 0

# Parse plt's allsegs object to extract contour line coordinates 
for seg in bruker_contours.allsegs:
    level_value = bruker_clevels[level]
    level+=1

    for contour in seg:
        cdict = {('x', 'y'): contour, 'level': level_value}
        bruker_contour_list.append(cdict)

The bruker_contour_list, which is just a list of individual contour paths, can be passed directly to Holoviews' Path plot for rendering.

In [11]:
# Instantiate the plot
bruker_plot = hv.Path(bruker_contour_list, vdims = 'level', label = 'Bruker Plot')

# We can pull the axis labels directly from the udic if desired
xlab = bruker_udic[0]['label']
ylab = bruker_udic[1]['label']

# We can make formatting nice by keeping a dictionary with unicode superscripts (HoloViews 
# doesn't support html labels)
superscripts = {'1H': '¹H',
                '13C' : '¹³C',
                '15N' : '¹⁵N',
                '19F' : '¹⁹F'} #...etc.

# Specify plot options
bruker_plot.opts(invert_xaxis = True, invert_yaxis = True,
          data_aspect = 1, responsive = True, 
          xlabel = f'{superscripts[xlab]} chemical shift (ppm)', ylabel = f'{superscripts[ylab]} chemical shift (ppm)',
          xlim = (10, 80), ylim = (10, 80),
          color = 'darkslategray', show_legend = True
          )
Out[11]:

Plotting Spectra for Publication¶

I had trouble with Bokeh's export_svg() function. Since HoloViews lets us switch backends so easily, switching to matplotlib was a much easier solution–just a single line of code along with a few tweaks to the plot options.

In [12]:
# switch back end to mpl
hv.extension('matplotlib', logo = False)

bruker_plot.opts(invert_xaxis = True, invert_yaxis = True,
          data_aspect = 1, fig_inches = 20, fontsize = 12, fontscale = 2, 
          xlabel = '¹³C chemical shift (ppm)', ylabel = '¹³C chemical shift (ppm)',
          xlim = (10, 80), ylim = (10, 80),
          color = 'darkred', show_legend = True
          )
Out[12]:

We can save this plot with another line of code:

In [13]:
hv.save(bruker_plot, 'bruker_plot.svg', fmt='svg')

That's it! Once the NMR data are in a numpy array, there are a lot of fun things you can do with them. In future posts, I'll explore some options.