Extracting NMR data from their native format into a more common data type like a numpy array opens up a whole world of possibilities for analyzing and visualizing NMR data. This post will explore a few examples of using nmrglue to parse the two most common biomolecular NMR formats (Bruker and UCSF format) into an array that we can then plot with HoloViews, a library that provies a (mostly) unified API for calling matplotlib and Bokeh.
Plotting with HoloViews allows us to generate publication-quality spectra that are superior to those generated natively by TopSpin (Bruker's NMR software) or POKY (the actively maintained and developed version of SPARKY, the most common software for interacting with biomolecular NMR data). I provided examples of plots of the same spectrum generated with TopSpin, SPARKY, and matplotlib back when I originally started using matplotlib to generate contour plots for NMR data:
Here's a slice of the same crosspeak exported from TopSpin as a PDF (1,340 paths), SPARKY as a ps file (45 paths), and contoured+rendered as an svg from processed data with @matplotlib (12 paths, 1/contour). The effect of breaking the contours into multiple paths is obvious. pic.twitter.com/InDz9DaDub
— Ray Berkeley (@ray_berkeley) October 24, 2019
It's worth noting that recent versions of POKY do provide access to matplotlib contour plots (also via nmrglue), which is a great addition. HoloViews also lets us easily switch between Bokeh and matplotlib, so we only need to configure one visualization to get an interactive Bokeh contour plot and a nice SVG file rendered by matplotlib.
The following libraries enable this workflow:
import holoviews as hv
from holoviews import opts
hv.extension('bokeh', logo = False)
import matplotlib.pyplot as plt
import nmrglue as ng
import numpy as np
The approach outlined here should work with any Bruker or SPARKY dataset. If you don't have any NMR data on hand, you can dowload the data that I'm using as an example here. It's a CP-DARR experiment run on a sample of HSPB1.
Bruker organizes experiments into a hierarchy with a top-level directory containing information about the experiment itself and a subdirectory called pdata
containing all of the processed data. The pdata
directory can contain multiple processed versions of the same experiment. Each experiment and processed dataset is given a number by convention, but the processed data directory is always named pdata
by TopSpin.
In this example, we'll parse the processed data in ./example_data/Bruker/pdata/1/
and prepare it for plotting. To do this, we'll use nmrglue's Bruker IO module. The documentation for this module can be found here. We need three pieces of data for our plot–a 2D array representing the data itself, a unit conversion dictionary which will map chemical shifts (in ppm) to our array, and a list of contour levels for drawing contours.
First, we'll get the processed 2D data. The data is stored in a file called 2rr
("2 real real"–the real components of each dimension in the data), but we only need to pass the directory to nmrglue.
# Declare the locations of the top-level experiment directory and the processed data
# directory
bruker_dir = './example_data/Bruker/'
bruker_pdir = './example_data/Bruker/pdata/1/'
# Read experiment information and ignore time domain data
bruker_dic, _ = ng.fileio.bruker.read(bruker_dir)
# Read processing information and processed data
bruker_pdic, bruker_data = ng.fileio.bruker.read_pdata(bruker_pdir, scale_data = True)
# The data that we're interested in ends up in a 2D numpy array with the same shape as
# defined in the experiment.
bruker_data.shape
(1024, 4096)
Next, we'll get the unit conversion dictionary using nmrglue's guess_udic
method. This builds a unit conversion dictionary from experimental metadata that will let us map each point in the data array to some value in ppm, the standard axis for interpereting NMR data.
# Let nmrglue calculate the unit conversion dictionary
bruker_udic = ng.fileio.bruker.guess_udic(bruker_dic, bruker_data, strip_fake = False)
bruker_udic
{'ndim': 2, 0: {'sw': 40000.00000000006, 'complex': True, 'obs': 188.348853700883, 'car': 21658.199116984633, 'size': 1024, 'label': '13C', 'encoding': 'states-tppi', 'time': True, 'freq': False}, 1: {'sw': 56818.184, 'complex': True, 'obs': 188.445377905836, 'car': 21671.194163985776, 'size': 4096, 'label': '13C', 'encoding': 'direct', 'time': True, 'freq': False}}
Specifically, this would require the sweep width sw
, number of points in each dimension size
, the carrier frequency car
, and the base frequency of the magnet obs
. For the direct dimension, we can see that these values are reported as 56818.184 (Hz), 4096 (points), 188.348853700883 (MHz), and 21671.194163985776 (Hz, offset from the base frequency). We can do a quick back-of-the envelope division of car
/obs
to find the experimental carrier frequency of ~115 ppm.
The last thing that we'll need for plotting are the contour levels. Bruker saves contour levels in a standalone plain text file in the processing directory called clevels
. We could use any contour levels that we want for plotting, but in practice I find that I usually mock up the contour levels in TopSpin, save them, and then parse the clevels
file. We can parse the file like so:
# Get the path for the clevels file
bruker_cfile = f"{bruker_pdir}/clevels"
bruker_clevels = []
# Parse the file by pattern matching and brute force
with open(bruker_cfile, 'rt') as f:
for line in f:
if line[0].isdigit() or line[0] == '-':
for entry in line.split():
if float(entry) > 0:
try:
bruker_clevels.append(float(entry))
except ValueError:
pass
else:
continue
This leaves us will all of the data necessary to generate and plot contours. Our real data is in an array called bruker_data
, our unit conversion information are in our unit conversion dictionary bruker_udic
, and our contours are bruker_clevels
.
Unlike Bruker's approach, all of the necessary information for plotting an NMR spectrum (except for the contour levels) can be parsed a .ucsf
file. Let's pull in the same data, this time in UCSF format. The methods used for parsing data in UCSF format are generally in the `sparky` module.
sparky_file = './example_data/UCSF/CP-DARR_example.ucsf'
# The syntax here is different, but this line reads the experiment information and the processed data
sparky_dic, sparky_data = ng.fileio.sparky.read(sparky_file)[0], ng.fileio.sparky.read(sparky_file)[1]
sparky_data.shape
(1024, 4096)
The unit conversion dictionary can be generated with the guess_udic
method.
sparky_udic = ng.fileio.sparky.guess_udic(sparky_dic, sparky_data)
For this dataset, let's just specify the contour levels by hand. These values can be copied from the SPARKY or POKY GUI. The user-defined contours are probably in the SPARKY session files, but I haven't bothered to dig them out in this case.
# Set contour values
contour_min = 2.5e+006 # contour level start value
contour_num = 14 # number of contour levels
contour_factor = 1.40 # scaling factor between contour levels
# Calculate contour levels
sparky_clevels = contour_min * contour_factor ** np.arange(contour_num)
sparky_clevels = np.where(sparky_clevels==0, contour_min, sparky_clevels)
This yields the same three pieces of data necessary for plotting–sparky_data
, sparky_udic
, and sparky_clevels
.
NMR data are generally too large to plot directly in an interactive plot in the browser. For example, the dataset that we're working with has just under 4.2M (4096 x 1024) points–enough to slow your browser to a crawl. To get around this, we can pop the hood on matplotlib's contour plot functionality to get contour coordinates directly. We can then pass these coordinates to HoloViews' Path
plot to effectively create a contour plot from a list of paths, circumventing the need to plot all of the points in the NMR spectrum directly. Note that if you only want to get your hands on an SVG file for publication, you can probably skip this step and generate the contour plot with HoloViews/matplotlib since it is only the interactive plots that are challenging for the browser to render.
For our plot axes, we'll need to pass scaled axes (in ppm) to matplotlib using two numpy meshgrids (one for each dimension) that that have the same shape as the data. These meshgrids can be generated using 1D arrays calculated with nmrglue's uc_from_udic
method. Each of these arrays contains a value mapping each point on each axis of our data to a value in ppm. If necessary, spectral referencing can be achieved for Bruker data by referring to the processed data dictionary object pdic
that we generated when parsing the processed data. I haven't had any issues with referencing for SPARKY data, but since each of the `uc_from_udic` arrays is just a list of ppm values, any referencing could be applied by adjusting the values in the array.
Let's build the plot axes first. From now on I'll just use the Bruker data from above.
# This will contain a list of arrays representing units for each axis in ppm
ppm_vectors = []
# This loop cycles through each dimension in the unit conversion dictionary
for dim in np.arange(0, bruker_udic["ndim"]):
# nmrglue appears to stumble with referencing for data produced using newer versions of TopSpin
# These two lines set the referencing manually by referring to the processed data dictionary
bruker_udic[dim]["obs"] = bruker_pdic['procs']['SF']
bruker_udic[dim]["car"] = (bruker_dic['acqus']['SFO1'] - bruker_udic[dim]["obs"]) * 1e6
# Once the referencing is set, we can use nmrglue's built in methods to set ppm vectors for each
# dimension (ie vectors mapping each index in our data to some value in ppm)
uc = ng.fileiobase.uc_from_udic(bruker_udic, dim)
ppmsc = uc.ppm_scale()
ppm_vectors.append(ppmsc)
# Build the meshgrids.
xgrid, ygrid = np.meshgrid(ppm_vectors[1], ppm_vectors[0])
With the axis meshgrids in hand, we can generate and extract contour coordinates. Contours are generated using matplotlib's contour
method. Documentation for this method and the QuadContourSet
object that it returns can be found here. The allsegs
attribute of QuadContourSet
is a list of $x$, $y$ coordinates for each contour that we'll extract for plotting with HoloViews/Bokeh.
# Use matplotlib to calculate contours
bruker_contours = plt.contour(xgrid, ygrid, bruker_data, levels = bruker_clevels)
plt.close()
type(bruker_contours)
matplotlib.contour.QuadContourSet
We can now parse the QuadContourSet
object into a data structure that can be read by HoloViews for plotting.
bruker_contour_list = []
level = 0
# Parse plt's allsegs object to extract contour line coordinates
for seg in bruker_contours.allsegs:
level_value = bruker_clevels[level]
level+=1
for contour in seg:
cdict = {('x', 'y'): contour, 'level': level_value}
bruker_contour_list.append(cdict)
The bruker_contour_list
, which is just a list of individual contour paths, can be passed directly to Holoviews' Path
plot for rendering.
# Instantiate the plot
bruker_plot = hv.Path(bruker_contour_list, vdims = 'level', label = 'Bruker Plot')
# We can pull the axis labels directly from the udic if desired
xlab = bruker_udic[0]['label']
ylab = bruker_udic[1]['label']
# We can make formatting nice by keeping a dictionary with unicode superscripts (HoloViews
# doesn't support html labels)
superscripts = {'1H': '¹H',
'13C' : '¹³C',
'15N' : '¹⁵N',
'19F' : '¹⁹F'} #...etc.
# Specify plot options
bruker_plot.opts(invert_xaxis = True, invert_yaxis = True,
data_aspect = 1, responsive = True,
xlabel = f'{superscripts[xlab]} chemical shift (ppm)', ylabel = f'{superscripts[ylab]} chemical shift (ppm)',
xlim = (10, 80), ylim = (10, 80),
color = 'darkslategray', show_legend = True
)
I had trouble with Bokeh's export_svg()
function. Since HoloViews lets us switch backends so easily, switching to matplotlib was a much easier solution–just a single line of code along with a few tweaks to the plot options.
# switch back end to mpl
hv.extension('matplotlib', logo = False)
bruker_plot.opts(invert_xaxis = True, invert_yaxis = True,
data_aspect = 1, fig_inches = 20, fontsize = 12, fontscale = 2,
xlabel = '¹³C chemical shift (ppm)', ylabel = '¹³C chemical shift (ppm)',
xlim = (10, 80), ylim = (10, 80),
color = 'darkred', show_legend = True
)
We can save this plot with another line of code:
hv.save(bruker_plot, 'bruker_plot.svg', fmt='svg')
That's it! Once the NMR data are in a numpy array, there are a lot of fun things you can do with them. In future posts, I'll explore some options.