ScienceSoft, LLC

CHAPTER 17: AssembleIt Workwindow

The AssembleIt workwindow combines NMRanalyst results from various spectral analyses and derives the likely molecular structure. A full structure elucidation may require many NMR spectra to be acquired. The FindIt task of the AssembleIt workwindow compares available spectra of an unknown sample with a database of known molecular structures. This matching is based on the molecular formula, weight range, proton, protonated carbon (e.g., DEPT-135 or HSQC), carbon spectrum results, or any combination thereof. The supplied database consists of over 14.5 million small organic molecular structures from the NIH PubChem collection. The best matching structures reported by FindIt can be displayed by the Graphic workwindow.

Often the likely structure of a compound is known. Perhaps the compound was synthesized using well established reactions. The VerifyIt task compares a specified structure with the NMR data. The expected structure can be specified in the NMRgraph native plot or Molfile format. See CHAPTER 19: "NMRgraph: Molecular Correlation Editor" for a graphical way to enter the structure. VerifyIt explains the consistency between the proposed structure and the NMR data, from which an overall agreement rating is derived.

The third task of this workwindow is AssembleIt. It supports the molecular skeleton elucidation from a combination of direct carbon-carbon correlations (2D INADEQUATE), heteronuclear correlations (ADEQUATE or HMBC), and proton-proton correlations (DQF-COSY). Such experimental correlations pose challenges. For example, expected correlations may not be observed. Among the observed correlations, some may correspond to long-range (> 3 bond) correlations, and some may result from incorrect assignments. AssembleIt can still derive the carbon skeleton or molecular fragments despite these challenges.

The AssembleIt workwindow input fields are described in this chapter. See CHAPTER 12: "Using the Workwindows" for a general description of a workwindow.

17.1 Combine NMR Analysis Results

The molecular skeleton determination consists of two steps. The first one reads NMRanalyst analysis results. For running the subsequent AssembleIt task, it derives possible atom-atom correlations from ADEQUATE, DQF-COSY, HMBC, N15_HMBC, and/or INADEQUATE results. Results from an HSQC spectrum plus those from at least one of the previous spectrum types need to be specified. This panel is only active when this switch is selected. (Key: Task1)

A structure determination typically starts with the determination of proton and carbon chemical shifts. All multidimensional resonances are then assigned to these "1D" resonances. Specify the 1D Analysis workwindow output file for the proton spectrum or the HSQC/report.log file in this input field. (Key: FnProton)

Specify the 1D Analysis workwindow output file for the carbon spectrum. This can also be a generated generic line list or a carbon resonance list derived from F1 frequencies of heteronuclear spectra. (Key: FnCarbon)

Specify the 1D Analysis workwindow output file for a nitrogen spectrum. Given the NMR challenges for direct 15N observation, this entry will likely be a 15N HMBC derived or generic line list. (Key: FnNitrogen)

This input field is displayed if the FindIt or VerifyIt task is selected instead of the AssembleIt task. Specify a DEPT spectrum in this field. (Key: FnDEPT)

To include ADEQUATE information in the molecular skeleton determination, specify the location of its Report workwindow report.log file. In contrast to HMBC, ADEQUATE only shows direct bonds without longer-range correlations. AssembleIt supports ADEQUATE spectra with single-quantum (SQ) or double-quantum (DQ) frequencies in F1. SQ ADEQUATE spectra have the same assignment challenges as HMBC when several carbons in the HSQC spectrum have indistinguishable proton chemical shifts. These challenges can be avoided by using DQ ADEQUATE. (Key: FnADEQ)

To include DQF-COSY correlations in the carbon skeleton determination, specify the location of its Report workwindow report.log file. A bond is only observable by DQF-COSY if its carbons are protonated. The proton frequencies of a DQF-COSY correlation are mapped to the directly bonded carbons. Geminal proton-proton couplings are reliably identified since both proton resonances are bonded to the same carbon. Vicinal proton-proton couplings are the desired ones. Proton-proton couplings over more than three bonds are undesired. Their coupling constants are generally below 3 Hz. NMRanalyst marks such correlations as potential long-range correlations and does not save them as atom-atom correlation for the molecular structure generation. (Key: FnCOSY)

To include HMBC correlations in the carbon skeleton determination, specify the location of its Report workwindow report.log file. To observe a carbon-carbon bond using HMBC data, at least one of the bonded carbons has to be protonated. AssembleIt can recover unobserved bonds, if 3-bond HMBC correlations involving the unobserved bond are observed.

HMBC correlations correspond to 2-bond, 3-bond, and longer-range proton-carbon couplings. Longer-range couplings are compute intensive to consider during the structure elucidation. HMBC correlations stronger than the specified Weak threshold are assumed to be either 2-bond or 3-bond correlations. HMBC correlations at or below the Weak threshold can involve any number of bonds. (Key: FnHMBC, ThHMBC)
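
The Weak threshold rule above can be sketched as follows. This is only an illustration of the stated rule, not NMRanalyst code: the function name classify_hmbc and the use of a plain number for the correlation strength are assumptions.

```python
def classify_hmbc(strength: float, weak_threshold: float) -> str:
    """Return which bond counts an HMBC correlation is assumed to span.

    Hypothetical helper: 'strength' and 'weak_threshold' are plain numbers
    in the same (unspecified) units.
    """
    if strength > weak_threshold:
        # Stronger than the Weak threshold: assumed 2-bond or 3-bond.
        return "2- or 3-bond"
    # At or below the Weak threshold: any number of bonds is possible.
    return "any number of bonds"
```

Note that a correlation exactly at the threshold falls into the "any number of bonds" group, matching the "at or below" wording.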

To place nitrogens in derived molecular skeletons, 15N HMBC and/or HSQC spectra can be used. Specify the location of the 15N-HMBC Report workwindow report.log file in this field. To observe a nitrogen-carbon bond using this data, the bonded carbon has to be protonated. But AssembleIt can recover unobserved bonds, if 3-bond HMBC correlations involving the unobserved bond are observed. See the input field above for the use of the Weak threshold. (Key: FnNHMBC, ThNHMBC)

Information about direct carbon-proton bonds is required for running the AssembleIt task. Inverse-detected heteronuclear correlation experiments like HSQC and HMQC are supported. Multiplicity editing of these spectra is recommended. The detected phase difference between methyls and methines vs. methylene groups is used to set the number of free valences for each carbon. (Key: FnHSQC)

To include 2D INADEQUATE correlations in the carbon skeleton determination, specify the location of its Report workwindow report.log file. Its correlations are a direct way to determine carbon-carbon bond information. In contrast to indirect detection spectra, no protonation of involved carbons is necessary. But its sensitivity is over an order of magnitude less than that for indirect detection spectra. (Key: FnINAD)

Experimentally, only proton, carbon, and nitrogen correlation information is detectable by NMR for the AssembleIt molecular skeleton determination. Bond multiplicities and the location of unobserved bonded heteroatoms (e.g., oxygen, sulphur, halogens except fluorine) are unobservable. Two rules are provided to automatically reduce the number of free carbon valences for these cases: If a carbon chemical shift lies between 100 and 168.5 ppm, its free valence count is reduced by one. If the carbon shift is 168.5 ppm or above, it is assumed to correspond to either a C=O or C=S group, and its free valence count is reduced by two. These carbon shift rules can be disabled by deselecting this Consider: [Chemical Shift Rules] switch. The number of free valences can also be manually edited as described in CHAPTER 19: "NMRgraph: Molecular Correlation Editor". (Key: LShiftRules)
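
The two chemical shift rules can be illustrated with a short sketch. It is not the NMRanalyst implementation: the function name free_valences is hypothetical, the starting count of four minus the number of attached protons assumes a tetravalent carbon, and treating exactly 100 ppm as inside the first range is an assumption.

```python
def free_valences(attached_protons: int, shift_ppm: float,
                  use_shift_rules: bool = True) -> int:
    """Estimate the free valences of a carbon (hypothetical helper)."""
    # Assumption: a tetravalent carbon starts with 4 minus its attached protons.
    free = 4 - attached_protons
    if use_shift_rules:
        if shift_ppm >= 168.5:
            # Treated as C=O or C=S: reduce by two.
            free -= 2
        elif shift_ppm >= 100.0:
            # 100 to 168.5 ppm: reduce by one.
            free -= 1
    return max(free, 0)
```

For example, a CH carbon at 128.0 ppm gives free_valences(1, 128.0) == 2, and an unprotonated carbon at 170.0 ppm gives free_valences(0, 170.0) == 2. Passing use_shift_rules=False mimics deselecting the switch.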

Protons in HMBC spectra show long-range couplings to carbons independently of the type of atom they are bonded to. By default, only the correlations of protons bonded to carbon, and for 15N-HMBC spectra bonded to nitrogen, are reported. Select this switch if correlations of protons bonded to heteroatoms (e.g., oxygen) should be reported and saved in the atom-atom correlations file. (Key: LHeteroH)

HMBC spectra often contain incompletely suppressed direct carbon-proton bonds. Selecting this switch will remove correlations which could result from such unsuppressed direct bonds. By default, this switch is unselected as valid HMBC correlations might be eliminated through it as well. (Key: LBondsCH)

If this switch is unselected, AssembleIt only reports correlations having a unique assignment. HSQC correlations may map one proton frequency to more than one carbon (or nitrogen) atom. When this switch is selected, such correlations are listed with an "ambiguous" comment and are saved using a dashed line style. Such correlations are used in the AssembleIt structure elucidation. The HMBC and N15_HMBC analysis result input fields above have a Weak threshold input field. Correlations below these thresholds are treated as ambiguous. So these Weak input fields are deactivated when this switch is deselected. (Key: LAmbiguities)

Information about carbon and nitrogen correlations derived from the spectrum types above is saved in the specified plot file. These correlations include direct bond information, longer-range correlations resulting typically from DQF-COSY and HMBC, and occasionally incorrect information resulting from misinterpretation of spectral features or incorrect assignments. These correlations are displayed by NMRgraph as a "correlation circle diagram". See CHAPTER 18: "Graphic Workwindow" for details. The displayed correlations can be edited as described in CHAPTER 19: "NMRgraph: Molecular Correlation Editor". Atoms are labeled by their chemical shift in ppm. INADEQUATE derived bonds are shown as solid lines between atom labels with the determined carbon-carbon coupling constant in Hertz as bond label. ADEQUATE derived correlations are shown as solid arrows. Their direction corresponds to the direction in which they were detected and the correlation label gives one involved proton frequency in ppm. DQF-COSY, HMBC, and N15_HMBC derived correlations are ambiguous and are shown as dotted arrows with a label indicating one of the involved proton frequencies in ppm. Correlations with ambiguous assignment or of a strength below a specified Weak threshold are shown as dashed lines. (Key: FnCorr)

17.2 FindIt: Identify Database Structures Best Matching NMR Data

When this switch is selected, this workwindow section is displayed. The FindIt task compares experimentally observed carbon and/or proton chemical shifts with predicted ones for known structures in its database. A database with over 14.5 million structures from the National Institutes of Health (NIH) PubChem collection is shipped with NMRanalyst. Select [FindIt Database Manager...] from the NMRanalyst Tools menu to add further structures (see SECTION 9.8: "The Application Window Menus" for details). (Key: LFindIt)

Specify the molecular formula constraint to be considered when matching the database structures. Atoms can be specified in any order and multiple entries of the same atom type are combined. A number or number range can be specified following each element. If no number is specified, one such element is assumed. If only "," is specified, zero to a hundred such atoms are considered. The range "3,5" would mean three to five of the preceding element are considered. If an element should not be present, specify the element with the number zero. If no constraint is specified, all database structures are considered (the default setting). Specify a trailing "*", if further unspecified atom types can appear in matching structures. So "N0*" (N, digit zero, star) means any molecule without nitrogen matches. "C,H,O," means that molecules containing only carbon, hydrogen, and oxygen, each in any number, match. (Key: mFormula)
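
The constraint syntax can be made concrete with a small parser sketch. It is not FindIt's parser. The names parse_constraint and matches are hypothetical, and three behaviors are assumptions not stated above: repeated entries of an element are combined by summing their bounds, a half-open range such as "C3," or "C,5" is filled in with 0 or 100, and spaces are ignored.

```python
import re

# One entry: element symbol, optional number, optional comma, optional number.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)(,?)(\d*)")
MAX_COUNT = 100  # "," alone means zero to a hundred atoms


def parse_constraint(text):
    """Return ({element: (min, max)}, other_elements_allowed)."""
    body = text.replace(" ", "").rstrip("*")
    # A trailing "*" or an empty constraint allows unspecified elements.
    open_ended = text.strip().endswith("*") or not body
    limits = {}
    pos = 0
    while pos < len(body):
        m = TOKEN.match(body, pos)
        if m is None:
            raise ValueError(f"cannot parse constraint at: {body[pos:]!r}")
        element, low, comma, high = m.groups()
        if comma:
            lo = int(low) if low else 0
            hi = int(high) if high else MAX_COUNT
        else:
            lo = hi = int(low) if low else 1
        # Assumption: repeated entries are combined by summing their bounds.
        old_lo, old_hi = limits.get(element, (0, 0))
        limits[element] = (old_lo + lo, old_hi + hi)
        pos = m.end()
    return limits, open_ended


def matches(counts, limits, open_ended):
    """Check an element-count dictionary against parsed limits."""
    for element, (lo, hi) in limits.items():
        if not lo <= counts.get(element, 0) <= hi:
            return False
    if open_ended:
        return True
    # Without "*", every element in the molecule must be listed.
    return all(element in limits for element in counts)
```

With this sketch, parse_constraint("C3,5H8") yields limits of three to five carbons and exactly eight hydrogens, and ethanol ({"C": 2, "H": 6, "O": 1}) matches both "N0*" and "C,H,O,".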

When the molecular formula is unknown, the weight of the unknown might be known from a mass spectrum. Specify the minimum weight in Dalton (g/mol) in the left field and the maximum weight in the right field. By default, these input fields are empty and all database structures are considered. (Key: minWeight, maxWeight)
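
As an illustration of the weight range constraint, the sketch below computes an average molecular weight from element counts and tests it against optional bounds. The function names are hypothetical, only four elements are tabulated, and treating the bounds as inclusive is an assumption. An empty field is modeled as None.

```python
# Standard average atomic weights in g/mol for a few common elements.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molecular_weight(counts):
    """Sum atomic weights for an element-count dictionary."""
    return sum(ATOMIC_WEIGHT[element] * n for element, n in counts.items())


def in_weight_range(weight, min_weight=None, max_weight=None):
    """Return True if weight lies within the optional bounds (inclusive)."""
    if min_weight is not None and weight < min_weight:
        return False
    if max_weight is not None and weight > max_weight:
        return False
    return True
```

For ethanol, molecular_weight({"C": 2, "H": 6, "O": 1}) is about 46.069, which passes a range of 40 to 50 and also passes when both bounds are left empty.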

Specify the name of the output file for saving database structures best matching the NMR data. (Key: FnFind)

Specify the maximum number of best matching structures to be saved in the output file. By default, the top 10 best matches are saved. Up to 1000 structure matches can be requested. (Key: numMatch)

17.3 VerifyIt: Rate Specified Structure by Agreement With NMR Data

The VerifyIt task rates the consistency between a proposed structure and NMR data by comparing the predicted and observed proton and/or carbon chemical shifts. High consistency generates a high rating, with 1.0 being a perfect match and 0.0 being no consistency. VerifyIt can also assign observed shifts to the proposed structure. This workwindow section is shown when this switch is selected. (Key: LVerifyIt)

Specify the expected structure in this field. The structure file can be in the NMRgraph plot file format or in Molfile format. (Key: FnVerify)

VerifyIt determines the most likely assignment of observed proton shifts for the proposed structure. When a name is specified in this input field, the specified structure with assigned proton shifts is saved in NMRgraph plot file format. (Key: FnHShifts)

VerifyIt determines the most likely assignment of observed carbon shifts for the proposed structure. When a name is specified in this input field, the specified structure with assigned carbon, or protonated carbon only (DEPT), shifts is saved in NMRgraph plot file format. (Key: FnShifts)

If this switch is selected, after VerifyIt determines the rating of the expected structure, it rates all the structures in the FindIt database against the observed carbon and/or proton shifts. It then reports the rating position of the expected structure among all the FindIt database structures. (Key: LFindItPos)

This switch is similar to the above [FindIt Position] switch except that only the FindIt database structures with the same molecular formula as the expected structure are rated. (Key: LFindItPosMF)

17.4 AssembleIt: Elucidate Molecular Structure From NMR Data

Atom-atom correlations derived from spectral analysis results can be edited as described in CHAPTER 19: "NMRgraph: Molecular Correlation Editor". The function of this AssembleIt task is to derive the most likely molecular skeleton(s) from specified carbon, nitrogen, and heteroatom bonded proton correlations. The challenge is that the specified atom-atom correlations do not necessarily correspond to direct bonds. Longer range or incorrect correlation assignments are expected as described below. This workwindow section is displayed when this switch is selected. (Key: LAssembleIt)

The result of combined spectral analyses is a structure file containing a circle diagram of known atom-atom correlations. This file can be displayed and edited from the Graphic workwindow to enter further information. For example, carbon atoms belonging to solvent or impurity resonances can be removed. If a carbon chemical shift indicates the presence of bonded heteroatoms or unsaturation in the molecule, the number of free valences of the affected carbons can be reduced. Additional known correlations can be added. Specify in this field the structure file with the final set of atom-atom correlations for the structure determination. (Key: FnPlot)

A molecular fragment consists of the atoms connected directly or indirectly by specified correlations. Only one fragment is evaluated at a time and the Fragment # is 1 by default corresponding to the largest fragment. The maximum allowed Fragment # is 50. (Key: numFrag)
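
The fragment definition corresponds to the connected components of the correlation graph. The sketch below shows one way to compute them and order them by size, so that index 0 plays the role of Fragment # 1. The function name and data layout are assumptions. Atoms are identified by their chemical shift in ppm, and an atom without any correlation is treated as its own one-atom fragment, which is also an assumption.

```python
from collections import defaultdict


def fragments(atoms, correlations):
    """Group atoms into fragments (connected components), largest first."""
    neighbours = defaultdict(set)
    for a, b in correlations:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    result = []
    for start in atoms:
        if start in seen:
            continue
        # Depth-first walk over everything reachable from 'start'.
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            atom = stack.pop()
            component.append(atom)
            for nxt in neighbours[atom]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result.append(sorted(component))

    # Largest fragment first, so result[0] corresponds to Fragment # 1.
    result.sort(key=len, reverse=True)
    return result
```

For atoms [14.1, 22.7, 31.9, 128.5, 130.2] with correlations (14.1, 22.7), (22.7, 31.9), and (128.5, 130.2), the first fragment is [14.1, 22.7, 31.9] and the second is [128.5, 130.2].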

The 2D INADEQUATE spectrum type can detect carbon-carbon bonds even between unprotonated carbons. But from heteronuclear NMR alone, a bond between two unprotonated carbons is not directly detectable. Specify the maximum number of unobserved bonds to be added during the structure generation in this field. Up to 50 unobserved bonds can be considered. AssembleIt can derive the presence of such bonds from observed 3-bond heteronuclear HMBC correlations involving the unobserved bond. (Key: addBonds)

Bonds not directly observed can be derived by combining connectivity information over several HMBC correlations. Two HMBC correlations are combined by default to derive additional unobserved bonds. Conjugated systems typically have 3-bond HMBC couplings stronger than 2-bond ones. Often only 3-bond correlations are detected. Select [Bonds Over sp2-?-?-sp2] to consider bonds between sp2 carbons over 3 correlations. Select [Bonds Over sp2-?-?-any] to allow bonds involving at least one sp2 carbon and [Bonds Over Any 3 HMBC Correlations] for bonds between any carbons over 3 correlations. (Key: L3CorrBond)

Indistinguishable chemical shifts, 4-bond or longer-range HMBC correlations, and incompletely suppressed spectral resonances can lead to incorrectly derived atom-atom correlations. Specify the maximum number of 4-bond and longer-range correlations to consider in these input fields. Up to 50 4-bond and 50 longer-range correlations can be evaluated by AssembleIt. If more 4-bond correlations are detected than specified, the excess number is considered as long-range violations. AssembleIt reports molecular structures up to the specified numbers of 4-bond and long-range correlations. (Key: vio4Bond, vioLongRange)
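
The way excess 4-bond correlations spill over into the long-range count can be written out as a small check. This is one reading of the rule above, not AssembleIt's code, and the function and parameter names are hypothetical.

```python
def within_limits(n_four_bond: int, n_longer: int,
                  max_four_bond: int, max_long_range: int) -> bool:
    """Decide whether a candidate structure stays within both limits."""
    # 4-bond correlations beyond the allowed number count as long-range.
    excess = max(n_four_bond - max_four_bond, 0)
    return n_longer + excess <= max_long_range
```

For example, with limits of two 4-bond and one long-range correlation, a structure needing three 4-bond correlations and no longer-range ones is still accepted, because the single excess 4-bond correlation uses up the long-range allowance. With one additional longer-range correlation it is rejected.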

In molecular skeletons, true sp2 carbons appear paired as both carbons contribute an electron to the double bond. The heuristic used is that CH2, CH, and C carbons with a shift between 100 and 168.5 ppm are assumed to be sp2. As this is not always true, the pairing of assumed sp2 carbons can be switched off. (Key: unpairedSp2)

By default, AssembleIt does not report structures containing 3-Membered Rings and 4-Membered Rings. Due to the ambiguity of long-range heteronuclear data, small ring systems are in agreement with even linear chains of carbon atoms. When such features are expected in the molecule, select the corresponding switches. More possible structures result, but reported structures are sorted by decreasing rating. So only if the specified correlations support such structures will they obtain a high ranking. (Key: ring3, ring4)

The best structure found so far is saved in temporary files during the generation process. When the generation is paused or completes, the temporary structure files are replaced by the Best Structures number of most likely structures. Up to 1000 best structures can be requested. (Key: numStruct)
