Introductory example

The most straightforward way to discuss the operation of the program is through demonstrations with real examples. The first of these is a dataset from a DNA / ligand complex recorded at Diamond Light Source as part of ongoing research. The structure includes barium which may be used for phasing, and the data were recorded as a single sweep. As may be seen from Figure 2, the quality of diffraction was not ideal, and radiation damage was an issue. Initially the data were processed with:

xia2 pipeline=3d -atom Ba /here/are/my/data

giving the merging statistics shown below:

                           Overall       Low      High
High resolution limit         1.25      6.45      1.25
Low resolution limit         18.85     18.85      1.27
Completeness                  95.2      60.1      70.2
Multiplicity                  12.2       8.4       4.8
I/sigma                       12.3      18.5       2.6
Rmerge                       0.113     0.096     0.564
Rmeas(I)                     0.129     0.118     0.633
Rmeas(I+/-)                  0.121     0.105     0.679
Rpim(I)                      0.034     0.038     0.267
Rpim(I+/-)                   0.043     0.041     0.368
Wilson B factor             12.131
Anomalous completeness        93.3      52.6      58.0
Anomalous multiplicity         6.4       5.0       2.0
Anomalous correlation        0.544     0.791    -0.297
Anomalous slope              1.085     0.000     0.000
Total observations          118588       529      1634
Total unique                  9749        63       337

From these it is clear that something is wrong: it is very unusual to have near-atomic resolution diffraction with ∼10% Rmerge in the low resolution bin. The most likely causes are incorrect assignment of the point group and radiation damage. The latter is evident from the analysis of Rmerge as a function of image number:

_images/3qrn-all-rmerge-aimless.png
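To make the statistics discussed above concrete, here is a minimal Python sketch of the standard definitions of Rmerge, Rmeas and Rpim, together with Rmerge accumulated per image (batch) number. It is not the code used by xia2 or AIMLESS. It assumes a hypothetical input: a list of scaled, unmerged observations as `(unique_hkl, batch, intensity)` triples, where `unique_hkl` is already reduced to the asymmetric unit. The function names are illustrative only.

```python
from collections import defaultdict
from math import sqrt


def _group_by_reflection(observations):
    """Collect (batch, intensity) pairs for each unique reflection."""
    groups = defaultdict(list)
    for hkl, batch, intensity in observations:
        groups[hkl].append((batch, intensity))
    return groups


def merging_r_factors(observations):
    """Return (Rmerge, Rmeas, Rpim) over all multiply-measured reflections."""
    num_merge = num_meas = num_pim = denom = 0.0
    for measurements in _group_by_reflection(observations).values():
        n = len(measurements)
        if n < 2:
            continue  # a single measurement says nothing about consistency
        intensities = [i for _, i in measurements]
        mean = sum(intensities) / n
        abs_dev = sum(abs(i - mean) for i in intensities)
        num_merge += abs_dev
        num_meas += sqrt(n / (n - 1)) * abs_dev  # multiplicity-corrected
        num_pim += sqrt(1 / (n - 1)) * abs_dev   # precision of the mean
        denom += sum(intensities)
    if denom == 0:
        raise ValueError("no multiply-measured reflections")
    return num_merge / denom, num_meas / denom, num_pim / denom


def rmerge_by_batch(observations):
    """Return {batch: Rmerge}, using each reflection's mean over all batches."""
    num = defaultdict(float)
    den = defaultdict(float)
    for measurements in _group_by_reflection(observations).values():
        if len(measurements) < 2:
            continue
        mean = sum(i for _, i in measurements) / len(measurements)
        for batch, intensity in measurements:
            num[batch] += abs(intensity - mean)
            den[batch] += intensity
    return {b: num[b] / den[b] for b in sorted(num) if den[b] > 0}


if __name__ == "__main__":
    # Toy data (invented for illustration): intensity decays in later batches.
    toy = [
        ((1, 0, 0), 1, 100.0), ((1, 0, 0), 2, 98.0), ((1, 0, 0), 3, 70.0),
        ((2, 1, 0), 1, 50.0), ((2, 1, 0), 3, 38.0),
    ]
    print(merging_r_factors(toy))
    print(rmerge_by_batch(toy))
```

As a rough consistency check on the table above, and assuming uniform multiplicity, Rpim is approximately Rmerge / sqrt(n - 1). With an overall Rmerge of 0.113 and multiplicity of 12.2 this gives about 0.034, in line with the reported overall Rpim(I).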

From the cumulative completeness as a function of frame number it is clear that the data were essentially complete after approximately 200 frames, though the low resolution completeness is poor:

_images/3qrn-all-complete-aimless.png
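The idea behind a cumulative completeness plot can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the calculation AIMLESS performs: the input is a hypothetical list with one set of unique reflection indices per frame, in frame order, and `n_possible` is the number of unique reflections expected in the resolution range. Both function names are invented for this sketch.

```python
def cumulative_completeness(frame_reflections, n_possible):
    """Fraction of possible unique reflections observed up to each frame."""
    if n_possible <= 0:
        raise ValueError("n_possible must be positive")
    seen = set()
    curve = []
    for reflections in frame_reflections:
        seen.update(reflections)  # repeats add multiplicity, not completeness
        curve.append(len(seen) / n_possible)
    return curve


def first_frame_reaching(curve, fraction_of_final=0.99):
    """1-based frame number at which the curve reaches a fraction of its final value."""
    target = fraction_of_final * curve[-1]
    for frame, value in enumerate(curve, start=1):
        if value >= target:
            return frame
    return len(curve)


if __name__ == "__main__":
    # Toy data (invented): four frames, four possible unique reflections.
    frames = [{"a", "b"}, {"b", "c"}, {"c"}, {"d"}]
    curve = cumulative_completeness(frames, n_possible=4)
    print(curve, first_frame_reaching(curve))
```

A frame number found this way is only a starting point for choosing an image range; the point at which radiation damage sets in should be weighed against it.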

Modifying input

From the example it would seem sensible to investigate processing only the first 200 of the 450 images. When processing data manually it is usual to limit the batch range at the scaling step. xia2 is not set up to work like this, because decisions made for the full data set (e.g. which scaling model to use) may differ from those for the subset, so the whole xia2 job must be rerun after modifying the input. This is easy to do using the image=/path/to/image_001.img:start:end syntax:

xia2 pipeline=3d image=/dls/i02/data/2011/mx1234-5/K5_M1S3_3_001.img:1:200

giving the following merging statistics:

                           Overall       Low      High
High resolution limit         1.22      6.34      1.22
Low resolution limit         19.62     19.62      1.24
Completeness                  86.9      49.1      37.8
Multiplicity                   5.3       4.9       1.7
I/sigma                       20.1      37.0       2.3
Rmerge                       0.036     0.020     0.355
Rmeas(I)                     0.060     0.038     0.448
Rmeas(I+/-)                  0.043     0.023     0.491
Rpim(I)                      0.023     0.014     0.297
Rpim(I+/-)                   0.022     0.011     0.339
Wilson B factor              10.70
Anomalous completeness        77.7      41.0      18.3
Anomalous multiplicity         2.7       3.5       0.5
Anomalous correlation        0.779     0.931     0.000
Anomalous slope              1.553     0.000     0.000
Total observations           50875       272       342
Total unique                  9552        55       199

These are clearly much more internally consistent and give good results from experimental phasing, though with very poor low resolution completeness. We may also wish to adjust the resolution limit to give more complete data in the outer shell, which may be achieved by setting the d_min= parameter on the command line.
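When several image ranges or resolution limits are to be tried, it can help to assemble the command line programmatically. The sketch below builds the argument list using only the image=...:start:end and d_min= syntax described above. The helper name `xia2_command` and the d_min value of 1.4 are illustrative assumptions, not recommendations, and the command is printed rather than executed.

```python
import shlex


def xia2_command(first_image, start, end, pipeline="3d", d_min=None):
    """Build a xia2 argument list restricted to images start..end (inclusive)."""
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    args = [
        "xia2",
        f"pipeline={pipeline}",
        f"image={first_image}:{start}:{end}",
    ]
    if d_min is not None:
        # High resolution cutoff in Angstroms; the value is chosen by the user.
        args.append(f"d_min={d_min}")
    return args


if __name__ == "__main__":
    cmd = xia2_command(
        "/dls/i02/data/2011/mx1234-5/K5_M1S3_3_001.img", 1, 200, d_min=1.4
    )
    # Print for inspection; to run it, pass the list to
    # subprocess.run(cmd, check=True) in a fresh working directory.
    print(shlex.join(cmd))
```

Keeping the arguments as a list avoids shell quoting problems if the image path contains spaces.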