The careless command-line interface

The careless command-line program has two subcommands: careless mono for monochromatic data and careless poly for polychromatic data. All command-line options for both subcommands are documented below.

careless

Scale and merge crystallographic data by approximate inference.

usage: careless [-h] [--version] {mono,poly} ...

-h, --help: show this help message and exit

--version: show program’s version number and exit

careless mono

usage: careless mono [-h] [--embed] [--mc-samples MC_SAMPLES]
                     [--structure-factor-file STRUCTURE_FACTOR_FILE]
                     [--freeze-structure-factors]
                     [--structure-factor-init-scale STRUCTURE_FACTOR_INIT_SCALE]
                     [--epsilon EPSILON] [--disable-metadata-standardization]
                     [--disable-progress-bar] [--test-fraction TEST_FRACTION]
                     [--merge-half-datasets]
                     [--half-dataset-repeats HALF_DATASET_REPEATS]
                     [--validation-frequency VALIDATION_FREQUENCY]
                     [-c ISIGI_CUTOFF] [-d DMIN] [--spacegroups SPACEGROUPS]
                     [--image-key IMAGE_KEY] [--intensity-key INTENSITY_KEY]
                     [--uncertainty-key UNCERTAINTY_KEY] [--anomalous]
                     [--separate-files] [--studentt-likelihood-dof DOF]
                     [--refine-uncertainties] [--iterations ITERATIONS]
                     [--learning-rate LEARNING_RATE] [--beta-1 BETA_1]
                     [--beta-2 BETA_2]
                     [--positional-encoding-keys POSITIONAL_ENCODING_KEYS]
                     [--positional-encoding-frequencies POSITIONAL_ENCODING_FREQUENCIES]
                     [--kl-weight KL_WEIGHT] [--wilson-prior-b WILSON_PRIOR_B]
                     [--double-wilson-r DWR] [--double-wilson-parents PARENTS]
                     [--scale-file SCALE_FILE] [--freeze-scales]
                     [--mlp-layers MLP_LAYERS] [--mlp-width MLP_WIDTH]
                     [--image-layers IMAGE_LAYERS] [--disable-image-scales]
                     [--run-eagerly] [--disable-gpu] [--gpu-id GPU_ID]
                     [--disable-memory-growth] [--tf-debug] [--seed SEED]
                     metadata_keys reflections.{mtz,stream}
                     [reflections.{mtz,stream} ...] out

metadata_keys: Metadata keys for scaling. This is expected to be a comma delimitted string

reflections.{mtz,stream}: Mtz or stream file(s) containing unmerged reflection observations. If you are supplying stream files, you must also use the –spacegroups option to supply the symmetry for merging.

out: Output filename base.

-h, --help: show this help message and exit

--embed: Drop to an IPython shell at the end of optimization to inpsect variables.

--mc-samples <mc_samples>: This is the number of samples to take per gradient step with default 1.

--structure-factor-file <structure_factor_file>: Initialize the structure factors from the ouput of a previous run. This argument should be a string beginning with the base filename used in the previous run and ending in _structure_factor. For instance, if the previous run was called with careless mono [...] merge/hewl, the appropriate filename to use would be merge/hewl_structure_factor.

--freeze-structure-factors: Do not optimize the structure factors.

--structure-factor-init-scale <structure_factor_init_scale>: A floating point number usually between 0 and 1. The width of the initial structure factor distribution is this timesthe standard deviation of the prior distribution. The default is 1.0.

--epsilon <epsilon>: A small constant added to the scale parameters of variational distributions for numerical stability. The default is 1e-7.

--disable-metadata-standardization: By default careless will convert metadata to z-scores. This flag disables that behavior. In general, unstandardized metadata will lead to unstable optimization. However, this flag might be useful if the user wants to use their own normalization scheme.

--disable-progress-bar: Disable the progress bar. This is helpful if you’re logging stderr.

--test-fraction <test_fraction>: Output model predictions for a held-out fraction of data. This should be used for model selection purposes. By default, no data will be held out during training.

--merge-half-datasets: After training, split the data in half randomly by image and merge each half using the scaling model learned on the training fraction. The output of the halves will be written to a file which can be used to estimate traditional CChalf type measures. The full data set will always be used to generate half data sets irrespective of the test fraction. One file is written for each merged mtz. The files have the *_xval_#.mtz suffix.

--half-dataset-repeats <half_dataset_repeats>: Number of times to Repeat the half dataset crossvalidation. By default this is one.

--validation-frequency <validation_frequency>: During training, set how frequently to evaluate the model on the test set. This is an integer >= 1 which defaults to 10 for once every 10 steps.

-c <isigi_cutoff>, --isigi-cutoff <isigi_cutoff>: Minimum I over Sigma(I) for included reflections. The default is to include all reflections.

-d <dmin>, --dmin <dmin>: Maximum resolution in Ångstroms. If this is not supplied,reflections will be merged out to the highest resolution reflection present in the input.

--spacegroups <spacegroups>: The spacegroup(s) to use for merging. You may either supply a single spacegroup which will be used for every input reflection file or a comma-separated list of spacegroups with the same length as the number of reflection files. For example –spacegroups=”P 21 21 21” or –spacegroups=”P 21 21 21,P 1 21 1”

--image-key <image_key>: The name of the key indicating image number for each data set. If no key is given, careless will use the first key with the BATCH dtype.

--intensity-key <intensity_key>: What key to use for reflection intensities. If no key is given, careless will use the first key with the intensity dtype.

--uncertainty-key <uncertainty_key>: What key to use for reflection error estimates. If no key is given, careless will first check for a key beginning with ‘SIG’ or ‘Sig’ which matches the intensity key (e.g. Iobs -> SigIobs). Failing that, careless will ust first key with the StdDev dtype.

--anomalous: If this flag is supplied, Friedel mates will be kept separate.

--separate-files: Use this flag to produce a separate output for each input mtz.In this mode, the data will be ‘scaled’ together and ‘merged’ separately.The default is to merge all the files into a single output.

--studentt-likelihood-dof <dof>: Degrees of freedom for student t likelihood function.

--refine-uncertainties: Use Evans’ 2011 error model from SCALA to correct uncertainties.

--iterations <iterations>: Number of gradient steps to take.

--learning-rate <learning_rate>: Adam learning rate. The default is 0.001

--beta-1 <beta_1>: Adam beta_1 param. The default is 0.9

--beta-2 <beta_2>: Adam beta_2 param. The default is 0.99

--positional-encoding-keys <positional_encoding_keys>: If the --positional-encoding-frequencies flag is set to an integer > 1, this parameter enables encoding a specific subset ofof mtz columns. Supply a comma separated string of metadata keys (ie “XDET,YDET”), and these keys will be encoded separately and appended to the rest of the metadata.

--positional-encoding-frequencies <positional_encoding_frequencies>, -L <positional_encoding_frequencies>: Number of positional encoding frequencies to apply to metadata. If you use this option, it should be paired with the --mlp-width option in order to prevent the model from using too much memory. By default all metadata columns will be encoded using the same formula. To encode a subset of the columns, please see the --positional-encoding-keys parameter.
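
To see why the number of frequencies affects memory use, consider a common sinusoidal positional encoding in which every input value is replaced by a sine and cosine pair per frequency. The formula below is an assumption for illustration; this page does not state the formula careless uses, and `positional_encoding` is a hypothetical helper.

```python
import math


def positional_encoding(x, n_frequencies):
    """Encode one value with sine/cosine pairs at doubling frequencies.

    Assumed scheme for illustration: sin(2^k * pi * x), cos(2^k * pi * x).
    """
    features = []
    for k in range(n_frequencies):
        angle = (2.0 ** k) * math.pi * x
        features.extend([math.sin(angle), math.cos(angle)])
    return features


# Three made-up, already standardized metadata values for one reflection.
row = [0.25, -0.5, 0.1]
encoded = [f for x in row for f in positional_encoding(x, 4)]
print(len(row), "->", len(encoded))  # 3 -> 24
```

Under this scheme the metadata dimensionality grows by a factor of twice the number of frequencies. Since the hidden layer width defaults to the metadata dimensionality, setting a smaller width explicitly keeps the network size in check.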

--kl-weight <kl_weight>: Set the weight of the kl divergence term relative to the likliehood. By default, by default this is based purely on the number of reflections.

--wilson-prior-b <wilson_prior_b>: This flag enables learning reflections on a particular Wilson scale. By default, the Wilson prior is flat across resolution bins.

--double-wilson-r <dwr>: For each input mtz, designate a prior correlation coefficient with its parent. Supply one float for each file separated by commas. Supply zero for each root node.for example, –double-wilson-r=0.,0.9.

--double-wilson-parents <parents>: For each input mtz, designate a parent upon which its prior is conditioned. Supply one integer for each file separated by commas. Supply None for root nodeswhich follow single Wilson priors. for example, –double-wilson-parents=None,0

--scale-file <scale_file>: Initialize the scale model weights from the ouput of a previous run. This argument should be a string beginning with the base filename used in the previous run and ending in _scale. For instance, if the previous run was called with careless mono [...] merge/hewl, the appropriate file name would be merge/hewl_scale.

--freeze-scales: Do not optimize the scale model weights.

--mlp-layers <mlp_layers>: The number of dense neural network layers in the scaling model. The default is 20 layers.

--mlp-width <mlp_width>: The width of the hidden layers of the neural net. This defaults to the dimensionality of the metadata array.

--image-layers <image_layers>: Add additional layers with local image-specific parameters.

--disable-image-scales: Do not learn a local scale param for each image.

--run-eagerly: Running tensorflow in eager mode may be required for high memory models.

--disable-gpu: Disable GPU for high memory models.

--gpu-id <gpu_id>: Specify the physical device used for acceleration. This is an integer from0 to num accelerators - 1. The default is zero. If --disable-gpu is set,this option is ignored.

--disable-memory-growth: Disable the experimental dynamic memory allocation.

--tf-debug: Increase the TensorFlow log verbosity by setting the TF_CPP_MIN_LOG_LEVEL environment variable.

--seed <seed>: Random number seed for consistent sampling.

careless poly

usage: careless poly [-h] [-l lambda_min lambda_max] [-w WAVELENGTH_KEY]
                     [--embed] [--mc-samples MC_SAMPLES]
                     [--structure-factor-file STRUCTURE_FACTOR_FILE]
                     [--freeze-structure-factors]
                     [--structure-factor-init-scale STRUCTURE_FACTOR_INIT_SCALE]
                     [--epsilon EPSILON] [--disable-metadata-standardization]
                     [--disable-progress-bar] [--test-fraction TEST_FRACTION]
                     [--merge-half-datasets]
                     [--half-dataset-repeats HALF_DATASET_REPEATS]
                     [--validation-frequency VALIDATION_FREQUENCY]
                     [-c ISIGI_CUTOFF] [-d DMIN] [--spacegroups SPACEGROUPS]
                     [--image-key IMAGE_KEY] [--intensity-key INTENSITY_KEY]
                     [--uncertainty-key UNCERTAINTY_KEY] [--anomalous]
                     [--separate-files] [--studentt-likelihood-dof DOF]
                     [--refine-uncertainties] [--iterations ITERATIONS]
                     [--learning-rate LEARNING_RATE] [--beta-1 BETA_1]
                     [--beta-2 BETA_2]
                     [--positional-encoding-keys POSITIONAL_ENCODING_KEYS]
                     [--positional-encoding-frequencies POSITIONAL_ENCODING_FREQUENCIES]
                     [--kl-weight KL_WEIGHT] [--wilson-prior-b WILSON_PRIOR_B]
                     [--double-wilson-r DWR] [--double-wilson-parents PARENTS]
                     [--scale-file SCALE_FILE] [--freeze-scales]
                     [--mlp-layers MLP_LAYERS] [--mlp-width MLP_WIDTH]
                     [--image-layers IMAGE_LAYERS] [--disable-image-scales]
                     [--run-eagerly] [--disable-gpu] [--gpu-id GPU_ID]
                     [--disable-memory-growth] [--tf-debug] [--seed SEED]
                     metadata_keys reflections.{mtz,stream}
                     [reflections.{mtz,stream} ...] out

metadata_keys: Metadata keys for scaling. This is expected to be a comma delimitted string

reflections.{mtz,stream}: Mtz or stream file(s) containing unmerged reflection observations. If you are supplying stream files, you must also use the –spacegroups option to supply the symmetry for merging.

out: Output filename base.

-h, --help: show this help message and exit

-l <lambda_min> <lambda_max>, --wavelength-range <lambda_min> <lambda_max>: Minimum and maximum wavelength for harmonic deconvolution in Ångstroms. If this is not supplied, harmonics will be predicted out to the minimum and maximum wavelengths recorded in the mtz.

-w <wavelength_key>, --wavelength-key <wavelength_key>: Mtz column name corresponding to the reflections’ peak wavelength.
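
The effect of the wavelength range can be illustrated with Bragg's law. Reflections along the same ray share a scattering angle, so the n-th harmonic of a reflection with spacing d and wavelength λ has spacing d/n and wavelength λ/n. The sketch below lists which harmonics fall inside a wavelength range and a resolution cutoff. It is a conceptual illustration with made-up numbers, `harmonics_in_range` is a hypothetical helper, and it does not reproduce how careless performs the deconvolution.

```python
def harmonics_in_range(d_primitive, wavelength_primitive, lambda_min, lambda_max, dmin):
    """List (n, d, wavelength) for harmonics inside the given limits (illustrative only)."""
    if lambda_min <= 0:
        raise ValueError("lambda_min must be positive")
    harmonics = []
    n = 1
    while True:
        d_n = d_primitive / n
        wavelength_n = wavelength_primitive / n
        # Higher harmonics only get shorter in d and wavelength, so stop here.
        if d_n < dmin or wavelength_n < lambda_min:
            break
        if wavelength_n <= lambda_max:
            harmonics.append((n, d_n, wavelength_n))
        n += 1
    return harmonics


# Made-up values in Ångstroms: spacing 12.0 and wavelength 4.0 for n = 1.
for n, d, wavelength in harmonics_in_range(12.0, 4.0, 1.0, 1.4, 2.0):
    print(n, round(d, 3), round(wavelength, 3))
```

Narrowing the range passed to -l reduces the number of harmonics that have to be considered for each observation.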

--embed: Drop to an IPython shell at the end of optimization to inpsect variables.

--mc-samples <mc_samples>: This is the number of samples to take per gradient step with default 1.

--structure-factor-file <structure_factor_file>: Initialize the structure factors from the ouput of a previous run. This argument should be a string beginning with the base filename used in the previous run and ending in _structure_factor. For instance, if the previous run was called with careless mono [...] merge/hewl, the appropriate filename to use would be merge/hewl_structure_factor.

--freeze-structure-factors: Do not optimize the structure factors.

--structure-factor-init-scale <structure_factor_init_scale>: A floating point number usually between 0 and 1. The width of the initial structure factor distribution is this timesthe standard deviation of the prior distribution. The default is 1.0.

--epsilon <epsilon>: A small constant added to the scale parameters of variational distributions for numerical stability. The default is 1e-7.

--disable-metadata-standardization: By default careless will convert metadata to z-scores. This flag disables that behavior. In general, unstandardized metadata will lead to unstable optimization. However, this flag might be useful if the user wants to use their own normalization scheme.

--disable-progress-bar: Disable the progress bar. This is helpful if you’re logging stderr.

--test-fraction <test_fraction>: Output model predictions for a held-out fraction of data. This should be used for model selection purposes. By default, no data will be held out during training.

--merge-half-datasets: After training, split the data in half randomly by image and merge each half using the scaling model learned on the training fraction. The output of the halves will be written to a file which can be used to estimate traditional CChalf type measures. The full data set will always be used to generate half data sets irrespective of the test fraction. One file is written for each merged mtz. The files have the *_xval_#.mtz suffix.

--half-dataset-repeats <half_dataset_repeats>: Number of times to Repeat the half dataset crossvalidation. By default this is one.

--validation-frequency <validation_frequency>: During training, set how frequently to evaluate the model on the test set. This is an integer >= 1 which defaults to 10 for once every 10 steps.

-c <isigi_cutoff>, --isigi-cutoff <isigi_cutoff>: Minimum I over Sigma(I) for included reflections. The default is to include all reflections.

-d <dmin>, --dmin <dmin>: Maximum resolution in Ångstroms. If this is not supplied,reflections will be merged out to the highest resolution reflection present in the input.

--spacegroups <spacegroups>: The spacegroup(s) to use for merging. You may either supply a single spacegroup which will be used for every input reflection file or a comma-separated list of spacegroups with the same length as the number of reflection files. For example –spacegroups=”P 21 21 21” or –spacegroups=”P 21 21 21,P 1 21 1”

--image-key <image_key>: The name of the key indicating image number for each data set. If no key is given, careless will use the first key with the BATCH dtype.

--intensity-key <intensity_key>: What key to use for reflection intensities. If no key is given, careless will use the first key with the intensity dtype.

--uncertainty-key <uncertainty_key>: What key to use for reflection error estimates. If no key is given, careless will first check for a key beginning with ‘SIG’ or ‘Sig’ which matches the intensity key (e.g. Iobs -> SigIobs). Failing that, careless will ust first key with the StdDev dtype.

--anomalous: If this flag is supplied, Friedel mates will be kept separate.

--separate-files: Use this flag to produce a separate output for each input mtz.In this mode, the data will be ‘scaled’ together and ‘merged’ separately.The default is to merge all the files into a single output.

--studentt-likelihood-dof <dof>: Degrees of freedom for student t likelihood function.

--refine-uncertainties: Use Evans’ 2011 error model from SCALA to correct uncertainties.

--iterations <iterations>: Number of gradient steps to take.

--learning-rate <learning_rate>: Adam learning rate. The default is 0.001

--beta-1 <beta_1>: Adam beta_1 param. The default is 0.9

--beta-2 <beta_2>: Adam beta_2 param. The default is 0.99

--positional-encoding-keys <positional_encoding_keys>: If the --positional-encoding-frequencies flag is set to an integer > 1, this parameter enables encoding a specific subset ofof mtz columns. Supply a comma separated string of metadata keys (ie “XDET,YDET”), and these keys will be encoded separately and appended to the rest of the metadata.

--positional-encoding-frequencies <positional_encoding_frequencies>, -L <positional_encoding_frequencies>: Number of positional encoding frequencies to apply to metadata. If you use this option, it should be paired with the --mlp-width option in order to prevent the model from using too much memory. By default all metadata columns will be encoded using the same formula. To encode a subset of the columns, please see the --positional-encoding-keys parameter.

--kl-weight <kl_weight>: Set the weight of the kl divergence term relative to the likliehood. By default, by default this is based purely on the number of reflections.

--wilson-prior-b <wilson_prior_b>: This flag enables learning reflections on a particular Wilson scale. By default, the Wilson prior is flat across resolution bins.

--double-wilson-r <dwr>: For each input mtz, designate a prior correlation coefficient with its parent. Supply one float for each file separated by commas. Supply zero for each root node.for example, –double-wilson-r=0.,0.9.

--double-wilson-parents <parents>: For each input mtz, designate a parent upon which its prior is conditioned. Supply one integer for each file separated by commas. Supply None for root nodeswhich follow single Wilson priors. for example, –double-wilson-parents=None,0

--scale-file <scale_file>: Initialize the scale model weights from the ouput of a previous run. This argument should be a string beginning with the base filename used in the previous run and ending in _scale. For instance, if the previous run was called with careless mono [...] merge/hewl, the appropriate file name would be merge/hewl_scale.

--freeze-scales: Do not optimize the scale model weights.

--mlp-layers <mlp_layers>: The number of dense neural network layers in the scaling model. The default is 20 layers.

--mlp-width <mlp_width>: The width of the hidden layers of the neural net. This defaults to the dimensionality of the metadata array.

--image-layers <image_layers>: Add additional layers with local image-specific parameters.

--disable-image-scales: Do not learn a local scale param for each image.

--run-eagerly: Running tensorflow in eager mode may be required for high memory models.

--disable-gpu: Disable GPU for high memory models.

--gpu-id <gpu_id>: Specify the physical device used for acceleration. This is an integer from0 to num accelerators - 1. The default is zero. If --disable-gpu is set,this option is ignored.

--disable-memory-growth: Disable the experimental dynamic memory allocation.

--tf-debug: Increase the TensorFlow log verbosity by setting the TF_CPP_MIN_LOG_LEVEL environment variable.

--seed <seed>: Random number seed for consistent sampling.