Enduserdocs

The ADaptable IO System (ADIOS) was initially developed at ORNL by Jay Lofstead and Scott Klasky in Summer 2007.

It builds on work previously done by:

  • Hasan Abbasi, Karsten Schwan, and Matt Wolf (GT) - PBIO portals & infiniband
  • Ciprian Docan, Manish Parashar (Rutgers) - DART
  • Chen Jin (Northwestern) - data tagging format and code for reading and writing

The major goals of the API are threefold:

  • Provide a simplified, easy-to-use API for scientists to write their IO operations
  • Deliver enhanced IO and code performance through both asynchronous techniques and best-practice implementations of the IO routines
  • Provide a stable interface for experimentation in the IO space with existing scientific codes at scale, without requiring any changes to those codes

The API consists of two main parts:

  • The programmatic interface, written for Fortran and also usable from C and other C-linkable languages. The routines have the same names and parameters in both cases, except that each Fortran routine takes an additional error parameter at the end, which corresponds to the return value of the C routine.
  • An XML configuration file for defining the IO types and methods.
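
The difference between the two calling conventions can be sketched in C as follows. The names example_close and example_close_f are hypothetical and are not part of ADIOS, and the meaning of the status codes is an assumption. The sketch only shows how a C return value might become a trailing error parameter in a Fortran-callable routine, assuming the common convention that Fortran passes its arguments by reference.

```c
/* Hypothetical C routine: reports its status through the return value. */
int example_close(long long io_handle)
{
    /* Assumption for illustration: 0 means success, nonzero means failure. */
    return io_handle > 0 ? 0 : 1;
}

/* Hypothetical Fortran-callable counterpart: same parameters, passed by
 * reference, plus a trailing err parameter that receives the status. */
void example_close_f(long long *io_handle, int *err)
{
    *err = example_close(*io_handle);
}
```

A Fortran caller would then test err after the call where a C caller would test the return value.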

Programmatic Interface:

  • adios_init (filename) - [required] load the XML configuration file creating internal representations of the various data types and defining the methods used for writing.
  • adios_open (io_handle, group_name, filename, mode) - [required] prepare a data type for subsequent calls to write data using the io_handle. Mode is one of "r" (read), "w" (write), "a" (append), "u" (update [a future feature]).
  • adios_write (io_handle, field_name, var) - [required] submit a data element for writing and associate it with the given field_name for this type. When this call returns, the value has been either buffered for a single, large output or written directly if insufficient buffer space is available.
  • adios_get_write_buffer (io_handle, field_name, size, buffer) - [optional] get a buffer of the given size for the given field; this buffer will be used directly at the transport level. If size == 0, the size is calculated automatically from what is known about the datatype in the XML file and any additional elements already provided (such as array dimension elements). To return the buffer, make a normal call to adios_write using the same io_handle and field_name and the returned buffer.
  • adios_set_path (io_handle, path) - [optional] set the HDF-5-style path for all vars in a group. This will reset whatever is specified in the XML file.
  • adios_set_path_var (io_handle, path, var) - [optional] set the HDF-5-style path for the specified var in the group. This will reset whatever is specified in the XML file.
  • adios_read (io_handle, field_name, var, size) - submit a buffer space (var) into which a data element will be read. This does NOT actually perform the read; the buffer space is populated on the call to adios_close.
  • adios_close (io_handle) - [required] trigger the building of the buffer for transfer and then return control to the caller. At this point, all of the data has been copied and will be sent as-is downstream. [experimental] If the handle was opened for read, this call fetches the data, parses it, and populates the provided buffers. Reading is currently hard-coded to use POSIX IO calls.
  • adios_end_iteration () - [optional] a tick counter for the IO routines to time how fast they are emptying the buffers.
  • adios_start_calculation () - [optional] an indicator that it is now an ideal time to do bulk data transfers as the code will not be performing IO for a while.
  • adios_stop_calculation () - [optional] an indicator that it is no longer a good time to do bulk data transfers as the code is about to start doing communication with other nodes causing possible conflicts.
  • adios_allocate_buffer () - [required/optional] tell the API to allocate the write buffers now. The configuration file determines the buffer size and whether or not this call is required.
  • adios_finalize () - [required] clean up anything remaining before exiting the code
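
The order of the required calls on the write path can be sketched as follows. So that the sketch compiles without the library, the adios_* routines here are hypothetical stand-ins that only record their names. The routine names and parameter order follow the list above, while the parameter types, the return codes, the group name "restart", the field names, and the file names are all assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins so the sketch compiles without the library.
 * Only the routine names and parameter order come from the list above;
 * the parameter types and return codes are assumptions. */
static char call_log[128];

static int record(const char *name)
{
    size_t used = strlen(call_log);
    snprintf(call_log + used, sizeof call_log - used, "%s%s",
             used ? " " : "", name);
    return 0; /* assumed success code */
}

static int adios_init(const char *config) { (void)config; return record("init"); }
static int adios_open(long long *io_handle, const char *group_name,
                      const char *filename, const char *mode)
{
    (void)group_name; (void)filename; (void)mode;
    *io_handle = 1;
    return record("open");
}
static int adios_write(long long io_handle, const char *field_name, const void *var)
{
    (void)io_handle; (void)field_name; (void)var;
    return record("write");
}
static int adios_close(long long io_handle) { (void)io_handle; return record("close"); }
static int adios_finalize(void) { return record("finalize"); }

/* Write one scalar and one array of a hypothetical "restart" group. */
int write_restart(int nx, const double *temperature)
{
    long long io_handle = 0;

    if (adios_init("config.xml") != 0) return -1;
    if (adios_open(&io_handle, "restart", "restart.bp", "w") != 0) return -1;

    /* Scalar first, then the array it sizes (this ordering is an assumption). */
    adios_write(io_handle, "nx", &nx);
    adios_write(io_handle, "temperature", temperature);

    adios_close(io_handle); /* the data is handed off for transfer here */
    return adios_finalize();
}
```

After one call to write_restart, call_log holds the sequence init, open, write, write, close, finalize, which is the order in which the required routines are described above.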

BP File read commands (for an MxN read of BP formatted data):

  • adios_fopen (handle, filename, comm)
  • adios_fclose (handle)
  • adios_inq_file (handle, file_info)
  • adios_print_fileinfo (file_info)
  • adios_init_fileinfo (file_info)
  • adios_free_fileinfo (file_info)
  • adios_print_groupinfo (group_info)
  • adios_init_groupinfo (group_info)
  • adios_free_groupinfo (group_info)
  • adios_gopen (handle, group_handle, group_name)
  • adios_inq_group (group_handle, group_info)
  • adios_get_var (group_handle, varname, var_buffer, start, readsize, timestep)
  • adios_inq_var (group_handle, varname, type, ndim, is_timebased, dims)
  • bp_type_to_string (type)


XML file format and elements. Each element is shown in the form <element-name attr1 attr2 ...>, followed by descriptions of its attributes. The elements are laid out in the order and nesting they would have in an XML document.

<adios-config host-language> - root element for the entire file

  • host-language - [optional]. Default "Fortran". Either "Fortran" or "C". This is an indicator for MPI handle conversion.

<adios-group name coordination-communicator coordination-var host-language time-index> - a grouping element for a datatype used for a write operation (such as a restart or diagnostics data set)

  • name - the name used to select this type from within the code
  • coordination-communicator - [optional] the name of the var that contains the communicator used for coordinated writes
  • coordination-var - [optional] the name of the var that can be used to perform the grouping/coordination downstream from the compute nodes
  • host-language - [optional]. Default "Fortran". Either "Fortran" or "C". This is an indicator for MPI handle conversion.
  • time-index - [optional]. Default none. The name of the variable that indicates the progress of time in the source code. This is used to organize the process groups into sets that logically belong to the same parallel output.

<global-bounds dimensions offsets> - [optional] enclosing var element(s) within a global-bounds specifies how those var(s) map into a global space. Use the coordination-* attributes of the adios-group to collate the vars into a single whole.

  • dimensions - the global array sizes for each dimension. Follows the same standard as the var dimension (below)
  • offsets - the offset the enclosed var(s) should have in this global space
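
How a locally held block maps into the global space can be sketched as follows. This is an illustration under stated assumptions rather than the library's implementation: indices are zero-based, storage is row-major (C ordering), and the helper name global_linear_index is hypothetical.

```c
#include <stddef.h>

/* Map an index inside a local block to its linear position in the global
 * array. Assumes zero-based indices and row-major (C) ordering; a Fortran
 * host would use column-major ordering instead.
 * Returns (size_t)-1 if the index falls outside the global bounds. */
size_t global_linear_index(int ndim,
                           const size_t *global_dims, /* "dimensions" */
                           const size_t *offsets,     /* "offsets"    */
                           const size_t *local_index)
{
    size_t linear = 0;
    for (int d = 0; d < ndim; d++) {
        size_t g = offsets[d] + local_index[d]; /* position in global space */
        if (g >= global_dims[d]) return (size_t)-1;
        linear = linear * global_dims[d] + g;
    }
    return linear;
}
```

For a 4 x 6 global array and a block at offsets (2, 3), local element (1, 2) lands at global position (3, 5), which is linear index 3 * 6 + 5 = 23.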

<var name path type dimensions gname/> - non-vector data types

  • name - name of this element
  • path - HDF-5-style path
  • type - data type. Currently supported values (size): byte (1-byte), integer (4-byte), real (4-byte), string, real*8 (8-byte), double (8-byte), integer*4 (4-byte), integer*8 (8-byte), long (8-byte), real*4 (4-byte), complex (8-byte (2 reals)), complex double (16-byte (2 doubles)), and unsigned versions of the integer types (prepend "unsigned " to the type name).
  • dimensions - a comma separated list of numbers and/or names that correspond to var elements to determine the size of this item.
  • gname - [optional] Default none. Used by gpp to generate the proper source code expression for writing this variable.

</global-bounds>
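
The type names and sizes listed for the var element can be captured in a small lookup, sketched below. The function name var_type_size is hypothetical. Treating byte as one of the integer types that accept the "unsigned " prefix is an assumption, as is the exact, case-sensitive matching.

```c
#include <string.h>

/* Size in bytes of a var type name as listed above.
 * Returns 0 for "string" (no fixed size) and -1 for an unknown name.
 * Matching is exact and case-sensitive, which is a simplification. */
int var_type_size(const char *type)
{
    static const struct { const char *name; int size; int is_integer; } table[] = {
        {"byte", 1, 1},      {"integer", 4, 1},   {"integer*4", 4, 1},
        {"integer*8", 8, 1}, {"long", 8, 1},      {"real", 4, 0},
        {"real*4", 4, 0},    {"real*8", 8, 0},    {"double", 8, 0},
        {"complex", 8, 0},   {"complex double", 16, 0},
        {"string", 0, 0},
    };
    const char *prefix = "unsigned ";
    int is_unsigned = strncmp(type, prefix, strlen(prefix)) == 0;
    if (is_unsigned) type += strlen(prefix);

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(type, table[i].name) == 0) {
            /* Only integer types have unsigned versions (byte is assumed
             * to count as an integer type here). */
            if (is_unsigned && !table[i].is_integer) return -1;
            return table[i].size;
        }
    }
    return -1;
}
```

For example, var_type_size("unsigned integer") gives 4 and var_type_size("complex double") gives 16, while "unsigned real" is rejected because real is not an integer type.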

<attribute name path type var value/>

  • name - name of the attribute
  • path - HDF-5-style path of the element (var) or group to which this attribute is attached
  • type - [optional] Default="string". data type of this attribute. See the type of the var element for a complete list of supported types.
  • var - [optional] var value name this value will be provided through.
  • value - [optional] value for the attribute.

Either var or value must be provided, but not both.
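
The rule can be expressed as a one-line check. The function name is hypothetical, and representing a missing XML attribute as NULL is an assumption of this sketch.

```c
#include <stddef.h>

/* Returns 1 when exactly one of var and value is supplied, else 0.
 * A missing XML attribute is represented by NULL (an assumption). */
int attribute_source_is_valid(const char *var, const char *value)
{
    return (var != NULL) != (value != NULL);
}
```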

The mesh elements below are experimental and not supported at this time:

<mesh type time-varying>

  • type - this changes the expected contents and must be one of these four values (expected contents): "uniform" (dimensions, origin, spacing), "rectilinear" (dimensions, coordinates-multi-var or coordinates-single-var), "structured" (nspace, dimensions, points-single-var or points-multi-var), or "unstructured" (points, one or more of uniform-cells and mixed-cells).
  • time-varying - whether this mesh changes over time. Valid values are "yes" and "no". It defaults to "no". If the mesh does not vary, it should generally be written only the first time writes are done.

"uniform" <dimensions value/>

  • value - magnitude of the space to mesh

<origin value/>

  • value - origin of the space to mesh

<spacing value/>

  • value - spacing (size) of each mesh element

"rectilinear" <dimensions value/>

  • value - number of points in each dimension

<coordinates-single-var value/>

  • value - a single multi-dimensional array that lists the points for all dimensions

<coordinates-multi-var value/>

  • value - comma separated list of array vars that list the points for each dimension

"structured" <nspace value/>

  • value - number of dimensions in mesh

<dimensions value/>

  • value - count of points in each dimension

<points-single-var value/>

  • value - a single multi-dimensional array that lists the points for all dimensions

<points-multi-var value/>

  • value - comma separated list of array vars that list the points for each dimension

"unstructured" <points components number-of-points value/>

  • components - number of dimensions in each point
  • number-of-points - how many points will be provided
  • value - one dimensional array of values that will be interpreted in components-sized groups as coordinates. Numbered from 1
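
Reading the flat value array in components-sized groups can be sketched as follows. The helper name point_coordinates is hypothetical, and the sketch assumes that "Numbered from 1" means the first group of components values is point 1.

```c
#include <stddef.h>

/* Return a pointer to the coordinates of point `point_number` (numbered
 * from 1) inside the flat `values` array, or NULL if out of range.
 * The result points at `components` consecutive values. */
const double *point_coordinates(const double *values, int components,
                                int number_of_points, int point_number)
{
    if (point_number < 1 || point_number > number_of_points) return NULL;
    return values + (size_t)(point_number - 1) * (size_t)components;
}
```

With components = 3 and values = {0,0,0, 1,0,0, 0,1,0}, point 2 is (1, 0, 0) and point 3 is (0, 1, 0).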

<uniform-cells count data type/>

  • count - number of cells to look for in the value
  • data - a list of points that correspond to entries in the points value element. There are no shape entries in this list
  • type - the vtk cell shape to interpret the data using

<mixed-cells count data types/>

  • count - number of cells to look for in the value
  • data - a one dimensional integer list of point count and point lists for the cells
  • types - the list of the vtk cell shape types for interpreting the data
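
One plausible reading of the mixed-cells data layout is a point count for each cell followed by that many point numbers, repeated count times. The sketch below walks such a list and checks that it is consistent. The layout is an interpretation of the description above, and the function name is hypothetical.

```c
/* Walk a mixed-cells data list laid out as
 *   n0, p, p, ..., n1, p, p, ...
 * where each n is the point count of one cell, followed by that many
 * point numbers. Returns the total number of point references across
 * `count` cells, or -1 if the list is too short, has a negative point
 * count, or has entries left over after the last cell. */
long mixed_cells_total_points(const int *data, long data_len, long count)
{
    long pos = 0, total = 0;
    for (long c = 0; c < count; c++) {
        if (pos >= data_len || data[pos] < 0) return -1;
        long n = data[pos];
        if (n > data_len - pos - 1) return -1; /* point list is truncated */
        total += n;
        pos += 1 + n; /* skip the count entry and its point list */
    }
    return pos == data_len ? total : -1;
}
```

For the list {3, 1,2,3, 4, 2,3,4,5} with count = 2, the walk finds a three-point cell and a four-point cell, seven point references in total.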

</mesh>

<gname src/> [optional] - Provide source code to intersperse among the calls to adios_write. Multiple, carefully placed instances of this element can be used for conditionals or other operations.

  • src - the text to insert into the gpp generated file.

</adios-group>

<method group method priority iterations base-path>parameters</method> - maps a writing method to a data type, including any initialization. One or more of these should be provided for each adios-group. If more than one is provided, all will be used.

  • group - corresponds to a datatype specified earlier in the file
  • method - a string indicating the method to use. Currently supported values: MPI, PBIO, DART, POSIX, NULL (no io).
  • priority - [optional] a numeric priority for the IO methods to better schedule this write with others that may be pending currently
  • iterations - [optional] a number of iterations between writes of this type used to gauge how quickly this data should be evacuated from the compute node
  • base-path - [optional] the root path to use as a starting point for writes. This will be prepended to filenames, in most cases.
  • parameters - [optional] a string passed to the method for initialization.

</method>

<buffer size-MB free-memory-percentage allocate-time/> - internal buffer sizing and creation time

  • size-MB - the number of MB to allocate for buffering. Either size-MB or free-memory-percentage is required.
  • free-memory-percentage - the percentage of free RAM to allocate for buffering. Either size-MB or free-memory-percentage is required.
  • allocate-time - either 'now' or 'oncall' to indicate when the buffer should be allocated. 'oncall' will wait until the programmer decides that all memory needed for calculation has been allocated and will then call adios_allocate_buffer ()

</adios-config>

NOTES:

  • Name elements in the XML file are just strings. The only restrictions are that a name used in a dataset dimension must not contain a comma and must contain at least one non-numeric character. This makes it possible to use expressions as dimensions.
  • For non-unique names within an adios group, use a standard '/'-based path notation to differentiate which instance of the name should be referenced. This is supported throughout the parsing.
  • Parsing is case insensitive wherever possible. The only known restriction is that the open and close tags for an element must have the same case, or the XML parser will not recognize them as matching.
  • There is a mailing list for those interested in developments about this API available at [[1]]
  • The format of the .bp files is described on the Dataformat page.
  • A semi-public SVN repository of the code is available at ORNL. Please contact us for details about getting access.
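
The two restrictions on names used as dimensions follow naturally if the dimensions attribute is split on commas and any all-digit entry is taken as a literal size. The sketch below shows that reading. It is an illustration, not the library's parser, and the function names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* 1 if every character of the token is a digit (a literal size). */
static int is_numeric_token(const char *s, size_t len)
{
    if (len == 0) return 0;
    for (size_t i = 0; i < len; i++)
        if (!isdigit((unsigned char)s[i])) return 0;
    return 1;
}

/* Split a dimensions string such as "nx,ny,10" on commas and count how
 * many entries are literal numbers and how many are names. Returns the
 * total number of entries, or -1 if any entry is empty. Whitespace is
 * not trimmed in this sketch. */
int classify_dimensions(const char *dimensions, int *literals, int *names)
{
    int total = 0;
    *literals = *names = 0;
    const char *start = dimensions;
    for (;;) {
        const char *end = strchr(start, ',');
        size_t len = end ? (size_t)(end - start) : strlen(start);
        if (len == 0) return -1;
        if (is_numeric_token(start, len)) (*literals)++;
        else (*names)++;
        total++;
        if (!end) break;
        start = end + 1;
    }
    return total;
}
```

Under this reading, "nx,ny,10" has two names and one literal, and an expression such as "nx+1" counts as a name because it contains non-numeric characters.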