Hi all - I have been funded by a NumFOCUS small development grant to investigate what the most appropriate object is for sunpy to store timeseries data in. This comes with the context of
- sunpy’s current choice, pandas.DataFrame, having a number of drawbacks
- There not being a one-size-fits-all data format used in solar physics for timeseries data (compared to imaging data, where FITS is a relatively common standard)
- Historically less development work being done on sunpy’s support for timeseries data (compared to imaging data)
The first steps in my project are to identify user requirements and to identify possible options for storing our data. Later in the project I will bring these together, evaluating each option against the requirements and coming up with a recommendation for us to discuss as a community.
Based on the minimal engagement so far, my findings are below. Please do read them, and if you have any additional requirements for timeseries data, leave a comment below in this thread. I have also tried to list options for the container - again, please read and leave comments in this thread!
User requirements for a timeseries data container in sunpy
Requirement | Notes |
---|---|
Store data that is a function of time | This means the time column should be treated as the index or coordinates to the data, and be stored as a time-like type. |
Handle different time scales | Data can have times defined in a variety of different time scales (e.g. UTC, TAI) |
Store multi-dimensional data | Although time is a common index to timeseries data, it isn’t always the only one. As an example, velocity distribution functions measured in the solar wind are 4D datasets, with data as a function of time and three dimensions in velocity space. |
Handle time scales with leap seconds | Some time scales can contain timestamps that occur within a leap second. |
Store and use physical units with the data and any non-time indices | |
Store data in a format that can be used with scientific Python libraries | |
Support for storing out-of-memory datasets | |
Store metadata alongside actual data | |
Have a way to store an observer coordinate alongside the time index | |
Have an easy way to do common data manipulation tasks | e.g. interpolating, resampling, rebinning |
Have a way to combine multiple timeseries objects, and keep track of metadata | |
Ability to convert to other common time series objects (e.g. pandas.DataFrame) | |
Functionality for loading and saving out to common file formats | |
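To make the leap second requirement in the table a bit more concrete, here is a minimal sketch using only Python's standard-library datetime, which pandas.Timestamp subclasses. The helper name can_represent is made up for this illustration and is not part of any library; the date used is the leap second inserted at the end of 2016.

```python
from datetime import datetime


def can_represent(year, month, day, hour, minute, second):
    """Return True if datetime can hold this wall-clock time (hypothetical helper)."""
    try:
        datetime(year, month, day, hour, minute, second)
    except ValueError:
        # datetime only accepts seconds in the range 0..59.
        return False
    return True


# The last ordinary second of 2016 is fine.
ordinary = can_represent(2016, 12, 31, 23, 59, 59)

# A leap second was inserted at 2016-12-31T23:59:60 UTC.
leap = can_represent(2016, 12, 31, 23, 59, 60)

print(ordinary, leap)  # True False
```

By contrast, astropy.time.Time accepts a UTC string such as 2016-12-31T23:59:60, which is one reason the time type behind the index matters when comparing containers.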
Options for a timeseries data container
- astropy.timeseries.TimeSeries
- pandas.DataFrame
- xarray.DataArray (or xarray.Dataset)
- numpy.ndarray
- ndcube
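For reference when comparing the options, here is a rough sketch of how two of the requirements look with the current choice, pandas.DataFrame: a time index plus a common manipulation task (resampling). The column name flux, the sample values and the separate units dictionary are all invented for this example. A plain DataFrame has no built-in notion of physical units, so in this sketch they are tracked next to it by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements at a 10-second cadence covering 5 minutes.
times = pd.date_range("2020-01-01T00:00:00", periods=30, freq="10s")
df = pd.DataFrame({"flux": np.arange(30, dtype=float)}, index=times)

# Units are not stored in the DataFrame itself, so keep them in a
# separate mapping (an assumption made purely for this illustration).
units = {"flux": "W / m2"}

# Resample onto a 1-minute grid by averaging the samples in each bin.
resampled = df.resample("1min").mean()

print(resampled)
print(units)
```

Resampling is a one-liner here, but the units mapping is not updated or checked by any DataFrame operation, which is the kind of trade-off the requirements table is meant to capture.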