Plot OMNI dataset from sunpy

Sundodo · November 20, 2023, 11:50am

I am trying to plot various quantities between two specific dates from the Omni dataset. Here is an excerpt from my code:

import numpy as np
import matplotlib.pyplot as plt
from sunpy.net import Fido
from sunpy.net import attrs as a

def convert_to_datetime(year, day, hour,minute):
    return datetime(int(year), 1, 1) + timedelta(days=day - 1, hours=hour, minutes=minute)

  trange = a.Time(start_time, end_time)
  dataset = a.cdaweb.Dataset("OMNI_HRO2_1MIN")

  result = Fido.search(trange, dataset)

  files = Fido.fetch(result)
  cdf_data = cdflib.CDF(files[0])

  var_list = ['Epoch', 'YR', 'Day','HR', 'Minute','Vx','Vy','Vz','BX_GSE','BY_GSE','BZ_GSE','proton_density']

  data_dict = {}

  for var in var_list:
      data = cdf_data.varget(var)
      data_dict[var] = data

  data = pd.DataFrame(data_dict)
  data['Calculated_Datetime'] = data.apply(lambda row: convert_to_datetime(row['YR'], row['Day'], row['HR'], row['Minute']), axis=1)

  data.set_index('Calculated_Datetime', inplace=True)

  data = data[start_time:end_time]

  data['V'] = np.linalg.norm(np.vstack((data['Vx'], data['Vy'], data['Vz'])), axis=0)
  data['B'] = np.linalg.norm(np.vstack((data['BX_GSE'], data['BY_GSE'], data['BZ_GSE'])), axis=0)

  fig, ax = plt.subplots(len(Label.keys()), 1, figsize=figsize, sharex=True, )
  for ax, col in zip(ax, Label.keys()):
      ax.plot(data.index, data[col], linewidth=1,label='In-situ',linestyle='dotted',color='black')

The file downloads correctly, but the plot I obtain shows aberrant values for different quantities, resulting in vertical lines.

Could this be due to incorrect retrieval of the date (creation of my Calculated_Datetime column) ?

I should mention that I used to get a perfect curve when I created my DataFrame with data = heliopy.data.omni.hro2_1min(start_time, end_time).to_dataframe(), but now that heliopy is deprecated, I am turning to sunpy.

Thanks in advance

nabobalis · November 23, 2023, 1:14am

Hello @Sundodo,

So I ran this modified version:

import numpy as np
import matplotlib.pyplot as plt
from sunpy.net import Fido
from sunpy.net import attrs as a
from datetime import datetime, timedelta
import pandas as pd
import cdflib

def convert_to_datetime(year, day, hour,minute):
    return datetime(int(year), 1, 1) + timedelta(days=day - 1, hours=hour, minutes=minute)

start_time = datetime(2023, 7, 14, 0, 0)
end_time = datetime(2023, 7, 20, 0, 0)
trange = a.Time(start_time, end_time)
dataset = a.cdaweb.Dataset("OMNI_HRO2_1MIN")

result = Fido.search(trange, dataset)

files = Fido.fetch(result)
cdf_data = cdflib.CDF(files[0])

var_list = ['Epoch', 'YR', 'Day','HR', 'Minute','Vx','Vy','Vz','BX_GSE','BY_GSE','BZ_GSE','proton_density']

data_dict = {}

for var in var_list:
    data = cdf_data.varget(var)
    data_dict[var] = data

data = pd.DataFrame(data_dict)
data['Calculated_Datetime'] = data.apply(lambda row: convert_to_datetime(row['YR'], row['Day'], row['HR'], row['Minute']), axis=1)

data.set_index('Calculated_Datetime', inplace=True)

data = data[start_time:end_time]

data['V'] = np.linalg.norm(np.vstack((data['Vx'], data['Vy'], data['Vz'])), axis=0)
data['B'] = np.linalg.norm(np.vstack((data['BX_GSE'], data['BY_GSE'], data['BZ_GSE'])), axis=0)

figsize = (10, 10)
fig, ax = plt.subplots(len(data_dict.keys()), 1, figsize=figsize, sharex=True, )
for ax, col in zip(ax, data_dict.keys()):
    ax.plot(data.index, data[col], linewidth=1,label='In-situ',linestyle='dotted',color='black')

plt.show()

I get this plot (which is quite different):

I see the abnormal values for this time period that is return the cdflib reader, so unless that has a bug, I can’t see this as a problem with the code but as a data “problem”.

With these dates, does heliopy product a clean plot? If so, we should drill down into heliopy to see if it does any data cleaning.

Not really the most helpful response.

Sundodo · November 23, 2023, 7:59am

First of all, thank you for responding.

If I didn’t have the plot obtained with Heliopy, I would have thought like you that it was a data problem.

But here’s the plot obtained by acquiring the data via : data = heliopy.data.omni.hro2_1min(start_time, end_time).to_dataframe()

As you can see, it’s the same pattern, but without the outliers.

But finally, I found the difference :

In the CDF file, for each variable, there is the “FILLVAL” attribute which gives the value obtained when there is a measurement error. It is then sufficient to replace for each variable, this value with np.nan to obtain the same plot as HELIOPY.

    for var in var_list:
        data = cdf_data.varget(var)
        FILLVAL=float(cdf_data.varattsget(var)['FILLVAL'])
        try:
            data[data==FILLVAL]=np.nan
        except ValueError:
            pass
        data_dict[var] = data

nabobalis · November 23, 2023, 2:50pm

Ah, I forgot about the fill value. That would explain it.

Thanks for finding that.

Topic		Replies	Views
Fetching daily OMNI dataset from sunpy SunPy sunpy , question	2	354	March 27, 2023
Plotting RHESSI Hard X-Ray emission light curve SunPy sunpy , astropy , fits , question	8	57	August 21, 2024
ScaleValueError when trying to create sunpy plot from Hinode XRT image SunPy question	7	366	July 8, 2022
AttributeError: 'datetime.datetime' object has no attribute 'astype' SunPy sunpy , question	4	450	December 6, 2023
Overplotting HEK feature/event polygons on a map (sunpy gallery example) SunPy sunpy , question	3	370	March 30, 2023

Plot OMNI dataset from sunpy

Related topics