Plot OMNI dataset from sunpy

I am trying to plot various quantities between two specific dates from the Omni dataset. Here is an excerpt from my code:

import numpy as np
import matplotlib.pyplot as plt
from sunpy.net import Fido
from sunpy.net import attrs as a

def convert_to_datetime(year, day, hour,minute):
    return datetime(int(year), 1, 1) + timedelta(days=day - 1, hours=hour, minutes=minute)

  trange = a.Time(start_time, end_time)
  dataset = a.cdaweb.Dataset("OMNI_HRO2_1MIN")

  result = Fido.search(trange, dataset)

  files = Fido.fetch(result)
  cdf_data = cdflib.CDF(files[0])

  var_list = ['Epoch', 'YR', 'Day','HR', 'Minute','Vx','Vy','Vz','BX_GSE','BY_GSE','BZ_GSE','proton_density']

  data_dict = {}

  for var in var_list:
      data = cdf_data.varget(var)
      data_dict[var] = data

  data = pd.DataFrame(data_dict)
  data['Calculated_Datetime'] = data.apply(lambda row: convert_to_datetime(row['YR'], row['Day'], row['HR'], row['Minute']), axis=1)

  data.set_index('Calculated_Datetime', inplace=True)

  data = data[start_time:end_time]

  data['V'] = np.linalg.norm(np.vstack((data['Vx'], data['Vy'], data['Vz'])), axis=0)
  data['B'] = np.linalg.norm(np.vstack((data['BX_GSE'], data['BY_GSE'], data['BZ_GSE'])), axis=0)

  fig, ax = plt.subplots(len(Label.keys()), 1, figsize=figsize, sharex=True, )
  for ax, col in zip(ax, Label.keys()):
      ax.plot(data.index, data[col], linewidth=1,label='In-situ',linestyle='dotted',color='black')

The file downloads correctly, but the plot I obtain shows aberrant values for different quantities, resulting in vertical lines.

Could this be due to incorrect retrieval of the date (creation of my Calculated_Datetime column) ?

I should mention that I used to get a perfect curve when I created my DataFrame with data = heliopy.data.omni.hro2_1min(start_time, end_time).to_dataframe(), but now that heliopy is deprecated, I am turning to sunpy.

Thanks in advance

Hello @Sundodo,

So I ran this modified version:

import numpy as np
import matplotlib.pyplot as plt
from sunpy.net import Fido
from sunpy.net import attrs as a
from datetime import datetime, timedelta
import pandas as pd
import cdflib

def convert_to_datetime(year, day, hour,minute):
    return datetime(int(year), 1, 1) + timedelta(days=day - 1, hours=hour, minutes=minute)

start_time = datetime(2023, 7, 14, 0, 0)
end_time = datetime(2023, 7, 20, 0, 0)
trange = a.Time(start_time, end_time)
dataset = a.cdaweb.Dataset("OMNI_HRO2_1MIN")

result = Fido.search(trange, dataset)

files = Fido.fetch(result)
cdf_data = cdflib.CDF(files[0])

var_list = ['Epoch', 'YR', 'Day','HR', 'Minute','Vx','Vy','Vz','BX_GSE','BY_GSE','BZ_GSE','proton_density']

data_dict = {}

for var in var_list:
    data = cdf_data.varget(var)
    data_dict[var] = data

data = pd.DataFrame(data_dict)
data['Calculated_Datetime'] = data.apply(lambda row: convert_to_datetime(row['YR'], row['Day'], row['HR'], row['Minute']), axis=1)

data.set_index('Calculated_Datetime', inplace=True)

data = data[start_time:end_time]

data['V'] = np.linalg.norm(np.vstack((data['Vx'], data['Vy'], data['Vz'])), axis=0)
data['B'] = np.linalg.norm(np.vstack((data['BX_GSE'], data['BY_GSE'], data['BZ_GSE'])), axis=0)

figsize = (10, 10)
fig, ax = plt.subplots(len(data_dict.keys()), 1, figsize=figsize, sharex=True, )
for ax, col in zip(ax, data_dict.keys()):
    ax.plot(data.index, data[col], linewidth=1,label='In-situ',linestyle='dotted',color='black')

plt.show()

I get this plot (which is quite different):

I see the abnormal values for this time period that is return the cdflib reader, so unless that has a bug, I can’t see this as a problem with the code but as a data “problem”.

With these dates, does heliopy product a clean plot? If so, we should drill down into heliopy to see if it does any data cleaning.

Not really the most helpful response.

First of all, thank you for responding.

If I didn’t have the plot obtained with Heliopy, I would have thought like you that it was a data problem.

But here’s the plot obtained by acquiring the data via : data = heliopy.data.omni.hro2_1min(start_time, end_time).to_dataframe()

As you can see, it’s the same pattern, but without the outliers.

But finally, I found the difference :

In the CDF file, for each variable, there is the “FILLVAL” attribute which gives the value obtained when there is a measurement error. It is then sufficient to replace for each variable, this value with np.nan to obtain the same plot as HELIOPY.

    for var in var_list:
        data = cdf_data.varget(var)
        FILLVAL=float(cdf_data.varattsget(var)['FILLVAL'])
        try:
            data[data==FILLVAL]=np.nan
        except ValueError:
            pass
        data_dict[var] = data

1 Like

Ah, I forgot about the fill value. That would explain it.

Thanks for finding that.