Downloading HMI intensity grams to match a set of HARPS-N data timestamps

I am attempting to download a suite of hmi.Ic_45s files to match a set of HARPS-N solar observation timestamps. Is there a simple Python script that has already been written to do this? I have started writing one myself, but I keep getting 70+ HMI files when I only have 64 HARPS-N files. Am I pushing a cart sideways up a hill? Cheers

Hello @Petra63,

I am not sure if this will help, but if you just need to download HMI files for that specific series, you should be able to use sunpy's Fido to get that. I forget whether the Virtual Solar Observatory directly carries that series, but you can use Finding and Downloading Data from JSOC — sunpy 6.1.1 documentation to get that dataset for sure.

Cheers

Hi there and thank you for the response

I am basically following all the best practices from the sunpy docs: specifying the series, restricting to the "continuum" segment, picking a single camera, filtering by QUALITY == 0, and retrieving only the record closest in time.

The link does not reveal anything new for my situation beyond that: I think I am covering everything. Why am I getting more than 64 test files?

My guess is it's typically one of:

  1. Multiple files in the same row (two continuum versions).
  2. Some leftovers from a previous run.
  3. Reprocessed vs. near-realtime versions both being offered by the JSOC.

I'll try printing the file URLs that I fetch for each HARPS time, to see exactly where the "extra 12" come from, and then I can decide how to filter them out or ignore them. Having said that, if anyone has any other suggestions that would let me download exactly one HMI file for each HARPS-N observation, I would be thrilled. :-)
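In case it helps with the counting, here is a rough stdlib-only sketch of how I could pair each downloaded HMI time with its nearest HARPS-N time and flag the leftovers (the function name `find_extras` and the 22.5 s tolerance, half the 45 s HMI cadence, are just my assumptions):

```python
from datetime import datetime

def find_extras(harps_iso, hmi_iso, tol_s=22.5):
    """Pair each HMI time with its nearest HARPS-N time. Any HMI time
    whose nearest HARPS-N time is already claimed, or is further away
    than tol_s seconds, is flagged as a candidate 'extra'."""
    harps = [datetime.fromisoformat(t) for t in harps_iso]
    claimed, extras = set(), []
    for h in hmi_iso:
        ht = datetime.fromisoformat(h)
        diffs = [abs((ht - t).total_seconds()) for t in harps]
        i = min(range(len(diffs)), key=diffs.__getitem__)
        if diffs[i] > tol_s or i in claimed:
            extras.append(h)
        else:
            claimed.add(i)
    return extras

print(find_extras(
    ["2015-09-01T09:39:45", "2015-09-01T09:45:00"],
    ["2015-09-01T09:39:45", "2015-09-01T09:40:30", "2015-09-01T09:45:10"],
))  # → ['2015-09-01T09:40:30']
```

Running that over the times of the 70+ files should show which 12 don't pair one-to-one.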
Cheers

Can you provide the code you are running?

I will see if I can work out what is going on.

#!/usr/bin/env python3

"""
This script matches HMI Ic_45s data (camera=2, continuum only) to HARPS-N observations
for 2015-09-01, ensuring QUALITY == 0. It then writes a CSV of matched times and files.

It also disables most logging and progress bars:

  • sets the drms, sunpy, and parfive loggers to ERROR
  • uses a parfive Downloader with progress=False for silent downloads
"""

import os
import glob
import csv
import logging

import numpy as np
import sunpy.map
from astropy.time import Time
import astropy.units as u
from sunpy.net import Fido, attrs as a
from parfive import Downloader

# -----------------------------------------------------------------------------
# Suppress some of the logging output
# -----------------------------------------------------------------------------

logging.getLogger("drms").setLevel(logging.ERROR)
logging.getLogger("sunpy").setLevel(logging.ERROR)
logging.getLogger("parfive").setLevel(logging.ERROR)

# -----------------------------------------------------------------------------
# Sentinel check: exit if already processed
# -----------------------------------------------------------------------------

base_folder = os.path.expanduser("~/Desktop/Solar_Comparison_Test/")
results_folder = os.path.join(base_folder, "Results")
sentinel_path = os.path.join(results_folder, "matching_done.txt")
if os.path.exists(sentinel_path):
    print("Matching has already been performed. Exiting.")
    raise SystemExit(0)

# -----------------------------------------------------------------------------
# 1) Folder setup
# -----------------------------------------------------------------------------

harps_folder = os.path.join(base_folder, "HARPSN/")
hmi_folder = os.path.join(base_folder, "HMI/")
os.makedirs(results_folder, exist_ok=True)

# -----------------------------------------------------------------------------
# 2) Load the HARPS-N timestamps in the test folder
# -----------------------------------------------------------------------------

harps_files = sorted(glob.glob(os.path.join(harps_folder, "r.HARPN.2015-09-01T*.fits")))
harps_times = []
for file in harps_files:
    # Filenames look like r.HARPN.<date>T<HH-MM-SS>_S1D_A_filtered.fits;
    # pull out the stamp and restore ":" in the time part.
    stamp_str = os.path.basename(file).split("r.HARPN.")[-1].split("_S1D_A_filtered.fits")[0]
    date_part, time_part = stamp_str.split("T")
    time_part = time_part.replace("-", ":")
    iso_str = f"{date_part}T{time_part}"
    harps_times.append(iso_str)
harps_times = Time(harps_times, format="isot", scale="utc")
print(f"Loaded {len(harps_times)} HARPS-N timestamps.")

# -----------------------------------------------------------------------------
# 3) Convenience step: count how many files are already local before
#    fetching new data from JSOC.
# -----------------------------------------------------------------------------

hmi_files = sorted(glob.glob(os.path.join(hmi_folder, "*.fits")))
print(f"Found {len(hmi_files)} HMI FITS files in {hmi_folder}.")
if len(hmi_files) == 0:
    print("No local HMI files found. We'll fetch from JSOC as needed.")

hmi_times = []
for hf in hmi_files:
    try:
        hmi_map = sunpy.map.Map(hf)
        hmi_times.append(Time(hmi_map.date))  # Already UTC
    except Exception:
        pass
print(f"Loaded {len(hmi_times)} existing local HMI intensitygrams (if any).")

# -----------------------------------------------------------------------------
# 4) Prepare a downloader that doesn't show progress bars
# -----------------------------------------------------------------------------

dl = Downloader(max_conn=1, progress=False)

# -----------------------------------------------------------------------------
# 5) Query & download HMI for each HARPS-N timestamp
# -----------------------------------------------------------------------------

jsoc_email = "my email address"  # Your JSOC-registered email
time_window_half = 5 * u.second
matched_info = []  # Will store (HARPS-N time, downloaded HMI filename)

def parse_jsoc_time(tstr):
    """
    Convert something like '2015.09.01_09:39:45_TAI'
    into '2015-09-01T09:39:45'.
    """
    tstr = tstr.replace("_TAI", "")
    date_part, time_part = tstr.split("_")
    date_part = date_part.replace(".", "-")
    return f"{date_part}T{time_part}"

for t in harps_times:
    # ±5 second window
    window = a.Time(t - time_window_half, t + time_window_half)

    # Only specify recognized prime keys:
    result = Fido.search(
        window,
        a.jsoc.Series("hmi.Ic_45s"),
        a.jsoc.Notify(jsoc_email),
        a.jsoc.Segment("continuum"),
        a.jsoc.PrimeKey("CAMERA", 2)
    )

    if len(result) == 0:
        # No data found in that 10 s window
        continue

    table = result[0]
    if "QUALITY" in table.colnames:
        # Filter out rows with nonzero QUALITY
        table = table[table["QUALITY"] == 0]
        if len(table) == 0:
            continue

    # Convert T_REC or Start Time to astropy Time.
    # Note: T_REC is a TAI string, so parse it as TAI and convert to UTC
    # (TAI leads UTC by 36 s in 2015, comparable to the 45 s cadence).
    try:
        times_iso = [parse_jsoc_time(x) for x in table["T_REC"]]
    except KeyError:
        times_iso = [parse_jsoc_time(x) for x in table["Start Time"]]
    query_times = Time(times_iso, format="isot", scale="tai").utc

    # Pick the single closest row
    idx = np.argmin(np.abs((query_times - t).sec))
    single_result = table[idx:idx + 1]

    # Download that single row's data (could be 1 or 2 files)
    files = Fido.fetch(
        single_result,
        path=os.path.join(hmi_folder, "{file}"),
        downloader=dl
    )

    # -- Here's the key part: if multiple files come down, keep only the first --
    if len(files) > 1:
        files = [files[0]]

    if len(files) > 0:
        matched_info.append((t.isot, os.path.basename(files[0])))

print(f"Completed downloads. Found matches for {len(matched_info)} HARPS times.")

# -----------------------------------------------------------------------------
# 6) Save matched timestamps to CSV
# -----------------------------------------------------------------------------

csv_path = os.path.join(results_folder, "matched_times.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["HARPS-N Time (UTC)", "Downloaded HMI File"])
    for harps_time, hmi_file in matched_info:
        writer.writerow([harps_time, hmi_file])
print(f"Saved matched timestamps to {csv_path}")

# -----------------------------------------------------------------------------
# 7) Create sentinel file
# -----------------------------------------------------------------------------

with open(sentinel_path, "w") as f:
    f.write("Matching completed successfully.\n")
print(f"Sentinel file created at {sentinel_path}")

Hello @Petra63,

So I don't have any HARPS-N data to run the full code. When you say HARPS-N, do you mean the NRT data?

One issue with downloading via Fido and the JSOCClient is that slicing the results table and passing that into Fido.fetch will result in the original, unsliced table being downloaded. We do not issue any warning about this, so it's very common to be caught out by, or even forget about, this behaviour.

You will have to match the query exactly before you download if you just want a subset of files.
So if you do need to filter by quality, you should pass that into the query:

result = Fido.search(
    a.Time(
        Time("2015-09-01T00:00:00"),
        Time("2015-09-01T00:01:00"),
    ),
    a.jsoc.Series("hmi.Ic_45s"),
    a.jsoc.Notify(jsoc_email),
    a.jsoc.Segment("continuum"),
    a.jsoc.PrimeKey("CAMERA", 2),
    a.jsoc.Keyword("QUALITY") == 0,
)

I am not sure if this will fix your original issue.