Questions about writing a scraper client for XRT data

I’m trying to write a fairly basic scraper client for downloading XRT data using Fido. One thing I’m unclear about is how to handle parts of the baseurl that users don’t need to specify. For instance, I doubt that users will specify the seconds or tenths of a second, which are part of the filenames. For Level 1 data I came up with this as the baseurl:

```
r"https://umbra.nascom.nasa.gov/hinode/xrt/level1/%Y/%m/%d/%H00/L1_XRT%Y%m%d_%H%M(\d){2}.(\d){1}.fits"
```
The `(\d){2}` and `(\d){1}` are supposed to correspond to the seconds and tenths of a second and match any digits. Is that right?
Also, is there more to registering the instrument than adding an import statement in __init__.py under sunpy/net/dataretriever?


Hello @jdslavin,

> The `(\d){2}` and `(\d){1}` are supposed to correspond to the seconds and tenths of a second and match any digits. Is that right?

That is my understanding of how this is meant to work.
Specifically, `(\d){2}` should match exactly two digits and `(\d){1}` exactly one digit.
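
If it helps, here is a minimal sketch of checking that pattern with sunpy's `Scraper` directly (untested; note that I've escaped the literal dots as `\.`, since an unescaped `.` in a regex matches any character):

```python
from sunpy.net.scraper import Scraper
from sunpy.time import TimeRange

# Pattern from the question, with the literal dots escaped (\.) so they
# match only "." rather than any character.
pattern = (r'https://umbra.nascom.nasa.gov/hinode/xrt/level1/'
           r'%Y/%m/%d/%H00/L1_XRT%Y%m%d_%H%M(\d){2}\.(\d){1}\.fits')

scraper = Scraper(pattern)
# List the files matched over an example one-hour window.
print(scraper.filelist(TimeRange('2021-01-01 00:00', '2021-01-01 01:00')))
```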

We are working on making this process simpler and more documented with a GSoC project.

> Also, is there more to registering the instrument than adding an import statement in __init__.py under sunpy/net/dataretriever?

If you are adding this client to sunpy, then yes.

If you are writing this client externally, then you do not need to modify any code in sunpy to add the client, or to register instruments.
The client should work automatically as long as your client inherits from GenericClient.
You do need to import your Python library before you import Fido, which we demonstrate in the sunpy_soar client example (GitHub - sunpy/sunpy-soar: A sunpy plugin for accessing data in the Solar Orbiter Archive (SOAR).), so that all the import machinery in Python works.
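
For example, an external client might look roughly like this. This is only a sketch using the Level 1 URL pattern from your question; the `pattern` fields and attribute values are illustrative and untested:

```python
from sunpy.net import attrs as a
from sunpy.net.dataretriever import GenericClient


class XRTClient(GenericClient):
    # Scraper pattern for the files (dots escaped so they match literally).
    baseurl = (r'https://umbra.nascom.nasa.gov/hinode/xrt/level1/'
               r'%Y/%m/%d/%H00/L1_XRT%Y%m%d_%H%M(\d){2}\.(\d){1}\.fits')
    # parse-style pattern used to pull times out of the matched URLs.
    pattern = ('{}/level1/{year:4d}/{month:2d}/{day:2d}/{}/'
               'L1_XRT{:8d}_{hour:2d}{minute:2d}{second:2d}.{}.fits')

    @classmethod
    def register_values(cls):
        # Registers attribute values so queries like a.Instrument('XRT') work.
        return {a.Instrument: [('XRT', 'X-Ray Telescope on Hinode')],
                a.Source: [('Hinode', 'Hinode mission')],
                a.Provider: [('SDAC', 'Solar Data Analysis Center')]}
```

As long as the module defining this class is imported before you use Fido, `Fido.search` should pick it up automatically.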

We have a guide here that might be helpful: Adding new data sources to Fido — sunpy 6.0.dev12+gead811939 documentation

Thanks for your response. The link to the sunpy plugin for SOAR data is helpful. The link to “Adding new data sources to Fido…” that you included gives me a 404.

Ah, we merged a pull request that renamed some URLs in preparation for 5.0rc1.

The updated URL is now: Adding new data sources to Fido — SunPy 4.1.dev1287+g728e6767f documentation

It looks like the original question has already been answered, but I have a larger question about the intended use case for your XRT scraper client.

As you probably know, the VSO, which Fido searches by default, includes L0 and L1 XRT data from SAO. Though this data source has been offline for some time, my understanding is that the SAO source in the VSO should be coming back online very soon (~1 week). Is the intention for this Scraper client to serve the same type of data as the VSO? Would there be some advantage to using this Scraper approach over the VSO?

Hi Will,

I didn’t know that the VSO would be coming back online so soon. There is still a use case, depending on which data archives will be served by the VSO: additional datasets such as grade maps and composite synoptic images.

I don’t care which package the scraper is under, sunpy or xrtpy. Basically we want summer REU students to be able to download the data they need. So the timeline is somewhat short.

Jon

My understanding is that there are plans to add the L2 data products as well, so these would also be available through the VSO and thus through Fido without the need for any additional clients. It looks like these L2 products are not available at the SDAC URL you posted, but they are available via the CfA: XRT: Data Products

I can see the need for a short turnaround time. Hopefully the VSO will have the data source back online by then. If not, a Scraper client should be a useful stopgap.

There are plans to host all the data at the CfA under xrt.cfa.harvard.edu. So to download that (programmatically) we’d need a scraper client, wouldn’t we?

I believe the VSO will be serving that same data so once it comes back online, you should be able to programmatically access that via Fido in sunpy without any additional clients (since searching the VSO is supported by default).
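
For example, once the source is back, a plain Fido search along these lines should find XRT data with no extra client (the time range is just an illustration):

```python
from sunpy.net import Fido, attrs as a

# Fido searches the VSO by default, so no scraper client is needed
# once the SAO source is serving XRT data again.
result = Fido.search(a.Time('2021-01-01 00:00', '2021-01-01 01:00'),
                     a.Instrument('XRT'))
print(result)

files = Fido.fetch(result)  # download the matched files
```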