Pyvo TAP result to csv (via astropy.Table?)

I have a TAP table in a VO service that I want to dump to csv. For this I was basically thinking of
using the following code snippet (I replaced the VO service by a public one for demonstration purposes):
=========8<========
import pyvo

tapurl = "https://vo.astron.nl/__system__/tap/run/tap"  # 'The VO @ ASTRON' TAP service
qry = "select * from tgssadr.img_main"
tap_service = pyvo.dal.TAPService(tapurl)
tap_result = tap_service.run_sync(qry)

tbl = tap_result.to_table()
tbl.write("my.csv")

=========8<========

However the last command returns:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/astropy/table/connect.py", line 127, in __call__
    registry.write(instance, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/astropy/io/registry.py", line 570, in write
    writer(data, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/astropy/io/ascii/connect.py", line 26, in io_write
    return write(table, filename, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/astropy/io/ascii/ui.py", line 842, in write
    writer.write(table, output)
  File "/usr/local/lib/python3.9/site-packages/astropy/io/ascii/fastbasic.py", line 207, in write
    self._write(table, output, {'fill_values': [(core.masked, '')]})
  File "/usr/local/lib/python3.9/site-packages/astropy/io/ascii/fastbasic.py", line 184, in _write
    writer.write(output, header_output, output_types)
  File "astropy/io/ascii/cparser.pyx", line 1125, in astropy.io.ascii.cparser.FastWriter.write
TypeError: unhashable type: 'MaskedArray'

This confuses me because, first of all, the Table is not masked (tbl.masked is set to False). I guess this may be a bug, but I don’t really know if it is an issue with pyvo or astropy.Table. Hopefully someone here has seen this issue before, or can point me to another way to do this.

That’s odd, but I’d guess the format returned by pyvo does play a role here.
What do tbl.dtype and tbl.mask return (it looks like the table was created from np.ma.MaskedArray rather than by setting mask=True, in which case it can apparently still have a mask even though tbl.masked says False)?
Still, simple masked arrays of both types should be supported even by the fast writer (on Astropy 4.3.1); but does tbl.write("my.csv", fast_writer=False) work?
I suspect this could rather be a problem with a more complex column dtype, which might not be supported by csv.
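For reference, a quick way to hunt for the suspicious columns (a rough sketch, assuming the tbl from the snippet above):

import numpy as np

for name in tbl.colnames:
    col = tbl[name]
    # object-dtype or multi-dimensional columns are the usual suspects
    print(name, col.dtype, col.ndim, np.ma.is_masked(col))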

That is strange. I do see that tbl.has_masked_columns is True, and that there are several masked columns in the table. That said, I don’t know why a masked column wouldn’t be written to csv. I would expect the masked (null) values to be written as empty strings.

Seeing FastWriter in the stack trace made me curious, so I tried writing without it. This seemed to work for me, so might be a good workaround:
tbl.write('my.csv', fast_writer=False)

I’m not an expert on masked columns, but having inconsistent behavior between the fast and slow writers is surely a bug, and again, I don’t see any reason why masked columns shouldn’t be written out.

As per my tests mentioned above, simple cases like

import numpy as np
from astropy.table import Table
tbl = Table([np.ma.MaskedArray(range(4), mask=[0, 1, 0, 0]), np.arange(3, 7)])
tbl.write('my.csv', fast_writer='force')

are working just fine (writing indeed empty fields for the masked elements), and generally the writer should automatically switch to the non-fast version where necessary, as long as it is not explicitly forced with fast_writer='force'.
At least that is the behaviour for the table reader.

The TypeError: unhashable type: 'MaskedArray' is in fact an ancient issue –
and precisely the reason why self._write() above uses the astropy.io.ascii.core.masked instance instead – no idea why that would not be working in this case…
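(For completeness: with the non-fast writer that same sentinel can be passed explicitly via the documented fill_values mechanism – a sketch, with 'NULL' as an arbitrary placeholder:

from astropy.io import ascii
# write masked entries as 'NULL' instead of the default empty string
ascii.write(tbl, 'my.csv', format='csv', fast_writer=False, fill_values=[(ascii.masked, 'NULL')])
)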

As far as I can see the error results from the multi-dimensional columns like

>>> tap_result['pixelsize']
masked_array(data=[masked_array(data=[2903, 2903, 1, 1],
                                mask=[False, False, False, False],
                          fill_value=999999,
                               dtype=int32)                       ,
                   masked_array(data=[2903, 2903, 1, 1],
                                mask=[False, False, False, False],
                          fill_value=999999,
                               dtype=int32)                       ,
                   masked_array(data=[2903, 2903, 1, 1],
                                mask=[False, False, False, False],
                          fill_value=999999,
                               dtype=int32)                       , ...

– since all those columns are already converted to dtype object on .to_table(), handling them becomes rather inconvenient (the data in each column are a np.ma.MaskedArray, each element of which is in turn a np.ma.MaskedArray itself!). Still not sure what happens internally in the writer, but I would recommend saving it as an ECSV instead, which simply works as

tbl.write("my.ecsv")

and mostly preserves the internal structure of those columns. In CSV they could only be written as strings like "[2903, 2903, 1, 1]" – in fact, without all that weird wrapping, trying to write as CSV should already raise an error like

column(s) with dimension > 1 cannot be be written with this format, try using 'ecsv' (Enhanced CSV) format.
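A round trip through ECSV then gives the arrays back as arrays (a minimal sketch, assuming a reasonably recent astropy):

from astropy.table import Table
tbl.write('my.ecsv', overwrite=True)
tbl2 = Table.read('my.ecsv')
print(tbl2['pixelsize'][0])   # still an array, not a string as in CSV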

The csv output is interesting for those fields: with
tbl.write('my.csv', fast_writer=False)
it yields strings like
[2903 2903 1 1]
for pixelsize (space-delimited values inside square brackets).

This is semi-reasonable, I suppose, but it definitely doesn’t round-trip well. I.e., reading that csv back into a table not only fails to give the same structure (the new column is a string type), but the rows also look messed up due to newlines present in the output of the coverage column objects. Aside from round-tripping, the csv might have been usable by another script, but those newlines (another bug, I think) make it less than usable.

Yes, I think that’s the reason why write(..., format='csv') normally doesn’t accept ndim>1 columns but tries to guide people to more sensible formats like ECSV instead.
Without knowing much about the background of the pyvo implementation it feels a bit like it tries very hard to bypass that check by packing the higher-dimensional fields deep inside object arrays.
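If plain CSV is nevertheless required, one could expand such columns into scalar ones before writing – a rough sketch, not a pyvo feature, assuming each cell of the column holds a fixed-length array like pixelsize above:

import numpy as np

def expand_array_column(tbl, name):
    # replace an object/array column by one scalar column per element
    width = len(tbl[name][0])
    for i in range(width):
        tbl[f'{name}_{i}'] = [np.asarray(cell)[i] for cell in tbl[name]]
    tbl.remove_column(name)

expand_array_column(tbl, 'pixelsize')
tbl.write('my.csv', fast_writer=False)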

Ah OK, that makes sense to me. ndim>1 seems like a good thing for the csv writer to check.

pyvo uses the astropy.io.votable reader to parse the results coming from VO services like this. VOTables are allowed to have arrays of primitives in individual table cells, and the parser puts them into the object arrays. From what you’re saying, it seems like that might not be the best way to store those values in the astropy Table.

At least the astropy.io.votable reader and writer are consistent with respect to the VOTable format. So tbl.write('out.xml', format='votable') results in a valid VOTable with those array values represented correctly.
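E.g. (a quick sketch):

from astropy.table import Table
tbl.write('out.xml', format='votable')
tbl_rt = Table.read('out.xml', format='votable')   # array-valued cells come back intact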
