Strange results with the Lomb-Scargle algorithm on a simple sine wave

Hello,

Sorry in advance for the (certainly numerous) grammar and spelling errors; English is not my first language.

Long story short, my problem is that I’m getting strange, not really usable results when using LombScargle on a simple, noise-free sine wave. Every time I try, the periodogram I obtain shows multiple spikes, and the highest one is never the correct one. I would like to understand why, and how to solve this issue.

For a little context, I am trying to evaluate the results of an implementation of the Lomb-Scargle function, and I planned to use the one in astropy as a point of reference.

I should mention that I do not have any background in astrophysics or signal theory.

I’ve tried running the LombScargle function on a simple, noise-free sine wave with a period of 250 seconds. I created a sample with a constant step between values, and every time I run the code, the periodogram shows one spike at the correct frequency and two false spikes at frequencies of 0.5 and 1.0, and of course the 0.5 and 1.0 spikes are higher than the correct one.

The signal is a simple sine wave, sin(2*pi/250 * t), with one data point every two seconds.

I’ve run the default LombScargle from the documentation on it with 125 data points, and this is the result:

If I’ve understood this algorithm correctly, the highest spike is supposed to be the frequency of the function, and on a function as simple as the one I used, I was expecting to see a single, very high spike.

What am I doing wrong to get such a bad result? Is there a parameter I forgot to set, or am I using this tool completely backwards?

The code I used is as follows:

```python
import numpy as np
from astropy.timeseries import LombScargle

# 125 evenly spaced samples (one every 2 s) of a sine with a 250 s period
t = np.arange(0, 250, 2.0)
y = np.sin(2 * np.pi / 250 * t)
frequency, power = LombScargle(t, y).autopower()
```

I’ve tried this on astropy 6.0.0 and 6.0.1

I’ve also tried the different methods for autopower (scipy, slow and cython), and adding more data points, up to 1000, always with the same step between them, with no luck.

I have not tried to set the frequency range manually, since I have no real understanding of the internal workings of this algorithm.

Thanks in advance

This looks good to me, not bad. It’s roughly what I would expect it to look like. There are a few things going on here, and most of them have to do with the fact that, in computers, arrays only have a finite number of entries.

Why do you have three peaks at 0, 0.5 and 1?
Remember that Lomb-Scargle just looks for periodicity. You have a noise-free sine curve with peaks at pi, 3pi, 5pi, etc. That means that you ALSO have a period that’s twice as long, with peaks at pi, 5pi, 9pi, 13pi, … That period twice as long has half the frequency (1 / 2 = 0.5). You can make a similar argument for other periods, and you’ll find more peaks.

When you zoom into your peaks, they look like two very close peaks (at about 0.99 and 1.01 or so). Why?
Well, you only have 125 evenly spaced points. Because numbers have a finite accuracy in a computer, the LS algorithm can’t put points in between, so sometimes the top of a peak is missed just because there are not enough points there. That just happens with real data.

So the result is not bad; it looks like it should.

Thank you for your answer. I think I understand this a bit better, but there are still some points I don’t quite grasp.

If I’ve understood correctly what you said, on a sine of period P, the algorithm will find periodicity at P, 2P, 3P, etc. along the points I gave it. But in my example, I only gave one period to the function. My sine has a period of 250 s, and I created exactly 125 points with a 2 s step, so all my points are between t = 0 and t = 250. How can the algorithm find more than one period in a single oscillation? Shouldn’t these periods at least be less clear than the main one?

I’ve also run some more tests with a higher number of points, to limit the risk of skipping the top of a spike, and this time with two oscillations of the function. This way, all the points are always between t = 0 and t = 500 s, and this makes the “supplementary spikes” slide to the right:

  • with 500 points (250 points per period, 1 s step), the spikes are between 0.5 and 1
  • with 1000 points (500 points per period, 0.5 s step), they move towards 1.5 and 3
    etc…

I observe the same thing with a sine of period 1 and 200 points evenly spaced between t = 0 and t = 2.
Here is the data sample used:
[Figure: sample graph]

The resulting periodogram shows the same shape. It is just more extreme, with the first spike at 1 and the supplementary ones around 50 and 100.

Furthermore, adding more points does not get rid of the slight gap in the middle of the two supplementary spikes.

If I’ve understood correctly, since I always have two oscillations of the sine in the data points and I do not change its period, then if the spikes were only due to the period repeating itself, I should see roughly the same spikes in every test. The number of points should not influence the result that way, should it?

Does the Lomb-Scargle algorithm pick up the spacing between the points as a period of the signal? Is that even possible?

Is my understanding of this whole process still completely wrong?

I’m sorry to bother you again like this, and I hope I haven’t completely misunderstood your answer, but I think I’m still a bit lost.

I think you are doing the right thing: try out a few cases where you know what answer to expect, to see what the pitfalls are.
In real life, there are always numerical issues. For example, with a regular grid of 0, 0.01, 0.02, 0.03, … you will never exactly hit the maximum of the sine curve, which is at 3.1415… / 2 and not at a multiple of 0.01. Lomb-Scargle will try frequencies like 1/0.01, 1/0.02, 1/0.03, etc., but not 1/3.14159265…
That’s life! In a real dataset you don’t know the period, and thus you probably also don’t hit it exactly.
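The regular-grid example can be made concrete with a small NumPy sketch (the 0.01 step and the 0 to 2π range are just the numbers used in the example): sampling sin(x) on 0, 0.01, 0.02, … never lands exactly on π/2, so the largest sample falls slightly short of the true maximum of 1. A frequency grid misses the exact top of a periodogram peak in the same way.

```python
import numpy as np

x = np.arange(0, 2 * np.pi, 0.01)   # regular grid 0, 0.01, 0.02, ...
samples = np.sin(x)

i = np.argmax(samples)
print(f"closest grid point to pi/2: x = {x[i]:.2f} (pi/2 = {np.pi / 2:.7f})")
print(f"largest sample: {samples[i]:.9f}, shortfall: {1 - samples[i]:.2e}")
```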

In practice, you probably know enough to use Lomb-Scargle. If you want a deeper explanation, I recommend an article by Jake VanderPlas.

Ok that’s reassuring.

I’ll continue to work with this then, and check the references you’ve sent as soon as I can.

Thank you very much for your answers and for your time.

Have a nice day.

In addition to all of the above, I think you are also seeing multiple peaks as a result of the windowing function (having only a finite number of periods in your data).

Yes, that’s what I understood too after reading the article.

There also seems to be a limit on the interesting frequencies that depends on the step. That seems quite logical when I think about it: it is not a good idea to search for periods with less than one data point per period. But I will need more time to understand exactly why this makes the “comb function” appear in the result, and not just nothing or random noise.
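The comb can be seen without any signal at all by computing the spectral window of the sampling times, i.e. the normalized power |Σₙ exp(−2πi f tₙ)|² / N² of the observation times alone. A minimal NumPy sketch (the function name `spectral_window` is hypothetical, not an astropy API): for regular sampling with step Δt it equals 1 at every multiple of 1/Δt, and it is tiny at the Nyquist frequency 1/(2Δt), halfway between the teeth. In the classical Fourier picture, the transform of the sampled sine is the true transform convolved with this window, which places a copy of the ±f0 peaks around every tooth.

```python
import numpy as np

def spectral_window(t, freq):
    """Normalized power of the sampling pattern alone: |sum_n exp(-2*pi*i*f*t_n)|^2 / N^2."""
    phases = np.exp(-2j * np.pi * np.outer(freq, t))   # shape (len(freq), len(t))
    return np.abs(phases.sum(axis=1)) ** 2 / len(t) ** 2

dt = 2.0
t = np.arange(0, 250, dt)                  # same sampling as in the question
freq = np.array([0.0, 0.25, 0.5, 1.0])     # 0.25 Hz is the Nyquist frequency 1/(2*dt)
w = spectral_window(t, freq)
for f, value in zip(freq, w):
    print(f"f = {f:.2f} Hz: window = {value:.6f}")
```

Because the sampling here is perfectly regular, all the teeth are equally tall; irregular sampling tends to break this pattern up, which is one reason Lomb-Scargle is popular for unevenly sampled data.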

I’ll continue reading the article and try to re-learn integration and all that.

Thank you all again for your help.