
TRAPPIST-1: Interplanetary Listening With IBM Cloud


TRAPPIST-1 System (Credit: NASA/JPL-Caltech/R. Hurt, T. Pyle (IPAC))

TRAPPIST-1 is a planetary system located 12 parsecs (39 light-years) from the Solar System, near the ecliptic in the constellation of Aquarius. At least seven planets orbit its star, which is 12 times smaller than the Sun and only slightly larger than Jupiter.

The first interplanetary listening observation by the Search for Extraterrestrial Intelligence (SETI) Institute, targeting two TRAPPIST-1 planets (e & f) in conjunction with Earth, was conducted on April 6, 2017. The hypothesis: an advanced civilization could establish radio-frequency communications between planets e & f, and if these planets line up with Earth (a conjunction), we could listen in on that interplanetary transmission with a sensitive radio telescope. The question: can we do fast, cloud-based signal processing on a very large (five-terabyte) dataset of high-rate TRAPPIST-1 measurements?

My team’s new research brings SETI to the cloud, using the IBM Data Science Experience to compute signal spectrograms and autocorrelation plots and look for signs of possible transmissions during a TRAPPIST-1 conjunction. I’m presenting our research this month at the SKA Driven Big Data conference.

The TRAPPIST-1 System

The lone star at the center of the TRAPPIST-1 system is an ultra-cool red dwarf that hosts at least seven temperate planets, more than any other known planetary system. The figure below shows a simulation of the TRAPPIST-1 system at an alignment between Earth, planet f (orange), and planet e (green). The approximate alignment occurs at 15h53 UTC, but the hypothetical interplanetary signal from planet f (orange) is likely occluded by planet e (green), so we measure for some time before and after the conjunction.


TRAPPIST-1 e-f-Earth conjunction event perpendicular to Earth (Credit: Jon Richards/SETI Institute)

In reality, we don’t necessarily know the exact moment when the transmitter associated with planet f lines up with the receiver on planet e; the transmitter and receiver could even be in orbit around those planets. Having said that, we can put some limits on a reasonable range of angles that may be interesting. Since we know the orbits of e and f, we should be able to work out the angle between the line joining the two planets and our line of sight.

Crudely, with reasonable assumptions, we would expect this angle to be no more than a few degrees. For example, say that you’re willing to accept signals as potentially interesting if the transmitter is within an orbital radius Rf of the center of planet f, and the receiver is within an orbital radius Re of the center of planet e. A reasonable upper limit for Re and Rf might be the orbital radius of Earth’s moon, Rf = Re = Rmoon.
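
As a crude sketch of that bound, assume the transmitter sits up to Rf from the center of planet f and the receiver up to Re from the center of planet e, on opposite sides of the line joining the planet centers. A straight transmitter-to-receiver path can then tilt away from that line by at most about arctan((Re + Rf)/d), where d is the distance between the planets at that moment. The function name and the example separation are hypothetical; the real d depends on where e and f are in their orbits.

```python
import math

R_MOON_KM = 384_400.0  # approximate mean Earth-Moon distance, used for Re = Rf = Rmoon

def tolerance_angle_deg(r_tx_km: float, r_rx_km: float, separation_km: float) -> float:
    """Largest tilt (degrees) of a transmitter->receiver path away from the planet-centre line.

    Crude geometry (an assumption): transmitter within r_tx_km of planet f's centre,
    receiver within r_rx_km of planet e's centre, planets separation_km apart.
    """
    return math.degrees(math.atan((r_tx_km + r_rx_km) / separation_km))

# Hypothetical separation of 10 million km between planets e and f.
print(tolerance_angle_deg(R_MOON_KM, R_MOON_KM, separation_km=10_000_000.0))
```

With that hypothetical separation the bound is a few degrees (about 4.4°); planets that are closer together give a wider tolerance.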

For such a choice, can we work out the range of times when we might have seen an alignment? In other words, do we have to be within one second of the time of best alignment, or could the window of allowed times be an hour long? Probably something in between, though for the purposes of this analysis the center of the conjunction is approximated as 15h53 UTC.
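
One way to turn an angular tolerance into a window of times is to step both planets along their orbits and record when the angle between the planet f to planet e direction and the direction toward Earth stays within tolerance. The sketch below assumes circular, coplanar orbits viewed edge-on, with +x pointing toward Earth; the orbital radii, periods and starting phases are placeholders to be filled from published ephemerides, and all function names are hypothetical.

```python
import numpy as np

def orbit_xy(a, period, phase0, t):
    """(x, y) on a circular orbit of radius a; +x points toward Earth (assumption)."""
    theta = phase0 + 2.0 * np.pi * np.asarray(t, dtype=float) / period
    return np.stack([a * np.cos(theta), a * np.sin(theta)], axis=-1)

def misalignment_deg(r_tx, r_rx):
    """Angle (degrees) between the transmitter->receiver direction and the +x direction."""
    d = np.asarray(r_rx, dtype=float) - np.asarray(r_tx, dtype=float)
    return np.degrees(np.arctan2(np.abs(d[..., 1]), d[..., 0]))

def alignment_window(t, angle_deg, tol_deg):
    """First and last time of the contiguous run around the best alignment with angle <= tol_deg."""
    t = np.asarray(t)
    angle_deg = np.asarray(angle_deg)
    ok = angle_deg <= tol_deg
    best = int(np.argmin(angle_deg))
    if not ok[best]:
        return None  # the modeled geometry never comes within tolerance on this grid
    lo, hi = best, best
    while lo > 0 and ok[lo - 1]:
        lo -= 1
    while hi < len(ok) - 1 and ok[hi + 1]:
        hi += 1
    return t[lo], t[hi]

def scan(a_f, p_f, phase_f, a_e, p_e, phase_e, t, tol_deg):
    """Move f (transmitter) and e (receiver) over the grid t and return the allowed window."""
    angles = misalignment_deg(orbit_xy(a_f, p_f, phase_f, t), orbit_xy(a_e, p_e, phase_e, t))
    return alignment_window(t, angles, tol_deg)
```

Calling `scan(...)` on a fine time grid around 15h53 UTC returns the first and last times of the allowed window, or `None` if the modeled geometry never comes within tolerance. `arctan2` is used instead of `arccos` so that small angles near best alignment stay accurate.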

TRAPPIST-1 measurement period from 15h34 to 19h17 UTC on 6 April 2017 (Credit: Jon Richards/SETI Institute)

The Measurements

The SETI Institute used the Allen Telescope Array (ATA) at the Hat Creek Radio Observatory to listen in on the hypothetical TRAPPIST-1 e/f interplanetary broadband transmission, measuring simultaneously at 2.84 GHz and 8.2 GHz from 15h34 UTC to 19h17 UTC on 6 April 2017. These frequencies are used for spacecraft communications, but they are unusual for SETI, whose observations normally look around 1 GHz. Note that there is a rich set of possible interplanetary transmissions between the TRAPPIST-1 planets, some of which could also be present in the measurements, such as between the second-innermost planet c and planet f (orange).

The ATA backend has a 104 MHz bandwidth, and two correlators are used: correlator 1 at 8.2 GHz with a 0.43° field of view, and correlator 2 at 2.84 GHz with a full field of view of 1.2°. The first beam is focused to a field of view of 0.012° at 8.2 GHz and the second beam to 0.035° at 2.84 GHz, each with 0.1 GHz bandwidth. Together, the two beams produced about five terabytes of data (roughly 2.5 TB per beam) over about six and a half hours of combined beam measurement.
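
The quoted fields of view follow the usual diffraction scaling, θ ≈ 1.22 λ/D, so they shrink in proportion to frequency. As a rough check, the sketch below assumes a 6.1 m diameter for an individual ATA dish and the 1.22 factor for a circular aperture; with those assumptions it gives about 1.2° at 2.84 GHz and about 0.42° at 8.2 GHz, close to the quoted 1.2° and 0.43°. The frequency ratio (8.2/2.84 ≈ 2.9) also matches the ratio of the focused beams, 0.035°/0.012°. The function name is hypothetical.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum
ATA_DISH_M = 6.1           # assumed diameter of a single ATA dish

def beamwidth_deg(freq_hz: float, aperture_m: float, k: float = 1.22) -> float:
    """Diffraction-limited beamwidth theta ~ k * wavelength / aperture, in degrees."""
    wavelength_m = C_M_PER_S / freq_hz
    return math.degrees(k * wavelength_m / aperture_m)

for f_hz in (2.84e9, 8.2e9):
    print(f"{f_hz / 1e9:.2f} GHz: {beamwidth_deg(f_hz, ATA_DISH_M):.2f} deg")
```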

One missing piece is a sensitivity limit for the ATA or, rather, the minimum transmitter strength on planet f that we could possibly see. This is a straightforward calculation if you know the sensitivity of the ATA beam. A good rule of thumb is that in a single sample, the point-source flux density equivalent to the system noise is around 1000 Jy, or 10^-23 W m^-2 Hz^-1. 1000 Jy sounds like a lot, but averaging a million samples reduces the noise level to 1 Jy, because the noise falls as the square root of the number of samples (1000 / sqrt(10^6) = 1). For one of the 67 MB blocks of data, which holds tens of millions of samples, that number is probably even smaller.
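
The square-root rule can be written out directly. This minimal sketch assumes the averaged samples are statistically independent (an idealization) and takes a segment to hold at most 67,043,328 / 2 ≈ 33.5 million complex samples, since each 8-bit real/imaginary pair occupies two bytes and packet headers are ignored. The function name is hypothetical.

```python
import math

JY_IN_SI = 1e-26  # 1 jansky = 1e-26 W m^-2 Hz^-1

def averaged_noise_jy(single_sample_jy: float, n_samples: int) -> float:
    """Noise-equivalent flux density after averaging n independent samples: falls as 1/sqrt(N)."""
    return single_sample_jy / math.sqrt(n_samples)

print(averaged_noise_jy(1000.0, 1_000_000))  # rule-of-thumb example from the text: 1.0 Jy
print(1000.0 * JY_IN_SI, "W m^-2 Hz^-1")     # 1000 Jy expressed in SI units

# Upper bound on complex samples in one 67,043,328-byte segment (2 bytes per sample).
samples_per_segment = 67_043_328 // 2
print(averaged_noise_jy(1000.0, samples_per_segment))
```

Under the independence assumption a full segment comes out at roughly 0.17 Jy.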

Once we know the minimum detectable flux for our observations, we can work backwards to the strength of the transmitter using just the inverse square law and the distance from Earth to TRAPPIST-1. There are a number of subtleties to consider here, so this calculation is omitted from this article.
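
For orientation only, the inverse-square step on its own looks like this. The sketch deliberately ignores the subtleties mentioned above (for example beaming, how the signal bandwidth compares with the channel width, and duty cycle); it simply converts a flux-density limit over an assumed bandwidth into the power an isotropic transmitter would need at a given distance. The function name, the bandwidth and the example flux limit are hypothetical.

```python
import math

PARSEC_M = 3.0857e16  # metres per parsec
JY_IN_SI = 1e-26      # W m^-2 Hz^-1 per jansky

def isotropic_power_w(flux_density_jy: float, bandwidth_hz: float, distance_m: float) -> float:
    """Isotropic transmitter power whose received flux equals the limit (inverse-square law only)."""
    flux_w_per_m2 = flux_density_jy * JY_IN_SI * bandwidth_hz  # flux density x bandwidth = W/m^2
    return 4.0 * math.pi * distance_m ** 2 * flux_w_per_m2     # spread over a sphere of radius d

# Hypothetical: a 1 Jy limit over a 1 Hz-wide signal at TRAPPIST-1's distance of about 12 pc.
print(isotropic_power_w(1.0, 1.0, 12 * PARSEC_M))
```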

The Cloud

The IBM Data Science Experience (DSX) features a deployment of Apache Spark on IBM Cloud with an optimized, high-speed object store interconnect to enable big data analytics in the cloud. This Spark compute cluster is accessed through the DSX Jupyter notebook interface to perform a variety of signal processing operations on the SETI measurements.

The two ATA beam measurements are approximately 2.5 TB each and were uploaded from the ATA backend directly to the object store, at speeds faster than standard TCP/IP transfers. Segmenting these measurements Hadoop-style into same-sized segments is ideal for signal processing computations, such as Fourier transforms and autocorrelations, using Spark parallelization.
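
A minimal sketch of that Hadoop-style split: fixed-size byte ranges that tile one recording, so each range can be read from the object store and processed by a different worker without coordination. The 67,043,328-byte segment size is the one used in this analysis; the function name is hypothetical, and how the real pipeline issued its reads is not shown here.

```python
SEGMENT_BYTES = 67_043_328  # segment size used in this analysis

def segment_ranges(total_bytes: int, segment_bytes: int = SEGMENT_BYTES) -> list[tuple[int, int]]:
    """(offset, length) byte ranges tiling a recording; only the last range may be shorter."""
    return [(start, min(segment_bytes, total_bytes - start))
            for start in range(0, total_bytes, segment_bytes)]

ranges = segment_ranges(38_000 * SEGMENT_BYTES)
print(len(ranges), ranges[1])
```

38,000 full segments of this size add up to about 2.55 × 10^12 bytes, the roughly 2.5 TB per beam.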

DSX was recently rebranded as “IBM Watson Studio,” which adds APIs and more deep-learning goodness to the platform.

The Computations

The Hadoop-style segments are 67,043,328 bytes each and fit together sequentially to represent one continuous measurement. Each segment consists of separate 4,160-byte packets, each with a 64-bit header containing a timestamp. The packet body contains alternating 8-bit real and imaginary voltage components, which are stored in a complex numpy array together with the timestamp of the segment's starting packet.
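
A minimal decoding sketch follows. It takes the 64-bit (8-byte) header to be a single big-endian unsigned timestamp and the remaining 4,152 bytes to be interleaved signed 8-bit real/imaginary pairs; the byte order, the signedness and the function name are assumptions, not the documented ATA format. Because 67,043,328 is not a whole multiple of 4,160, a real reader must also handle packets that straddle segment boundaries; this sketch simply drops a trailing partial packet.

```python
import numpy as np

PACKET_BYTES = 4160  # packet size stated in the text
HEADER_BYTES = 8     # 64-bit header; assumed to hold only a big-endian timestamp

def parse_packets(raw: bytes) -> tuple[int, np.ndarray]:
    """Return (timestamp of the first packet, complex64 voltages) for a buffer of packets."""
    n_packets = len(raw) // PACKET_BYTES  # a trailing partial packet is ignored
    if n_packets == 0:
        raise ValueError("buffer holds no complete packet")
    packets = np.frombuffer(raw, dtype=np.uint8, count=n_packets * PACKET_BYTES)
    packets = packets.reshape(n_packets, PACKET_BYTES)
    timestamps = packets[:, :HEADER_BYTES].copy().view(">u8").ravel()
    # Body bytes reinterpreted as signed 8-bit values, then paired as (real, imaginary).
    iq = packets[:, HEADER_BYTES:].copy().view(np.int8).reshape(-1, 2).astype(np.float32)
    voltages = (iq[:, 0] + 1j * iq[:, 1]).astype(np.complex64)
    return int(timestamps[0]), voltages
```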

A Fast Fourier Transform (FFT) is the primary operation performed on these 67 MB segments, in order to directly inspect signal frequency content over time. Each FFT bin spans the sampling bandwidth divided by the FFT length, so for a 104 MHz sampling bandwidth the FFT length needs to be at least of the same order of magnitude to resolve frequencies at the hertz level; here it is chosen as 67 million FFT elements, giving bins of about 1.6 Hz. If the FFT length were 10,000 elements, for example, each bin would span 10.4 kHz, far too coarse to adequately measure power at finer frequency intervals.
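
The resolution argument is just the bin width, sampling bandwidth divided by FFT length. The sketch below shows that arithmetic together with a numpy FFT of complex voltages; it assumes complex baseband sampling at the stated bandwidth, and the helper names are hypothetical.

```python
import numpy as np

def frequency_resolution_hz(sample_rate_hz: float, n_fft: int) -> float:
    """Width of one FFT bin: sampling bandwidth divided by FFT length."""
    return sample_rate_hz / n_fft

def power_spectrum(voltages: np.ndarray) -> np.ndarray:
    """Power per bin of complex voltages, with zero frequency shifted to the centre."""
    return np.abs(np.fft.fftshift(np.fft.fft(voltages))) ** 2

def peak_frequency_hz(voltages: np.ndarray, sample_rate_hz: float) -> float:
    """Baseband frequency of the strongest bin."""
    freqs = np.fft.fftshift(np.fft.fftfreq(len(voltages), d=1.0 / sample_rate_hz))
    return float(freqs[np.argmax(power_spectrum(voltages))])

print(frequency_resolution_hz(104e6, 10_000))      # 10400.0 Hz per bin
print(frequency_resolution_hz(104e6, 67_000_000))  # about 1.55 Hz per bin
```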

For each 67 MB segment we calculate the average total power, as well as a row of the spectrogram and autocorrelation waterfalls, to obtain complementary views of signal activity. The spectrogram shows the basic power distribution over frequency and time, whereas autocorrelation plots can reveal more specific signal-pattern activity.

Using Spark’s MapReduce, all three computations ran at roughly 1 second per 67 MB segment, so processing the 38,000 segments (2.5 TB) of one beam took approximately 11 hours on the IBM Data Science Experience.
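
The independence of the segments is what makes this a map-reduce job. The sketch below is a local stand-in, not Spark: a thread pool maps a per-segment function over the segments (here, average power) and a reduce step combines the results (here, finding the loudest segment). Threads only illustrate the structure; on a real cluster the map would run on Spark executors. The last line checks the quoted run time: 38,000 segments at about 1 s each is about 10.6 hours. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def mean_power(segment):
    """Map step: average |v|^2 over one segment of complex voltages."""
    return sum(abs(v) ** 2 for v in segment) / len(segment)

def process(segments, workers=4):
    """Map every segment independently, then reduce to the index of the loudest segment."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        powers = list(pool.map(mean_power, segments))  # results come back in segment order
    loudest = reduce(lambda i, j: i if powers[i] >= powers[j] else j, range(len(powers)))
    return powers, loudest

def estimated_hours(n_segments, seconds_per_segment):
    """Wall-clock estimate for processing segments at a fixed rate."""
    return n_segments * seconds_per_segment / 3600.0

print(estimated_hours(38_000, 1.0))  # about 10.56 hours
```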

The Power and the Waterfalls

The average total power of the approximately 34 million complex voltage values in each segment is calculated for all segments, giving the power of the radio signals as a function of time in the graph below. We’re looking for significant increases in radio signal power around the time of the conjunction (indicated by ef-Earth), but there are no notable power fluctuations directly around the conjunction time.
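
A vectorized version of the per-segment power follows, plus one possible way to flag "significant increases" automatically. The robust threshold (median plus k times the median absolute deviation) and the value of k are assumptions for illustration, not a criterion described in this article; the function names are hypothetical.

```python
import numpy as np

def average_power(voltages) -> float:
    """Average total power of one segment: mean of |v|^2 = re^2 + im^2 over its voltages."""
    v = np.asarray(voltages)
    return float(np.mean(v.real ** 2 + v.imag ** 2))

def flag_power_jumps(powers, k: float = 5.0) -> np.ndarray:
    """Indices of segments whose power exceeds median + k * MAD (hypothetical robust threshold)."""
    p = np.asarray(powers, dtype=float)
    med = np.median(p)
    mad = np.median(np.abs(p - med))  # median absolute deviation
    return np.flatnonzero(p > med + k * mad)
```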

Each of the approximately 38,000 segments of 67 MB represents one row in the waterfall of signal content over frequency and time. Each row is a new timestep, and the sequence of timesteps forms the waterfall, which depicts the signal power at different frequencies over time.
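
Stacking one power spectrum per segment gives the waterfall directly. Because a 67-million-point spectrum is far wider than a plot, this sketch averages adjacent bins down to `n_channels` columns per row; that channel averaging and the function name are assumptions for display, not necessarily what the original notebooks did.

```python
import numpy as np

def waterfall(segments, n_channels: int) -> np.ndarray:
    """One row per segment (time), n_channels columns (frequency, zero in the middle)."""
    rows = []
    for seg in segments:
        power = np.abs(np.fft.fftshift(np.fft.fft(seg))) ** 2
        usable = len(power) - len(power) % n_channels  # drop bins that do not fill a channel
        rows.append(power[:usable].reshape(n_channels, -1).mean(axis=1))
    return np.vstack(rows)
```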

The pairwise waterfall plot shows the frequency content of the two beams, synchronized along the horizontal time axis. Since this waterfall spans more than three hours, the time axis is split into multiple sections so it can be visualized at a higher time resolution. Higher pixel values in a waterfall indicate relatively more signal power at that specific frequency and time, so we inspect the waterfall during the critical conjunction period for evidence of such higher-power signals.

The waterfall plots show significant signal activity at 15h35–15h40 (beam 1), 17h02–17h17 (both beams), around 17h19–17h30 (both beams), around 17h47–17h51 (both beams), and around 18h20–18h23 (beam 2). During the approximate conjunction time, there is signal activity at f ≈ 25 MHz (f/104 MHz ≈ 0.24) in beam 1, with a pattern of activity around that frequency generally appearing from 15h46 to 16h27. All of these relatively high-power signals are most likely of man-made origin, as they are in the same power range as other likely radio-frequency interference (RFI).

The Autocorrelations

Autocorrelation is the correlation of a signal with a delayed copy of itself as a function of the delay, so fixed-period signal components become pronounced in the result. It can amplify more complex, multipart signal components in a way that a normal FFT-based spectrogram cannot, so the autocorrelation complements the FFT.
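
A common way to compute the autocorrelation of a long segment is through the FFT (the Wiener-Khinchin theorem): take the inverse FFT of the power spectrum. Zero-padding to twice the length makes the circular result equal to the ordinary linear autocorrelation. This is a sketch of that standard technique, not necessarily the exact code used in this study; the function name is hypothetical.

```python
import numpy as np

def autocorrelation(x: np.ndarray) -> np.ndarray:
    """Linear autocorrelation r[k] = sum_m x[m + k] * conj(x[m]) for delays k = 0 .. N-1."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    spectrum = np.fft.fft(x, 2 * n)                # zero-pad to 2N to avoid wrap-around
    return np.fft.ifft(np.abs(spectrum) ** 2)[:n]  # inverse FFT of the power spectrum

# A pulse train with period 4: delay 4 lines the pulses up again, delay 2 does not.
acf = autocorrelation(np.tile([1.0, 0.0, 0.0, 0.0], 16))
print(abs(acf[4]) > abs(acf[2]))
```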

As with the waterfall plot, the pairwise autocorrelation plot for the two synchronized beams is given for an extended timeline broken into separate rows. Note that only the positive delays (y-axis) of the symmetric autocorrelation are used, and the 8-bit values are scaled between 0 and 1 in the colormap. We are looking for thin vertical stripes that stand out from the background, which could indicate fixed-period signal components at that timestep and thus a possible sign of a deliberate communications signal.
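
Turning one autocorrelation into one plotted row can be sketched as follows: keep only the positive delays (dropping delay 0, which is always the maximum), take magnitudes, and scale to [0, 1] for the colormap. Min-max scaling per row is an assumption here, since the article says only that values are scaled between 0 and 1; the function name is hypothetical.

```python
import numpy as np

def autocorr_row(acf) -> np.ndarray:
    """Positive-delay magnitudes of a non-negative-delay autocorrelation, min-max scaled to [0, 1]."""
    mag = np.abs(np.asarray(acf)[1:])  # drop delay 0
    span = mag.max() - mag.min()
    return (mag - mag.min()) / span if span > 0 else np.zeros_like(mag)
```

Stacking one such row per segment, in time order, gives the autocorrelation waterfall.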

Some interesting autocorrelations occur at 16h35–16h46 (beam 1), 17h20–17h31 (beam 1), around 17h50 (beam 2), and around 18h21 (both beams). Note the periodic-component autocorrelations in beam 2 around 17h50 in the side figure. Unfortunately, no obvious autocorrelation patterns are seen directly around the conjunction time of 15h53 with this plot calibration.


What if radio transmissions are made between planets e & f in the potentially habitable TRAPPIST-1 system, and the ATA is sensitive enough to hear them during a TRAPPIST-1 ef-Earth conjunction?

This interplanetary listening hypothesis was investigated during the 6 April 2017 TRAPPIST-1 ef-Earth conjunction in an observation conducted by the SETI Institute with its ATA radio telescope, and the subsequent signal analysis was performed on the IBM Data Science Experience. The Spark MapReduce capability of DSX was leveraged for fast FFT-based signal processing, giving an initial inspection of the spectral and autocorrelation content of 5 TB of recordings.

While there does appear to be signal activity in parts of the measurement, nothing apparent stands out at either 2.84 GHz or 8.2 GHz directly around the conjunction at 15h53 UTC on 6 April 2017. Since these frequencies are also used for communications with our own satellites, the signal activity that is seen is likely human-made and/or radio-frequency interference.

TRAPPIST-1 is a planetary system rich in opportunities for exploring SETI hypotheses, because of its numerous other interplanetary conjunctions and planet/star occultations. TRAPPIST-1 will be a focal point of SETI research for many years to come.

Authors: Francois Luus (IBM Research – Africa), Adam Cox (IBM Watson Data Platform), Gerald Harp & Jon Richards (SETI Institute), Graham Mackintosh
