Talk:Automatic spectral and variability classification

Overview
For the future of VSOP, we have to implement some sort of automatic spectral classification. Numerous tools exist and some are even described in the literature. However, to our knowledge, none has so far entered 'routine operations', which is what VSOP will strive for. To start out, the following people will be contacted about their particular work/tool:
 * Carlos Allende-Prieto, Univ.Texas (by Thomas)
 * Coryn Bailer-Jones, MPI Heidelberg (by Julia)
 * Alejandra Recio-Blanco, Nice (by Cedric)
 * Luis Manuel Sarro, Dpt. Artificial Intelligence, UNED, Madrid (by Pedro)
 * Hans Bruntt, Univ.Sydney (by Thomas -- note: added by request of Pedro after meeting)

We need to collect information so that we can compare the tools and make a decision on which one(s) (if any) to use for VSOP. For that purpose, the following points must be addressed:
 * what resolutions can it work with?
 * what wavelength ranges?
 * what spectral types?
 * what will installation require (e.g. other non-free software)?
 * what would the author want in return?
 * what data will the software yield?
 * can it handle special cases (e.g. it flunks when it sees emission lines)?

As the info gets collected, please put it in here.

Carlos's tool
(actually discussed first with Lars Koesterke, a student of Carlos, who is the one implementing this, later with both Carlos and Lars.) Very positive feedback; would have no problem providing the code. The code is working at the STELLA facilities in Tenerife (Strassmeier et al.).
 * works with any resolution: they calculate a library/grid of spectra from their models. Must have a separate grid for each combination of resolution and wavelength range. Grid-producing code is part of the package.
 * works with any wavelength range: tested using small areas of the spectrum. Is apparently slow when used on larger wavelength range.
 * works with any spectral type; the limiting factor is the goodness of the stellar model. Discussed best approach: define spectral regions appropriate for all spectral types, make the software use all of them, then discard those that are way off (chi^2 statistics/goodness of fit is provided)
 * no extra software required. However, accepts only ascii spectra at RV=0 as of now, so need an outer layer for conversion (simple). Also, needs normalization of spectral region to be used. A normalization tool may be part of the package, but has not been tested for real. In any case, normalization need only be done for the (supposedly small) intervals used for the fitting.
 * we left it open whether they'd like to join VSOP (not variable star people), join on an "outside collaborator" status (likely), or just want to reference our use of it (unlikely)
 * yields Teff, logg, [M/Fe]. Is under development to later yield full parameters (vsini, microturbulence, abundances). Note that it will not at the moment be able to handle rotationally broadened spectra. One way around such a case would be to degrade resolution, but then we're outside the automatic realm...
 * Does not handle special cases, but is meant to be tailored to look only at "safe" regions of the spectrum, i.e. not where emission lines may be found.
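
The "use all regions, then discard those that are way off" approach discussed above could look roughly like the sketch below. This is not Carlos's code: the function names, the data layout (a dict of spectral regions per grid point) and the rejection threshold are assumptions made purely for illustration, and the real package may define its goodness of fit differently.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Params = Tuple[float, float]            # hypothetical (Teff, logg) grid point
Spectra = Dict[str, Sequence[float]]    # region name -> normalized flux values


def reduced_chi2(obs: Sequence[float], model: Sequence[float], sigma: float) -> float:
    """Mean squared residual in units of the flux uncertainty sigma."""
    return sum(((o - m) / sigma) ** 2 for o, m in zip(obs, model)) / len(obs)


def select_model(
    observed: Spectra,
    grid: Dict[Params, Spectra],
    sigma: float,
    max_chi2: float = 3.0,   # assumed threshold for "way off"
) -> Tuple[Optional[Params], List[str]]:
    """Drop regions that no grid model fits, then pick the best grid point."""
    # Keep a region only if at least one grid model fits it acceptably.
    kept = [
        region
        for region, flux in observed.items()
        if min(reduced_chi2(flux, m[region], sigma) for m in grid.values()) <= max_chi2
    ]
    if not kept:
        return None, kept
    # Choose the grid point with the smallest summed chi^2 over the kept regions.
    best = min(
        grid,
        key=lambda p: sum(reduced_chi2(observed[r], grid[p][r], sigma) for r in kept),
    )
    return best, kept
```

A region containing an emission line would fail the threshold for every grid model and simply drop out of the fit, which is the behaviour we would want from the "safe regions" idea.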

The software will be installed and tested on AIP/STELLA computers Nov.06, and then we can have a version to play with (after the debugging!) -- Tdall 19:13, 7 November 2006 (CLST)

Coryn's tool
Coryn Bailer-Jones provides the code "statnet" on his web page http://www.mpia.de/homes/calj/. I contacted him by email to ask for details about using "statnet" for V.S.O.P. He kindly sent me the email summarized in the following.

Regarding the technical details, Coryn answered the following: "statnet is a generic feedforward neural network and so has no restrictions per se on no. of input pixels etc. (There are some array size limits set in the statnet.h file, but these can be increased at will, limited only by machine memory.) As a multi-dimensional interpolation algorithm its performance is limited by the quality of the training data and the complexity of the problem. It does not directly use any domain-specific information, so *in principle* emission line stars etc. are not a problem. However, as so often with high-dimensional, nonlinear regression, one needs to think carefully about what input data you provide and what pre-processing you do. There is, of course, a lot of literature on this."

If we should decide to use statnet, Coryn would appreciate a citation of his A&A 2000 paper and he would like to receive a copy of the respective V.S.O.P paper(s).

Regarding support, Coryn himself writes that "it may not be the easiest program to use, especially in terms of setting the regularization parameters (alpha and beta)" and that he "can only provide limited support for statnet". We would thus be basically left with the statnet manual and the references therein.

In my opinion, the limited support could be a significant argument against using statnet. --

Alejandra's tool
Mail sent a few days ago. Waiting for an answer. -- Cédric Talk 05:35, 22 November 2006 (CLST)

Luis Manuel
After some feedback from people in Madrid (Luis Manuel Sarro) working in the automatic classification of light curves from COROT it was clear that the main point of all this discussion is that the software already available generates physical parameters of the objects but does not provide a variability type. An important part of VSOP is to provide this typing and, as far as I know, there is no automatic software available for that (yet).

The ideal classifier would work with a photometric time series and a single spectrum. The people working for COROT and GAIA know this, and I think it would be very important for VSOP to become the 'official' provider of spectra of the stars used to train the classification software for these missions. This type of classifier and a training set of spectra are needed for VSOP, which will then have to provide this training set by observing stars with well-defined variability types and/or with an overlap with the set of variable stars with photometric time series already used to train the classifier for COROT and GAIA.

Luis Manuel proposes the development of a new variability classifier based on photometric time series + spectra. There is already a classifier based on phot. time series features (frequencies, amplitudes, phase differences, ratios...) developed for CoRoT/GAIA. So, the project implies the extension of the training database with attributes related to continuum/line-fluxes/EWs. Ideally, spectra of the stars already in the training set would be collected if not already available.

 * what resolutions can it work with?

Since the classifier has to be developed, this is an open issue. But the final aim would be to implement the classifier as a VO-compliant service, so the choice would go in the sense of maximal generality.
 * what wavelength ranges?

As above. There is the possibility to have flexibility on this issue (i.e. allow for varying wavelength ranges) if Bayesian classifiers (allowing for missing inputs) are used.
 * what spectral types?

Since the classifier has to be developed, this can be decided, again with the hope of maximal generality.
 * what will installation require (e.g. other non-free software)?

No non-free software. Basically, standard libraries of data mining software (e.g. weka).
 * what would the author want in return?

Paper authorship, to be negotiated, and the possibility to use the classifier on other data.
 * what data will the software yield?

Class-conditional probabilities (i.e. a probability distribution that a given object belongs to any of the classes considered by the classifier).
 * can it handle special cases (e.g. it flunks when it sees emission lines)?

Yes.
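
To make the "Bayesian classifier allowing for missing inputs" and the per-class probability output concrete, here is a minimal sketch. It is only an illustration under stated assumptions: a Gaussian naive Bayes model is assumed (the Madrid classifier does not exist yet and may work quite differently), and the feature names, class names and numbers are invented. Under the naive independence assumption, a missing input is handled by leaving its factor out of the product, which amounts to marginalizing over it.

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical per-class model: feature name -> (mean, standard deviation).
ClassModel = Dict[str, Tuple[float, float]]


def class_probabilities(
    features: Dict[str, Optional[float]],
    models: Dict[str, ClassModel],
    priors: Dict[str, float],
) -> Dict[str, float]:
    """Return the posterior probability of each class given the available inputs."""
    log_post: Dict[str, float] = {}
    for cls, model in models.items():
        lp = math.log(priors[cls])
        for name, (mean, std) in model.items():
            x = features.get(name)
            if x is None:
                continue  # missing input: its factor is dropped (marginalized out)
            lp += -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))
        log_post[cls] = lp
    # Normalize in a numerically stable way so the probabilities sum to one.
    top = max(log_post.values())
    weights = {cls: math.exp(v - top) for cls, v in log_post.items()}
    total = sum(weights.values())
    return {cls: w / total for cls, w in weights.items()}
```

With no inputs at all the result falls back to the priors, so a star with a spectrum covering only part of the wavelength range would still get a (less sharply peaked) probability distribution over the classes.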

The project would be a collaboration between Madrid, Leuven (TBC) and Granada. The Madrid/Leuven/Granada team is responsible for the CoRoT variability classifier and is leading GAIA's WPs on (un)supervised classification of variability. Madrid (UNED/SVO) contributes the expertise in Data Mining and Knowledge Discovery and VO-related matters, and Leuven/Granada the expertise on variability characterization of classes. -- Pedro 13:47, 8 November 2006 (CLST)

Hans's tool
Hans was invited to join VSOP after Pedro pointed out at the meeting that his VWA software was being automated, and thus might be useful for VSOP. Hans's answers to the critical questions about VWA were:
 * what resolutions can it work with?

I have analysed data from R=5,000 to 200,000 with the VWA package.
 * what wavelength ranges?

It has line lists in the visual, say 4000-8000 AA.
 * what spectral types?

The ATLAS9 grid runs from 5000-10000 K, logg=2.0 to 5.0.
 * what will installation require (e.g. other non-free software)?

The interactive data language from RSI: IDL.
 * what would the author want in return?

Acknowledgment and science data!
 * what data will the software yield?

Abundances and fundamental atmospheric parameters: Teff, logg.
 * can it handle special cases (e.g. it flunks when it sees emission lines)?

It can also analyse binary stars (SB2), but this requires manual interaction and quite some time.

Other tools?
Following Pedro's email from today, I was about to ask people from Gaia (whom I know very well, at least the people in Meudon) about their software. Let me know if you think this can be of any help. Eric 10:22, 8 November 2006 (CLST)
 * It certainly could. Please go ahead Eric. -- Tdall 13:39, 12 November 2006 (CLST)
 * Their answer is short and concise: right now, they don't have any working tool available. ;-) Eric 14:27, 23 November 2006 (CLST)

Carlos
The software will be installed and tested on AIP/STELLA computers Nov.06, and then we can have a version to play with (after the debugging!) The nice things about this one are that it's already working, that it's going to be automatically reducing spectra obtained with an automatic telescope, and that Thomas is involved in that (STELLA) project too. Thus, it will probably be very easy for us to implement on VSOP.


 * Looks very promising until it is said it does not handle rotationally broadened spectra... But that's only for Teff, logg and [M/Fe], right? So for short-term work, this looks promising. -- Cédric Talk 04:53, 21 November 2006 (CLST)


 * Yes. A "dirty" way around this is to degrade the resolution of the template spectra... but it is anyway planned to implement rotation in future versions. Again, I think it is an advantage to have STELLA as a main driver for the specs of this tool. In addition, Carlos would like to join VSOP if we decide to use it. -- Tdall 23:04, 21 November 2006 (CLST)
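
For reference, the "dirty" workaround of degrading the template resolution could be sketched as below. This is an illustration only, not part of Carlos's package: it assumes a uniformly sampled spectrum, a Gaussian instrumental profile, and a single representative wavelength for the whole (small) fitting interval. A true rotational profile is not Gaussian, so this only mimics the broadening; the function and parameter names are made up.

```python
import math
from typing import List, Sequence


def degrade_resolution(
    flux: Sequence[float],
    step: float,          # wavelength step per pixel (same unit as wavelength)
    wavelength: float,    # representative wavelength of the interval
    r_in: float,          # resolving power of the template
    r_out: float,         # desired (lower) resolving power
) -> List[float]:
    """Convolve a template with a Gaussian so its resolution drops from r_in to r_out."""
    if r_out >= r_in:
        raise ValueError("r_out must be lower than r_in")
    # Widths add in quadrature, so the kernel FWHM is the quadratic difference.
    fwhm = wavelength * math.sqrt(1.0 / r_out ** 2 - 1.0 / r_in ** 2)
    sigma_pix = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0))) / step
    half = max(1, math.ceil(4.0 * sigma_pix))
    kernel = [math.exp(-0.5 * (k / sigma_pix) ** 2) for k in range(-half, half + 1)]
    n = len(flux)
    out: List[float] = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:        # renormalize over pixels that exist at the edges
                num += w * flux[j]
                den += w
        out.append(num / den)
    return out
```

Since each resolution needs its own grid anyway, this step would have to be applied to the whole template grid before fitting, which is part of why it takes us outside the fully automatic realm.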

Coryn
Regarding support, Coryn himself writes that "it may not be the easiest program to use, especially in terms of setting the regularization parameters (alpha and beta)" and that he "can only provide limited support for statnet". We would thus be basically left with the statnet manual and the references therein.

In my opinion, the limited support could be a significant argument against using statnet.


 * I agree. Especially since these "learning" systems are not straightforward to deal with. -- Tdall 23:04, 21 November 2006 (CLST)

Luis Manuel
After some feedback from people in Madrid (Luis Manuel Sarro) working in the automatic classification of light curves from COROT it was clear that the main point of all this discussion is that the software already available generates physical parameters of the objects but does not provide a variability type. An important part of VSOP is to provide this typing and, as far as I know, there is no automatic software available for that (yet).

The ideal classifier would work with a photometric time series and a single spectrum. The people working for COROT and GAIA know this, and I think it would be very important for VSOP to become the 'official' provider of spectra of the stars used to train the classification software for these missions. This type of classifier and a training set of spectra are needed for VSOP, which will then have to provide this training set by observing stars with well-defined variability types and/or with an overlap with the set of variable stars with photometric time series already used to train the classifier for COROT and GAIA.
 * So, Pedro, what you are suggesting is that we in some way track down the available photometric data, and store it here at VSOP together with the spectrum. It is certainly a very nice idea, but a bit outside the scope right now. However, I see no problem if there are people who are willing to join VSOP with the purpose of providing this kind of information. Worth considering, and feel free to advertise such an opportunity. -- Tdall 13:36, 12 November 2006 (CLST)


 * Luis Manuel is already working with the photometric data and they are developing a classifier for COROT. Our work would be to provide the spectra they need for teaching the classifier to work with both photometric and spectroscopic data. Once the classifier is working we would use it with our own data. No need to store photometric data. The people from Belgium have already agreed that this new classifier would be very interesting and that we should be the "providers" of the spectra if we agree. Furthermore, this project is already obtaining manpower and funding for its development. Pedro 15:09, 17 November 2006 (CLST)


 * I am taking the train a bit late. But what I see is that these guys are simply interested in getting our spectra. That's fine, of course, since we plan to make our reduced spectra available once they are published. However, I don't see any real benefit from their side to the VSOP project itself. They may have an automatic variability classifier. Fine, but they need photometric data. And we don't have this. The only benefit I would see is indeed if we become the "official" provider of spectra. Although important, that's (only) advertising. Of course it would be interesting to know what var. type they obtain with their software, and how this can be used in VSOP by auto-comparing spectra for instance. Hey, that's an idea. VSOP could define a set of template spectra of variable stars, i.e. spectra of stars representing the best example of a given var. type. This could also be used to classify our own spectra then... -- Cédric Talk 04:51, 21 November 2006 (CLST)


 * One thing: This tool doesn't even exist yet! We risk becoming guinea pigs for the development, with lots of work and unknown results in an unspecified - but clearly too long - timeframe. Once they get underway, I think it sounds extremely promising, but I'd not take this as our first priority tool. -- Tdall 23:04, 21 November 2006 (CLST)


 * I fully agree. We already have some difficulties publishing paper one... ;-) -- Cédric Talk 05:34, 22 November 2006 (CLST)