The Messenger No. 177 - Quarter 3 | 2019 - European Southern Observatory
The Messenger No. 177 – Quarter 3 | 2019
Distributed Peer Review
M87 Event Horizon Telescope Results
The PHANGS Surveys
Total Solar Eclipse Over La Silla
ESO, the European Southern Observatory, is the foremost intergovernmental astronomy organisation in Europe. It is supported by 16 Member States: Austria, Belgium, the Czech Republic, Denmark, France, Finland, Germany, Ireland, Italy, the Netherlands, Poland, Portugal, Spain, Sweden, Switzerland and the United Kingdom, along with the host country of Chile and with Australia as a Strategic Partner. ESO's programme is focussed on the design, construction and operation of powerful ground-based observing facilities. ESO operates three observatories in Chile: at La Silla, at Paranal, site of the Very Large Telescope, and at Llano de Chajnantor. ESO is the European partner in the Atacama Large Millimeter/submillimeter Array (ALMA). Currently ESO is engaged in the construction of the Extremely Large Telescope.

The Messenger is published, in hardcopy and electronic form, four times a year. ESO produces and distributes a wide variety of media connected to its activities. For further information, including postal subscription to The Messenger, contact the ESO Department of Communication at:

ESO Headquarters
Karl-Schwarzschild-Straße 2
85748 Garching bei München, Germany
Phone +498932006-0
information@eso.org

The Messenger
Editor: Gaitee A. J. Hussain
Layout, Typesetting, Graphics: Jutta Boxheimer, Mafalda Martins, Lorenzo Benassi
Design, Production: Jutta Boxheimer
Proofreading: Peter Grimley, Caroline Reid
www.eso.org/messenger/

Printed by FIBO Druck- und Verlags GmbH, Fichtenstraße 8, 82061 Neuried, Germany

Unless otherwise indicated, all images in The Messenger are courtesy of ESO, except authored contributions which are courtesy of the respective authors. © ESO 2019. ISSN 0722-6691

Front cover: A series of exposures showing the trajectory of the Sun over roughly two and a half hours. The total solar eclipse resulted in almost two minutes of totality at 20:39 UT. Credit: ESO/P. Horálek

Contents

Telescopes and Instrumentation
Patat F. et al. – The Distributed Peer Review Experiment 3
Coccato L. et al. – On the Telluric Correction of KMOS Spectra 14
Gonté F. et al. – Bringing the New Adaptive Optics Module for Interferometry (NAOMI) into Operation 19

Astronomical Science
Goddi C. et al. – First M87 Event Horizon Telescope Results and the Role of ALMA 25
Schinnerer E. et al. – The Physics at High Angular resolution in Nearby GalaxieS (PHANGS) Surveys 36

Astronomical News
Ventura L. et al. – Total Solar Eclipse Over La Silla 43
Christensen L. L. et al. – Science & Outreach at La Silla During the Total Solar Eclipse 47
Dennefeld M. et al. – Pointing the NTT at the Sun: Studying the Solar Corona During the Total Eclipse 54
Sani E. et al. – Report on the ESO Workshop “KMOS@5: Star and Galaxy Formation in 3D — Challenges in KMOS 5th Year” 56
Liske J., Mainieri V. – Report on the ESO Workshop “Preparing for 4MOST — A Community Workshop Introducing ESO’s Next-Generation Spectroscopic Survey Facility” 61
Mroczkowski T. et al. – Report on the ESO Workshop “ALMA Development Workshop” 64
Mérand A., Leibundgut B. – Report on the ESO Workshop “The VLT in 2030” 67
Yang C. – Fellows at ESO 70
Jethwa P., Oikonomou F. – External Fellows at ESO 71
Hofstadt D. – Lodewijk Woltjer (1930–2019) 74
Personnel Movements 75
Telescopes and Instrumentation DOI: 10.18727/0722-6691/5147 The Distributed Peer Review Experiment Ferdinando Patat 1 the significant growth of the user com- example, late dropouts during the review Wolfgang Kerzendorf 2, 3, 4 munity, which has made ESO one of the process can reduce the number of Dominic Bordelon 1 largest astronomical facilities in the world, pre-meeting reviews per proposal, mak- Glen Van de Ven 5 the way telescope time applications are ing the triage procedure less robust. Tyler Pritchard 2 reviewed has remained substantially the While this change was relatively easy to same since 1993. Barring the necessary implement, experience gained during increase in the number of reviewers, the Periods 102 and 103 suggests that the 1 ESO procedure has changed in the details, negative consequences outweigh the 2 Center for Cosmology and Particle but not in its substance. Following steady benefits. It is clear that further and more Physics, New York University, USA growth in the numbers of submissions, drastic and structured actions need to 3 Department of Physics and Astronomy, the current review load is about 70 pro- be taken; these include a move to an Michigan State University, USA posals per panel member and up to 100 annual cycle and the deployment of a fast 4 Department of Computational Mathe- for OPC-proper members (the latter serve track channel (FTC; see Patat, 2018a). matics, Science and Engineering, on a second panel which reviews the Michigan State University, USA recommendations across all science cat- By construction, the FTC requires a short 5 Department of Astrophysics, University egories). These numbers have reached duty cycle during which referees are of Vienna, Austria critical levels, requiring a re-evaluation of continuously on duty. The most suitable the procedures and an examination of the mechanism for reviewing the proposals effectiveness of peer review. is a Distributed Peer Review (DPR), one All large, ground- and space-based of the most innovative schemes through astronomical facilities serving wide The pressure on the peer review process which the load on referees can be allevi- communities face a similar problem: in has been the subject of a study by the ated (Merrifield & Saari, 2009). This con- many cases the number of applications ESO OPC Working Group (Brinks et al., cept has been successfully applied to they receive in response to each call 2012) and the Time Allocation Working the Fast Turnaround channel deployed at exceeds 1000. This poses a serious Group (TAWG; Patat, 2018a). Both stud- the Gemini Telescope, which has pro- challenge to running an effective selec- ies identified the excessive number of cessed over 1000 proposals in this way tion process under the classic peer- proposals per referee as the most urgent since 2015. The Gemini Observatory has review paradigm, in which the propos- problem that ESO needs to tackle. Not published a report (Andersen et al., 2019) als are assigned to pre-allocated panels only does the workload severely affect and updates are continuously provided with fixed compositions. Although, in the referees (also increasing the rejection on its webpages 1. 
principle, one could increase the size of rate during the recruitment phase), but it the time allocation committee, this cre- can also have an impact on the quality Depending on the fraction of total tele- ates logistic and financial problems of the reviews and the feedback provided scope time that is allocated via the FTC, which place a practical limit on its to the applicants, with potentially serious this channel may also serve to decrease maximum size, making this solution consequences. The feedback has been the load on the OPC, which would then unviable beyond a certain volume of repeatedly and consistently identified focus only on proposals with larger time applications. For this reason, alternative as a major problem by the OPC and the requests. ESO has conducted a system- solutions must be sought. One of these Users Committee, and via direct commu- atic study aimed at better evaluating the is the so-called Distributed Peer Review nications from numerous individual users. application of DPR to its programmes. (DPR) in which, by submitting a pro- Problems with the peer review could In Period 103, in parallel with the regular posal, the Principal Investigators (PIs) ultimately affect the scientific productivity OPC cycle, a DPR experiment was run agree both to act as reviewers and and impact of the Organisation itself. A involving a subset of submitted propos- to have their proposal reviewed by their number of recommendations have been als. This article presents a brief descrip- peers. In this article we report the proposed by the working groups, some tion of the experiment setup and summa- results of a DPR experiment run by ESO of which are interdependent. rises an analysis of several statistical in Period 103, in parallel with the regular indicators. More details can be found in review by the Observing Programmes As a first step, since Period 102 ESO has Kerzendorf et al. (2019). Committee (OPC). decreased the number of referees (from six to three) who review a proposal ahead of the OPC meeting. Triage is then Distributed Peer Review and the DPR Introduction applied using the three pre-OPC meeting Experiment grades, with about the lowest 30% of Following the start of VLT operations in proposals being rejected. At the meeting Different measures to alleviate the load 1998, the number of applications to all non-conflicted panel members are on the reviewers have been and are use ESO telescopes has been steadily then asked to discuss and grade only the being considered by various facilities. growing, exceeding 1100 proposals in surviving proposals. While this measure These include drastic solutions, like the Period 84. After this peak, the number of has successfully reduced the workload one deployed by the National Science submissions per semester stabilised at of the panel members, it has become Foundation (NSF, USA) to limit the num- around 900 (Patat et al., 2017). Despite cumbersome to manage in practice. For ber of applications (Mervis, 2014a). The The Messenger 177 – Quarter 3 | 2019 3
Telescopes and Instrumentation Patat F. et al., The Distributed Peer Review Experiment Distributed Peer Review (DPR) concept is participate in the experiment. This implied complete mismatch (orthogonal knowl- simple; in submitting a proposal the PI that each would review eight proposals edge vectors), while a unit cosine indi- agrees to review n proposals submitted submitted by peers and have their pro- cates a case of perfect match (parallel by peers, and to have her/his proposal/s posal refereed by the same number of knowledge vectors). For the purposes of reviewed by n peers. Also, if s/he submits peers. The participants were given two the statistical analysis, each DT referee m proposals, s/he accepts to review weeks to complete their reviews and received four proposals with the largest n × m proposals, hence essentially limit- were informed that the outcome of the similarity, two proposals with median ing the number of submissions through DPR would have no effect on the fate of similarity, and two proposals with the a self-regulating mechanism. Following their proposals. By the deadline (22 Octo- lowest similarity. this idea, the Gemini Observatory ber 2018) 167 (97.1%) had completed their deployed the DPR for its Fast Turnaround task. In a real implementation the five PIs The participants were not aware of the channel (Andersen et al., 2019), which is who did not meet the deadline would distribution mechanism just described. capped to 10% of the total time. The have had their proposals automatically They were just provided with a simple NSF also explored this possibility with a rejected. In this experiment however, their web-based interface giving them access pilot study in 2013, in which each PI was proposals were kept in the sample, but to the eight assigned proposals and asked to review seven proposals sub the PIs did not receive the final feedback. allowing them to review, grade and com- mitted by peers (Ardabili & Liu, 2013; Additionally, the participating PIs were ment on the applications. Before access- Mervis, 2014b). The NSF pilot was based asked to fill in a web-based questionnaire ing the proposals, the referees were on 131 applications submitted by volun- covering various aspects of the experi- asked to sign a non-disclosure agree- teers within the Civil, Mechanical and ment. A total of 140 (83.8% of the DPR ment, very similar to that signed by the Manufacturing Innovation Division, but sample, 19% of the total PI sample of OPC and Panel members. the outcome is unknown as no report on P103) returned the completed form. the study was published. Interestingly, During the review phase, the participants a similar pilot experiment was carried out The proposal distribution was performed were also asked to declare any scientific/ in 2016 by the National Institute of Food using two channels, which we will call personal conflicts, while institutional and Agriculture 2; in this case too the OPC Emulate (OE) and DeepThought conflicts were automatically taken into results were not published. Despite the (DT). In both cases the reviewers were account by the distribution software, general acceptance that followed the assigned eight proposals each. For the based on the affiliations recorded in the deployment of this channel at the Gemini OE channel, 60 volunteers were selected User Portal database. 
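To make the matching step more concrete, the following minimal Python sketch computes the cosine similarity between a referee "knowledge vector" and proposal texts using plain term counts. It is only an illustration under simplified assumptions: the actual experiment built the vectors from ADS publication records with the machine-learning pipeline of Kerzendorf (2017), and the vocabulary, texts and proposal names below are invented for this example.

```python
# Minimal sketch of cosine-similarity matching between a referee "knowledge
# vector" and proposal texts. The real experiment derived these vectors from
# ADS publication records with a machine-learning pipeline (Kerzendorf, 2017);
# here plain term-count vectors stand in for them.
import numpy as np

def term_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    words = text.lower().split()
    return np.array([words.count(term) for term in vocabulary], dtype=float)

def cosine_similarity(u, v):
    """1.0 for parallel vectors (perfect match), 0.0 for orthogonal ones."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / norm) if norm > 0 else 0.0

# Toy inputs: concatenated abstracts of one referee and two proposal rationales.
vocabulary = ["supernova", "progenitor", "galaxy", "cluster", "lensing", "dust"]
referee_publications = "supernova progenitor dust supernova galaxy"
proposals = {
    "P1": "supernova progenitor explosion dust",
    "P2": "cluster lensing mass map",
}

referee_vec = term_vector(referee_publications, vocabulary)
for name, rationale in proposals.items():
    sim = cosine_similarity(referee_vec, term_vector(rationale, vocabulary))
    print(name, round(sim, 2))   # P1 scores high, P2 scores ~0
```

In a distribution scheme like the one described above, each DT referee would then be handed the proposals with the highest, median and lowest similarity to their own vector.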
For each proposal, Observatory, to the best of our knowl- at random and assigned, on the basis the referees had to fill in a comment (with edge the Fast Turnaround channel is the of the category of the proposal each sub- a minimum length of 80 characters), and only example of DPR being employed by mitted, to the four scientific categories: also provide a self-evaluation of their a large-scale astronomical facility. A (Cosmology), B (Galaxy Structure and expertise level (high/medium/low) for Evolution), C (Planets, Star Formation and each proposal assigned to them. In the specific case of ESO, the TAWG Interstellar Medium) and D (Stellar Evolu- tasked to address these issues has pro- tion). The underlying (and reasonable) Once the review process was completed, duced a set of recommendations. The assumption is that a scientist submitting the grades of the various referees were core aim is to reduce the number of a proposal for a given category is an combined using a simple average (similar applications per reviewer, which has expert in that same area. This emulates to the regular OPC process), and a final been identified as an urgent action that the case of the real OPC, in which a per- ranking list was compiled. The PIs were ESO needs to take (Patat, 2018a). The son only receives proposals within her/his then provided with the quartile rank and deployment of DPR falls within the rec- area of expertise. the individual, unedited anonymous ommendations. As a first step, and after comments. Finally, they were asked to consulting the advisory bodies, ESO For the remaining 112 volunteers selected provide feedback on the experiment via decided to run a test during the ESO for the DT channel, the process was as a web-based form; this included a request Period 103 in parallel to the regular OPC follows. For each scientist, a knowledge to express the usefulness of each com- review. The experiment was designed in vector was built based on their publica- ment they received on their proposal. line with the implementation at Gemini, tions, which were downloaded from the enhancing the process by means of public SAO/NASA Astrophysics Data Sys- Natural Language Processing (NLP) and tem database (ADS) and processed by General statistics and demographics Machine Learning (a different method a machine learning algorithm (Kerzendorf, of using NLP for proposal reviews can be 2017). The same approach was used for Although, in principle, each proposal found in Strolger et al., 2017). the proposals and applied to their scien- should have been reviewed by eight sci- tific rationale. The match between the entists and each scientist should have The DPR experiment was announced in referee expertise and the area covered by reviewed eight proposals, because of the the Call for Proposals for Period 103, the proposal was then quantified through scientific/personal conflicts declared released on 30 August 2018. A total of the “cosine distance”, which is directly during the refereeing process (and to a 172 PIs — representing 23% of all distinct related to the angle formed by the two much smaller extent because five partici- PIs in that semester — volunteered to hyper-vectors; a null cosine signals a pants did not complete the process), 4 The Messenger 177 – Quarter 3 | 2019
both these numbers were on average Figure 1. Scientific smaller than eight. The number of review- Seniority (this work) seniority distribution of the DPR sample (blue) ers Nr ranged from 4 to 8, with an aver- Seniority (Patat 16) 0.4 and the OPC sample age of 7.3; in 95% of the cases the num- (orange). From Patat ber was Nr ≥ 6. The number of proposals (2016). Np varied from 5 to 8, with an average of 0.3 Fraction 7.6, and Nr ≥ 6 in 98% of cases. The DPR produced a total of 4055 distinct grade pairs, to be compared with the maximum 0.2 number of pairs 172 × 8 × 7/2 = 4816 (see below for more details) one would obtain in the case of no conflicts and no 0.1 dropouts. The F/M gender distribution of the DPR 0.0 yet ye a rs ars ars participants (32/68) and the scientific hD 4 12 ye 12 ye N oP tha n n4 – a n seniority distribution derived from the Le s s we e e th B et Mor DPR questionnaire (see Figure 1) reflect the underlying PI population of ESO users (Patat, 2016). Since participation in Figure 2. Distribution of the number of proposals the experiment was on a completely vol- 0.5 submitted to ESO by the untary basis, we cannot exclude the DPR participants. presence of self-selection biases. For instance, one could argue that research- 0.4 ers who already had a positive opinion of the DPR concept would be more will- Fraction ing to participate than opponents, hence 0.3 introducing systematics into the final analysis. On the other hand, if the com- munity were strongly against the para- 0.2 digm, one would expect a similar effect. In general, although we cannot guarantee that there are no specific attributes that 0.1 lead the participants to self-selection, the demographics indicate that, if they exist, 0.0 they are well hidden. Fewer than Between 3 and More than 3 proposals 10 proposals 10 proposals An important aspect regarding the demographics of the experiment con- cerns the fraction of junior scientists. there are published studies that indicate one single proposal sub-category (within Since, as a rule, the regular panel mem- reviewers who self-report higher levels a given scientific category), the panel bers serving on the OPC are required to of expertise tend to be less generous in members are requested to identify three have a minimum seniority level (typically assigning the top grades (Gallo et al., sub-categories, ranking them in order starting with scientists at their second 2016), the differences seen between the of expertise. This information is then used postdoc onward), this establishes a sig- grade distributions of senior and junior to compose review panels in such a way nificant difference between the two pools DPR participants are not statistically that the expertise coverage within each of reviewers. In the case of the OPC, the significant. of them is as broad as possible. This is distribution is heavily skewed towards required by any schema in which physical senior members (88%), with a small frac- panels exist, which is in turn a constraint tion of postdocs (12%) and no students Referee-Proposal matching stemming from the fact that the panels (Patat, 2016), while the postdoc and stu- have to meet face-to-face and discuss dent reviewers reach about 18% in the In the regular OPC process, the panel the same set of proposals. This intro- case of the DPR sample (Figure 1). members are recruited to cover the widest duces a certain rigidity, which is also possible range of astrophysical areas. 
related to the relatively small number of Most DPR participants were relatively Each of the selected reviewers is asked available reviewers. experienced in submitting proposals (Fig- to declare her/his expertise by providing ure 2), although almost 60% of them sub-categories from the same list used Since DPR has the advantage of involving had never served on a time allocation by the applicants to categorise their pro- a much larger number of reviewers, it commitee before (Figure 3). Although posal. While the PI is allowed to indicate allows a significantly more flexible and The Messenger 177 – Quarter 3 | 2019 5
Telescopes and Instrumentation Patat F. et al., The Distributed Peer Review Experiment more objective approach in which, for 0.6 Figure 3. Distribution of expertise in serving each proposal, an ad hoc, optimised on Time Allocation panel can be formed. A key ingredient in Committees (TAC) for 0.5 this approach is the proposal-referee the DPR participants. matching, which should work without the need for human supervision, especially 0.4 when the turnaround has to be fast. Fraction For this purpose, the DT algorithm used 0.3 in the DPR experiment was designed to predict what we call domain expertise, 0.2 which in this context can be considered to be the objective ability of a given sci- entist to review a given proposal. Before 0.1 we discuss its reliability, we examine how referees assessed their own ability to 0.0 review each proposal assigned to them. Never served Served once Served multiple times As anticipated in the introduction, during on TAC on TAC on TAC the refereeing process each participant was asked to express their self-perceived Figure 4. Distribution of Negative, no PhD yet self-reported domain expertise level for each of the assigned knowledge for the differ- proposals, resulting in about 1200 eva Less than 4 years ent scientific seniority of luations. The distribution of participants’ 0.4 Between 4 and 12 years the DPR participants. self-evaluated ability to review the assigned More than 12 years proposals is presented in Figure 4, where we have used different colours for the 0.3 Fraction different classes of scientific seniority. As expected, junior scientists tend to perceive themselves as experts less often than 0.2 senior scientists do. Also, they often indi- cate that they have limited knowledge of a given field. We take this is an indication that the self-evaluated ability of a referee 0.1 to review the assigned proposals is a useful proxy of the more objective (albeit more abstract) concept of domain 0.0 Expert General knowledge No knowledge knowledge. The data collected in the DPR experiment expertise, which can be considered as a perceive them to be the top and interme- enable an additional analysis of a possi- reasonable first approximation to the diate classes. As shown in Figure 5, the ble gender dependence on the above underlying domain knowledge. From a correlation in the intermediate cases self-evaluation. This has been reported, statistical point of view, this is equivalent becomes fuzzier. With the available data for instance, by Huang (2013), who con- to computing the Bayesian conditional it is impossible to tell which of the two cluded that females tend to under-predict probability P (self-reported | DT) of having estimators is responsible for the observed their performance in certain STEM fields. a certain self-reported expertise level, noise. If on the one hand we can argue Our data suggest that, at least for post- given the DT-inferred level. In simpler that the DT approach has obvious limita- graduates in the domain of astrophysics, words, one checks how the self-reported tions (which is certainly true), on the other there is no statistically significant gender and DT-inferred levels correlate. The hand the self-reported levels are affected difference. result is p resented in Figure 5, which by a significant level of uncertainty, as shows an encouragingly high correlation. they are related to subjective perceptions Since the DT is designed to predict the For instance, the probability that the DT rather than to objective criteria. 
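The conditional probability discussed here can be estimated directly from the paired labels. The sketch below builds a row-normalised contingency matrix, that is P(self-reported | DT), from a handful of invented (DT-inferred, self-reported) pairs; the real analysis used roughly 1200 evaluations and the level names are only placeholders.

```python
# Sketch: estimating P(self-reported expertise | DT-inferred expertise)
# from paired labels. The pairs below are made up for illustration only.
import numpy as np

dt_levels = ["worst", "median", "best"]          # DT-inferred match quality
self_levels = ["none", "general", "expert"]      # self-reported knowledge

# One (dt, self-reported) pair per reviewed proposal.
pairs = [("best", "expert"), ("best", "expert"), ("best", "general"),
         ("median", "general"), ("median", "expert"), ("median", "none"),
         ("worst", "none"), ("worst", "none"), ("worst", "general")]

counts = np.zeros((len(dt_levels), len(self_levels)))
for dt, self_rep in pairs:
    counts[dt_levels.index(dt), self_levels.index(self_rep)] += 1

# Row-normalise: row i gives P(self-reported = j | DT-inferred = i).
p_self_given_dt = counts / counts.sum(axis=1, keepdims=True)
print(np.round(p_self_given_dt, 2))
```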
expertise of a referee with respect to a considers a match as the worst which given proposal, the first question one the referee believes is the best, is less Another aspect is the importance of should ask is how reliable the algorithm then 1%. At the other extreme, it is very proper proposal-referee matching. Our is. Obviously, there is no absolute refer- likely (78%) that if the DT estimates the direct experience, accumulated over ence; the DT is one possible objective match is poor, the referee is of the same many years of managing the review pro- estimate of this quality. Therefore, as a opinion. The agreement on the best cess at ESO, shows that, in addition to first exploratory test, one can check the matches is at the level of 50%, while for the obvious problem related to exces- DT results against the self-evaluation of 81% of the best DT matches, the referees sively large numbers of proposals, panel 6 The Messenger 177 – Quarter 3 | 2019
members report a general uneasiness P (helpful comment | DeepThought) Figure 5. Conditional probability for the when dealing with proposals in areas in DeepThought inferred knowledge various combinations of which they feel they are not experts. For self-reported and a more quantitative assessment, DPR 0.14 0.24 0.39 0.24 DT-inferred knowledge st participants were asked to express their be level. level of confidence, using a four-point scale, when asked to evaluate those cases; the corresponding distribution is 0.29 0.28 0.27 0.16 n presented in Figure 6. In about 60% of the ia ed cases, the reviewers were not comfortable m with this situation. This implies that better matching of expertise gives the reviewers a better experience, an aspect which 0.28 0.31 0.27 0.13 st or should not be underestimated. w 1 2 3 4 Not helpful Very helpful Feedback quality Review evaluation In the classical review concept, the feed- back provided by the panel to the PI is supposed to reflect the consensus opin- 0.5 Figure 6. Distribution of the answers to the ion. This paradigm has at least two obvi- question: “How satisfac- ous limitations: (a) proposals that are tri- torily were you able to aged out (i.e., the bottom ~ 30%) are not 0.4 evaluate the proposals discussed, and the feedback is based for which you were not an expert?”. Fraction on the opinion of the primary referee; (b) 0.3 for proposals that are discussed during the face-to-face meeting the primary 0.2 referee tries to capture the main points of the discussion and produces a single comment. There is simply not enough 0.1 time for the panel members to review all the feedback and to make sure it reflects 0.0 provided an unfair evaluation all the aspects of the discussion. In the might not always have been Mostly; I sometimes missed Somewhat; I struggled and Fully; I could evaluate well and fairly as a non-expert current implementation at ESO, the com- the expertise but was still Not satisfactory; I might ments are formally supervised by panel chairs, who are responsible for the integ- have unintentionally rity of the feedback (particularly as it able to evaluate able to evaluate relates to the language used). The net effect, possibly coupled with a sub- optimal matching between proposal and referee, is a high level of dissatisfaction in the community, which is consistently reported by the Users Committee; the dissatisfaction reported is about 30% for The participants were asked to rate each (99% of the sub-sample that responded). all of ESO and exceeds 50% for ALMA 3. of the comments they received for their In about 40% of the cases the DPR was proposal, based on its helpfulness. It is reported to have provided better com- Since the TAWG recommended the use important to stress that they were not ments, while the fraction of comments of DPR for a FTC, no attempt was made asked whether the comments were good with quality similar to, or better than the to produce consensus feedback and/or or bad, or whether they liked them or not, OPC reaches about 85%. to edit/check individual comments, which but whether they were useful for improv- were distributed to the PIs in their original ing the quality of their proposal. The gen- The analysis of comment helpfulness as form. 
The purpose of this implementation eral response was very satisfactory, as a function of the reviewer’s expertise was two-fold: (a) to get feedback on the shown in Figure 7, with more than 60% (either self-reported or DT-inferred) shows concept itself, and (b) to detect possible of the comments judged as being useful, that the dependence is mild in the central problems (for example, inappropriate lan- and about 5% not useful. One of the regions; the experts very rarely gave guage) generated by the unedited/unfil- questions also concerned the compari- unhelpful comments and, conversely, tered text. son with the edited OPC comments non-experts rarely gave very helpful com- received by the PIs in previous semesters ments. A similar analysis as a function of The Messenger 177 – Quarter 3 | 2019 7
Telescopes and Instrumentation Patat F. et al., The Distributed Peer Review Experiment the reviewer’s scientific seniority reveals a Figure 7. Distribution of the “helpfulness” flat distribution (within the noise), with one 0.4 ratings of the referee remarkable exception: graduate students comments for the entire seem to be unable to provide very useful DPR sample. comments. This may signal a training 0.3 issue, which can probably be addressed Fraction by exposing the students to schemes like 0.2 the DPR. Finally, no statistically significant difference is seen between the helpful- 0.1 ness of comments written by female and male referees. 0.0 Somewhat; some comments might help me to strengthen Fully; overall the comments will not help me to improve Mostly; several comments Not useful; the comments will help me to strengthen will allow me to improve A brief primer on subjectivity my proposed project my proposed project my proposed project my proposed project Figure 8 (below). Pre- Before we proceed with the comparison meeting OPC referee– between the final OPC and DPR out- referee correlation. In comes, a digression on the subjectivity this density diagram each point represents a inherent in the process is necessary. pair of grades attributed Although it is common knowledge that to the same proposal two different panels reviewing the same by two distinct referees. set of proposals would provide different The data are from the P18 sample. rankings (and this is often used to compare time allocation committees to roulette), 4.0 quantitative statements are very rare. This matter is addressed in great detail in an N(data) = 196153 480 extensive study based on about 15 000 ESO proposals (Patat, 2018b; hereafter 3.5 P18). The interested reader is referred to 420 the paper for a thorough discussion, while here we will focus only on the con- cepts relevant to the present discussion. 3.0 360 Referee grade #2 One way of quantitatively describing the reproducibility of a review process is 300 the correlation between the grades attrib- 2.5 uted to the same set of applications by two distinct bodies. These bodies can be 240 composed of a single individual or of sev- eral members. We will be talking about referee–referee (r–r) and panel–panel 2.0 180 (p–p) correlations. In the first instance, one simply considers all the distinct 120 grade pairs attributed by referee #1 and referee #2 to the same set of proposals, 1.5 placing them in a diagram in which the 60 grades are used as coordinates, so that each single grade pair is represented Corr. coeff. = 0.21 by a point. One can then repeat the pro- 1.0 cess for all possible referee pairs, plotting 1.0 1.5 2.0 2.5 3.0 3.5 4.0 all the corresponding points on the r–r Referee grade #1 plane. Since the same proposal is graded by many reviewers, each single proposal proposal is np = Nr (Nr –1)/2. For instance, with Np = 172 clouds of points. In the is represented on the r–r plane by a cloud in the case of the DPR experiment, with case of the DPR experiment, this would of points. typically Nr = 7, the above combinatorics yield 172 × 21 = 3612 points. In an ideal formula yields 21 distinct pairs per pro- situation, all the clouds would be very In the simplifying assumption that each posal. 
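The construction of the r–r plane and of its correlation coefficient can be illustrated with a short simulation. In the sketch below, grades are drawn as a common "true merit" per proposal plus independent referee noise, with the noise level chosen so that the pair correlation is close to the observed 0.21; the grades are synthetic, not the actual OPC or DPR data.

```python
# Sketch of the referee-referee (r-r) correlation: build every distinct grade
# pair per proposal and compute the Pearson coefficient. Grades are simulated
# here (true merit + per-referee noise), not real review data.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n_prop, n_ref = 172, 7                        # proposals, referees per proposal
merit = rng.normal(2.5, 0.3, n_prop)          # hidden "true" quality
grades = merit[:, None] + rng.normal(0, 0.6, (n_prop, n_ref))  # subjective scatter

# Nr referees give Nr*(Nr-1)/2 = 21 distinct pairs per proposal,
# i.e. 172 x 21 = 3612 points on the r-r plane.
x, y = [], []
for p in range(n_prop):
    for i, j in combinations(range(n_ref), 2):
        x.append(grades[p, i])
        y.append(grades[p, j])

r = np.corrcoef(x, y)[0, 1]
print(f"{len(x)} grade pairs, Pearson r = {r:.2f}")   # ~0.2 for this noise level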
Of course, the same operation can small in size (meaning that all referees proposal is seen by Nr referees, the num- be repeated for all Np proposals in the would provide very similar grades for the ber of distinct grade pairs np for each sample, which will populate the diagram given proposal), and so the points would 8 The Messenger 177 – Quarter 3 | 2019
be distributed very close to the straight- is 1, while a null value would signal a Comparing the OPC and DPR line y = x on the r–r plane. complete disagreement. The average outcomes agreement is expected to be 0.25 in case To illustrate what one is to expect in real a fully stochastic process, i.e., when there The first test we apply to the DPR data life, we have constructed the r–r plane for is no correlation between the two bodies. concerns the subjectivity level character- the pre-meeting OPC P18 sample, from The concept can be extended to all ising the typical participant. For this pur- which we derived almost 200 000 grade quartiles, including cross-quartile values, pose, we have computed the average pairs accumulated over 16 ESO cycles. and the quartile agreement matrix (QAM) r–r QAM that we introduced in the previ- The resulting diagram is presented in Fig- can be constructed. In statistical terms, ous section. Because of the DPR setup, ure 8. It is important to note that for a the generic element Mij of the QAM is the the ranking list for each referee includes perfectly stochastic process, the points conditional probability that a proposal at most eight proposals, so each quartile would be distributed within a circular area, ranked in the i-th quartile by referee #1 is contains no more than two proposals. with some radial, typically Gaussian, dis- ranked in the j-th quartile by referee #2. Also, at variance with the classical panel tribution. The fact that the real d istribution scheme, the number of proposals in is elongated along the diagonal direction The application of this concept to the common between two reviewers is typi- signals that the process is not aleatory. P18 pre-meeting sample shows that, on cally very small. As a direct comparison This qualitative conclusion can be made average, the ranking lists produced by between ranks is not possible, we use more quantitative by computing the Pear- two distinct referees have about 33% of a bootstrap approach. Very briefly, for son linear correlation coefficient, which the proposals in common in their first and each of the 172 proposals we randomly ranges from –1 (complete anti-correla- last quartiles. In the central quartiles the extract one grade pair and form two tion) to 1 (complete correlation) and is null intersection is compatible with a purely ranking lists, which are used to compute for complete uncorrelation. The value random selection (25%). This extends to the quartile agreement fractions. The derived for the sample is 0.21. Given the the mixed cases (i ≠ j ), with the exception process is repeated a large number of very large number of points, this is a very of the extreme quartiles; the fraction times and the average values and stand- robust estimate which can be reliably of proposals ranked in the first quartile by ard deviations are derived for each of the taken as a low correlation. For the same referee #1 and in the fourth quartile QAM elements. The result is presented reason, however, this value reveals that by referee #2 is ∼ 17%, which deviates in in Table 1. A direct comparison with there is a statistically significant signal a statistically significant way from the the values derived from the P18 sample indicating that the process is not com- random value. As in the case of the r–r reveals that the two results are statisti- pletely aleatory. If on the one hand this correlation introduced above, the r–r cally indistinguishable. 
No meaningful may sound discouraging, it helps to put agreement fraction gives a quantitative difference is seen in the QAMs computed things in the correct context, as it char- estimate of the high level of subjectivity for the OE and DT sub-samples. acterises the subjectivity of the process that characterises the process, providing in a more quantitative and objective way, a precise indication of what one should In a further test, we have investigated the as opposed to the common statements expect. possible dependence on the scientific which are normally based on pure anec- seniority level introduced above. Of the dotal evidence. The reason why the applications are usu- 167 reviewers, 136 provided this informa- ally evaluated by more than one reviewer tion, which we used to sub-divide the A different way of measuring the repeata- is to reduce the inherent “noise” which, reviewers into two classes: junior (groups bility of the process, which we will use as we have just seen, is quite substantial. 0 and 1) and senior (groups 2 and 3). extensively in the next section, is the For this purpose, the grades attributed These classes roughly correspond to quartile agreement fraction (P18). The by different referees to the same proposal PhD students plus junior postdocs (37), concept is as follows. When the same set (typically grouped in panels) are aggre- and advanced postdocs plus senior sci- of proposals is reviewed by two different gated to form one single figure of merit. In entists (99), respectively. We then com- bodies #1 and #2, one can compile the ESO implementation (and this is a puted the r–r QAM for the two classes; the rankings for the two distinct reviews common recipe), this is achieved simply the first quartile terms are 0.22 and 0.32, based on their distinct grades. The taking the average, with no weights and/ respectively. At face value this indicates rankings are then used to derive a merit or rejection. The effect of increasing the a larger agreement between senior classification within the classical quartile number of reviews is diffusely discussed reviewers. However, the small size of the scheme. For instance, the top 25% of in P18; here it suffices to say that for proposals are ranked in the first quartile Nr = 3 the first quartile agreement fraction of the distribution of grades. grows to 43% and 30% in the first and Table 1. Bootstrapped r–r Quartile Agreement Matrix for the DPR experiment. second quartiles, respectively. Once this is done, one can compute the Referee #1 Referee #2 quartile fraction of applications ranked in the Armed with these terms of reference we quartile 1 2 3 4 first quartile by review #1 which are also can now discuss the results of the DPR 1 0.33 0.26 0.24 0.18 graded in the same quartile by review #2. experiment. 2 0.26 0.26 0.25 0.23 For a complete agreement the fraction 3 0.24 0.25 0.25 0.26 4 0.18 0.23 0.26 0.34 The Messenger 177 – Quarter 3 | 2019 9
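The quartile agreement matrix itself is straightforward to compute once two grade lists over the same set of proposals are available; the bootstrap described above simply repeats this computation over many random draws of single grades. The following sketch uses synthetic grades and a single realisation purely for illustration.

```python
# Sketch of the Quartile Agreement Matrix (QAM): element (i, j) is the fraction
# of proposals placed in quartile i by review #1 that land in quartile j of
# review #2. The grades are simulated, not the real DPR or OPC data.
import numpy as np

def quartiles(scores):
    """Map grades to quartile indices 0..3 (0 = best-ranked 25%, low grade = good)."""
    rank = np.argsort(np.argsort(scores))      # 0..N-1, 0 for the lowest grade
    return (rank * 4) // len(scores)

def qam(scores_1, scores_2):
    q1, q2 = quartiles(scores_1), quartiles(scores_2)
    m = np.zeros((4, 4))
    for i in range(4):
        in_i = q1 == i
        for j in range(4):
            m[i, j] = np.mean(q2[in_i] == j)
    return m

rng = np.random.default_rng(2)
merit = rng.normal(2.5, 0.3, 172)
review_1 = merit + rng.normal(0, 0.6, 172)     # one random draw per review
review_2 = merit + rng.normal(0, 0.6, 172)
print(np.round(qam(review_1, review_2), 2))    # diagonal above 0.25 when the
                                               # two reviews share a common signal
```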
Telescopes and Instrumentation Patat F. et al., The Distributed Peer Review Experiment junior class produces a significant scatter, Table 2. Average DPR–OPC (pre-meeting) Table 3. DPR–OPC (pre-meeting) r–r Quartile Agreement Matrix. p–p Quartile Agreement Matrix. so the difference may not be significant. DPR referee OPC referee quartile DPR OPC (pre-meeting) quartile One can extend the above bootstrapping quartile 1 2 3 4 quartile 1 2 3 4 procedure to subsets with a number of 1 0.31 0.26 0.24 0.18 1 0.37 0.26 0.28 0.09 referees Nr > 1. The case of Nr = 3 is par- 2 0.24 0.27 0.25 0.24 2 0.28 0.16 0.28 0.28 ticularly interesting as this is directly 3 0.24 0.23 0.26 0.26 3 0.16 0.40 0.19 0.26 comparable to the results presented in 4 0.20 0.23 0.25 0.31 4 0.19 0.19 0.26 0.37 P18. The procedure is as follows: we first make a selection of the proposals This matrix is very similar to that derived of the pre-meeting OPC process (P18). having at least 6 reviews (164); for each of within the DPR reviews (see Table 1), pos- Note that, given the large noise inherent these we randomly select two distinct sibly indicating a DPR–OPC r–r agree- in the process, a much larger data set (i.e., non-intersecting) subsets of Nr = 3 ment slightly lower than the correspond- (or more realisations of the experiment) grades each, from which two average ing DPR–DPR. A check performed on would be required to reach a sufficiently grades are derived; the subsequent steps the two sub-samples for the junior and high statistical significance and to make are identical to the r–r procedure, and senior DPR reviewers (according to the robust claims about possible systematic lead to what we will call the p–p QAM. classification described above) has given deviations. statistically indistinguishable results. The first-quartile agreement turns out to The fact that in the real OPC process be 41%, while for the second and third As explained in the introduction, the pro- there is a face-to-face meeting consti- quartiles this is 30%. The top-bottom posals were reviewed by Nr = 3 OPC tutes the most pronounced difference quartile agreement is 10%. These values referees in the pre-meeting phase. This between the two review schemes. In the are very similar to those presented in constitutes a significant difference, in meeting, the opinions of single reviewers P18 for the OPC process for Nr = 3 sub- that the DPR ranking is typically based are changed during the discussion, so panels. As for the r–r case, the OE and on ~ 7 grades, whereas the pre-meeting that grades assigned by individual refer- DT sub-samples yield statistically indistin- OPC ranking rests on 3 grades only. ees are not completely independent guishable values. The conclusion is that, With this caveat in mind, one can never- from each other (as opposed to in the in terms of self-consistency, the DPR theless compute the QAM for the two pre-meeting phase, in which any signifi- review behaves in the same way as the overall ranking lists. The result is pre- cant correlation should depend only on pre-meeting OPC process. sented in Table 3. At face value, about the intrinsic merits of the proposal). The 37% of the proposals ranked in the 1st effects of the meeting can be quantified We now come to what is perhaps one of quartile by the DPR were ranked in the in terms of the quartile agreement frac- the most interesting aspects. 
As antici- same quartile by the OPC, with a similar tions between the pre- and post-meeting pated, the proposals used in the DPR fraction for the bottom quartile. When outcomes, as outlined in Patat (in prepa- experiment were also subject to the regu- looking at these values, one needs to ration; hereafter called P19). Based on lar OPC review. This enables the com consider that this is only one realisation, the P18 sample, P19 concludes that the parison between the outcomes of the two which is affected by large scatter, as change is significant; on average, only selections, with the caveats outlined can be deduced from the comparatively 75% of the proposals ranked in the top above about their inherent differences. large fluctuations in the QAM. These are quartile before the meeting remain in the evident when compared to, for instance, top quartile after the discussion (about For a first test we used a bootstrap the average values obtained from the 20% are demoted to the second quartile, procedure in which, for each proposal bootstrapping procedures described and 5% to the third quartile). P19 charac- included in the DPR, we randomly above. The numerical simulations show terises this effect by introducing the extracted one evaluation from the DPR that the standard deviation of a single Quartile Migration Matrix (QMM). For the (typically one out of 7) and one from realisation is ~ 0.1. specific case of Period 103, the QMM the OPC (one out of 3), forming two is reported in Table 4 for the subset of the ranking lists from which a r–r QAM was Using the model presented in P18, one DPR experiment. Of the initial 172 pro- computed. The operation was repeated can predict that, on average, the top and posals included in the DPR sample, 36 a large number of times and the average bottom quartile agreement between the were triaged out in the OPC process and and standard deviation matrices were DPR and the pre-meeting OPC should be are therefore not considered. constructed. This approach provides a around 0.5 (see Kerzendorf, 2019 for direct indication of the DPR-OPC agree- more detail). The observed value (0.37) As anticipated, the effect is very marked; ment at the r–r level and overcomes differs at the 1.3-s level from the average the meeting does have a strong effect the problem that the two reviews have value. For the central quartiles the differ- on the final outcome. In light of these a different number of evaluations per ence is at the ~ 1.5-s level. Therefore, facts, we can finally inspect the QAM proposal (see below). The result is pre- although lower than expected on aver- between the DPR and the final outcome sented in Table 2. The typical standard age, the observed DPR–OPC agreement of the OPC process. This is presented deviation of single realisations from the is statistically consistent with that in Table 5. With the only possible excep- average is 0.06. expected from the statistical description tion of M4, 4, which indicates a relatively 10 The Messenger 177 – Quarter 3 | 2019
Table 4. OPC Quartile Migration Matrix for the Table 5. DPR–OPC (post-meeting) processing. The next logical step is to DPR sub-sample (N = 136). Quartile Agreement Fraction. expand this experiment and distribute a OPC pre-meeting OPC post-meeting quartile DPR OPC post-meeting quartile fraction of observing time using DPR at quartile 1 2 3 4 quartile 1 2 3 4 more facilities. More than 95% of the 1 0.56 0.32 0.12 0.00 1 0.26 0.38 0.24 0.12 participants suggest an implementation 2 0.32 0.32 0.29 0.06 2 0.24 0.35 0.24 0.18 of such a scheme for some part of the 3 0.12 0.26 0.38 0.24 3 0.32 0.12 0.29 0.26 ESO proposal types, with 75% support 4 0.00 0.09 0.21 0.71 4 0.19 0.15 0.24 0.44 for the short programmes (time requests < 20 hours). Fewer than 5% of the marked agreement for the proposals in weakest aspect of the DPR. However, responses were against implementing the bottom quartile, the two reviews it remains unclear whether panel discus- DPR for any of the programme types. In appear to be almost completely uncorre- sions lead to the selection of better particular, about 70% of the responses lated. By means of simple Monte-Carlo science. In this respect, it is important to are in favour of deploying DPR for the calculations one can show that for two note that several studies have shown that Fast Track Channel, while only about 15% fully aleatory panels, the standard devia- panel meetings can increase the differ- are against it (the remaining 15% is indif- tion of a single realisation around the ences between two panels with respect ferent). We take this as a clear indication average value (0.25) is 0.10. We conclude to the pre-meeting agreement. In other of support. the majority of the Mi,j elements in Table 5 words, while the meeting increases the are consistent with a stochastic process internal consensus by polarising different One of the objections that is typically at the 1-s level. opinions within the panels, it does not made to the DPR concept is that, by dis- lead to a better panel-panel agreement tributing the proposals to a larger number The main conclusion of this analysis is (see Obrecht et al., 2007 and references of unselected scientists, it increases the that, while the pre-meeting agreement therein). One would expect the discus- chances of information leakage and pla- is consistent, with the DPR and OPC sions to bring judgment closer to identify- giarism. In the specific case of the DPR reviewers behaving in a very similar way ing the best science; however, these experiment, the proposals were distrib- (in terms of r–r and p–p agreements), studies indicate that a face-to-face meet- uted to 172 reviewers, while in the OPC the face-to-face meeting has the effect of ing does not necessarily make the pro- process the applications were seen by 78 significantly increasing the discrepancy cess better. individuals. However, while in the OPC between the two processes. However, implementation each reviewer has access we caution that the sample is relatively to all proposals assigned within her/his small, and therefore the results are signifi- Conclusions and outlook panel (typically 70–80), the DPR reviewer cantly affected by noise. sees a factor of ~ 10 fewer proposals. 
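The scatter expected for a single realisation under the null hypothesis of two unrelated reviews can be gauged with a short Monte-Carlo simulation. The toy version below treats both reviews as pure random rankings of the 136 surviving proposals; it is only meant to show the shape of the calculation, since the value quoted in the text (0.10) comes from the authors' own, more detailed simulation, which may include additional sources of scatter.

```python
# Monte-Carlo sketch: how much does a single Quartile Agreement Matrix element
# scatter around 0.25 when two reviews are completely unrelated? Both reviews
# are modelled here as pure random rankings; this is a simplification of the
# calculation described in the article.
import numpy as np

rng = np.random.default_rng(3)
n_prop, n_trials = 136, 10000
quartile = (np.arange(n_prop) * 4) // n_prop      # fixed quartile labels 0..3

top_top_fraction = np.empty(n_trials)
for t in range(n_trials):
    q1 = rng.permutation(quartile)                # two independent random reviews
    q2 = rng.permutation(quartile)
    top_top_fraction[t] = np.mean(q2[q1 == 0] == 0)

print(f"mean = {top_top_fraction.mean():.2f}, "
      f"std of a single realisation = {top_top_fraction.std():.2f}")
```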
Gemini has already implemented a Therefore, under the reasonable hypo That the DPR–OPC agreement is smaller variant of this mechanism successfully thesis that the fraction of “malevolent” than the internal DPR–DPR agreement over the past few years for their Fast scientists is the same in both review bod- is not unexpected, as there are intrinsic Turnaround (Andersen et al., 2019). The ies (which are selected from the same differences between the two setups, the approach presented here enhances this community), one would actually expect largest one being the absence of a face- process, using better review-proposal that the DPR is less prone to confidential- to-face meeting, which is potentially the matching based on natural language ity issues on average. To get a direct opinion from DPR participants, the ques- 0.30 Figure 9. Distribution of tionnaire contained an explicit question the answers to the about this aspect. The distribution of the question: “For which types of proposals responses is shown in Figure 10. Exclud- 0.25 do you think distributed ing the “no strong opinion” cases, 66% peer review would be of the users declared themselves to be beneficial?” in the DPR 0.20 equally or more confident in the DPR survey. process, resulting in about a third of the Fraction users placing more trust in the classical 0.15 scheme. 0.10 Another concern that is often heard when discussing DPR is the possible presence of biases. Again, the specific question 0.05 put to the participants regarding this point does not support this concern; 74% 0.0 of the respondents believe DPR is equally Short Regular Large Short, Regular, Short, None or more robust against biases (Figure 11). regular large regular, large The Messenger 177 – Quarter 3 | 2019 11
Telescopes and Instrumentation Patat F. et al., The Distributed Peer Review Experiment Figure 10. Distribution gives an objective criterion to assign a of answers to a question particular expertise, eliminating biases in 0.4 about how secure the participants felt about self-reporting. DPR implicitly removes the concept of panel, which adds rigidity Fraction confidentiality issues. to the process. For instance, it maximises 0.2 the overlap in evaluations, which is a t ypical issue in pre-allocated panels. The lack of a face-to-face meeting prevents 0.0 strong personal opinions from having a pivotal influence on the process. Also, concerned about confidentiality DPR involves a larger part of the commu- I am more concerned about I am less concerned about confidentiality issues in the confidentiality issues in the issues in the DPR process nity, increasing its democratic breadth I am neither more nor less than in the OPC process DPR process than in the DPR process than in the I have no strong opinion and exposing all applicants to the typical quality of the proposals. This allows them to better understand if their request is not allocated time by placing it in a wider OPC process OPC process on this point context, which will help to improve their proposal-writing skills, training the mem- bers of the community without additional effort. We acknowledge that the lack of a meet- 0.3 Figure 11. Distribution of ing does not allow the exchange of answers to a question opinions and the possibility of asking and about the robustness of the process against answering questions to/from the peers. 0.2 Despite the fact that its effectiveness Fraction biases. remains to be demonstrated and quanti- fied (see above), it is clear that the social, 0.1 educational and networking aspects of the face-to-face meeting should not 0.0 be undervalued. In this respect, we note that the resources freed by the DPR I have no strong opinion against biases than the against biases than the process is more robust I think that DPR review I think that DPR review I think that DPR review process is less robust approach can be used by the organisa- against biases as the OPC review process OPC review process OPC review process process is as robust tions for education and community networking (training on proposal writing, fostering collaborations, etc.). on this point In April and May 2019, results of the DPR experiment were presented to the ESO governing bodies most closely concerned with the Peer Review process The main conclusions drawn from the To these aspects, which come directly (i.e., the Scientific Technical Committee, DPR experiment can be summarised as from the data, other positive facts can the Users Committee and the Observing follows: be added. DPR allows a much larger sta- Programmes Committee). The ensuing – The DeepThought-enhanced DPR tistical basis enabling robust outlier rejec- discussions have resulted in a wealth of experiment was very well received by tion (the number of proposals per referee useful feedback that is being discussed the participants. can be easily brought to 10–12) and it internally. We would like to conclude – The mechanism allows an optimal removes possible biases generated by by pointing out that these kinds of stud- referee-proposal matching. panel member nominations. The larger ies are crucial if we are to progress from – The DPR process is as subjective as pool of scientists allows much better a situation in which the classical peer the OPC process. 
coverage in terms of proposal expertise review process is adopted notwithstand- – The participants do not see the confi- matching, and the smaller number of ing its limitations simply due to the lack of dentiality and bias issues as being more proposals per reviewer allows more care- better alternatives. As scientists, we firmly severe than in the classical scheme. ful work and more useful feedback. believe in experiments, including those – ESO should consider deploying DPR for that address the selection of the experi- regular proposals below a certain Another aspect of the DeepThought ments themselves. time request, while leaving the classical approach to proposal-referee matching is review for larger time requests. that it can be semi-automated; it also 12 The Messenger 177 – Quarter 3 | 2019
Acknowledgements

The authors wish to express their gratitude to the 167 volunteers who participated in the DPR experiment, for their work and enthusiasm. The authors are also grateful to Markus Kissler-Patig for passionately promoting the DPR experiment following his experience at Gemini; to ESO's Director General Xavier Barcons and ESO's Director for Science Rob Ivison for their support; and to Hinrich Schütze for several suggestions on the NLP process.

References

Andersen, M. et al. 2019, AAS, 233, 455.03
Ardabili, P. N. & Liu, M. 2013, CoRR, arXiv:1307.6528
Brinks, E. et al. 2012, The Messenger, 150, 20
Gallo, S. A., Sullivan, J. H. & Glisson, S. R. 2016, PLoS ONE, 11, e0165147
Huang, C. 2013, European Journal of Psychology of Education, 28, 1
Kerzendorf, W. E. 2017, Journal of Astrophysics and Astronomy, arXiv:1705.05840
Kerzendorf, W. E. et al. 2019, submitted to Nature Astronomy
Merrifield, M. R. & Saari, D. G. 2009, Astronomy and Geophysics, 50, 4.16
Mervis, J. 2014a, Science, 344, 1328
Mervis, J. 2014b, Science, 345, 248
Obrecht, M., Tibelius, K. & D'Aloisio, G. 2007, Research Evaluation, 16 (2), 79
Patat, F. 2016, The Messenger, 165, 2
Patat, F. et al. 2017, The Messenger, 169, 5
Patat, F. 2018a, The Messenger, 173, 7
Patat, F. 2018b, PASP, 130, 084501
Strolger, L.-G. et al. 2017, AJ, 153, 181

Links

1 Gemini Observatory Fast Turnaround Observing Mode webpage: http://www.gemini.edu/sciops/observing-gemini/proposal-routes-and-observing-modes/fast-turnaround
2 Distributed Peer Review Pilot in Foundational Program: https://nifa.usda.gov/resource/distributed-peer-review-pilot-foundational-program
3 Report from ESO Users Committee No. 42 (2018): https://www.eso.org/public/about-eso/committees/uc/uc-42nd/UCreport2018.pdf

Photo: Snowfall at Paranal is a rare phenomenon that serves to utterly transform the surroundings of the VLT/I into an otherworldly landscape. Credit: ESO/G. Hüdepohl (atacamaphoto.com)
Telescopes and Instrumentation DOI: 10.18727/0722-6691/5148 On the Telluric Correction of KMOS Spectra Lodovico Coccato 1 rate atmospheric and instrumental (MIPAS) atmospheric profiles for temper- Wolfram Freudling 1 effects, (for example, the instrument ature, humidity, water vapour and other Alain Smette 1 response) if a large wavelength range molecules, and (d) analytic functions Eleonora Sani 1 of stellar continuum is absorbed by or user-provided files for the instrumental Jose A. Escartin 1, 2 blended absorption lines. Last but not spectral resolution. The fit to the telluric Yves Jung1 least, the noise and imperfections in absorption lines in the observed spectra Gurvan Bazin1 the data reduction of these stars are inev- provides the integrated column density itably propagated to scientific spectra. of individual molecules. Future versions will further improve the quality of the 1 ESO Alternatively, one can model the atmos- model by including real-time measure- 2 ax-Planck-Institut für extraterrestrische M phere, generate its transmission spec- ment of precipitable water vapour and Physik, Garching, Germany trum and apply it to observations. The other molecules along the line of sight of model itself can be obtained by fitting the exposures. well-defined telluric lines to the spectrum The presence of strong absorption of either a standard star or a sufficiently In the following, we describe the improve- lines in the atmospheric transmission bright science target. In general, a model ments in the quality of KMOS (Sharples spectrum affects spectroscopic obser- depends on four components: (a) a radia- et al., 2013) spectra obtained with vations, in particular those in the near- tive transfer model; (b) a set of parame- the model approach using molecfit with and mid-infrared. Therefore, there is the ters that determines the absorption and respect to the empirical method. Data need to correct scientific observations transmission properties of individual were reduced using the KMOS pipeline for this effect, a process known as tel- molecules; (c) atmospheric profiles of (Davies et al., 2013). In the model luric correction. The use of a detailed temperature, humidity, and volume mix- approach, the atmospheric model was model of the atmospheric transmission ing ratio for the molecules involved; and obtained by fitting a number of pre- spectrum brings several advantages (d) instrumental parameters such as defined telluric lines on a standard star over the method of empirically deriving spectral resolution. This model-depend- spectrum observed close in time to corrections using observations of a ent approach has several advantages the scientific data (i.e., the same standard telluric standard star. In this paper, we over the empirical method. First, no addi- star that was used in the empirical discuss and compare the two methods tional noise or sources of error coming method). The telluric correction over applied to K-band Multi-Object Spec- from the standard star observations and the full wavelength range was then com- trograph (KMOS) observations and reduction are propagated to the science puted accounting for the differences in show the improvements in the quality of spectra. Second, it allows additional airmass and spectral resolution between the final products obtained by imple- components to be taken into account, the s cientific spectrum to correct and menting the modelling technique such as the amount of precipitable water the standard star. 
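To illustrate how a fitted transmission model is applied to the science data, the sketch below divides a science spectrum by the model rescaled to the science airmass using the commonly used approximation T_sci ≈ T_model^(X_sci/X_model). This is a toy example, not the molecfit or KMOS pipeline implementation: the function name and parameters are invented, and the convolution needed to match the spectral resolution of the data is omitted.

```python
# Sketch of applying a telluric transmission model to a science spectrum,
# rescaling for the airmass difference via T_sci ~ T_model ** (X_sci / X_model).
# Illustration only; not the molecfit or KMOS pipeline code.
import numpy as np

def correct_telluric(science_flux, model_transmission, airmass_science,
                     airmass_model, floor=0.05):
    """Divide a science spectrum by the airmass-scaled transmission model."""
    scaled = model_transmission ** (airmass_science / airmass_model)
    scaled = np.clip(scaled, floor, 1.0)   # avoid dividing by ~0 in saturated lines
    return science_flux / scaled

# Toy example: a flat spectrum seen through a single absorption feature.
wave = np.linspace(2.0, 2.4, 500)                       # microns (K band)
transmission = 1.0 - 0.4 * np.exp(-0.5 * ((wave - 2.2) / 0.005) ** 2)
observed = 1.0 * transmission ** (1.3 / 1.0)            # science frame at airmass 1.3
corrected = correct_telluric(observed, transmission, 1.3, 1.0)
print(np.allclose(corrected, 1.0))                      # True: feature removed
```

The power-law rescaling follows from a Beer-Lambert absorption law and becomes inaccurate for strongly saturated lines, which is one reason a full radiative-transfer fit such as the one performed by molecfit is preferable.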
As a test-bench for offered by the ESO molecfit sky tool. vapour from external sources and inac- comparison, we processed one month of curate wavelength calibrations, and dif- KMOS data and compared the results ferences between the observations of the obtained with these two different telluric Correction for atmospheric transmission standard star and the science target (for correction strategies. in spectroscopic data example, airmass and spectral resolu- tion). On the other hand, using a model of Ground-based spectroscopic observa- the atmosphere for the telluric correction Benefits of the molecfit strategy for tions are strongly affected by the Earth’s risks the introduction of systematics KMOS observations atmosphere. In particular, spectra of because of limitations in the modelling. In objects taken in the near- and mid-infra- practice, the artefacts caused by such As described previously, because the red wavelength ranges are characterised systematics are outweighed by the molecfit correction is based on a model, by a forest of absorption lines, called improvements made in the corrections. it does not add noise to the final products telluric absorptions. These features are or defects such as uncorrected cosmic caused by (mainly water and OH) mole- The model approach has been devel- rays that are embedded in the standard cules present in the atmosphere that oped in a software package named star spectrum. Figure 1 shows a compari- absorb the light from astrophysical molecfit (Kausch et al., 2013; Smette et son between the mean signal-to-noise sources. The standard way to correct for al., 2015). Molecfit uses (a) the Line-by- per pixel of the datacubes obtained by this effect is to acquire a spectrum of line Radiative Transfer Model 1 (LBLRTM) correcting the telluric absorption directly a bright and featureless star close in time algorithm (Clough, Iacono & Moncet, with a standard star (i.e., the empirical and airmass to the scientific target, and 2005) to compute the radiative transfer method) and by modelling the atmos- compare it either with its model or, if model, (b) the high-resolution transmis- pheric absorptions with molecfit. The available, with a spectrum taken from sion molecular absorption (HITRAN) signal-to-noise is measured in a wave- space. This empirical strategy, however, database 2 for the molecular parameters, length region that is free of sky or telluric has some drawbacks. First, it requires (c) Global Data Assimilation System 3 lines, and therefore is an indication of additional (expensive) telescope time. (GDAS) and ESA Michelson Interferome- the noise added by the telluric correction. Second it can be complicated to sepa- ter for Passive Atmospheric Sounding 4 As expected, the data corrected with 14 The Messenger 177 – Quarter 3 | 2019