I Forgot Your Password: Randomness Attacks Against PHP Applications
George Argyros
Dept. of Informatics & Telecom., University of Athens
argyros.george@gmail.com

Aggelos Kiayias
Dept. of Informatics & Telecom., University of Athens, & Computer Science and Engineering, University of Connecticut, Storrs, USA
aggelos@di.uoa.gr

∗ Research partly supported by ERC Project CODAMODA.

Abstract

We provide a number of practical techniques and algorithms for exploiting randomness vulnerabilities in PHP applications. We focus on the predictability of password reset tokens and demonstrate how an attacker can take over user accounts in a web application by predicting or algorithmically derandomizing the PHP core randomness generators. While our techniques are designed for the PHP language, the principles behind our techniques and our algorithms are independent of PHP and can readily apply to any system that utilizes weak randomness generators or low entropy sources. Our results include: algorithms that reduce the entropy of time variables, identifying and exploiting vulnerabilities of the PHP system that enable the recovery or reconstruction of PRNG seeds, an experimental analysis of the Håstad-Shamir framework for breaking truncated linear variables, an optimized online Gaussian solver for large sparse linear systems, and an algorithm for recovering the state of the Mersenne Twister generator from any level of truncation. We demonstrate the gravity of our attacks via a number of case studies. Specifically, we show that a number of current widely used web applications can be broken using our techniques, including Mediawiki, Joomla, Gallery, osCommerce and others.

1 Introduction

Modern web applications employ a number of ways for generating randomness, a feature which is critical for their security. From session identifiers and password reset tokens, to random filenames and password salts, almost every web application relies on the unpredictability of these values for ensuring secure operation. However, programmers usually fail to understand the importance of using cryptographically secure pseudorandom number generators (PRNG), something that opens the potential for attacks. Even worse, the same trend holds for whole programming languages; PHP for example lacks a built-in cryptographically secure PRNG in its core and until recently, version 5.3, it totally lacked a cryptographically secure randomness generation function.

This left PHP programmers with two options: they will either implement their own PRNG from scratch or they will employ whatever functions are offered by the API in a "homebrew" and ad-hoc fashion. In addition, backwards compatibility and other issues (cf. section 2) often push the developers away even from the newly added randomness functions, making their use very limited. As we will demonstrate and heavily exploit in this work, this approach does not produce secure web applications.

Observe that using a low entropy source or a cryptographically weak PRNG to produce randomness does not necessarily imply that an attack is feasible against a system. Indeed, so far there have been a very limited number of published attacks based on the insecure usage of PRNG functions in PHP, while popular exploit databases¹ contain nearly zero exploits for such vulnerabilities (and this may partially explain the delay in the PHP community adopting secure randomness generation functions). Showing that such attacks are in fact very practical is the objective of our work.

¹ e.g. http://www.exploit-db.com

In this paper we develop generic techniques and algorithms to exploit randomness vulnerabilities in PHP applications. We describe implementation issues that allow one to either predict or completely recover the initial seed of the PRNGs used in most web applications. We also give algorithms for recovering the internal state of the PRNGs used by the PHP system, including the Mersenne Twister generator and the glibc LFSR based generator, even when their output is truncated. These algorithms could be used to attack hardened PHP installations even when strong seeding is employed, as is done by the Suhosin extension for PHP, and they may be of independent interest.

We also conducted an extensive audit of several popular PHP applications. We focused on the security of password reset implementations.
Using our attack framework we were able to mount attacks that take over arbitrary user accounts with practical complexity. A number of widely used PHP applications are affected (see Figure 7), while we believe that the impact is even larger in less known applications.

Our results suggest that randomness attacks should be considered practical for PHP applications and existing systems should be audited for these vulnerabilities. Weak randomness is a grave vulnerability in any secure system, as was also recently demonstrated in the widely publicized discovery of common primes in RSA public keys by Lenstra et al. [14]. We finally stress that our techniques apply in any setting beyond PHP, whenever the same PRNG functions are used and the attack vector relies on predicting a system defined random object.

This is only an extended abstract; a full version can be found in [1].

1.1 Attack model

In Figure 1 we present our general attack template. An attacker is trying to predict the password reset token in order to gain another user's privileges (say an administrator's). Each time the attacker makes a request to the web server, his request is handled by a web application instance, usually represented by a specific operating system process, which contains some process specific state. The web application uses a number of application objects with values depending on its internal state, with some of these objects leaking to the attacker through the web server responses. Examples of such objects are session identifiers and outputs of PRNG functions. Although our focus is on password reset functions, the principles that we use and the techniques that we develop can be readily applied in other contexts where the application relies on the generation of random values for security. Examples of such applications are CAPTCHAs and the production of random filenames.

Attack complexity. Since we present explicit practical attacks, we define next the complexity under which an attack should be considered practical. There are two measures of complexity of interest. The first is the time complexity and the second is the query or communication complexity. For some of our attacks the main computational operation is the calculation of an MD5 hash. With current GPU technologies an attacker can perform up to $2^{30}$ MD5 calculations per second with a $250 GPU, while with an additional $500 he can reach up to $2^{32}$ calculations [9]. These figures suggest that attacks that require up to $2^{40}$ MD5 calculations can be easily mounted. In terms of communication complexity, most of our attacks have a query complexity of a few thousand requests at most, while some have as little as a few tens of requests. Our most communication intensive attacks (section 5) require less than 35K ($\approx 2^{15}$) requests. Sample benchmarks that we performed in various applications and server installations show that on average one can perform up to $2^{22}$ requests in the course of a day.

2 PHP System

We will now describe functionalities of the PHP system that are relevant to our attacks. We first describe the different modes in which PHP might be running, and then describe the randomness generation functions in PHP. We focus our analysis on the Apache web server, the most popular web server at the time of this writing; however, our attacks are easily ported to any web server that meets the configuration requirements that we describe for each attack.

2.1 Process management

There are different ways in which a PHP script is executed. These ways affect its internal states, and thus the state of its PRNGs. We will focus on the case when PHP is running as an Apache module, which is the default installation in most Linux distributions and is also very popular in Windows installations.

mod_php: Under this installation the Apache web server is responsible for the process management. When the server is started a number of child processes are created, and each time the number of occupied processes passes a certain threshold a new process is created. Conversely, if there are too many idle processes, some processes are killed. One can specify a maximum number of requests for each process, although this is not enabled by default. Under this setting each PHP script runs in the context of one of the child processes, so its state is preserved across multiple connections unless the process is killed by the web server process manager. The configuration is similar in the case where the web server uses threads instead of processes.

Keep-Alive requests. The HTTP protocol offers a request header, called Keep-Alive. When this header is set in an HTTP request, the web server is instructed to keep the connection alive after the request is served. Under mod_php installations this means that any subsequent request will be handled by the same process. This is a very important fact that we will use in our attacks. However, in order to avoid having a process tied to one connection indefinitely, most web servers specify an upper bound on the number of consecutive keep-alive requests. The default value for this bound in the Apache web server is 100.

2.2 Randomness Generation

In order to satisfy the need for generating randomness in a web application, PHP offers a number of different randomness functions. We briefly describe each function below.
Figure 1: Attack template.

php_combined_lcg()/lcg_value(): The php_combined_lcg() function is used internally by the PHP system, while lcg_value() is its public interface. This function is used in order to create sessions, as well as in the uniqid function described below to add extra entropy. It uses two linear congruential generators (LCGs) which it combines in order to get better quality numbers. The output of this function is 64 bits.

uniqid(prefix, extra_entropy): This function returns a string concatenation of the seconds and microseconds of the server time converted to hexadecimal. When given an additional argument it will prefix the output string with the prefix given. If the second argument is set to true, the function will suffix the output string with an output from the php_combined_lcg() function. This makes the total output have a length of up to 15 bytes without the prefix.

microtime(), time(): The function microtime() returns a string concatenation of the current microseconds divided by $10^6$ with the seconds obtained from the server clock. The time() function returns the number of seconds since the Unix Epoch.

mt_srand(seed)/mt_rand(min, max): mt_rand is the interface for the Mersenne Twister (MT) generator [15] in the PHP system. In order to be compatible with the 31 bit output of rand(), the LSB of the MT function is discarded. The function takes two optional arguments which map the 31 bit number to the [min, max] range. The mt_srand() function is used to seed the MT generator with the 32 bit value seed; if no seed is provided then the seed is provided by the PHP system.

srand(seed)/rand(min, max): rand is the interface function of the PHP system to the rand() function provided by libc. In Unix, rand() is an additive feedback generator (resembling a Linear Feedback Shift Register (LFSR)), while in Windows it is an LCG. The numbers generated by rand() are in the range $[0, 2^{31} - 1]$ but, as before, the two optional arguments give the ability to map the random number to the range [min, max]. The srand() function seeds the generator similarly to the mt_srand() function.

openssl_random_pseudo_bytes(length, strong): This function is the only function available for obtaining cryptographically secure random bytes. It was introduced in version 5.3 of PHP and its availability depends on the availability of the openssl library in the system. In addition, until version 5.3.4 of PHP this function had performance problems [2] when running on Windows operating systems. The strong parameter, if provided, is set to true if the function returned cryptographically strong bytes and false otherwise. For these reasons, and for backward compatibility, its use is still very limited in PHP applications.

In addition the application can utilize an operating system PRNG (such as /dev/urandom). However, this does not produce portable code since /dev/urandom is unavailable in Windows.

3 The entropy of time measurements

Although ill-advised (e.g., [5]), many web applications use time measurements as an entropy source. In PHP, time is accessed through the time() and microtime() functions. Consider the following problem. At some point a script executing a request made by the attacker makes a time measurement and uses the result to, say, generate a password reset token. The attacker's goal is to predict the output of the measurement made by the PHP script. The time() function has no entropy at all from an attacker's point of view, since the server reveals its time in the HTTP response header as dictated by the HTTP protocol.
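To make the point about time() concrete, the following minimal Python sketch converts the value of an HTTP Date header into Unix seconds, which approximates what time() returned on the server when the response was produced. The function name and the sample header value are illustrative assumptions and are not taken from the original work.

```python
import math
from email.utils import parsedate_to_datetime


def server_epoch_from_date_header(date_header: str) -> int:
    """Unix time in whole seconds encoded in an HTTP Date header value."""
    return int(parsedate_to_datetime(date_header).timestamp())


# Hypothetical header value, in the format a web server would send it.
sample = "Sun, 06 Nov 1994 08:49:37 GMT"
epoch = server_epoch_from_date_header(sample)

# The header carries no sub-second part, so the microseconds stay unknown:
# at most log2(10^6) bits, i.e. slightly under 20 bits.
usec_entropy_bits = math.log2(10**6)
print(epoch, round(usec_entropy_bits, 2))
```

The seconds are fully determined by the header, so only the microsecond part of a time measurement is left for the attacker to estimate.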
On the other hand, microtime() ranges from 0 to $10^6$, giving a maximum entropy of about 20 bits. We develop two distinct attacks to reduce the entropy of microtime() that have different advantages and mostly target two different scenarios. The first one, Adversarial Time Synchronization, aims to predict the output of a specific time measurement when there is no access to other such measurements. The second, Request Twins, exploits the fact that the script may enable the attacker to generate a correlated leak to the target measurement.

Adversarial Time Synchronization (ATS). As we mentioned above, in each HTTP response the web server includes a header containing the full date of the server including hour, minutes and seconds. The basic observation is that although we get no leak regarding the microseconds from the HTTP date header, we know that when a second changes the microseconds are zeroed. We use this observation to narrow down their value.

The algorithm proceeds as follows: We connect to the web server and issue pairs of HTTP requests $R_1$ and $R_2$ at corresponding times $T_1$ and $T_2$ until a pair is found in which the date HTTP header of the corresponding responses is different. At that point we know that between the processing of the two HTTP requests the microseconds of the server were zeroed. We proceed to approximate the time of this event $S$ in local time, denoted by the timestamp $D$, by calculating the average RTT of the two requests and offsetting the middle point between $T_2$ and $T_1$ by this value divided by two.

In the Apache web server the date HTTP header is set after processing the request of the user. If the attacker requests a nonexistent file, then the point at which the header is set is approximately the point at which a valid request would start executing the PHP script. It follows that if the attacker uses ATS with HTTP requests for nonexistent files then he will synchronize approximately with the beginning of the script's execution.

Given a steady network where each request takes $RTT/2$ time to reach the target server, the deviation of our algorithm depends only on the rate at which the attacker can send HTTP requests. In practice, we find that the algorithm's main source of error is the network distance between the attacker's system and the server, cf. Figure 3. The implementation we described above is a proof of concept and various optimizations can be applied to improve its accuracy.

Request Twins. Consider the following setting: an application uses microtime() to generate a password token for any user of the system. The attacker has access to a user account of the application and tries to take over the account of another user. This allows the attacker to obtain password reset tokens for his account and thus outputs of the microtime() function. The key observation is that if the attacker performs in rapid succession two password reset requests, one for his account and one for the target user's account, then these requests will be processed by the application with a very small time difference, and thus the conditional entropy of the target user's password reset token given the attacker's token will be small. Thus, the attacker can generate a token for an account he owns and in fast succession a token for the target account. Then the microtime() used for generating the token of his account can be used to approximate the microtime() output that was used for the token of the target account.

Experiments. We conducted a series of experiments for both our algorithms using the following setup. We created a PHP "time" script that prints out the current seconds and microseconds of the server. To evaluate the ATS algorithm we first performed synchronization between a client and the server and afterwards we sent a request to the time script and tried to predict the value it would return. To evaluate the Request Twins algorithm we submitted two requests to the time script in fast succession and measured the difference between the output of the two responses.

In Figure 3 we show the time difference between the server's time and our client's calculation for four servers with different CPUs and RTT parameters. Our experiments suggest that both algorithms significantly reduce the entropy of microseconds (by up to an average of 11 bits with ATS and 14 bits with Request Twins), each having different advantages. Specifically, the ATS algorithm seems to be affected by large RTT values while it is less affected by differences in the CPU speed. The situation is reversed for Request Twins, where the algorithm is immune to changes in the RTT; however, it is less effective on old systems with low processing speed.
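The Adversarial Time Synchronization procedure of section 3 can be sketched as follows in Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the `Probe` record, the function names, and the way a later microsecond value is predicted from the estimated tick are hypothetical, and the network requests themselves are left out so that only the arithmetic on timestamps is shown.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Probe:
    sent_at: float       # local clock when the request was sent (seconds)
    rtt: float           # measured round-trip time of the request (seconds)
    server_second: int   # seconds value parsed from the response's Date header


def estimate_tick(probes: Iterable[Probe]) -> Optional[float]:
    """Local-time estimate D of a moment when the server's microseconds were zero."""
    probes = list(probes)
    for r1, r2 in zip(probes, probes[1:]):
        if r1.server_second != r2.server_second:
            # The second changed between the two requests: take the middle
            # point of the send times and offset it by half the average RTT.
            avg_rtt = (r1.rtt + r2.rtt) / 2
            midpoint = (r1.sent_at + r2.sent_at) / 2
            return midpoint + avg_rtt / 2
    return None  # no second boundary observed in this batch


def predict_usec(tick: float, send_time: float, rtt: float) -> int:
    """Predicted server microseconds when a request sent at send_time arrives.

    Assumes (hypothetically) a symmetric path, so the request arrives rtt/2
    after it was sent.
    """
    arrival = send_time + rtt / 2
    return int(((arrival - tick) % 1.0) * 1_000_000)
```

In use, an attacker would fill the probe list from real responses, call `estimate_tick` once, and then call `predict_usec` for the request that triggers the target time measurement; the error is bounded by the probe spacing plus any RTT asymmetry.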
Figure 2: ATS.

Configuration            ATS                        Req. Twins
CPU (GHz)   RTT (ms)     min    max       avg       min     max      avg
1 × 3.2     1.1          0      4300      410       0       1485     47
4 × 2.3     8.2          5      76693     4135      565     1669     1153
1 × 0.3     9            53     39266     2724      1420    23022    4849
2 × 2.6     135          73     140886    83573     2       1890     299

Figure 3: Effectiveness of our time entropy lowering techniques against four servers of different computational power and RTT. Time measurements are in microseconds.

4 Seed Attacks

In this section we describe attacks that allow either the recovery or the reconstruction of the seeds used for the PHP system's PRNGs. This allows the attacker to predict all future iterations of these functions and hence reduces the entropy of the functions rand(), mt_rand(), lcg_value() as well as the extra_entropy argument of uniqid() to zero bits. We exploit two properties of the seeds used in these functions. The first one is the reuse of entropy sources between different seeds. This enables us to reconstruct a seed without any access to outputs of the respective PRNG. The second is the small entropy of certain seeds, which allows one to recover their value by brute force.

We present three distinct attacks. The first attack allows one to recover the seed of the internal LCG used by the PHP system using a session identifier. Using that seed, our second attack reconstructs the seed of the rand() and mt_rand() functions from the elements of the LCG seed without any access to outputs of these functions. Finally, we exploit the fact that the seed used in these functions is small enough for someone with access to the output of these functions to recover its value by brute force.

Generating fresh processes. Our attacks in this section rely on the ability of the attacker to connect to a process with a newly initialized state. We describe a generic technique against mod_php in order to achieve a connection to a fresh process. Recall that in mod_php when the number of occupied processes passes a certain threshold new processes are created to handle the new connections. This gives the attacker a way to force the creation of fresh processes: The attacker creates a large number of connections to the target server with the keep-alive HTTP header set. Having occupied a large number of processes, the web server will create a number of new processes to handle subsequent requests. The attacker, keeping the previous connections open, makes a new one which, given that the attacker created enough connections, will be handled by a fresh process.

4.1 Recovering the LCG seed from Session IDs

In this section we present a technique to recover the php_combined_lcg() seed using a PHP session identifier. In PHP, when a new session is created using the respective PHP function (session_start()), a pseudorandom string is returned to the user in a cookie, in order to identify that particular session. That string is generated using a combination of user specific and process specific information, and is then hashed using a hash function which is MD5 by default; there is, however, an option to use other hash functions such as SHA-1. The values contained in the hash are:

• Client IP address (32 bits).
• A time measurement: Unix epoch and microseconds (32 + 20 bits).
• A value generated by php_combined_lcg() (64 bits).

Notice now that in the context of our attack model the attacker controls each request, thus he knows exactly most of the values. Specifically, the client IP address is the attacker's IP address and the Unix Epoch can be determined through the date HTTP header. In addition, if php_combined_lcg() is not initialized at the time the session is created, as happens when a fresh process is spawned, then it is seeded. The state of php_combined_lcg() is two registers $s_1$, $s_2$ of size 32 bits each, which are initialized as follows. Let $T_1$ and $T_2$ be two subsequent time measurements. Then we have that

$$s_1 = T_1.\mathrm{sec} \oplus (T_1.\mathrm{usec} \ll 11) \quad \text{and} \quad s_2 = \mathrm{pid} \oplus (T_2.\mathrm{usec} \ll 11)$$

where pid denotes the current process id, or if threads are used the current thread id².

² In PHP versions before 5.3.2 the seed used only one time measurement, which made it even weaker.

Process ids have a range of $2^{15}$ values in Linux systems. In Windows systems the process ids (resp. thread ids) are also at most $2^{15}$ unless there are more than $2^{15}$ active processes (resp. threads) in the system, which is a very unlikely occurrence.

Observe now that the session calculation involves three time measurements $T_0$, $T_1$ and $T_2$. Given that these three measurements are conducted successively, it is advantageous to estimate their entropy by examining the random variables $T_0$, $\Delta_1 = T_1 - T_0$, $\Delta_2 = T_2 - T_1$. We conducted experiments on different systems to estimate the range of values for $\Delta_1$ and $\Delta_2$. Our experiments suggest that $\Delta_1 \in [1, 4]$ while $\Delta_2 \in [0, 3]$. We also found a positive linear correlation between the values of the two variables. This enables a cutdown of the possible valid pairs. These results suggest that the additional entropy introduced by the two $\Delta$ variables is at most 5 bits.

To summarize, the total remaining entropy of the session identifier hash is the sum of the microseconds entropy from $T_0$ ($\approx 20$ bits), the two $\Delta$ variables ($\approx 5$ bits) and the process identifier (15 bits). These give a total of 40 bits, which is tractable, cf. section 1.1. Furthermore the following improvements can be made: (1) Using the ATS algorithm the microseconds entropy can be reduced by as much as 11 bits on average. (2) The attacker can make several connections to fresh processes instead of one, in rapid succession, obtaining session identifiers from each of the processes. Because the requests were made in a small time interval, the preimages of the hashes obtained belong to the same search space, thus improving the probability of inverting one of the preimages proportionally to the number of session identifiers obtained. Our experiments with the request twins technique suggest that at least 4 session identifiers can be obtained from within the same search space, thus offering a reduction of at least two bits. Adding these improvements reduces the search to as little as $2^{27}$ MD5 computations.
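The seeding formula and the entropy estimate of section 4.1 can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions: the function names are hypothetical, every $(\Delta_1, \Delta_2)$ pair in the reported ranges is enumerated (the correlation mentioned in the text is ignored), and the carry of microseconds into the next second is an assumption of this sketch. The MD5 check of each candidate against the session identifier is omitted because the exact preimage layout is not given here.

```python
import math

MASK32 = 0xFFFFFFFF


def lcg_seed(t1_sec, t1_usec, t2_usec, pid):
    """Initial registers (s1, s2) of php_combined_lcg(), per the formula in the text."""
    s1 = (t1_sec ^ (t1_usec << 11)) & MASK32
    s2 = (pid ^ (t2_usec << 11)) & MASK32
    return s1, s2


def candidate_seeds(t0_sec, t0_usec, pid,
                    d1_values=range(1, 5), d2_values=range(0, 4)):
    """Yield every (s1, s2) consistent with one guess of T0 and pid."""
    for d1 in d1_values:
        for d2 in d2_values:
            t1 = t0_sec * 1_000_000 + t0_usec + d1   # T1 in microseconds
            t2 = t1 + d2                             # T2 in microseconds
            yield lcg_seed(t1 // 1_000_000, t1 % 1_000_000, t2 % 1_000_000, pid)


# Entropy budget quoted in the text: microseconds + deltas + process id.
usec_bits = math.log2(10**6)                 # just under 20 bits
total_bits = usec_bits + 5 + 15              # about 40 bits
improved_bits = round(total_bits) - 11 - 2   # ATS (11 bits) and 4 session ids (2 bits)
print(round(total_bits), improved_bits)
```

A full search would loop `candidate_seeds` over all microsecond and pid guesses and hash each candidate; the sketch only shows how one guess expands into seed candidates and how the bit counts add up.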
4.2 Reconstructing the PRNG Seed from Session IDs

In this section we exploit the fact that the PHP system reuses entropy sources between different generators, in order to reconstruct the PRNG seed used by the rand() and mt_rand() functions from a PHP session identifier. In order to predict the seed we only need to find a preimage for the session id, using the methods described in the previous section. One advantage of this attack is that it requires no outputs from the affected functions.

When a new process is created the internal state of the functions rand() and mt_rand() is uninitialized. Thus, when these functions are called for the first time within the script a seed is constructed as follows:

$$\mathrm{seed} = (\mathrm{epoch} \times \mathrm{pid}) \oplus (10^6 \times \mathrm{php\_combined\_lcg}())$$

where epoch denotes the seconds since epoch and pid denotes the process id of the process handling the request. It is easy to notice that an attacker with access to a session id preimage has all the information needed in order to calculate the seed used to initialize the PRNGs, since:

• epoch is obtained through the HTTP Date header.
• pid is known from the seed of php_combined_lcg() obtained through the preimage of the session id from section 4.1.
• php_combined_lcg() is also known: since the attacker has access to its seed, he can easily predict the next iteration after the initial value.

In summary the technique of this section allows the reconstruction of the seed of the mt_rand() and rand() functions given access to a PHP session id of a fresh process. The time complexity of the attack is the same as the one described in section 4.1 while the query complexity is one request, given that the attacker spawned a fresh process (which itself requires only a handful of requests).

4.3 Recovering the Seed from Application leaks

In contrast to the technique presented in the previous section, the attack presented here recovers the seed of the PRNG functions rand() and mt_rand() when the attacker has access to the output of these functions. We exploit the fact that the seed used by the PHP system is only 32 bits. Thus, an attacker who connects to a fresh process and obtains a number of outputs from these functions can brute force the 32 bit seed that produces the same output.

We emphasize that this attack works even if the outputs are truncated or passed through transformations like hash functions. The requirement of the attack is that the attacker can define a function from the set of all seeds to a sufficiently large range and can obtain a sample of this function evaluated on the seed that the attacker tries to recover. Additionally, for the attack to work this function should behave as a random map.

Consider the following example. The attacker has access to a user account of an application which generates a password reset token as 6 symbols where each symbol is defined as g(mt_rand()), where g is a table lookup function for a table with 60 entries containing alphanumeric characters. The attacker defines the function f to be the concatenation of two password reset tokens generated just after the PRNG is initialized. The attacker samples the function by connecting to a fresh process and resetting his password two times. Since the table of function g contains 60 entries, the attacker obtains about 6 bits per token symbol, giving a total range for the function f of 72 bits.

The time complexity of the attack is $2^{32}$ calculations of f; however, we can reduce the online complexity of the attack using a time-space tradeoff. In this case the online complexity of the attack can be as little as $2^{16}$. The query complexity of the attack depends on the number of requests needed to obtain a sample of f. In the example given above the query complexity is two requests.
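The seed formula of section 4.2 and the brute-force search of section 4.3 can be sketched in Python as follows. Several things here are assumptions made for illustration: Python's `random.Random` stands in for a freshly seeded mt_rand() (it is not PHP's generator and is seeded differently), the 60-entry table is hypothetical, the 32-bit truncation in `reconstructed_seed` is assumed from the statement that the seed is 32 bits, and the search space is cut to 16 bits so the example finishes quickly.

```python
import random
import string

MASK32 = 0xFFFFFFFF
# Hypothetical 60-entry alphanumeric lookup table for g.
TABLE = (string.ascii_letters + string.digits)[:60]


def reconstructed_seed(epoch, pid, lcg_output):
    """Seed formula of section 4.2; truncation to 32 bits is an assumption."""
    return ((epoch * pid) ^ int(1_000_000 * lcg_output)) & MASK32


def token(rng, length=6):
    """One password reset token: `length` table lookups on PRNG outputs."""
    return "".join(TABLE[rng.randrange(len(TABLE))] for _ in range(length))


def f(seed):
    """Concatenation of the first two tokens produced after seeding."""
    rng = random.Random(seed)   # stand-in generator, NOT PHP's mt_rand()
    return token(rng) + token(rng)


def recover_seed(sample, seed_bits=16):
    """Exhaustively search the (reduced) seed space for a seed matching sample."""
    for seed in range(1 << seed_bits):
        if f(seed) == sample:
            return seed
    return None


secret = 40321                  # seed of the hypothetical fresh process
print(recover_seed(f(secret)) == secret)
```

Because f maps seeds into a range far larger than the seed space, a matching seed is almost certainly the right one; against a real target the loop would run over all $2^{32}$ seeds with an implementation of the actual generator.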
5 State recovery attacks

One can argue that randomness attacks can be easily thwarted by increasing the entropy of the seeding for the PRNG functions used by the PHP system. For example, the suhosin PHP hardening extension replaces the rand() function with a Mersenne Twister generator with separate state from mt_rand() and offers a larger seed for both generators, getting entropy from the operating system³.

³ The suhosin patch installed in some Unix operating systems by default does not include the randomness patches; rather, it mainly offers protection from memory corruption attacks. The full extension is usually installed separately from the PHP packages.

We show that this is not the case. We exploit the algebraic structure of the PRNGs used in order to recover their internal state after a sufficient number of past outputs (leaks) have been observed by the attacker. Any such attack has to overcome two challenges. First, web applications usually need only a small range of random numbers, for example to sample a random entry from an array. To achieve that, the PHP system maps the output of the PRNG to the given range, an action that may break the linearity of the generators. Second, in order to collect the necessary leaks the attacker may need to reconnect to the same process many times to collect the leaks from the same generator instance. Since there could be many PHP processes running in the system, this poses another challenge for the attacker.

In this section we present state recovery algorithms for the truncated PRNG functions rand() and mt_rand(). The algorithm for the latter function is novel, while regarding the former we implement and evaluate the Håstad-Shamir cryptanalytic framework [8] for truncated linearly related variables. We begin by discussing the way truncation takes place in the PHP system. Afterwards, we tackle the problem of reconnecting to the same server process. Finally we present the two algorithms against the generators.

Figure 4: Mapping a random number n ∈ [M] to 7 buckets and the respective bits of n that are revealed given each bucket.

5.1 Truncating PRNG sequences in the PHP system

As mentioned in section 2.2 the rand() and mt_rand() functions can map their output to a user defined range. This has the effect of truncating the functions' output. Here we discuss the process of truncating the output and its implications for the attacker.

Let $n \in [M] = \{0, \ldots, M - 1\}$ be a random number generated by rand() or mt_rand(), where $M = 2^{31}$ in the PHP system. In order to map that number to the range $[a, b]$ where $a < b$, the PHP system maps $n$ to a number $l \in [a, b]$ in the following way:

$$l = a + \left\lfloor \frac{n \cdot (b - a + 1)}{M} \right\rfloor$$

We can view the process above as a mapping from the set of numbers in the range $[M]$ to $b - a + 1$ "buckets." Our goal is to recover as many bits as possible of the original number $n$. Observe that given $l$ it is possible to recover immediately up to $\lfloor \log(b - a + 1) \rfloor$ most significant bits (MSB) of the original number $n$ as follows:

Given that $n$ belongs to bucket $l$ we obtain the following range for possible values for $n$:

$$\left\lfloor \frac{(l - a) \cdot M}{b - a + 1} \right\rfloor \le n \le \left\lfloor \frac{(l - a + 1) \cdot M}{b - a + 1} \right\rfloor - 1$$

Therefore, given a bucket number $l$ we are able to find a lower and an upper bound for the original number, denoted respectively by $L_l$ and $U_l$. In order to recover a part of the original number $n$ one can simply find the number of most significant bits of $L_l$ and $U_l$ that are equal and observe that these bits would be the same also in the number $n$. Therefore, given a bucket $l$ we can compare the MSBs of both numbers and set the MSBs of $n$ to the largest sequence of common most significant bits of $L_l$, $U_l$.

Notice that in some cases even the most significant bits of the two numbers are different, thus we are unable to infer any bit of the original number $n$ with absolute certainty. For example, in Figure 4, given that a number falls in bucket 3 we have that $920350134 \le n \le 1227133512$. Because $920350134 < 2^{30}$ and $1227133512 > 2^{30}$ we are unable to infer any bit of the original number $n$.

Another important observation is that this specific truncation algorithm allows the recovery of a fragment of the MSBs of the original number. Therefore, in the following sections we will assume that the truncation occurs in the MSBs and we will describe our algorithms based on MSB truncated numbers. However, all algorithms described work for any kind of truncation.
the php_combined_lcg() function, which in turn uses process specific state variables. Thus, if the session is produced by the same process as before, then php_combined_lcg() will contain the state that follows the one it had before. This gives us a way to find the correct process among all the processes running on the server. In summary the algorithm proceeds as follows:

1. The attacker obtains a session identifier and a preimage for that id using the techniques discussed in section 4.1.

2. The attacker submits the necessary requests to obtain leaks from the PRNG, using the keep-alive HTTP header until the maximum number of requests is reached.

3. The attacker initiates connections to the server requesting session identifiers. He attempts to obtain a preimage for every session identifier using the next value of the php_combined_lcg() after the one used before or, if the server has high traffic, the next few iterations. If a preimage is obtained the attacker repeats step 2, until all necessary leaks are obtained.

Notice that obtaining a preimage after disconnecting requires bruteforcing at most 20 bits (the microseconds), and thus testing for the correct session id is an efficient procedure. Even if the application is not using PHP sessions, or if a preimage cannot be obtained, there are other, application specific, techniques for finding the correct process.

A generic technique for Windows. In the case of Windows systems the attacker can employ another technique to collect the necessary leaks from the same process in case the server has low traffic. In Unix servers with the Apache preforked server + mod_php all idle processes are in a queue waiting to handle an incoming client. The first process in the queue handles a client and then the process goes to the back of the queue. Thus, if an attacker wants to reconnect to the same process without using some process distinguisher he will need to know exactly the number of processes in the system and whether there are any intermediate requests by other clients while the attacker tries to reconnect to the same process. However, in a Windows prethreaded server with mod_php things are slightly better for an attacker. Threads are in a priority queue, and when the thread in the first place of the queue handles a request from a client it returns again to that first place and handles the first subsequent incoming request. Thus, an attacker who manages to connect to that first thread of the server can rapidly close and reopen the connections, thus leaving a very small window in which that thread could be occupied by another client. Of course, in high traffic servers the attacker would have difficulty connecting at a time when the server is idle in the first place. Nevertheless, techniques exist [16] to remotely determine the traffic of a server and thus allow the attacker to find an appropriate time window within which he will attempt this attack.

Based on the above, in the following sections we will assume that the attacker is able to collect the necessary number of leaks from the targeted function.

5.3 State recovery for mt_rand()

The mt_rand() function uses the Mersenne Twister generator in order to produce its output. In this section we give a description of the Mersenne Twister generator and present an algorithm that allows the recovery of the internal state of the generator even when the output is truncated. Our algorithm also works in the presence of non consecutive outputs, as in the case resulting from the bucket truncation algorithm of the PHP system (cf. section 5.1).

Mersenne Twister. Mersenne Twister, and specifically the widely used MT19937 variant, is a linear PRNG with a state of 624 32-bit words. The MT algorithm is based on the following recursion: for all $k$,

$$x_{k+n} = x_{k+m} \oplus ((x_k \wedge \text{0x80000000}) \mid (x_{k+1} \wedge \text{0x7fffffff}))A$$

where $n = 624$ and $m = 397$. The logical AND operation with 0x80000000 discards all but the most significant bit of $x_k$ while the logical AND with 0x7fffffff discards only the MSB of $x_{k+1}$. $A$ is a $32 \times 32$ matrix for which multiplication by a vector $x$ is defined as follows:

$$xA = \begin{cases} (x \gg 1) & \text{if } x^{31} = 0 \\ (x \gg 1) \oplus a & \text{if } x^{31} = 1 \end{cases}$$

Here $a = (a_0, a_1, \ldots, a_{31}) = \text{0x9908B0DF}$ is a constant 32-bit vector (note that we use $x^{31}$ to denote the LSB of a vector $x$). The output of this recurrence is finally multiplied by a $32 \times 32$ non singular matrix $T$, called the tempering matrix, in order to produce the final output $z = xT$.

State recovery. Since the tempering matrix $T$ is non singular, given 624 outputs of the MT generator one can easily compute the original state by multiplying the output $z$ with the inverse matrix $T^{-1}$, thus obtaining the state variable used as $x_i = z_i T^{-1}$. After recovering 624 state variables one can predict all future iterations. However, when the output of the generator is truncated, predicting future iterations is not as straightforward as before because it is not possible to locally recover all needed bits of the state variables given the truncated output.

The key observation in recovering the internal state is that, due to the fact that the generator is in GF(2), the truncation does not introduce non linearity even though there are missing bits from the respective equations.
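For the untruncated case, the inversion $x_i = z_i T^{-1}$ can be illustrated with a minimal Python sketch. It rests on assumptions that are not part of the text above: the tempering shifts and masks are those of the reference MT19937 implementation, Python's `random` module (which also implements MT19937) stands in for the target generator, and the helper names are hypothetical. PHP's mt_rand() differs in detail, so this shows the principle only.

```python
import random

MASK32 = 0xFFFFFFFF


def temper(y):
    """Reference MT19937 tempering (constants assumed, not from the paper)."""
    y ^= y >> 11
    y ^= (y << 7) & 0x9D2C5680
    y ^= (y << 15) & 0xEFC60000
    y ^= y >> 18
    return y & MASK32


def _undo_right(z, shift):
    # Invert y ^= y >> shift by fixed-point iteration: each pass
    # fixes `shift` more bits, starting from the most significant.
    y = z
    for _ in range(32 // shift + 1):
        y = z ^ (y >> shift)
    return y


def _undo_left(z, shift, mask):
    # Invert y ^= (y << shift) & mask, fixing bits from the low end.
    y = z
    for _ in range(32 // shift + 1):
        y = z ^ ((y << shift) & mask)
    return y & MASK32


def untemper(z):
    """Apply the inverse tempering steps in reverse order."""
    z = _undo_right(z, 18)
    z = _undo_left(z, 15, 0xEFC60000)
    z = _undo_left(z, 7, 0x9D2C5680)
    return _undo_right(z, 11)


def clone_from_outputs(outputs):
    """Rebuild a generator from 624 consecutive full 32-bit outputs."""
    if len(outputs) != 624:
        raise ValueError("exactly 624 consecutive outputs are required")
    state = tuple(untemper(z) for z in outputs)
    clone = random.Random()
    # Index 624 forces a state refresh on the next draw.
    clone.setstate((3, state + (624,), None))
    return clone
```

Feeding `clone_from_outputs` with 624 values of `getrandbits(32)` taken from a victim `random.Random` instance yields a generator that produces the same subsequent values. With truncated outputs this word-by-word inversion is no longer available, because each observed word determines only some bits of the corresponding state variable.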
Thus, we can express the output of the generator as a set of linear equations in GF(2) which, when solved, yield the initial state that produced the observed sequence. From the basic recurrence of MT we can derive the following equations for each individual bit:

Lemma 5.1. Let $x_0, x_1, \ldots$ be an MT sequence and $j > 0$. Then the following equations hold for any $k \ge 0$:

1. $x^{0}_{jn+k} = x^{0}_{(j-1)n+k+m} \oplus (x^{31}_{(j-1)n+k+1} \wedge a_0)$

2. $x^{1}_{jn+k} = x^{1}_{(j-1)n+k+m} \oplus x^{0}_{(j-1)n+k} \oplus (x^{31}_{(j-1)n+k+1} \wedge a_1)$

3. $\forall i,\ 2 \le i \le 31:\ x^{i}_{jn+k} = x^{i}_{(j-1)n+k+m} \oplus x^{i-1}_{(j-1)n+k+1} \oplus (x^{31}_{(j-1)n+k+1} \wedge a_i)$

Proof. The equations follow directly from the basic recurrence.

In addition, since the tempering matrix is only a linear transformation of the bits of the state variable $x_i$, we can similarly express each bit of the final output of MT as a linear equation of the bits of the respective state variable.

To recover the initial state of MT, we generate all equations over the state bit variables $x_0, x_1, \ldots, x_{19936}$. To map any position in the MT sequence to an equation over this set of variables, we apply the equations of the lemma above recursively until all variables on the right hand side have index below 19937.

Depending on the positions observed in the MT sequence the resulting linear system will be different. The question that remains is whether that system is solvable. Regarding the case of the 31-bit truncation, i.e. only the MSB of the output word is revealed, we can use known properties of the generator in order to easily prove the following:

Lemma 5.2. Suppose we obtain the MSB of 19937 consecutive words from the MT generator. Then the resulting linear system is uniquely solvable.

Proof. It is known that the MT sequence is 19937-distributed to 1-bit accuracy.⁴ The linear system is uniquely solvable iff the rows are linearly independent. Suppose that a set of $k \le 19937$ rows is linearly dependent. Then the last row obtained in this set is computable from the other members of the set, something that contradicts the order of equidistribution of MT.

⁴ Suppose that a sequence is $k$-distributed to $u$-bit accuracy. Then knowledge of the $u$ most significant bits of $l$ words does not allow one to make any prediction for the $u$ bits of the next word when $l < k$. This is the cryptographic interpretation of the "order of equidistribution" whose exact definition can be found in [15].

The above result is optimal in the sense that this is the minimum number of observed outputs needed for the system to become fully determined. In case we obtain non consecutive outputs due to truncation or application behavior, linear dependencies may arise between the resulting equations and therefore we may need a larger number of observed outputs.

Because we cannot know in advance when the system will become solvable or which equations will be included, we employ an online version of Gaussian elimination in order to form and solve the resulting system. In this way, the attacker can begin collecting leaks and gradually feed them to our Gaussian solver until he is notified that a sufficient number of independent equations have been collected. Note that regular Gaussian elimination uses both elementary row and elementary column operations. However, because we do not have the entire linear system in advance we cannot use elementary column operations. Instead we perform Gaussian elimination using only elementary row operations and utilize a bookkeeping system to enter equations in their place as they are produced by the leaks supplied to the solver. Our solver employs a sparse vector representation and is capable of solving overdetermined sparse systems of tens of thousands of equations in a few minutes.

We ran a sequence of experiments to determine the solvability of the system when a different number of bits is truncated from the output. In addition we ran experiments in which the outputs of the MT generator are passed through the PHP truncation algorithm, with different user defined ranges. All experiments were conducted on a 4 × 2.3 GHz machine with 4 GB of RAM.

In Figure 5 we present the number of equations needed when the PHP truncation algorithm is used. On the x-axis we have the logarithm of the number of buckets. We also show the standard deviation as vertical bars. It can be seen that the number of equations needed is much higher than the theoretical lower bound of 19937 and fluctuates between 27000 and 33000. Nevertheless, the number of leaks required decreases linearly with the number of buckets we have. The reason is that although we have more linearly dependent equations, the total number of equations we obtain due to the larger number of buckets is bigger.

Implementation error in the PHP system. The PHP system up to the current version, 5.3.10, has an error in the implementation of the Mersenne Twister generator (we discovered this during the testing of our solver). Specifically the following basic recurrence is effectively used in the PHP system due to a programming error:

$$x_{k+n} = x_{k+m} \oplus ((x_k \wedge \text{0x80000000}) \mid (x_{k+1} \wedge \text{0x7ffffffe}) \mid (x_k \wedge \text{0x1}))A$$

As a result the PHP system uses a different generator which, as it turns out, has slightly more linear dependencies than the MT generator. This means that the randomness properties of the PHP generator are probably poorer compared to the original MT generator.
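The online elimination used in this section, where equations are absorbed one at a time using row operations only, can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the solver used in the experiments: each equation is stored as an integer bitmask instead of a sparse vector, and the class and method names are hypothetical.

```python
class OnlineGF2Solver:
    """Incrementally row-reduce linear equations over GF(2) as they arrive."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        # Bookkeeping: pivot variable index -> (coefficient bitmask, rhs bit).
        self.rows = {}

    def add_equation(self, mask, rhs):
        """Insert the equation XOR(variables selected by mask) == rhs.

        Returns True if it was independent of the equations stored so far.
        """
        while mask:
            pivot = mask.bit_length() - 1       # highest variable still present
            if pivot not in self.rows:
                self.rows[pivot] = (mask, rhs)  # file the row under its pivot
                return True
            row_mask, row_rhs = self.rows[pivot]
            mask ^= row_mask                    # elementary row operation
            rhs ^= row_rhs
        if rhs:
            raise ValueError("equation contradicts the stored system")
        return False                            # linearly dependent, discarded

    def is_solvable(self):
        """True once one independent equation per variable has been stored."""
        return len(self.rows) == self.num_vars

    def solve(self):
        """Back-substitute and return the solution as an integer bit vector."""
        if not self.is_solvable():
            raise ValueError("not enough independent equations yet")
        x = 0
        for pivot in sorted(self.rows):         # lowest pivot first
            mask, rhs = self.rows[pivot]
            known = mask & x                    # lower variables already fixed
            bit = rhs ^ (bin(known).count("1") & 1)
            x |= bit << pivot
        return x
```

In an attack setting, each leaked output bit would be translated into one `(mask, rhs)` pair over the 19937 state bits and passed to `add_equation`, with collection stopping as soon as `is_solvable()` returns true. Dependent equations are simply discarded, which mirrors the observation that more leaks than the theoretical minimum are needed.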
Figure 5: Solving MT; y-axis: number of equations; x-axis: number of buckets (logarithm). Standard deviation shown as vertical bars.

5.4 State recovery for rand()

We turn now to the problem of recovering the state of rand() given a sequence of leaks from this generator. While mt_rand() is implemented within the PHP source code and thus is unchanged across different environments, the rand() function uses the respective function defined in the standard library of the operating system. This results in different implementations across different operating systems. There are mainly two different implementations of rand(), one from glibc and one from the Windows library.

Windows rand(). The rand() function defined in Windows is a Linear Congruential Generator (LCG). An LCG is defined by a recurrence of the form

$$X_{n+1} = (aX_n + c) \bmod m$$

Although LCGs are fast and require a small memory footprint there are many problems which make them insufficient for many uses, including of course cryptographic purposes. The parameters used by the Windows LCG are $a = 214013$, $c = 2531011$, $m = 2^{32}$. In addition, the output is truncated by default and only the top 15 bits are returned. If PHP is running in a threaded server in Windows then the parameters of the LCG used are $a = 1103515245$, $c = 12345$, $m = 2^{15}$.

Glibc rand(). In the past, glibc also used an LCG for the rand() function. Subsequently an LFSR-like "additive feedback" design was adopted. The generator has a state of 31 words (of 32 bits each), over which it is defined by the following recurrence:

$$r_i = (r_{i-3} + r_{i-31}) \bmod 2^{32}$$

In addition the LSB of each word is discarded and the output returned to the user is $o_i = r_i \gg 1$. An interesting note is that the man page of rand() states that rand is a non-linear generator. Nevertheless, the non linearity introduced by the truncation of the LSB is negligible and one can easily recover the initial values given enough outputs of the generator.

State recovery. Notice that if the generators used have a small state, such as the Windows LCGs, then state recovery is easy, by applying the attack from section 4 to bruteforce the entire state of the generator. However, for the glibc generator, which has a state of 992 bits, these attacks are infeasible assuming that the state is random. Although LCGs and the glibc generator are different, they both fall into the same cryptanalytic framework introduced by Håstad and Shamir in 1985 for recovering values of truncated linear variables. This framework allows one to uniquely solve an underdefined system of linear equations when the values of the variables are partially known. In this section we discuss our experiences with applying this technique to the two aforementioned generators: the LCG and the additive feedback generator of glibc. We will briefly describe the algorithm for recovering the truncated variables in order to discuss our experiments and results. The interested reader can find more information about the algorithm in the original paper [8].

Suppose we are given a system with $l$ linear equations on $k$ variables modulo $m$ denoted by $x_1, x_2, \ldots, x_k$,

$$
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1k} x_k &= 0 \bmod m \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2k} x_k &= 0 \bmod m \\
&\;\;\vdots \\
a_{l1} x_1 + a_{l2} x_2 + \cdots + a_{lk} x_k &= 0 \bmod m
\end{aligned}
$$

where $l < k$ and each variable $x_i$ is partially known. We want to solve the system uniquely by utilizing the partial information of the $k$ variables $x_i$.

We use the coefficients of the $l$ equations to create a set of $l$ vectors, where each vector is of the form $v_i =$
$(a_1, \ldots, a_k)$. In addition we add to this set the $k$ vectors $m \cdot e_i$, $0 < i \le k$. The cryptanalytic framework exploits properties of the lattice $L$ that is defined as the linear span of these vectors. Observe that the dimension of $L$ is $k$ and in addition for every vector $v \in L$ we have that $\sum_{i=1}^{k} v_i x_i = 0 \bmod m$.

Given the above the attack works as follows: first a lattice is defined using the recurrence that defines the linear generator; then, a lattice basis reduction algorithm is employed to create a set of linearly independent equations modulo $m$ with small coefficients; finally, using the partially known values for each variable, we convert this set of equations to equations over the integers which can be solved uniquely. Specifically, we use the LLL [13] algorithm in order to obtain a reduced basis $B$ for the lattice $L$. Now because $B = \{w_j\}$ is a basis, the vectors of $B$ are linearly independent. The key observation is that the lattice definition implies that $w_j \cdot x = w_j \cdot (x_{unknown} + x_{known}) = d_j \cdot m$ for some unknown $d_j$. Now as long as $x_{unknown} \cdot w_j < m/2$ (this is the critical condition for solvability) we can solve for $d_j$ and hence recover $k$ equations for $x_{unknown}$ which will uniquely determine it.

The original paper provided a relation between the size of $x_{known}$ and the number of leaks required from the generator so that the upper bound of $m/2$ is ensured given the level of basis reduction achieved by LLL. In the case of LCGs the paper required the modulus $m$ to be squarefree. However, as shown above, in the generators used it holds that $m = 2^{32}$ and thus their arguments do not apply. In addition, the lattice of the additive generator of glibc is different from the one generated by an LCG and thus needs a different analysis.

We conducted a thorough experimental analysis of the framework focusing on the two types of generators above. In each case we tested the maximum possible value of $x_{known}$ to see if the $m/2$ bound holds for the reduced LLL basis. In the following paragraphs we briefly discuss the results of these experiments for these types of generators.

In Figure 6 we show the relationship between the number of leaks required for recovering the state with the lattice attack and the number of bits that are truncated for four LCGs: the Windows LCG, the glibc LCG (which are both 32 bits), the Visual Basic LCG (which is 24 bits) and an LCG used in the MMIX of Knuth (which is 64 bits). It is seen that the number of leaks required is very small but increases sharply as more bits are truncated. In all cases the attack stops being useful once the number of truncated bits leaves none but the $\log w - 1$ most significant bits, where $w$ is the size of the LCG state. The logarithm barrier seems to be uniformly present and hints that the MSBs of a truncated LCG sequence may be hard to predict (at least using the techniques considered here). A similar logarithmic barrier was also found in the experimental analysis that was conducted by Contini and Shparlinski [3] when they were investigating Stern's attack [17] against truncated LCGs with secret parameters.

Applying the attack to the glibc additive feedback generator we found that the LLL algorithm became a bottleneck in the running time; due to its large state the algorithm required a large number of leaks to recover even small truncation levels, therefore increasing the dimension of the lattice that was given to the LLL algorithm. Our testing system (a 3.2 GHz CPU with 2 GB memory) ran out of memory when 7 bits were truncated. The version of LLL we employed (SageMath 4.8) has time complexity $O(k^5)$ where $k$ is the dimension of the lattice (which represents roughly the number of leaks). The best time complexity known is $O(k^3 \log k)$ derived from [12]; this may enable much higher truncation levels to be recovered for the glibc generator, however we were not able to test this experimentally as no implementation of this algorithm is publicly available.

We conclude that truncated LCG type generators can be broken (in the sense of entirely recovering their internal state) for all but extremely high levels of truncation (e.g. in the case of 32-bit state LCGs modulo $2^{32}$ when they are truncated to 16 buckets or less). For additive feedback type generators, such as the one in glibc, the situation is similar, however higher recursion depths require more leaks (with a linear relationship) that in turn affect the lattice dimension, resulting in longer running times. Comparing the results between the LCGs and the additive feedback generators one may find some justification for the adoption of the latter in recent versions of glibc: it appears that, at least as far as lattice-based attacks are concerned, it is harder to predict truncated glibc sequences (compared to, say, Windows LCGs) due to the higher running times of LLL reduction (note though that this does not mean that these are cryptographically secure).

6 Experimental results and Case studies

In order to evaluate the impact of our attacks on real applications we conducted an audit of the password reset function implementations of popular PHP applications. Figure 7 shows the results of our audit. In each case successfully exploiting the application resulted in takeover of arbitrary user accounts⁵ and in some cases, when the administrator interface was affected, of the entire application. In addition to identifying these vulnerabilities we wrote sample exploits for some types of attack we presented, each on one affected application.

⁵ The only exception to that is the HotCRP application where passwords were stored in cleartext, thus there was no password reset functionality. However, in this case we were able to spoof registrations for arbitrary email accounts.
Figure 6: Solving LCGs with LLL; y-axis: number of leaks; x-axis: number of bits truncated.

Application       Attack               Result
mediawiki         4.2  4.3  5.3        •
Open eClass       4.2  4.3  5.4        •
taskfreak         4.2  4.3  5.3        •
zen-cart          ATS  RT              •
osCommerce 2.x    ATS  RT              •
osCommerce 3.x    4.2  4.3  5.4        •
elgg              ATS^c  4.2  4.3      •
Gallery           RT^c  4.1^c  4.2^c   •
Joomla            4.3                  •
MyBB              ATS^c  4.1^c  5.3^c  ◦
IpBoard           ATS^c  4.1^c  4.2^c  •
phorum            4.2  4.3  5.3        •
HotCRP            4.2  4.3  5.3        •
gazelle           4.3  5.3             •
tikiWiki          4.2  4.3  5.4        •
SMF               ATS^c  4.3^c         ◦

Figure 7: Summary of audit results. The c superscript denotes that the attack needs to be used in combination with the other attacks carrying the same superscript. The • denotes a full attack while ◦ denotes a weakness for which the practical exploitation is either unverified or requires very specific configurations. The number denotes the section of the paper in which the applied attack is described.

6.1 Selected Audit Results

Many applications we audited were trivially vulnerable to our attacks since they used the affected PRNG functions in a straightforward manner, thus making it easy for an attacker to apply our techniques and exploit them. However, some applications attempted to defend against randomness attacks by creating custom token generators. We will describe some attacks that resulted from using our framework against custom generators.

Gallery. PHP Gallery is a very popular web based photo album organizer. In order for a user to reset his password he has to click on a link which contains the security token. The function that generates the token is the following:

    function hash($entropy="") {
        return md5($entropy . uniqid(mt_rand(), true));
    }

The token is generated using three entropy sources, namely a time measurement from uniqid(), an output from the MT generator and an output from the php_combined_lcg() through the extra argument in the uniqid() function. In addition the output is passed through the MD5 hash function so it is infeasible to recover the initial values given the output of this function. Since we do not have access to the output of the function, the state reconstruction attack seems an appropriate choice for attacking this token generation algorithm. Indeed, the Gallery application uses PHP sessions, thus an attacker can use them to predict the php_combined_lcg() and mt_rand() outputs. In addition, by utilizing the request twins technique from section 3 the attacker can further reduce the search space he has to cover to a few thousand requests.

Joomla. Joomla is one of the most popular CMS applications, and it also has a long history of weaknesses in its generation of password reset tokens [4, 11]. Until recently, the code for the random token generation was the following:
    function genRandomPassword( $length=8 ) {
      $salt = "abc...xyzABC...XYZ0123456789";
      $len = strlen ( $salt );
      $makepass = '';
    - $stat = @stat ( __FILE__ ) ;
    - if (empty($stat) || !is_array($stat))
    -   $stat=array(php_uname());
    - mt_srand(crc32(microtime().implode('|',$stat)));
      for($i=0;$i

[...]

with a newly initialized MT generator then there are at most $2^{32}$ possible tokens, independently of the entropy of config.secret. This gives an interesting attack vector: We generate two tokens from a fresh process sequentially for a user account that we control. Then we start to connect to a fresh process and request a token for our account. If the token matches the token
is infeasible. The normal action would be to follow the same path as we did in the Gallery application, where we had a similar problem, and utilize the seed reconstruction attack which does not require an output of the PRNGs. However, the Gazelle application uses custom sessions (which are generated using the same function), and thus we cannot apply that attack either. The solution lies in slightly modifying the seed recovery attack. Instead of asking the question "which seed produces this mt_rand() sequence", which is shuffled and thus affected by the second PRNG, we instead ask which seed produces the unsorted set which contains the characters of our string. This set is not affected by the shuffling and thus we can effectively bruteforce the mt_rand() seed independently. After recovering the mt_rand() seed we know the initial sequence that was produced and we can subsequently recover the seed of rand() using the same attack.

6.2 Attacks Implementation

In addition to auditing the applications, we implemented a number of our attacks targeting selected applications. In particular, we implemented a seed recovery attack against Mediawiki, a state reconstruction attack against the Phorum application and the request twins technique against Zen-cart. In the following sections we briefly describe each vulnerability and the results of our attack implementations.

Mediawiki. Mediawiki is a very popular wiki application used, among others, by Wikipedia. Mediawiki uses mt_rand() in order to generate a new password when the user requests a password reset. In order to predict the generated password we use the seed recovery attack of section 4.3. The function f that we sample is the one used to generate a CSRF token, which is the following:

    function generateToken( $salt = '' ) {
        $token = dechex(mt_rand()).dechex(mt_rand());
        return md5( $token . $salt );
    }

Our function f, given a seed s, first seeds the mt_rand() generator and then uses that generator to produce a token as in the function above. To fully evaluate the practicality of the attack we implemented the attack online, without any time-space tradeoff. Our implementation was able to cover around 1300000 seed evaluations of f per second on a dual-core laptop with two 2.3 GHz processors. This allowed us to cover the full $2^{32}$ range in about 70 minutes. Of course, using a time-space tradeoff the search time could be further reduced to a few minutes.

Zen cart. Zen-Cart is a popular eCommerce application. At the time of this writing, a sample database which shops enter voluntarily numbers about 2500 active e-shops.⁶ In order to reset a user's password zen-cart first seeds the mt_rand() generator with the microtime() function and then uses the mt_rand() function to produce a new password for the user. Thus, there are at most $10^6$ possible passwords which could be produced. Our exploit used the request twins technique to reset both our password and the target user's password. Afterwards, we bruteforced the generated password for our account to recover the microtime() value that produced it. This takes at most a few seconds on any modern laptop. Then, our exploit bruteforces the passwords generated by microtime() values close to the one that generated our own new password. We ran our exploit in a network with RTT around 9 ms, and Zen-Cart was installed on a 4 × 2.3 GHz server. The average difference between the microtime() values of the two passwords was about 3600 microseconds, and the exploit needed at most twice that many requests since we don't know which password was produced first. With the rate of 2500 requests per minute that our implementation achieves, the attack is completed in a few minutes.

⁶ www.zen-cart.com/index.php?main_page=showcase

Phorum. Phorum is a classic bulletin board application. It was used, among others, by the eStream competition as an online discussion platform. In order for a user to reset his password the following function is used:

    function phorum_gen_password($charpart=4, $numpart=3)
    {
        $vowels = ... //[char array];
        $cons = ... //[char array];
        $num_vowels = count($vowels);
        $num_cons = count($cons);
        $password="";

        for($i = 0; $i < $charpart; $i++){
            $password .= $cons[mt_rand(0, $num_cons - 1)]
                      . $vowels[mt_rand(0, $num_vowels - 1)];
        }
        $password = substr($password, 0, $charpart);
        if($numpart){
            $max=(int)str_pad("", $numpart, "9");
            $min=(int)str_pad("1", $numpart, "0");
            $num=(string)mt_rand($min, $max);
        }
        return strtolower($password.$num);
    }

What makes this function interesting in the context of state recovery is that if called with no arguments (as it is in the application), at least four mt_rand() leaks are discarded in each call. We implemented the attack having the application installed on a Windows server with the Apache web server, and we used our generic technique for Windows in order to reconnect to the same process. On average, the attack required around 1100 requests and 11 reconnections of our client. The running time was about 30 minutes, and the main source of overhead was the system solving. This fact is mainly explained by the small number of buckets and the lost leaks of each iteration. Nevertheless, the attack remained highly practical, as we