RavenBlog
 Black: ravenblack.net | me | games | books | email | wishlist | rss Blogs: Angryblog | As Above | MonkyBlog | Nightshade | Journals Blimey: SomethingAwful | Advice

 Archive February 2010
Thursday 11 February 2010
 As of a couple of weeks ago, Jessica and I are engaged.In other news that someone else might care about and that I might want to refer back to at some point, I found a reasonably quick way of putting an approximate percentile into a big SQL table (in mysql at least). The solution is:SET @count=(SELECT COUNT(*) FROM table);SET @row=0;UPDATE table SET percentile=(@row:=@row+1)/@count ORDER BY whatever;That should be *100 for an actual percentile, but I prefer to work with the real number. If you want the top one of 5 values to be 80th percentile rather than 100th then row should start at -1 rather than at 0, and similarly if you want the percentiles to never be 0 or 100 then @row starts at 0 and @count should be count(*)+1.Main flaw in this model is that tied values don't get the same percentile, but rather a distribution of percentiles through the number of ties, ordered by some indeterminate secondary criterion (or a determinate one if you supply a second value for ORDER BY). For my purposes that's fine. It could be resolved by adding one or two more procedural variables and using an IF function. Certainly beats the solutions I've seen most online.Actually, since this is apparently a pretty common google search and if someone finds my solution there'll surely be someone who'd ask "how would you do it for same-level with IFs?" I made a quick test for proof of concept. Here we go.SET @count=(SELECT COUNT(*) FROM test);SET @row=-1;SET @last=0;SET @lastperc=0;UPDATE test SET percentile=IF(number=@last, -- if the number is the same as last time  IF((@row:=@row+1)!='x',@lastperc,0), -- increase row, set percentile to @lastperc  IF((@last:=number)!='x',(@lastperc:=((@row:=@row+1)/@count)),0) -- else set @last to this number, set @lastperc to new percentile, increase row) ORDER BY number;This operates on a table named 'test' with columns 'number' and 'percentile', and gives results where the percentile is "percentage of values that are lower", eg.```+--------+------------+ | number | percentile | +--------+------------+ | 1 | 0 | | 4 | 0.2 | | 4 | 0.2 | | 13 | 0.6 | | 17 | 0.8 | +--------+------------+ ```Also of note, if you're doing this in PHP or Perl then the 'SET' queries must each be done as a separate query before the main one. Probably best to remove the comments too for putting it in through a program.I assume there's a better way of doing "update this procedural variable and don't incorporate it into the output in any way just now" than my horrible use of setting the variable and then comparing it against 'x' in an IF that thus goes on to be ignored, but I don't know what that way would be. Regardless of that horribleness, it remains that this is a bucketload faster than what seems to be the common standard method, "UPDATE test SET percentile=(SELECT COUNT(*) FROM test AS b WHERE b.number