Computing desk
< March 28	<< Feb \| March \| Apr >>	March 30 >

Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.

March 29

SQL query question

I'm trying to optimize some queries because I'm working on tables with millions of records. I have a table with about 20 million records. Each row has a date (required), a user (required), and a value (not required). I am interested in consecutive dates in which the value remains above 100 for a user. So, if a user has 104, 78, 110, I'm not interested in any of the dates attached to those values. I only want the second date in which two dates have a value above 100 without a value below 100 (or null) between them. If the user has four dates in a row above 100, I want the last three because each pair is a pair of consecutive dates in which the value is above 100. With a couple million users and 20 million records, every query I've tried takes forever to run. Does anyone know of a trick to make this run in a matter of minutes as opposed to days? The query should return user and date for each date that a user has a consecutive value above 100. -- kainaw ™ 02:49, 29 March 2008 (UTC)[reply]

I'd just select sort by date where value > 100 and use php to work out the consecutiveness. It may be possible to craft some INSANE recursive query that takes a year to execute (I'd be very interested to see the queries you've tried) to do it, but the idea of databases is to store data, not twiddle it.. that's your program's job. Just put as much bandwidth as possible between your php and mysql servers and dump it. The php loop sounds very optimizable if you really try to bum as much processing as possible out.. looking for consecutive-ness is easy. If you have to do this sort of thing as part of a real time application, add a "over-100-consecutiveness" field to your table so you can select that set directly, and change whatever adds rows to check for that :D\=< (talk) 03:29, 29 March 2008 (UTC)[reply]

Note that there's no reason to assume he's talking about PHP or MySQL. — Kieff | Talk 03:45, 29 March 2008 (UTC)[reply]

Doesn't matter, don't be picky and irrelevant :D\=< (talk) 04:18, 29 March 2008 (UTC)[reply]

I assume he wants to do it in the database for some reason. What database are you using? Some stored procedure may give you the performance you want. Friday (talk) 03:35, 29 March 2008 (UTC)[reply]

If this is something you're going to do regularly, and if the records aren't changing constantly, you should pre-process the current data, even if it's slow, and add something to process incoming, new data so you don't have to run insanely long comparisons all the time. If all the results you want are already on another table, ready for your use, and if you just keep this table updated as data comes in, you should be able to optimize the whole process. But please, do provide us with more information. — Kieff | Talk 03:45, 29 March 2008 (UTC)[reply]

The data does change weekly, but I tested something this morning. I added a "previous value" field to each row and wrote a script to fill it by getting each row and putting its value in the next row for that user. So, each row has the current value and the previous value for the user. Now, I can get the date for every row where both the value and previous value are above 100. My fear was that filling the row would take ages. It took about an hour. Then, the simple query took minutes. Combined, a little over an hour for processing is much better than the days it was taking. -- kainaw ™ 14:17, 29 March 2008 (UTC)[reply]

If the data only changes weekly then you could take it a step further.. as I suggested above create a field specifically defining whether that row has a current value and previous value over 100. It would take slightly longer to fill with your script but the database query to retrieve that set would be a bit faster too. :D\=< (talk) 18:19, 29 March 2008 (UTC)[reply]

Now that you have the "previous value" field set up and correctly populated, you might want to use a database trigger to keep it updated, whenever someone INSERTs or UPDATEs those rows. That way, you won't have to keep re-running your "setup" script. -- Coneslayer (talk) 19:26, 31 March 2008 (UTC)[reply]

create table Data (

"user" int,

date date,

value int,

constraint Data_PK primary key ("user", date));

create index Data_Date_IX on Data (date);

select count(*), count(distinct "user") as "#users", count(distinct date) as "#dates", count(distinct value) as "#values" from Data;

count	#users	#dates	#values
20000000	2000000	10	200

(1 row)

explain analyze

-- pairs of dates with values over the threshold

select D1.date, D2.date, D1."user"

from Data D1

inner join Data D2

on D1."user" = D2."user" and D1.date + 1 = D2.date

where D1.value > 100 and D2.value > 100;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------

Merge Join (cost=4581169.22..4793507.74 rows=1043556 width=12) (actual time=710444.466..888495.655 rows=4499697 loops=1)

Merge Cond: (("outer"."user" = "inner"."user") AND ("outer"."?column3?" = "inner".date))

-> Sort (cost=2290584.61..2315823.74 rows=10095652 width=8) (actual time=401610.015..495914.795 rows=10000156 loops=1)

Sort Key: d1."user", (d1.date + 1)

-> Seq Scan on data d1 (cost=0.00..381155.29 rows=10095652 width=8) (actual time=0.192..202170.291 rows=10000156 loops=1)

Filter: (value > 100)

-> Sort (cost=2290584.61..2315823.74 rows=10095652 width=8) (actual time=308834.415..334095.432 rows=10000156 loops=1)

Sort Key: d2."user", d2.date

-> Seq Scan on data d2 (cost=0.00..381155.29 rows=10095652 width=8) (actual time=532.268..151611.459 rows=10000156 loops=1)

Filter: (value > 100)

Total runtime: 896994.971 ms

(11 rows)

explain analyze

-- runs of at least seven days above a threshold

select "user", startDate, max(endDate) as endDate

from (

select "user", min(startDate) as startDate, endDate

from (

select DStart."user", DStart."date" as startDate, DEnd."date" as endDate

from Data DStart

inner join Data DEnd

on DStart."user" = DEnd."user"

and DStart.date < DEnd.date

-- constraint prop from run length > 7

and DStart.date + 7 <= DEnd.date

inner join Data D2

on DStart."user" = D2."user"

and D2."date" between DStart."date" and DEnd."date"

where DStart.value > 150 and D2.value > 150 and DEnd.value > 150

group by DStart."user", DStart."date", DEnd."date"

having count(D2.date) = 1 + DEnd."date" - DStart."date"

and count(D2.date) > 7) Runs

group by "user", endDate) EarlyRuns

group by "user", startDate;

QUERY PLAN

:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HashAggregate (cost=8264809.43..8264859.43 rows=4000 width=12) (actual time=702170.248..702170.377 rows=71 loops=1)

-> GroupAggregate (cost=8250569.04..8264109.43 rows=40000 width=12) (actual time=702169.151..702170.034 rows=82 loops=1)

-> Sort (cost=8250569.04..8253829.14 rows=1304039 width=12) (actual time=702169.141..702169.247 rows=94 loops=1)

Sort Key: runs."user", runs.enddate

-> GroupAggregate (cost=7894083.56..8031663.79 rows=1304039 width=16) (actual time=688641.621..702168.684 rows=94 loops=1)

Filter: ((count(date) = ((date + 1) - date)) AND (count(date) > 7))

-> Sort (cost=7894083.56..7911580.10 rows=6998617 width=16) (actual time=688004.962..694747.946 rows=2748925 loops=1)

Sort Key: dstart."user", dstart.date, dend.date

-> Merge Join (cost=3973795.37..6487768.45 rows=6998617 width=16) (actual time=412232.694..662995.049 rows=2748925 loops=1)

Merge Cond: ("outer"."user" = "inner"."user")

Join Filter: (("inner".date >= "outer".date) AND ("inner".date <= "outer".date))

-> Merge Join (cost=2649196.91..3888467.22 rows=5993262 width=16) (actual time=257090.518..465949.359 rows=749631 loops=1)

Merge Cond: ("outer"."user" = "inner"."user")

Join Filter: (("outer".date < "inner".date) AND (("outer".date + 7) <= "inner".date))

-> Sort (cost=1324598.46..1337429.27 rows=5132327 width=8) (actual time=133713.631..188871.000 rows=4999101 loops=1)

Sort Key: dstart."user"

-> Seq Scan on data dstart (cost=0.00..381155.29 rows=5132327 width=8) (actual time=33.970..70770.419 rows=4999101 loops=1)

Filter: (value > 150)

-> Sort (cost=1324598.46..1337429.27 rows=5132327 width=8) (actual time=123376.793..214303.669 rows=16243745 loops=1)

Sort Key: dend."user"

-> Seq Scan on data dend (cost=0.00..381155.29 rows=5132327 width=8) (actual time=0.088..49517.401 rows=4999101 loops=1)

Filter: (value > 150)

-> Sort (cost=1324598.46..1337429.27 rows=5132327 width=8) (actual time=155142.134..172357.925 rows=6055904 loops=1)

Sort Key: d2."user"

-> Seq Scan on data d2 (cost=0.00..381155.29 rows=5132327 width=8) (actual time=0.104..64200.543 rows=4999101 loops=1)

Filter: (value > 150)

Total runtime: 702287.315 ms

(27 rows)

Error deleting

Why does "cannot remove folder:directory is not empty."appear when i try to delete a specific file from my thumbdrive? This problem does not seem to exist when I delete other files from the thumdrive. Also, the file is read-only. I tried to change the attribute to unset read-only but every time I open 'properties' again, it still remains read-only. The thumdrive is fine other then these problems. is there any software or do i have to run anything to solve this problem. By the way, the format of the thumbdrive is FAT32, 3.98GB. Please help.Invisiblebug590 (talk) 05:41, 29 March 2008 (UTC)[reply]

Are you sure that flash drive isn't infected with a virus? Setting "read-only" by itself sounds very suspicious. --antilived^{T | C | G} 12:17, 29 March 2008 (UTC)[reply]

I had ~~exactly the same~~ a similar problem just a few days ago. I went home, salvaged anything I could and then formatted the drive. Maybe you would want to look at Clamwin Portable from Portableapps.com for ~~the~~ virus detection in the future? Kushal 14:58, 29 March 2008 (UTC)[reply]

Windows Vista installation help

I am trying to install windows vista (ultimate 64-bit OEM) on my new computer, and it is really not playing ball. It gets to varying stages of installation each time I try, the farthest has been the product key entering screen, it is always very slow during the process. It then either freezes, or blue-screens with a message 'PFN_LIST_CORRUPT'. Could anyone offer a solution? Englishnerd 12:23, 29 March 2008 (UTC)[reply]

Probably a bad piece of memory. --jpgordon^∇∆∇∆ 15:58, 29 March 2008 (UTC)[reply]
is it a bird? is it a plane? no, it is Memtest86+

Seriously.

Please try using memtest86+ to see if your RAM modules are working properly. Hope that helps. Kushal 14:13, 30 March 2008 (UTC)[reply]

Free 64-bit C compiler with inline assembly for Windows XP

... for 32-bit XP on AMD64 machine? --212.149.216.233 (talk) 13:53, 29 March 2008 (UTC)[reply]

Download the mingw32 sources on your linux box and compile it with gcc, then use it to compile itself again for 64-bit Windows machines. You shouldn't have to do anything to get it to compile for 32 bit targets. :D\=< (talk) 18:15, 29 March 2008 (UTC)[reply]

Wikipedia and it's normal condition

Why isn't Wikipedia in it's normal condition? Heegoop, 29 March 2008 (UTC)

SNAFU :D\=< (talk) 18:12, 29 March 2008 (UTC)[reply]

I don't see anything wrong with it. Why do you say it is not in its normal condition? AR TYOM 19:40, 29 March 2008 (UTC)[reply]

The databases were locked intermittently a few days ago (I guess around the time he asked).. maybe that's what he was talking about :D\=< (talk) 04:18, 1 April 2008 (UTC)[reply]

Computer Science Teaching Materials

Hello, I'm trying to track down teaching material for a CS undergraduate degree available online. I wondered if anyone knew of an institution (perhaps someone's alma mater) which has good stuff freely available. Thanks. 90.203.189.60 (talk) 19:09, 29 March 2008 (UTC)[reply]

Me again - even reading lists would be good. Thanks. 90.203.189.60 (talk) 19:10, 29 March 2008 (UTC)[reply]

All the CS texts I've ever had to buy were terrible, terrible!! Even famous books like CLRS are worthless; they take more time to decipher than it would take to teach yourself from examples on the internet :D\=< (talk) 19:16, 29 March 2008 (UTC)[reply]

Connexions at Rice university hosts a large collection of free educational material. I don't know if they have what you are looking for but it's worth a look. The website is at http://cnx.org/. Theresa Knott | The otter sank 10:30, 30 March 2008 (UTC)[reply]

MIT Open course ware. http://mitpress.mit.edu/sicp/full-text/book/book.html
and http://ocw.mit.edu/OcwWeb/web/home/home/index.htmM
Also Philip Greenspun... http://philip.greenspun.com/seia/

My computer has just decided that when I right click it will put up the menu transparent untill i run my mouse over the options. How can I get this back to normal as I find pretty darn annoying? —Preceding unsigned comment added by 79.64.12.133 (talk) 23:25, 29 March 2008 (UTC)[reply]

Reboot? :D\=< (talk) 03:27, 30 March 2008 (UTC)[reply]

You've probably changed a setting somewhere, I don't know which but if rebooting doesn't work, try looking through the control panel. George D. Watson (Dendodge).Talk^Help 14:11, 30 March 2008 (UTC)[reply]

I get this every once in a while and don't know how to sort it out. I can't even figure out what causes it but a reboot fixes it. Logging out might fix it too TheGreatZorko (talk) 12:08, 2 April 2008 (UTC)[reply]

Wikipedia:Reference desk/Archives/Computing/2008 March 29

Contents

March 29

SQL query question

Error deleting

Windows Vista installation help

Free 64-bit C compiler with inline assembly for Windows XP

Wikipedia and it's normal condition

Computer Science Teaching Materials

menu