Wikipedia:Reference desk/Archives/Computing/2016 February 4

Computing desk
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


February 4

Question (How are CPS made?)

how are cps made — Preceding unsigned comment added by Jake200503 (talk · contribs) 02:34, 4 February 2016 (UTC)[reply]

Can you please clarify? It's not at all clear what you mean by CPS; even if we narrow the field to computing, there are half a dozen possibilities for what you could mean. Vespine (talk) 04:12, 4 February 2016 (UTC)[reply]
I added to the title to make it unique (although still not clear). StuRat (talk) 04:39, 4 February 2016 (UTC) [reply]
Possibly the OP meant CPUs, although even in that case some clarification would help. Laypeople sometimes use "CPU" to mean the whole computer (minus the monitor, keyboard, mouse and other user-facing parts), particularly with a desktop-like device. Nil Einne (talk) 13:31, 5 February 2016 (UTC)[reply]

SQL query condition question

I recently ran into the following problem at work.

We have a database table that includes rows that should be processed every x months. The value of x depends on the row and is stored in a field of the row. Another field records when the row was last processed.

I wanted to make a database query selecting every row that should now be processed, because enough time has passed since it was last processed. It turns out this was not so easy, as the condition in the WHERE clause would have to change from row to row.

Let's say the table name is thing, lastprocessed is when the row was last processed, and processinterval is the number of months between processings. So the format is something like: select * from thing where lastprocessed < :date and processinterval = :interval.

I ended up making a separate query for every value of processinterval (there are a finite number of them, and not very many), computing :date separately for each of them, and then combining the results.

Is there an easier way to do this? JIP | Talk 21:02, 4 February 2016 (UTC)[reply]
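For concreteness, the per-interval workaround described above amounts to something like the following sketch, where the interval values 3 and 6 and the parameter names :date_for_3 and :date_for_6 are made up for illustration, each :date_for_N being computed in the application as "today minus N months":

select * from thing where processinterval = 3 and lastprocessed < :date_for_3
union all
select * from thing where processinterval = 6 and lastprocessed < :date_for_6
-- ...and so on, one branch (or one separate query) per distinct processinterval value.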

If you're using SQL Server you can do something like this. If not, you could write a function that returns the cutoff date based on the processinterval and then call it in the lastprocessed < :date portion, setting :date to be getdatebyinterval(processinterval)'s return value. I'm not sure if you need specific permissions to create a function on a database; I assume so, but YMMV. I'd also assume that if you're doing this at work then you probably have the permissions. I'm not sure how good a solution this is for your particular workplace. The third option is to hit the database once, get all relevant rows in thing, and then have whatever application is using the data figure out which rows to process, instead of having the database try to. FrameDrag (talk) 21:41, 4 February 2016 (UTC)[reply]
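A minimal sketch of that function approach, assuming SQL Server; the function name getdatebyinterval comes from the suggestion above, and the column types are assumptions rather than the original schema:

-- Returns the cutoff date for a given interval: rows last processed before
-- this date are due to be processed again.
CREATE FUNCTION dbo.getdatebyinterval (@processinterval int)
RETURNS date
AS
BEGIN
    RETURN DATEADD(month, -@processinterval, GETDATE());
END;

-- The query then becomes a single statement:
SELECT * FROM thing
WHERE lastprocessed < dbo.getdatebyinterval(processinterval);

Note that a scalar function called once per row can be slow on large tables, which ties in with the efficiency point raised further down.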
Without data types, this is hard to answer. I have to ask why you can't add lastprocessed to interval and compare it to the current date. — Preceding unsigned comment added by 47.49.128.58 (talk) 01:37, 5 February 2016 (UTC)[reply]
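A rough sketch of that suggestion, assuming a MySQL-style dialect (the month-arithmetic function differs between products, e.g. ADD_MONTHS on Oracle or DATEADD on SQL Server):

-- A row is due when its last processing date plus its own interval, in months,
-- is no later than today.
SELECT *
FROM thing
WHERE DATE_ADD(lastprocessed, INTERVAL processinterval MONTH) <= CURDATE();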
Regarding your table "thing": does it store a creation date/time for each record? Does it have an increasing record identifier? If so, create a table that stores a high-water mark each time your processing runs. Use "> (select max(highwatermark) from processruns)" as an uncorrelated subquery to select only the unprocessed records. When the run has finished, insert the new high-water mark from "thing", or the current date, into that table. --Hans Haase (有问题吗) 02:06, 5 February 2016 (UTC)[reply]
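A rough sketch of that high-water-mark idea, with made-up table and column names, and assuming "thing" has an increasing integer key id; note it selects each record only once, so it suits one-off processing rather than the recurring per-interval processing asked about:

-- processruns records how far processing has got.
CREATE TABLE processruns (
    run_date      date NOT NULL,
    highwatermark int  NOT NULL   -- largest thing.id handled in that run
);

-- Select only the records added since the last run
-- (COALESCE makes the very first run pick up everything).
SELECT *
FROM thing
WHERE id > (SELECT COALESCE(MAX(highwatermark), 0) FROM processruns);

-- After the run, record the new high-water mark.
INSERT INTO processruns (run_date, highwatermark)
SELECT CURDATE(), MAX(id) FROM thing;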
BTW, this isn't a very efficient way to do things. You might get away with it, with only a small number of rows, but checking every row every time, when only a small number need to be processed, wastes resources. I would suggest a "Scheduled Events" table that lists dates and rows (in the main table) that need to be processed on those dates. You would only have the next event for each row listed in this table, and replace the date with the next date, as part of the processing step. So, if you had a million rows in the main table, and only a thousand need to be processed each time, this should be on the order of a thousand times faster (not including the actual processing time). StuRat (talk) 03:03, 5 February 2016 (UTC)[reply]
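A rough sketch of that scheduled-events scheme, again with illustrative names, a MySQL-style dialect, and the assumption that "thing" has an integer primary key id:

-- One row per thing, holding only its next due date.
CREATE TABLE scheduled_events (
    thing_id int  NOT NULL PRIMARY KEY,  -- references thing.id
    due_date date NOT NULL               -- worth indexing so due rows are found cheaply
);

-- Each run picks up only the rows that are currently due.
SELECT t.*
FROM thing t
JOIN scheduled_events e ON e.thing_id = t.id
WHERE e.due_date <= CURDATE();

-- As part of processing, push each handled row's next due date forward by its interval.
UPDATE scheduled_events e
JOIN thing t ON t.id = e.thing_id
SET e.due_date = DATE_ADD(e.due_date, INTERVAL t.processinterval MONTH)
WHERE e.due_date <= CURDATE();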
If the "process" is each month and a date is in "thing", extract and filter on year and month. --Hans Haase (有问题吗) 21:24, 5 February 2016 (UTC)[reply]

SSD, virtual memory, thrashing

In short: will adding an SSD drive generally give notable improvements to thrashing issues?

In long: I have a series of computations I need to run for my research. This partly involves smacking together lots of relatively large arrays. Things go great for size parameters of about 45x45x10k, and the things I want to run complete in a few hundred seconds. Apparently things mostly fit into my 16GB of RAM, and only a few larger swaps to virtual memory need to be done. But if I increase those sizes to, say, 50x50x15k, I hit a memory wall, get into heavy swapping/thrashing, and the thing can take about 20 times longer, and that's not ideal. It's not so much the time increase that bothers me, but that I've shifted into a whole new and worse domain of effective time complexity. So: would buying a few gigs (16?) of SSD to use as virtual memory generally help speed things up in the latter case? I know more memory would help, but all my DIMM slots are full, and I think I can get a solid-state drive for a lot less than 4 new RAM sticks above 4GB each. Actually I may be very confused and wrong about this, as I haven't paid much attention to hardware for years; maybe more RAM could be comparable in cost and more effective. Any suggestions? (And if you happen to be decently skilled at scientific computing and are looking for something to work on, let me know ;) SemanticMantis (talk) 21:14, 4 February 2016 (UTC)[reply]

In short, yes, sticking the Windows swap file on an SSD is highly recommended, and considering you can get a 64GB SSD for about $70 (I don't think you'll find a 16GB one these days), it's a bit of a no-brainer. It used to be a bit controversial in the early days of SSDs, because a swap file heavily utilizes the disk and early SSDs didn't have very high read/write endurance, but that's not so much of a concern anymore. How much of a performance increase you see is hard to predict, but it could be anything from "a bit" to "loads". Vespine (talk) 22:00, 4 February 2016 (UTC)[reply]
Thanks, it seems I am indeed out of touch with prices and sizes. This is for use with OSX by the way, but I'm sure I can figure out how to use the SSD for virtual memory if/when I get one. If anyone cares to suggest a make/model with good latency and value for the price, I'd appreciate that too. SemanticMantis (talk) 22:17, 4 February 2016 (UTC)[reply]
Adding more memory will generally be MUCH MUCH more effective than using a faster (SSD) disk. Even if the memory in the SSD is as fast as RAM (and it almost certainly isn't), the overhead of passing a request through the virtual memory system and I/O system, sending it over the (slow) interface to the disk, getting the response back over the same slow interface, getting it back up the software stack to the requesting thread, and waking up and context-switching into the thread is going to be many times slower than a single memory access. Now consider that nearly every CPU instruction could be doing this (which is pretty much the definition of thrashing). Moving to an SSD will certainly improve performance over a mechanical disk, but it will be nowhere near the performance increase you would get by increasing the size of physical memory to be larger than the working set of your processes. Mnudelman (talk) 22:24, 4 February 2016 (UTC)[reply]
I'm not as familiar with OSX or Linux, but I get the impression moving the swap file is not quite as straightforward as on Windows. If I were you, I would consider getting a bigger SSD instead, doing a full backup to an external drive, and then restoring your entire system to the SSD. That way you will see improvements above and beyond what you'd get from just sticking the swap file there. I personally use the Samsung 840 (now the 850); they're stalwarts in the "best bang for the buck" category in reviews. Vespine (talk) 22:30, 4 February 2016 (UTC)[reply]
A factor of 20 is not really much for thrashing. The really horrible cases are in the thousands. The most important difference between physical HDD and SSD is the access time, so if there are lots of accesses, the SSD could lower a factor of 20 well into the single digits. If there are many threads working, e.g. 45 threads, the HDD could be busy delivering data from 45 places "at the same time". SATA NCQ helps here but cannot eliminate the physical seeks, just order them in a time-saving way. On SSDs, NCQ could well eliminate part of the communication loop, because all threads could run until they need paged data, and then the OS could ask for the whole wad in one big query. I'm not sure how that would work out in practice, though. The savings will probably be bigger with the HDD, but still not enough to catch up with a quality SSD without queueing. - ¡Ouch! (hurt me / more pain) 19:03, 6 February 2016 (UTC)[reply]
Kingston 8GB 1600MHz DDR3 (for a MacBook Pro, but I suspect that's not too atypical) is a bit over US$40, so for 32 GB you'd pay US$170 or so. As others have said, an SSD is better than a mechanical drive, but RAM is so much better than SSD that it's not even funny. --Stephan Schulz (talk) 22:41, 4 February 2016 (UTC)[reply]
Well, I agree RAM is "better than SSD", but ask anyone what the best improvement to their computer has been in the last 10 years and almost universally it's getting an SSD. The other thing you COULD consider is getting 16GB in 2 sticks and replacing 2 of your existing sticks, for a total of 24GB. It is "more" usual to have 16 or 24, but having two matching but different pairs should still work fine. If that's enough to push you "over the line", you can leave it at that; if it's not, you can always get another 16GB pair later. Vespine (talk) 22:46, 4 February 2016 (UTC)[reply]
Yes, that does change things in the cost/value analysis; I forgot not all RAM chips have to match (anymore)? SemanticMantis (talk) 00:16, 5 February 2016 (UTC)[reply]
As a general concept, it's true that an SSD is a big improvement to performance, but only if the bottleneck is the disk in the first place. That's not what the OP is describing; he is in a THRASHING scenario. Improving disk speed is a terrible way to address thrashing.
Also consider that when you need to swap to read a single word, you have to free up some RAM first, which probably means WRITING a PAGE of memory to the SSD, then reading another PAGE from the SSD back to RAM. Depending on the page size, this will be hundreds or thousands of times slower than it would be to read the word from RAM if swapping were not needed, even ignoring other overhead like disk/interface speed and context switching. Mnudelman (talk) 22:49, 4 February 2016 (UTC)[reply]
"Improving disk speed is a terrible way to address thrashing." Except it's not a "terrible" way at all; Microsoft recommends sticking your swap file on an SSD if you can. We're getting superlatives mixed up. YES, more RAM is better than a faster disk, but a faster disk is NOT a "terrible" upgrade. Vespine (talk) 23:27, 4 February 2016 (UTC)[reply]
My 1st reply was actually going to be "it might be hard to predict how much improvement you will see upgrading to an SSD", but then I read the 1st line again, "WILL AN SSD GIVE NOTABLE IMPROVEMENT IN A THRASHING SCENARIO", and my answer, in short, which I still stand by, is yes, yes it will. More RAM will probably be better, but getting an SSD IS also just a GOOD upgrade overall. Vespine (talk) 23:29, 4 February 2016 (UTC)[reply]
Yes, an SSD for virtual memory will be faster than a HD, but not a whole lot faster. Definitely maximize your motherboard's RAM first. Bubba73 You talkin' to me? 23:56, 4 February 2016 (UTC)[reply]
Right, so I know more RAM will be the best way to solve the problem. But given a fixed budget, I'm not clear on how to optimally spend it. For example, I can get 8 more GB of RAM for around $110 [1]. That will give me more room, but may well put me straight back to thrashing at 52x52x15k, to continue the example numbers from above. I can get a 128GB SSD for $60 [2]. For a certain fixed-size computation, that will not give me as much speed increase as more RAM would. But if I'm hitting a RAM wall regardless, then the increased read/write speed should help me out with a lot of thrashing issues, no? It's not like it's thrashing so badly it never stops, or crashes the computer; it just puts me at a much higher exponent on time complexity. To make up some numbers: say I was at O(n^1.1) up to a certain size N. For M>N, the thrashing puts me at O(M^3). With 8 (or 16) GB more RAM, thrashing may set in at M2=M+K, but I'm still at O((M+K)^3) after that. With a new SSD, I thought maybe I could get to O(M^2) for M>N, up to some larger cap set by the size of the SSD. (Yes, I know this is not exactly how time complexity works; I'm just speaking in effective, functional terms of real-world performance on a certain machine, not analysis of algorithms. For that matter, nobody has yet suggested I just get better at managing my computing resources and more clever at organizing things efficiently, but rest assured I'm working on that too. :) SemanticMantis (talk) 00:10, 5 February 2016 (UTC)[reply]
In terms of bang for your buck, have you looked into just buying computing resources from a "cloud" provider instead of running things on your personal computer? --71.119.131.184 (talk) 00:33, 5 February 2016 (UTC)[reply]
(EC) And THERE is the rub. This is a complex enough problem that it might be hard or impossible to give you a good answer. Will an SSD give you an improvement? Yes. Will getting 8GB more RAM give you an improvement? Yes. Which will be "better" or more "worthwhile"? This is going to be very hard to predict without actually just TRYING it. If I were you, I think upgrading your system disk to an SSD is just a "good upgrade" to do regardless, AND it has the added benefit that your thrashing will probably improve somewhat. How big is your system disk? If you get 8GB more RAM and your problem doesn't improve (because you need 16 or 32 more), then it WILL be a waste; if your problem doesn't improve much with an SSD, then at least you will have speedier boot times and an overall performance increase. Vespine (talk) 00:36, 5 February 2016 (UTC)[reply]
Right, I think I'm leaning toward SSD because it is cheaper and will certainly help at least a little bit in almost all cases, even just normal tons-of-applications-open scenarios. I thought about buying/renting cloud resources but that stuff is fiddly and annoying to me, plus this shouldn't really be out in the world until it is published. SemanticMantis (talk) 15:45, 5 February 2016 (UTC)[reply]
I always maximize my RAM, even if it means taking out some sticks. I usually get it from Kingston or Crucial. Besides my main computer, I have three computers that I use for numerical work. Two are Core i5s, which I bought cheaply and bumped up to the maximum 16GB of RAM. One of them is an i7 with 16GB of RAM and an SSD. My main computer is an i7 with an SSD and 32GB.
I did a speed test of sequential access on my SSD vs. my HD. The SSD does about 395MB/sec whereas the HD does 175MB/sec, so the SSD is about 2.25x as fast. (Of course, random access will show a much larger benefit for the SSD.) So I think swapping to an SSD instead of a HD will be about twice as fast. I think you will probably be better off making the RAM as large as possible first. Bubba73 You talkin' to me? 00:41, 5 February 2016 (UTC)[reply]
Page file access tends to be highly non-sequential, so you should see a far higher gain than that from the SSD in theory. Surprisingly, I can't find any SSD vs HDD paging benchmarks online.
You would probably get large gains from using explicitly disk-based algorithms that are tuned to the amount of physical RAM in the system, instead of relying on virtual memory. But that is a lot of work and programmer time is expensive. -- BenRG (talk) 01:06, 5 February 2016 (UTC)[reply]
Yep, but I'm the only "programmer", and too lazy/unskilled for that kind of optimization, and I have bigger fish to fry :) SemanticMantis (talk) 15:45, 5 February 2016 (UTC)[reply]
According to Solid-state drive, random access times are about 100 times faster for SSDs than for HDs, and data transfer speeds for both are within one order of magnitude (in both cases, there are huge differences between models). But random access time for the SSD is still around 0.1ms. Random access time for RAM (assuming it is not cached, and assuming it's not pre-fetched) for current DDR3 SDRAM is about 0.004 μs, or about 25000 times faster than the SSD just to access a word - and that ignores all the additional overhead of writing back dirty pages, updating the MMU tables, and so on. So yes, an SSD is a good upgrade. I like SSDs, and I have SSDs exclusively in all my machines (even the ones I paid for out of my own pocket, and even at a time when a 1 GB SSD set me back a grand). It will certainly improve paging behaviour. But it is a very poor second best if the system is really thrashing. --Stephan Schulz (talk) 12:02, 5 February 2016 (UTC)[reply]
I see your point but beg to differ.
If the SSD can save 9 hours of runtime on the problem I have, and the RAM upgrade can save 10 but is more expensive, the SSD can be good enough and even better overall. For example, if it also saves 30 seconds of boot time (optimistic but not unheard of), the SSD would overtake the RAM after 120 boot cycles (the one-hour difference divided by 30 seconds per boot). - ¡Ouch! (hurt me / more pain) 19:08, 6 February 2016 (UTC)[reply]
Ok, thanks all. I know this is too complicated to give one simple answer of which option is best; I mostly wanted your help in framing some of the pros/cons, and some more current estimates of price and performance. SemanticMantis (talk) 15:45, 5 February 2016 (UTC)[reply]
  Resolved
I'm actually really curious which direction you will go and how it works out for you. Vespine (talk) 22:02, 8 February 2016 (UTC)[reply]