Wikipedia:Reference desk/Archives/Computing/2011 April 9

Computing desk


April 9

personal cloud (continued from here.)

(This question continues from here.)

Out of curiosity I tried setting up a Beowulf network using four old personal computers running under Windows XP using Visual Basic as described here.

To my great surprise it works, and it is fast enough and distributes memory well enough that I run into subscript out of range, overflow and out of memory errors only in extreme cases.

Is there a way I can improve upon this setup by programming in DOS, or in some other way that is as relatively user-friendly as Windows XP and Visual Basic 6? --DeeperQA (talk) 09:27, 9 April 2011 (UTC)

I've seen DOS used in some strange (and surprisingly modern) software setups. But you need to know: DOS does not provide the following two features that you probably care about:
If you want to set up a Beowulf cluster, you probably need both of those. Otherwise, DOS is a fine operating system for certain computer architectures. I've seen modern, mission-critical, real-time computer systems that run true DOS, without any emulation. But none of those computers needed a network or a lot of memory.
Anyway, it sounds like your objective is to "get more performance," not to "run DOS." So, let's drop that idea, and here's what I'd suggest instead:
  • Stop programming in Visual Basic. You will soon find that your performance limitations are dominated by the overhead of application virtualization, hardware abstraction, and the limited support that even a modern VB compiler provides for optimization.
  • Run a server operating system, such as Windows Server (or really, since you're seeking high-performance computing, learn a Unix or Linux - though there are HPC applications and tools for Windows, you need to expand your horizons if you seriously do want to spend a lot of effort in the area of performance-driven computing). Run an operating system that lets you disable unneeded features, such as the graphic interface(s), which hog memory and CPU cycles.
These simple steps will buy you loads more performance than switching to DOS. Especially as you seek larger and larger memory allocations, you are going to absolutely require a modern operating system with support for physical address extensions or 64-bit memory addressing. Nimur (talk) 17:53, 9 April 2011 (UTC)
The immediate problem is to expand the data limits of a VB6 algorithm written to implement this method of classification. Large numbers of characteristics, or characteristics with large numbers of states, produce large multiset values that exceed available memory or, if the arrays are kept within available memory, cause subscript out of bounds errors. Windows XP does not seem to handle swap-file-backed arrays whose indexes exceed the range of an integer variable. The alternative method of counting multisets with a single variable is far too slow.
Distributing set segments allows parallel counting and combining of counts to find the separatory value. In fact, a multitude of counting "locations" (remote folders with their own counting algorithm) allows reducing segment size to the point of avoiding out of memory and subscript out of bounds errors while remaining in a high level environment. In the long run, though, I hear what everyone is saying. Eventually the realm of Unix and dedicated hardware will have to be entered to achieve an ultimate goal. --DeeperQA (talk) 04:06, 10 April 2011 (UTC)
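The segment-and-combine idea described above might be sketched as follows. Java is used only because it is the language of the other code on this page (the original program is VB6). Everything here is an assumption made for illustration: the class name SegmentCounter, its method names, the use of threads in place of separate machines or remote folders, and the treatment of each "multiset value" as a plain integer key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: count how often each value occurs, one segment at a
// time, then merge the per-segment counts. Threads stand in for the remote
// counting "locations".
class SegmentCounter {

    // Count the occurrences of each value within data[from, to).
    static Map<Integer, Long> countSegment(int[] data, int from, int to) {
        Map<Integer, Long> counts = new HashMap<>();
        for (int i = from; i < to; i++) {
            counts.merge(data[i], 1L, Long::sum);
        }
        return counts;
    }

    // Split data into segments of at most segmentSize items, count the
    // segments in parallel, and combine the partial counts into one total.
    static Map<Integer, Long> countAll(int[] data, int segmentSize, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Map<Integer, Long>>> parts = new ArrayList<>();
            for (int from = 0; from < data.length; from += segmentSize) {
                final int start = from;
                final int end = Math.min(from + segmentSize, data.length);
                parts.add(pool.submit(() -> countSegment(data, start, end)));
            }
            Map<Integer, Long> total = new HashMap<>();
            for (Future<Map<Integer, Long>> part : parts) {
                part.get().forEach((value, n) -> total.merge(value, n, Long::sum));
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("segment count failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 3, 2, 1, 3, 2, 3};
        System.out.println(countAll(data, 3, 2));
    }
}
```

Each segment only needs an accumulator large enough for the values it actually sees, which is the property the post relies on; whether the combining step stays cheap depends on how many distinct values the real data produces.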
Just as an aside - if you're working on problems so large that you actually need multiple computers' worth of address space, I'd strongly suggest switching architectures altogether. I recommend installing Ubuntu on each system; but any modern *nix will really get the job done. OpenSolaris' ability to fine-tune individual process allocations to specific CPU cores used to be very helpful, but I would really just leave such optimization to a modern kernel; worry about your high-level algorithms, minimizing data-dependency, and intelligently scheduling high-level tasks. Design your algorithm; then, program in straight C or FORTRAN, using GCC to compile. In my personal experience, I have not found any other compiler, including the Intel FORTRAN and Intel C compilers, that benchmarks as aggressively as GCC. GCC also supports MPICH2 and other MPI platforms, or you can run a grid engine like Sun Grid Engine. If massive data-communication is required for your problems, consider wrapping your node-programs inside a higher-level inter-node communication platform, such as Hadoop or Java EE (both have free, open-source implementations available). It is unlikely you can design an inter-node router that is more efficient than either of those platforms (or MPI, if your data transfers are very regularly-scheduled).
Some of the applications began on a Microsoft TRS-80 Mod I. At one time they were rewritten in C+ and ported to an Atari ST, which ran TOS, which was a clone of Unix. The problem then was they lost most of their programmer friendliness, which stalled small changes to the algorithm. For at least one program this is the problem today. Its complexity demands programmer friendliness to facilitate algorithm improvements. The idea then is to make algorithm improvements first in a distributed environment prior to changing application programming language, much less architectures. Even the transition from VB6 to VB.NET, VBS and ASP (here) presented time-consuming difficulties. The idiosyncrasies of the original algorithm (use of string variables) need to be eliminated in a distributed environment first before changing architectures or the resulting complexity will kill the project. This is a major constraint. --DeeperQA (talk) 11:14, 10 April 2011 (UTC)
This environment is your end-goal. Based on your current descriptions of your cluster, you will probably have a lot to learn between where you are, and where you want to go - but if you take HPC seriously, you should start using the tools of the trade. Nimur (talk) 18:04, 9 April 2011 (UTC)
I'm not sure how the errors you describe could be prevented by having a cluster. Subscript out-of-bounds and overflow are programming errors -- the resources available to the program don't affect them. Out-of-memory errors are usually programming errors, too. A typical setup will use swap space to allocate virtual memory until the system bogs down and becomes painfully slow. If a program manages to allocate memory fast enough to completely run out before the user kills it in frustration, it's probably allocating in an infinite loop. To further confuse me, the Beowulf article doesn't give any indication that Windows is a supported platform for Beowulf clusters. Paul (Stansifer) 19:58, 9 April 2011 (UTC)
As stated above, one application program lends itself to being divided into segments, which reduces segment size for each division. Smaller segments mean faster processing and a smaller accumulator array. Out of memory errors come from arrays whose size exceeds available memory, while subscript out of bounds errors come from index values that exceed constricted array sizes. Windows XP seems to limit swap file size for arrays well below the assigned disk space limit. While not an industrial level Beowulf, the basic Beowulf function is fulfilled. My goal is to make baby steps from a high level environment rather than to recreate a Google, Facebook, Wikipedia or giant cloud environment up front, but I do want to explore all of the basic concepts while headed in that direction. --DeeperQA (talk) 11:31, 10 April 2011 (UTC)
I'm not sure what you mean -- if you can't allocate as much memory as you want, you just allocate smaller arrays, but pretend that they're the large arrays that you wanted? The reason that Windows (and other operating systems) by default has a large amount of virtual memory is that it's better for a system to grow slower and slower (giving the user time to kill the offending program), rather than suddenly become unstable. Running completely out of memory will always cause instability, because programs expect to be able to allocate new memory as needed. I don't see how the behavior you describe could happen. Paul (Stansifer) 12:59, 11 April 2011 (UTC)
A Beowulf cluster is defined as a homogeneous cluster of nodes running a Unix-like FOSS operating-system, with programs which permit processing to be shared among all nodes, and networked via a TCP/IP LAN. As Windows is a proprietary closed-source operating-system, it is by definition not a supported platform for a Beowulf cluster. That having been stated, a Beowulf-like cluster (i.e. identical clustered nodes on a TCP/IP LAN) most certainly can run on a Windows operating-system (e.g. Microsoft Windows Server 2008 HPC Edition), although it could not be described as a Beowulf cluster. Rocketshiporion 00:17, 10 April 2011 (UTC)
Beowulf network is also a generic term. As with most terms it can have many senses. I would not draw the line even where data was transferred by floppy disk or keypunch operators, so long as the primary concept of processing data at the same time on separate computers holds. Beowulf network is no longer the name of a species but the name of a class which is described in a simple high level LAN based form here. --DeeperQA (talk) 11:48, 10 April 2011 (UTC)
I stand corrected. I was misled by our Beowulf article, which states that "a Beowulf cluster is a group of what are normally identical, commercially available computers, which are running a Free and Open Source Software (FOSS), Unix-like operating system, such as BSD, GNU/Linux, or Solaris. They are networked into a small TCP/IP LAN, and have libraries and programs installed which allow processing to be shared among them." Rocketshiporion 09:32, 11 April 2011 (UTC)
If you want to pool all of the memory and processors on multiple computers, you could try Parallel Virtual Machine, which basically turns multiple computers into an SSI cluster. But be warned that unless you have a fast interconnect (such as Gigabit Ethernet), you will probably face latency problems. In fact, I'm not even sure whether 1Gb Ethernet is fast enough for an SSI cluster to run without an apparent slowdown. Perhaps Nimur can expound and clarify. Rocketshiporion 00:29, 10 April 2011 (UTC)
"Without an apparent slowdown..."? Well, it's going to be slower than a system that doesn't rely on network latency! Is it acceptable slowdown? You need to benchmark your algorithm and compare to your requirements. If you work for a web-company, and must deliver processed results "in less than 10 milliseconds," your acceptable slowdown is very different than if you work at an offline data-processing company and must deliver numerical results within 6 months. My experience with SSI (for the uninitiated: "single-system image" - a "virtual" single computer made out of n individual nodes) has been this: if you make a cluster appear as a single computer, programmers don't realize the performance tradeoff, and design clumsy algorithms. This mindset is acceptable only in the most extreme cases. For most HPC applications, you need to inform the programmers that their application performance will seriously degrade if they try to malloc() a 600 gigabyte array. This will force the programmer to stop, sit back, and re-design the algorithm for scalability. That is also why I am strongly pitching the various technologies like MPI or Enterprise Java for inter-node coordination, communication, and control. It boils down to this: your problem is big, but the solution is not to rely on a faster computer with more gigabytes. It is a fact: we can never design an infinitely-embiggened computer to handle your infinitely-embiggened data-set. You, the HPC system designer, must accept that domain decomposition and node-level parallelism are the only way to deal with arbitrarily-large problems. Start getting into that mindset, and start learning the tools to assist you in designing an appropriate software/hardware architecture. Nimur (talk) 15:38, 11 April 2011 (UTC)

Java GUI: manipulating (add/remove/setVisible true or false etc) JPanels during runtime?

I'm trying to create a board game (nine men's morris, if that helps at all). The board is a JPanel with paintComponent overridden to draw a custom .jpg image (the board image). The playing pieces are also JPanels with paintComponent overridden to draw a custom .png image. What I want is that when the user clicks on a specific location of the board to put in a piece, the JPanel would be added - or if this is easier, setVisible(true).

Problem is, none worked when I tried one or a combination of these: frame.validate(), board.revalidate() and/or board.repaint() after board.add(piece), or after piece.setVisible(true). I've tried putting in System.out.println("something") after the code to ensure that the listener is working fine, and yes it was.

A friend told me to use SwingWorker as a solution, but as an amateur I have no idea how to use it at all even after referring to some links. Any feedback would be appreciated, thanks in advance. -- Yurei-eggtart 12:59, 9 April 2011 (UTC)

I don't have time just now to code something (I can't really think without coding; I may have some time in a few hours). In the meantime I have some thoughts:
  • the pieces don't have to be JPanels: JPanels are containers, and if they're not containing something, you're adding overhead and a bit of complexity to no purpose. They'd be cleaner if they were just overridden JComponents instead
  • what is the layout manager on the board JPanel? If you haven't specified it explicitly, I think it's a java.awt.FlowLayout (which surely you don't want). You should probably specify a null layout manager for it, and position the pieces manually. I rather think that your problem is here: they're not being positioned, so they're offscreen, and so don't get paint calls when you ask to repaint them. And make sure they are not of (0,0) size (as again these won't be painted). Call getBounds on each piece, and make sure you get sensible on-screen rectangles for each.
  • for a job like this, you really don't need to have the pieces be components at all; the drawing and event handling you require is so simple that you can just have the board be a single JComponent and have its paint method (and its event handler) do all the work drawing the piece graphics as necessary.
-- Finlay McWalterTalk 13:40, 9 April 2011 (UTC)
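The last suggestion above, a single component that does all the drawing and event handling itself, might look roughly like this. It is only a sketch under assumptions: the class name PaintedBoard, the placePiece and pieceCount helpers, the 45-pixel piece size and the filled circle standing in for the piece image are all hypothetical, and pieces are placed wherever the user clicks rather than snapped to board slots.

```java
import java.awt.Color;
import java.awt.Dimension;
import java.awt.Graphics;
import java.awt.Point;
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import java.util.ArrayList;
import java.util.List;
import javax.swing.JComponent;
import javax.swing.JFrame;
import javax.swing.SwingUtilities;

// Hypothetical sketch: the board is one component that remembers where the
// pieces are and paints them itself, so no child components are needed.
class PaintedBoard extends JComponent {
    private static final long serialVersionUID = 1L;
    static final int PIECE_SIZE = 45; // assumed piece diameter in pixels
    private final List<Point> pieces = new ArrayList<>();

    PaintedBoard() {
        setPreferredSize(new Dimension(500, 500));
        addMouseListener(new MouseAdapter() {
            @Override
            public void mouseClicked(MouseEvent e) {
                placePiece(e.getX(), e.getY());
            }
        });
    }

    // Record a piece centred on (x, y) and ask Swing to repaint the board.
    void placePiece(int x, int y) {
        pieces.add(new Point(x, y));
        repaint();
    }

    int pieceCount() {
        return pieces.size();
    }

    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g);
        g.setColor(Color.BLACK);
        for (Point p : pieces) {
            // A filled circle stands in for the piece image.
            g.fillOval(p.x - PIECE_SIZE / 2, p.y - PIECE_SIZE / 2, PIECE_SIZE, PIECE_SIZE);
        }
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Painted board sketch");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(new PaintedBoard());
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```

Because the only state is the list of points, there is nothing to lay out or validate; a repaint() after each change is enough.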
Here's my code for the piece:
       JPanel piece = new JPanel() {
       @Override
           public void paintComponent(Graphics g) {
               Image img = new ImageIcon("Button"+colour1+".png").getImage();
               g.drawImage(img, 0, 0, null);
               setBounds(20, 100, img.getWidth(null), img.getHeight(null));
           }
       };
       board.add(piece);
       piece.addMouseListener(this);
       MAINWINDOW.addMouseListener(this); //MAINWINDOW is the JFrame
The board JPanel layout is already specified as null; that's why I could draw the background image at 0,0 in the first place. I've also tried calling this at the beginning (without waiting for any event), and the piece was displayed correctly. I just don't know what went wrong x_x -- Yurei-eggtart 16:08, 9 April 2011 (UTC)
On the face of it, that's a fairly odd paintComponent call. Firstly you should generate img in the piece's constructor (I'd really have a named class rather than an anon like this). And secondly you shouldn't be positioning the object in its own paint call. It should be constructed, then you wait for the img to load (so you can get its dimensions) and then located, and the paint call just does drawImage. -- Finlay McWalterTalk 16:36, 9 April 2011 (UTC)
Although if there is more than one piece with the same image then you'd ideally load the images in the main object and pass the loaded Image object to the constructor of the piece objects - that way you've only loaded each image once. -- Finlay McWalterTalk 16:49, 9 April 2011 (UTC)
Here's my basic example
import java.awt.*;
import javax.swing.*;

// A playing piece that simply draws the image it was given.
class Piece extends JComponent {
    static final long serialVersionUID = 0x1234L;
    private Image myImg;

    public Piece(Image i) {
        myImg = i;
    }

    @Override
    public void paintComponent(Graphics g) {
        g.drawImage(myImg, 0, 0, null);
    }
}

// The board draws its own background (here just a single line).
class Board extends JPanel {
    static final long serialVersionUID = 0x1234L;

    @Override
    public void paintComponent(Graphics g) {
        super.paintComponent(g); // clear the panel before drawing
        g.drawLine(125, 0, 125, 500);
    }
}

public class Boardgame {
    public static void main(String[] args) {
        // for this example, white.png and black.png are 50x50
        // each image is loaded once here and handed to the pieces
        Image whiteimg = new ImageIcon("white.png").getImage();
        Image blackimg = new ImageIcon("black.png").getImage();
        JFrame frame = new JFrame("hello");
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        Board board = new Board();
        frame.getContentPane().add(board);
        board.setLayout(null); // pieces are positioned manually with setBounds

        Piece p1 = new Piece(whiteimg);
        p1.setBounds(100, 100, 150, 150); // x, y, width, height
        board.add(p1);

        Piece p2 = new Piece(blackimg);
        p2.setBounds(100, 150, 150, 200);
        board.add(p2);

        frame.setSize(500, 500);
        frame.setVisible(true);
    }
}
-- Finlay McWalterTalk 17:36, 9 April 2011 (UTC)
As to SwingWorker (etc.), you almost certainly don't need to use that. Swing is single-threaded, which means all the calls (at least, all those following that frame.setVisible call) which do anything to Swing components have to happen in the main event thread. If your game just handles mouse clicks, you can just have the mouse event handler code move Pieces around as it wants. The only thing I can think that you'd do asynchronously (of the event handler thread) is if you had a computer player that took a significant amount of time to calculate its next move. In that case you'd most likely start a new Thread and run the calculate-move code there; once it was done it would call a little method (via invokeLater()) that would update the board. That invoked-later code would be executed in the event handler thread, which means it's safe for it to move and otherwise mess around with Swing components like Piece. -- Finlay McWalterTalk 18:31, 9 April 2011 (UTC)
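The Thread plus invokeLater() pattern described above could be sketched like this. The class ComputerPlayer, the calculateMove and playAsync methods, and the JLabel used as the thing to update are all hypothetical stand-ins; a real game would update its board instead, and calculateMove here is a trivial placeholder for a slow search.

```java
import javax.swing.JLabel;
import javax.swing.SwingUtilities;

// Hypothetical sketch: compute a slow "computer move" off the event thread,
// then hand the result back to Swing with invokeLater.
class ComputerPlayer {

    // Stand-in for an expensive search; here it just picks the highest slot.
    static int calculateMove(int[] freeSlots) {
        int best = freeSlots[0];
        for (int slot : freeSlots) {
            best = Math.max(best, slot);
        }
        return best;
    }

    // Start the calculation on a new thread. The label is only touched inside
    // the invokeLater callback, which runs on the event dispatch thread.
    static Thread playAsync(int[] freeSlots, JLabel status) {
        Thread worker = new Thread(() -> {
            int move = calculateMove(freeSlots);
            SwingUtilities.invokeLater(() -> status.setText("Computer plays slot " + move));
        });
        worker.start();
        return worker;
    }
}
```

A mouse handler could call ComputerPlayer.playAsync(freeSlots, statusLabel) after the human move and return immediately, so the window stays responsive while the move is being calculated.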
Okay, I did a few more runs and found some more odd behaviour. It's really hard to explain and my code is really messed up. Basically this is the flow of the program:
  • menu -> game mode -> MAINWINDOW is generated (for testing purposes, I have created one p1 piece together with this frame, so there'll be a white piece on the board; white is the default colour for p1)
  • p1 and p2 insert name -> p1 and p2 choose colour -> randomly choose the starter
  • all done, now only the board is up. As if putting down a piece, I click on a specific slot to trigger mouse event.
       if (e.getSource() == board) {
           if ((e.getX()<=65&&e.getX()>=20) && (e.getY()<=145&&e.getY()>=100)) {
               Piece piece = new Piece(colour1, 20, 100); //a class that extends JPanel that would form the piece
               BGPanel.add(piece);
               BGPanel.validate();
               BG.repaint();
               MAINWINDOW.validate();
               System.out.println("Hello world");
           }
       }
  • What exactly happened: Hello World is printed, the piece already on board changes colour from White to p1's selected colour, however nothing popped up at the slot.
What I don't understand is that creating the piece in the main GUI constructor (i.e. without checking for a mouse event) works easily (just like how the testing piece shows up), whereas a piece created in the event handler doesn't show up. Hopefully it's understandable, and thanks a lot for your attention, Finlay. -- Yurei-eggtart 21:03, 9 April 2011 (UTC)
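For comparison, here is a small self-contained sketch of adding a piece component from a mouse handler on a null-layout board. It is not a diagnosis of the code above, which isn't shown in full. The names DiscPiece, ClickBoard and addPiece are hypothetical, a filled circle replaces the .png image, and the 45-pixel slot size is assumed from the 20..65 and 100..145 ranges in the hit test. The points it illustrates are the ones made earlier in the thread: the piece gets its bounds outside its paint method and before it is added, and it is added to the same container that is then repainted.

```java
import java.awt.Color;
import java.awt.Graphics;
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import javax.swing.JComponent;
import javax.swing.JPanel;

// Hypothetical piece: painted as a filled circle instead of a loaded image.
class DiscPiece extends JComponent {
    private static final long serialVersionUID = 1L;
    private final Color colour;

    DiscPiece(Color colour) {
        this.colour = colour;
    }

    @Override
    protected void paintComponent(Graphics g) {
        g.setColor(colour);
        g.fillOval(0, 0, getWidth(), getHeight());
    }
}

// Hypothetical board with a null layout that adds pieces when clicked.
class ClickBoard extends JPanel {
    private static final long serialVersionUID = 1L;
    static final int SLOT_SIZE = 45; // assumed slot size in pixels

    ClickBoard() {
        setLayout(null); // pieces are positioned by hand
        addMouseListener(new MouseAdapter() {
            @Override
            public void mousePressed(MouseEvent e) {
                // Same hit test as the post above: the slot at (20, 100).
                if (e.getX() >= 20 && e.getX() <= 65 && e.getY() >= 100 && e.getY() <= 145) {
                    addPiece(Color.WHITE, 20, 100);
                }
            }
        });
    }

    // Give the piece its bounds before adding it, then repaint the board.
    DiscPiece addPiece(Color colour, int x, int y) {
        DiscPiece piece = new DiscPiece(colour);
        piece.setBounds(x, y, SLOT_SIZE, SLOT_SIZE);
        add(piece);
        repaint();
        return piece;
    }
}
```

With a null layout there is no layout pass to wait for, so validate() calls on the frame should not be what makes the piece appear; the explicit setBounds and the repaint() on the container the piece was added to are the parts that matter in this sketch.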

Book scanning services in Toronto

Hi:

I have some old books which I want to throw away. But I would also like to keep a digital archive of them. Is there any place in Toronto that does book scanning? Since I plan to throw them away, I don't mind if the scanning shop takes my books and cuts away the spine.

Thanks,

70.29.27.28 (talk) 17:19, 9 April 2011 (UTC)