Talk:Enterprise search/Archives/2011

Latest comment: 13 years ago by Therealpowerflower in topic External Links

difference between this "enterprise search" and "plain old search"

So what is the difference between this "enterprise search" and "plain old search"? --83.249.118.113 16:52, 16 September 2007 (UTC)

Without knowing what you mean by "plain old search" it's difficult to give an answer, but I'll try as best I can to explain what most analysts say it covers. Enterprise search refers to search technology intended for use within an enterprise (e.g. a multiple-person business, or a large organisation such as a government department, a library or a research facility). Technology-wise, it covers systems intended not just for Windows or Mac desktops but more commonly for use on a LAN, intranet, extranet or corporate website. It covers systems that run on Unix (Solaris, Linux etc.) and other high-end server systems as well as Microsoft Windows NT, 2000 and 2003 servers, down to desktop clients running any of these. Historically, such systems predate the more recent systems based on Web technology or tools designed for the consumer desktop market; for example, Verity and Excalibur (later renamed Convera) started out on Unix platforms, whereas with the advent of the IBM PC and compatibles, the pioneers for the Microsoft DOS platform were Zylab about 1988, Isys about 1989 and D T Software (later renamed dtSearch Corp) about 1991. Search systems intended for 'enterprise' use as opposed to consumer use generally have better security and 'install over network' capabilities; more control over what is indexed and when (typically scheduled overnight rather than 'real-time' as in some consumer products, to avoid slowing systems down); better scalability to index millions of documents; often support for searching legacy document formats rather than just common consumer file formats; often extensive reporting or exporting of search results; better multiple-language support; and often specialised features for categorisation, sorting of results, and other features more suited to professional researchers, investigators etc. And of course they carry a price tag that is usually beyond the range of products intended for the consumer market.
In 2005 Gartner changed their annual 'Enterprise Search' report so that vendors must offer more than just plain search capabilities (frequently called "keyword search") to be eligible for inclusion in their 'Leader' or 'Visionary' classifications. Ray3055 13:25, 3 November 2007 (UTC)

Ray - I think I can relate to this person's question. When the average person thinks of search, he thinks of Google, but Google is not listed here. What makes enterprise search different from what Google or Yahoo does? Cwilli1024 (talk) 20:54, 31 May 2008 (UTC)

I have added a Summary section to this article in an attempt to address these issues. Johnchallis (talk) 19:25, 30 August 2008 (UTC)

I added a section on the challenges of enterprise search relevancy, which I think deserves some focus as it is a known challenge and something that enterprise search vendors will be looking at in the future. tkjelsrud 4 March 2009

I added a section called "Elements of an enterprise search system" (though I'm not entirely happy with that section title) to describe what an enterprise search system does, exactly, since I found it strange this wouldn't be here. Therealpowerflower (talk) 16:51, 28 June 2009 (UTC)

Vendor section headings

Can someone please explain the headings Microsoft Based and SuperPlatforms? For example: 1) Why is Microsoft Sharepoint not under Microsoft Based, given that it runs on any Windows 2000 or 2003 server? 2) dtSearch Corp produces a Linux engine as well as Windows 32- and 64-bit engines, although I think they should be in the specialists section with Isys, the same way that Gartner ranks them. Ray3055 12:41, 1 November 2007 (UTC)

OK - I have since found that the headings I mentioned above have been copied from the CMSWatch website. Perhaps they should be changed, since CMSWatch is a commercial website and there may be a copyright violation. But a reference to CMSWatch would be useful, so I'll add it. Ray3055 13:55, 1 November 2007 (UTC)

The Gartner annual "Magic Quadrant for Information Access Technology" lists enterprise search vendors in one of 4 categories, Leaders, Challengers, Niche Players, and Visionaries. For 2007, ZyLAB is in the “Leaders” section, so I will add them to the same section as Autonomy and Fast, the same way that Gartner ranks them. Ray3055 12:41, 1 November 2007 (UTC)

I am concerned about the vendor headings being the same as those from CMS Watch. Apart from copyright issues, CMS Watch is just a small consultancy, and the descriptions they give under their headings aren't consistent with major consultancies such as Gartner, IDC etc.; for example, they only list publicly quoted companies under the 'major vendors' heading, whereas some of the privately owned firms often have similar revenues, or similar or often much larger user bases. The major consultancies tend to classify by turnover, but this is sometimes misleading, because some of the vendors sell systems and consultancy starting at 100,000 dollars, so a user base of just a few hundred customers can achieve multimillion-dollar revenues; whereas some of the 'smaller' firms sell systems, or often just software, and with a turnover of just a few million dollars can have a user base in the hundreds of thousands. For example, Gartner rank Zylab as a 'Leader', but according to their website their turnover for 2006 was just around 15m dollars. Some suggestions for now: rank them in groups by turnover for the largest firms as now, and for the others list alphabetically in the same list, or perhaps in groups by platform (Linux, Windows etc.). If we base the rankings on, say, Gartner's reports we risk the same problems, so we must have a transparent way of classification/grouping. I'll leave this for a few days to get feedback, but I think we must change the current vendor headings to avoid a delete request because of copyright violation. I'd also suggest we limit the listings to companies referred to by the major analyst firms, to avoid spam links from the dozens of small startups, unless they can provide a reference link to prove substantial income from enterprise usage, and not just references to wide 'free' or adware distribution or non-profitable usage on websites etc. for publicity purposes. Ray3055 13:25, 3 November 2007 (UTC)

I'm also curious what the vendor headings are supposed to be. Since I work for a company that is broadly accepted as a major vendor, I don't feel comfortable editing the list. But I agree with Ray3055 that we should either have a principled, transparent way of classifying vendors, or we should just present them as a flat list. In my experience, the leading industry analyst firms in this space are Gartner, Forrester, and IDC. Perhaps we can aggregate judgments from these three firms to arrive at a classification that is transparent and mitigates concerns about copyright. Dtunkelang (talk) 03:38, 13 August 2008 (UTC)

I vote for removing the vendor list completely, or at least moving it to a separate page. The Enterprise search article should be more about what enterprise search is, not who it is. andemann —Preceding undated comment was added at 18:45, 20 September 2008 (UTC).

In the absence of contrary opinions and in keeping with Wikipedia editorial guidelines, I've moved the list to a new page, which itself is cleaned up to only include vendors and open source software with Wikipedia entries, and which eliminates the dubious classification of vendors that was on this entry. This entry could still use some cleanup, but at least it is no longer dominated by an ugly and confusing vendor list. Dtunkelang (talk) 00:35, 28 September 2008 (UTC)

Expand?

This page needs some additional sections, say History/Timeline and Technology/Glossary as a minimum. At the moment it is just a vendor list and not very informative or useful to anyone wanting to research the subject. The problem is that things change quickly and there are few if any reliable or accurate books or reports from 'consultancies'. I just came across the Forrester "Enterprise Search Platforms, Q2 2006" report, which states "Entopia is a strong Performer in Enterprise Search Platforms.."; so strong that it went out of business in late 2006. I just removed Entopia from the vendor list. Ray3055 19:55, 7 November 2007 (UTC)

Searching Software Source Code?

Is there any room to include software source code (e.g. programming languages such as Java, C, COBOL, Fortran, Easytrieve, JCL, JavaScript, REXX, Ruby, Python, Focus, RPG, etc.) in the definition of "enterprise search"? The reasoning being: when someone writes an MS Word document (hopefully discoverable via enterprise search) discussing sales performance ("Sales were down 10% in the Western Region"), the numbers supporting the textual description likely came from an Excel spreadsheet populated with numbers taken from a variety of company internal reporting systems, from PCs to mainframes.

There is no question that searching source code in a meaningful way is a non-trivial exercise, since a word, phrase or concept like "social security number" will be found as EIN, SSN, TIN, EID, SIN, emp_no, empl_id, etc. inside software systems.

See: http://en.wikipedia.org/wiki/Alphabetical_list_of_programming_languages for a mind boggling list of programming languages. DEddy (talk) 16:02, 7 February 2008 (UTC)
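
The identifier problem described in the comment above can be illustrated with a small query-expansion sketch. This is only a hedged illustration, not how any particular enterprise search product works: the synonym table, the function names `expand_query` and `search_source`, and the sample lines are all hypothetical; only the identifier variants (EIN, SSN, TIN, etc.) come from the comment itself.

```python
import re

# Hypothetical synonym table. The variants are the ones listed in the comment
# above; storing them as a simple dict is an assumption for illustration.
CONCEPT_SYNONYMS = {
    "social security number": ["EIN", "SSN", "TIN", "EID", "SIN", "emp_no", "empl_id"],
}


def expand_query(concept):
    """Return the concept itself plus any known identifier variants."""
    return [concept] + CONCEPT_SYNONYMS.get(concept.lower(), [])


def search_source(lines, concept):
    """Return (line_number, line) pairs that contain any variant as a whole token."""
    variants = expand_query(concept)
    # Explicit lookarounds so that "SIN" does not match inside "assign"
    # and underscores count as part of an identifier.
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])(?:"
        + "|".join(re.escape(v) for v in variants)
        + r")(?![A-Za-z0-9_])",
        re.IGNORECASE,
    )
    return [(n, line) for n, line in enumerate(lines, start=1) if pattern.search(line)]


if __name__ == "__main__":
    sample = [
        "MOVE SSN TO WS-OUT",
        "int count = 0;",
        "select empl_id from staff",
        "assign = 1",
    ]
    for number, text in search_source(sample, "social security number"):
        print(number, text)
```

A hand-maintained table like this would not scale to a real code base, where the variants differ per system and per language, which is part of why the problem is non-trivial.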

Enterprise search today mostly means a company-wide, intranet-based system which enables searching on documents/web pages that have been specifically 'published' (e.g. reside in shared web folders) on the corporate intranet for all (or certain groups) to see, such as policies, procedures, and 'non-sensitive' documents. However, many also link into databases that might hold customer/supplier information and so on, and of course might spider external websites. For more specialised searching, such as for sensitive data (customer e-mails, confidential reports, invoices, legal agreements) and software source code, it is usually impractical or not secure enough to put those types of things in web shares or common repositories, so isolated 'departmental' specialised search systems are more common. Source code in particular is usually classed as a text mining application, and many of the same search engines used in enterprise search applications are suitable for this type of usage. I have added a link to text mining under See also. Business intelligence systems are a further example of where search engines are employed. Ray3055 (talk) 22:30, 8 February 2008 (UTC)

Cleanup of red-linked items

I noticed User:Dtunkelang adding a Cleanup tag to this article. In response, in this edit, I removed some products from the 'Specialized Search' section that do not have Wikipedia articles. The practice over at List of search engines is to include entries only when they are notable enough to have articles. If a product that you have knowledge of has been removed, please consider creating an article on it. If there are reliable sources to establish the product's importance, that is enough to justify an article. In turn that would justify re-adding it to this list. EdJohnston (talk) 19:02, 17 August 2008 (UTC)

Ed, thanks for stepping up. There's been discussion about applying a more principled approach to vendor categorization. For example, what makes a vendor "major"? Inclusion as a leader in the big-name analyst reports (i.e., Gartner, Forrester)? Revenue or customer count above some threshold? In general, this page feels due for an overhaul. As it is, the current entry is, at best, a list of vendors, and I thought Wikipedia discouraged such entries. Wouldn't it be more useful to write about enterprise search, taking a cue from publications like the Enterprise Search Sourcebook and the series of posts by Mark Bennett of New Idea Engineering on 20+ Differences Between Internet vs. Enterprise Search - And Why You Should Care? Dtunkelang (talk) 03:26, 18 August 2008 (UTC)

I have added a Summary section to this article in an attempt to address these issues. Johnchallis (talk) 19:25, 30 August 2008 (UTC)

The Summary section is definitely an improvement. I've put out a call for help at The Noisy Channel to see if we can get a group of people without strong ties to a single vendor to improve this entry. Dtunkelang (talk) 17:22, 20 September 2008 (UTC)

Faceted Search

While I appreciate RPR123's enthusiasm about faceted search, I don't think this entry should be quite so ideological about it. And I'm saying this as the Chief Scientist of Endeca! I'll try to edit the entry to preserve what I can of this material while bringing it back to an NPOV.

Dtunkelang (talk) 01:15, 1 November 2008 (UTC)

External Links

Could we collectively decide whether to have external links, and if so which ones meet Wikipedia standards? I'm not thrilled with the current ones, especially given that the previous ones were deleted as spam and these seem no better. But I'd like us to take a principled approach. Dtunkelang (talk) 03:13, 31 December 2008 (UTC)

There seems little reason to include external links in this article and once some are included it will then be difficult to achieve consensus about which external links are valid and which are not. MarioBoglietti (talk) 12:33, 13 January 2009 (UTC)

I think there are some excellent primers and introductions to enterprise search out there and we should probably link to a few that give an unbiased, product-neutral description. This page is too brief to go into all the details itself. I've added one, trying to hunt down some more. Therealpowerflower (talk) 15:47, 31 December 2010 (UTC)