Whether your law firm employs five or 500, the decision to outsource your discovery needs or bring them in-house requires serious consideration. Ask yourself these five key questions: Do we have the right staff to manage all our discovery …
BLOG on Litigation Support and eDiscovery Industry
Understanding Near-Duplicate Identification [Part Two]
We finished Part One of this blog post with a simplified example to demonstrate how one might be able to measure document similarity across a given corpus using a document-term matrix as a starting point. Let’s now turn back to R and our larger sample …
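As a rough illustration of the idea in this excerpt, here is a minimal sketch of building a document-term matrix and comparing its rows. It is written in Python rather than the R used in the post. The sample documents, the function names `document_term_matrix` and `cosine_similarity`, and the choice of cosine similarity as the measure are all assumptions made for this example, not necessarily what the post itself does.

```python
import math
from collections import Counter


def document_term_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical three-document corpus: the first two differ by one word.
docs = [
    "please review the attached contract draft",
    "please review the attached contract draft today",
    "lunch is at noon on friday",
]

vocab, dtm = document_term_matrix(docs)
sim_near = cosine_similarity(dtm[0], dtm[1])  # documents sharing most terms
sim_far = cosine_similarity(dtm[0], dtm[2])   # documents sharing no terms

print(f"near-duplicate pair: {sim_near:.3f}")
print(f"unrelated pair:      {sim_far:.3f}")
```

Each row of the matrix is one document and each column one term, so two documents that share most of their vocabulary produce vectors pointing in nearly the same direction, while documents with no shared terms score zero.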
Understanding Near-Duplicate Identification [Part One]
Near-duplicate identification is one of the more common textual analytics tools used in eDiscovery. Not to be confused with document deduplication, which relies on hash values, near-duplicate identification calculates document similarity based on textual content. For example, if you had …
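The contrast between hash-based deduplication and content-based similarity can be sketched in a few lines of Python. The two sample sentences are invented for this example. MD5 stands in for whichever hash a given tool uses, and Jaccard similarity over word sets is only one simple stand-in for a similarity measure, not necessarily the one the post goes on to describe.

```python
import hashlib


def md5_hex(text):
    """Hash the exact text; any change to the text yields a different digest."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def jaccard(a, b):
    """Share of distinct words the two texts have in common (0.0 to 1.0)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # treat two empty texts as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical documents that differ by a single word.
original = "The meeting is scheduled for Monday at 10 am"
revised = "The meeting is scheduled for Tuesday at 10 am"

same_hash = md5_hex(original) == md5_hex(revised)
score = jaccard(original, revised)

print("hashes match:", same_hash)
print(f"word overlap: {score:.2f}")
```

A one-word edit is enough to make the hashes differ, so deduplication treats the two texts as unrelated, while the overlap score still shows that most of the wording is shared.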