Hadoop is a unified data platform where millions of files and tables are stored. Data scientists and business analysts can use the Hadoop cluster when they know the exact files or tables they want to analyze. But how do business analysts and data scientists find the right data? Finding it takes a great deal of time: they have to arrange meetings with subject-matter experts (SMEs), track down the files one by one, and understand each of them before any analysis can inform data-driven decisions.
OvalEdge makes all the data in the data lake accessible to you through its built-in search technology. It also provides a user interface where you can easily search for any data or metadata. When you find the data you are looking for, you can understand it quickly by using the Smart-Catalog summary view.
When you consider searching for information, you cannot help but think about Google. What sets Google apart is that it uses the interaction of users with its search results to improve its search engine. Enterprise search lacks that vital aspect. You need a search engine that is just as smart, even though you don’t have the help of billions of users to tune your algorithms.
OvalEdge understands this, and we have creatively woven data relationships into our Smart-Catalog search algorithms. This works great for enterprises where all the data is related. Read more about the unique features of OvalEdge on our blog, What Google Cannot Answer, OvalEdge Can.
possible by references
When business analysts and data scientists search for data, they may not know the correct technical terms for what they are looking for. OvalEdge, however, allows users to conduct searches in plain English. Our algorithms make this possible by marking the relationships between all the data across the cluster for both raw data and processed data.
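As a rough illustration of how recorded relationships can let a plain-English query reach technically named datasets, here is a minimal Python sketch. It is not OvalEdge's algorithm: the dataset names, the `RELATIONSHIPS` mapping, and the scoring rule (count of matching query words) are all hypothetical assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical catalog: technical dataset names linked to plain-English terms.
RELATIONSHIPS = {
    "cust_mstr": ["customer", "client"],
    "ord_hdr": ["order", "purchase"],
    "cust_ord_xref": ["customer", "order"],
}


def build_term_index(relationships):
    """Invert the mapping so each plain-English term points to its datasets."""
    index = defaultdict(set)
    for dataset, terms in relationships.items():
        for term in terms:
            index[term.lower()].add(dataset)
    return index


def search(query, index):
    """Return datasets ranked by how many query words they are related to."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for dataset in index.get(word, ()):
            scores[dataset] += 1
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


TERM_INDEX = build_term_index(RELATIONSHIPS)
```

In this sketch, a query such as `search("customer order", TERM_INDEX)` ranks `cust_ord_xref` first because it is related to both words, even though neither word appears in its technical name.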
Search Petabytes of Data
There are some open source and proprietary tools that can put the Hadoop data lake into search indices, but then you would need a separate cluster. OvalEdge, however, creates Lucene indices and stores them directly on Hadoop. The data does not leave the cluster, and it stays secure at all times.
OvalEdge algorithms also reduce the size of the indices through a number of intricate optimizations. Because of these algorithms, we have seen a 90% reduction in the space required for indices.
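For readers unfamiliar with search indices, the toy Python sketch below shows the inverted-index idea that engines such as Lucene are built on: each token maps to the set of documents that contain it, so a query does not need to scan the data itself. It does not use the Lucene API and says nothing about how OvalEdge stores or compresses its indices; the file paths and contents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def lookup(index, query):
    """Return the ids of documents that contain every token in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical files and their searchable text (for example, column names).
DOCS = {
    "hdfs:/raw/orders.csv": "order id customer id amount",
    "hdfs:/raw/customers.csv": "customer id name region",
}
INDEX = build_inverted_index(DOCS)
```

Here `lookup(INDEX, "customer id")` matches both files, while `lookup(INDEX, "amount")` matches only the orders file.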
with built-in Security
OvalEdge features core security, where roles and the security policy are defined at the raw data level, so search does not open a path to security breaches: search results are only available to those who have permission to view the underlying datasets.
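The idea of limiting search results to what a user may view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the role names, dataset names, and the `ROLE_PERMISSIONS` structure are hypothetical, and the actual OvalEdge security model is not described in this post.

```python
# Hypothetical role-to-dataset permissions, defined at the raw data level.
ROLE_PERMISSIONS = {
    "analyst": {"sales_2019", "web_logs"},
    "hr_admin": {"payroll", "sales_2019"},
}


def visible_results(hits, roles, permissions=ROLE_PERMISSIONS):
    """Keep only the search hits that at least one of the user's roles may view."""
    allowed = set()
    for role in roles:
        allowed |= permissions.get(role, set())
    # Preserve the original ranking order of the hits.
    return [hit for hit in hits if hit in allowed]
```

With these assumed permissions, a user holding only the `analyst` role who searches and gets hits on `payroll`, `sales_2019`, and `web_logs` would see just the last two; a user with no recognized role would see nothing.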