This tool gives you access to all European technology offers and requests, and to partner searches for European R&D proposals, published on the Enterprise Europe Network. Results can be filtered to narrow the search to those that best match your needs.


Partner search for Eurostars/Eureka: R&D centre or SME specialised in optical character recognition engines to provide high-level requirements and fine-grained technical details

Summary

Type:
Partner search
Reference:
RDES20170821001
Published:
24/08/2017
Expiry:
24/08/2018
Summary:
A Spanish SME specialised in enterprise content management (ECM) and document capture is looking for SMEs or R&D centres interested in submitting a proposal to the Eurostars call, with a deadline of 14 September 2017. Specifically, it is looking for a partner experienced with optical character recognition (OCR) engines to provide high-level requirements. The partner sought should have expertise in natural language processing, image processing, OCR technologies, artificial intelligence and Big Data. The intended outcome is to outperform the OCR engines currently on the market.

Details

Title:
Eurostars / Eureka partner search: R&D institution or SME experienced with Optical Character Recognition engines to provide both high-level requirements and fine-grained technical details
Summary:
A Spanish SME specialised in enterprise content management (ECM) and document capture is looking for suitable SMEs or R&D institutions to join a Eurostars proposal for the next cut-off deadline (14 September 2017). The partner should be experienced with Optical Character Recognition (OCR) engines in order to provide both high-level requirements and fine-grained technical details. The expected outcome of the project is to go beyond what current OCR engines offer in the market.
Description:
The project proposes the development of a new fault-tolerant, open source OCR/ICR solution with features comparable to those of some commercial products. It would therefore provide reliable extraction from documents with any layout, typography and/or writing style (both typed and handwritten).

The enterprise content management (ECM) and document capture markets are in clear need of innovative solutions that can relieve human work. The ECM market is expected to be worth 12 billion dollars by 2019. It is a huge market because organizations of any size require some kind of content management solution. The document capture market, closely related to ECM, grew by more than 6% last year, reaching a volume of 2 billion dollars.

Having powerful, flexible, easily adaptable and extensible OCR/ICR tools at hand has become a fundamental issue for many organizations in their document management processes. Current OCR tools are mainly based on techniques developed over the past twenty years in the pattern recognition (PR) field, such as support vector machines or simple nearest-neighbour techniques. In many cases these tools do not take advantage of contextual information to improve the OCR results. This contextual information consists mainly of linguistic resources, such as vocabularies and lexicons, that usually reside in the companies that use these tools. With this project, the company intends to develop a series of open source OCR tools that take advantage of these linguistic resources to build language models and to deal with difficult documents.
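As an illustration of how a company lexicon can act as contextual information, the following minimal Python sketch snaps each OCR token to its closest lexicon entry. It is not the project's method: the function names, the sample lexicon and the 0.7 similarity cutoff are all assumptions made for this example, and only the standard library `difflib` module is used.

```python
import difflib

def correct_token(token, lexicon, cutoff=0.7):
    """Return the closest lexicon entry for an OCR token, or the token unchanged.

    `lexicon` is a list of lowercase words (hypothetical company vocabulary).
    Corrected tokens are returned in the lexicon's lowercase form.
    """
    lowered = token.lower()
    if lowered in lexicon:
        return token
    # get_close_matches ranks candidates by difflib's similarity ratio
    matches = difflib.get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else token

def correct_line(line, lexicon):
    """Apply lexicon correction to every whitespace-separated token."""
    return " ".join(correct_token(t, lexicon) for t in line.split())

# Hypothetical lexicon and OCR output with typical confusions (l/i, rn/m)
LEXICON = ["invoice", "number", "total", "amount", "supplier"]
corrected = correct_line("lnvoice nurnber 4711", LEXICON)
print(corrected)
```

Tokens with no sufficiently similar lexicon entry, such as the numeric field above, are left untouched. A production system would more likely weight corrections by character confusion probabilities than by plain string similarity.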

In recent years, a new technology based on deep learning techniques has emerged strongly in many pattern recognition problems, including handwritten text recognition (HTR). These powerful techniques can easily be extended to implement OCR systems that deal with difficult documents, and some research teams intend to develop such tools in the near future. One advantage of deep learning techniques is that they can classify the sample to be recognized very quickly. Another is that many basic open source tools exist for the core processing. With this project, the company intends to develop a series of tools based on free software, both for automatically generating training data and for training an OCR system based on deep learning techniques.

The call that the company is targeting is Eurostars cut-off 8, with a deadline of 14/09/2017. As the deadline is very close, the company is also considering preparing a proposal for the next Eurostars cut-off deadline or a Eureka network project (always open).
EOI deadline: 07 September 2017
Project duration: 1.5 years (approx.).

The ideal partner is an SME (or R&D institution) in the enterprise content management, document analysis or document capture market that can strengthen its own products and solutions with this new deep-learning-based OCR and ICR platform. It should have expertise in:
· Natural Language Processing
· Image Processing
· OCR Technologies
· Artificial Intelligence
· Big Data
Advantages and Innovations:
The project intends to go beyond the state of the art by developing tools that recognize printed documents with handwritten text recognition (HTR) techniques, referred to from now on as OCR-HTR techniques. These techniques should be effective for printed documents on which current OCR techniques do not obtain good results. The main foundations of these tools will be: i) optical modelling based on a combination of deep neural networks and hidden Markov models (DNN-HMM); ii) n-gram models used in a recognition/decoding system; iii) word generation and index preparation to make collections of printed documents searchable.
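To make the role of the n-gram model in point ii) more concrete, here is a minimal Python sketch of n-best rescoring: each candidate transcription carries a log score from the optical model, and a bigram language model with add-one smoothing is added to pick the most plausible one. This is a simplification under stated assumptions, not the project's decoder: a real DNN-HMM system typically integrates the language model into the search itself, and the training sentences, scores and function names below are hypothetical.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left contexts over whitespace-tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for sentence in sentences:
        words = sentence.split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)

def log_prob(sentence, model):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )

def rescore(hypotheses, model, lm_weight=1.0):
    """Pick the hypothesis maximizing optical log score + weighted LM log-probability.

    `hypotheses` is a list of (text, optical_log_score) pairs.
    """
    best = max(hypotheses, key=lambda h: h[1] + lm_weight * log_prob(h[0], model))
    return best[0]

# Hypothetical in-domain text and optical-model scores
model = train_bigram(["total amount due", "amount due today", "total amount paid"])
hypotheses = [("total arnount due", -1.0), ("total amount due", -1.2)]
best = rescore(hypotheses, model)
print(best)
```

In this toy case the optical model slightly prefers the misread "arnount", but the language model penalty for the unseen word outweighs that margin. The `lm_weight` parameter controls how much the linguistic evidence counts against the optical evidence.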

The main technological outcomes of the project include contributions to the open source community:

- Industry-ready OCR and ICR components under permissive open source licences (BSD, MIT or ASL 2.0) that would perform as well as far more expensive commercial products.

- An API for the OCR-HTR system tailored for use by content management systems and full-text retrieval systems, especially Apache Lucene/Solr and Elasticsearch.
Stage of Development:
Under development/lab tested
Technical Specification or Expertise Sought:
The ideal partner is an SME (or R&D institution) in the enterprise content management, document analysis or document capture market that can strengthen its own products and solutions with this new deep-learning-based OCR and ICR platform. The partner should have expertise in:
· Natural Language Processing
· Image Processing
· OCR Technologies
· Artificial Intelligence
· Big Data
IPR Status:
Other

Partner sought

Partner Sought:
Part of the technical development will be contracted to a university R&D group with which the company has already built an initial prototype. The company is now seeking a technical and/or commercial partner with similar use cases or an interest in the expected outcomes of the project, which go beyond what current OCR engines offer in the market. The ideal partner is an SME in the enterprise content management, document analysis or document capture market that can strengthen its own products and solutions with this new deep-learning-based OCR and ICR platform. The company is therefore seeking an integrator partner experienced with OCR engines to provide both high-level requirements and fine-grained technical details about the new engine's capabilities. A concrete use case around OCR/ICR/HTR technologies is preferred, ideally one where the partner has hit a barrier in the current state of the art of these technologies.
Type of Partnership Considered:
RDR

Client

Type and Size of Client:
Industry SME 11-49
Already Engaged in Trans-National Cooperation:
Yes
Languages Spoken:
English
Spanish

Dissemination

Programme-call

Evaluation Scheme:
Two-stage submission (online submission, national submission) www.eurostars-eureka.eu/eurostars-process
Anticipated Project Budget:
To be determined
Coordinator Required:
No
Deadline for Call:
14/09/2017
Project Duration:
75
Weblink to the Call:
Project title and Acronym:
Deepdocs