Carbon Footprint Management Software Market Size is projected to … – GlobeNewswire
November 28, 2022 12:45 ET | Source: Straits Research
Pune, INDIA
New York, United States, Nov. 28, 2022 (GLOBE NEWSWIRE) — A carbon footprint is the amount of greenhouse gases (GHG) released into the atmosphere as a result of a specific activity, person, or manufactured good. The largest source of GHG emissions by far is the combustion of fossil fuels for generating electricity, heating buildings, manufacturing goods, and transporting people and goods, alongside land-use changes such as forest clearing. The global carbon footprint management market is expanding due to rising environmental consciousness and increased attention from regulatory bodies worldwide. Businesses across sectors are adopting carbon management software to comply with government mandates and industry standards such as the European Green Deal and the EU's new taxonomy for sustainable activities.
Get a Free Sample Copy of This Report @ https://straitsresearch.com/report/carbon-footprint-management-market/request-sample
Increase in Government Initiatives for Low Carbon Policies Drives the Global Market
Governments worldwide are responding to rapid industrialization and the resulting rise in carbon emissions because of the severe health and environmental harm that unchecked emissions cause. Carbon taxes and related policies, such as energy taxes, have been implemented by several national, regional, and local governments to reduce greenhouse gas emissions. According to the Center for Climate and Energy Solutions, 35 carbon tax programs are currently in place worldwide. Moreover, many countries impose stringent tax laws and regulations on their commercial and industrial sectors to lessen their carbon footprint.
The Shift Towards Cloud Computing and a Paperless Economy Creates Tremendous Opportunities
The need for documentation of various kinds results in substantial paper consumption; consequently, governments encourage businesses to embrace digital alternatives to paper. Since the widespread adoption of online banking and the explosion of mobile payment apps, society has become increasingly paperless. The carbon footprint management industry stands to benefit from both the proliferation of cloud computing and its rising demand. With the help of contemporary IT, workplaces can streamline their document management processes by scanning documents and storing them in the cloud. As paperless, cloud-based workflows become more widely integrated and adopted, the market is expected to strengthen.
Report Scope
Buy Now Full Report @ https://straitsresearch.com/buy-now/carbon-footprint-management-market
Regional Analysis
North America is the most significant shareholder in the global carbon footprint management software market and is expected to grow during the forecast period. The carbon footprint management market in North America is predicted to grow due to numerous regulations and substantial efforts by the United States government to reduce GHG emissions. For instance, the Environmental Protection Agency (EPA) issued the Affordable Clean Energy (ACE) Rule to control greenhouse gas emissions from existing fossil-fuel-fired power plants. After factoring in the rule's costs, domestic climate benefits, and health benefits, the EPA estimates net benefits of USD 120 million to USD 730 million per year. Future carbon footprint regulations are likely to increase demand for carbon footprint management.
Asia-Pacific is expected to grow during the forecast period. Stakeholders view the Asia-Pacific region as a potentially lucrative opportunity on the back of rapid industrialization and urbanization. Financial support for air quality regulatory frameworks is anticipated from emerging economies. In July, China announced the launch of its emissions trading program, the world's largest of its kind; China is also the world's largest emitter of carbon dioxide. Rising concerns about increasing CO2 emissions in the region are expected to fuel growth in the carbon footprint management market. Under the Paris Agreement, India plans to reduce its carbon emissions by more than 30 percent by 2030. Commitments such as these will make carbon footprint management increasingly important.
Europe's carbon footprint management market is expected to grow rapidly during the forecast period due to the region's swift adoption of technologically advanced solutions for reducing carbon emissions. For example, Coop, one of the largest retail and wholesale companies in Switzerland, adopted ABB solar inverter technology to reduce energy consumption, improve energy efficiency by 20%, and become carbon neutral by 2023. Over the forecast period, the growth of the European carbon footprint management market is therefore expected to be driven by the adoption of such solutions.
The market for carbon footprint management is expected to grow substantially in LAMEA over the forecast period, as many countries in the region are moving toward greener policies and alternative energy sources. Governments and regulatory bodies have issued laws and mandates to lessen environmental impact, and market leaders are expected to expand their product offerings. The increased adoption of policies aimed at reducing greenhouse gases is also making it easier for associations to report on the results and actions of their members.
Key Highlights
The major players in the global carbon footprint management market are:
Global Carbon Footprint Management Market: Segmentation
By Component
By Deployment
By Industry
By Regions
TABLE OF CONTENTS
Table of Contents and Figures @ https://straitsresearch.com/report/carbon-footprint-management-market/toc
Market News
News Media
Carbon Footprint Management Solution – the Optimal Countermeasure for Rampant Carbon Emission
Have a Look at the Related Research Reports:
Green Technology and Sustainability Market: Information by Technology (IoT, Cloud Computing), Application (Green Building, Carbon Footprint Management), and Region — Forecast till 2030
Carbon Management System Market: Information by Offering (Software, Services), Application (Energy, Greenhouse Gas Management), and Region — Forecast till 2030
Activated Carbon Market: Information by Type (Powdered Activated Carbon, Granular Activated Carbon), Application (Liquid Phase, Gas Phase), End User, and Region — Forecast till 2030
Carbon Steel Market: Information by Type (High Carbon Steel, Medium Carbon Steel), Application (Automotive, Construction), and Region — Forecast till 2030
Industrial Filtration Market: Information by Type (Air & Gas Filtration, Liquid Filtration), Filter Media (Activated Carbon, Nonwovens), Application, and Regions — Forecast till 2031
About Straits Research Pvt. Ltd.
Straits Research is a market intelligence company providing global business information reports and services. Our exclusive blend of quantitative forecasting and trend analysis provides forward-looking insight for thousands of decision-makers. Straits Research Pvt. Ltd. provides actionable market research data, designed and presented to support decision-making and ROI analysis.
Whether you are looking at business sectors in the next town or across continents, we understand the importance of knowing our clients' purchase decisions. We address our clients' challenges by identifying and interpreting target audiences and generating leads with precision. We seek to collaborate with our clients to deliver a broad spectrum of results through a blend of market and business research approaches.
For more information on your target market, please contact us below:
Phone: +1 646 480 7505 (the U.S.)
+91 8087085354 (APAC)
+44 208 068 9665 (the U.K.)
Email: sales@straitsresearch.com
Follow Us: LinkedIn | Facebook | Instagram | Twitter
Document Management Best Practices in 2023 [Business Guide] – Cloudwards
Keeping on top of critical documents and managing business data are some of the most important things your company can do. If you have installed a document management system, we’ve got some tips to help you run it effectively.
Implementing a document management system (DMS) is a great first step to keeping your business documents organized. However, you’ll need to adhere to some document management best practices to maintain effective document management, and we’ll show you how.
The good news is that many of the leading document management software solutions make it easy to keep track of your documents and ensure your business is compliant with local legislation. That doesn’t mean there’s no effort on your part, and below are the top document management best practices to help you get started.
Document management system best practices include accurate indexing and filing of electronic documents, creating a document access hierarchy and carrying out regular audits.
The best business document management system is the one that’s easy to use and full of features to make your life easier. Such features include automations, file versioning, collaboration options and robust security for your files.
Some of the document management best practices featured below may seem obvious. Thankfully, that means they’re easy to understand and implement; you need only ensure you remain consistent with them to have a continuously robust document management system in place. Let’s go through them.
The first step your business needs to take is identifying which document management system is right for you. While many of the options are similar, they’re not the same, and choosing one that doesn’t meet your needs can be a waste of time and money.
If your business creates a lot of new documents that you need to collaborate on, then something like Microsoft SharePoint or Egnyte are good options, as both offer Office 365 integration. Alternatively, if you simply need a space to store and manage contracts, DocuWare is a terrific option that also allows you to digitally sign your documents.
To ensure your documents are easily accessible, it’s vital that you index each new document and folder accurately, thus creating a consistent file structure. Prior to creating your DMS, we recommend that you make clear folder structures and create a list of categories for each document type.
Clear indexing makes it much easier to find documents when searching, and far easier for different departments to manage documents within their section of the business. When using a document management solution like M-Files, you can also label documents with tags and other forms of metadata, ensuring each document is in its proper place within your DMS.
If you're migrating from physical documents to electronic documents, it's good practice to make digital copies of your paper documents. It may be a daunting task, but scanning documents and transferring them to an electronic document management system ensures that sensitive documents remain safe and accessible.
Paper documents are much easier to lose, and a physical filing cabinet doesn’t have the same level of security that the best document management software provides. So moving everything over not only makes filing more efficient, it also gives far more robust protection for each document.
Not every person in your business needs to have access to every document that lives in your document management system. There will be sensitive documents — like employee contracts and performance records — that must remain confidential. Thankfully, with today’s document management software you can grant access permissions to users of your choice.
When setting up your document management system, take time to identify which senior members of staff and departments need to access specific documents. They’ll also be able to control access for their own team members, and choose which of them can access, edit and share documents.
Automations are something every business should add to their document management workflow. They make business life easier and also help you remain consistent in managing your documents and business processes.
Depending on the document management software you’re using, you can send automated notifications when a document is created or edited. You can also set automated document approval workflows for certain documents, as well as set up automated invoicing if your business works with external clients.
If you’re using an efficient document management system, you can expect to have more automation options available, and it’s a good idea to make use of them.
Assuming you want to move away from physical documents and filing cabinets, you have two options when it comes to using a document management system: on-premise and cloud storage.
On-premise document management software allows you to control your documents from a server within your business. Cloud storage means your documents exist on third-party servers — which some users don’t like.
However, cloud storage gives you an added layer of protection should your hardware get damaged or stolen, as you’re still able to access your documents in the cloud. Cloud-based software also makes it much easier to collaborate with different users, especially within remote teams, improving overall business efficiency.
If you’re looking for a space to manage documents as well as other file types, check out our top online cloud storage providers for documents.
If your business constantly updates documents, having effective version control in place is a must. File versioning means you can access previous versions of documents, and restore them if needed.
It also lets you grant some users view-only access while giving others the ability to view, comment on and edit documents. Doing this means you can collaborate effectively on a document and ensure it isn't changed without the proper authorization.
Although it’s much easier to manage business documents with an electronic document management system, it’s still good practice to do regular audits of your documents to mitigate against potential process failures or breaches.
You can do audits internally and externally. For the former, it’s good to do audits every three to six months, while an external audit can be conducted annually.
Not only does auditing allow you to learn what documents you have in place, it also means you can understand if your business is being compliant with sector regulations. Many professional sectors will enforce an external audit, so it’s good to keep on top of everything, rather than panic when the time comes for your business to be put under the microscope.
It’s a good idea to learn what types of documents you need to keep in the document management system for a certain period of time. For example, in the financial industry, regulatory requirements state that certain documents must remain on record for a period of seven years. Before deleting a document, double-check to see if it’s against regulations to do so.
Sticking to these document management best practices will help you keep your document management system in the best shape possible. If you have signed up for a new document management system, it’s best to put the above tips in place as soon as possible, to ensure you can keep on top of your documents and business processes.
While all the above tips are important, we suggest you prioritize folder and file structure, version control and selective user permissions, as these are the core of efficient document management.
It’s also worth trying out different document management solutions before committing to one long term. This way your business can identify which solution is best for its needs and provides the best user experience, which is important considering you’ll likely be using it every (working) day.
While you're at it, read our document management vs content management guide to understand which product better suits your needs.
What do you think the best practices are for efficient document management? Is there a business document management system you recommend? Which software do you currently use for managing documents? Let us know in the comments. Thanks for reading.
Evaluation of the extraction of methodological study characteristics … – Nature.com
Scientific Reports volume 13, Article number: 139 (2023)
This paper introduces and evaluates the study.character module from the JATSdecoder package, which extracts several key methodological study characteristics from NISO-JATS coded scientific articles. study.character splits the text into sections and applies its heuristic-driven extraction procedures to the text of the method and result section/s. When used individually, study.character's functions can also be applied to any textual input. An externally coded data set of 288 PDF articles serves as an indicator of study.character's capabilities in extracting the number of sub-studies reported per article, the statistical methods applied and software solutions used. Its precision in extracting the reported α-level, power, correction procedures for multiple testing, use of interactions, definition of outliers, and mentions of statistical assumptions is evaluated by comparison to a manually curated data set of the same collection of articles. Sensitivity, specificity, and accuracy measures are reported for each of the evaluated functions. study.character reliably extracts the methodological study characteristics targeted here from psychological research articles. Most extractions have very low false positive rates and high accuracy (≥ 0.9). Most non-detections are due to PDF-specific conversion errors and complex text structures that are not yet manageable. study.character can be applied to large text resources in order to examine methodological trends over time, by journal and/or by topic. It also enables a new way of identifying study sets for meta-analyses and systematic reviews.
In scientific research practice, many individual decisions can be made that affect the scientific quality of a study. There are also changing standards set by journal editors and the community. This applies not only to the study design, but also to the choice of statistical methods and their settings. With new methods and standards, the way research is planned, conducted and presented changes over time and represents an interesting field of research. One aspect to consider is the ever-increasing number of scientific publications coming out each year. Numerous studies have investigated the use and development of statistical techniques in scientific research practice1,2,3,4,5,6,7. Most of these studies used manually coded data of a limited number of articles, journals, topics or time interval. The selectivity of these samples therefore severely limits the generalizability of the findings to a wider scope. For example, Blanca et al.7 analyzed the use of statistical methods and analysis software solutions in 288 articles (36 articles each from 8 journals), all from a publication period of about one year.
A technology that is suitable for analyzing large amounts of text and helps to overcome the problem of small samples in the analysis of scientific research practice is text mining. Text mining is the process of discovering and capturing knowledge or useful patterns from a large amount of unstructured textual data8. It is an interdisciplinary field that draws on data mining, machine learning, natural language processing, statistics, and more8. It facilitates extraction and unification tasks that cannot be done by hand when the analyzed text corpus becomes large. In addition to rudimentary computer commands on textual input (regular expressions), there are also many software programs and toolkits that provide model-based methods of natural language processing (NLP).
Well-known NLP libraries such as NLTK9 or spaCy10 provide users with a variety of programs for linguistic evaluation of natural language. This often involves the use of statistical models and machine learning. In contrast, the JATSdecoder package11 focuses on metadata and study feature extraction (in the context of the NISO-JATS format). This extraction is implemented using expert-driven heuristics. Thus, unlike in the aforementioned large multipurpose NLP libraries, no further programming effort is required to perform specific extraction.
Research on scientific practice can benefit greatly from NLP techniques. Compared to manual coding, automated identification of study characteristics is very time- and cost-efficient. It enables large-scale and trend analyses, mirroring of scientific research practices, and identification of studies that meet certain methodological requirements for meta-analyses and systematic reviews. In addition, automated plausibility checks and global summaries can support quality management.
In general, most methodological study characteristics (e.g., statistical results, α-level, power, etc.) are reported in a fairly standard way. Here, the module study.character from the R package JATSdecoder11 is presented and evaluated as a tool for extracting key methodological features from scientific reports. The evaluation of the built-in extraction functions is performed on a medium-sized collection of articles (N = 287) but highlights the possibilities for mirroring and identifying methodological trends in rather large article collections. Although the use of model-based NLP methods might be appropriate for the study features focused on here, all functions run fine-tuned expert-driven extraction heuristics to achieve robust extraction and traceability of errors. While many NLP libraries can be thought of as a toolbox for a variety of problems, JATSdecoder represents a precision tool for a specific problem.
Scientific research is mostly published in two ways. In addition to a printable version distributed as a PDF file, machine-readable versions are accessible in various formats (HTML, XML, JSON). The PubMed Central database12 currently stores almost five million open access documents from the biology and health sciences, distributed as XML files and structured using the Journal Archiving Tag System NISO-JATS13. The NISO-JATS is an HTML tag standard for storing scientific article content without any graphical parameters (website layout, text arrangement, etc.); graphical content is hyper-referenced.
JATSdecoder11 is a software package for the statistical programming language R15. Its function JATSdecoder converts NISO-JATS encoded XML documents into a list with metadata, user-adjustable sectioned text and the reference list16. The structured list is very useful for custom search and extraction procedures, as it facilitates these tasks on selectively defined text parts (e.g., section headings, method or results section, reference list).
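To make this workflow concrete, the following is a minimal usage sketch, not taken from the paper; the file name is hypothetical and the element names of the returned list (e.g., $abstract, $sections) are assumptions based on the description above and may differ in the installed package version.

```r
# Sketch: convert one NISO-JATS XML file into a structured list with JATSdecoder.
# install.packages("JATSdecoder")   # CRAN release, if needed
library(JATSdecoder)

article <- JATSdecoder("PMC1234567.xml")   # hypothetical file name
names(article)     # metadata fields, sectioned text, reference list
article$abstract   # abstract text (assumed element name)
article$sections   # section headings of the article (assumed element name)
```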
The algorithms of JATSdecoder were iteratively developed based on the PubMed Central article collection (at that time approximately 3 million native NISO-JATS XML files) and more than 10,000 PDF files from different journals that were converted to XML files with the Content ExtRactor and MINEr (CERMINE)14.
CERMINE is a sophisticated PDF conversion tool which extracts metadata, full text and parsed references from scientific literature in PDF format. The output can be returned as plain text or NISO-JATS encoded content. Compared to a pure text extraction, the transfer into the NISO-JATS format with CERMINE is a great advantage for post-processing. Article metadata can be accessed directly and the text of multi-column blocks is extracted correctly, which is often not the case with the output of other conversion software. Supervised and unsupervised machine learning algorithms enable CERMINE to adapt to the different document layouts and styles of the scientific literature. Large file collections can be converted using batch processing. Thus, with the help of CERMINE, the publications of another large group of publishers can be processed with JATSdecoder.
In addition to the extraction of metadata and study features, JATSdecoder provides some convenient, purely heuristic-driven functions that can be useful for any text-analytic approach. An overview of these functions and their functionality is given in Table 1. All functions are based on the basic R environment and make intense use of regular expressions. letter.convert() unifies hexadecimal and many HTML letters into a Unicode representation and corrects most PDF- and CERMINE-specific conversion errors. For example, more than 20 different hexadecimal characters that encode a space are converted to a standard space, and invisible spaces (e.g., 'u200b') are removed. When extracting text from PDF documents, special characters can often not be read correctly, as they can be stored in a wide variety of formats. Badly compiled Greek letters (e.g., 'v2' instead of 'χ²') and operators (e.g., '5' instead of '=') are corrected, and a '<=>' is inserted for missing operators (e.g., 't<=>1.2, p<=>0.05' for 't 1.2, p 0.05'). These unifications are important for further processing and facilitate text search tasks and extractions. text2sentences() converts floating text into a vector of sentences. Many representations of numbers that are not purely digit-based (words, fractions, percentages, very small/large numbers denoted by 10^x or e+x) can be converted to decimals with text2num() (e.g., 'five percent' -> '0.05', '0.05/5' -> '0.01'). ngram() extracts a definable number of words occurring before and/or after a word within a list of sentences (±n-gram bag of words). The presence of multiple search patterns can be checked with which.term(). The output is either a binary hit vector for each search pattern or a vector of detected search patterns. The functions grep2() and strsplit2() are useful extensions of the basic R functions grep() and strsplit(). grep2() enables the identification and extraction of text using multiple search patterns linked with a logical AND. Compared to strsplit(), which deletes the search pattern when splitting text into pieces, strsplit2() allows the search pattern to be preserved in the output by supporting splits before or after the recognized pattern.
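As an illustration of how these helpers combine, here is a small sketch; the function names follow the list above, but the example string, the argument names and the outputs indicated in the comments are assumptions, not taken from the paper or the package manual.

```r
# Sketch: applying JATSdecoder's heuristic text helpers to a raw text snippet.
library(JATSdecoder)

txt <- "Five percent of trials were removed. The test was significant, t 2.31, p 0.024."

letter.convert(txt)        # unifies characters; inserts '<=>' where operators are missing
text2sentences(txt)        # splits floating text into a vector of sentences
text2num("five percent")   # converts textual numbers to decimals (expected: "0.05")
ngram(text2sentences(txt), "test", 3)                 # words surrounding a search term (±3)
which.term(txt, c("t-test", "ANOVA", "regression"))   # hit vector per search pattern
```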
The study.character module bundles multiple text selection and manipulation tasks for specific contents of the list created by JATSdecoder. It extracts important study features such as the number of studies reported, the statistical methods applied, reported α-level and power, correction procedures for multiple testing, assumptions mentioned, the statistical results reported, analytical software solution used, and whether the results include an analysis of interacting covariates, mediation and/or moderation effects. All functions use sophisticated, expert-guided heuristics for text extraction and manipulation, developed with great effort and domain expertise. One advantage of the time-intensive development of efficient rules is the robust recognition of a wide range of linguistic and technical representations of the targeted features, as well as a clear assignment of the causes of incorrect extractions. A functional limitation of most study.character functions is that they can only handle English content.
In general, study.character attempts to split a document into four sections (Introduction, Methods, Results, Discussion). The text of the introduction, which explains the theory and describes other work and results, and the discussion section, which contains implications, limitations, and suggestions for future procedures, can easily lead to false-positive extractions of actually realized study features. This also applies to the information in the bibliography. Therefore, mostly only the methods and results sections and captions are processed to extract the study characteristics from an article.
It has been demonstrated that study.character's function get.stats() outperforms the program statcheck17 in extracting and recalculating p-values of statistical results reported within an article in both PDF and XML format18. Here, study.character's functions to extract the statistical methods applied, statistical software used, number of studies per article, reported α-level and power, test direction, correction method for multiple testing, and mentioned assumptions are evaluated using manually coded data of the study characteristics.
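For orientation, a single-sentence call to get.stats() might look as follows; the sentence is invented and the exact structure of the returned object is an assumption.

```r
# Sketch: extracting and recomputing a reported statistical result with get.stats().
library(JATSdecoder)

get.stats("The groups differed significantly, t(48) = 2.31, p = .024.")
# expected: the extracted result plus a recomputed p-value for consistency checking
```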
A brief description of the targeted study feature and the implemented extraction heuristic of each function is given in the following section. Minor uniformization tasks are not listed, but can be traced using the source code of each function. The text processing and feature extraction are implemented with basic R functions (e.g., grep(), gsub(), strsplit()) and JATSdecoder’s text processing solutions, which are also based on these basic functions. A main feature of these functions is that they can be used with regular expressions, which makes them very powerful if used wisely. The grep() function performs search queries, gsub() finds and replaces text. Using strsplit(), text input can be split into a vector at text locations that match a search pattern. The search pattern itself is removed.
To draw contextual conclusions, researchers use various statistical methods and procedures to process and summarize their data. Although any descriptive as well as inferential method can be considered a statistical method, the focus here is on inferential methods. Inferential methods are based on either simple or more complex models, which also allow differing depths of data analysis and inference. Some of these methods are widespread in the literature (e.g., t-test, correlation, ANOVA, multiple regression), while other techniques are rarely used.
The function get.method() extracts the statistical methods mentioned in the input text. It detects sentences containing a statistical method with a list of search terms that most commonly used procedures share as an identifier (e.g., test, correlation, regression, ANOVA, method, theorem, interval, algorithm, etc.). After lowerization, up to seven preceding words with the identified search term at the end are extracted with ngram() and further cleaned up with an iteratively generated list of redundant words (e.g., prepositions, verbs). Users can expand the possible result space by passing additive search words to the ‘add’ argument of get.method(). The current heuristic enables the extraction of new, still unknown procedures (e.g., ‘JATSdecoder algorithm’), if their name ends with one of the prespecified or user-adjusted search terms. Simple descriptive measures (e.g., mean, standard deviation, proportion) are not extracted, because they are overly common and therefore do not differentiate well. Methods with a specifying term after the search term (e.g., ‘test for homogeneity of variances’) cannot be identified by get.method() yet.
Theoretically, any frequentist decision process requires an a-priori set significance criterion, the α-level or type-1 error probability. The type-1 or α-error is the probability of rejecting a correct null hypothesis. Because it has become a widespread standard to work with an α-level of 0.05, it is often not explicitly stated in practice. Among many synonyms (e.g., 'alpha level', 'level of significance', 'significance threshold', 'significance criterion') and made-up terms (e.g., 'level of confidence', 'level of probability'), it may be reported as a critical p-value (e.g., 'p-values < 0.05 are considered significant') and/or with a verbal operator (e.g., 'the α-error was set to 0.05'), making it difficult to detect and extract reliably. In addition, the α-level may be reported with a value that has been corrected for multiple testing, which does not lower the nominal α-level. Another indirect but clearly identifiable report of an α-error probability is the use of 1−α confidence intervals.
The text of the method and result section/s, as well as the figure and table captions, are passed to get.alpha.error(). Prior to the numerical extraction of the reported α-level/s, several unification tasks are performed on synonymously used terms for α-errors and reporting styles. Levels of different p-values that are coded with asterisks are not considered α-levels. When a corrected α is reported by a fraction that also contains the nominal value (e.g., 'α = 0.05/4'), both values are returned (0.05 and 0.0125). The argument 'p2alpha' is activated by default to increase the detection rate. This option allows extraction of p-values expressing α-levels (e.g., 'Results with p-values < 0.05 are considered significant.'). The final output is a list distinguishing between detected nominal and corrected α-level/s and extractions from 1−α confidence intervals. Since some articles report multiple α-levels, all detected values are max- and minimized to facilitate further processing.
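A short sketch of these options follows; the 'p2alpha' argument is the one described above, while the example sentences and the shape of the output are illustrative assumptions.

```r
# Sketch: extracting reported alpha levels with and without p-to-alpha conversion.
library(JATSdecoder)

methods_text <- c(
  "The alpha level was set to 0.05/4 to account for multiple comparisons.",
  "Results with p-values < .05 were considered significant.",
  "We report 95% confidence intervals for all effects."
)

get.alpha.error(methods_text)                   # default: 'p2alpha' active
get.alpha.error(methods_text, p2alpha = FALSE)  # ignore alpha levels expressed as p-value thresholds
```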
The nominal α-level refers to a single test situation. When multiple tests are performed with the same α-level, the probability of obtaining at least one significant result increases with each test and always exceeds α. There are several correction procedures to control the inflation of the α-error or false discovery rate when running multiple tests on the same data.
A two-step search task is performed for the text of the methods and results section/s, as well as figure and table captions by get.multiple.comparison(). Sentences containing any of the search terms ‘adjust’, ‘correct’, ‘post-hoc’ or ‘multiple’ are further inspected for twelve author names (e.g., ‘Benjamini’, ‘Bonferroni’) that refer to correction procedures, as well as four specific procedures (e.g., ‘family-wise error rate’, ‘false discovery rate’) that correct for multiple testing (see Online Appendix A for the full list of specified search terms). The output is a vector with all identified authors of correction methods. Common spelling errors (e.g., ‘Bonfferoni’ instead of ‘Bonferroni’) are also detected, but returned with the correct name.
The concept of power describes the probability of correctly rejecting a false null hypothesis given a theoretical (a-priori) or empirical (post-hoc) effect. It can be used to estimate an optimal sample size (a-priori) or as a descriptive post-hoc measure.
get.power() extracts the reported aimed and achieved power value/s that are reported in the full text of the document. Since the term power is used in different contexts, sentences containing certain terms are omitted (e.g., volts, amps, Hz). To reduce the likelihood of false positives, detected values that fall outside the valid power range ([0; 1]) are omitted. get.power() unifies some synonyms of power (e.g., 1−β) and extracts the corresponding value/s if they fall within the range of 0–1. When β-errors are reported instead of power values, they are converted to power values by replacing β with 1−β.
Analyses with more than one independent variable can be conducted with or without an interaction effect of the covariates. The term interaction effect refers to any type of interplay of two or more covariates that have dynamic effects on an outcome. In most research settings, the analysis of interactions is of great interest, as it may represent the central research hypothesis or lead to restrictions and/or reinforcement for the hypothesis/theory being tested. In addition to statistical models that explicitly include an interaction effect, mediation- and moderation analyses focus on dynamic effects of covariates on an outcome.
has.interaction() searches the lowerized text of the methods and results section/s for specific search patterns that relate to an interaction effect. To avoid false positive hits when analyzing articles dealing with interactions of organisms instead of variables, sentences containing specific search terms (e.g., social, child, mother, baby, cell) are removed. The output distinguishes between an identified interaction, mediator and/or moderator effect.
Most research is based on theories that allow a prediction about the direction of the effect under study. Besides several procedures that do not allow a direct conclusion about the direction of an observed effect (e.g., χ²-test, ANOVA), others can be applied to test directed hypotheses (e.g., t-test). Adjusting an undirected test to a directed test increases its power, if the sample and effect size are held constant and the effect is present in the predicted direction.
Sentences containing a statistical result or one of several search terms (e.g., ‘test’, ‘hypothesis’) are searched by get.test.direction() for synonyms of one- and two-sided testing and hypothesis (e.g., directed test, undirected hypothesis). To avoid false positives for one-sidedness, sentences containing certain reference words (e.g., paper, page, pathway) are excluded and detected values less than one are omitted.
Since many popular statistical measures are sensitive to extreme values (e.g., mean, variance, regression coefficients), their empirical values may not be appropriate to describe a sample. In practice, there are two popular techniques to deal with extreme values and still compute the desired statistic. Simple exclusion of outliers reduces the sample size and test power, while adjustments towards the mean preserve the original sample size. Both procedures can, of course, distort the conclusions drawn from the data because the uncertainty (variance) is artificially reduced. It is difficult to justify why valid extreme values are manipulated or removed to calculate a particular measure rather than choosing an appropriate measure (e.g., median, interquartile range). On the other hand, outliers may indicate measurement errors that warrant special treatment. A popular measure for detecting outliers is the distance from the empirical mean, expressed in standard deviations.
get.outlier.def() identifies sentences containing a reference word of a removal process or an outlier value (e.g., outlier, extreme, remove, delete), and a number (numeric or word) followed by the term ‘standard deviation’ or ‘sd’. Verbal representations of numbers are converted to numeric values. Since very large deviations from the mean are more likely to indicate a measurement error than an outlier definition, and to minimize erroneous extractions of overly small values, the default result space of the output is limited to values between 1 and 10.
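A one-line sketch may help; the sentence is invented and the expected output noted in the comment is an assumption based on the description above.

```r
# Sketch: extracting a verbal outlier criterion expressed in standard deviations.
library(JATSdecoder)

s <- "Responses more than three standard deviations above the mean were removed as outliers."
get.outlier.def(s)   # expected to return the numeric criterion, here 3
```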
Any statistical procedure/model is based on mathematical assumptions about the sampling mechanism, scaling, the one- and/or multidimensional distribution of covariates and the residual noise (errors). The underlying assumptions justify the statistical properties of an estimator and a test statistic (e.g., best linear unbiased estimator, distributional properties, α-error/p-value). There may be serious consequences for the validity of the conclusions drawn from these statistics if the underlying assumptions are violated.
To extract the mentioned assumptions within an article, get.assumption() performs a dictionary search in the text of the methods and results sections. A total of 20 common assumptions related to the model adequacy, covariate structure, missing and sampling mechanisms can be identified (see Online Appendix C for the full list of specified search terms).
Statistical software solutions are a key element in modern data analysis. Some programs are specifically designed to perform certain procedures, while others focus on universality, performance, or usability.
To identify the analytic software solution mentioned in the methods and results sections, get.software() is used to perform a manually curated, fine-grained dictionary search of software names and their empirical representation in text. Tools for data acquisition or other data management purposes are not part of the list. However, they can be tracked down with a vector of user-defined search terms, passed to the ‘add’ argument. A total of 55 different software solutions can be detected in standard mode (see Online Appendix B for the complete list of specified search terms).
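The following sketch uses the 'add' argument described above to track a tool outside the built-in dictionary; the sentences and the outputs implied in the comments are illustrative assumptions.

```r
# Sketch: detecting analysis software mentions, with a user-defined addition.
library(JATSdecoder)

methods_text <- c(
  "All analyses were conducted with R 4.0 and SPSS 26.",
  "Item calibration was carried out in ConQuest 2.0."
)

get.software(methods_text)                    # dictionary hits, e.g. "R", "SPSS"
get.software(methods_text, add = "ConQuest")  # user-supplied term extends the result space
```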
Research reports may contain single or multiple study reports. To determine the total number of studies reported in an article, the section titles and abstract text are passed to get.n.studies(). Enumerated studies or experiments are identified, and the highest value is returned. The function returns ‘1’ if no numbering of the studies is identified.
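A brief sketch of this counting rule follows; the section titles are invented and the expected return value is an assumption.

```r
# Sketch: counting enumerated studies/experiments from section titles.
library(JATSdecoder)

titles <- c("Introduction", "Study 1", "Study 2", "Experiment 3", "General Discussion")
get.n.studies(titles)   # expected to return 3, the highest enumeration found
```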
To evaluate the extraction capabilities of study.character, a manually coded dataset serves as reference data. The statistical methods used, the number of studies reported, and the software solutions used were coded by Blanca et al.7 and provided to the author. All articles were manually rescanned for those study characteristics that are extracted by study.character but were not part of the original dataset.
The collection of articles by Blanca et al.7 consists of 288 empirical studies published in eight psychological journals (British Journal of Clinical Psychology, British Journal of Educational Psychology, Developmental Psychology, European Journal of Social Psychology, Health Psychology, Journal of Experimental Psychology-Applied, Psicothema, Psychological Research) between 2016 and 2017.
The absolute frequencies of the identified statistical procedures used in the main analysis by Blanca et al.7 are contrasted with those of study.character. The manually created categories of the statistical methods from Blanca et al.7 are compared to the uncategorized statistical methods extracted using study.character. The search tasks for counting the frequency of articles using a specific category of procedures are implemented with regular expressions. An exploratory view of the entire result space of get.method() is displayed in a word cloud.
To explore the correct/false positive/negative detections by study.character all other extracted features are compared to the manually recoded data. A correct positive (CP) detection refers to an exact match to a manually coded feature within an article. A false positive (FP) refers to an extraction that is not part of the manually coded data. Articles that do not contain a feature and for which no feature has been detected are referred to as a correct negative (CN). Finally, a false negative (FN) refers to a feature that was not detected but was manually identified.
If a target feature is identified multiple times in an article, study.character will output this feature once. Therefore, the evaluation of the detection rates is carried out at the article level. Since most of the features focused on here can potentially have multiple values per article, the extractions may be fully or partially correct. This can be illustrated by the example of the extraction of the α-level. If the manual coding revealed the use of a 5% and a 10% α-level and study.character identifies the 5% and an unreported 1%, this is counted as 1 correct positive, 1 false negative and 1 false positive for this article. It follows that the number of correct (CP+CN) and total decisions (CP+FN+CN+FP) may be larger than the total number of articles analyzed.
Global descriptive quality measures (sensitivity, specificity, accuracy) are reported for every extracted feature.
Sensitivity refers to the proportion of correctly detected features within all features present (CP+FN).
Specificity refers to the proportion of correct non-detections within all articles that do not contain the searched pattern (CN+FP).
Finally, accuracy is the proportion of correct detections (CP+CN) within all existing features and non-existing features (CP+FN+CN+FP).
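The three measures follow directly from the counts defined above; the worked example below uses hypothetical counts (not taken from the paper) purely to show the arithmetic.

```r
# Worked example of the quality measures, using invented article-level counts.
cp <- 90; fn <- 10; cn <- 180; fp <- 5

sensitivity <- cp / (cp + fn)                   # correct detections among all present features
specificity <- cn / (cn + fp)                   # correct non-detections among all absent features
accuracy    <- (cp + cn) / (cp + fn + cn + fp)  # correct decisions among all decisions

round(c(sensitivity = sensitivity, specificity = specificity, accuracy = accuracy), 2)
```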
Absolute frequency tables of manual and automatic detections are presented for each characteristic, and a causal association of the deviations that occurred is provided.
The 288 articles in the raw data provided by Blanca et al.7 were manually downloaded as PDF files. The PDF files were converted to NISO-JATS encoded XML using the open-source software CERMINE14 before being processed with study.character. Since the compilation with CERMINE can lead to various errors (text sectioning/structuring, non-conversion of special characters), this can be considered a rough test condition for the evaluated functions. All processes are performed with a Dell 4-core processor running Linux Ubuntu 20.04.1 LTS and the open-source software R 4.0 (ref. 15). To enable multicore processing, the R package future.apply19 is used. The word cloud of the identified methods is drawn using the wordcloud20 package.
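A minimal sketch of such a parallelized batch run is shown below; the folder path is hypothetical, and the assumption that study.character() can be called directly on each converted XML file follows from the descriptions above rather than from the paper's own scripts.

```r
# Sketch: processing CERMINE-converted XML files in parallel with future.apply.
library(JATSdecoder)
library(future)
library(future.apply)

plan(multisession)   # run workers in parallel R sessions

files <- list.files("cermxml/", pattern = "\\.xml$", full.names = TRUE)  # hypothetical folder
characters <- future_lapply(files, study.character)  # one extracted feature list per article
```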
The extraction properties and causes of deviations from the manually coded study features are given in the following section for each function. A total of 287 articles are included in the analyses, as the Blanca et al.7 data contain one article twice.
It should be noted that the extractions of statistical methods and software solutions from Blanca et al.7 are not directly comparable to the output of study.character, as they coded only the statistical methods that are explicitly reported to have been used in the main analyses (rather than every mention).
An insight into the overall result space of the statistical methods extracted by study.character is given in Fig. 1, where the frequency table of the extractions is shown as a word cloud. Bigger words indicate higher frequencies. It is obvious that correlation analysis and ANOVA are the most frequently mentioned methods in this article selection.
Figure 1: Word cloud of the extracted statistical methods by study.character.
In order to compare the extractions of get.method() with the extractions of the main analysis procedure of Blanca et al.7 the absolute frequencies of the detected studies using a specific class of methods are listed in Table 2. The regular expressions listed are used as search terms to count the hits of get.method() per categorized method.
Because Blanca et al.7 coded the statistical method used in the main analysis (all methods reported in preliminary analyses or manipulation checks, footnotes, or in the participants or measures section were not coded), most methods are more commonly identified by get.method(). Two rare categories cannot be identified at all with the search terms used ('correlation comparison test', 'multilevel logistic regression').
The large differences in most of the identified methods (e.g., descriptive statistics, correlation, χ²-statistics) are due to the different inclusion criteria (each mentioned method vs. method of main analysis). In addition, using 'regression' as a search term in the output of get.method() also results in hits when more complex regression models were found (e.g., multilevel or multivariate regression), whereas Blanca et al.7 consider simple regression models and more specific regression models to be disjoint.
Table 3 shows the sensitivity, specificity, and accuracy measures for study.character’s extractions based on the manually coded data. Most of the extractions work very accurately and can replace a manual coding.
Except for the α-level detection with 'p2alpha' activated, all extractions have low false positive rates. In default mode, the empirical sensitivity of all extractions is above 0.8, the specificity above 0.9. Since there are usually very few false positive extractions, five specificity measures reach 1.
Accuracy is lowest for α-level detection (0.86 with 'p2alpha' deactivated, 0.9 in default mode) and statistical assumption extraction (0.9). The accuracy of all other extractions is above 0.9. The binarized outputs for the extracted interaction and the stated assumptions have higher accuracy than the raw extractions.
Although most of the studies examined make use of inferential statistics, only 78 (27%) explicitly report an α-level. In all cases where no α-level is reported, the standard of α = 5% is applied but not considered an extractable feature. Since some studies report the use of multiple α-levels, the total number of detected and undetected α-levels exceeds the number of articles. Eight articles report the use of a 90% confidence interval and a 95% confidence interval.
The absolute frequency of α-levels extracted from 1−α confidence intervals by study.character and the manual analysis are shown in Table 4. study.character correctly extracts the α-value in 105 out of 126 (83%) confidence interval reports in 97 out of 118 (82%) articles. No false positive extraction is observed. Seven non-detections by study.character are due to CERMINE-specific conversion errors of figure captions, 11 to the non-processing of column names and content of tables. Two reports of confidence intervals cannot be recognized due to unusual wording ('95% confidence area', 'confidence intervals set to 0.95'), one due to a report in the unprocessed discussion section.
The corrected α-level cannot be well distinguished from an uncorrected α-value. Only one out of eight corrected α-levels is correctly labeled and extracted by study.character; one is a false positive detection of a nominal α. The extracted nominal α-levels contain three of the manually extracted corrected α-values.
For simplicity, the extracted nominal and corrected α-levels are merged with the extraction from the confidence intervals and reduced to their maximum value, which corresponds to the nominal α-level. Table 5 shows the frequency distribution of the extracted maximum α-level with the deactivated conversion of p- to α-values and with the default setting.
The conversion procedure for p-values increases the accuracy of α-level extraction but brings one additional false positive extraction, which is caused by a statistical test result reported with a standard threshold of p-values (p < 0.01). Thus, enabling the conversion of p- to α-values slightly increases the false positive rate of explicitly reported α-levels, especially for rather rarely applied levels (0.1, 0.01 and 0.001).
Since test power can be reported as both a-priori and a-posteriori results, some articles contain multiple power values. The absolute distribution of categorized power values found by study.character and manual coding is shown in Table 6. The evaluation of the categorized power values differs from the results in Table 3 because here, four unrecognized values in articles with several power values of the same category are evaluated as fully correct. There are two false positive extractions caused by a poorly compiled table and a citation of Cohen's recommendation to plan studies with at least 80% power21. Both errors occur in documents that contain other correctly extracted power values. Overall, 61 of 73 (84%) manually coded and categorized power values are correctly extracted in 42 of 45 (93%) articles. Nine of the 12 unrecognized reports of power follow a text structure that is still unmanageable (e.g., 'The final sample size ensured sufficient power (i.e., 0.99)', 'The statistical power was very high (0.99)'). This also applies to the specification of a power interval ('with a power ranging between 0.80 and 0.90'). Here, only the first value (0.8) was extracted and considered a correct positive, while the second limit of the interval is missing and considered a false negative. One non-detection is caused by an uncompiled and unimputed Greek letter β. In addition, one erroneous report of a power value of 80 is not extracted by study.character, because it falls outside the defined result space [0; 1]. Further, one power value reported in an unprocessed figure caption is not detected.
Table 7 shows the absolute frequency of detected correction method for multiple testing by study.character and the manual coding. Within the collection of articles analyzed, ten of 15 detectable authors/correction methods for multiple testing are identified by study.character without a false positive. There are two non-identifications. One article reports a p-value correction, but not the specific method. In another article, the reported use of a ‘Bonferroni Test’ is not detected as a correction procedure, because it is not mentioned that something is corrected/adjusted with it.
The distinction between moderation, mediation, and interaction effects works in 242 out of 264 mentions (92%) (see Table 3). Table 8 shows the frequency of extracted type of interaction effect by study.character and the manual coding.
Overall, 22 specific mentions are not recognized in 20 articles, and 10 false positive hits occurred in 10 articles. Unrecognized mentions are mostly due to reports within the non-scanned abstract, introduction, discussion, or section headings as well as simple but badly handled sentences (e.g., ‘The model also included the interactions between A and each predictor variable.’ or ‘We tested the effect of A and B, along with their interaction.’). The false positive extractions are mainly observed in studies that infer moderating/mediating effects of covariates but do not perform an explicit moderator/mediator analysis. In two articles examining mother-infant interaction and quality of interactions among peers, exclusion of target sentences fails for the term ‘interaction’ and causes false positive extractions.
The presence of an interaction effect can be localized very well with the binarized output (no detection vs. any detection of interaction). The presence of at least one type of interaction analysis is correctly detected in 192 of 198 (97%) articles. In total, six articles analyzing an interaction effect of variables are not identified, two detections are false positives.
The distribution of detected outlier definitions is shown in Table 9. Twenty-four out of 25 (96%) outlier definitions expressed in standard deviations are correctly extracted. One report of an outlier removal is not detected due to a non-compiled special character ('±'). This error does not occur when parsing the original sentence ('Twenty-nine additional infants were excluded for following reasons: …, extreme looking times (±2 SD) …'). Because only removal processes reported with standard deviations are targeted, one outlier removal based on an interquartile range of 1.5 is not part of this analysis.
The absolute distribution of detected test sidedness by study.character and the manual coding is shown in Table 10. Thirty-three of 34 (97%) reports of test directionality are correctly extracted. One false positive hit is observed in an article dealing with ‘one-sided aggression’. One report of a two-tailed test setting is not detected in a sentence about power considerations (‘…a difference of d = 0.4 (η² = 0.04; two tails, α = 0.05)…’), because ‘two tails’ is not defined as an inclusion pattern.
Seventy-nine of 91 manually coded assumptions (87%) are extracted correctly (see Table 3). The extraction of specific assumptions results in more false positive (19) than false negative (12) detections. Both the false positive and the false negative detections mainly concern the very general assumptions of normally distributed variables, independence of measurements, and linearity of relationships. More specific assumptions are extracted very accurately. Four manually extracted assumptions are missed because they are not part of the result space of get.assumption() (homogeneity of covariance matrices, sampling adequacy, non-proportionality) or are too unspecific (homogeneity). For one article containing the manually coded assumption of non-proportionality, study.character outputs the proportional hazards and proportional odds assumptions, which is appropriate. The absolute frequency distribution of the extracted assumptions by study.character and the manual coding is shown in Table 11.
Compared to the manually recoded software mentions, the dictionary search tasks work very accurately. Eight software mentions are missed, and no false positive extraction is observed. In total, get.software() identifies 245 usages of 23 different software solutions in 181 articles. This is substantially more than reported by Blanca et al. [7] (180 uses of 13 software programs in 155 articles). The absolute frequencies of studies explicitly reporting the use of statistical software to perform the main analysis, as coded by Blanca et al. [7], and of all software mentions extracted by the get.software() function of study.character are listed in Table 12.
A total of six discrepancies from the recoded data of software mentions are due to extraction errors of study.character, namely missed mentions of G*Power (2), the tool PROCESS MACRO (2), MPlus (1), and Amos (1). Compared to the analysis of Blanca et al. [7], with the exception of AMOS and the SPSS PROCESS MACRO, all other software solutions are identified as frequently or more frequently by study.character. A comparison with the manual coding shows that most of the discrepancies (47) are due to different inclusion criteria (any mentioned software vs. software explicitly stated for the main analysis). Since G*Power is never used to perform the main calculations of an analysis, it is not included by Blanca et al. [7]. All mentions of Matlab (13) extracted with get.software() are absent from the data of Blanca et al. [7], since the articles only indicated that Matlab was used to design the computerized experiment, with no explicit indication of its use as analysis software. In eight cases, neither Blanca et al. [7] nor study.character extracted the specifications of comparatively rarely used software solutions (e.g., ConQuest 2.0, ASY-DIF), because they were outside their area of interest or result space. Nevertheless, these software solutions could easily be detected by get.software() by adding their names to the ‘add’ argument. Some tools extracted by get.software(), such as Omega, ChiSquareDif, or Excel, were outside the domain of interest of Blanca et al. [7].
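As a rough illustration of this extension mechanism, the following R sketch passes two invented sentences to get.software() and supplies the rarely used tool names via the ‘add’ argument described above. That get.software() accepts plain-text input directly, and the exact form of its output, are assumptions made for illustration only.

library(JATSdecoder)
# Two invented example sentences mentioning tools outside the default dictionary
sentences <- c("All models were estimated with ConQuest 2.0.",
               "Differential item functioning was examined with ASY-DIF.")
# Assumed usage: extend the dictionary search with additional tool names
get.software(sentences, add = c("ConQuest", "ASY-DIF"))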
In 283 articles (98.6%), the number of contained studies is correctly extracted by study.character (see Table 3). Since a numeric value is returned in every case, the function has no specificity; the output can only be right or wrong, never false negative. Compared to the manually recoded data, there are four false detections by study.character, all due to missing section names in the compiled XML file. In one case, only some relevant section names were not compiled by CERMINE, resulting in an output of ‘2’ instead of ‘3’ studies. In three cases, the missing section names result in the default output of ‘1’ study. The full distribution of the number of reported studies per paper extracted by study.character, by the manual recoding, and by the analysis of Blanca et al. [7] is shown in Table 13.
Compared to the manually coded data of Blanca et al. [7] (Table 2 in the original article), there are seven deviations from the recoded number of studies per article. Some differences can be explained by different coding approaches. Two control tasks in one study were treated as two individual studies by Blanca et al. [7], while study.character does not count them as separate studies. One study containing two sub-studies was coded as two studies by Blanca et al. [7]. No error assignment can be made for the remaining five discrepancies. These include an article with six studies that was coded as an article with three studies by Blanca et al. [7].
The accuracy analysis of the study feature extraction presented here contributes to the establishment of JATSdecoder as a useful tool for science research. In addition to the extraction algorithms for statistical results that have already been evaluated [18], key methodological features of studies can now be extracted in a robust way, opening up a wide range of new possibilities for research-on-research projects.
JATSdecoder’s module study.character can facilitate meta-research and identification tasks on methodological study features of scientific articles written in English. The primary focus on NISO-JATS encoded content and the purely heuristics-based approach distinguish JATSdecoder from most other text processing packages. Nevertheless, the implemented extraction heuristics exhibit high accuracy and can replace highly time-consuming human coding. Moreover, study.character facilitates the monitoring of methodological research practices in large text databases. Such analysis can be performed for one or many journals, authors, subjects, and/or disciplines when the output of study.character is combined with the metadata extracted by JATSdecoder.
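A minimal R sketch of such a corpus-level analysis is given below. The folder name, the element names ‘journal’ and ‘software’, and the exact structure of the returned lists are assumptions made for illustration; they are not taken from the package documentation.

library(JATSdecoder)
# Assumed folder containing NISO-JATS coded XML files
files <- list.files("PMC_corpus", pattern = "\\.xml$", full.names = TRUE)
meta <- lapply(files, JATSdecoder)          # article metadata (journal, year, ...)
features <- lapply(files, study.character)  # methodological study features
# Combine both outputs into a simple software-by-journal overview
# (element names are assumed for illustration)
overview <- data.frame(
  journal  = sapply(meta, function(x) x$journal[1]),
  software = sapply(features, function(x) paste(unlist(x$software), collapse = "; "))
)
table(overview$journal)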
Although study.character cannot distinguish a statistical method from other text phrases that merely resemble one (e.g., ‘IQ test’), its output facilitates research on research that focuses on the application and distinction of statistical methods in practice. The reduction of dimensionality by search terms enables rapid method-specific identification of studies and the mapping of developments in scientific research practice.
However, some aspects should be considered in any individual or large-scale analysis of studies with study.character. Compared to Blanca et al. [7], who extracted the main statistical method used and the explicitly stated software used to perform the main calculations, study.character outputs all mentioned methods and software solutions within the methods and results sections.
A key feature of study.character is the selection of section-specific text parts, which only works for NISO-JATS encoded XML files. The very low false positive rates indicate that the text selection and exception handling work properly. Nevertheless, applying the study.character functions to any unstructured plain text is possible, but may lead to more false positive extractions if discussions and reference lists are also processed.
The evaluation was performed on psychological studies only. Whether the high level of precision can also be achieved in other scientific disciplines, thus allowing a general analysis of scientific procedures across fields, remains a question for future research.
There are other interesting study features that are not yet extracted by study.character. For example, identification by study design (e.g., experimental, observational), study type (e.g., randomized treatment control, placebo waitlist control), or measurement tools (e.g., questionnaire, EEG, DNA sequencing) might be of interest in a meta-analysis. Since all of these features are high-dimensional when considered across a broad range of scientific practice, consideration should be given to developing more sophisticated natural language processing tools to address them. In any case, it should be carefully investigated whether model-based text extraction tools (e.g., named entity recognition) can outperform both manual coding and the heuristic approach. One aspect that becomes more complicated with such model-based methods is attributing the cause of incorrect extractions.
A web application enabling the analysis and selection of the extracted metadata and study characteristics of content within the PMC database is provided at https://www.scianalyzer.com. Its handling is simple and allows even inexperienced users to make use of the JATSdecoder extractions and to perform individual analysis and search tasks. The raw data of searches with fewer than 20,000 results can be downloaded for further processing.
Since JATSdecoder is modular software, externally developed extraction functions can easily be implemented. Collaboration on JATSdecoder is very welcome to further improve and extend its functionality; it can be initiated via the GitHub account: https://github.com/ingmarboeschen/JATSdecoder.
An interactive web application for analyzing study characteristics and identifying articles linked in the PubMed Central database is accessible at: https://www.scianalyzer.com.
JATSdecoder software is freely available at: https://cran.r-project.org/package=JATSdecoder; https://github.com/ingmarboeschen/JATSdecoder. Scripts to reproduce this and other analyses performed with JATSdecoder are stored at: https://github.com/ingmarboeschen/JATSdecoderEvaluation.
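For orientation, a minimal installation and usage sketch in R follows; the file name is a placeholder, and the GitHub route via the ‘remotes’ package is an assumed convenience rather than a documented requirement.

# Stable release from CRAN
install.packages("JATSdecoder")
# Development version from GitHub (assumed route via the 'remotes' package)
# remotes::install_github("ingmarboeschen/JATSdecoder")
library(JATSdecoder)
# Extract methodological study characteristics from a single NISO-JATS coded XML file
study.character("example_article.xml")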
Cohen, J. The statistical power of abnormal-social psychological research: A review. Psychol. Sci. Public Interest 65, 145–153. https://doi.org/10.1037/h0045186 (1962).
Reis, H. T. & Stiller, J. Publication Trends in JPSP: A three-decade review. Pers. Soc. Psychol. Bull. 18, 465–472. https://doi.org/10.1177/0146167292184011 (1992).
Schinka, J. A., LaLone, L. & Broeckel, J. A. Statistical methods in personality assessment research. J. Pers. Assess. 68, 487–496. https://doi.org/10.1207/s15327752jpa6803_2 (1997).
Bangert, A. W. & Baumberger, J. P. Research and statistical techniques used in the Journal of Counseling & Development: 1990–2001. J. Counsel. Dev. 83, 480–487. https://doi.org/10.1002/j.1556-6678.2005.tb00369.x (2005).
Van de Schoot, R., Winter, S. D., Ryan, O., Zondervan-Zwijnenburg, M. & Depaoli, S. A systematic review of Bayesian articles in psychology: The last 25 years. Psychol. Methods 22, 217–239. https://doi.org/10.1037/met0000100 (2017).
Anderlucci, L., Montanari, A. & Viroli, C. The Importance of Being Clustered: Uncluttering the Trends of Statistics from 1970 to 2015. arXiv preprint arXiv:1709.03563 (2017).
Blanca, M. J., Alarcón, R. & Bono, R. Current practices in data analysis procedures in psychology: What has changed? Front. Psychol. 9. https://doi.org/10.3389/fpsyg.2018.02558 (2018).
Zheng, S., Dharssi, S., Wu, M., Li, J. & Lu, Z. Text mining for drug discovery. Methods Mol. Biol. (Clifton, NJ) 1939, 231–252. https://doi.org/10.1007/978-1-4939-9089-4_13 (2019).
Bird, S., Klein, E. & Loper, E. Natural language processing with Python: analyzing text with the natural language toolkit (O’Reilly Media, Inc., 2009).
Honnibal, M. & Montani, I. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing (2017) (to appear).
Böschen, I. JATSdecoder: A Metadata and Text Extraction and Manipulation Tool Set (2022). https://CRAN.R-project.org/package=JATSdecoder. R package version 1.1.
PubMed-Central. PMC Overview. Accessed: 2021-12-20. https://www.ncbi.nlm.nih.gov/pmc/about/intro (2020).
National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM). Journal Publishing Tag Library – NISO JATS Draft Version 1.1d2. https://jats.nlm.nih.gov/publishing/tag-library/1.1d2/index.html (2014).
Tkaczyk, D., Szostek, P., Fedoryszak, M., Dendek, P. J. & Bolikowski, Ł. CERMINE: automatic extraction of structured metadata from scientific literature. Int. J. Doc. Anal. Recognit. (IJDAR) 18, 317–335. https://doi.org/10.1007/s10032-015-0249-8 (2015).
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/. (2020).
Böschen, I. Software review: The JATSdecoder package – extract metadata, abstract and sectioned text from NISO-JATS coded XML documents; Insights to PubMed Central’s open access database. Scientometrics 126, 9585–9601. https://doi.org/10.1007/s11192-021-04162-z (2021).
Epskamp, S. & Nuijten, M. B. statcheck: Extract statistics from articles and recompute p values. R package version 1.3.0. https://CRAN.R-project.org/package=statcheck (2018).
Böschen, I. Evaluation of JATSdecoder as an automated text extraction tool for statistical results in scientific reports. Sci. Rep. 11. https://doi.org/10.1038/s41598-021-98782-3 (2021).
Bengtsson, H. future.apply: Apply function to elements in parallel using futures. R package version 1.4.0. https://CRAN.R-project.org/package=future.apply (2020).
Fellows, I. wordcloud: Word Clouds. R package version 2.6. https://CRAN.R-project.org/package=wordcloud (2018).
Cohen, J. Statistical power analysis for the behavioral sciences (Erlbaum, Hillsdale, NJ, 1988). https://doi.org/10.4324/9780203771587.
I would like to thank María J. Blanca for her cooperation in providing the data of her original research. I also thank Marcella Dudley very much for the linguistic revision of the manuscript.
Open Access funding enabled and organized by Projekt DEAL. This research was financed by a doctoral grant awarded by the Department of Psychological Methods and Statistics, Institute of Psychology, University Hamburg, Germany. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Institute of Psychology, Research Methods and Statistics, University Hamburg, Von-Melle-Park 5, 20146, Hamburg, Germany
Ingmar Böschen
Correspondence to Ingmar Böschen.
The author declares no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Böschen, I. Evaluation of the extraction of methodological study characteristics with JATSdecoder. Sci Rep 13, 139 (2023). https://doi.org/10.1038/s41598-022-27085-y
Received: 20 December 2021
Accepted: 26 December 2022
Published: 04 January 2023
DOI: https://doi.org/10.1038/s41598-022-27085-y
Network Diagram Software Market to See Huge Growth by 2027 – Perforce Software, Lucid Software, Google – SBWire
New Jersey, NJ — (SBWIRE) — 01/05/2023 — The latest study released on the Global Network Diagram Software Market by AMA Research evaluates market size, trends, and forecast to 2027. The Network Diagram Software market study covers significant research data and proves to be a handy resource for managers, analysts, industry experts, and other key people, offering a ready-to-access, self-analyzed study that helps them understand market trends, growth drivers, opportunities, upcoming challenges, and the competitive landscape.
Key Players in This Report Include:
Perforce Software, Inc. (United States), Lucid Software Inc. (United States), SmartDraw, LLC (United States), The Omni Group (United States), Google (United States), Microsoft Corporation (United States), Computer Systems Odessa (Ukraine), Wondershare Technology Co., Ltd. (China), SolarWinds Inc. (United States), NetBrain Technologies, Inc. (United States)
Download Sample Report PDF (Including Full TOC, Table & Figures) @ https://www.advancemarketanalytics.com/sample-report/64501-global-network-diagram-software-market-1#utm_source=SBWireShubhangi
Definition:
Network diagram software is software that provides the fastest and easiest way to create a network drawing with standard network topology symbols. It provides a visual representation of a computer network, displaying how the individual components of a network interact. The software can automatically detect a user's network and map it out according to custom settings, and it can create network diagrams according to industry standards.
Market Trend:
– Increased Adoption of Cloud-based Applications
Market Drivers:
- Benefits of Network Diagram Software, such as Ease of Use, Reduced Time, and Increased Productivity
– Increased Applications of Network Diagram Software
Market Opportunities:
– Increasing Demand from End-users
– Increased Penetration of the Internet
The Global Network Diagram Software Market segments and market data breakdown are illuminated below:
by Type (Desktop Network Diagram Software, SaaS Network Diagram Software), Application (Small and Medium Businesses (SMBs), Large Enterprises), Platform (Desktop, Laptop, Mobile), Deployment (Cloud-based, Web-based), Pricing Model (Subscription (Annual, Monthly, Quarterly), One Time License, Free Trial)
The Global Network Diagram Software market report highlights information regarding current and future industry trends and growth patterns, and offers business strategies to help stakeholders make sound decisions that may help ensure a profitable trajectory over the forecast years.
Have a query? Make an enquiry before purchase @ https://www.advancemarketanalytics.com/enquiry-before-buy/64501-global-network-diagram-software-market-1#utm_source=SBWireShubhangi
Geographically, the report provides a detailed analysis of consumption, revenue, market share, and growth rate in the following regions:
– The Middle East and Africa (South Africa, Saudi Arabia, UAE, Israel, Egypt, etc.)
– North America (United States, Mexico & Canada)
– South America (Brazil, Venezuela, Argentina, Ecuador, Peru, Colombia, etc.)
- Europe (Turkey, Spain, Netherlands, Denmark, Belgium, Switzerland, Germany, Russia, UK, Italy, France, etc.)
– Asia-Pacific (Taiwan, Hong Kong, Singapore, Vietnam, China, Malaysia, Japan, Philippines, Korea, Thailand, India, Indonesia, and Australia).
Objectives of the Report
- To carefully analyze and forecast the size of the Network Diagram Software market by value and volume.
- To estimate the market shares of major segments of the Network Diagram Software market.
- To showcase the development of the Network Diagram Software market in different parts of the world.
- To analyze and study micro-markets in terms of their contributions to the Network Diagram Software market, their prospects, and individual growth trends.
- To offer precise and useful details about factors affecting the growth of the Network Diagram Software market.
- To provide a meticulous assessment of crucial business strategies used by leading companies operating in the Network Diagram Software market, including research and development, collaborations, agreements, partnerships, acquisitions, mergers, new developments, and product launches.
Buy Complete Assessment of Network Diagram Software market Now @ https://www.advancemarketanalytics.com/buy-now?format=1&report=64501#utm_source=SBWireShubhangi
Major highlights from Table of Contents:
Network Diagram Software Market Study Coverage:
- It includes major manufacturers, emerging players' growth stories, and the major business segments of the Network Diagram Software market, the years considered, and the research objectives. Additionally, segmentation on the basis of product type, application, and technology.
- Network Diagram Software Market Executive Summary: gives a summary of the overall studies, growth rate, available market, competitive landscape, market drivers, trends and issues, and macroscopic indicators.
- Network Diagram Software Market Production by Region and Profiles of Manufacturers: players are studied on the basis of SWOT, their products, production, value, financials, and other vital factors.
– Key Points Covered in Network Diagram Software Market Report:
- Network Diagram Software Overview, Definition and Classification; Market Drivers and Barriers
– Network Diagram Software Market Competition by Manufacturers
– Impact Analysis of COVID-19 on Network Diagram Software Market
– Network Diagram Software Capacity, Production, Revenue (Value) by Region (2021-2027)
– Network Diagram Software Supply (Production), Consumption, Export, Import by Region (2021-2027)
- Network Diagram Software Production, Revenue (Value), Price Trend by Type (Desktop Network Diagram Software, SaaS Network Diagram Software)
- Network Diagram Software Manufacturers Profiles/Analysis; Manufacturing Cost Analysis; Industrial/Supply Chain Analysis; Sourcing Strategy and Downstream Buyers; Marketing Strategy by Key Manufacturers/Players; Connected Distributors/Traders; Standardization, Regulatory and Collaborative Initiatives; Industry Road Map and Value Chain; Market Effect Factors Analysis.
Browse Complete Summary and Table of Content @ https://www.advancemarketanalytics.com/reports/64501-global-network-diagram-software-market-1#utm_source=SBWireShubhangi
Key questions answered
- How feasible is the Network Diagram Software market for long-term investment?
- What factors are driving the demand for Network Diagram Software in the near future?
- What is the impact of various factors on the growth of the Global Network Diagram Software market?
- What are the recent trends in the regional market, and how successful are they?
Thanks for reading this article; you can also get individual chapter-wise sections or region-wise report versions, such as North America, Middle East, Africa, Europe, LATAM, or Southeast Asia.
For more information on this press release visit: http://www.sbwire.com/press-releases/network-diagram-software-market-to-see-huge-growth-by-2027-perforce-software-lucid-software-google-1368755.htm
Praveen Kumar
PR Marketing Manager
AMA Research & Media LLP
Telephone: +1 (551) 333 1547
Email: Click to Email Praveen Kumar
Web: https://www.advancemarketanalytics.com/
Construction Software Market by Platform, Type, Technology and End User Industry Statistics, Scope, Demand wit – openPR
Google's GUAC Open Source Tool Centralizes Software Security … – SecurityWeek
Google today introduced Graph for Understanding Artifact Composition (GUAC), an open source tool for centralizing build, security, and dependency metadata.
Developed in collaboration with Kusari, Purdue University, and Citi, the new project is meant to help organizations better understand software supply chains.
GUAC aggregates metadata from different sources, including Supply-chain Levels for Software Artifacts (SLSA) provenance, software bills of materials (SBOMs), and vulnerability data, to provide a more comprehensive view of them.
“Graph for Understanding Artifact Composition (GUAC) aggregates software security metadata into a high-fidelity graph database—normalizing entity identities and mapping standard relationships between them,” Google says.
By querying this graph, organizations can improve their audit processes and risk management, can better meet policy requirements, and even provide developer assistance.
GUAC, the internet giant explains, has four areas of functionality: metadata collection (from public, first-party, and third-party sources), ingestion of data (on artifacts, resources, vulnerabilities, and more), assembly of the data into a coherent graph, and user queries for the metadata attached to entities within the graph.
By aggregating software security metadata and making it meaningful and actionable, GUAC can help identify risks, discover critical libraries within open source software, and gather information on software dependencies, to improve supply chain security.
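GUAC's actual interfaces are not described in this article, so the following R sketch is purely a generic illustration of the underlying idea: aggregated software metadata becomes a queryable graph. All artifact names, the CVE identifier, and the relations are invented, and the igraph package merely stands in for a graph database.

library(igraph)
# Invented dependency and vulnerability metadata
edges <- data.frame(
  from = c("app-1.0", "app-1.0", "libfoo-2.3", "libbar-0.9"),
  to = c("libfoo-2.3", "libbar-0.9", "libbaz-1.1", "CVE-2022-0001"),
  relation = c("depends_on", "depends_on", "depends_on", "affected_by")
)
g <- graph_from_data_frame(edges, directed = TRUE)
# Which artifacts are directly or transitively exposed to the vulnerability?
exposed <- subcomponent(g, "CVE-2022-0001", mode = "in")
setdiff(as_ids(exposed), "CVE-2022-0001")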
The open source project is in its early stages, with a proof of concept (PoC) now available on GitHub, offering support for the ingestion of SLSA, SBOM, and Scorecard documents and for simple queries for software metadata.
“The next efforts will focus on scaling the current capabilities and adding new document types for ingestion. We welcome help and contributions of code or documentation,” Google says.
The internet giant has created a group of 'Technical Advisory Members' that includes SPDX, CycloneDX, Anchore, Aquasec, IBM, Intel, and others, to help expand the project towards consuming data from many different sources and formats.
Related: Google Launches Bug Bounty Program for Open Source Projects
Related: Academics Devise Open Source Tool For Hunting Node.js Security Flaws
Related: Google Open Sources ‘Paranoid’ Crypto Testing Library
Data Compliance: What You Need to Know in 2023 – JD Supra
Data plays a central role in the operations of nearly every industry today. Along with the increase in the volume of corporate data that exists, we’ve seen an increase in the number of laws and regulations protecting individuals’ rights to access and control their personal data.
Your Documents Aren't Safe. Here Are the Best Practices for … – Entrepreneur
The digitized document revolution comes with inherent concerns about properly securing all this information. Companies need to incorporate the highest levels of document-management security.
With the advent of 5G technology and Industry 4.0 putting more pressure on businesses to fast-track their digital transformations, the demand for document-management solutions has exploded. The worldwide market for document-management software is projected to reach $10.17 billion by 2025. Along with this revolution comes inherent concerns about properly securing all this information. Documents often contain sensitive and private information that, if compromised, could be detrimental to individuals, businesses or governments. That is why companies need to incorporate the highest levels of document-management security.
Related: Keep Your Information Moving At The Speed Of Your Business
With the continued release of new vulnerabilities regularly and the ease at which a digital document can be compromised — compared to a physical piece of paper — ensuring the security of those documents has become more important than ever to keep private information from being exposed.
It is common to read the news and learn about a new security breach. Impacting small and large companies alike, nearly 2,000 data breaches occurred in the first half of 2022 alone. For many companies, data is among their most valuable assets, so it must be protected.
Ransomware, a form of malware designed to encrypt files and deny users access to them until a demanded ransom is paid, is one clear threat. Phishing attacks, in which hackers try to obtain account credentials (username and password), represent an ongoing and ever-evolving danger. Hackers typically lay low for a time, then eventually start logging in as that user so as not to draw suspicion. Then they download documents that the user can access or, if sophisticated enough, attack network administrator privileges.
Just who is trying to hack into systems to get documents? Anyone who can find value in the type of data a company possesses. Hackers typically don't know what data a company holds until they get their hands on corporate documents, or they know enough about a company to recognize the types of information that might be available, such as financials or employees' personally identifiable information (PII). In short, they target any documents they can use for profit.
Numerous outsourced document-management vendors exist in the marketplace today, and not all are created equal when it comes to offering the highest levels of security. Below are four necessary security features to look for in a document-management partner:
Related: How To Develop Security Policy For Your Company
In addition to wanting the best technology solutions to help facilitate the digitization of documents, companies should also make security a top priority. Whether you have a Chief Security Officer, Chief Technology Officer, Head of IT or are working with a third-party service provider, there are several best practices that companies themselves should implement to ensure they’re doing their part to secure their digital documents:
To obtain the highest levels of security for digital documents, collaboration on strategy should involve all stakeholders — including document-management providers, IT, security and operations.
10 Top File Sharing APIs – ProgrammableWeb
Sharing files with friends and co-workers has had a long history of evolution. In the early days (the 1980s), it was done via a telephone line and a modem or via physical media such as floppy disks, and then moved on to the File Transfer Protocol (FTP) from a central server.
In the 1990s, the invention of the World Wide Web gave rise to web protocols for sharing, including those used in IRC chat rooms, Multipurpose Internet Mail Extensions (MIME) (used for Usenet), email attachments, AOL Instant Messenger and other IM file transfers, and Peer-to-Peer (P2P) systems such as Napster.
In the 2000s, decentralized P2P file-sharing networks such as Gnutella were invented, and BitTorrent, which parcels files into bits for anonymous sharing, was released, followed by dozens of other torrent applications.
Cloud storage applications such as Box, Dropbox, iCloud, pastebins, and the like followed suit and, along with web-based messaging (e.g., Slack) and project management software, remain the primary file sharing services for private users and enterprises alike.
From collaborating on work documents to enjoying music services, we all use file sharing applications on a daily basis. Developers wanting to enhance applications with these services need the proper APIs to accomplish this.
A File Sharing API is an Application Programming Interface that developers can utilize to add file-sharing functions to applications.
The best place to find these APIs is in the File Sharing category of the ProgrammableWeb API directory.
In this article, we provide details of the ten most popular File Sharing APIs, as determined by reader visits to the ProgrammableWeb website.
ONLYOFFICE is an open-source platform for business collaboration and project management. The platform includes software for managing projects, collaborating with team members, document management, and a CRM. The ONLYOFFICE API gives developers programmatic access to standard CRUD operations on groups, files, projects, forums, people, and more.
Google Drive is a cloud-based storage platform that lets users access their data, including files of any format, from any device or application that connects to the internet. The Google Drive API is offered indirectly via SDKs and lets developers integrate the files stored in a user's Drive with their own third-party applications. This gives users the ability to use multiple cloud apps to interact with files that are stored in a single location in the cloud.
The OneDrive API allows developers to integrate OneDrive personal cloud storage services into their applications to store and manage user data. Features include the ability to keep files in sync using minimal calls to retrieve new changes to files and folders, resumable uploads of files up to 10 GB, and customizable file thumbnail images. This API is part of the Microsoft Graph API.
MediaFire is a cloud-based service for storing and sharing files. Users can host and share any file type, including documents, presentations, videos, and images, and can access and manage their MediaFire cloud using the MediaFire RESTful API.
4shared is an online storage and file sharing service that was founded in 2005. Users can upload, store, and share all types of files, including music, video, photos, and documents. The 4shared REST API enables developers to manage users, files, folders, and applications.
ownCloud is a document and data storage and sharing application. Documents and data uploaded to ownCloud are stored in the cloud, and users can sync, share, and encrypt their files. The ownCloud API is available indirectly via iOS and Android SDKs and enables developers to access and integrate the functionality of ownCloud with other applications and to create new applications, for example for uploading, syncing, and sharing files.
dpaste is a pastebin originally aimed at early-adopter Django coders in Freenode's #django IRC channel. The dpaste API allows users to create short URLs for sharing pieces of code, so that code can be shared in chat rooms, forums, etc. without flooding the conversation with text. All short URLs created by dpaste are private and can be set to expire after 1 to 365 days.
Also of Interest: 12 Top APIs for Business
M-Files is a solution for Enterprise Information Management (EIM). The M-Files API makes it possible to access M-Files from within scripting environments (developers can refer to the User Manual to learn the key concepts of the object model). Developers can use the M-Files API to access and manage objects contained in an M-Files document vault. Methods are available to manage objects, interfaces, properties, and enumerations.
pCloud is a secure cloud storage platform. The pCloud API offers programmatic access with methods to manage folders, files, authorization, archiving, sharing, public links, thumbnails, revisions, transfers, and much more.
Box is an enterprise content management platform that enables users to secure, share, and edit all of their files from anywhere. Box offers several APIs for programmatic access. The Box Upload API allows users to add a new file to the Box platform; a file is uploaded by specifying its destination folder. The API uses the multipart POST method to complete all upload tasks.
Check out the File Sharing category for more than 220 APIs and other developer resources.