{"id":4450,"date":"2023-12-13T08:00:00","date_gmt":"2023-12-13T14:00:00","guid":{"rendered":"https:\/\/baylor.ai\/?p=4450"},"modified":"2024-07-19T18:05:59","modified_gmt":"2024-07-19T23:05:59","slug":"creation-and-analysis-of-an-nlu-dataset-for-dod-cybersecurity-policies","status":"publish","type":"post","link":"https:\/\/lab.rivas.ai\/?p=4450","title":{"rendered":"Creation and Analysis of an NLU Dataset for DoD Cybersecurity Policies"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"585\" src=\"https:\/\/baylor.ai\/wp-content\/uploads\/2024\/07\/DALL\u00b7E-2024-07-19-17.57.13-A-horizontally-wide-abstract-representation-of-cybersecurity-and-natural-language-processing.-The-image-should-feature-interconnected-digital-elements-1024x585.webp\" alt=\"\" class=\"wp-image-4451\" srcset=\"https:\/\/lab.rivas.ai\/wp-content\/uploads\/2024\/07\/DALL\u00b7E-2024-07-19-17.57.13-A-horizontally-wide-abstract-representation-of-cybersecurity-and-natural-language-processing.-The-image-should-feature-interconnected-digital-elements-1024x585.webp 1024w, https:\/\/lab.rivas.ai\/wp-content\/uploads\/2024\/07\/DALL\u00b7E-2024-07-19-17.57.13-A-horizontally-wide-abstract-representation-of-cybersecurity-and-natural-language-processing.-The-image-should-feature-interconnected-digital-elements-300x171.webp 300w, https:\/\/lab.rivas.ai\/wp-content\/uploads\/2024\/07\/DALL\u00b7E-2024-07-19-17.57.13-A-horizontally-wide-abstract-representation-of-cybersecurity-and-natural-language-processing.-The-image-should-feature-interconnected-digital-elements-768x439.webp 768w, https:\/\/lab.rivas.ai\/wp-content\/uploads\/2024\/07\/DALL\u00b7E-2024-07-19-17.57.13-A-horizontally-wide-abstract-representation-of-cybersecurity-and-natural-language-processing.-The-image-should-feature-interconnected-digital-elements-1536x878.webp 1536w, 
https:\/\/lab.rivas.ai\/wp-content\/uploads\/2024\/07\/DALL\u00b7E-2024-07-19-17.57.13-A-horizontally-wide-abstract-representation-of-cybersecurity-and-natural-language-processing.-The-image-should-feature-interconnected-digital-elements-863x493.webp 863w, https:\/\/lab.rivas.ai\/wp-content\/uploads\/2024\/07\/DALL\u00b7E-2024-07-19-17.57.13-A-horizontally-wide-abstract-representation-of-cybersecurity-and-natural-language-processing.-The-image-should-feature-interconnected-digital-elements-189x108.webp 189w, https:\/\/lab.rivas.ai\/wp-content\/uploads\/2024\/07\/DALL\u00b7E-2024-07-19-17.57.13-A-horizontally-wide-abstract-representation-of-cybersecurity-and-natural-language-processing.-The-image-should-feature-interconnected-digital-elements.webp 1792w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Comprehending and implementing robust policies is crucial in cybersecurity. In our lab, Ernesto Quevedo et al. recently released a paper, <a href=\"https:\/\/www.rivas.ai\/pdfs\/ernesto2023creation.pdf\" data-type=\"link\" data-id=\"https:\/\/www.rivas.ai\/pdfs\/ernesto2023creation.pdf\"><em>Creation and Analysis of a Natural Language Understanding Dataset for DoD Cybersecurity Policies (CSIAC-DoDIN V1.0)<\/em><\/a>, which introduces a groundbreaking dataset to aid in this endeavor. This dataset bridges a significant gap in Legal Natural Language Processing (NLP) by providing structured data specifically focused on cybersecurity policies.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Dataset Overview<\/h4>\n\n\n\n<p>The CSIAC-DoDIN V1.0 dataset encompasses a wide array of cybersecurity-related policies, responsibilities, and procedures from the Department of Defense (DoD). 
Unlike existing datasets that focus primarily on privacy policies, this dataset includes detailed guidelines, strategies, and procedures essential for cybersecurity.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Key Contributions<\/h4>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Novel Dataset<\/strong>: This dataset is the first to include comprehensive cybersecurity policies, guidelines, and procedures.<\/li>\n\n\n\n<li><strong>Baseline Models<\/strong>: The paper provides baseline performance metrics using transformer-based models such as BERT, RoBERTa, Legal-BERT, and PrivBERT.<\/li>\n\n\n\n<li><strong>Open Access<\/strong>: The <a href=\"https:\/\/figshare.com\/articles\/dataset\/Natural_Language_Understanding_Dataset_for_DoD_Cybersecurity_Policies_CSIAC-DoDIN_V1_0_\/22800185\/2\" data-type=\"link\" data-id=\"https:\/\/figshare.com\/articles\/dataset\/Natural_Language_Understanding_Dataset_for_DoD_Cybersecurity_Policies_CSIAC-DoDIN_V1_0_\/22800185\/2\">dataset and code are publicly available<\/a>, encouraging further research and collaboration.<\/li>\n<\/ol>\n\n\n\n<h4 class=\"wp-block-heading\">Experiments and Results<\/h4>\n\n\n\n<p>Our team of researchers evaluated several transformer-based models on this dataset:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>BERT<\/strong>: Demonstrated strong performance across various tasks.<\/li>\n\n\n\n<li><strong>RoBERTa<\/strong>: Showed competitive results, particularly in classification tasks.<\/li>\n\n\n\n<li><strong>Legal-BERT<\/strong>: Excelled in domain-specific tasks, benefiting from its pre-training on legal data.<\/li>\n\n\n\n<li><strong>PrivBERT<\/strong>: Provided insights into the transferability of models across different policy subdomains.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Download<\/h4>\n\n\n\n<p>Access the CSIAC-DoDIN V1.0 dataset <a 
href=\"https:\/\/figshare.com\/articles\/dataset\/Natural_Language_Understanding_Dataset_for_DoD_Cybersecurity_Policies_CSIAC-DoDIN_V1_0_\/22800185\/2\">here<\/a> to explore it and contribute to the advancement of Legal NLP. Join the effort to enhance cybersecurity policy understanding and implementation using cutting-edge NLP models. Download the paper <a href=\"https:\/\/www.rivas.ai\/pdfs\/ernesto2023creation.pdf\">here<\/a> to learn more about the process.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Ernesto Quevedo et al. introduced CSIAC-DoDIN V1.0, a dataset for DoD cybersecurity policies, enhancing Legal NLP with structured data and evaluated transformer-based models.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[4],"class_list":["post-4450","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-ai-lab"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=\/wp\/v2\/posts\/4450","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4450"}],"version-history":[{"count":2,"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=\/wp\/v2\/posts\/4450\/revisions"}],
"predecessor-version":[{"id":4453,"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=\/wp\/v2\/posts\/4450\/revisions\/4453"}],"wp:attachment":[{"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4450"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4450"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4450"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}